Multimedia Databases
Wolf-Tilo Balke, Silviu Homoceanu
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
3 Using Texture for Image Retrieval
3.1 Textures, basics
3.2 Low-Level Features
3.3 High-Level Features
3 Using Textures for Image Retrieval
• Textures describe the nature of typical, recurrent patterns in pictures
3.1 Texture Analysis
• Important for the description of images
– Type of representation (raster image, etc.)
– Image objects
• Natural things: grass, gravel, etc.
• Artificial things: stone walls, wallpaper, etc.
• Various ordered and random textures
3.1 Texture Research
• Texture segmentation
– Find areas of the image (decomposition) with homogeneous textures
• Texture classification
– Describe and denote homogeneous textures in image regions
• Texture synthesis
– Create textures for increased realism in images (texture mapping, etc.)
3.1 Example
• Find image regions with a certain texture
– Here: wine grapes
– Used for scene decomposition
3.1 Texture Segmentation
• Colors and texture are often related
– Texture decomposition alone often provides no meaningful (semantically related) areas
• Extract the (segmented) regions with a certain predominant texture
– Medical images: tomography, etc.
– Satellite images: ice, water, etc.
3.1 Texture Classification
– ...
• Describe the corresponding texture with appropriate features, suitable for comparisons in similarity search queries
• Classification can be semantic
– Textures represent objects in the real world
– Strongly dependent on the application
• Or based on purely descriptive characteristics
– Usually has no direct significance for people
– Ensures comparability between different image collections
– Query-by-Example
• Example: Satellite image (semantic classification)
(satellite image with regions labeled Sand and Water)
3.1 Texture Features
• How to describe textures for similarity measures?
– Low-level features use basic building blocks (e.g., Julesz' textons), the Tamura measure, etc.
– High-level features use Gabor filters, Fourier transforms, etc.
• How do people distinguish textures?
• (Rao / Lohse, 1993) give three main criteria:
– Repetition
– Orientation
– Complexity
3.1 Low-Level Texture Features
• Is this measurable?
– Grey-level features
– The Tamura measure
– Random field models
3.2 Grey Level Features
• In the 1960s and 70s, investigating texture meant mainly grey-level analysis
– Grey value histograms provide information on pixel intensity
– Allow comparison using statistical measures like expected value, standard deviation, etc.
– Idea: similar patterns produce similar distributions of grey values
• Moments of the first order do not consider the position of the pixels
– Periodicity poorly detectable
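A minimal sketch of such first-order grey-level statistics (the function name is illustrative, not from the slides). It also demonstrates the limitation just stated: two clearly different patterns with identical histograms receive identical feature values.

```python
import numpy as np

def grey_level_stats(image):
    """First-order statistics of the grey-value histogram.

    These ignore pixel positions entirely: any permutation of the
    pixels yields exactly the same feature values.
    """
    pixels = np.asarray(image, dtype=float).ravel()
    return {
        "mean": pixels.mean(),   # expected value
        "std": pixels.std(),     # standard deviation
    }

# Two very different patterns with identical histograms:
stripes = np.array([[0, 255], [0, 255]])   # vertical stripes
checker = np.array([[0, 255], [255, 0]])   # checkerboard
print(grey_level_stats(stripes) == grey_level_stats(checker))  # True
```

Both patterns contain two black and two white pixels, so every histogram-based measure agrees, even though their periodic structure differs.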
• Solution: Grey-level co-occurrence
– Pixel at position s has intensity q: I(s) = q
– (Julesz, 1961): Calculate the empirical probability distribution of an intensity change of value m at a pixel shift of d pixels to the right
• (Julesz, 1975): Generalization to shifts in any direction
– As two-dimensional distribution function (for every d), use the grey-level co-occurrence matrix
• Grey-level co-occurrence matrix
– Consider all pixel pairs (x1, y1), (x2, y2) with Euclidean distance d, and assume point (x1, y1) has grey value i and point (x2, y2) has grey value j, for i, j ∈ {1, …, N}
– Define C_d = [c_d(i, j)] as the grey-level co-occurrence matrix, where c_d(i, j) is the number of pixel pairs which have distance d and intensities i and j, respectively
– Problem: rather complicated to calculate all (N × N) matrices for different distances d
• Many measures for texture recognition were derived from these grey-level co-occurrence matrices
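The counting scheme can be sketched as follows. This is a simplified single-shift variant (one fixed shift vector instead of all pairs at Euclidean distance d); the function name is illustrative, and real systems typically use an optimized library implementation.

```python
import numpy as np

def cooccurrence_matrix(image, shift=(0, 1), levels=4):
    """Grey-level co-occurrence matrix for one shift vector.

    c[i, j] counts pixel pairs (s, s + shift) where I(s) = i and
    I(s + shift) = j.  The method from the slides would build one
    matrix per distance/direction; this sketch handles one shift.
    """
    img = np.asarray(image)
    dy, dx = shift
    c = np.zeros((levels, levels), dtype=int)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                c[img[y, x], img[y2, x2]] += 1
    return c

# One-pixel vertical stripes: horizontal neighbor pairs always flip
# between grey levels 0 and 1, so the diagonal of C stays empty.
stripes = np.tile([0, 1], (4, 2))   # 4x4 image with rows 0 1 0 1
C = cooccurrence_matrix(stripes, shift=(0, 1), levels=2)
print(C)
```

The off-diagonal concentration is exactly what distinguishes this periodic pattern from, say, a blurred gradient with the same histogram.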
– Thesis of Julesz (Julesz et al., 1973): people cannot distinguish textures if they have identical grey-level co-occurrence matrices
– Perception psychology: nope…
But still useful as a rule of thumb! (Julesz, 1981)
(photo: Béla Julesz)
3.2 The Tamura Measure
• In the Tamura measure (Tamura et al., 1978), image textures are evaluated along six different dimensions
– Granularity (coarseness): gravel vs. sand
– Contrast: clear-cut shapes, shadows
– Directionality: predominant directions
– Line-likeness
– Regularity
– Roughness
• The last three properties are rarely used and appear to be correlated with the others
3.2 Granularity
• Granularity (coarseness)
– Image resolution: e.g., aerial photographs from different heights
3.2 Granularity Extraction
• Examine the neighborhood of each pixel for brightness changes
– Lay a window of size 2^i × 2^i over each pixel (e.g., 1 × 1 to 32 × 32 in IBM's QBIC)
– Determine for each i and each pixel the average grey level in the corresponding window
• Compute δ_i = max(δ_i^h, δ_i^v) for each pixel
– δ_i^h is the difference of the mean grey levels of the left and right horizontally adjacent windows (of size 2^i × 2^i)
– δ_i^v analogous, between the vertically adjacent windows
• Determine for each pixel the maximum window size 2^j × 2^j whose δ_j has the maximum difference (or which lies within a certain tolerance of the maximum of δ_i)
• The granularity of the entire image is the mean of the maximum window sizes over all pixels
• Alternatively, a histogram which maps the number of pixels corresponding to each window size can be used
– This allows for better comparison between images containing different granularities
• Problem: image sections whose granularity needs to be determined may be too small to calculate meaningful averages in large operator windows
– Small image sections would therefore always have small granularity
– Solution: estimation of a maximum δ_i from the smaller values (Equitz / Niblack, 1994)
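The extraction steps above can be sketched roughly as follows. This is a deliberately simplified variant: window placement, the wrap-around shift used for adjacent windows, and the border handling are assumptions made for brevity, not the exact QBIC procedure.

```python
import numpy as np

def coarseness(image, max_power=3):
    """Simplified Tamura coarseness (granularity) sketch.

    For each pixel and each window size 2^i x 2^i, compare the mean
    grey levels of horizontally and vertically adjacent windows,
    keep the window size with the largest difference delta, and
    average the winning sizes over the whole image.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    best_size = np.ones((h, w))     # winning window size per pixel
    best_delta = np.zeros((h, w))
    for i in range(max_power + 1):
        k = 2 ** i
        # average grey level in a k x k window around each pixel
        means = np.zeros((h, w))
        for y in range(h):
            for x in range(w):
                means[y, x] = img[max(0, y - k // 2):y + k // 2 + 1,
                                  max(0, x - k // 2):x + k // 2 + 1].mean()
        # differences between adjacent windows (wrap-around shift)
        delta_h = np.abs(np.roll(means, -k, axis=1) - np.roll(means, k, axis=1))
        delta_v = np.abs(np.roll(means, -k, axis=0) - np.roll(means, k, axis=0))
        delta = np.maximum(delta_h, delta_v)
        better = delta > best_delta
        best_delta[better] = delta[better]
        best_size[better] = k
    return best_size.mean()

fine = np.indices((8, 8)).sum(0) % 2          # 1-pixel checkerboard
coarse = (np.indices((8, 8)).sum(0) // 4) % 2  # 4-pixel bands
print(coarseness(fine), coarseness(coarse))
```

A completely flat image triggers no difference at any scale, so every pixel keeps the smallest window size and the coarseness is exactly 1.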
3.2 Contrast
• Contrast evaluates the clarity of an image
– Sharpness of the color transitions
– Exposure, shadows
(example images: low contrast vs. high contrast)
3.2 Contrast Extraction
• Extraction of the contrast values
– Consider higher moments of the grey-level histogram distribution
– The contrast is F_con = σ / (α_4)^(1/4), where σ is the standard deviation of the grey values, α_4 = µ_4 / σ^4 is the kurtosis, and µ_4 is the fourth central moment
– Uni- and bimodal distributions can be differentiated through the use of the kurtosis
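The contrast formula can be computed directly from the grey-value moments; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def tamura_contrast(image):
    """Tamura contrast: F_con = sigma / kurtosis**(1/4),
    with kurtosis alpha_4 = mu_4 / sigma^4."""
    pixels = np.asarray(image, dtype=float).ravel()
    mu = pixels.mean()
    sigma2 = ((pixels - mu) ** 2).mean()   # variance
    if sigma2 == 0:
        return 0.0                         # flat image: no contrast
    mu4 = ((pixels - mu) ** 4).mean()      # fourth central moment
    alpha4 = mu4 / sigma2 ** 2             # kurtosis
    return np.sqrt(sigma2) / alpha4 ** 0.25

low = np.array([120, 130, 120, 130])   # grey values close together
high = np.array([0, 255, 0, 255])      # extreme grey values
print(tamura_contrast(low) < tamura_contrast(high))  # True
```

Both example distributions are bimodal with kurtosis 1, so the contrast here reduces to the standard deviation: 5 for the low-contrast values, 127.5 for the high-contrast ones.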
3.2 Directionality
• Directionality
– Senses predominant directions of elements in the image
(example textures: highly directional vs. weakly directional)
3.2 Directionality Extraction
• As a measure, determine the magnitude and direction (angle) of the gradient at each pixel, e.g., with a Sobel edge detector
– IBM's QBIC uses 16 different directions
• Create histograms where each angle is assigned the number of pixels with gradients above a certain threshold
– A dominant direction in the image is represented by a peak in the histogram
– If the measure has to be rotation invariant, do not use the location (angle), but the number and amplitude of such peaks for the calculation of the average directionality D
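The histogram construction can be sketched as follows. Simple central differences stand in for the full Sobel operator here (an assumption for brevity), and the function name is illustrative.

```python
import numpy as np

def direction_histogram(image, bins=16, threshold=1.0):
    """Histogram of gradient directions, as described above.

    Each interior pixel whose gradient magnitude exceeds the
    threshold votes for the bin of its gradient angle; a peak then
    marks a dominant direction.
    """
    img = np.asarray(image, dtype=float)
    gx = img[1:-1, 2:] - img[1:-1, :-2]   # horizontal gradient
    gy = img[2:, 1:-1] - img[:-2, 1:-1]   # vertical gradient
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)              # angle in (-pi, pi]
    strong = mag > threshold
    hist, _ = np.histogram(ang[strong], bins=bins, range=(-np.pi, np.pi))
    return hist

# Vertical stripes: all gradients point horizontally, so the votes
# concentrate in the two bins around angles 0 and pi.
stripes = np.tile([0.0, 0.0, 10.0, 10.0], (8, 2))  # 8x8 stripes
hist = direction_histogram(stripes)
print(hist)
```

For this highly directional pattern only two of the 16 bins receive votes; a weakly directional texture would spread them almost uniformly.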
3.2 Tamura-Measure Matching
• No correlations were found between the first three Tamura features
– Similarity can be implemented as distance measurement in the three-dimensional feature space of granularity, contrast, and directionality
3.2 Example
• Pattern (fur) on heraldic images
3.2 Using Stochastic Models
• Random-field models
– Observation: textures are periodically repeated
– Generating textures requires stochastic models
• Provides the ability to predict the brightness of a pixel in some image sample
• Includes the probability that a pixel has a certain brightness value
– A fixed model creates different, but still very similar textures
3.2 Random-Field Models
• The same trick (reversed) can be used for texture description and matching, too
– Which model (parameters) generates the presented textures best?
– Assuming a fixed model has created all the textures in the collection, the parameters of the model serve as descriptive features for each image
3.2 Example
• Assume the same model X has generated all textures in images of a collection
• Given some pixel and its surroundings: what is the expected intensity value?
– Obviously different for different images; therefore model X has different parameters for each image
(example textures: shaded, irregular)
3.2 Random Field Approach
• Each image can be seen as an observation
– Described by a matrix F, where values in the matrix correspond to pixel intensities
• Probabilistic model
– Matrix F is a random variable (called a random field)
– The basic distribution of class F is known, but the specific parameters of the distribution are not
– Question: we have an image and assume that it is a realization of F. What are the parameters of the corresponding distribution of F?
• Idea: perform a maximum likelihood estimation and describe each image by the estimated parameters
3.2 Exploiting Locality
• What does the expected intensity of a pixel depend on?
– For "sufficiently regular" textures the following locality statement is valid: "If the neighbors to the left and right are white and the neighbors above and below are black, then the pixel under the red square is with high probability also white"
• We can usually assume that the value of a pixel s does not depend on the values of all pixels in the image, but only on the pixels in the neighborhood of s
– This is called the Markov property
3.2 Markov Property
• Formalization
– Let F(r) be the (random) value of the pixel r, and N_s the set of pixels in the neighborhood of pixel s
– For each pixel s (and color values k, k_(0,0), k_(0,1), …):
P[F(s) = k | for all r ≠ s: pixel r has value k_r]
= P[F(s) = k | for all r ≠ s, r ∈ N_s: pixel r has value k_r]
– Thus, the probabilities of all values of pixel s depend only on the neighborhood N_s
3.2 Choosing Texture Models
• Now, a model must be defined which best reproduces the observed distribution
– There are many classes of texture models
• We must fix a common model for each collection and then calculate the best parameters for each image of the collection
– Simplification: the neighborhood N_s is defined by a set N of shifts: N_s = {s + t | t ∈ N}
– Generalization: N := {(0, 1), (1, 0), (0, –1), (–1, 0), (1, 1), (1, –1), (–1, –1), (–1, 1)}
• A popular class of models for texture description is the Simultaneous AutoRegressive model (SAR):
F(s) = Σ_{t ∈ N} θ(t) · F(s + t) + β · W(s)
– F(s) is the intensity value of pixel s
– W(s) is a special random variable reflecting white noise with mean 0 and variance 1
• θ(t) and β are characteristic parameters and are used as features for later matching
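The parameter estimation can be sketched with an ordinary least-squares fit over all interior pixels (for this Gaussian-noise model, least squares matches the maximum-likelihood estimate of θ). The function name and the 4-neighborhood default are illustrative assumptions.

```python
import numpy as np

def sar_features(image, shifts=((0, 1), (1, 0), (0, -1), (-1, 0))):
    """Least-squares estimate of SAR parameters (a sketch).

    Model: F(s) = sum_t theta(t) * F(s + t) + beta * W(s).
    We solve for theta over all interior pixels and take beta as
    the standard deviation of the residuals.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    targets, rows = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            targets.append(img[y, x])
            rows.append([img[y + dy, x + dx] for dy, dx in shifts])
    A, b = np.array(rows), np.array(targets)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    beta = np.std(b - A @ theta)
    return np.concatenate([theta, [beta]])   # 5-dim feature vector

# On a flat image every pixel equals the average of its neighbors,
# so the (minimum-norm) solution weights all four shifts equally
# and the noise term vanishes.
flat = np.full((6, 6), 5.0)
print(sar_features(flat))
```

Matching then reduces to comparing these short parameter vectors, which is what makes random-field features attractive as a low-dimensional texture description.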
• Still, there is a problem: the best size of the neighborhood of a pixel is different for different periodicities of textures
– Unfortunately, the solution is not trivial
– One possibility are multi-resolution simultaneous autoregressive models (Mao and Jain, 1992)
3.2 Random Field Models
• Random-field models provide a good low-dimensional description of textures
• Assumptions
– The Markov condition is valid, i.e., the intensity of each pixel is described with sufficient accuracy by its neighborhood
– The size of the neighborhood has been well chosen for the collection
3.3 Transform Domain Features
• Transform domain features
– In the case of low-level features, descriptors are chosen for certain aspects such as coarseness or contrast
– High-level features describe the complete picture in a different domain (no loss of information)
– Basically, the image is interpreted as a signal and transformed mathematically
• Well known for images: the Fourier transform
• Idea: by transforming to another representation, gain information ("see other things")
(illustration: local space vs. frequency space)
• Typical transforms used in texture analysis are
– Discrete Fourier Transform (DFT)
– Discrete Cosine Transform (DCT)
– Wavelet Transform (WT)
3.3 Transforms
• A transform is the conversion of a mathematical object into a different representation
– Transforms are reversible and information preserving
• E.g., a straight line can be described by...
– Two arbitrary points
– A point and the gradient
• Both give the same information
• More general result from algebra:
– For any set of n points (x_0, y_0), …, (x_{n-1}, y_{n-1}) in ℝ² with pairwise distinct x-values there is exactly one polynomial of degree at most n–1 which passes through all of these points
3.3 Example: Polynomial Interpolation
• This polynomial can thus be represented as...
– The set of points (x_0, y_0), …, (x_{n-1}, y_{n-1})
– The equation of the polynomial: y_k = a_0 + a_1·x_k + … + a_{n-1}·x_k^{n-1} for all k = 0, …, n–1, with suitable a_0, a_1, …, a_{n-1}
• Special case: x_0 := 0, x_1 := 1, …, x_{n-1} := n–1
– That means an equidistant sampling over the x-axis
– Exactly the case when reading intensity values of some image row by row
• Then, any sequence of real numbers y_0, y_1, …, y_{n-1} can be transformed into a sequence of coefficients a_0, a_1, …, a_{n-1} where
y_k = a_0 + a_1·k + … + a_{n-1}·k^{n-1}
is valid for all k = 0, …, n–1
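This special case can be checked numerically; a short sketch where `numpy.vander` builds the equation system for the sample points x = 0, …, n–1:

```python
import numpy as np

# A row of grey values, sampled at x = 0, 1, ..., n-1
y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
n = len(y)
x = np.arange(n)

# Solve the system y_k = a_0 + a_1*k + ... + a_{n-1}*k^{n-1}
V = np.vander(x, n, increasing=True)   # Vandermonde matrix
a = np.linalg.solve(V, y)

# The transform is reversible: evaluating the polynomial at the
# sample points reconstructs the original sequence.
y_back = V @ a
print(np.allclose(y_back, y))   # True
```

The coefficients a are an exact alternative representation of the row, which is the "reversible and information preserving" property claimed above.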
3.3 Images as Signals
• Idea: an image is a discrete function which assigns an intensity I(x, y) to each pixel (x, y)
– In the case of color images, an intensity is assigned to each color channel (RGB, HSV, ...)
– Therefore, each row of an image can be interpreted as a sequence of real numbers
• As seen, these rows can be transformed into the polynomial-coefficient representation
– This representation is not suitable for texture description: although textures exhibit periodic grey value variations, polynomials are not periodic
3.3 Discrete Fourier Transform
• Solution by Jean Fourier (1768–1830):
"Every sequence y_0, y_1, …, y_{n-1} of real numbers can be transformed into a sequence of coefficients a_0, a_1, …, a_⌊n/2⌋, b_0, b_1, …, b_⌊n/2⌋ with
y_k = Σ_{i=0}^{⌊n/2⌋} ( a_i · cos(2πik/n) + b_i · sin(2πik/n) )
for k = 0, …, n–1"
– This sequence can also be described by the overlap of harmonic oscillations
– The coefficients are typical for periodic patterns
3.3 DFT
• Real space: representation as y_0, y_1, …, y_{n-1}
• Frequency space: representation as a_0, a_1, …, a_⌊n/2⌋, b_0, b_1, …, b_⌊n/2⌋
• The discrete Fourier transform can also be generalized to two-dimensional data in real space (e.g., pixel coordinates)
– We then have sine and cosine waves in the frequency domain, each with a direction
3.3 Two-dimensional DFT (formal)
• For each i = 0, …, w–1 and each j = 0, …, h–1 there is an oscillation of the form
cos(2π(i·x/w + j·y/h)) and sin(2π(i·x/w + j·y/h))
Parameters i and j indicate the direction and wavelength of the oscillation
• w: width of the image
• h: height of the image
• f(x, y): intensity of pixel (x, y)
• Coefficients A(i, j) and B(i, j) indicate the amplitude of the corresponding cosine and sine waves of the (i, j) parameter pair
• They can be calculated as follows:
A(i, j) = (2/(w·h)) · Σ_{x=0}^{w–1} Σ_{y=0}^{h–1} f(x, y) · cos(2π(i·x/w + j·y/h))
B(i, j) = (2/(w·h)) · Σ_{x=0}^{w–1} Σ_{y=0}^{h–1} f(x, y) · sin(2π(i·x/w + j·y/h))
3.3 Two-dimensional DFT
• For graphic illustration of the amplitudes, instead of the two matrices A and B, a picture can be used
– The frequency-space picture is derived as follows: the value at position (i, j) represents the length of the vector (A(i, j), B(i, j))
• This length defines the Fourier spectrum
– i.e., how much of which frequency is contained in the signal
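The spectrum picture can be sketched with numpy's complex 2D FFT, whose real and imaginary parts play the roles of A and B (up to sign and scaling conventions, which differ from the formulas above):

```python
import numpy as np

# 8x8 test image: vertical stripes, i.e., a cosine with frequency
# i = 2 along the x-axis and j = 0 along the y-axis.
x = np.arange(8)
img = np.cos(2 * np.pi * 2 * x / 8) * np.ones((8, 1))

# |F(i, j)| = length of (A(i, j), B(i, j)) up to scaling: this is
# exactly the Fourier-spectrum picture described above.
F = np.fft.fft2(img)
spectrum = np.abs(F)

# Energy concentrates at the stripe frequency and at its mirror
# image, reflecting the symmetry of the spectrum towards the origin.
peaks = np.argwhere(spectrum > 1e-6)
print(peaks)   # the two mirrored stripe frequencies
```

For texture features, this spectrum (or statistics derived from it) is what gets compared between images.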
3.3 Frequency Space
• Horizontal frequencies are plotted horizontally in the frequency image
– Vertical frequencies are plotted vertically
• High/low periodicity in the real world is also reflected in feature space
• Properties
– Symmetric towards the origin
– Harmonics
– Main oscillation
– Brightness reflects amplitude (strength)
– Size of periodicity
3.3 FFT
• How to compute DFTs?
– The Fast Fourier Transform (FFT) is an efficient class of algorithms for computing the DFT
– Naive DFT complexity is O(N²)
– FFT implementations: Cooley-Tukey algorithm, Prime Factor FFT, Bruun's FFT, …
• Cooley-Tukey algorithm
– Based on the divide-and-conquer paradigm
– Recursively expresses N = N_1 · N_2
– Reduces the complexity of calculating the DFT to O(N·log N)
• FFT in Matlab is easy to compute
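The slides demonstrate the FFT in Matlab; numpy's interface is very similar. A sketch comparing a naive O(N²) DFT against the library FFT, which computes the same coefficients:

```python
import numpy as np

def naive_dft(y):
    """Direct O(N^2) DFT: multiply by the DFT matrix
    exp(-2*pi*i*j*k/N)."""
    n = len(y)
    k = np.arange(n)
    M = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return M @ y

y = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(naive_dft(y), np.fft.fft(y)))   # True
```

The FFT produces these values in O(N·log N) instead of O(N²), which is what makes frequency-space features practical for large image collections.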
3.3 Examples
• Remove noise by masking pixels in frequency space
• Lossy image compression
(images © John M. Brayer, University of New Mexico)
3.3 Discrete Cosine Transform
• The Discrete Cosine Transform (DCT) works analogously to the DFT, only using cosine functions
– E.g., used in the encoding of JPEG images for compression purposes
• In the case of DFT and DCT, the power spectrum (i.e., the coefficients) is used for comparisons
• Little problem: the spectrum produced by the Fourier transform shows all contained frequencies, but not when (or where in the image) they occur
3.3 Wavelet Transform
• Wavelet transforms approximate the intensity function through a different class of functions
– Approximation of the intensity function using a local base function (mother wavelet) in different resolutions and shifts
– Wavelets are thus local in frequency (through scaling) and time (through shifts)
– This solves the locality problem of DFT/DCT
• The function classes are locally integrable functions with integral 0
• Having some wavelet Ψ(x), we can generate a base B through appropriate shifting and scaling:
Ψ_{a,b}(x) = |a|^(–1/2) · Ψ((x – b) / a)
– Usually special values for the wavelet basis are considered: a = 2^(–j) and b = k·2^(–j) for integers j and k
– These values are called "critical sampling"
• The simplest example: the Haar wavelet
– Definition: 1 on [0, ½), –1 on [½, 1), and 0 elsewhere
– The corresponding functions form an orthogonal basis of L²(ℝ) (all quadratically integrable functions)
– Can be made orthonormal by a factor of 2^(j/2)
(graph of the mother wavelet Ψ_{0,0}(x))
• Baby wavelets: Ψ_{j,k}(x)
– Scaled by a factor of 2^j, shifted by k·2^(–j)
(graphs: Ψ_{1,0}(x), Ψ_{1,1}(x); Ψ_{2,0}(x), Ψ_{2,1}(x), Ψ_{2,2}(x), Ψ_{2,3}(x))
– The smaller the scale, the more shifts (exponentially)
3.3 Wavelet Transform
• The base can also be represented using a scaling function φ_{0,0}
• For Haar wavelets, the scaling function is the characteristic function of the interval [0, 1)
– Each data set of cardinality 2^n, y = {y_0, …, y_{2^n – 1}}, can then be represented on [0, 1) by a piecewise constant function
• Our intensity values for image rows are basically discrete step functions
– Since step functions are finite, they can be expressed through the scaling function and Haar wavelets
3.3 Wavelet Transform
• Example: describe the step function given by y = (1, 0, –3, 2, 1, 0, 1, 2)
• Resolution levels: e.g., j = 0, 1, 2
• Base with orthonormalization factor 2^(j/2) = {1, 2^(1/2), 2} for {d_{0,k}, d_{1,k}, d_{2,k}}
(illustration: scaling function, mother wavelet, and baby wavelets for j = 1 and j = 2)
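The decomposition of this example can be carried out with the classic averaging-and-differencing scheme. This sketch produces the unnormalized Haar coefficients; the orthonormalization factor 2^(j/2) from the slide is deliberately omitted.

```python
import numpy as np

def haar_decompose(y):
    """Unnormalized Haar wavelet decomposition by averaging and
    differencing.  Returns [overall average, detail coefficients
    from the coarsest down to the finest resolution level]."""
    y = list(map(float, y))
    details = []
    while len(y) > 1:
        avg = [(a + b) / 2 for a, b in zip(y[0::2], y[1::2])]
        dif = [(a - b) / 2 for a, b in zip(y[0::2], y[1::2])]
        details = dif + details   # finest level ends up last
        y = avg
    return y + details

# The step function from the slides:
coeffs = haar_decompose([1, 0, -3, 2, 1, 0, 1, 2])
print(coeffs)
# [0.5, -0.5, 0.5, -0.5, 0.5, -2.5, 0.5, -0.5]
```

The first entry (0.5) is the coefficient of the scaling function, i.e., the overall average of the data; the remaining entries are the wavelet coefficients d_{j,k} per resolution level and can be used directly as texture features.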