(1)

Multimedia Databases

Wolf-Tilo Balke Silviu Homoceanu

Institut für Informationssysteme

Technische Universität Braunschweig

http://www.ifis.cs.tu-bs.de

(2)

3 Using Texture for Image Retrieval

3.1 Textures, Basics
3.2 Low-Level Features
3.3 High-Level Features

(3)

3.1 Texture Analysis

• Textures describe the nature of typical, recurrent patterns in pictures
• Important for the description of images
  – Type of representation (raster image, etc.)
  – Image objects
    • Natural things: grass, gravel, etc.
    • Artificial things: stone walls, wallpaper, etc.

(4)

3.1 Example

• Various ordered and random textures

(5)

3.1 Texture Research

• Texture segmentation
  – Find areas of the image (decomposition) with homogeneous textures
• Texture classification
  – Describe and denote homogeneous textures in image regions
• Texture synthesis
  – Create textures for increased realism in images (texture mapping, etc.)

(6)

3.1 Texture Segmentation

• Find image regions with a certain texture
  – Here: wine grapes
  – Used for scene decomposition

(7)

3.1 Texture Segmentation

• Colors and texture are often related
  – Texture decomposition alone often provides no meaningful (semantically related) areas

(8)

3.1 Texture Classification

• Extract the (segmented) regions with a certain predominant texture
  – Medical images: tomography, etc.
  – Satellite images: ice, water, etc.
  – ...
• Describe the corresponding texture with appropriate features, suitable for comparisons in similarity search queries

(9)

3.1 Texture Classification

• Classification can be semantic
  – Textures represent objects in the real world
  – Strongly dependent on the application
• Or based on purely descriptive characteristics
  – Usually has no direct significance for people
  – Ensures comparability between different image collections
  – Query-by-Example

(10)

3.1 Texture Classification

• Example: satellite image (semantic classification)
  [Image regions labeled: sand, water]

(11)

3.1 Texture Features

• How to describe textures for similarity measures?
  – Low-level features use basic building blocks (e.g., Julesz' textons), the Tamura measure, etc.
  – High-level features use Gabor filters, Fourier transforms, etc.

(12)

3.1 Texture Features

• How do people distinguish textures?

(13)

3.1 Low-Level Texture Features

• (Rao / Lohse, 1993) give three main criteria:
  – Repetition
  – Orientation
  – Complexity
• Is this measurable?
  – Grey-level features
  – The Tamura measure
  – Random field models

(14)

3.2 Grey Level Features

• In the 1960s and 1970s, investigating texture meant mainly grey-level analysis
  – Grey value histograms provide information on pixel intensity
  – Allows comparison using statistical measures like expected value, standard deviation, etc.
  – Idea: similar patterns produce similar distributions of grey values
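A minimal sketch of such first-order statistics (illustrative, not from the original slides), assuming an 8-bit greyscale image given as a NumPy array:

    import numpy as np

    def grey_histogram_features(img):
        """First-order statistics of the grey-value histogram."""
        hist, _ = np.histogram(img, bins=256, range=(0, 256))
        p = hist / hist.sum()              # empirical grey-value distribution
        levels = np.arange(256)
        mean = (levels * p).sum()          # expected value
        std = np.sqrt((((levels - mean) ** 2) * p).sum())
        return mean, std

Two images with similar textures should then yield similar (mean, std) pairs, matching the idea above.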

(15)

3.2 Grey Level Features

• Moments of the first order do not consider the position of the pixels
  – Periodicity is poorly detectable

(16)

3.2 Grey Level Features

• Solution: grey-level co-occurrence
  – Pixel at position s has intensity q: I(s) = q
  – (Julesz, 1961): calculate the empirical probability distribution for an intensity change by the value m at a pixel shift of d pixels to the right:
    P_d(m) = P[ I(s + (d, 0)) − I(s) = m ]
• (Julesz, 1975): generalization to shifts in any direction
  – As a two-dimensional distribution function (for every d), use the grey-level co-occurrence matrix

(17)

3.2 Grey Level Features

• Grey-level co-occurrence matrix
  – Consider all pixel pairs (x₁, y₁), (x₂, y₂) with Euclidean distance d, and assume point (x₁, y₁) has grey value i and point (x₂, y₂) has grey value j, for i, j ∈ {1, …, N}
  – Define C_d = [c_d(i, j)] as the grey-level co-occurrence matrix, where c_d(i, j) is the number of pixel pairs which have distance d and intensities i and j, respectively
  – Problem: rather complicated to calculate all (N × N) matrices for different distances d
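A sketch of the counting step for a single displacement vector (the any-direction variant sums such counts over all offsets at distance d); it assumes integer grey values in {0, …, levels−1} and non-negative offsets:

    import numpy as np

    def glcm(img, dx=1, dy=0, levels=256):
        """Count pixel pairs (s, s + (dy, dx)) with intensities (i, j)."""
        C = np.zeros((levels, levels), dtype=np.int64)
        h, w = img.shape
        a = img[:h - dy, :w - dx]                  # reference pixels
        b = img[dy:, dx:]                          # shifted partners
        np.add.at(C, (a.ravel(), b.ravel()), 1)    # C[i, j] += 1 per pair
        return C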

(18)

3.2 Grey Level Features

• Many measures for texture recognition were derived from these grey-level co-occurrence matrices
  – Thesis of Julesz (Julesz and others, 1973): people cannot distinguish textures if they have identical grey-level co-occurrence matrices
  – Perception psychology: nope… but still useful as a rule of thumb! (Julesz, 1981)
  [Photo: Béla Julesz]

(19)

3.2 The Tamura Measure

• In the Tamura measure (Tamura and others, 1978), image textures are evaluated along six different dimensions
  – Granularity (coarseness): gravel vs. sand
  – Contrast: clear-cut shapes, shadows
  – Directionality: predominant directions
  – Line-likeness
  – Regularity
  – Roughness
• The last three properties are rarely used and appear to be correlated to the others

(20)

3.2 Granularity

• Granularity (coarseness)
  – Image resolution: e.g., aerial photographs from different heights

(21)

3.2 Granularity Extraction

• Examine the neighborhood of each pixel for brightness changes
  – Lay over each pixel a window of size 2^i × 2^i (e.g., 1 × 1 to 32 × 32 in IBM's QBIC)
  – Determine for each i and each pixel the average grey level in the corresponding window

(22)

3.2 Granularity Extraction

• Compute δ_i = max(δ_i^h, δ_i^v) for each pixel
  – δ_i^h is the difference of the means of grey levels belonging to the left and right horizontally adjacent windows (of size 2^i × 2^i)
  – δ_i^v analogous, between the vertically adjacent windows
• Determine for each pixel the maximum window size 2^j × 2^j whose δ_j has the maximum difference (or which lies within a certain tolerance from the maximum of δ_i)

(23)

3.2 Granularity Extraction

• The granularity of the entire image is the mean of the maximum window sizes of all pixels
• Alternatively, a histogram which maps the number of pixels corresponding to each window size can be used
  – This allows for better comparison between images containing different granularities
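A minimal sketch of this scheme (illustrative; it assumes a greyscale NumPy array, uses wrap-around borders via np.roll for brevity, and compares windows whose centers lie 2^(i−1) pixels to either side of the pixel):

    import numpy as np
    from scipy.ndimage import uniform_filter

    def coarseness(img, max_i=5):
        """Tamura coarseness: mean best window size 2^i over all pixels."""
        g = img.astype(np.float64)
        deltas = []
        for i in range(1, max_i + 1):
            k = 2 ** i
            avg = uniform_filter(g, size=k)        # mean over 2^i x 2^i windows
            s = k // 2                             # adjacent window centers
            dh = np.abs(np.roll(avg, -s, axis=1) - np.roll(avg, s, axis=1))
            dv = np.abs(np.roll(avg, -s, axis=0) - np.roll(avg, s, axis=0))
            deltas.append(np.maximum(dh, dv))      # delta_i per pixel
        best = np.argmax(np.stack(deltas), axis=0) + 1   # best scale i per pixel
        return float(np.mean(2.0 ** best))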

(24)

3.2 Granularity Extraction

• Problem: image selections whose granularity needs to be determined may be too small to calculate meaningful averages in large operator windows
  – Small image sections would therefore always have small granularity
  – Solution: estimation of a maximum δ_i from the smaller values (Equitz / Niblack, 1994)

(25)

3.2 Contrast

• Contrast evaluates the clarity of an image
  – Sharpness of the color transitions
  – Exposure, shadows
  [Example images: low contrast vs. high contrast]

(26)

3.2 Contrast Extraction

• Extraction of the contrast values
  – Consider higher moments of the distribution of the grey-level histogram
  – The contrast is F_con = σ / (α₄)^(1/4), where σ is the standard deviation of the grey values, α₄ = µ₄ / σ⁴ is the kurtosis, and µ₄ is the fourth central moment
  – Uni- and bi-modal distributions can be differentiated through the use of the kurtosis
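The formula translates directly into code (a sketch; assumes a greyscale NumPy array):

    import numpy as np

    def contrast(img):
        """Tamura contrast: sigma / kurtosis^(1/4)."""
        g = img.astype(np.float64).ravel()
        sigma = g.std()
        mu4 = np.mean((g - g.mean()) ** 4)   # fourth central moment
        alpha4 = mu4 / sigma ** 4            # kurtosis
        return sigma / alpha4 ** 0.25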

(27)

3.2 Directionality

• Directionality
  – Senses predominant directions of elements in the image
  [Example images: highly directional vs. weakly directional]

(28)

3.2 Directionality Extraction

• As a measure, determine the magnitude and direction (angle) of the gradient in each pixel, e.g., with a Sobel edge detector
  – IBM's QBIC uses 16 different directions

(29)

3.2 Directionality Extraction

• Create histograms where each angle is assigned the number of pixels with gradients above a certain threshold
  – A dominant direction in the image is represented by a peak in the histogram
  – If the measure has to be rotation invariant, do not use the location (angle) but the number and amplitude of such peaks for the calculation of the average directionality D
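A sketch of the histogram construction (the Sobel operator is named on the previous slide and the 16 bins follow QBIC; the gradient threshold is an assumed, illustrative free parameter):

    import numpy as np
    from scipy.ndimage import sobel

    def direction_histogram(img, bins=16, threshold=16.0):
        """Histogram of gradient directions over sufficiently strong edges."""
        g = img.astype(np.float64)
        gx = sobel(g, axis=1)            # horizontal derivative
        gy = sobel(g, axis=0)            # vertical derivative
        mag = np.hypot(gx, gy)           # gradient magnitude
        ang = np.arctan2(gy, gx)         # gradient angle in (-pi, pi]
        strong = mag > threshold         # ignore weak gradients
        hist, _ = np.histogram(ang[strong], bins=bins, range=(-np.pi, np.pi))
        return hist / max(hist.sum(), 1) # peaks mark dominant directions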

(30)

3.2 Tamura-Measure Matching

• No correlations were found between the first three Tamura features
  – Similarity can be implemented as a distance measurement in the three-dimensional (granularity, contrast, directionality) feature space
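For instance, as a (possibly weighted) Euclidean distance over the three feature values; the weights here are purely illustrative:

    import numpy as np

    def tamura_distance(f1, f2, weights=(1.0, 1.0, 1.0)):
        """Distance of two (granularity, contrast, directionality) triples."""
        w = np.asarray(weights, dtype=np.float64)
        a = np.asarray(f1, dtype=np.float64)
        b = np.asarray(f2, dtype=np.float64)
        return float(np.sqrt(np.sum(w * (a - b) ** 2)))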

(31)

3.2 Example

• Pattern (fur) on heraldic images

(32)

3.2 Using Stochastic Models

• Random-field models
  – Observation: textures are periodically repeated
  – Generating textures requires stochastic models
    • Provides the ability to predict the brightness of a pixel in some image sample
    • Includes the probability that a pixel has a certain brightness value
  – A fixed model creates different, but still very similar, textures

(33)

3.2 Random-Field Models

• The same trick (reversed) can be used for texture description and matching, too
  – Which model (parameters) generates the presented textures best?
  – Assuming a fixed model has created all the textures in the collection, the parameters of the model serve as descriptive features for each image

(34)

3.2 Example

• Assume the same model X has generated all textures in images of a collection
• Given some pixel and its surroundings: what is the expected intensity value?
  – Obviously different for different images; therefore, model X has different parameters for each image
  [Example textures: shaded vs. irregular]

(35)

3.2 Random Field Approach

• Each image can be seen as an observation
  – Described by a matrix F, where values in the matrix correspond to pixel intensities
• Probabilistic model
  – Matrix F is a random variable (called a random field)
  – The basic distribution of class F is known, but the specific parameters of the distribution are not
  – Question: we have an image and assume that it is a realization of F. What are the parameters of the corresponding distribution of F?
• Idea: perform a maximum likelihood estimation and describe each image by the estimated parameters

(36)

3.2 Exploiting Locality

• What does the expected intensity of a pixel depend on?
  – For "sufficiently regular" textures, the following locality statement is valid: "If the neighbors to the left and right are white and the up and down neighbors are black, then the pixel under the red square is with a high probability also white"

(37)

3.2 Markov Property

• We can usually assume that the value of a pixel s does not depend on the values of all pixels in the image, but on the pixels in the neighborhood of s
  – This is called the Markov property
• Formalization
  – Let F(r) be the (random) value of pixel r and N_s the set of pixels in the neighborhood of pixel s
  – For each pixel s (and grey values k and k_r for all r ≠ s):
    P[F(s) = k | for all r ≠ s: pixel r has value k_r]
    = P[F(s) = k | for all r ∈ N_s: pixel r has value k_r]
  – Thus, the probabilities of all values of pixel s depend only on its neighborhood N_s

(38)

3.2 Choosing Texture Models

• Now, a model must be defined which best reproduces the observed distribution
  – There are many classes of texture models
• We must fix a common model for each collection and then calculate the best parameters for each image of the collection
  – Simplification: the neighborhood N_s is defined by a set N of shifts: N_s = {s + t | t ∈ N}
  – Generalization: N := {(0, 1), (1, 0), (0, –1), (–1, 0), (1, 1), (1, –1), (–1, –1), (–1, 1)}

(39)

3.2 Choosing Texture Models

• A popular class of models for texture description is the Simultaneous AutoRegressive model (SAR):
  F(s) = Σ_{t ∈ N} θ(t) · F(s + t) + β · W(s)
  – F(s) is the intensity value of pixel s
  – W(s) is a special random variable reflecting white noise with mean 0 and variance 1
  – θ(t) and β are characteristic parameters and are used as features for later matching
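A least-squares sketch of fitting these parameters per image (equivalent to maximum likelihood under the Gaussian noise assumption; assumes a greyscale NumPy array, zero-mean intensities, and the 8-shift neighborhood N from the previous slide):

    import numpy as np

    EIGHT_SHIFTS = ((0, 1), (1, 0), (0, -1), (-1, 0),
                    (1, 1), (1, -1), (-1, -1), (-1, 1))

    def fit_sar(img, shifts=EIGHT_SHIFTS):
        """Estimate theta(t), beta of F(s) = sum_t theta(t)F(s+t) + beta W(s)."""
        g = img.astype(np.float64)
        g -= g.mean()                          # zero-mean intensities
        h, w = g.shape
        center = g[1:-1, 1:-1].ravel()         # pixels with a full neighborhood
        X = np.stack([g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].ravel()
                      for dy, dx in shifts], axis=1)
        theta, *_ = np.linalg.lstsq(X, center, rcond=None)
        beta = (center - X @ theta).std()      # scale of the white noise
        return theta, beta                     # feature vector for matching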

(40)

3.2 Choosing Texture Models

• Still, there is a problem: the best size of the neighborhood of a pixel is different for different periodicities of textures
  – Unfortunately, the solution is not trivial
  – One possibility are multi-resolution simultaneous autoregressive models (Mao and Jain, 1992)

(41)

3.2 Random Field Models

• Random-field models provide a good low-dimensional description of textures
• Assumptions
  – The Markov condition is valid, i.e., the intensity of each pixel is described with sufficient accuracy by its neighborhood
  – The size of the neighborhood has been well chosen for the collection

(42)

3.3 Transform Domain Features

• Transform domain features
  – In the case of low-level features, descriptors are chosen for certain aspects such as the coarseness or contrast
  – High-level features describe the complete picture in a different domain (no loss of information)
  – Basically, the image is interpreted as a signal and transformed mathematically

(43)

3.3 Transform Domain Features

• Well known for images: the Fourier transform
• Idea: by transforming to another representation, gain information ("see other things")
  [Illustration: local (image) space vs. frequency space]

(44)

3.3 Transform Domain Features

• Typical features used in texture analysis are
  – Discrete Fourier Transform (DFT)
  – Discrete Cosine Transform (DCT)
  – Wavelet Transform (WT)

(45)

3.3 Transforms

• A transform is the conversion of a mathematical object into a different representation
  – Transforms are reversible and information preserving
• E.g., a straight line can be described by...
  – Two arbitrary points
  – A point and the gradient
• Both give the same information

(46)

3.3 Example: Polynomial Interpolation

• A more general result from algebra:
  – For any set of n points (x_0, y_0), …, (x_{n-1}, y_{n-1}) in ℝ² with distinct x-values, there is exactly one polynomial of degree at most n−1 which passes through all of these points
• This polynomial can thus be represented as...
  – The set of points (x_0, y_0), …, (x_{n-1}, y_{n-1})
  – The equation of the polynomial: y_k = a_0 + a_1·x_k + … + a_{n-1}·x_k^{n-1} for all k = 0, …, n−1, with suitable coefficients a_0, a_1, …, a_{n-1}

(47)

3.3 Example: Polynomial Interpolation

• Special case: x_0 := 0, x_1 := 1, …, x_{n-1} := n−1
  – That means an equidistant sampling over the x-axis
  – Exactly the case when reading intensity values of some image row by row
• Then, any sequence of real numbers y_0, y_1, …, y_{n-1} can be transformed into a sequence of coefficients a_0, a_1, …, a_{n-1} where
  y_k = a_0 + a_1·k + a_2·k² + … + a_{n-1}·k^{n-1}
  is valid for all k = 0, …, n−1

(48)

3.3 Images as Signals

• Idea: an image is a discrete function which assigns each pixel (x, y) an intensity I(x, y)
  – In the case of color images, an intensity is assigned to each color channel (RGB, HSV, ...)
  – Therefore, each row of an image can be interpreted as a sequence of real numbers
• As seen, these rows can be transformed into the polynomial coefficient representation
  – This representation is not suitable for texture description: although textures exhibit periodic grey value variation, polynomials are not periodic

(49)

3.3 Discrete Fourier Transform

• Solution by Jean-Baptiste Joseph Fourier (1768-1830):
  "Every sequence y_0, y_1, …, y_{n-1} of real numbers can be transformed into a sequence of coefficients a_0, a_1, …, a_{n/2}, b_0, b_1, …, b_{n/2} with
  y_k = Σ_{i=0}^{n/2} ( a_i·cos(2π·i·k/n) + b_i·sin(2π·i·k/n) )   for k = 0, …, n−1"
  – This sequence can also be described by the overlap of harmonic oscillations
  – The coefficients are typical for periodic patterns
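The coefficients can be obtained with any DFT routine; a sketch mapping NumPy's complex FFT output to the a_i, b_i above (the sign and scaling bookkeeping follows NumPy's conventions; the input sequence is arbitrary):

    import numpy as np

    y = np.array([1.0, 0.0, -3.0, 2.0, 1.0, 0.0, 1.0, 2.0])  # any real sequence
    n = len(y)
    c = np.fft.rfft(y)        # complex coefficients for frequencies 0..n/2
    a = 2 * c.real / n        # cosine amplitudes a_0 .. a_{n/2}
    b = -2 * c.imag / n       # sine amplitudes   b_0 .. b_{n/2}
    a[0] /= 2                 # DC term is not doubled
    a[-1] /= 2                # Nyquist term (n even) is not doubled

    # check: y_k = sum_i a_i cos(2 pi i k / n) + b_i sin(2 pi i k / n)
    k = np.arange(n)
    i = np.arange(n // 2 + 1)[:, None]
    y_rec = (a[:, None] * np.cos(2 * np.pi * i * k / n)
             + b[:, None] * np.sin(2 * np.pi * i * k / n)).sum(axis=0)
    assert np.allclose(y, y_rec)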

(50)

3.3 DFT

• Real space: representation as y_0, y_1, …, y_{n-1}
• Frequency space: representation as a_0, a_1, …, a_{n/2}, b_0, b_1, …, b_{n/2}
• The discrete Fourier transform can also be generalized to two-dimensional data in real space (e.g., pixel coordinates)
  – We then have sine and cosine waves in the frequency domain, each with a direction

(51)

3.3 Two-Dimensional DFT (formal)

• For each i = 0, …, w−1 and each j = 0, …, h−1 there is an oscillation of the form cos(2π(i·x/w + j·y/h)) resp. sin(2π(i·x/w + j·y/h)); the parameters i and j indicate the direction and wavelength of the oscillation
  – w: width of the image
  – h: height of the image
  – f(x, y): intensity of pixel (x, y)

(52)

3.3 Two-Dimensional DFT (formal)

• Coefficients A(i, j) and B(i, j) indicate the amplitude of the corresponding cosine and sine waves of the (i, j) parameter pair
• They can be calculated (up to normalization) as:
  A(i, j) = Σ_{x=0}^{w−1} Σ_{y=0}^{h−1} f(x, y) · cos(2π(i·x/w + j·y/h))
  B(i, j) = Σ_{x=0}^{w−1} Σ_{y=0}^{h−1} f(x, y) · sin(2π(i·x/w + j·y/h))

(53)

3.3 Two-Dimensional DFT

• For a graphic illustration of the amplitudes, a picture can be used instead of the two matrices A and B
  – The frequency space picture is derived as follows: the value at position (i, j) represents the length of the vector (A(i, j), B(i, j))
• This length defines the Fourier spectrum
  – i.e., how much of which frequency is contained in the signal

(54)

3.3 Frequency Space

• Horizontal frequencies are plotted horizontally in the frequency image
  – Vertical frequencies are plotted vertically

(55)

3.3 Frequency Space

• High/low periodicity in the real world is also reflected in frequency space

(56)

3.3 Frequency Space

• Properties
  – Symmetric towards the origin
  – Harmonics
  – Main oscillation
  – Brightness reflects amplitude (strength)
  – Size of periodicity

(57)

3.3 FFT

• How to compute DFTs?
  – The Fast Fourier Transform (FFT) is an efficient algorithm class for computing the DFT
  – Direct evaluation of the DFT has complexity O(N²)
  – FFT implementations: Cooley-Tukey algorithm, Prime Factor FFT, Bruun's FFT, …

(58)

3.3 FFT

• Cooley-Tukey algorithm
  – Based on the divide-and-conquer paradigm
  – Recursively expresses N = N₁ · N₂
  – Reduces the complexity of calculating the DFT to O(N · log N)

(59)

3.3 FFT

• FFT in Matlab is easy to compute
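The Matlab listing on the original slide is not reproduced here; an equivalent minimal sketch in Python/NumPy, computing the centered, log-scaled Fourier spectrum of a greyscale image:

    import numpy as np

    def fourier_spectrum(img):
        """Log magnitude spectrum, centered (Matlab: fftshift(abs(fft2(img))))."""
        F = np.fft.fft2(img.astype(np.float64))   # 2D FFT (Cooley-Tukey based)
        F = np.fft.fftshift(F)                    # zero frequency to the center
        return np.log1p(np.abs(F))                # compress range for display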

(60)

3.3 Examples

• Remove noise by masking pixels
• Lossy image compression
  [Example images © John M. Brayer, University of New Mexico]

(61)

3.3 Discrete Cosine Transform

• The Discrete Cosine Transform (DCT) works analogously to the DFT, only using cosine functions
  – E.g., used in the encoding of JPEG images for compression purposes
• In the case of DFT and DCT, the power spectrum (i.e., the coefficients) is used for comparisons
• Little problem: the spectrum produced by the Fourier transform shows all contained frequencies, but not when (or where in the image) they occur

(62)

3.3 Wavelet Transform

• Wavelet transforms approximate the intensity function through a different class of functions
  – Approximation of the intensity function using a local base function (mother wavelet) in different resolutions and shifts
  – Wavelets are thus local in frequency (through scaling) and time (through shifts)
  – This solves the locality problem of DFT/DCT

(63)

3.3 Wavelet Transform

• The function classes are locally integrable functions with integral 0

(64)

3.3 Wavelet Transform

• Having some wavelet Ψ(x), we can generate a base B through appropriate shifting and scaling: Ψ_{a,b}(x) = Ψ((x − b) / a)
  – Usually special values for the wavelet basis are considered: a = 2^(−j) and b = k·2^(−j) for integers j and k, giving Ψ_{j,k}(x) = Ψ(2^j·x − k)
  – These values are called "critical sampling"

(65)

3.3 Wavelet Transform

• The simplest example: the Haar wavelet
  – Definition: 1 on [0, ½) and −1 on [½, 1) (0 elsewhere)
  – The corresponding functions form an orthogonal basis in L²(ℝ) (all quadratically integrable functions)
  – Can be made orthonormal by a factor of 2^(j/2)
  [Graph of the mother wavelet Ψ_{0,0}(x)]

(66)

3.3 Wavelet Transform

• Baby wavelets Ψ_{j,k}(x): scaled by a factor of 2^j, shifted by k·2^(−j)
  [Graphs: Ψ_{1,0}(x), Ψ_{1,1}(x); Ψ_{2,0}(x), Ψ_{2,1}(x), Ψ_{2,2}(x), Ψ_{2,3}(x)]
  – The smaller the scale, the more shifts (exponentially)

(67)

3.3 Wavelet Transform

• The base can also be represented using a scaling function φ_{0,0}
• For Haar wavelets, the scaling function is the characteristic function of the interval [0, 1)
  – Each data set of cardinality 2^n, y = {y_0, …, y_{2^n − 1}}, can then be represented on [0, 1) by a piecewise constant function

(68)

3.3 Wavelet Transform

• Our intensity values for image rows are basically discrete step functions
  – Since step functions are finite, they can be expressed through the scaling function and Haar wavelets

(69)

3.3 Wavelet Transform

• Example: describe the step function given by y = (1, 0, −3, 2, 1, 0, 1, 2)
  – Resolutions j = 0, 1, 2
  – Base with orthonormalization factor 2^(0.5·j) ∈ {1, 2^(1/2), 2} for the coefficients {d_{0,k}, d_{1,k}, d_{2,k}}
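A sketch of this decomposition via the standard Haar averaging/differencing cascade (unnormalized coefficients; multiply level j by 2^(0.5·j) for the orthonormalized base above):

    import numpy as np

    def haar_decompose(y):
        """Haar decomposition of a length-2^n sequence:
        returns (overall average, detail coefficients per level)."""
        y = np.asarray(y, dtype=np.float64)
        details = []
        while len(y) > 1:
            avg = (y[0::2] + y[1::2]) / 2   # scaling (low-pass) part
            diff = (y[0::2] - y[1::2]) / 2  # wavelet (high-pass) part
            details.append(diff)            # finest level first
            y = avg
        return y[0], details[::-1]          # coarsest level first

    mean, d = haar_decompose([1, 0, -3, 2, 1, 0, 1, 2])
    # mean = 0.5; d[0] = [-0.5], d[1] = [0.5, -0.5], d[2] = [0.5, -2.5, 0.5, -0.5]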

(70)

3.3 Wavelet Transform

[Graphs: scaling function, mother wavelet, baby wavelets (j = 1), baby wavelets (j = 2)]

• The solution of the equation system determines the coefficients for each wavelet

(71)

3.3 Wavelet Transform

• Solution
• Obtained function
• Test
  [Formulas shown on the original slide]

(72)

Next Lecture

• Texture Analysis
  – Multi-Resolution Analysis
• Retrieval with Shape-based Features
