Multimedia Databases

Academic year: 2021

(1)

Multimedia Databases

Wolf-Tilo Balke Janus Wawrzinek

Institut für Informationssysteme

(2)

• Audio Retrieval

- Query by Humming

- Melody: Representation and Matching

• Parsons-Codes

• Dynamic Time Warping

- Hidden Markov Models (Introduction)

9 Previous Lecture

(3)

9 Audio Retrieval

9.1 Hidden Markov Models (continued from last lecture)

9 Video Retrieval

9.2 Introduction into Video Retrieval

(4)

• An HMM has at any time additional time-invariant observation probabilities

• An HMM consists of

– A homogeneous Markov process with state set Q = {q_1, …, q_N}

– Transition probabilities A = (a_ij), where a_ij = P(X_{t+1} = q_j | X_t = q_i)

9.1 Hidden Markov Model

(5)

– Start distribution π = (π_1, …, π_N)

– Stochastic process of observations with basic set V = {o_1, …, o_M}

– And observation probabilities B = (b_j(k)), where b_j(k) is the probability of observation o_k in state q_j

9.1 Hidden Markov Model

(6)

9.1 HMM Example

[Figure: example HMM over four observation types (Type 1–Type 4); the state graph is annotated with transition probabilities 0.8, 0.2, 1, 0.4, 0.6, 1, 0.7, and 0.3]

(7)

Observation probability

– Given an observation sequence O = o_1 … o_T and a fixed HMM λ

– How high is the probability that λ has generated the observation sequence?

P(O | λ) = ?

– Important for selecting between different models

9.1 Evaluation

(8)

• Let Q = (q_{i_1}, …, q_{i_T}) be a state sequence. Then:

P(O | Q, λ) = b_{i_1}(o_1) · b_{i_2}(o_2) · … · b_{i_T}(o_T)

9.1 Evaluation

(9)

• Furthermore

P(Q | λ) = π_{i_1} · a_{i_1 i_2} · … · a_{i_{T-1} i_T}

is also valid

• And P(O, Q | λ) = P(O | Q, λ) · P(Q | λ) is valid for every state sequence Q

9.1 Evaluation

(10)

• Thus the total probability for observation O is:

P(O | λ) = Σ_Q P(O, Q | λ)

• Substituting in our previous results we obtain:

P(O | λ) = Σ_Q π_{i_1} b_{i_1}(o_1) · a_{i_1 i_2} b_{i_2}(o_2) · … · a_{i_{T-1} i_T} b_{i_T}(o_T)

9.1 Evaluation
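The sum above can be sketched directly in code. A minimal Python sketch (the model and its numbers are purely illustrative, not from the lecture): it enumerates all N^T state sequences, which is only feasible for toy examples — the forward algorithm discussed later does the same computation efficiently.

```python
from itertools import product

# Toy HMM with illustrative numbers: 2 states, 2 observation symbols
pi = [0.6, 0.4]                    # start distribution pi_i
A  = [[0.7, 0.3], [0.4, 0.6]]      # transition probabilities a_ij
B  = [[0.9, 0.1], [0.2, 0.8]]      # observation probabilities b_j(k)

def evaluate_naive(obs, pi, A, B):
    """P(O | lambda): sum P(O, Q | lambda) over all state sequences Q."""
    n = len(pi)
    total = 0.0
    for Q in product(range(n), repeat=len(obs)):
        p = pi[Q[0]] * B[Q[0]][obs[0]]          # pi_{i_1} * b_{i_1}(o_1)
        for t in range(1, len(obs)):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][obs[t]]
        total += p
    return total

p = evaluate_naive([0, 1, 0], pi, A, B)   # ≈ 0.10893
```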

(11)

Most probable state sequence

– Given an observation sequence O = o_1 … o_T and a fixed HMM λ

– What is the state sequence Q which generates the observation sequence O with the highest probability?

Maximum likelihood estimator: maximize P(Q | O, λ)

9.1 Evaluation

(12)

• Because we know that

P(Q | O, λ) = P(O, Q | λ) / P(O | λ)

and that P(O | λ) is constant for a fixed sequence of observations, instead of maximizing P(Q | O, λ) we can also maximize P(O, Q | λ)

• Definition:

δ_t(i) = max over all state sequences ending in state q_i at time t of P(o_1 … o_t, q_1 … q_{t-1}, X_t = q_i | λ)

– Maximal for the “most probable” path leading to the state q_i (at time t)

9.1 Evaluation

(13)

• max_i δ_T(i) = max_Q P(O, Q | λ) is valid

• Therefore, the maximizing path corresponds to a state sequence under which the occurrence of the observation sequence O is most likely

• Such a path can be constructed in T steps by means of dynamic programming

9.1 Evaluation

(14)

• The corresponding algorithm is the Viterbi algorithm (Viterbi, 1967)

Initial step: for i ∊ [1 : N] set

δ_1(i) = π_i · b_i(o_1)

For t ∊ [2 : T] inductively set

δ_t(j) = max_i (δ_{t-1}(i) · a_ij) · b_j(o_t) and ψ_t(j) = argmax_i (δ_{t-1}(i) · a_ij)

(15)

Termination:

p = max_i δ_T(i) and q_T = argmax_i δ_T(i)

Recursive path identification for probability p:

for t ∊ [1 : T - 1] (from t = T - 1 down to t = 1) set

q_t = ψ_{t+1}(q_{t+1})

9.1 Evaluation
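The three steps above (initialization, induction, termination with backtracking) translate directly into a short dynamic program. A minimal Python sketch with an illustrative toy model (the numbers are invented for the example):

```python
def viterbi(obs, pi, A, B):
    """Most probable state sequence for obs under HMM (pi, A, B)."""
    n, T = len(pi), len(obs)
    delta = [[0.0] * n for _ in range(T)]
    psi   = [[0] * n for _ in range(T)]
    for i in range(n):                          # initial step: delta_1(i)
        delta[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):                       # induction over t
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t-1][i] * A[i][j])
            delta[t][j] = delta[t-1][best_i] * A[best_i][j] * B[j][obs[t]]
            psi[t][j] = best_i                  # remember best predecessor
    q = [max(range(n), key=lambda i: delta[T-1][i])]   # termination
    for t in range(T - 1, 0, -1):               # recursive path identification
        q.append(psi[t][q[-1]])
    return list(reversed(q)), max(delta[T-1])

# Illustrative toy HMM
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
path, p = viterbi([0, 0, 1], pi, A, B)
```

Real implementations work with log probabilities, since the products underflow quickly for long observation sequences.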

(16)

• Given a fixed HMM, for each sequence of observations the Viterbi algorithm provides the sequence of states which has most probably caused the observations (maximum likelihood estimator)

9.1 Evaluation

(17)

Problem: transition, observation, and start probabilities are often unknown

Idea: training the parameters of the HMM λ

– Given an observation sequence O = o_1 … o_T

→ training sequence

– Task: determine the model parameters λ = (A, B, π) which maximize P(O | λ)

9.1 Training of HMMs

(18)

• The training sequence should not be too short

• The maximization of the probability P(O | λ) leads to a high-dimensional optimization problem

– Solved e.g., through the Baum-Welch algorithm, which calculates a local optimum (Baum et al., 1970)

9.1 Training of HMMs

(19)

Baum-Welch algorithm:

– Begin with an initial estimate of parameters: either arbitrary or based on additional knowledge

9.1 Training of HMMs

(20)

• These statistics can be used for an iterative re-estimation of the parameters

• Define forward variables:

α_t(i) = P(o_1 … o_t, X_t = q_i | λ)

• Then:

α_1(i) = π_i · b_i(o_1)

and

α_{t+1}(j) = (Σ_i α_t(i) · a_ij) · b_j(o_{t+1})

is valid, for t ∊ [1 : T - 1]

9.1 Training of HMMs
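The forward recursion above also solves the evaluation problem in O(N²T) instead of enumerating all N^T state sequences, since P(O | λ) = Σ_i α_T(i). A minimal Python sketch (toy model numbers are illustrative):

```python
def forward(obs, pi, A, B):
    """Forward variables alpha_t(i); returns the table and P(O | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]   # alpha_1(i)
    for t in range(1, len(obs)):                         # induction step
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    return alpha, sum(alpha[-1])                         # P(O) = sum_i alpha_T(i)

# Illustrative toy HMM
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
alpha, p = forward([0, 1, 0], pi, A, B)   # same P(O) as the naive summation
```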

(21)

• And backward variables:

β_t(i) = P(o_{t+1} … o_T | X_t = q_i, λ)

• β_T(i) = 1 for i ∊ [1 : N]

and

β_t(i) = Σ_j a_ij · b_j(o_{t+1}) · β_{t+1}(j) for t ∊ [1 : T - 1]

9.1 Training of HMMs

(22)

• Then, the probability to be in q_i at time t if O has been observed, is the conditional probability:

γ_t(i) = P(X_t = q_i | O, λ) = α_t(i) · β_t(i) / P(O | λ)

9.1 Training of HMMs

(23)

• And the probability to be at time t in state q_i and at time t+1 in state q_j is the conditional probability:

ξ_t(i, j) = P(X_t = q_i, X_{t+1} = q_j | O, λ) = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P(O | λ)

9.1 Training of HMMs

(24)

• Then, the expected value of the number of times state q_i was left is:

Σ_{t=1}^{T-1} γ_t(i)

• And the expected value of the number of transitions from q_i to q_j is given by:

Σ_{t=1}^{T-1} ξ_t(i, j)

9.1 Training of HMMs

(25)

• The Baum-Welch algorithm then sets in the r-th iteration (r ≥ 0):

9.1 Training of HMMs

(26)

• π^(0), A^(0), B^(0) are the initial values

• The new estimated values are defined by:

π̂_i = γ_1(i)

â_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

= (#expected transitions from q_i to q_j) / (#expected transitions out of q_i)

b̂_j(k) = Σ_{t=1}^{T} γ_t(j) · 1(o_t = o_k) / Σ_{t=1}^{T} γ_t(j)

9.1 Training of HMMs
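One full re-estimation step can be sketched by combining the forward and backward variables with the γ and ξ statistics above. A hedged Python sketch (model numbers are illustrative; real implementations rescale or use log probabilities to avoid underflow, and iterate until convergence):

```python
def forward(obs, pi, A, B):
    n = len(pi)
    a = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        a.append([sum(a[-1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                  for j in range(n)])
    return a

def backward(obs, pi, A, B):
    n, T = len(pi), len(obs)
    b = [[1.0] * n]                                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        b.insert(0, [sum(A[i][j] * B[j][obs[t+1]] * b[0][j] for j in range(n))
                     for i in range(n)])
    return b

def baum_welch_step(obs, pi, A, B, n_symbols):
    """One Baum-Welch re-estimation of (pi, A, B) from one training sequence."""
    n, T = len(pi), len(obs)
    al, be = forward(obs, pi, A, B), backward(obs, pi, A, B)
    p_obs = sum(al[T-1])                             # P(O | lambda)
    gamma = [[al[t][i] * be[t][i] / p_obs for i in range(n)] for t in range(T)]
    xi = [[[al[t][i] * A[i][j] * B[j][obs[t+1]] * be[t+1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]                             # pi_i = gamma_1(i)
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) /
              sum(gamma[t][i] for t in range(T-1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(n_symbols)] for j in range(n)]
    return new_pi, new_A, new_B

obs = [0, 1, 0, 0, 1]
pi0, A0, B0 = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]]
new_pi, new_A, new_B = baum_welch_step(obs, pi0, A0, B0, 2)
```

By the EM property cited on the next slide, the likelihood of the training sequence never decreases from one iteration to the next.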

(27)

• Where the indicator 1(o_t = o_k) has the value 1 if symbol o_k was observed at time t in the training sequence, and otherwise the value 0

• If there are several training sequences, the estimate uses the corresponding relative frequencies over all sequences

9.1 Training of HMMs

(28)

• If we now build, for each parameter re-estimation, the HMM λ^(r+1) from λ^(r), then it can be shown that:

P(O | λ^(r+1)) ≥ P(O | λ^(r))

• Thus, models with newer estimates get better until an (at least local) maximum is reached

9.1 Training of HMMs

(29)

• Back to music recognition

– Feature extraction tries to convert a signal into a string

• Encoding acoustic events

– Music signals are sequences of acoustic events

• Segment the audio file, and determine the acoustic events for each segment

– The implementation of any acoustic event can then be modeled by an HMM

9.1 Applications of HMMs

(30)

• If there are several models for acoustic events, we can use a maximum likelihood estimator to identify the model which generates the observation sequence with the highest overall probability

9.1 Back to Music Recognition
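The maximum likelihood selection can be sketched as follows: compute P(O | λ_h) for every event model and attribute the segment to the winner. The two "event" models and all their numbers are invented for illustration; model event_a strongly prefers symbol 0, event_b prefers symbol 1.

```python
def forward_prob(obs, pi, A, B):
    """P(O | lambda) via the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
    return sum(alpha)

# Two illustrative acoustic-event models (pi, A, B)
models = {
    "event_a": ([0.5, 0.5], [[0.8, 0.2], [0.6, 0.4]], [[0.9, 0.1], [0.8, 0.2]]),
    "event_b": ([0.5, 0.5], [[0.8, 0.2], [0.6, 0.4]], [[0.1, 0.9], [0.2, 0.8]]),
}

def classify(obs):
    """Attribute the segment to the model with the highest P(O | lambda)."""
    return max(models, key=lambda m: forward_prob(obs, *models[m]))
```

For long segments one would compare log likelihoods instead, for the usual numerical reasons.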

(31)

• Train H HMMs

– Training by manual (small H) or automatic (large H) mapping between acoustic events and segments (observation sequences) of a signal

– Each HMM represents a specific acoustic event

– Then determine the most probable producer and attribute to each segment the corresponding event

9.1 Idea

(32)

• Assignment of two segments to events A and B

9.1 Training example

(33)

• Extracted segments are used as training examples for the HMMs of the corresponding events

– If a feature sequence belongs to A, then, after appropriate quantization, the resulting observation sequence is used for training

9.1 Training example

(34)

• Complete HMM (ergodic model); e.g., with 3 states:

9.1 Often used HMMs

(35)

• Left-right model (Bakis model); e.g., with 3 states

9.1 Often used HMMs

(36)

Given: a sequence of observations from a feature sequence of length T: o_1, …, o_T

Goal: find a sequence of pairs (HMM index, start time) so that each feature subsequence is associated to an HMM

• Example: given o_1, o_2, o_3, o_4, o_5 and 3 HMM models λ_1, λ_2, λ_3

• (2, 1), (1, 3), (3, 4) which associates:

– λ_2 with (o_1, o_2)

– λ_1 with (o_3) and

– λ_3 with (o_4, o_5)

9.1 Feature Extraction with HMMs

(37)

• Combine all H HMM graphs completely (with equal probability)

– Embedded in a macro-HMM

• We can start with any model and migrate to any other model

– Any sequence of acoustic events can be represented in the macro-HMM

9.1 Realization

(38)

• The macro-HMM behaves like a normal HMM

– All possible events are equally represented in the macro-HMM

– With the Viterbi algorithm we can establish the most probable state sequence

9.1 Realization

(39)

• Macro-HMM, where each subgraph corresponds to one of the HMMs

– Here H = 4

9.1 Illustration

(40)

• For each single HMM graph, we of course only need to completely connect, in the macro graph, those states which may occur as start and end states

9.1 Illustration

(41)

• The data stream can be an infinite feature sequence, and it is not clear exactly when an event begins

– Build a sequence of sub-sequences and apply the Viterbi algorithm to each subsequence

– Unfortunately, this comes with high computational complexity

9.1 Problems in Real Applications

(42)

• How do we choose the best ”window size” w for the sub-sequences?

– Choose w sufficiently large, so that a higher number of HMM graphs can be traversed

– If, while passing through the various paths of the macro-HMM with the Viterbi algorithm, a sufficiently large probability value occurs, then break the computation and return the HMMs traversed until then, together with the corresponding time points

9.1 Problems in Real Applications

(43)

• Classify pig coughs into morbid or healthy by using HMM

9.1 HMM by example

(44)

– Extract features e.g., spectrogram of a pig cough

9.1 HMM by example

(45)

– Find an HMM which represents a pig cough as well as possible

9.1 HMM by example

(46)

– Train (Baum-Welch algorithm) the HMM for different cough types (various diseases), e.g., Pasteurella disease

9.1 HMM by example

(47)

– When a pig coughs… use the Viterbi algorithm to establish whether the pig is ill or not!

9.1 HMM by example

(48)

Video data

– Increasingly important for exchanging information

• Illustrative clips (simulations, animations, etc.)

• Presentations, lectures

• Video conferencing

• ...

– Increasingly frequent on the Internet, e.g., YouTube, video on demand, ...

9.2 Video Retrieval

(49)

YouTube Statistics

– Over 6 billion hours of video are watched each month on YouTube

• That's almost an hour for every person on Earth, and 50% more than in 2012

– 100 hours of video are uploaded to YouTube every minute

• Management of video data is thus, among other things, a database problem

9.2 Video Data

(50)

• Regarding video data, it is necessary to efficiently:

– Store it

– Make it accessible

– And be able to retrieve it

• Today's databases offer:

– Blobs, smart blobs

– Retrieval on metadata

– Splitting into key-frames

9.2 Video Data

(51)

• Example: IBM AIV Extenders for IBM DB2 UDB (development now discontinued)

• Incorporating the QBIC prototype into a commercial database

9.2 Database support

(52)

• Description on the IBM Web site:

– “DB2 Video Extender adds the power of video retrieval to SQL queries. You can integrate video data and traditional business data in a single query. For example, you can query a news database for video news clips about a specific subject, and list the playing time of each video clip. Then use the Video Extender to play the video clips.”

9.2 Database support

(53)

Example usage:

– “For example, an advertising agency stores information about its campaigns in a DB2 database and uses DB2 AIV Extenders to store its print ads, broadcast and video ads. One SQL query can retrieve multimedia data, such as print or broadcast ads for a particular year or client, as well as related business data in the database.”

9.2 Database support

(54)

– “Using DB2 Video Extender, you can define new data types and functions for video data using DB2 Universal Database’s built-in support for user-defined types and user-defined functions.”

– “Secure and recover video data. Video clips and their attributes that you store in a DB2 database are afforded the same security and recovery protection as traditional business data.”

9.2 Database support

(55)

Goal: “… allows you to store and query video data as easily as you can traditional data”

– “Import and export video clips and their attributes into and out of a database. When you import a video clip, the DB2 Video Extender stores and maintains video attributes such as frame rate, compression format, and number of video tracks.”

9.2 Database support

(56)

– “Query video clips based on related business data or by video attributes. You can search for video clips based on data that you maintain, such as a name, number, or description; or by data that the DB2 Video Extender maintains, such as the format of the video or the date and time that it was last updated.”

9.2 Database support

(57)

– “Play video clips. You can use the DB2 Video Extender to retrieve a video clip. You can then use the DB2 Video Extender to invoke your favorite video browser to play the video clip.”

– “The DB2 Video Extender supports a variety of video file formats, and can work with different file-based video servers.”

9.2 Database support

(58)

• Main problems: continuous medium

– Composed of several streams

– Image stream with visual information (often different views / camera angles of a scene)

– Audio stream (usually more than one, e.g., synchronous tracks on DVDs)

– Stream of text (subtitles, news flashes, ...)

9.2 Video Retrieval Topics

(59)

• Main problems: organization

– Video is a structured medium in time and space

– Videos cannot be seen as a set of individual frames, but as a document

– Video abstraction decomposes video clips into structured parts (visual table of contents)

9.2 Video Retrieval Topics

(60)

• How do queries work in video retrieval?

– Specification of certain features in SQL (as in the IBM extender)

– Specification of semantic content?

• E.g., via keywords: news about “politics”

– Query by example?

• One possibility is sample images: “Find all the movies with this actor“

9.2 Video Retrieval Topics

(61)

Retrieval technology: comprises all the other problem groups

– Image retrieval for the description of independent images, key frames, etc.

– Audio retrieval for the evaluation of the sound track, voice recognition, etc.

– Text retrieval for the search in any subtitles, summaries, transcriptions of the audio track, etc.

9.2 Video Retrieval Topics

(62)

• However, these techniques can also be combined in the case of video data

– Person recognition using segmentation and detection of subtitles

– Assignment based on the actor's voice

– Classification of objects by shape and speech information

– Detection of exciting sports scenes by the audience's applause, etc.

9.2 Retrieval Technology

(63)

• Other features, which don’t occur in image, audio and text retrieval, concern the detection of the temporal behavior of objects in space

– Movement of objects (direction, speed), instead of simple recognition as in image retrieval

• E.g., “Car moves slowly from left to right” or “two people walk together”

9.2 Other Features

(64)

• Recognition of movement is normally done through the comparison of shapes in a sequence of images

– Shapes are found in one frame, e.g., by edge detection

– Shapes in successive images are mapped onto each other using translation, rotation and scaling

– If successful, the type of transformation provides information about the parameters of motion

9.2 Other Features
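As a toy illustration of the shape-comparison idea (pure translation only; real systems start from edge detection and also handle rotation and scaling), one can estimate an object's shift between two successive frames from the centroids of its binary masks. The frames below are invented for the example.

```python
def centroid(mask):
    """Centre of mass of a binary object mask (list of rows of 0/1)."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def estimate_translation(mask_t, mask_t1):
    """Translation (dy, dx) of the object between two successive frames."""
    (y0, x0), (y1, x1) = centroid(mask_t), centroid(mask_t1)
    return (y1 - y0, x1 - x0)

# A 2x2 object moves two columns to the right between frames
frame1 = [[0, 0, 0, 0, 0],
          [0, 1, 1, 0, 0],
          [0, 1, 1, 0, 0],
          [0, 0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0, 0],
          [0, 0, 0, 1, 1],
          [0, 0, 0, 1, 1],
          [0, 0, 0, 0, 0]]
shift = estimate_translation(frame1, frame2)   # (0.0, 2.0): motion to the right
```

Applied over a whole frame sequence, such per-frame shifts yield the direction and speed parameters mentioned above.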

(65)

• The extraction of moving objects is supported in MPEG-4 encoded streams

– Separate compression of the background and the (moving) foreground objects

– Since fore-/background elements change only little, we only need to detect shifts and possibly changes in the camera angle

9.2 Other Features

(66)

Detection of camera movement

– Changes in camera angle (zooming, fade in/out, ...)

– Movement of the camera itself (e.g., through background analysis)

– Recognition through various models for the individual effects

9.2 Other Features

(67)

Time/place relationships

– Object motion results in trajectories in time and space

– Intersection of trajectories (e.g., car accidents at observed crossings)

– Comparisons between different media

• E.g., all the videos with a movement from the upper right corner to the lower left corner

9.2 Other Features

(68)

• Retrieval of the best videos is very expensive, and the quality is difficult to evaluate

• Retrieval result as a set of video abstractions

– Summary sequences provide an overview of the content, usually with annotated key frames

– Highlights are scene cuts of certain video passages (trailer)

9.2 Result Presentation

(69)

• Personal news (all clips on interesting topics)

• Entertainment

– Automatic recognition of film genres (love story, action movie, comedy, ...)

• Detection of advertising on TV

– Automatic recording of material from the television

9.2 Application of Video Retrieval

(70)

• Hidden Markov Models

(continued from last lecture)

• Introduction into Video Retrieval

9 This Lecture

(71)

• Video Abstraction

• Shot Detection

9 Next lecture
