Multimedia Databases

Academic year: 2021

(1)

Multimedia Databases

Wolf-Tilo Balke Janus Wawrzinek

Institut für Informationssysteme

(2)

• Audio Retrieval

- Query by Humming

- Melody: Representation and Matching

• Parsons-Codes

• Dynamic Time Warping

- Hidden Markov Models (Introduction)

9 Previous Lecture

(3)

9 Audio Retrieval

9.1 Hidden Markov Models (continued from last lecture)

9 Video Retrieval

9.2 Introduction into Video Retrieval

(4)

• An HMM has at any time additional time-invariant observation probabilities

• An HMM consists of

– A homogeneous Markov process with state set Q = {q_1, …, q_N}

– Transition probabilities A = (a_ij), where a_ij = P(X_{t+1} = q_j | X_t = q_i)

9.1 Hidden Markov Model

(5)

– Start distribution π = (π_1, …, π_N)

– Stochastic process of observations with basic set V = {o_1, …, o_M}

– And observation probabilities B = (b_j(k)), where b_j(k) is the probability of observation o_k in state q_j

9.1 Hidden Markov Model

(6)

9.1 HMM Example

[Figure: example HMM over four observation types (Type 1–Type 4); the state graph is annotated with transition probabilities 0.8, 0.2, 1, 0.4, 0.6, 1, 0.7, and 0.3]

(7)

Observation probability

– Given an observation sequence O = o_1 … o_T and a fixed HMM λ

– How high is the probability that λ has generated the observation sequence?

P(O | λ) = ?

– Important for selecting between different models

9.1 Evaluation

(8)

• Let Q = (q_{i_1}, …, q_{i_T}) be a state sequence. Then:

P(O | Q, λ) = b_{i_1}(o_1) · b_{i_2}(o_2) · … · b_{i_T}(o_T)

9.1 Evaluation

(9)

• Furthermore

P(Q | λ) = π_{i_1} · a_{i_1 i_2} · … · a_{i_{T-1} i_T}

is also valid

• And P(O, Q | λ) = P(O | Q, λ) · P(Q | λ) is valid for every state sequence Q

9.1 Evaluation

(10)

• Thus the total probability for observation O is:

P(O | λ) = Σ_Q P(O, Q | λ)

• Substituting in our previous results we obtain:

P(O | λ) = Σ_Q π_{i_1} b_{i_1}(o_1) · a_{i_1 i_2} b_{i_2}(o_2) · … · a_{i_{T-1} i_T} b_{i_T}(o_T)

9.1 Evaluation
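The sum above can be sketched directly in code. A minimal Python sketch (the model and its numbers are purely illustrative, not from the lecture): it enumerates all N^T state sequences, which is only feasible for toy examples — the forward algorithm discussed later does the same computation efficiently.

```python
from itertools import product

# Toy HMM with illustrative numbers: 2 states, 2 observation symbols
pi = [0.6, 0.4]                    # start distribution pi_i
A  = [[0.7, 0.3], [0.4, 0.6]]      # transition probabilities a_ij
B  = [[0.9, 0.1], [0.2, 0.8]]      # observation probabilities b_j(k)

def evaluate_naive(obs, pi, A, B):
    """P(O | lambda): sum P(O, Q | lambda) over all state sequences Q."""
    n = len(pi)
    total = 0.0
    for Q in product(range(n), repeat=len(obs)):
        p = pi[Q[0]] * B[Q[0]][obs[0]]          # pi_{i_1} * b_{i_1}(o_1)
        for t in range(1, len(obs)):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][obs[t]]
        total += p
    return total

p = evaluate_naive([0, 1, 0], pi, A, B)   # ≈ 0.10893
```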

(11)

Most probable state sequence

– Given an observation sequence O = o_1 … o_T and a fixed HMM λ

– What is the state sequence Q which generates the observation sequence O with the highest probability?

Maximum likelihood estimator: maximize P(Q | O, λ)

9.1 Evaluation

(12)

• Because we know that

P(Q | O, λ) = P(O, Q | λ) / P(O | λ)

and that P(O | λ) is constant for a fixed sequence of observations, instead of maximizing P(Q | O, λ) we can also maximize P(O, Q | λ)

• Definition:

δ_t(i) = max over all state sequences ending in state q_i at time t of P(o_1 … o_t, q_1 … q_{t-1}, X_t = q_i | λ)

– Maximal for the “most probable” path leading to the state q_i (at time t)

9.1 Evaluation

(13)

• max_i δ_T(i) = max_Q P(O, Q | λ) is valid

• Therefore, the maximizing path corresponds to a state sequence under which the occurrence of the observation sequence O is most likely

• Such a path can be constructed in T steps by means of dynamic programming

9.1 Evaluation

(14)

• The corresponding algorithm is the Viterbi algorithm (Viterbi, 1967)

Initial step: for i ∊ [1 : N] set

δ_1(i) = π_i · b_i(o_1)

For t ∊ [2 : T] inductively set

δ_t(j) = max_i (δ_{t-1}(i) · a_ij) · b_j(o_t) and ψ_t(j) = argmax_i (δ_{t-1}(i) · a_ij)

(15)

Termination:

p = max_i δ_T(i) and q_T = argmax_i δ_T(i)

Recursive path identification for probability p:

for t ∊ [1 : T - 1] (from t = T - 1 down to t = 1) set

q_t = ψ_{t+1}(q_{t+1})

9.1 Evaluation
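The three steps above (initialization, induction, termination with backtracking) translate directly into a short dynamic program. A minimal Python sketch with an illustrative toy model (the numbers are invented for the example):

```python
def viterbi(obs, pi, A, B):
    """Most probable state sequence for obs under HMM (pi, A, B)."""
    n, T = len(pi), len(obs)
    delta = [[0.0] * n for _ in range(T)]
    psi   = [[0] * n for _ in range(T)]
    for i in range(n):                          # initial step: delta_1(i)
        delta[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):                       # induction over t
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[t-1][i] * A[i][j])
            delta[t][j] = delta[t-1][best_i] * A[best_i][j] * B[j][obs[t]]
            psi[t][j] = best_i                  # remember best predecessor
    q = [max(range(n), key=lambda i: delta[T-1][i])]   # termination
    for t in range(T - 1, 0, -1):               # recursive path identification
        q.append(psi[t][q[-1]])
    return list(reversed(q)), max(delta[T-1])

# Illustrative toy HMM
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
path, p = viterbi([0, 0, 1], pi, A, B)
```

Real implementations work with log probabilities, since the products underflow quickly for long observation sequences.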

(16)

• Given a fixed HMM, for each sequence of observations the Viterbi algorithm provides the sequence of states which has most probably caused the observations (maximum likelihood estimator)

9.1 Evaluation

(17)

Problem: transition, observation, and start probabilities are often unknown

Idea: training the parameters of the HMM λ

– Given an observation sequence O = o_1 … o_T

→ training sequence

– Task: determine the model parameters λ = (A, B, π) which maximize P(O | λ)

9.1 Training of HMMs

(18)

• The training sequence should not be too short

• The maximization of the probability P(O | λ) leads to a high-dimensional optimization problem

– Solved e.g., through the Baum-Welch algorithm, which calculates a local optimum (Baum et al., 1970)

9.1 Training of HMMs

(19)

Baum-Welch algorithm:

– Begin with an initial estimate of parameters: either arbitrary or based on additional knowledge

9.1 Training of HMMs

(20)

• These statistics can be used for an iterative re-estimation of the parameters

• Define forward variables:

α_t(i) = P(o_1 … o_t, X_t = q_i | λ)

• Then:

α_1(i) = π_i · b_i(o_1)

and

α_{t+1}(j) = (Σ_i α_t(i) · a_ij) · b_j(o_{t+1})

is valid, for t ∊ [1 : T - 1]

9.1 Training of HMMs
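The forward recursion above also solves the evaluation problem in O(N²T) instead of enumerating all N^T state sequences, since P(O | λ) = Σ_i α_T(i). A minimal Python sketch (toy model numbers are illustrative):

```python
def forward(obs, pi, A, B):
    """Forward variables alpha_t(i); returns the table and P(O | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]   # alpha_1(i)
    for t in range(1, len(obs)):                         # induction step
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    return alpha, sum(alpha[-1])                         # P(O) = sum_i alpha_T(i)

# Illustrative toy HMM
pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
alpha, p = forward([0, 1, 0], pi, A, B)   # same P(O) as the naive summation
```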

(21)

• And backward variables:

β_t(i) = P(o_{t+1} … o_T | X_t = q_i, λ)

• β_T(i) = 1 for i ∊ [1 : N]

and

β_t(i) = Σ_j a_ij · b_j(o_{t+1}) · β_{t+1}(j) for t ∊ [1 : T - 1]

9.1 Training of HMMs

(22)

• Then, the probability to be in q_i at time t if O has been observed, is the conditional probability:

γ_t(i) = P(X_t = q_i | O, λ) = α_t(i) · β_t(i) / P(O | λ)

9.1 Training of HMMs

(23)

• And the probability to be at time t in state q_i and at time t+1 in state q_j is the conditional probability:

ξ_t(i, j) = P(X_t = q_i, X_{t+1} = q_j | O, λ) = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P(O | λ)

9.1 Training of HMMs

(24)

• Then, the expected value of the number of times state q_i was left is:

Σ_{t=1}^{T-1} γ_t(i)

• And the expected value of the number of transitions from q_i to q_j is given by:

Σ_{t=1}^{T-1} ξ_t(i, j)

9.1 Training of HMMs

(25)

• The Baum-Welch algorithm then sets in the r-th iteration (r ≥ 0):

9.1 Training of HMMs

(26)

• π^(0), A^(0), B^(0) are the initial values

• The new estimated values are defined by:

π̂_i = γ_1(i)

â_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)

= (#expected transitions from q_i to q_j) / (#expected transitions out of q_i)

b̂_j(k) = Σ_{t=1}^{T} γ_t(j) · 1(o_t = o_k) / Σ_{t=1}^{T} γ_t(j)

9.1 Training of HMMs
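One full re-estimation step can be sketched by combining the forward and backward variables with the γ and ξ statistics above. A hedged Python sketch (model numbers are illustrative; real implementations rescale or use log probabilities to avoid underflow, and iterate until convergence):

```python
def forward(obs, pi, A, B):
    n = len(pi)
    a = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        a.append([sum(a[-1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                  for j in range(n)])
    return a

def backward(obs, pi, A, B):
    n, T = len(pi), len(obs)
    b = [[1.0] * n]                                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        b.insert(0, [sum(A[i][j] * B[j][obs[t+1]] * b[0][j] for j in range(n))
                     for i in range(n)])
    return b

def baum_welch_step(obs, pi, A, B, n_symbols):
    """One Baum-Welch re-estimation of (pi, A, B) from one training sequence."""
    n, T = len(pi), len(obs)
    al, be = forward(obs, pi, A, B), backward(obs, pi, A, B)
    p_obs = sum(al[T-1])                             # P(O | lambda)
    gamma = [[al[t][i] * be[t][i] / p_obs for i in range(n)] for t in range(T)]
    xi = [[[al[t][i] * A[i][j] * B[j][obs[t+1]] * be[t+1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]                             # pi_i = gamma_1(i)
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) /
              sum(gamma[t][i] for t in range(T-1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(n_symbols)] for j in range(n)]
    return new_pi, new_A, new_B

obs = [0, 1, 0, 0, 1]
pi0, A0, B0 = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]]
new_pi, new_A, new_B = baum_welch_step(obs, pi0, A0, B0, 2)
```

By the EM property cited on the next slide, the likelihood of the training sequence never decreases from one iteration to the next.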

(27)

• Where the indicator 1(o_t = o_k) has the value 1 if symbol o_k was observed at time t in the training sequence, and otherwise the value 0

• If there are several training sequences, the estimate uses the corresponding relative frequencies over all sequences

9.1 Training of HMMs

(28)

• If we now build, for each parameter re-estimation, the HMM λ^(r+1) from λ^(r), then it can be shown that:

P(O | λ^(r+1)) ≥ P(O | λ^(r))

• Thus, models with newer estimates get better until an (at least local) maximum is reached

9.1 Training of HMMs

(29)

• Back to music recognition

– Feature extraction tries to convert a signal into a string

• Encoding acoustic events

– Music signals are sequences of acoustic events

• Segment the audio file, and determine the acoustic events for each segment

– The implementation of any acoustic event can then be modeled by an HMM

9.1 Applications of HMMs

(30)

• If there are several models for acoustic events, we can use a maximum likelihood estimator to identify the model which generates the observation sequence with the highest overall probability

9.1 Back to Music Recognition
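The maximum likelihood selection can be sketched as follows: compute P(O | λ_h) for every event model and attribute the segment to the winner. The two "event" models and all their numbers are invented for illustration; model event_a strongly prefers symbol 0, event_b prefers symbol 1.

```python
def forward_prob(obs, pi, A, B):
    """P(O | lambda) via the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(1, len(obs)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
    return sum(alpha)

# Two illustrative acoustic-event models (pi, A, B)
models = {
    "event_a": ([0.5, 0.5], [[0.8, 0.2], [0.6, 0.4]], [[0.9, 0.1], [0.8, 0.2]]),
    "event_b": ([0.5, 0.5], [[0.8, 0.2], [0.6, 0.4]], [[0.1, 0.9], [0.2, 0.8]]),
}

def classify(obs):
    """Attribute the segment to the model with the highest P(O | lambda)."""
    return max(models, key=lambda m: forward_prob(obs, *models[m]))
```

For long segments one would compare log likelihoods instead, for the usual numerical reasons.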

(31)

• Train H HMMs

– Training by manual (small H) or automatic (large H) mapping between acoustic events and segments (observation sequences) of a signal

– Each HMM represents a specific acoustic event

– Then determine the most probable producer and attribute to each segment the corresponding event

9.1 Idea

(32)

• Assignment of two segments to events A and B

9.1 Training example

(33)

• Extracted segments are used as training examples for the HMMs of the corresponding events

– If a feature sequence belongs to A, then, after appropriate quantization, the resulting observation sequence is used for training

9.1 Training example

(34)

• Complete HMM (ergodic model); e.g., with 3 states:

9.1 Often used HMMs

(35)

• Left-right model (Bakis model); e.g., with 3 states

9.1 Often used HMMs

(36)

Given: a sequence of observations from a feature sequence of length T: o_1, …, o_T

Goal: find a sequence of pairs (HMM index, start time) so that each feature subsequence is associated to an HMM

• Example: given o_1, o_2, o_3, o_4, o_5 and 3 HMM models λ_1, λ_2, λ_3

• (2, 1), (1, 3), (3, 4) which associates:

– λ_2 with (o_1, o_2)

– λ_1 with (o_3) and

– λ_3 with (o_4, o_5)

9.1 Feature Extraction with HMMs

(37)

• Combine all H HMM graphs completely (with equal probability)

– Embedded in a macro-HMM

• We can start with any model and migrate to any other model

– Any sequence of acoustic events can be represented in the macro-HMM

9.1 Realization

(38)

• The macro-HMM behaves like a normal HMM

– All possible events are equally represented in the macro-HMM

– With the Viterbi algorithm we can establish the most probable state sequence

9.1 Realization

(39)

• Macro-HMM, where each subgraph corresponds to one of the HMMs

– Here H = 4

9.1 Illustration

(40)

• For each single HMM graph, we of course only need to completely connect, in the macro graph, those states which may occur as start and end states

9.1 Illustration

(41)

• The data stream can be an infinite feature sequence, and it is not clear exactly when an event begins

– Build a sequence of sub-sequences and apply the Viterbi algorithm to each subsequence

– Unfortunately, this comes with high computational complexity

9.1 Problems in Real Applications

(42)

• How do we choose the best ”window size” w for the sub-sequences?

– Choose w sufficiently large, so that a higher number of HMM graphs can be traversed

– If, while passing through the various paths of the macro-HMM with the Viterbi algorithm, a sufficiently large probability value occurs, then break the computation and return the HMMs traversed until then, together with the corresponding time points

9.1 Problems in Real Applications

(43)

• Classify pig coughs into morbid or healthy by using HMM

9.1 HMM by example

(44)

– Extract features e.g., spectrogram of a pig cough

9.1 HMM by example

(45)

– Find an HMM which represents a pig cough as well as possible

9.1 HMM by example

(46)

– Train (Baum-Welch algorithm) the HMM for different cough types (various diseases), e.g., Pasteurella disease

9.1 HMM by example

(47)

– When a pig coughs… use the Viterbi algorithm to establish whether the pig is ill or not!

9.1 HMM by example

(48)

Video data

– Increasingly important for exchanging information

• Illustrative clips (simulations, animations, etc.)

• Presentations, lectures

• Video conferencing

• ...

– Increasingly frequent on the Internet, e.g., YouTube, video on demand, ...

9.2 Video Retrieval

(49)

YouTube Statistics

– Over 6 billion hours of video are watched each month on YouTube

• That's almost an hour for every person on Earth, and 50% more than in 2012

– 100 hours of video are uploaded to YouTube every minute

• Management of video data is thus, among other things, a database problem

9.2 Video Data

(50)

• Regarding video data, it is necessary to efficiently:

– Store it

– Make it accessible

– And be able to retrieve it

• Today's databases offer:

– Blobs, smart blobs

– Retrieval on metadata

– Splitting into key-frames

9.2 Video Data

(51)

• Example: IBM AIV Extenders for IBM DB2 UDB (development now discontinued)

• Incorporating the QBIC prototype into a commercial database

9.2 Database support

(52)

• Description on the IBM Web site:

– “DB2 Video Extender adds the power of video retrieval to SQL queries. You can integrate video data and traditional business data in a single query. For example, you can query a news database for video news clips about a specific subject, and list the playing time of each video clip. Then use the Video Extender to play the video clips.”

9.2 Database support

(53)

Example usage:

– “For example, an advertising agency stores information about its campaigns in a DB2 database and uses DB2 AIV Extenders to store its print ads, broadcast and video ads. One SQL query can retrieve multimedia data, such as print or broadcast ads for a particular year or client, as well as related business data in the database.”

9.2 Database support

(54)

– “Using DB2 Video Extender, you can define new data types and functions for video data using DB2 Universal Database’s built-in support for user-defined types and user-defined functions.”

– “Secure and recover video data. Video clips and their attributes that you store in a DB2 database are afforded the same security and recovery protection as traditional business data.”

9.2 Database support

(55)

Goal: “… allows you to store and query video data as easily as you can traditional data”

– “Import and export video clips and their attributes into and out of a database. When you import a video clip, the DB2 Video Extender stores and maintains video attributes such as frame rate, compression format, and number of video tracks.”

9.2 Database support

(56)

– “Query video clips based on related business data or by video attributes. You can search for video clips based on data that you maintain, such as a name, number, or description; or by data that the DB2 Video Extender maintains, such as the format of the video or the date and time that it was last updated.”

9.2 Database support

(57)

– “Play video clips. You can use the DB2 Video Extender to retrieve a video clip. You can then use the DB2 Video Extender to invoke your favorite video browser to play the video clip.”

– “The DB2 Video Extender supports a variety of video file formats, and can work with different file-based video servers.”

9.2 Database support

(58)

• Main problems: continuous medium

– Composed of several streams

– Image stream with visual information (often different views / camera angles of a scene)

– Audio stream (usually more than one, e.g., synchronous tracks on DVDs)

– Stream of text (subtitles, news flashes, ...)

9.2 Video Retrieval Topics

(59)

• Main problems: organization

– Video is a structured medium in time and space

– Videos cannot be seen as a set of individual frames, but as a document

– Video abstraction decomposes video clips into structured parts (visual table of contents)

9.2 Video Retrieval Topics

(60)

• How do queries work in video retrieval?

– Specification of certain features in SQL (as in the IBM extender)

– Specification of semantic content?

• E.g., via keywords: news about “politics”

– Query by example?

• One possibility is sample images: “Find all the movies with this actor“

9.2 Video Retrieval Topics

(61)

Retrieval technology: comprises all the other problem groups

– Image retrieval for the description of independent images, key frames, etc.

– Audio retrieval for the evaluation of the sound track, voice recognition, etc.

– Text retrieval for the search in any subtitles, summaries, transcriptions of the audio track, etc.

9.2 Video Retrieval Topics

(62)

• However, these techniques can also be combined in the case of video data

– Person recognition using segmentation and detection of subtitles

– Assignment based on the actor's voice

– Classification of objects by shape and speech information

– Detection of exciting sports scenes by the audience's applause, etc.

9.2 Retrieval Technology

(63)

• Other features, which don’t occur in image, audio and text retrieval, concern the detection of the temporal behavior of objects in space

– Movement of objects (direction, speed), instead of simple recognition as in image retrieval

• E.g., “Car moves slowly from left to right” or “two people walk together”

9.2 Other Features

(64)

• Recognition of movement is normally done through the comparison of shapes in a sequence of images

– Shapes are found in one frame, e.g., by edge detection

– Shapes in successive images are mapped onto each other using translation, rotation and scaling

– If successful, the type of transformation provides information about the parameters of motion

9.2 Other Features
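As a toy illustration of the shape-comparison idea (pure translation only; real systems start from edge detection and also handle rotation and scaling), one can estimate an object's shift between two successive frames from the centroids of its binary masks. The frames below are invented for the example.

```python
def centroid(mask):
    """Centre of mass of a binary object mask (list of rows of 0/1)."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def estimate_translation(mask_t, mask_t1):
    """Translation (dy, dx) of the object between two successive frames."""
    (y0, x0), (y1, x1) = centroid(mask_t), centroid(mask_t1)
    return (y1 - y0, x1 - x0)

# A 2x2 object moves two columns to the right between frames
frame1 = [[0, 0, 0, 0, 0],
          [0, 1, 1, 0, 0],
          [0, 1, 1, 0, 0],
          [0, 0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0, 0],
          [0, 0, 0, 1, 1],
          [0, 0, 0, 1, 1],
          [0, 0, 0, 0, 0]]
shift = estimate_translation(frame1, frame2)   # (0.0, 2.0): motion to the right
```

Applied over a whole frame sequence, such per-frame shifts yield the direction and speed parameters mentioned above.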

(65)

• The extraction of moving objects is supported in MPEG-4 encoded streams

– Separate compression of the background and the (moving) foreground objects

– Since fore-/background elements change only little, we only need to detect shifts and possibly changes in the camera angle

9.2 Other Features

(66)

Detection of camera movement

– Changes in camera angle (zooming, fade in/out, ...)

– Movement of the camera itself (e.g., through background analysis)

– Recognition through various models for the individual effects

9.2 Other Features

(67)

Time/place relationships

– Object motion results in trajectories in time and space

– Intersection of trajectories (e.g., car accidents at observed crossings)

– Comparisons between different media

• E.g., all the videos with a movement from the upper right corner to the lower left corner

9.2 Other Features

(68)

• Retrieval of the best videos is very expensive, and the quality is difficult to evaluate

• Retrieval result as a set of video abstractions

– Summary sequences provide an overview of the content, usually with annotated key frames

– Highlights are scene cuts of certain video passages (trailer)

9.2 Result Presentation

(69)

• Personal news (all clips on interesting topics)

• Entertainment

– Automatic recognition of film genres (love story, action movie, comedy, ...)

• Detection of advertising on TV

– Automatic recording of material from the television

9.2 Application of Video Retrieval

(70)

• Hidden Markov Models

(continued from last lecture)

• Introduction into Video Retrieval

9 This Lecture

(71)

• Video Abstraction

• Shot Detection

9 Next lecture
