
Multimedia Databases

Wolf-Tilo Balke Silviu Homoceanu

Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

• Lecture

22.10.2009 – 04.02.2010

– 15:00-17:15 (3 lecture hours with a break)

Exercises, detours, and homework discussion are integrated into the lecture

• 4 Credits

• Exams

Oral exam

50% of exercise points needed to be eligible for the exam

Multimedia Databases – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

0. Organizational Issues

• Recommended literature

Schmitt: Ähnlichkeitssuche in Multimedia-Datenbanken, Oldenbourg, 2005

Steinmetz: Multimedia-Technologie: Grundlagen, Komponenten und Systeme, Springer, 1999


0. Organizational Issues

Castelli/Bergman: Image Databases, Wiley, 2002

Khoshafian/Baker: Multimedia and Imaging Databases, Morgan Kaufmann, 1996

Sometimes: original papers (on our Web page)


0. Organizational Issues

• Course Web page

http://www.ifis.cs.tu-bs.de/teaching/ws-0910/mmdb

Contains slides, exercises, related papers, and a video of the lecture

Any questions? Just drop us an email…

0. Organizational Issues

1. Introduction

1.1 What are multimedia databases?

1.2 Multimedia database applications

1.3 Evaluation of retrieval techniques

1. Introduction


• What are multimedia databases (MMDB)?

Databases + multimedia = MMDB

• Key words: databases and multimedia

• We already know databases, so what is multimedia?


1.1 Multimedia Databases

• Multimedia

The concept of multimedia expresses the integration of different digital media types

The integration is usually performed in a document

Basic media types are text, image, vector graphics, audio and video


1.1 Basic Definitions

• Text

Text data, Spreadsheets, E-Mail, …

• Image

Photos (Bitmaps), Vector graphics, CAD, …

• Audio

Speech and music recordings, annotations, wave files, MIDI, MP3, …

• Video

Moving image recordings, frame sequences, MPEG, AVI, …


1.1 Data Types

• Earliest definition of a document in information retrieval:

“Documents are logically interdependent, digitally encoded texts”

• The extension to multimedia documents additionally allows the integration of other media types such as images, audio, or video


1.1 Documents

• Document types

Media objects are documents of only one type (not necessarily text)

Multimedia objects are general documents that allow an arbitrary combination of different types

• Multimedia data is transferred through the use of a medium

1.1 Documents

• Medium

A medium is a carrier of information in a communication connection

It is independent of the transported information

The medium used can also be changed during information transfer

1.1 Basic Definitions


• Book


1.1 Medium Example

Communication between author and reader

Independent from content

Hierarchically built on text and images

Reading out loud represents medium change to sound/audio

• Based on receiver type

Visual/optical medium

Acoustic medium

Haptic medium – through the sense of touch

Olfactory medium – through smell

Gustatory medium – through taste

• Based on time

Dynamic

Static


1.1 Medium Classification

• We have now seen…

What multimedia is

And how it is transported (through some medium)

• But… why do we need databases?

Most important operations of databases are data storage and data retrieval


1.1 Multimedia Databases

• Persistent storage of multimedia data, e.g.:

Text documents

Vector graphics, CAD

Images, audio, video

• Content-based retrieval

Efficient content-based search

Standardization of meta-data (e. g., MPEG-7, MPEG-21)


1.1 Multimedia Databases

• Stand-alone vs. database storage model?

Special retrieval functionality as well as corresponding optimization can be provided in both cases…

But in the second case we also get the general advantages of databases

•Declarative query language

•Orthogonal combination of the query functionality

•Query optimization, Index structures

•Transaction management, recovery

•...

1.1 Multimedia Databases

1.1 Historical Overview

Timeline from 1960 to 2000, in chronological order:

Retrieval procedures for text documents (Information Retrieval)

Relational databases and SQL

Presence of multimedia objects intensifies

SQL-92 introduces BLOBs

First multimedia databases


• Relational Databases use the data type BLOB (binary large object)

Uninterpreted data

Retrieval through metadata like e.g., file name, size, author, …

• Object-relational extensions feature enhanced retrieval functionality

Semantic search

IBM DB2 Extenders, Oracle Cartridges, …

Integration in DB through UDFs, UDTs, Stored Procedures, …


1.1 Commercial Systems
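The BLOB approach described above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table name `media`, its columns, and the sample payloads are assumptions made up for this illustration and are not taken from the lecture. The point is only that the media bytes stay uninterpreted, while retrieval works on metadata columns.

```python
import sqlite3

# Hypothetical schema: the media bytes live in an uninterpreted BLOB column,
# and only the descriptive metadata columns are searchable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media (id INTEGER PRIMARY KEY, file_name TEXT, "
    "author TEXT, size_bytes INTEGER, content BLOB)"
)


def insert_media(file_name, author, payload):
    """Store raw bytes together with metadata derived at insert time."""
    conn.execute(
        "INSERT INTO media (file_name, author, size_bytes, content) "
        "VALUES (?, ?, ?, ?)",
        (file_name, author, len(payload), payload),
    )


def find_by_author(author):
    """Metadata-only retrieval: the query never looks inside the BLOB."""
    rows = conn.execute(
        "SELECT file_name, size_bytes FROM media WHERE author = ? ORDER BY id",
        (author,),
    )
    return rows.fetchall()


# Made-up payloads standing in for real image and audio files.
insert_media("sunset.jpg", "alice", b"\xff\xd8fake-jpeg-bytes")
insert_media("talk.mp3", "bob", b"ID3fake-mp3-bytes")

print(find_by_author("alice"))
```

A query such as "all images showing a sunset" cannot be answered this way unless someone has entered a matching file name or annotation, which is what motivates the content-based extensions.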

• Requirements for multimedia databases (Christodoulakis, 1985)

Classical database functionality

Maintenance of unformatted data

Consideration of special storage and presentation devices


1.1 Requirements

• To comply with these requirements the following aspects need to be considered

Software architecture – new or extension of existing databases?

Content addressing – identification of the objects through content-based features

Performance – improvements using indexes, optimization, etc.


1.1 Requirements

User interface – how should the user interact with the system? Separate structure from content!

Information extraction – (automatic) generation of content-based features

Storage devices – very large storage capacity, redundancy control and compression

Information retrieval – integration of some extended search functionality


1.1 Requirements

• Retrieval means selecting data objects, based on either…

a SELECT condition (exact match)

or a defined similarity relation (best match)

• Retrieval may cover the delivery of the results to the user, too

1.1 Retrieval
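The difference between exact match and best match can be sketched as follows. The records, the three-component feature vectors, and the function names are hypothetical. Euclidean distance is used only as one possible similarity measure, not as the measure prescribed by the lecture.

```python
import math

# Hypothetical records: metadata plus a tiny, made-up feature vector.
IMAGES = [
    {"name": "beach.jpg", "author": "alice", "features": (0.9, 0.5, 0.1)},
    {"name": "forest.jpg", "author": "bob", "features": (0.1, 0.8, 0.2)},
    {"name": "dusk.jpg", "author": "alice", "features": (0.8, 0.4, 0.3)},
]


def exact_match(records, author):
    """SELECT-style condition: a record either satisfies it or it does not."""
    return [r["name"] for r in records if r["author"] == author]


def best_match(records, query_features, k):
    """Similarity search: rank all records by distance and keep the k closest."""
    ranked = sorted(
        records, key=lambda r: math.dist(r["features"], query_features)
    )
    return [r["name"] for r in ranked[:k]]


print(exact_match(IMAGES, "alice"))
print(best_match(IMAGES, (0.9, 0.5, 0.2), k=2))
```

The exact match returns an unordered set that is fully determined by the condition, whereas the best match always returns a ranking whose quality depends on how meaningful the features and the distance measure are.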

• Closer look at the search functionality

“Semantic” search functionality

Orthogonal integration of classical and extended functionality

Search does not directly access the media objects

Extraction, normalization and indexing of content-based features

Meaningful similarity/distance measures

1.1 Retrieval


• “Retrieve all images showing a sunset!”

• What exactly do these images have in common?


1.1 Content-based Retrieval

• Usually 2 main steps

Example: image databases


1.1 Schematic View

Creating the database: Image collection → Digitization → Image analysis and feature extraction → Image database

Querying the database: Query image → Digitization → Image analysis and feature extraction → Similarity search → Search result
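The two phases of the schematic view can be sketched in a few lines. This is a toy illustration under stated assumptions: an "image" is just a list of gray values from 0 to 255, the feature is a normalized 4-bin histogram, and the distance is the L1 distance between histograms. None of these choices comes from the slides. They only show that the same feature extraction runs once per image when the database is created and once per query image at search time.

```python
def extract_features(pixels, bins=4):
    """Normalized gray-value histogram; pixels are ints in 0..255."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]


def build_database(collection):
    """Creating the database: analyze every image once and keep its features."""
    return {name: extract_features(pixels) for name, pixels in collection.items()}


def similarity_search(database, query_pixels, k=1):
    """Querying the database: same extraction, then rank by L1 distance."""
    query = extract_features(query_pixels)

    def distance(name):
        return sum(abs(a - b) for a, b in zip(database[name], query))

    return sorted(database, key=distance)[:k]


# Made-up image collection.
collection = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 100, 150, 250],
}
db = build_database(collection)
print(similarity_search(db, [5, 15, 35, 60], k=2))
```

Note that the search only touches the stored feature values, never the raw pixels of the collection.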


1.1 Detailed View

[Figure: detailed view of an MM-Database between query and result. Numbered steps: 1. Insert into the database (MM-Objects + relational data), 2. Extraction of features, 3. Query preparation, 4. Similarity computation & query processing, 5. Result preparation. Data exchanged between the steps: raw data, raw & relational data, feature values, query plan & feature values, result data.]


1.1 More Detailed View

[Figure: more detailed view between query and result. Components shown: pre-processing of MM-Objects and relational data (decomposition, normalization, segmentation); feature extraction (feature recognition, feature preparation); storage in the MM-Database (BLOBs/CLOBs), the relational DB (structure data, metadata, profile) and the feature index; query preparation (normalization, segmentation, feature extraction); optimization (query plan, feature values); similarity computation and query processing; result preparation (medium transformation, format transformation); result data; relevance feedback.]

• Lots of multimedia content on the Web

Social networking e.g., Facebook, MySpace, Hi5, Orkut, etc.

Photo sharing e.g., Flickr, Photobucket, Imeem, Picasa, etc.

Video sharing e.g., YouTube, Megavideo, Metacafe, blip.tv, Liveleak, etc.

1.2 Applications

• Cameras are everywhere

In London “there are at least 500,000 cameras in the city, and one study showed that in a single day a person could expect to be filmed 300 times”

1.2 Applications


• Picasa face recognition


1.2 Applications

• Picasa, face recognition example


1.2 Applications

• Picasa, learning phase


1.2 Applications

• Picasa example


1.2 Applications

• Consider a police investigation of a large-scale drug operation

• Possible generated data:

– Video data captured by surveillance cameras

– Audio data captured

– Image data consisting of still photographs taken by investigators

– Structured relational data containing background information

– Geographic information system data

1.2 Sample Scenario

• Possible queries

Image query by example: police officer has a photograph and wants to find the identity of the person in the picture

•Query: “retrieve all images from the image library in which the person appearing in the (currently displayed) photograph appears”

Image query by keywords: police officer wants to examine pictures of “Tony Soprano”

•Query: “retrieve all images from the image library in which ‘Tony Soprano’ appears”

1.2 Sample Scenario


Video Query:

•Query: “Find all video segments in which Jerry appears”

•By examining the answer of the above query, the police officer hopes to find other people who have previously interacted with the victim


1.2 Sample Scenario

Heterogeneous Multimedia Query:

•Find all individuals who have been photographed with “Tony Soprano” and who have been convicted of attempted murder in New Jersey and who have recently had electronic fund transfers made into their bank accounts from ABC Corp.


1.2 Sample Scenario

• Basic differences

– Static: high number of search queries (read access), few modifications of the data state

– Dynamic: frequent modifications of the data state

– Active: database functionality leads to application operations

– Passive: database reacts only to requests from outside

– Standard search: queries are answered through the use of metadata, e.g., Google image search

– Retrieval functionality: content-based search on the multimedia repository, e.g., Picasa face recognition


1.2 Characteristics

• Passive static retrieval

Art historical use case


1.2 Example

Possible hit in a multimedia database

1.2 Example

• Active dynamic retrieval

Weather warning through evaluation of satellite photos

1.2 Example

Typhoon-Warning for the Philippines

Extraction


• Standard search

Queries are answered through the use of metadata e.g., Google-image search


1.2 Example

• Retrieval functionality

Content based e.g., Picasa face recognition


1.2 Example

• Basic evaluation of retrieval techniques

– Efficiency of the system

•Efficient utilization of system resources

•Scalable also over big collections

– Effectivity of the retrieval process

•High quality of the result

•Meaningful usage of the system

– Weighting is application specific


1.3 Towards Evaluation

• Characteristic values to measure efficiency are e.g.:

Memory usage

CPU-time

Number of I/O-Operations

Response time

• Depends on the (Hardware-) environment

• Goal: the system should be efficient enough!


1.3 Evaluating Efficiency
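Some of the characteristic values listed above can be collected with the Python standard library. The helper `measure` below is a hypothetical name introduced for this sketch. It records CPU time, response (wall-clock) time, and peak Python heap usage for a single call. Counting I/O operations is platform-specific and is not covered here.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args) once and report simple efficiency figures."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()

    result = func(*args)

    cpu_seconds = time.process_time() - cpu_start
    response_seconds = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    stats = {
        "cpu_seconds": cpu_seconds,
        "response_seconds": response_seconds,
        "peak_memory_bytes": peak_bytes,
    }
    return result, stats


# Stand-in workload; a real measurement would call the retrieval function.
result, stats = measure(sorted, list(range(10_000, 0, -1)))
print(stats)
```

As the slide notes, such numbers depend on the hardware environment, so they are only comparable between systems measured on the same machine.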

• Measuring effectivity is more difficult and always depends on the query

• Goal: define some query-dependent evaluation measures!

Objective quality metrics

Result evaluation based on the query

Independent of the querying interface and retrieval procedure used

Leads to comparability of different systems/algorithms

1.3 Evaluating Effectivity

• Effectivity can be measured regarding some explicit query

Main focus on evaluating the behavior of the system with respect to a query

Relevance of the result set

• But effectivity also needs to consider implicit information needs

Main focus on evaluating the usefulness, usability and user friendliness of the system

1.3 Evaluating Effectivity


• To evaluate a retrieval system for a given query, each document is classified in a binary fashion as relevant or irrelevant with respect to the query

This classification is performed by “experts”

The response of the system to the query will be compared to this manual classification

•Compare the obtained response with the “ideal” result


1.3 Relevance

• Subjective measure estimating to what degree the information need of the user is satisfied

Difficult to measure (empirical studies)

Questionable instrument for comparing procedures/systems

• Attention: useful documents can be irrelevant when considering the query (serendipity)

• In this lecture:

explicit query evaluation measures only!


1.3 Usefulness (Pertinence)


1.3 Involved Sets

[Figure: Venn diagram of the collection containing two overlapping sets, “searched for” (= relevant) and “found” (= query result)]

• Irrelevant documents, classified as relevant by the system

False alarms, false drops, …

• Needlessly increase the result set

• Usually inevitable (ambiguity)

• Can be easily eliminated by the user


1.3 False Positives

• Relevant documents classified by the system as irrelevant

False dismissals

• Dangerous, since they cannot be detected easily by the user

Does the collection contain “better” documents?

False positives are usually not as bad as false negatives

1.3 False Negatives

• Correct positives (correct alarms)

All documents correctly classified by the system as relevant

• Correct negatives (correct dismissals)

All documents correctly classified by the system as irrelevant

• The four sets are pairwise disjoint and their union is the entire document collection

1.3 Remaining Sets


1.3 Overview

User evaluation      System evaluation: relevant    System evaluation: irrelevant
relevant             ca (correct alarm)             fd (false dismissal)
irrelevant           fa (false alarm)               cd (correct dismissal)

• {Relevant results} = fd + ca

• {Retrieved results} = ca + fa
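The four sets can be derived with plain set operations. The following sketch uses a made-up collection of ten document IDs. The function name `evaluation_sets` is an assumption for illustration.

```python
def evaluation_sets(collection, relevant, retrieved):
    """Split a collection into ca, fa, fd and cd for one query."""
    collection = set(collection)
    relevant = set(relevant)
    retrieved = set(retrieved)

    ca = relevant & retrieved               # correct alarms
    fa = retrieved - relevant               # false alarms
    fd = relevant - retrieved               # false dismissals
    cd = collection - relevant - retrieved  # correct dismissals
    return ca, fa, fd, cd


# Made-up example: documents 1..10, four relevant, three returned.
collection = set(range(1, 11))
relevant = {1, 2, 3, 4}
retrieved = {3, 4, 5}

ca, fa, fd, cd = evaluation_sets(collection, relevant, retrieved)
print(sorted(ca), sorted(fa), sorted(fd), sorted(cd))

# The sets partition the collection, and the identities from the slide hold.
assert ca | fa | fd | cd == collection
assert fd | ca == relevant
assert ca | fa == retrieved
```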


1.3 Interpretation

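[Figure: Venn diagram of the collection with the overlapping sets “searched for” and “found”; the regions are labeled fd (searched for only), ca (intersection), fa (found only) and cd (rest of the collection)]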

• Precision measures the ratio of correctly returned documents relative to all returned documents

P = ca / (ca + fa)

• Value in [0, 1] (1 representing the best value)

• A high number of false alarms means worse results


1.3 Precision

• Recall measures the ratio of correctly returned documents relative to all relevant documents

R = ca / (ca + fd)

• Value in [0, 1] (1 representing the best value)

• A high number of false dismissals means worse results


1.3 Recall

• Both measures only make sense if considered at the same time

E.g., perfect recall can be achieved by simply returning all documents, but then the precision is extremely low…

• Can be balanced by tuning the system

E.g., smaller result sets lead to better precision rates at the cost of recall

• Usually precision and recall are averaged over several queries (macro evaluation)

1.3 Precision-Recall Analysis
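The trade-off can be made concrete by cutting a ranked result list at different sizes. The ranking and the set of relevant documents below are made up for this sketch, and `precision_recall_at_k` is a hypothetical helper name.

```python
def precision_recall_at_k(ranking, relevant, k):
    """Precision and recall when only the first k ranked documents are returned."""
    returned = ranking[:k]
    ca = sum(1 for doc in returned if doc in relevant)
    precision = ca / len(returned)
    recall = ca / len(relevant)
    return precision, recall


# Made-up ranking over a collection of eight documents, three of them relevant.
ranking = ["d3", "d1", "d7", "d2", "d5", "d4", "d6", "d8"]
relevant = {"d1", "d2", "d3"}

for k in (1, 2, 4, 8):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"k={k}: precision={p:.3f} recall={r:.3f}")
```

With a small result set the precision is perfect but recall is low. Returning the whole collection reaches a recall of 1 while precision drops to the fraction of relevant documents in the collection.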

• Alarms (returned elements) can easily be divided into ca and fa

Precision is easy to calculate

• Dismissals (elements not returned) are not so easy to divide into cd and fd, because the entire collection has to be classified

Recall is difficult to calculate

• Standardized Benchmarks

Provided collections and queries

Annotated result sets

1.3 Actual Evaluation


• Text REtrieval Conference

• De-Facto-Standard since 1992

• Establish average precision for 11 fixed recall points (0, 0.1, 0.2, …, 1) according to defined procedures (trec_eval)

• Different tracks, extended also for video data, Web retrieval and Question Answering

• Other initiatives: e.g., CLEF (cross-language retrieval) or INEX (XML-Documents)


1.3 TREC
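A common way to obtain precision values at the 11 fixed recall points is interpolation: the precision at recall level r is taken as the highest precision observed at any recall of at least r. The sketch below follows that convention. It is an illustrative assumption, not a reimplementation of trec_eval, and the function name and sample ranking are made up.

```python
def eleven_point_average_precision(ranking, relevant):
    """Mean of interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    # (recall, precision) after each position of the ranked result list.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))

    levels = [i / 10 for i in range(11)]
    interpolated = []
    for level in levels:
        # Best precision among all cut-offs that reach at least this recall.
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates, default=0.0))
    return sum(interpolated) / len(levels)


# Made-up ranking: the two relevant documents appear at ranks 1 and 3.
score = eleven_point_average_precision(["a", "x", "b", "y"], {"a", "b"})
print(round(score, 4))
```

Averaging this value over all queries of a benchmark gives one number per system, and plotting the 11 interpolated values gives a precision-recall graph.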


1.3 Example

Query     fa   ca   fd   cd   P     R
Q1        8    2    6    4    0.2   0.25
Q2        2    8    2    8    0.8   0.8
Average                       0.5   0.525
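The values of this example can be recomputed directly from the definitions P = ca / (ca + fa) and R = ca / (ca + fd). The counts for Q1 and Q2 are taken from the example. The dictionary layout and function names are assumptions of this sketch.

```python
def precision(ca, fa):
    """Share of returned documents that are relevant."""
    return ca / (ca + fa)


def recall(ca, fd):
    """Share of relevant documents that were returned."""
    return ca / (ca + fd)


# Counts per query as given in the example.
queries = {
    "Q1": {"fa": 8, "ca": 2, "fd": 6, "cd": 4},
    "Q2": {"fa": 2, "ca": 8, "fd": 2, "cd": 8},
}

results = {
    name: (precision(c["ca"], c["fa"]), recall(c["ca"], c["fd"]))
    for name, c in queries.items()
}

# Macro evaluation: average the per-query values.
avg_precision = sum(p for p, _ in results.values()) / len(results)
avg_recall = sum(r for _, r in results.values()) / len(results)

for name, (p, r) in results.items():
    print(name, round(p, 3), round(r, 3))
print("Average", round(avg_precision, 3), round(avg_recall, 3))
```

Note that cd does not enter either formula, which is why precision and recall say nothing about how large the rest of the collection is.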

• Precision-Recall-Graphs


1.3 Representation

[Figure: precision-recall graphs for System 1, System 2 and System 3; one marked point shows the average precision of system 3 at a recall level of 0.2]

Which system is the best?

What is more important: recall or precision?

• Retrieval of images by color

• Introduction to color spaces

• Color histograms

• Matching


Next lecture
