
Multimedia Databases
Wolf-Tilo Balke, Younès Ghammad
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de


Academic year: 2021


• Shape-based Features
– Chain Codes
– Area-based Retrieval
– Moment Invariants
– Query by Example

Multimedia Databases– Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig 2

Previous Lecture

• 6 Audio Retrieval
– 6.1 Basics of Audio Data
– 6.2 Audio Information in Databases
– 6.3 Audio Retrieval


6 Audio Retrieval

• Information transfer through sound
– Audio (Latin for "I hear")

• Three different types of data:
– Music
– Spoken text
– Noise


6.1 Basics of Audio Data

• Auditory perception through pressure fluctuations in the air

• Eardrum vibrates synchronously

• Ear bones amplify and transmit the vibrations

• Auditory hair cells in the cochlea are stimulated by the vibrations

• Neurons produce electrical impulses

6.1 Basics

• 3D model of the human ear

6.1 Basics


• Our brain interprets two major properties of sound:
– Pitch
– Volume


6.1 Basics

• Quantitative description of the sound wave

• Amplitude as volume
– Logarithmic perception (a tenfold increase in sound intensity, i.e., +10 dB, roughly doubles the perceived loudness)

• Frequency as pitch
– The number of periods per unit of time is called the frequency (measured in hertz)
– Human hearing range: roughly 20 Hz to 20 kHz
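A minimal Python sketch of the logarithmic scale, assuming the usual decibel definitions: since intensity grows with the square of the amplitude, a level change is 20·log10 of the amplitude ratio. It also assumes the common rule of thumb that every +10 dB roughly doubles perceived loudness. The function names are illustrative only.

```python
import math


def level_change_db(amplitude_ratio: float) -> float:
    """Level change in dB for a given amplitude ratio.

    Intensity is proportional to amplitude squared, so
    10*log10(intensity ratio) == 20*log10(amplitude ratio).
    """
    return 20.0 * math.log10(amplitude_ratio)


def loudness_ratio(level_change: float) -> float:
    """Rule of thumb (assumption): each +10 dB roughly doubles perceived loudness."""
    return 2.0 ** (level_change / 10.0)


if __name__ == "__main__":
    for ratio in (math.sqrt(10.0), 10.0):
        db = level_change_db(ratio)
        print(f"amplitude x{ratio:.2f}: {db:+.1f} dB, "
              f"about {loudness_ratio(db):.1f}x as loud")
```

Under these assumptions, a tenfold amplitude (+20 dB) sounds roughly four times as loud, while a tenfold intensity (+10 dB) sounds roughly twice as loud.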


6.1 Basics

• Audio signals are time-dependent (overlapping) waveforms


6.1 Basics


6.1 Basics

• Constructive and destructive interference
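Overlapping waves simply add up sample by sample (superposition). The sketch below uses hypothetical helpers `sine` and `superpose`. Two equal 1 kHz tones in phase double the peak amplitude (constructive interference), while shifting one of them by π cancels them out (destructive interference).

```python
import math


def sine(freq_hz, amplitude, phase, sample_rate, n_samples):
    """Sample a*sin(2*pi*f*t + phase) at the instants t = n / sample_rate."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(n_samples)]


def superpose(a, b):
    """Overlapping waves add up sample by sample."""
    return [x + y for x, y in zip(a, b)]


RATE, N = 8000, 80  # 80 samples = 10 periods of a 1 kHz tone at 8 kHz
tone = sine(1000, 1.0, 0.0, RATE, N)
constructive = superpose(tone, sine(1000, 1.0, 0.0, RATE, N))      # in phase
destructive = superpose(tone, sine(1000, 1.0, math.pi, RATE, N))   # shifted by pi

print(max(abs(v) for v in constructive))  # peak amplitude close to 2.0
print(max(abs(v) for v in destructive))   # close to 0.0 (rounding noise only)
```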

6.1 Basics

• Musical instruments are classified by their vibration generator
– E.g., string, wind, and percussion instruments

• The acoustics depend on the vibration generator
– E.g., vibrating strings, air columns, or membranes

• Synthetic sound creation requires an oscillator

• The oscillator generates voltage oscillations

• Speakers transfer the voltage changes onto a membrane

6.1 Sound Creation


• Influence of the oscillator

– Higher voltage ⟹ Higher frequency (Moog, 1964)

• The amplifier influences the amplitude and thus the volume

• The ADSR (attack-decay-sustain-release) envelope shapes the loudness of a sound over time


6.1 Sound Creation


6.1 Sound Creation

• Moog 901B (1964); modular Moog synthesizer (1967)

• Emerson, Lake & Palmer: The Great Gates of Kiev (1974)


6.1 Sound Creation

• Synthesized sounds often sound rather metallic. A single synthesized sound typically passes through four phases:
– Attack: speed and strength of the signal's rise
– Decay: drop of the level after the initial peak
– Sustain: the actual tone, held at a constant level
– Release: fade-out at the end of the signal
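A minimal sketch of an ADSR envelope, assuming straight-line segments (real synthesizers often use exponential curves) and assuming the key is released only after the sustain phase has been reached. The function `adsr`, the helper `shaped_tone`, and all parameter values are hypothetical.

```python
import math


def adsr(t, attack, decay, sustain_level, note_off, release):
    """Linear ADSR envelope value (0.0..1.0) at time t in seconds.

    attack:        time to rise from 0 to the peak level 1.0
    decay:         time to fall from 1.0 to sustain_level
    sustain_level: level held while the key is pressed
    note_off:      time at which the key is released
    release:       time to fall from sustain_level to 0 after note_off
    Assumes note_off >= attack + decay, i.e. the sustain phase is reached.
    """
    if t < 0:
        return 0.0
    if t >= note_off:                                   # release phase
        elapsed = t - note_off
        return max(0.0, sustain_level * (1.0 - elapsed / release))
    if t < attack:                                      # attack phase
        return t / attack
    if t < attack + decay:                              # decay phase
        return 1.0 - (1.0 - sustain_level) * (t - attack) / decay
    return sustain_level                                # sustain phase


def shaped_tone(freq_hz, sample_rate, duration, **envelope):
    """Multiply a sine tone by the ADSR envelope, sample by sample."""
    n_samples = round(duration * sample_rate)
    return [adsr(n / sample_rate, **envelope)
            * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


env = dict(attack=0.1, decay=0.2, sustain_level=0.6, note_off=1.0, release=0.5)
print(adsr(0.05, **env), adsr(0.5, **env), adsr(1.25, **env))
```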


6.1 ADSR envelope

• Transformation of the continuous sound wave into a discrete representation

• Sampling: store the current amplitude value of the vibration at regular intervals

• Playback requires reconstructing the audio signal from these values

6.1 Digitalization of Audio Data


• Basic characteristics
– Sampling rate: how many times per unit of time is the value of the continuous signal measured?
– Resolution: with what accuracy are the values recorded?

• Often, a resolution of 16 bits is used ($2^{16} = 65{,}536$ different amplitude values)

• The sampling rate is application dependent:
– Audio CD: 44,100 Hz
– Phone: 8,000 Hz
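A small sketch of sampling plus quantization, assuming a signal normalized to the range −1..1 that is mapped onto signed 16-bit integers (scaled by ±32,767; the extra value −32,768 stays unused here). The function `sample_and_quantize` and the test tone are hypothetical. A higher sampling rate produces proportionally more values for the same duration.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate, bits=16):
    """Measure `signal` (a function of time in seconds, range -1..1) at
    regular intervals and round each value to a signed `bits`-bit integer."""
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16 bits
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                      # sampling instant
        value = max(-1.0, min(1.0, signal(t)))   # clip to the valid range
        samples.append(round(value * max_level)) # quantization (resolution)
    return samples


def a4_tone(t):
    """Concert pitch A (440 Hz) at half the maximum amplitude."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


cd_samples = sample_and_quantize(a4_tone, 0.01, 44_100)   # 10 ms at CD rate
phone_samples = sample_and_quantize(a4_tone, 0.01, 8_000) # 10 ms at phone rate
print(len(cd_samples), len(phone_samples))
```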

6.1 Sampling


• The original oscillation must be uniquely reconstructible from the samples

• The higher the sampling frequency, the more values must be stored

• What is the minimum sampling frequency?

• Sampling theorem (Nyquist, 1928): the sampling rate must be at least twice as high as the highest frequency occurring in the signal
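To illustrate what goes wrong when the sampling theorem is violated, this sketch samples a 5 kHz tone at the 8 kHz telephone rate. Because 5 kHz lies above half the sampling rate, its samples coincide (up to sign) with those of a 3 kHz tone (8000 − 5000 = 3000). The two signals can no longer be told apart, an effect known as aliasing. The helper `sample_tone` is hypothetical.

```python
import math


def sample_tone(freq_hz, sample_rate, n_samples):
    """Samples of sin(2*pi*f*t) at t = n / sample_rate."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


RATE = 8000                          # telephone rate: only up to 4 kHz is safe
above = sample_tone(5000, RATE, 16)  # 5 kHz violates the sampling theorem
alias = sample_tone(3000, RATE, 16)  # its alias at 8000 - 5000 = 3000 Hz

# sin(2*pi*5000*n/8000) = sin(2*pi*n - 2*pi*3000*n/8000) = -sin(2*pi*3000*n/8000)
print(all(abs(a + b) < 1e-9 for a, b in zip(above, alias)))  # True
```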


6.1 Sampling Rate


6.1 Sampling Rate

• Phone: 8,000 Hz

• DVD audio: 96,000 Hz or 192,000 Hz

• Audio CD: 44,100 measurements per second for each of the two stereo channels at 16 bits per measurement yields 176.4 kB/s
– ⟹ ca. 10.6 MB/min, i.e., about 635 MB/h
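The audio CD figures can be checked with a few lines of arithmetic, assuming decimal units (1 kB = 1,000 bytes, 1 MB = 1,000,000 bytes):

```python
SAMPLE_RATE = 44_100   # measurements per second and channel
CHANNELS = 2           # stereo
BYTES_PER_SAMPLE = 2   # 16 bits per measurement

bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE  # 176,400 B = 176.4 kB
bytes_per_minute = bytes_per_second * 60                      # 10,584,000 B, about 10.6 MB
bytes_per_hour = bytes_per_minute * 60                        # 635,040,000 B, about 635 MB

print(bytes_per_second / 1_000, "kB/s")
print(bytes_per_minute / 1_000_000, "MB/min")
print(bytes_per_hour / 1_000_000, "MB/h")
```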


6.1 Sampling Rate

• To save space, digital audio data is usually stored in compressed form

• Well-known uncompressed formats:
– AIFF: *.aif (Audio Interchange File Format, Apple)
– WAVE: *.wav (Windows)
– IRCAM: *.snd (Institut de Recherche et Coordination Acoustique/Musique)
– AU: *.au (Sun audio)


6.1 Audio Formats

• Data reduction: with information loss (lossy) or without (lossless)

• Lossless compression methods generally achieve only moderate compression
– Free Lossless Audio Codec (FLAC): about 50–60% of the original size
– Apple Lossless
– WavPack
– ...

6.1 Compression

• Lossy compression algorithms are typically based on simple transformations
– Modified Discrete Cosine Transform (MDCT) or wavelets

• Encoding: transformation of the waveform into sequences of frequency values

• Decoding: reconstruction of the waveform from these values

• Which data can we afford to lose?

6.1 Compression


• Change the data without changing the subjective perception
– Omit very high/low frequencies
– Store superimposed frequencies with less precision
– Exploit further effects from the psychoacoustic model, e.g., quiet tones just before/after very loud sounds, or frequency changes below a minimum distance, cannot be heard ...


6.1 Compression

• MPEG-1 Audio Layers I, II and III (MP3)

• Near-CD quality at bit rates of 128 kbit/s

• The MP3 approach, roughly:
– Channel coupling of the stereo signal by encoding the difference between the channels
– Cut off inaudible frequencies
– Eliminate redundancy by exploiting psychoacoustic effects
– Compress the data using Huffman coding
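One common form of channel coupling is mid/side coding: store the average of both channels (mid) and half their difference (side). For typical stereo recordings, the two channels are similar, so the side values stay small and are cheap to encode. This is a simplified, hypothetical sketch of the idea, not the actual MP3 joint-stereo procedure.

```python
def to_mid_side(left, right):
    """Mid = average of both channels, side = half their difference."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Exact inverse: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [1000, 1200, -300, 50]
right = [990, 1210, -310, 40]
mid, side = to_mid_side(left, right)
print(side)  # [5.0, -5.0, 5.0, 5.0]: small values, cheap to encode
```

Since the transformation is invertible, the coupling itself loses nothing; savings come from spending fewer bits on the small side signal.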


6.1 Compression

• AAC (Advanced Audio Coding)
– Industry-driven improvement of the MP3 format (standardized by MPEG)
– Used in TV/radio broadcasts, the Apple iTunes Store, ...
– Better quality at the same file size
– Support for multi-channel audio
– Supports 48 main audio channels with sampling rates up to 96 kHz, 16 low-frequency channels (limited to 120 Hz), and 15 data streams

• Ogg Vorbis, RealAudio, WMA 9, ...


6.1 Compression

• Lossless compression: important factors are

– De-/compression speed

– Compression rate


6.1 Compression

• Lossy compression: important factors are
– De-/compression speed
– Compression rate
– Most importantly, the quality of the compressed audio!

6.1 Compression

• Lossy compression, results

6.1 Compression


• Lossy compression, results


6.1 Compression

• Communication protocol
– For transmitting, recording, and playing back musical control information between digital instruments and the PC
– Messages are not sounds but commands that can be interpreted, e.g., by sound cards
– Some commands: note-on, note-off, key velocity, pitch, type of instrument

• Example:


6.1 The MIDI Format

MIDI: Original:

• 10 minutes of music amount to about 200 KB of MIDI data
– Significant savings compared to sampling, but not the original sound

• Data are entered into the PC via a keyboard and output via a synthesizer

• A sequencer stores the data and the changes made to it
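Why MIDI data is so compact becomes clear at the byte level. A MIDI 1.0 channel message is a status byte (command plus channel number) followed by one or two data bytes in the range 0–127. The sketch builds raw Note On (0x9n), Note Off (0x8n), and Program Change (0xCn, instrument selection) messages. It also converts note numbers to pitches in equal temperament with note 69 = A4 = 440 Hz. The helper names are hypothetical, and the delta times that Standard MIDI Files store between events are left out.

```python
def _check_range(name, value, upper):
    if not 0 <= value <= upper:
        raise ValueError(f"{name} must be in 0..{upper}, got {value}")


def note_on(channel, note, velocity):
    """Note On: status byte 0x90 | channel, then note number and key velocity."""
    for name, value, upper in (("channel", channel, 15), ("note", note, 127),
                               ("velocity", velocity, 127)):
        _check_range(name, value, upper)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel, note, velocity=0):
    """Note Off: status byte 0x80 | channel, then note number and release velocity."""
    for name, value, upper in (("channel", channel, 15), ("note", note, 127),
                               ("velocity", velocity, 127)):
        _check_range(name, value, upper)
    return bytes([0x80 | channel, note, velocity])


def program_change(channel, program):
    """Program Change (selects the instrument): status byte 0xC0 | channel."""
    _check_range("channel", channel, 15)
    _check_range("program", program, 127)
    return bytes([0xC0 | channel, program])


def note_frequency(note):
    """Pitch in Hz in equal temperament, taking note 69 (A4) as 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Middle C (note 60) on channel 0 with program 0 (acoustic grand piano in General MIDI)
melody = program_change(0, 0) + note_on(0, 60, 100) + note_off(0, 60)
print(melody.hex(" "), len(melody), "bytes")
```

The whole note costs 8 bytes, whereas one second of CD audio takes 176,400 bytes.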


6.1 The MIDI Format

• Audio data
– Music, CDs
– Sound effects, "earcons"

• Much information is transferred as audio
– Storage of historical speeches
– Recordings of conversations, phone calls or negotiations


6.2 Audio Information in Databases

• Three main applications of audio signals in the context of databases
– Identification of audio signals (audio as query)
– Classification and search of similar signals (matching of audio)
– Phonetic synchronization

6.2 Special Applications

• Find the title, etc. of a given piece of music

• Monitoring of audio streams
– Verifying that advertisements are broadcast on radio
– Copyright control (GEMA)

– (Remote) diagnosis based on noise

• Audio on Demand

6.2 Identification of Audio Signals


• Find perceptually similar audio signals
– E.g., similar pieces of music, the same quotation, ...

• Recommendation
– E.g., bands with similar music

• Genre classification (rock, classical, ...)

– E.g., in audio libraries


6.2 Classification and Matching

• Synchronization of audio streams
– Speech ⟺ text, notes ⟺ audio, ...

• Retrieval of text from or to speech
– Find specific points in a speech
– Spoken queries on text documents

• Following the score during concerts, etc.


6.2 Synchronization

• Identification
– The simplest of the three problems; research has been successful in recent years

• Classification and matching
– Often still based on manual annotations
– Automatic classification works only roughly, and only on small collections
– Matching is still largely unresolved

• Synchronization
– Error rates are now tolerable (speech ⟺ text)


6.2 State of the Art

• (Compressed) audio files are stored in the database as (smart) BLOBs

• Additionally, metadata and feature vectors are stored to implement the search functionality
– Speech: transcription as text
– Music: musical notation or MIDI


6.2 Persistent Storage

• Search in audio data: metadata describe the audio file
– Semantic metadata (title, artist, speaker, keywords, ...): difficult to generate
– File information (time/place of recording, filename, file size, ...): can be generated automatically
– Widely used, e.g., in music exchange platforms, online shops, ...

6.3 Audio Retrieval

• Manual indexing is extremely labor-intensive and expensive

• Information is often incomplete, partial, and subjective (e.g., genre classification)

• No query by example ("sounds like ...")

• Search only via SQL, approximate string search, etc.

6.3 Metadata-based Search


• Use the content of the audio files

• Compare sample by sample
– Not very promising, inefficient
– Differences in sampling rate and resolution

• Sounds can be distinguished by certain characteristics
– Low-level features
– Logical features


6.3 Content-based Search

• Acoustic features
– Same basic idea as in image databases
– Description of the signal information by means of characteristic features
– In contrast to images, we use not a single feature vector but a time-dependent vector function
– Describes the acoustic characteristics at each point in time rather than for the audio file as a whole


6.3 Low Level Features

• Typical low-level features
– Mean amplitude, loudness
– Frequency distribution
– Pitch
– Brightness
– Bandwidth

• Measured in the ...
– ... time domain (amplitude versus time)
– ... frequency domain (intensity versus frequency)


6.3 Low Level Features

• Amplitude
– Pressure fluctuations around the zero point
– Silence corresponds to an amplitude of 0

• Average energy
– Characterizes the volume of the signal

$$E = \frac{1}{N} \sum_{n=0}^{N-1} x_n^2$$

with $N$ as the total number of measurements and $x_n$ as the $n$-th measurement
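A direct translation of the average energy into Python. Because audio features are time-dependent, the sketch also computes the energy per frame, yielding one value per time window. The frame size is an assumed parameter, and both function names are hypothetical.

```python
def average_energy(samples):
    """E = (1/N) * sum of x_n^2 over all N measurements."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(x * x for x in samples) / len(samples)


def frame_energies(samples, frame_size):
    """Average energy of consecutive, non-overlapping frames (an incomplete
    last frame is dropped), giving a time-dependent volume curve."""
    return [average_energy(samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]


print(average_energy([0.5, -0.5, 0.5, -0.5]))   # 0.25
print(frame_energies([1, 1, 0, 0, 2, 2], 2))    # [1.0, 0.0, 4.0]
```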


6.3 Features in the Time Domain

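• Zero-crossing rate
– Frequency of sign changes in the signal

$$ZC = \frac{1}{2N} \sum_{n=1}^{N-1} \left| \operatorname{sgn}(x_n) - \operatorname{sgn}(x_{n-1}) \right|$$

with sgn as the sign function (signum)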
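A sketch of the zero-crossing rate as defined above. It assumes the convention sgn(0) = 0, so that a pass through an exact zero still counts as one crossing in total. Each full sign change contributes |±2| / 2 = 1, and dividing by N turns the count into a rate per measurement. The function names are hypothetical.

```python
def sgn(x):
    """Signum: +1 for positive, -1 for negative, 0 for zero values."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0


def zero_crossing_rate(samples):
    """ZC = 1/(2N) * sum_{n=1}^{N-1} |sgn(x_n) - sgn(x_{n-1})|"""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    changes = sum(abs(sgn(samples[i]) - sgn(samples[i - 1])) for i in range(1, n))
    return changes / (2 * n)


print(zero_crossing_rate([1, -1, 1, -1]))  # 0.75: three crossings in four samples
print(zero_crossing_rate([1, 2, 3, 4]))    # 0.0: no sign change
```

Higher-frequency and noisy signals cross zero more often, which makes the rate a cheap time-domain indicator, e.g., for separating speech from music.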

6.3 Features in the Time Domain

• Silence ratio
– Proportion of values that belong to a period of silence

• We must first define:
– The amplitude value below which a measurement is considered silent
– The minimum number of consecutive silent measurements that forms a silence period
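A sketch of the silence ratio with the two parameters from the slide: an amplitude threshold and a minimum run length. The concrete threshold and run length in the example are assumptions chosen for illustration, and `silence_ratio` is a hypothetical name.

```python
def silence_ratio(samples, threshold, min_run):
    """Fraction of samples that lie in runs of at least `min_run` consecutive
    samples whose absolute amplitude is below `threshold`."""
    if not samples:
        return 0.0
    silent_total = 0
    run = 0
    for x in samples:
        if abs(x) < threshold:
            run += 1                  # extend the current quiet run
        else:
            if run >= min_run:        # run was long enough to be a silence period
                silent_total += run
            run = 0
    if run >= min_run:                # a silence period may end the signal
        silent_total += run
    return silent_total / len(samples)


signal = [0.5, 0.01, 0.0, -0.02, 0.6, 0.01, 0.7, 0.0, 0.0, 0.0]
print(silence_ratio(signal, threshold=0.05, min_run=3))  # 0.6
```

The single quiet value at index 5 is too short to count, so only the two three-sample runs contribute.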

6.3 Features in the Time Domain


• Fourier transform of the signal
– Decomposition into frequency components with coefficients (Fourier coefficients)
– Representation of the frequency spectrum of the signal
– Magnitude of the coefficient of each frequency (represents the amount of energy per frequency)
– Usually measured in decibels (dB)
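A self-contained sketch that computes the spectrum with a naive O(N²) discrete Fourier transform (real systems use an FFT) and reports each coefficient magnitude in dB. The magnitudes are normalized by N, so a real sine of amplitude A shows up as A/2 in its bin (the other half sits in the mirrored bin N − k). A floor avoids log10(0). The test signal and the function names are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive DFT: X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]


def magnitude_spectrum_db(samples, sample_rate, floor=1e-12):
    """(frequency in Hz, level in dB) for the bins 0 .. N/2."""
    n = len(samples)
    coeffs = dft(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        magnitude = abs(coeffs[k]) / n
        level_db = 20 * math.log10(max(magnitude, floor))
        spectrum.append((k * sample_rate / n, level_db))
    return spectrum


RATE, N = 8000, 64   # bins are 8000 / 64 = 125 Hz apart
signal = [math.sin(2 * math.pi * 1000 * t / RATE)
          + 0.25 * math.sin(2 * math.pi * 3000 * t / RATE) for t in range(N)]
spectrum = magnitude_spectrum_db(signal, RATE)
loudest = sorted(spectrum, key=lambda p: p[1], reverse=True)[:2]
print(loudest)       # the 1000 Hz and 3000 Hz components dominate
```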


6.3 Features in the Frequency Domain

• "Ahhh" sound and Fourier spectrum


6.3 Frequency Spectrum

• Bandwidth
– Interval of occurring frequencies
– Difference between the largest and the smallest frequency in the spectrum (the minimum frequency is considered to be the first frequency above the silence threshold)
– Can also be used for classification, e.g., the bandwidth of music is usually larger than that of speech
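A minimal bandwidth computation on a spectrum given as (frequency, level in dB) pairs. The sketch assumes that the same silence threshold decides both the lowest and the highest frequency that count. The example spectrum and threshold are made up for illustration, and `bandwidth` is a hypothetical name.

```python
def bandwidth(spectrum, silence_threshold_db):
    """Largest minus smallest frequency whose level exceeds the threshold.

    spectrum: iterable of (frequency_hz, level_db) pairs
    """
    audible = [f for f, level in spectrum if level > silence_threshold_db]
    if not audible:
        return 0.0
    return max(audible) - min(audible)


spectrum = [(0, -80.0), (100, -20.0), (200, -10.0), (3000, -30.0), (5000, -70.0)]
print(bandwidth(spectrum, silence_threshold_db=-60.0))  # 2900
```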


6.3 Features in the Frequency Domain

• Power distribution
– Can be read directly from the spectrum
– Distinction of frequencies with high/low energy
– Calculation of frequency bands with high/low energy
– Centroid as the center of the spectral energy distribution (brightness)
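A sketch of two power-distribution features: the energy per frequency band and the spectral centroid (brightness). Here, the centroid is the mean frequency weighted by the energy |X|² of each component. Some definitions weight by the magnitude |X| instead, so treat this choice as an assumption. The band edges and function names are hypothetical.

```python
def spectral_centroid(freqs, magnitudes):
    """Energy-weighted mean frequency: sum(f * |X|^2) / sum(|X|^2)."""
    energies = [m * m for m in magnitudes]
    total = sum(energies)
    if total == 0:
        return 0.0
    return sum(f * e for f, e in zip(freqs, energies)) / total


def band_energies(freqs, magnitudes, edges):
    """Sum of |X|^2 per band [edges[i], edges[i+1]); components outside are ignored."""
    bands = [0.0] * (len(edges) - 1)
    for f, m in zip(freqs, magnitudes):
        for i in range(len(edges) - 1):
            if edges[i] <= f < edges[i + 1]:
                bands[i] += m * m
                break
    return bands


freqs = [100, 500, 1500, 3000]
magnitudes = [1, 2, 1, 1]
print(band_energies(freqs, magnitudes, [0, 1000, 4000]))  # [5.0, 2.0]
print(spectral_centroid([100, 1000], [3, 1]))             # 190.0
```

A spectrum whose energy lies mostly in low bands gets a low centroid and sounds "dark"; more high-frequency energy moves the centroid up and the sound becomes "brighter".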


6.3 Features in the Frequency Domain

• Harmony
– The lowest of all "loud" frequencies is called the fundamental frequency
– The harmony of a signal increases when the dominant components of the spectrum are multiples of the fundamental frequency
– E.g., concert pitch A (440 Hz) played on a violin as the fundamental frequency produces harmonic oscillations at 880 Hz, 1320 Hz, 1760 Hz, etc.
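A sketch of the harmony idea on a list of spectral peaks given as (frequency, amplitude) pairs. The fundamental is the lowest "loud" peak. The harmony score is the share of loud-peak energy lying at (near-)integer multiples of it. This score, the loudness threshold, the tolerance, and the function names are all hypothetical choices for illustration, not a standard measure.

```python
def loud_peaks(peaks, loud_threshold):
    """Peaks whose amplitude reaches the loudness threshold."""
    return [(f, a) for f, a in peaks if a >= loud_threshold]


def fundamental_frequency(peaks, loud_threshold):
    """The lowest of all loud frequencies."""
    loud = loud_peaks(peaks, loud_threshold)
    if not loud:
        raise ValueError("no loud frequencies in the spectrum")
    return min(f for f, _ in loud)


def harmonicity(peaks, loud_threshold, tolerance=0.02):
    """Hypothetical score in [0, 1]: energy share of loud peaks that lie
    within `tolerance` of an integer multiple of the fundamental."""
    loud = loud_peaks(peaks, loud_threshold)
    f0 = fundamental_frequency(peaks, loud_threshold)
    total = sum(a * a for _, a in loud)
    harmonic = sum(a * a for f, a in loud
                   if abs(f / f0 - round(f / f0)) <= tolerance)
    return harmonic / total


violin_a = [(440, 1.0), (880, 0.6), (1320, 0.4), (1760, 0.3), (2000, 0.05)]
inharmonic = [(440, 1.0), (880, 0.5), (1100, 0.5)]   # 1100 Hz = 2.5 x 440 Hz
print(fundamental_frequency(violin_a, 0.1), harmonicity(violin_a, 0.1))
print(harmonicity(inharmonic, 0.1))
```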

6.3 Features in the Frequency Domain

6.3 Harmony

• Harmonic oscillations; frequency spectrum of a sound played on an instrument


• Pitch
– Can be approximated by means of the Fourier spectrum
– Its value is calculated from the frequencies and amplitudes of the peaks
– Related to the fundamental frequency, which is often used as an approximation
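One simple way to compute a pitch value from the frequencies and amplitudes of spectral peaks is sketched below. The lowest loud peak serves as a first guess for the fundamental. Each loud peak near the h-th harmonic then contributes the estimate f/h, weighted by its amplitude. This weighting scheme and the name `estimate_pitch` are hypothetical; practical pitch detectors are considerably more robust, e.g., for a missing fundamental.

```python
def estimate_pitch(peaks, loud_threshold):
    """Amplitude-weighted pitch estimate from (frequency_hz, amplitude) peaks."""
    loud = sorted((f, a) for f, a in peaks if a >= loud_threshold)
    if not loud:
        raise ValueError("no loud peaks")
    first_guess = loud[0][0]                 # lowest loud frequency
    weighted_sum = 0.0
    weight_total = 0.0
    for f, a in loud:
        h = max(1, round(f / first_guess))   # assumed harmonic number
        weighted_sum += a * f / h            # this peak's fundamental estimate
        weight_total += a
    return weighted_sum / weight_total


peaks = [(439.0, 1.0), (881.0, 0.5), (1320.0, 0.5), (2500.0, 0.01)]
print(estimate_pitch(peaks, loud_threshold=0.1))  # 439.625
```

Combining several harmonics smooths out measurement errors of individual peaks compared to using the lowest peak alone.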


6.3 Features in the Frequency Domain

• Audio Retrieval
– Basics of Audio Data
– Audio Information in Databases
– Audio Retrieval


This Lecture

• Classification and Retrieval of Audio
– Low-level Audio Features
– Difference Limen
– Pitch Detection


Next lecture
