(1)

Multimedia Databases

Wolf-Tilo Balke, Janus Wawrzinek

Institut für Informationssysteme

Technische Universität Braunschweig

(2)

• Shape-based Features

– Chain Codes
– Area-based Retrieval
– Moment Invariants
– Query by Example

Previous Lecture

(3)

6 Audio Retrieval

6.1 Basics of Audio Data

6.2 Audio Information in Databases

6.3 Audio Retrieval

6 Audio Retrieval

(4)

• Information transfer through sound

– Audio (Latin, "I hear")

• Three different types of data:

– Music
– Spoken text
– Noise

6.1 Basics of Audio Data

(5)

Auditory perception works through pressure fluctuations in the air

• The eardrum vibrates synchronously

• The ear bones amplify and transmit the vibrations

• Auditory hair cells in the cochlea are stimulated by the vibrations

• Neurons produce electrical impulses that are sent to the brain

6.1 Basics

(6)

6.1 Basics

(7)

• Our brain only interprets two major properties of sound:

– Pitch
– Volume

6.1 Basics

(8)

• Quantitative properties of the sound wave

• Amplitude as volume

– Logarithmic perception (a tenfold increase in amplitude doubles the perceived loudness)

• Frequency as pitch

– Number of periods per unit time

6.1 Basics
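Because loudness perception is logarithmic, amplitude is usually expressed on a decibel scale. A minimal sketch of the dB computation (the reference amplitude is an arbitrary choice for illustration):

```python
import numpy as np

def amplitude_to_db(a, ref=1.0):
    """Level in decibels: 20 * log10(a / ref); ref is an arbitrary reference."""
    return 20 * np.log10(a / ref)

# A tenfold increase in amplitude adds 20 dB to the level:
print(amplitude_to_db(0.1), amplitude_to_db(1.0))   # -20.0, 0.0
```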

(9)

• Audio signals are time-dependent (overlapping) waveforms

6.1 Basics

(10)

6.1 Basics

(11)

• Constructive and destructive interference

6.1 Basics
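Interference can be illustrated by summing two sine waves; a minimal sketch (frequency and duration chosen for illustration):

```python
import numpy as np

sr = 8000
t = np.arange(0, 0.01, 1 / sr)
a = np.sin(2 * np.pi * 440 * t)

constructive = a + a                                     # in phase: amplitudes add up
destructive = a + np.sin(2 * np.pi * 440 * t + np.pi)    # opposite phase: cancellation

print(np.max(np.abs(constructive)))   # ~2.0
print(np.max(np.abs(destructive)))    # ~0.0
```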

(12)

• Audio examples

6.1 Basics

(13)

• Musical instruments are classified based on the vibration generator

– E.g., string, wind, and percussion instruments

• The acoustics depend on the vibration generator

– E.g., string, air, and membrane instruments

• Synthetic creation needs an oscillator

• The oscillator generates voltage oscillations

• A speaker transmits the voltage changes to a membrane

6.1 Sound Creation

(14)

• Influence of the oscillator

– Higher voltage ⟹ Higher frequency (Moog, 1964)

• The amplifier influences the amplitude and thus the volume

• ADSR (attack-decay-sustain-release) envelope influences the loudness of a sound in time

6.1 Sound Creation

(15)

6.1 Sound Creation

Moog 901B (1964); Modular Moog Synthesizer (1967)

(16)

Emerson, Lake & Palmer:

The Great Gates of Kiev (1974)

6.1 Sound Creation

(17)

• Synthesized sounds seem rather metallic. For producing a single synthesized sound, consider four typical phases:

– Attack: speed and strength of the signal rise
– Decay: lowering of the level
– Sustain: the actual pitch
– Release: end of the signal

6.1 ADSR envelope
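A piecewise-linear ADSR envelope can be sketched in a few lines; the concrete phase durations and the sustain level below are assumptions chosen for illustration:

```python
import numpy as np

def adsr(attack, decay, sustain_level, release, duration, sr=44100):
    """Piecewise-linear ADSR loudness envelope over `duration` seconds."""
    a = np.linspace(0, 1, int(attack * sr), endpoint=False)             # attack: rise
    d = np.linspace(1, sustain_level, int(decay * sr), endpoint=False)  # decay: drop
    n_sustain = int(duration * sr) - len(a) - len(d) - int(release * sr)
    s = np.full(max(n_sustain, 0), sustain_level)                       # sustain: hold
    r = np.linspace(sustain_level, 0, int(release * sr))                # release: fade out
    return np.concatenate([a, d, s, r])

# Shape a 440 Hz tone with the envelope (parameter values are arbitrary):
sr = 44100
env = adsr(attack=0.05, decay=0.1, sustain_level=0.6, release=0.2, duration=1.0, sr=sr)
t = np.arange(len(env)) / sr
tone = env * np.sin(2 * np.pi * 440 * t)
```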

(18)

• Transformation of the continuous sound wave into a discrete representation

• Sampling: save the current amplitude value of the vibration at regular intervals

• Clearly, we have to reconstruct audio signals from these values

6.1 Digitalization of Audio Data


(19)

• Basic characteristics

– Sampling rate: how many times per unit time is the value of the continuous signal tapped?

– Resolution: with which accuracy are the values recorded?

• Often, a resolution of 16 bits is used (2^16 = 65,536 different amplitude values)

• The sampling rate is application-dependent:

– Audio CD: 44,100 Hz
– Phone: 8,000 Hz

6.1 Sampling
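A minimal sketch of sampling and 16-bit quantization in NumPy (the test signal and rates are assumptions for illustration):

```python
import numpy as np

sr = 8000                              # sampling rate: values tapped per second
bits = 16                              # resolution in bits
levels = 2 ** bits                     # 65,536 distinct amplitude values

t = np.arange(0, 0.01, 1 / sr)         # regular sampling intervals
signal = np.sin(2 * np.pi * 440 * t)   # the "continuous" signal, tapped at times t

# Quantize each amplitude to the nearest representable value:
step = 2.0 / levels                    # amplitude range [-1, 1) split into levels
quantized = np.round(signal / step) * step
```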

(20)

• It is very important to be able to uniquely reconstruct the initial oscillation

• The higher the sampling frequency, the more values must be saved

• What is the minimum sampling frequency?

• Sampling theorem (Nyquist, 1928): the sampling rate must be at least twice as large as the highest frequency occurring in the signal

6.1 Sampling Rate
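The theorem can be checked numerically: sampling a 5 kHz tone at only 8 kHz makes it indistinguishable from a 3 kHz alias (values chosen for illustration):

```python
import numpy as np

f = 5000                         # highest frequency in the signal (Hz)
sr = 8000                        # below the Nyquist rate of 2 * f = 10000 Hz

t = np.arange(0, 0.01, 1 / sr)
samples = np.sin(2 * np.pi * f * t)

# The same samples are produced by a 3000 Hz tone (8000 - 5000), mirrored:
alias = -np.sin(2 * np.pi * 3000 * t)
print(np.allclose(samples, alias))   # True: reconstruction is ambiguous
```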

(21)

6.1 Sampling Rate

(22)

• Phone: 8,000 Hz

• DVD audio: 96,000 Hz or 192,000 Hz

• Audio CD: 44,100 measurements per second for two stereo channels with 16 bits per measurement results in 176.4 kB/s

⟹ ca. 10 MB/min, i.e., 635 MB/h

6.1 Sampling Rate
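The CD figures follow directly from the parameters; a quick arithmetic check:

```python
sr = 44_100           # measurements per second
channels = 2          # stereo
bytes_per_sample = 2  # 16 bits per measurement

rate = sr * channels * bytes_per_sample   # bytes per second
print(rate / 1000)         # 176.4 kB/s
print(rate * 60 / 1e6)     # ~10.6 MB/min
print(rate * 3600 / 1e6)   # ~635 MB/h
```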

(23)

• For space reasons, digital audio data is usually stored in compressed form

• Known uncompressed formats:

– AIFF: *.aif (Audio Interchange File Format, Apple)
– Wave: *.wav (Windows)
– IRCAM: *.snd (Institut de Recherche et Coordination Acoustique / Musique)
– AU: *.au (Sun audio)

6.1 Audio Formats

(24)

• Data reduction: with (lossy) or without (lossless) information loss

• Lossless compression methods generally don't compress very much: files keep 50–60% of their original size

– Free Lossless Audio Codec (FLAC)
– Apple Lossless
– WavPack

6.1 Compression

(25)

• Lossy compression algorithms typically are based on simple transformations

– Modified Discrete Cosine Transform (MDCT) or wavelets

• Encoding: transformation of the waveform into frequency sequences (sampling)

• Decoding: reconstruction of the waveform from these values

• What data can we afford to lose?

6.1 Compression

(26)

• Change the data without changing the subjective perception

– Omit very high/low frequencies
– Save superimposed frequencies with less precision
– Use other effects according to the psychoacoustic model; e.g., low tones before/after very loud sounds and frequency changes below a minimum distance are impossible to hear ...

6.1 Compression

(27)

• MPEG-1 Audio Layers I, II and III (MP3)

• CD quality at bit rates of 128 kb/s

• Coarse approach of MP3:

– Channel coupling of the stereo signal by using the difference (see the sketch after this list)
– Cut off inaudible frequencies
– Eliminate redundancy by considering psychoacoustic effects
– Compress data using Huffman coding

6.1 Compression
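A minimal sketch of the channel-coupling idea (mid/side coding), with made-up sample values: the difference channel is usually close to zero and therefore cheap to encode.

```python
import numpy as np

left = np.array([0.50, 0.60, 0.55])
right = np.array([0.48, 0.59, 0.57])

mid = (left + right) / 2    # sum channel: carries most of the energy
side = (left - right) / 2   # difference channel: small values, compresses well

# Decoding reconstructs the stereo signal exactly:
assert np.allclose(left, mid + side)
assert np.allclose(right, mid - side)
```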

(28)

• AAC (Advanced Audio Coding)

– Industry-driven improvement of the MP3 format (supported by the MPEG)
– Used in TV/radio broadcasts, the Apple iTunes Store, ...
– Better quality at the same file size
– Support for multi-channel audio
– Supports 48 main sound channels with up to 96 kHz sampling rate and 16 low-frequency channels (limited to 120 Hz)

6.1 Compression

(29)

• Lossless compression, important factors are:

– De-/compression speed
– Compression rate

6.1 Compression

(30)

• Lossy compression, important factors are:

– De-/compression speed
– Compression rate
– Most important: the compressed audio quality!

6.1 Compression

(31)

• Lossy compression, results

6.1 Compression

(32)

• Lossy compression, results

6.1 Compression

(33)

• Communication protocol for transmitting, recording, and playing musical control information between digital instruments and the PC

• Statements are not sounds, but commands that can be used, e.g., by sound cards

– Some commands: note-on, note-off, key velocity, pitch, type of instrument

• Example: see the sketch below

6.1 The MIDI Format
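A minimal sketch of what such commands look like on the wire: a raw MIDI note-on/note-off message is a status byte followed by a note number and a key velocity.

```python
NOTE_ON, NOTE_OFF = 0x90, 0x80   # status bytes for MIDI channel 1

note = 69        # MIDI note number for standard pitch A (440 Hz)
velocity = 100   # key velocity, 0-127

note_on = bytes([NOTE_ON, note, velocity])   # start the note
note_off = bytes([NOTE_OFF, note, 0])        # stop the note
print(note_on.hex(), note_off.hex())         # 904564 804500
```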

(34)

• 10 minutes of music are about 200 KB of MIDI data

– Significant savings compared to sampling, but not the original sound

• Data is inputted to the PC via a keyboard and outputted via a synthesizer

• Sequencer for caching data and changes

6.1 The MIDI Format

(35)

• Audio data

– Music, CDs
– Sound effects, "Earcons"

• Audio data represents most of the information transfer

– Storage of historical speeches
– Recordings of conversations, phone calls, or negotiations

6.2 Audio Information in Databases

(36)

• Three main applications of audio signals in the context of databases

– Identification of audio signals (audio as query)
– Classification and search of similar signals (matching of audio)
– Phonetic synchronization

6.2 Special Applications

(37)

• Find the title, etc., for a given piece of music

• Monitoring of audio streams

– Control of the broadcasting of advertisements on the radio
– Copyright control (GEMA)
– (Remote) diagnosis based on noise

• Audio on Demand

6.2 Identification of Audio Signals

(38)

• Find perceptually similar audio signals

– E.g., similar pieces of music, the same quotation, ...

• Recommendation

– E.g., bands with similar music

• Genre classification (rock, classical, ...)

– E.g., in audio libraries

6.2 Classification and Matching

(39)

• Synchronization of audio streams

– Speech ⟺ text, notes ⟺ audio, ...

• Retrieval of text from or to speech

– Find specific points in a speech
– Verbal queries to text documents

• Following audio scores in concerts, etc.

6.2 Synchronization

(40)

• Identification

– The simplest of the three problems; successful research in recent years

• Classification and Matching

– Often still manual annotations
– Automatic classification only works roughly and on small collections
– Matching is still largely unresolved

• Synchronization

6.2 State of the Art

(41)

• (Compressed) audio files are stored in the database as (smart) BLOBs

• Additionally, metadata and feature vectors are stored to realize the search functionality

– Speech: transcription as text
– Music: musical notation or MIDI

6.2 Persistent Storage

(42)

• Search in audio data: metadata describe the audio file

– Semantic metadata: difficult to generate; e.g., title, artist, speaker, keywords, ...
– File information: can be generated automatically; e.g., time/place of recording, filename, file size, ...
– Widely used, e.g., in music exchange markets, online shops, ...

6.3 Audio Retrieval

(43)

• Manual indexing is extremely labor-intensive and expensive

• Information is often incomplete, partial, and subjective (e.g., genre classification)

• No possibility of Query by Example ("Sounds like ...")

• Search only with SQL, approximate string search, etc.

6.3 Metadata-based Search

(44)

• Using the content of audio files

• Comparing signals measurement by measurement

– Not very promising, inefficient
– Differences in sampling rate and resolution

• Sounds can be differentiated by certain characteristics

– Low-level features
– Logical features

6.3 Content-based Search

(45)

• Acoustic features

– Same basic idea as in image databases
– Description of the signal information by means of characteristic features
– In contrast to image information, we don't use a single feature vector, but a time-dependent vector function
– The acoustic characteristics refer to individual time points, rather than to the audio file as a whole

6.3 Low Level Features

(46)

• Typical Low Level Features

– Mean amplitude, loudness
– Frequency distribution
– Pitch
– Brightness
– Bandwidth

• Measured in the ...

– ... time domain (amplitude versus time)

6.3 Low Level Features

(47)

• Amplitude

– Pressure fluctuations around the zero point
– Silence is equivalent to 0 amplitude

• Average energy

– Characterizes the volume of the signal:

  E = (1/N) · Σ_{n=0}^{N-1} x_n^2

  with N as the total number of measurements and x_n as the n-th measurement

6.3 Features in the Time Domain
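A minimal sketch of this feature in NumPy (the test tones are assumptions for illustration):

```python
import numpy as np

def average_energy(x):
    """E = (1/N) * sum of x_n^2: characterizes the volume of the signal."""
    x = np.asarray(x, dtype=float)
    return np.mean(x ** 2)

sr = 8000
t = np.arange(0, 1, 1 / sr)
quiet = 0.1 * np.sin(2 * np.pi * 440 * t)
loud = 0.8 * np.sin(2 * np.pi * 440 * t)
print(average_energy(quiet) < average_energy(loud))   # True
```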

(48)

• Zero-Crossing Rate

– Frequency of the sign change in the signal:

  ZCR = (1/(N-1)) · Σ_{n=1}^{N-1} |sgn(x_n) - sgn(x_{n-1})| / 2

  with sgn as the sign function (signum)

6.3 Features in the Time Domain
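A minimal sketch, assuming the per-pair normalization given above:

```python
import numpy as np

def zero_crossing_rate(x):
    """Share of consecutive sample pairs where the signal changes sign."""
    s = np.sign(x)
    return np.mean(np.abs(s[1:] - s[:-1])) / 2

sr = 8000
t = np.arange(0, 1, 1 / sr)
low = np.sin(2 * np.pi * 100 * t)
high = np.sin(2 * np.pi * 1000 * t)
print(zero_crossing_rate(low) < zero_crossing_rate(high))   # True: higher pitch, more crossings
```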

(49)

• Silence Ratio

– Proportion of values that belong to a period of complete silence

• We must first establish:

– The amplitude value below which a sample is considered to be silence
– The minimum number of consecutive readings that need to be silent to form a silence period

6.3 Features in the Time Domain
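A minimal sketch; the amplitude threshold and minimum run length are assumptions chosen for illustration:

```python
import numpy as np

def silence_ratio(x, threshold=0.01, min_run=400):
    """Share of samples lying in runs of at least `min_run` consecutive
    samples whose amplitude stays below `threshold`."""
    quiet = np.abs(x) < threshold
    silent = run = 0
    for q in quiet:
        if q:
            run += 1
        else:
            if run >= min_run:
                silent += run
            run = 0
    if run >= min_run:
        silent += run
    return silent / len(x)

sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(0, 0.5, 1 / sr))
pause = np.zeros(sr // 2)                              # half a second of silence
print(silence_ratio(np.concatenate([tone, pause])))    # 0.5
```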

(50)

• Fourier transform of the signal

– Decomposition into frequency components with coefficients (Fourier coefficients)

• Representation of the frequency spectrum of the signal

– The magnitude of a coefficient represents the amount of energy at that frequency
– Usually measured in decibels (dB)

6.3 Features in the Frequency Domain
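A minimal sketch using NumPy's FFT (the two-tone test signal is an assumption for illustration):

```python
import numpy as np

sr = 8000
t = np.arange(0, 1, 1 / sr)
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

coeffs = np.fft.rfft(x)                               # Fourier coefficients
freqs = np.fft.rfftfreq(len(x), 1 / sr)               # frequency of each coefficient
spectrum_db = 20 * np.log10(np.abs(coeffs) + 1e-12)   # magnitudes in decibels

print(freqs[np.argmax(np.abs(coeffs))])   # 440.0: the strongest frequency
```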

(51)

• "Ahhh" sound and Fourier spectrum

6.3 Frequency Spectrum

(52)

• Bandwidth

– Interval of occurring frequencies
– Difference between the largest and smallest frequency in the spectrum (the minimum frequency is considered to be the first frequency above the silence threshold)
– Can also be used for classification; e.g., the bandwidth of music is higher than that of voice

6.3 Features in the Frequency Domain
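A minimal sketch, assuming a silence threshold given in dB relative to the spectral peak (the threshold value is an assumption):

```python
import numpy as np

def bandwidth(x, sr, silence_db=-40):
    """Difference between the largest and smallest frequency whose
    magnitude lies above the silence threshold (relative to the peak)."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    db = 20 * np.log10(mag / mag.max() + 1e-12)
    audible = freqs[db > silence_db]
    return audible.max() - audible.min()

sr = 8000
t = np.arange(0, 1, 1 / sr)
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
print(bandwidth(x, sr))   # 440.0: from 440 Hz up to 880 Hz
```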

(53)

• Power Distribution

– Can be read directly from the spectrum
– Distinction of frequencies with high/low energy
– Calculation of frequency bands with high/low energy
– Centroid as the center of the spectral energy distribution (brightness)

6.3 Features in the Frequency Domain
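A minimal sketch of the spectral centroid as a magnitude-weighted mean frequency (test tones chosen for illustration):

```python
import numpy as np

def spectral_centroid(x, sr):
    """Center of the spectral energy distribution (brightness)."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

sr = 8000
t = np.arange(0, 1, 1 / sr)
dark = np.sin(2 * np.pi * 200 * t)
bright = np.sin(2 * np.pi * 2000 * t)
print(spectral_centroid(dark, sr) < spectral_centroid(bright, sr))   # True
```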

(54)

• Harmony

– The lowest of all the "loud" frequencies is called the fundamental frequency
– The harmony of a signal increases when the dominant components in the spectrum are multiples of the fundamental frequency
– E.g., standard pitch A (440 Hz) as the fundamental frequency produced on a violin creates harmonic oscillations at 880 Hz, 1320 Hz, 1760 Hz, etc.

6.3 Features in the Frequency Domain
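The slide's example as a one-line computation: harmonic oscillations are integer multiples of the fundamental frequency.

```python
fundamental = 440                                    # standard pitch A in Hz
harmonics = [k * fundamental for k in range(2, 5)]
print(harmonics)   # [880, 1320, 1760]
```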

(55)

6.3 Harmony

Harmonic oscillations; frequency spectrum of a sound played on an instrument

(56)

Pitch

– Can be approximated by means of the Fourier spectrum

– The value is calculated from the frequencies and amplitudes of the peaks

– Related to the fundamental frequency, which is often used as an approximation

6.3 Features in the Frequency Domain
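A minimal sketch that approximates the pitch by the fundamental frequency, taken here as the strongest spectral peak (an approximation, as the slide notes):

```python
import numpy as np

def pitch_estimate(x, sr):
    """Approximate pitch via the strongest peak of the Fourier spectrum."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    return freqs[np.argmax(mag)]

sr = 8000
t = np.arange(0, 1, 1 / sr)
tone = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
print(pitch_estimate(tone, sr))   # 440.0
```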

(57)

• Audio Retrieval

– Basics of Audio Data
– Audio Information in Databases
– Audio Retrieval

This Lecture

(58)

• Classification and Retrieval of Audio

– Low-level audio features
– Difference limen
– Pitch detection

Next Lecture
