Academic year: 2022

schmidt@informatik.haw-hamburg.de

Introduction to Audio Coding

o Fundamentals of Sound

o Sampling & Quantization

o Compression

o Coding Standards

The Nature of Sound

o Conversion of energy into vibrations in the air (or some other elastic medium)

o Most sound sources vibrate in complex ways leading to sounds with components at several different frequencies

o Sound is characterized at a very small timescale

o Frequency spectrum – relative amplitudes of the frequency components

o Range of human hearing: roughly 20Hz–20kHz, falling off with age
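
The idea of a frequency spectrum can be illustrated with a direct discrete Fourier transform. This is only a sketch: the function name `amplitude_spectrum` and the two-tone test signal are assumptions made for illustration, and a real analysis would use an FFT library and a window function.

```python
import cmath
import math

def amplitude_spectrum(samples):
    """Amplitude of each frequency bin 0..n/2 of a real signal (naive O(n^2) DFT)."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(samples))
        # Interior bins occur twice in the full DFT (positive and negative
        # frequency), so they are doubled; DC and the Nyquist bin are not.
        scale = 1.0 if k == 0 or 2 * k == n else 2.0
        amplitudes.append(scale * abs(acc) / n)
    return amplitudes

# Hypothetical test signal: two components with relative amplitudes 1.0 and 0.5.
n = 16
signal = [math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.sin(2 * math.pi * 5 * t / n)
          for t in range(n)]
spectrum = amplitude_spectrum(signal)  # peaks at bins 2 and 5
```

Bin k corresponds to the frequency k * r / n for a sampling rate r, so the list is the spectrum in the sense used on this slide: one relative amplitude per frequency component.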

Psychoacoustics

Waveforms

o Sounds change over time

- e.g. musical note has attack and decay, speech changes constantly

o Frequency spectrum alters as sound changes

o Waveform is a plot of amplitude against time

- Provides a graphical view of characteristics of a changing sound

- Can identify syllables of speech, rhythm of music, quiet and loud passages, etc.

Digitization – Sampling

o Sampling theorem: the sampling rate must be at least twice the highest frequency to be reproduced, so the 20kHz limit of hearing implies a minimum rate of 40kHz

- CD: 44.1kHz

- Sub-multiples often used for low bandwidth - e.g. 22.05kHz for Internet audio

- DAT: 48kHz

- (Hence mixing sounds from CD and DAT will require some resampling, best avoided)
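
A small sketch of why the sampling rate matters: a tone above half the sampling rate is not lost but folds back to a lower, wrong frequency (aliasing). The function names are illustrative assumptions; the 15 kHz tone is an arbitrary example.

```python
def min_sampling_rate(f_max_hz: float) -> float:
    """Lowest sampling rate that can still represent frequencies up to f_max_hz."""
    return 2.0 * f_max_hz

def alias_frequency(f_signal_hz: float, f_sample_hz: float) -> float:
    """Apparent frequency of a pure tone after sampling at f_sample_hz."""
    # Sampling cannot distinguish f from f + k * f_sample, so reduce modulo
    # the sampling rate and fold into the baseband [0, f_sample / 2].
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)

print(min_sampling_rate(20_000.0))          # 40000.0
print(alias_frequency(15_000.0, 44_100.0))  # 15000.0 (reproduced correctly)
print(alias_frequency(15_000.0, 22_050.0))  # 7050.0  (aliased)
```

In practice the signal is low-pass filtered before sampling so that components above half the sampling rate are removed rather than aliased.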

Digitization – Quantization

o 16 bits, 65536 quantization levels, CD quality

o 8 bits: audible quantization noise, can only use if some distortion is acceptable, e.g. voice communication

o Dithering – introduce small amount of random noise before sampling

- Noise causes samples to alternate rapidly between quantization levels, effectively smoothing sharp transitions
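
The effect of dithering can be sketched with a uniform quantizer. Everything here is an assumption for illustration: samples are taken to lie in [-1.0, 1.0], the levels span that range exactly, and the dither is uniform noise of one quantization step in width. Other dither shapes are common in practice.

```python
import random

def quantize(x: float, bits: int) -> float:
    """Round x in [-1.0, 1.0] to the nearest of 2**bits uniformly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    index = round((x + 1.0) / step)
    index = max(0, min(levels - 1, index))  # clip to the valid level range
    return index * step - 1.0

def quantize_with_dither(x: float, bits: int, rng: random.Random) -> float:
    """Add uniform noise of +/- half a step, then quantize."""
    step = 2.0 / (2 ** bits - 1)
    noise = rng.uniform(-0.5 * step, 0.5 * step)
    return quantize(x + noise, bits)

# A constant input lying 0.3 steps above a quantization level (8-bit example).
rng = random.Random(0)
step = 2.0 / 255
x = -1.0 + 100.3 * step
plain_error = abs(quantize(x, 8) - x)  # always the same rounding error
dithered_mean = sum(quantize_with_dither(x, 8, rng) for _ in range(20_000)) / 20_000
```

Without dither the output sits on one level and the error is fixed. With dither the output alternates between the two neighbouring levels, and its average approaches the true input value.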

Undersampling & Dithering

Data Size

o Sampling rate r is the number of samples per second, sample size is s bits

o Each second of digitized audio requires rs/8 bytes

o CD quality: r = 44100, s = 16, hence each second requires just over 86 kbytes (k = 1024), each minute roughly 5 Mbytes (mono)

o CD quality, stereo, 3-minute song requires about 30 Mbytes
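
The data-size arithmetic on this slide can be checked with a few lines. The helper name `audio_bytes` and the `channels` parameter are assumptions for illustration; the formula is the rs/8 bytes per second given above.

```python
def audio_bytes(rate_hz: int, sample_bits: int, seconds: float, channels: int = 1) -> float:
    """Uncompressed size in bytes: r * s / 8 per second and per channel."""
    return rate_hz * sample_bits / 8 * channels * seconds

KBYTE = 1024
MBYTE = 1024 * 1024

one_second_mono = audio_bytes(44_100, 16, 1)             # 88200.0 bytes
one_minute_mono = audio_bytes(44_100, 16, 60)            # 5292000.0 bytes
three_min_stereo = audio_bytes(44_100, 16, 180, channels=2)

print(round(one_second_mono / KBYTE, 2))   # 86.13 kbytes
print(round(one_minute_mono / MBYTE, 2))   # 5.05 Mbytes
print(round(three_min_stereo / MBYTE, 2))  # 30.28 Mbytes
```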

Compression

o In general, lossy methods required because of complex and unpredictable nature of audio data

o Difference in perceiving sound and image means different approach from image compression:

+ Sampling & Quantization: PCM – DPCM – ADPCM

+ Special voice coding: Vocoders

+ Transform coding: DFT, DCT, MDCT

+ Perceptual coding

+ Entropy coding

Companding

o Non-linear quantization

o Higher quantization levels spaced further apart than lower ones

o Quiet sounds represented in greater detail than loud ones

o µ-law, A-law (16 → 8 bit)
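
As a sketch of companding, the continuous µ-law curve with µ = 255 can be applied before a uniform 8-bit quantizer. This is the textbook formula, not the bit-exact segmented encoding that G.711 specifies. The function names and the normalisation of samples to [-1.0, 1.0] are assumptions for illustration.

```python
import math

MU = 255.0

def mu_law_compress(x: float, mu: float = MU) -> float:
    """Map x in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to a code in 0..255."""
    return int(round((mu_law_compress(x) + 1.0) / 2.0 * 255))

def decode_8bit(code: int) -> float:
    """Map a code in 0..255 back to an amplitude in [-1, 1]."""
    return mu_law_expand(code / 255 * 2.0 - 1.0)

quiet = 0.01
companded_error = abs(decode_8bit(encode_8bit(quiet)) - quiet)
# For comparison: plain uniform 8-bit quantization of the same quiet sample.
linear_error = abs(round((quiet + 1.0) / 2.0 * 255) / 255 * 2.0 - 1.0 - quiet)
```

For the quiet sample the companded error is several times smaller than the linear one, which is the sense in which quiet sounds are represented in greater detail.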

Improved Pulse Code Modulation

o Differential Pulse Code Modulation

- Similar to video inter-frame compression

- Compute a predicted value for next sample, store the difference between prediction and actual value

o Adaptive Differential Pulse Code Modulation

- Dynamically vary step size used to store quantized differences
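
A minimal DPCM sketch, assuming the simplest possible predictor (the previous sample) and storing the differences unquantized, so this variant is lossless. Real coders use better predictors, and ADPCM additionally quantizes the differences with a step size that adapts to the signal. The function names are illustrative.

```python
def dpcm_encode(samples):
    """Store the difference between each sample and its predicted value."""
    prediction = 0
    differences = []
    for sample in samples:
        differences.append(sample - prediction)
        prediction = sample  # predictor: next sample equals the current one
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by adding each difference to the running prediction."""
    prediction = 0
    samples = []
    for diff in differences:
        sample = prediction + diff
        samples.append(sample)
        prediction = sample
    return samples

samples = [100, 102, 105, 107, 106]
encoded = dpcm_encode(samples)  # [100, 2, 3, 2, -1]
decoded = dpcm_decode(encoded)  # [100, 102, 105, 107, 106]
```

After the first value the differences are small numbers, which need fewer bits than the samples themselves whenever the signal changes slowly.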

Speech Coders

There is a variety of well-known ITU voice coders:

o G.711 – ‘the’ ISDN codec, µ / A-law companding, 8 bit@8kHz (2:1)

o G.721, G.722, G.723

o G.726, G.727 – ADPCM with linear prediction, variable codewords 2 – 5 bit@8kHz (≤ 8:1)

o GSM 06.10 – ADPCM with linear prediction, 1.6 bit@8kHz (10:1)

o Current compression values up to 50:1, 10:1 with decent quality (see www.speex.org)

Perceptually-Based Compression

o Identify and discard data that doesn't affect the perception of the signal

- Needs a psycho-acoustical model, since ear and brain do not respond to sound waves in a simple way

o Threshold of hearing – sounds too quiet to hear

o Masking – sound obscured by some other sound

Compression by Masking

o Split signal into bands of frequencies using filters

- Commonly use 32 bands

o Compute masking level for each band, based on its average value and a psycho-acoustical model

- i.e. approximate masking curve by a single value for each band

o Discard signal if it is below masking level

o Otherwise quantize using the minimum number of bits that will mask quantization noise

o Also: Temporal Masking
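
The per-band decision above can be sketched as a bit-allocation rule. This is a strong simplification under stated assumptions: band levels and masking levels are given in dB (the example values are made up), and each bit of a uniform quantizer is taken to lower the quantization noise by about 6.02 dB. The psycho-acoustical model that produces the masking levels is not shown, and real encoders also work within a fixed bit budget.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per additional quantizer bit

def allocate_bits(signal_db, mask_db):
    """Bits per band: 0 if the band is masked, else enough to push noise below the mask."""
    bits = []
    for level, mask in zip(signal_db, mask_db):
        if level <= mask:
            bits.append(0)  # inaudible: discard the band
        else:
            # Quantization noise must fall (level - mask) dB below the signal.
            bits.append(math.ceil((level - mask) / DB_PER_BIT))
    return bits

# Hypothetical levels for three of the 32 bands.
band_levels_db = [60.0, 30.0, 45.0]
mask_levels_db = [40.0, 35.0, 44.0]
allocation = allocate_bits(band_levels_db, mask_levels_db)  # [4, 0, 1]
```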

MPEG Audio Coding

o MPEG Audio, Layer 3

o Three layers of audio compression in MPEG-1 (MPEG-2 essentially identical)

o Layer 1...Layer 3, encoding process increases in complexity, data rate for same quality decreases

- Layer 2: temporal & frequency masking

- Layer 3: MDCT + …

- e.g. Same quality 192kbps at Layer 1, 128kbps at Layer 2, 64kbps at Layer 3

o 10:1 compression ratio at high quality

o Variable bit rate coding (VBR)
