schmidt@informatik.haw-hamburg.de

Introduction to Audio Coding

o Fundamentals of Sound

o Sampling & Quantization

o Compression

o Coding Standards

The Nature of Sound

o Conversion of energy into vibrations in the air (or some other elastic medium)

o Most sound sources vibrate in complex ways, leading to sounds with components at several different frequencies

o Sound is characterized at a very small timescale

o Frequency spectrum: relative amplitudes of the frequency components

o Range of human hearing: roughly 20Hz–20kHz, falling off with age
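To make the idea of a frequency spectrum concrete, here is a minimal sketch that computes the relative amplitudes of the components of a two-tone test signal with a naive discrete Fourier transform. The sampling rate, signal length, and the two tones (440 Hz and 1000 Hz) are assumptions chosen so that each tone falls exactly on one frequency bin; they do not come from the slides.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the normalized magnitude of each DFT bin from 0 up to the Nyquist bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags

RATE = 8000  # samples per second (assumed)
N = 400      # 50 ms of signal, so neighbouring bins are RATE / N = 20 Hz apart

# Test signal: 440 Hz at amplitude 1.0 plus 1000 Hz at amplitude 0.5.
signal = [
    math.sin(2 * math.pi * 440 * t / RATE) + 0.5 * math.sin(2 * math.pi * 1000 * t / RATE)
    for t in range(N)
]

mags = dft_magnitudes(signal)

# The two strongest bins, converted from bin index to frequency in Hz.
peaks = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = sorted(k * RATE / N for k in peaks)

# Relative amplitude of the 1000 Hz component compared with the 440 Hz component.
relative = mags[50] / mags[22]
```

A practical implementation would use an FFT; the quadratic loop above is only meant to show what the spectrum contains.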

Psychoacoustics

Waveforms

o Sounds change over time

- e.g. musical note has attack and decay, speech changes constantly

o Frequency spectrum alters as sound changes

o Waveform is a plot of amplitude against time

- Provides a graphical view of characteristics of a changing sound

- Can identify syllables of speech, rhythm of music, quiet and loud passages, etc.

Digitization – Sampling

o Sampling rate must be at least twice the highest frequency to be reproduced, so the 20kHz limit of hearing implies a minimum rate of 40kHz

- CD: 44.1kHz

- Sub-multiples often used for low bandwidth - e.g. 22.05kHz for Internet audio

- DAT: 48kHz

- (Hence mixing sounds from CD and DAT will require some resampling, best avoided)
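The points above can be sketched numerically: the minimum rate for a given bandwidth, what happens to a tone that lies above half the sampling rate, and why CD and DAT material do not mix without resampling. The function names are hypothetical, and the 15 kHz test tone is an assumed example.

```python
from fractions import Fraction

def min_sampling_rate(max_freq_hz):
    """Lowest sampling rate that can still represent max_freq_hz (twice the frequency)."""
    return 2 * max_freq_hz

def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency of a pure tone after sampling at rate_hz."""
    folded = freq_hz % rate_hz
    return min(folded, rate_hz - folded)

# Limit of hearing at 20 kHz gives the 40 kHz minimum.
hearing_rate = min_sampling_rate(20000)

# A 15 kHz tone survives at the CD rate but is folded down at 22.05 kHz.
tone_at_cd = alias_frequency(15000, 44100)
tone_at_half = alias_frequency(15000, 22050)

# CD to DAT conversion ratio in lowest terms: not a simple integer factor.
cd_to_dat = Fraction(48000, 44100)
```

The ratio 160/147 means a converter has to interpolate rather than simply repeat or drop samples, which is one reason the slide recommends avoiding the conversion.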

Digitization – Quantization

o 16 bits: 65536 quantization levels, CD quality

o 8 bits: audible quantization noise, can only use if some distortion is acceptable, e.g. voice communication

o Dithering – introduce small amount of random noise before sampling

- Noise causes samples to alternate rapidly between quantization levels, effectively smoothing sharp transitions
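A minimal sketch of uniform quantization and dithering, assuming samples normalized to the range -1.0 to 1.0 and uniformly distributed dither noise one quantization step wide. Both are illustrative choices; the slides do not specify a noise distribution. The example shows a constant signal that sits between two levels: without dither it always lands on the same level, with dither the output alternates so that its average approaches the true value.

```python
import random

def quantize(x, bits):
    """Uniformly quantize x in [-1.0, 1.0] to 2**bits levels and return the reconstructed value."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = int((x + 1.0) / step)
    index = max(0, min(levels - 1, index))  # clamp to the valid range of levels
    return -1.0 + (index + 0.5) * step      # centre of the chosen interval

def dithered_quantize(x, bits, rng):
    """Add uniform noise of one step width before quantizing."""
    step = 2.0 / 2 ** bits
    noise = rng.uniform(-0.5 * step, 0.5 * step)
    return quantize(x + noise, bits)

BITS = 8
STEP = 2.0 / 2 ** BITS
x = 0.2 * STEP                 # a value between two quantization levels
rng = random.Random(0)         # fixed seed so the run is repeatable

plain = quantize(x, BITS)      # always the same level, off by 0.3 * STEP
trials = 20000
mean_dithered = sum(dithered_quantize(x, BITS, rng) for _ in range(trials)) / trials
```

The price of dithering is a slightly higher noise floor; the benefit is that the error is no longer correlated with the signal.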

Undersampling & Dithering

Data Size

o Sampling rate r is the number of samples per second, sample size is s bits

o Each second of digitized audio requires rs/8 bytes

o CD quality: r = 44100, s = 16, hence each second requires just over 86 kbytes (k=1024), each minute roughly 5 Mbytes (mono)

o CD quality, stereo, 3-minute song requires about 30 Mbytes
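The rs/8 formula can be checked with a few lines of arithmetic. The helper name `audio_bytes` and the `channels` parameter are assumptions added for illustration; kbytes and Mbytes use k = 1024 as on the slide.

```python
def audio_bytes(rate_hz, sample_bits, seconds, channels=1):
    """Uncompressed size in bytes: r * s / 8 per second and per channel."""
    return rate_hz * sample_bits * channels * seconds // 8

KBYTE = 1024
MBYTE = 1024 * 1024

per_second_mono = audio_bytes(44100, 16, 1)            # 88200 bytes
per_minute_mono = audio_bytes(44100, 16, 60)           # 5292000 bytes
song_stereo = audio_bytes(44100, 16, 180, channels=2)  # 31752000 bytes

print(round(per_second_mono / KBYTE, 2), "kbytes per second (mono)")
print(round(per_minute_mono / MBYTE, 2), "Mbytes per minute (mono)")
print(round(song_stereo / MBYTE, 2), "Mbytes for a 3-minute stereo song")
```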

Compression

o In general, lossy methods required because of complex and unpredictable nature of audio data

o Difference in perceiving sound and image means different approach from image compression:

+ Sampling & Quantization: PCM – DPCM – ADPCM

+ Special voice coding: Vocoders

+ Transform coding: DFT, DCT, MDCT

+ Perceptual coding

+ Entropy coding

Companding

o Non-linear quantization

o Higher quantization levels spaced further apart than lower ones

o Quiet sounds represented in greater detail than loud ones

o µ-law, A-law (16 → 8 bit)
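As an illustration of non-linear quantization, the sketch below uses the continuous µ-law curve with µ = 255, followed by a uniform 8-bit quantizer. It assumes samples normalized to the range -1.0 to 1.0 and is not the exact segmented encoding used by telephony codecs; the function names are hypothetical.

```python
import math

MU = 255  # the usual value for 8-bit mu-law

def mu_compress(x):
    """Map x in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_8bit(x):
    """Compand, then quantize uniformly to a code in 0..255."""
    y = mu_compress(x)
    return min(255, int((y + 1.0) / 2.0 * 256))

def decode_8bit(code):
    """Take the centre of the code's interval and undo the companding."""
    y = (code + 0.5) / 256 * 2.0 - 1.0
    return mu_expand(y)

quiet, loud = 0.01, 0.9
quiet_error = abs(decode_8bit(encode_8bit(quiet)) - quiet)
loud_error = abs(decode_8bit(encode_8bit(loud)) - loud)
```

With these inputs the quiet sample is reconstructed with a much smaller absolute error than the loud one, which is the behaviour described in the bullets above.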

Improved Pulse Code Modulation

o Differential Pulse Code Modulation

- Similar to video inter-frame compression

- Compute a predicted value for next sample, store the difference between prediction and actual value

o Adaptive Differential Pulse Code Modulation

- Dynamically vary step size used to store quantized differences
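A minimal DPCM sketch, assuming the simplest possible predictor (the previous sample) and integer samples stored without further quantization, so the round trip is lossless. Real coders use better predictors and quantize the differences; ADPCM would additionally adapt the quantization step, which is not shown here.

```python
def dpcm_encode(samples):
    """Store the difference between each sample and its prediction."""
    prediction = 0
    diffs = []
    for s in samples:
        diffs.append(s - prediction)
        prediction = s  # predictor: the next sample equals the current one
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by adding each difference to the running prediction."""
    prediction = 0
    samples = []
    for d in diffs:
        s = prediction + d
        samples.append(s)
        prediction = s
    return samples

samples = [100, 104, 109, 111, 110, 106]
diffs = dpcm_encode(samples)     # [100, 4, 5, 2, -1, -4]
restored = dpcm_decode(diffs)
```

After the first value the differences are small numbers, so they can be stored in fewer bits than the original samples.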

Speech Coders

There is a variety of well-known voice coders, most of them ITU standards:

o G.711 – ‘the’ ISDN codec, µ / A-law companding, 8 bit@8kHz, (2:1)

o G.721, G.722, G.723

o G.726, G.727 – ADPCM with linear prediction, variable codewords 2–5 bit@8kHz (≤ 8:1)

o GSM 06.10 – ADPCM with linear prediction, 1.6 bit@8kHz (10:1)

o Current compression values up to 50:1, 10:1 with decent quality (see www.speex.org)
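The compression ratios quoted above appear to be measured against 16-bit linear PCM at the same 8 kHz sampling rate. That reference is an assumption, since the slides do not state it, but under it the listed figures can be reproduced as follows (helper names are hypothetical):

```python
def compression_ratio(bits_per_sample, reference_bits=16):
    """Ratio relative to linear PCM with reference_bits per sample (assumed reference)."""
    return reference_bits / bits_per_sample

def bit_rate_kbps(bits_per_sample, rate_hz=8000):
    """Bit rate in kbit/s for a single channel."""
    return bits_per_sample * rate_hz / 1000

g711_ratio = compression_ratio(8)     # 2:1
g711_rate = bit_rate_kbps(8)          # 64 kbit/s
g726_best = compression_ratio(2)      # 8:1 with 2-bit codewords
g726_rates = [bit_rate_kbps(b) for b in (2, 3, 4, 5)]  # 16, 24, 32, 40 kbit/s
gsm_ratio = compression_ratio(1.6)    # about 10:1
```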

Perceptually-Based Compression

o Identify and discard data that doesn't affect the perception of the signal

- Needs a psycho-acoustical model, since ear and brain do not respond to sound waves in a simple way

o Threshold of hearing – sounds too quiet to hear

o Masking – sound obscured by some other sound

Compression by Masking

o Split signal into bands of frequencies using filters

- Commonly use 32 bands

o Compute masking level for each band, based on its average value and a psycho-acoustical model

- i.e. approximate masking curve by a single value for each band

o Discard signal if it is below masking level

o Otherwise quantize using the minimum number of bits that will mask quantization noise

o Also: Temporal Masking
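The discard-or-quantize decision per band can be sketched as below. Two assumptions are made for illustration: the signal level and masking level of each band are already given in dB (a real encoder derives the masking levels from its psycho-acoustical model), and each additional bit lowers the quantization noise by roughly 6 dB. The function name and the example values are hypothetical.

```python
import math

DB_PER_BIT = 6.02  # approximate reduction of quantization noise per extra bit

def allocate_bits(band_levels_db, mask_levels_db):
    """Return the number of bits per band so that quantization noise stays below the mask."""
    bits = []
    for level, mask in zip(band_levels_db, mask_levels_db):
        if level <= mask:
            bits.append(0)  # band is masked: discard it
        else:
            # Noise must sit (level - mask) dB below the signal to stay masked.
            bits.append(math.ceil((level - mask) / DB_PER_BIT))
    return bits

# Four example bands (made-up values, in dB).
levels = [60, 35, 20, 48]
masks = [30, 40, 25, 42]
allocation = allocate_bits(levels, masks)
```

Bands two and three fall below their masking level and get no bits at all, while the first band needs the most because it exceeds its mask by the largest margin.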

MPEG Audio Coding

o MPEG Audio, Layer 3

o Three layers of audio compression in MPEG-1 (MPEG-2 essentially identical)

o Layer 1...Layer 3: encoding process increases in complexity, data rate for same quality decreases

- Layer 2: temporal & frequency masking

- Layer 3: MDCT + …

- e.g. same quality at 192kbps for Layer 1, 128kbps for Layer 2, 64kbps for Layer 3

o 10:1 compression ratio at high quality

o Variable bit rate coding (VBR)
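As a rough check of the ratios, the sketch below compares the example bit rates with uncompressed PCM. It assumes that the quoted rates are per channel and that the reference is 16-bit PCM at 44.1 kHz; the slides state neither, so the resulting numbers are only indicative.

```python
def pcm_kbps(rate_hz=44100, sample_bits=16, channels=1):
    """Bit rate of uncompressed PCM in kbit/s."""
    return rate_hz * sample_bits * channels / 1000

def ratio(coded_kbps, channels=1):
    """Compression ratio relative to CD-quality PCM with the same channel count."""
    return pcm_kbps(channels=channels) / coded_kbps

reference = pcm_kbps()      # 705.6 kbit/s per channel
layer1 = ratio(192)         # about 3.7:1
layer2 = ratio(128)         # about 5.5:1
layer3 = ratio(64)          # about 11:1
```

Under these assumptions Layer 3 at 64 kbps per channel comes out at about 11:1, in line with the 10:1 figure quoted above.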
