Introduction to Video Coding


schmidt@informatik.haw-hamburg.de


o Motivation & Fundamentals
o Principles of Video Coding
o Coding Standards

Special thanks to Hans L. Cycon from FHTW Berlin for providing first-hand knowledge and much of the material!


Video Data – the Problem

o PAL uncompressed
- 768x576 pixels per frame
- x 3 bytes per pixel (24-bit colour)
- x 25 frames per second
- ≈ 32 MB per second
- ≈ 1.9 GB per minute

→ Raw video data rates exceed what storage devices and transmission channels can handle!

→ Even cameras need immediate compression
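The arithmetic above can be checked with a few lines of Python. This is a small sketch; the function name is hypothetical, and "MB" is read here as 2^20 bytes, which reproduces the ≈ 32 MB/s figure (per minute this is ≈ 1.85 GiB, or ≈ 2.0 GB in decimal units).

```python
def raw_video_rate(width, height, bytes_per_pixel, fps):
    """Uncompressed data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps

rate = raw_video_rate(768, 576, 3, 25)          # PAL, 24-bit colour, 25 fps
print(rate)                                     # 33177600 bytes per second
print(round(rate / 2**20, 1), "MiB/s")          # 31.6 MiB/s
print(round(rate * 60 / 2**30, 2), "GiB/min")   # 1.85 GiB per minute
```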


Signal Transmission Scheme

Coder (saving of bit rate) → channel → Decoder (reconstruction of signal)


Fundamentals

Why don’t we just use *zip?

o Suppose our video pixels attain N values i with probabilities p_i
o and we know nothing else about them (i.i.d. random)
o Then (Shannon) the entropy

$$H = -\sum_{i=1}^{N} p_i \log_2 p_i$$

is the lower bound on the data needed per pixel (the mean information content)
o For individually encoded pixels this results in optimal compression rates of only around 1.33 …

! Image and video pixels are not i.i.d. random, but highly correlated
! These correlations are invisible at the level of individual pixels
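A minimal sketch of the entropy formula, estimating p_i from symbol counts (the i.i.d. assumption of the slide). The ramp signal is a made-up example: coded pixel by pixel it needs the full 8 bits, while the differences between neighbours, which expose the correlation, carry almost no information.

```python
import math
from collections import Counter

def entropy(symbols):
    """Empirical entropy in bits per symbol, treating symbols as i.i.d."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A smooth ramp: every 8-bit value occurs exactly once.
ramp = list(range(256))
print(entropy(ramp))              # 8.0 bits per pixel: no gain from per-pixel coding

# Differences between neighbouring pixels reveal the correlation.
diffs = [ramp[0]] + [b - a for a, b in zip(ramp, ramp[1:])]
print(round(entropy(diffs), 3))   # about 0.037 bits per value
```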


Image Compression Concepts


o lossless, by removing redundancies
- spatial redundancies
- temporal redundancies
- spatio-temporal correlations
- statistical redundancies

o lossy, by removing (visually) irrelevant information
- reduction of accuracy in colours, contours and motion


Image Quality Measure

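$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(f_i^{cmp} - f_i^{org}\right)^2$$

$$PSNR = 10\log_{10}\frac{255^2}{MSE} = 20\log_{10}\frac{255}{\sqrt{MSE}} \quad [\mathrm{dB}]$$

with N pixels, original pixel values f^org and compressed (reconstructed) values f^cmp.

Higher PSNR → higher performance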
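The two formulas translate directly into code. This is a sketch for 8-bit data (peak value 255); images are flattened to plain lists, and the pixel values are invented for illustration.

```python
import math

def mse(original, compressed):
    """Mean squared error between two equally long pixel sequences."""
    if len(original) != len(compressed):
        raise ValueError("images must have the same number of pixels")
    return sum((c - o) ** 2 for o, c in zip(original, compressed)) / len(original)

def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB."""
    error = mse(original, compressed)
    if error == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / error)

org = [52, 55, 61, 66, 70, 61, 64, 73]
rec = [54, 55, 60, 66, 72, 60, 64, 71]
print(mse(org, rec))             # 1.75
print(round(psnr(org, rec), 2))  # 45.7 dB
```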


The Idea of Transformation

o Mathematically, an image can be considered as a matrix, i.e. a point in a high-dimensional space
o Transformations rotate this matrix into an advantageous position (sparsely populated)
o This results in ‘energy compaction’: most of the coefficients will be (nearly) zero
o This simplifies separating out irrelevant information
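The rotation idea can be seen in two dimensions. As an illustrative assumption, this sketch uses the simplest orthonormal transform, a 45° rotation of a pair of neighbouring pixels (a one-level Haar transform), instead of the DCT: for correlated pixels almost all of the energy ends up in one coefficient, and the rotation is exactly invertible.

```python
import math

def rotate_pair(a, b):
    """Rotate the point (a, b) by 45 degrees (orthonormal 2-point transform)."""
    s = (a + b) / math.sqrt(2)  # along the diagonal: carries the common value
    d = (a - b) / math.sqrt(2)  # across the diagonal: small when a is close to b
    return s, d

def inverse_rotate_pair(s, d):
    """Undo the rotation; the transform itself loses nothing."""
    return (s + d) / math.sqrt(2), (s - d) / math.sqrt(2)

# Neighbouring pixels are strongly correlated, e.g. 100 and 102.
s, d = rotate_pair(100, 102)
print(round(s, 2), round(d, 2))        # 142.84 -1.41
print(round(s**2 / (s**2 + d**2), 4))  # 0.9999 of the energy sits in s
```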


Transform Coding: Initial Image Compressor

Block diagram: T → Q → PC → C → bitstream

o Transformation (T): decorrelation, energy compaction; reversible

o Quantisation (Q): elimination of psycho-visually irrelevant information; not reversible

o Pre-Coder (PC): pre-processing for additional elimination of statistical redundancies; reversible

o Coder (C): generation of variable-length codes, producing the bitstream; reversible
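Quantisation is the only irreversible stage. A minimal uniform scalar quantiser shows why. The step size and the round-half-away-from-zero rule are illustrative assumptions, not taken from any particular standard.

```python
import math

def quantize(coeff, step):
    """Uniform scalar quantisation: value -> integer index.
    Rounds half away from zero (an illustrative choice)."""
    return int(math.copysign(math.floor(abs(coeff) / step + 0.5), coeff))

def dequantize(index, step):
    """Reconstruction value for a quantisation index."""
    return index * step

coeffs = [90, 72, 11, 5, 4, -37]
indices = [quantize(c, 10) for c in coeffs]
print(indices)                               # [9, 7, 1, 1, 0, -4]
print([dequantize(i, 10) for i in indices])  # [90, 70, 10, 10, 0, -40]
```

The original values 72, 5 and 4 cannot be recovered from the indices. This is the information that Q discards, while T, PC and C can all be inverted exactly.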


Spatial Decorrelation:

Discrete Cosine Transform - DCT

Transformation of spatial into frequency coordinates

$$F(u,v) = \frac{\Lambda(u)\,\Lambda(v)}{4} \sum_{i=0}^{7}\sum_{j=0}^{7} \cos\frac{(2i+1)u\pi}{16}\,\cos\frac{(2j+1)v\pi}{16}\,f(i,j), \qquad u, v = 0, \dots, 7$$

$$\Lambda(\xi) = \begin{cases} 1/\sqrt{2} & \text{for } \xi = 0 \\ 1 & \text{otherwise} \end{cases}$$
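A direct, unoptimised evaluation of the 8x8 DCT formula. This sketch computes every coefficient by the double sum (real codecs use fast factorisations). A uniform block puts all of its energy into the DC coefficient F(0,0).

```python
import math

def lam(xi):
    """Normalisation factor Λ(ξ): 1/√2 for ξ = 0, otherwise 1."""
    return 1 / math.sqrt(2) if xi == 0 else 1.0

def dct_8x8(f):
    """Forward 8x8 DCT F(u, v) of the pixel block f(i, j), straight from the formula."""
    F = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for i in range(8):
                for j in range(8):
                    total += (f[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / 16)
                              * math.cos((2 * j + 1) * v * math.pi / 16))
            F[u][v] = lam(u) * lam(v) / 4 * total
    return F

flat = [[100] * 8 for _ in range(8)]  # a uniform grey block
F = dct_8x8(flat)
print(round(F[0][0], 6))              # 800.0: all energy in the DC coefficient
```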


Concept of conventional DCT coding (JPEG, MPEG, H.26x)

Processing chain: block → DCT → quantisation → zig-zag scanning → VLC → channel

Example:

o 8 x 8 pixel block: 8 x 8 x 8 bit = 512 bit

o DCT coefficients: 8 x 8 x 10 bit = 640 bit

    90 72 11 31  0  0  0  0
    14 13  5  0  0  0  0  0
    15  6  3  0  0  0  0  0
     4  4  2  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0

o Quantised coefficients: 8 x 8 x 4 bit = 256 bit

    90 70 10 30  0  0  0  0
    10 10 10  0  0  0  0  0
    20 10  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0
     0  0  0  0  0  0  0  0

o Zig-zag scan: 90, 70, 10, 20, 10, 10, 30, 10, 10, 0, 0, 0, ....

o VLC: 01, 00111, 01, 01, 01, 01, 010, 01, 000001 (000001 = EOB) → 26 bit

Source: Schäfer HHI [W2]

compression factor = 512/26 ≈ 20
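The quantisation and zig-zag steps of this example can be reproduced in a short sketch. The step size 10 and the round-half-up rule are inferred from the two matrices and are not stated on the slide. The VLC table is not given, so this sketch stops at the scanned sequence; trailing zeros are dropped because the bitstream replaces them with a single EOB code.

```python
import math

# DCT coefficients of the example block, as printed on the slide.
DCT_COEFFS = [
    [90, 72, 11, 31, 0, 0, 0, 0],
    [14, 13, 5, 0, 0, 0, 0, 0],
    [15, 6, 3, 0, 0, 0, 0, 0],
    [4, 4, 2, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(4)]

def quantize_block(block, step):
    """Uniform quantisation to reconstruction levels (round half up; values are >= 0)."""
    return [[math.floor(c / step + 0.5) * step for c in row] for row in block]

def zigzag_order(n=8):
    """(row, col) positions in zig-zag order, walking the anti-diagonals."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(block):
    """Scan the block in zig-zag order and drop the trailing zeros (sent as EOB)."""
    seq = [block[r][c] for r, c in zigzag_order(len(block))]
    while seq and seq[-1] == 0:
        seq.pop()
    return seq

q = quantize_block(DCT_COEFFS, 10)
print(zigzag_scan(q))  # [90, 70, 10, 20, 10, 10, 30, 10, 10]
```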


Transformed Representation

o Concentration of information in few spectral coefficients (decorrelation)

[Figure: magnitudes of the DCT coefficients of an 8 x 8 block, plotted over both frequency indices 1–8 (value scale 0–350)]


[Figure variant: only 16 of 64 coefficients retained]
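Energy compaction can be seen by keeping only the largest coefficients and transforming back. To stay short, this sketch works on a single row of 8 pixels with an orthonormal 1-D DCT and keeps 2 of 8 coefficients, the same 1/4 ratio as 16 of 64. The pixel values are invented for illustration.

```python
import math

def basis(k, i, n):
    """Orthonormal DCT-II basis function k evaluated at sample i."""
    scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))

def dct_1d(x):
    n = len(x)
    return [sum(x[i] * basis(k, i, n) for i in range(n)) for k in range(n)]

def idct_1d(X):
    n = len(X)
    return [sum(X[k] * basis(k, i, n) for k in range(n)) for i in range(n)]

def keep_largest(X, m):
    """Zero all but the m coefficients with the largest magnitude."""
    keep = set(sorted(range(len(X)), key=lambda k: abs(X[k]), reverse=True)[:m])
    return [X[k] if k in keep else 0.0 for k in range(len(X))]

ramp = [10, 20, 30, 40, 50, 60, 70, 80]  # a smooth row of pixels
X = dct_1d(ramp)
X2 = keep_largest(X, 2)
approx = idct_1d(X2)
energy_share = sum(c * c for c in X2) / sum(c * c for c in X)
print(round(energy_share, 4))            # share of energy in the 2 kept coefficients
print([round(v, 1) for v in approx])     # close to the original ramp
```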


Problem of DCT: Blocking Artefacts

[Figure: original image compared with DCT coding at a compression ratio of 1:64]


Alternative Transformation: DWT

[Figure: original image compared with DCT coding (1:64) and wavelet coding, WLT (1:64)]


Transform Coding and Decoding (DCT- or Wavelet-based)

Encoder: Image → T (lossless decorrelation) → Q (lossy quantizer) → C (entropy coder) → compressed bitstream

Decoder: compressed bitstream → IC → IQ → IT → reconstructed image


Temporal Decorrelation: Difference Coding


In slow-moving scenes, successive images are nearly identical:

→ Temporal redundancy is eliminated by coding only the difference between successive images (inter frames).

→ To limit accumulating errors, full images (intra frames) are coded at regular intervals (≈ one in 50 frames).

Group of pictures (GOP) along the time axis t: I P P P P I P P P (I = intra, P = inter)
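A toy sketch of difference coding with a regular intra refresh. Frames are tiny lists of pixel values, and the function names and GOP size are hypothetical. Because nothing is quantised here, the encoder's previous frame equals the decoder's reconstruction. A lossy coder must predict from the *reconstructed* frame, otherwise encoder and decoder drift apart, which is exactly the error accumulation the periodic I frames limit.

```python
def encode(frames, gop_size):
    """Code the first frame of every GOP as I (full image), the rest as P (difference)."""
    coded = []
    for k, frame in enumerate(frames):
        if k % gop_size == 0:
            coded.append(("I", list(frame)))
        else:
            coded.append(("P", [c - p for c, p in zip(frame, frames[k - 1])]))
    return coded

def decode(coded):
    """Rebuild frames: I frames directly, P frames by adding the difference."""
    frames = []
    for kind, data in coded:
        if kind == "I":
            frames.append(list(data))
        else:
            frames.append([p + d for p, d in zip(frames[-1], data)])
    return frames

video = [[10, 10, 10], [10, 11, 10], [11, 11, 10], [11, 12, 11]]  # 3-pixel frames
coded = encode(video, gop_size=2)
print(coded)  # [('I', [10, 10, 10]), ('P', [0, 1, 0]), ('I', [11, 11, 10]), ('P', [0, 1, 1])]
```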


Hybrid Decorrelation: Difference Coding with Motion Prediction


Block Motion Compensation Prediction

[Figure: frames k-1 and k, each divided into 16 numbered blocks; block 6 has moved between the two frames]

Block Matching

o Decompose the previous picture into blocks

o Move each block over the next picture and find its best match

o Simplify by discretising the motion vectors
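A full-search block-matching sketch using the sum of absolute differences (SAD) as the matching cost. Note that it uses the common formulation in which each block of the *current* frame is searched for in the previous frame. Block size, search range, integer-pixel vectors and the test frames are illustrative assumptions.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block_at(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def best_motion_vector(prev, cur, by, bx, size, search_range):
    """Full search: displacement (dy, dx) from the block of `cur` at (by, bx)
    to its best match in `prev`, within +/- search_range pixels."""
    target = block_at(cur, by, bx, size)
    height, width = len(prev), len(prev[0])
    best_cost, best_vec = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(block_at(prev, y, x, size), target)
                if best_cost is None or cost < best_cost:
                    best_cost, best_vec = cost, (dy, dx)
    return best_vec, best_cost

def frame_with_square(top, left, size=8):
    """Black 8x8 frame with a bright 2x2 square at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for y in (top, top + 1):
        for x in (left, left + 1):
            frame[y][x] = 200
    return frame

prev = frame_with_square(2, 2)
cur = frame_with_square(3, 4)  # square moved 1 down and 2 right
print(best_motion_vector(prev, cur, 3, 4, size=2, search_range=2))  # ((-1, -2), 0)
```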


Bidirectional Prediction Coding

Frame sequence: I B B P B B P ...

I frames - Intracoding (JPEG)

P frames - Uni-directional predictive coding

B frames - Bi-directional predictive coding
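Because a B frame is predicted from the anchor (I or P) that *follows* it in display order, that anchor has to be transmitted and decoded first. A small sketch of the reordering from display order to coding order. As a simplification, B frames left over at the end of the sequence are simply appended.

```python
def coding_order(display_types):
    """Map display order to transmission/decoding order (indices into the display order)."""
    order, pending_b = [], []
    for idx, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(idx)    # wait for the next anchor frame
        else:
            order.append(idx)        # the I or P anchor goes first ...
            order.extend(pending_b)  # ... then the B frames displayed before it
            pending_b.clear()
    order.extend(pending_b)          # simplification: trailing B frames appended
    return order

display = "IBBPBBP"
order = coding_order(display)
print(order)                               # [0, 3, 1, 2, 6, 4, 5]
print("".join(display[i] for i in order))  # IPBBPBB
```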


Bi-directional Prediction


Statistical Coding Principles / Entropy Coding

Huffman Coder (variable-length symbol coder)

• Assign a variable-length code word to every fixed-length source word

• Frequent words → short code words, rare words → long code words

Improvement: Arithmetic Coder

• Map entire sequences of symbols onto a subinterval of [0,1) (also as a binary mapping)

Run-Length Coder

• abbbbbbbbcc → a7b!cc

Pattern Substitution: Dictionary Coding

• Represent repeating sequences of symbols by pointers to earlier occurrences

Context Modelling (Pre-Coding)

• Determine local conditional probabilities for symbols instead of global frequencies
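A compact sketch of two of these coders: a Huffman code built with Python's `heapq` by repeatedly merging the two least frequent subtrees, and a run-length coder that emits (symbol, run length) pairs. The pair format is an illustrative choice and is not the escape-based notation of the example above.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(text):
    """Map each symbol to a prefix-free bit string; frequent symbols get short codes."""
    tiebreak = count()  # keeps heap entries comparable when frequencies are equal
    heap = [(f, next(tiebreak), sym) for sym, f in Counter(text).items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # the two rarest subtrees ...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))  # ... are merged
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: 0 = left, 1 = right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def run_length(text):
    """Collapse runs of equal symbols into (symbol, run length) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1][1] += 1
        else:
            runs.append([ch, 1])
    return [(ch, n) for ch, n in runs]

codes = huffman_code("aaaaabbcd")
print({s: len(c) for s, c in sorted(codes.items())})  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(run_length("aaabcc"))                           # [('a', 3), ('b', 1), ('c', 2)]
```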


Layered Coding

Scalability and adaptability to varying play-out scenarios may be achieved through coding layers:

o Spatial layers → range of (pixel) resolutions

o Data partitioning layers → high- and low-priority data

o SNR layers → range of ‘visual’ resolutions

o Temporal layers → range of frame rates
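Temporal layers can be illustrated with a dyadic frame hierarchy, a common arrangement that is assumed here and not prescribed by the slide. Each additional layer doubles the frame rate, and a receiver that drops the higher layers still gets a regular, lower frame rate. This only works if frames in lower layers never reference frames in higher ones.

```python
def temporal_layer(k, num_layers):
    """Dyadic layering: layer 0 holds every 2**(num_layers-1)-th frame,
    each further layer doubles the frame rate."""
    for layer in range(num_layers - 1):
        if k % (2 ** (num_layers - 1 - layer)) == 0:
            return layer
    return num_layers - 1  # all remaining (odd-indexed) frames

def frames_for(max_layer, num_frames, num_layers):
    """Frame indices a receiver decodes when it keeps layers 0..max_layer."""
    return [k for k in range(num_frames) if temporal_layer(k, num_layers) <= max_layer]

for max_layer in range(3):
    print(max_layer, frames_for(max_layer, 8, num_layers=3))
# 0 [0, 4]
# 1 [0, 2, 4, 6]
# 2 [0, 1, 2, 3, 4, 5, 6, 7]
```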


Video Coding Standards

Video coding standards are defined for ranges of applicability (image resolution, bandwidth, computational complexity, power consumption …), initially targeting specific application groups:

o ISO Moving Pictures Experts Group MPEG
- MPEG-1 (1989): CD-ROM applications at ≈ 1.5 Mb/s
- MPEG-2 (1991): high-quality coding at 2 – 50 Mb/s
- MPEG-4 (1998): scalable ≈ 64 kb/s – 4 Mb/s – 100 Mb/s (V3)

o ITU-T
- H.261 (1991): video telephony, video conferencing ≈ 64 kb/s – 1 Mb/s
- H.263 (1996): low bit rate coding (ISDN) ≈ 8 kb/s – 1 Mb/s
- H.26L (2001): low bit rate, low complexity
- H.264/AVC (2003): joint with ISO, about double the compression of MPEG-4, 8 kb/s – 100 Mb/s


Milestones in Video Compression

[Figure: PSNR [dB] (26–38) versus bit rate [kbps] (0–500) for the Foreman sequence, 10 Hz, QCIF, 133 frames encoded; curves for DCT (Motion JPEG, 1985), H.120 (1988), H.261 (1991), MPEG-1/2 (1994), MPEG-4/H.263 (1998), H.26L (2001) and H.264 (2002); annotations: bit rate reduction 85 %, visual gain 10 dB]


MPEG-2

o Aiming at TV quality (interlacing), but with a generic picture format: the ‘DVD standard’

o Discrete Cosine Transform (8 x 8 blocks)

o Motion compensation and prediction (I, P, B frames)

o Supports coding layers

o Error resilience by interpolation

o Supports multiple audio and video streams


H.263

o Aiming at telecommunication: CIF and QCIF formats. The ‘old’ video conferencing standard

o Discrete Cosine Transform (8 x 8 blocks)

o Improved motion compensation (precision, variable block size, overlapping blocks)

o Prediction with PB frames (interpolated B component)

o Advanced negotiability

o Arithmetic coding


MPEG-4

o Ambitious standard to encode ‘multimedia streams’ (including interactivity)

o Focus of interest on video compression, based on a collection of profiles: Simple, Advanced Simple, …

o Content-based compression, motion prediction, scaling

o Concept of Video Object Planes (I/P/B-VOPs)
- Motion estimation and compensation
- Shape coding
- Texture coding (DCT, but also wavelet-based)
- Sprite coding

o Adaptive techniques (motion compensation, arithmetic coding, error resilience …)


MPEG-4 Generic Coding Scheme


MPEG-4 System Model


H.264/AVC

o Aiming at full scalability: from 3GPP to HDTV

o Approval May 2003 (Editor T. Wiegand, HHI)

o New 4x4 integer transform (of DCT kind)

o Many modes:
- Adaptive block size for transform
- Adaptive blocking for motion compensation
- Adaptive intra prediction
- Two entropy codings: CAVLC (context-adaptive variable-length coding) + CABAC (context-adaptive binary arithmetic coding, D. Marpe, HHI)

o Content-adaptive deblocking filters

o Complexity:
- 8 – 10 times MPEG-2 for encoding
- 3 times MPEG-2 for decoding
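The "4x4 integer transform (of DCT kind)" can be sketched with the forward core matrix of H.264/AVC, Y = C · X · C^T. It uses only integer additions and multiplications. This sketch deliberately omits the normalisation, which the standard folds into the quantisation step, so it is not a complete H.264 transform/quantiser.

```python
# Forward core transform matrix of H.264/AVC for 4x4 residual blocks.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def core_transform(x):
    """Y = CF * X * CF^T in exact integer arithmetic (no scaling applied)."""
    return matmul(matmul(CF, x), transpose(CF))

flat = [[10] * 4 for _ in range(4)]
print(core_transform(flat)[0])  # [160, 0, 0, 0]: only the DC term is non-zero
```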


H.264: Structure

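[Figure: H.264 encoder structure: coder control; transform/quantizer; entropy coding; built-in decoder (dequantization/inverse transform, deblocking); motion-compensated predictor and motion estimator; intra/inter switch; outputs: control data, quantized transform coefficients, motion data]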


Deblocking Filter


What else?

o MPEG-7: Multimedia Content Description Interface
- Metadata standard
- Goal: describe multimedia data for search, retrieval and (combined/synchronized) play-out

o MPEG-21: Multimedia Framework (just finishing)
- Metadata standard for multimedia applications

o Proprietary codecs:
- RealNetworks: Helix
- Microsoft: VC-1
- a few more …
- (at most) similar performance, similar ‘ideas’ visible
- pay per ???
