schmidt@informatik.
haw-hamburg.de
Introduction to Video Coding
o Motivation & Fundamentals
o Principles of Video Coding
o Coding Standards
Special thanks to Hans L. Cycon from FHTW Berlin for providing first-hand knowledge and much of the material!
Video Data – the Problem
o PAL uncompressed
- 768x576 pixels per frame
- x 3 bytes per pixel (24-bit colour)
- x 25 frames per second
- ≈ 32 MB per second
- ≈ 1.9 GB per minute
→ Raw video data rates exceed what storage devices and transmission channels can handle!
→ Even cameras need immediate compression
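The figures above can be checked by multiplying them out. A minimal sketch (variable names are illustrative): the rounded values on the slide match when MB and GB are read as binary units (MiB, GiB); in decimal units the rates are about 33.2 MB/s and 2.0 GB per minute.

```python
WIDTH, HEIGHT = 768, 576        # PAL frame size used on the slide
BYTES_PER_PIXEL = 3             # 24-bit colour
FRAMES_PER_SECOND = 25

MIB = 2 ** 20                   # binary megabyte
GIB = 2 ** 30                   # binary gigabyte


def raw_rate(width, height, bytes_per_pixel, fps):
    """Bytes per second of uncompressed video."""
    return width * height * bytes_per_pixel * fps


rate = raw_rate(WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND)
per_minute = 60 * rate
print(f"{rate:,} bytes/s = {rate / MIB:.1f} MiB/s")
print(f"{per_minute:,} bytes/min = {per_minute / GIB:.2f} GiB/min")
```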
Signal Transmission Scheme
Coder (saving of bit rate) → channel → Decoder (reconstruction of signal)
Fundamentals
Why don’t we just use *zip?
o Suppose our video pixels attain N values i with probabilities p_i, and we know nothing else about them (just iid random)
o Then (Shannon): the entropy

$$H = -\sum_{i=1}^{N} p_i \log_2 p_i$$

is the lower bound on the average number of bits needed per pixel (the mean information content)
o For individually encoded pixels this results in optimal compression rates of only around 1.33 …
! Image and video pixels are not iid random, but highly correlated
! These correlations are invisible at the level of individual pixels
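As a minimal sketch, the entropy formula can be evaluated on an empirical pixel histogram, treating pixels as iid exactly as the slide assumes. The example inputs are hypothetical. For 8-bit pixels the best ratio for individually coded pixels is 8/H, so a compression rate of about 1.33 corresponds to H ≈ 6 bits per pixel.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy H = -sum p_i log2 p_i of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical 8-bit "pixels": two equally likely values -> 1 bit per pixel.
two_level = [0, 255] * 32
# All 256 values equally likely -> 8 bits per pixel, no gain over raw storage.
uniform = list(range(256))

print(entropy_bits(two_level))  # 1.0
print(entropy_bits(uniform))    # 8.0
```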
Image Compression Concepts
o lossless, by removing redundancies
- spatial redundancies
- temporal redundancies
- spatial-temporal correlations
- statistical redundancies
o lossy, by removing (visually) irrelevant information
- reduction of accuracy in colors, contours and motion
Image Quality Measure
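$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(f_i^{\mathrm{cmp}} - f_i^{\mathrm{org}}\right)^2$$

$$\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}} = 20\log_{10}\frac{255}{\sqrt{\mathrm{MSE}}}\ \mathrm{[dB]}$$

Higher PSNR → higher performance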
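A short sketch of both measures for 8-bit images stored as flat lists of pixel values. The function names and the 4-pixel example are hypothetical and not taken from any particular library.

```python
import math


def mse(original, compressed):
    """Mean squared error between two images given as equally long pixel lists."""
    if len(original) != len(compressed):
        raise ValueError("images must contain the same number of pixels")
    return sum((c - o) ** 2 for o, c in zip(original, compressed)) / len(original)


def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, compressed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


original = [100, 100, 100, 100]      # hypothetical 4-pixel image
compressed = [110, 90, 110, 90]      # every pixel off by 10
print(mse(original, compressed))                 # 100.0
print(round(psnr(original, compressed), 2))      # 28.13
```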
The Idea of Transformation
o Mathematically, an image (a matrix of pixel values) can be considered as a point in a high-dimensional space, one dimension per pixel
o Transformations rotate this point into an advantageous position, where it is sparsely populated (few significant coordinates)
o This results in 'energy compaction': most of the coefficients will be (nearly) zero
o This simplifies the separation of irrelevant information
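The rotation idea can be seen in the smallest possible case. This sketch treats each pair of neighbouring pixels as a 2-D point; because neighbours are correlated, the points lie close to the diagonal, and a 45-degree rotation puts almost all energy into one coordinate while preserving the total. The 2-point (Haar) transform is used here only because it is short; it is not the DCT discussed below, and the pixel row is hypothetical.

```python
import math


def rotate_pairs(pixels):
    """Orthonormal 2-point transform (45-degree rotation) of each pixel pair:
    s = (a + b) / sqrt(2), d = (a - b) / sqrt(2)."""
    r = 1 / math.sqrt(2)
    sums = [r * (a + b) for a, b in zip(pixels[0::2], pixels[1::2])]
    diffs = [r * (a - b) for a, b in zip(pixels[0::2], pixels[1::2])]
    return sums, diffs


def energy(values):
    return sum(v * v for v in values)


row = [100, 102, 104, 103, 101, 99, 98, 100]   # hypothetical, strongly correlated neighbours
sums, diffs = rotate_pairs(row)
print(f"total energy: {energy(row)}")
print(f"in sums: {energy(sums):.1f}, in diffs: {energy(diffs):.1f}")
```

The difference coefficients are small, so after quantisation most of them become zero; this is the compaction the slide describes.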
Transform Coding
Initial image → T → Q → PC/C → bitstream (image compressor)
o Transformation (T): decorrelation, energy compaction, reversible
o Quantisation (Q): elimination of psycho-visually irrelevant information, not reversible
o Pre-coder (PC): pre-processing for additional elimination of statistical redundancies, reversible
o Coder (C): generation of variable-length codes, reversible
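Of the four stages, only Q discards information. A minimal sketch of a uniform scalar quantiser, assuming one step size for all coefficients (real codecs typically use per-coefficient step sizes, e.g. JPEG quantisation tables) and rounding half away from zero, shows that dequantisation recovers each value only to within half a step. The coefficients are hypothetical.

```python
import math


def quantise(value, step):
    """Uniform scalar quantiser: value -> integer index (information is lost here)."""
    # Round half away from zero so positive and negative values are treated alike.
    return int(math.copysign(math.floor(abs(value) / step + 0.5), value))


def dequantise(index, step):
    """Reconstruction value for a quantiser index."""
    return index * step


coeffs = [90.0, 72.0, -11.0, 31.0, 4.0, -3.0]   # hypothetical transform coefficients
step = 10
indices = [quantise(c, step) for c in coeffs]
recon = [dequantise(i, step) for i in indices]
print(indices)   # [9, 7, -1, 3, 0, 0]
print(recon)     # [90, 70, -10, 30, 0, 0]
```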
Spatial Decorrelation:
Discrete Cosine Transform - DCT
Transformation of spatial into frequency coordinates
$$F(u,v) = \frac{1}{4}\,\Lambda(u)\,\Lambda(v)\sum_{i=0}^{7}\sum_{j=0}^{7}\cos\frac{(2i+1)u\pi}{16}\,\cos\frac{(2j+1)v\pi}{16}\,f(i,j)$$

$$\Lambda(\xi) = \begin{cases} \dfrac{1}{\sqrt{2}} & \text{for } \xi = 0 \\ 1 & \text{otherwise} \end{cases}$$
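The DCT formula on this slide can be evaluated directly as a sketch. The O(N^4) double sum is for illustration only; practical codecs use fast factorised algorithms. Helper names are hypothetical. With the normalisation Λ the transform is orthonormal, so the energy of the block is preserved.

```python
import math

N = 8


def lam(xi):
    """Normalisation factor Λ(ξ): 1/sqrt(2) for ξ = 0, otherwise 1."""
    return 1 / math.sqrt(2) if xi == 0 else 1.0


def dct_8x8(f):
    """Forward 8x8 DCT of block f (8 rows of 8 pixel values), evaluated
    directly from the double-sum definition."""
    F = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for i in range(N):
                for j in range(N):
                    s += (math.cos((2 * i + 1) * u * math.pi / 16)
                          * math.cos((2 * j + 1) * v * math.pi / 16)
                          * f[i][j])
            F[u][v] = 0.25 * lam(u) * lam(v) * s
    return F


flat = [[128] * N for _ in range(N)]          # constant block
F = dct_8x8(flat)
print(round(F[0][0], 6))                      # 1024.0 (DC = 8 x block mean)
```

For a constant block all energy lands in F(0,0); every AC coefficient is zero up to floating-point rounding.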
Concept of conventional DCT coding (JPEG, MPEG, H.26x)
Original 8 x 8 pixel block (8 x 8 x 8 bit = 512 bit) → block DCT → quantisation → zig-zag scanning → VLC → channel

DCT coefficients (8 x 8 x 10 bit = 640 bit):
90 72 11 31  0  0  0  0
14 13  5  0  0  0  0  0
15  6  3  0  0  0  0  0
 4  4  2  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Quantised coefficients (8 x 8 x 4 bit = 256 bit):
90 70 10 30  0  0  0  0
10 10 10  0  0  0  0  0
20 10  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

Zig-zag scanned: 90, 70, 10, 20, 10, 10, 30, 10, 10, 0, 0, 0, ....
VLC: 01, 00111, 01, 01, 01, 01, 010, 01, 000001 (000001 = EOB) → 26 bit
Source: Schäfer HHI [W2]
Compression factor = 512/26 ≈ 20
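The zig-zag step can be reproduced as a sketch. The quantised block below is transcribed from the slide's example and reproduces the scanned sequence shown there; the helper names are hypothetical. Scanning anti-diagonals in alternating direction gives that sequence, and cutting the trailing run of zeros is what the EOB code word signals. The VLC table itself is not reproduced.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                  # s = row + col indexes an anti-diagonal
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            cells.reverse()                      # even diagonals run from bottom-left to top-right
        order.extend(cells)
    return order


def zigzag_scan(block):
    return [block[r][c] for r, c in zigzag_order(len(block))]


def strip_trailing_zeros(seq):
    """Drop the final run of zeros; a VLC signals it with one EOB code word."""
    end = len(seq)
    while end > 0 and seq[end - 1] == 0:
        end -= 1
    return seq[:end]


# Quantised 8 x 8 block from the slide (the last five rows are all zero).
QUANTISED = [
    [90, 70, 10, 30, 0, 0, 0, 0],
    [10, 10, 10, 0, 0, 0, 0, 0],
    [20, 10, 0, 0, 0, 0, 0, 0],
] + [[0] * 8 for _ in range(5)]

scanned = zigzag_scan(QUANTISED)
print(scanned[:12])                 # [90, 70, 10, 20, 10, 10, 30, 10, 10, 0, 0, 0]
print(strip_trailing_zeros(scanned))
```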
Transformed Representation
o Concentration of information in few spectral coefficients (decorrelation)
[Figure: an 8 x 8 block plotted as a surface (axes 1 to 8 and 0 to 350); panels using 16 of 64, 4 of 64 and 1 of 64 coefficients]
Source: Schäfer HHI [W2]
Problem of DCT: Blocking Artefacts
[Figure panels: Original | DCT 1:64]
Alternative Transformation: DWT
[Figure panels: Original | DCT 1:64 | WLT (wavelet) 1:64]
Transform Coding and Decoding (DCT- or wavelet-based)
Image → T (lossless decorrelation) → Q (lossy quantizer) → C (entropy coder) → compressed bitstream
Rec. image ← IT ← IQ ← IC ← compressed bitstream
Temporal Decorrelation:
Difference Coding
In slow-moving scenes many subsequent images are nearly alike:
→ Temporal redundancy is eliminated by coding only the difference between subsequent images (inter frames).
→ To limit accumulating errors, full images (intra frames) are coded regularly (≈ one in 50 frames).
Group of pictures (GOP) along the time axis t: I P P P P I P P P
I = Intra, P = Inter
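A minimal sketch of difference coding with periodic intra frames, assuming frames are flat lists of integer pixel values and that the differences are coded losslessly (names and frames are hypothetical). In a real lossy codec the encoder predicts from its own reconstructed frames rather than from the originals, so encoder and decoder stay in step; the regular I frames then bound any error that still accumulates.

```python
def encode(frames, gop_size):
    """Code frame k as ('I', pixels) when k is a multiple of gop_size,
    otherwise as ('P', pixel-wise difference to frame k-1)."""
    coded = []
    for k, frame in enumerate(frames):
        if k % gop_size == 0:
            coded.append(("I", list(frame)))
        else:
            prev = frames[k - 1]
            coded.append(("P", [cur - old for cur, old in zip(frame, prev)]))
    return coded


def decode(coded):
    """Rebuild the frames: I frames are copied, P frames add their difference."""
    frames = []
    for kind, data in coded:
        if kind == "I":
            frames.append(list(data))
        else:
            prev = frames[-1]
            frames.append([old + diff for old, diff in zip(prev, data)])
    return frames


# Hypothetical 4-pixel frames of a slowly changing scene.
frames = [
    [10, 10, 10, 10],
    [10, 11, 10, 10],
    [10, 11, 12, 10],
    [10, 11, 12, 13],
    [20, 20, 20, 20],
]
coded = encode(frames, gop_size=4)
for kind, data in coded:
    print(kind, data)
```

The P frames carry mostly zeros, which the later entropy coding stages compress well.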
Hybrid Decorrelation: Difference Coding with Motion Prediction
Source: Schäfer HHI [W2]
Block Motion Compensation Prediction
[Figure: frame k-1 and frame k, each decomposed into 16 numbered blocks that are matched between the two frames]
Block Matching
o Decomposition of previous picture into blocks
o Move & match blocks on top of next picture
o Simplify by motion vector discretisation
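A sketch of block matching using the usual encoder convention: each block of the current frame k is searched for in the previous frame k-1 within a small window, and the displacement with the smallest sum of absolute differences (SAD) becomes the motion vector. Full search, SAD and integer-pixel vectors are assumptions made for brevity; practical encoders use faster search strategies and sub-pixel precision. All names and test frames are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(bs)
        for x in range(bs)
    )


def best_motion_vector(cur, ref, bx, by, bs, search):
    """Full search over all integer displacements in [-search, search]^2.
    Returns (cost, (dx, dy)) for the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would lie partly outside the reference frame.
            if not (0 <= bx + dx and bx + dx + bs <= width
                    and 0 <= by + dy and by + dy + bs <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


def frame_with_square(x0, y0, size=8):
    """Hypothetical test frame: black background with a 2 x 2 white square."""
    frame = [[0] * size for _ in range(size)]
    for y in range(y0, y0 + 2):
        for x in range(x0, x0 + 2):
            frame[y][x] = 255
    return frame


previous = frame_with_square(2, 2)  # frame k-1
current = frame_with_square(3, 3)   # frame k: square moved one pixel right and down
cost, mv = best_motion_vector(current, previous, bx=2, by=2, bs=4, search=2)
print(cost, mv)                     # 0 (-1, -1)
```

The vector points from the block in frame k back to its best match in frame k-1; only the vector and the (here zero) residual need to be coded.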
Bidirectional Prediction Coding
I B B P B B P ...
I frames - intra coding (JPEG)
P frames - uni-directional predictive coding
B frames - bi-directional predictive coding
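Because a B frame is predicted from both the preceding and the following I/P frame, the decoder must receive that later frame first, so transmission order differs from display order. A sketch of the reordering, assuming a simple pattern in which each B frame refers only to its two nearest anchors (the function name is hypothetical):

```python
def coding_order(display):
    """Map frame types in display order to (display_index, type) in coding order.
    Each run of B frames is sent right after the I/P anchor that follows it."""
    out, pending_b = [], []
    for idx, kind in enumerate(display):
        if kind == "B":
            pending_b.append((idx, kind))      # wait for the next anchor
        else:
            out.append((idx, kind))            # anchor (I or P) goes first
            out.extend(pending_b)              # then the B frames it helps predict
            pending_b = []
    out.extend(pending_b)  # B frames without a later anchor: appended unchanged in this sketch
    return out


display = ["I", "B", "B", "P", "B", "B", "P"]
order = coding_order(display)
print(" ".join(f"{kind}{idx}" for idx, kind in order))   # I0 P3 B1 B2 P6 B4 B5
```

The price of B frames is this reordering delay plus the extra frame buffer in the decoder.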
Bi-directional
Prediction
Statistical Coding Principles/
Entropy Coding
Huffman Coder (variable-length symbol coder)
• Assign a variable-length code word to every fixed-length word
• Frequent words → short code words, rare words → long code words
Improvement: Arithmetic Coder
• Map an entire sequence of symbols onto one subinterval of [0,1) (also as binary arithmetic coding)
Run-Length Coder
• abbbbbbbbcc → a7b!cc
Pattern Substitution: Dictionary Coding
• Represent repeating sequences of symbols by pointers
Context Modelling (Pre-Coding)
• Determine local conditional probabilities for symbols, instead of global frequencies
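A compact sketch of Huffman code construction: repeatedly merge the two least frequent subtrees, prefixing their code words with 0 and 1. The symbol frequencies are a hypothetical textbook-style example, not data from a video coder.

```python
import heapq
from itertools import count


def huffman_code(freqs):
    """Return a prefix-free code {symbol: bitstring} for {symbol: frequency}."""
    if not freqs:
        return {}
    tiebreak = count()  # unique second key so the heap never compares dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a 1-bit code word
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)   # least frequent subtree
        f2, _, high = heapq.heappop(heap)  # second least frequent subtree
        merged = {sym: "0" + c for sym, c in low.items()}
        merged.update({sym: "1" + c for sym, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}  # hypothetical counts
code = huffman_code(freqs)
avg_bits = sum(freqs[s] * len(code[s]) for s in freqs) / sum(freqs.values())
for sym in sorted(code):
    print(sym, code[sym])
print(f"average: {avg_bits:.2f} bits/symbol")
```

For these frequencies the code needs 2.24 bits per symbol on average, against 3 bits for a fixed-length code over six symbols.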
Layered Coding
Scalability and adaptability to varying play-out scenarios may be achieved through coding layers:
o Spatial layers → range of (pixel) resolutions
o Data partitioning layers → high and low priority data
o SNR layers → range of 'visual' resolutions
o Temporal layers → range of frame rates
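One common way to build temporal layers, assumed here for illustration and not prescribed by the slide, is a dyadic hierarchy: layer 0 holds every 8th frame, and each further layer doubles the frame rate. A receiver that drops the highest layers still gets a regular, lower frame rate, provided frames never reference frames in a higher layer. Function names are hypothetical.

```python
def temporal_layer(n, levels=3):
    """Temporal layer of frame n in a dyadic hierarchy with a period of 2**levels frames."""
    if n % (2 ** levels) == 0:
        return 0
    trailing_zeros = (n & -n).bit_length() - 1  # n & -n isolates the lowest set bit
    return levels - trailing_zeros


def frames_up_to_layer(num_frames, max_layer, levels=3):
    """Indices of the frames kept when all layers above max_layer are dropped."""
    return [n for n in range(num_frames) if temporal_layer(n, levels) <= max_layer]


for layer in range(4):
    print(layer, frames_up_to_layer(16, layer))
```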
Video Coding Standards
Video coding standards are defined for ranges of applicability (image resolution, bandwidth, computational complexity, power consumption …), initially for specific target groups:
o ISO Moving Picture Experts Group (MPEG)
- MPEG-1 (1989): CD-ROM applications at ≈ 1.5 Mb/s
- MPEG-2 (1991): high-quality coding at 2 – 50 Mb/s
- MPEG-4 (1998): scalable ≈ 64 kb/s – 4 Mb/s – 100 Mb/s (V3)
o ITU-T
- H.261 (1991): video telephony, video conferencing ≈ 64 kb/s – 1 Mb/s
- H.263 (1996): low bit rate coding (ISDN) ≈ 8 kb/s – 1 Mb/s
- H.26L (2001): low bit rate, low complexity
- H.264/AVC (2003): joint standard with ISO, about twice the compression of MPEG-4, 8 kb/s – 100 Mb/s
Milestones in Video Compression
[Figure: PSNR [dB] (26 to 38) vs. bit rate [kbps] (0 to 500) for the Foreman sequence, QCIF, 10 Hz, 133 frames encoded. Curves: DCT (Motion JPEG) (1985), H.120 (1988), H.261 (1991), MPEG-1/2 (1994), MPEG-4/H.263 (1998), H.26L (2001), H.264 (2002). Annotations: bit rate reduction 85%, visual gain 10 dB]
MPEG-2
o Aiming at TV quality (interlacing), but generic picture format: The ‘DVD-Standard’
o Discrete Cosine Transform (8 x 8 blocks)
o Motion compensation and prediction (I, P, B frames)
o Supports coding layers
o Error resilience by interpolation
o Supports multiple audio and video flows
H.263
o Aiming at telecommunication: CIF + QCIF formats.
The ‘old’ video conferencing standard
o Discrete Cosine Transform (8 x 8 blocks)
o Improved motion compensation (precision, variable block size, overlapping blocks)
o Prediction with PB-frames (interpolated B component)
o Advanced negotiability
o Arithmetic coding
MPEG-4
o Ambitious standard to encode ‘multimedia streams’
(including interactivity)
o Focus of interest on video compression, based on a collection of profiles: Simple, advanced simple, …
o Content-based compression, motion prediction, scaling
o Concept of Video Object Planes (I/P/B-VOPs)
- Motion estimation and compensation
- Shape coding
- Texture coding (DCT, but also wavelet-based)
- Sprite coding
o Adaptive techniques (motion comp., arithmetic coding, error resilience …)
MPEG-4 Generic Coding Scheme
MPEG-4 System Model
H.264/AVC
o Aiming at full scalability: from 3GPP to HDTV
o Approval May 2003 (Editor T. Wiegand, HHI)
o New 4x4 integer transform (DCT-like)
o Many modes:
- Adaptive block size for transform
- Adaptive blocking for motion compensation
- Adaptive intra prediction
- Two entropy coding methods: CAVLC (context-adaptive variable-length coding) + CABAC (context-adaptive binary arithmetic coding) (D. Marpe, HHI)
o Content-adaptive deblocking filters
o Complexity:
- 8 – 10 times MPEG-2 for encoding
- 3 times MPEG-2 for decoding
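The new 4x4 integer transform can be sketched in a few lines. The matrix below is the forward core transform of H.264/AVC; in the standard, the scaling that makes it approximate an orthonormal DCT is folded into quantisation, and both the scaling and quantisation are omitted here. Helper names are hypothetical. Only integer additions, subtractions and multiplications by 2 (shifts in hardware) are needed, so encoder and decoder compute bit-exact results and avoid the drift that differing floating-point DCT implementations can cause.

```python
# Forward core transform matrix of H.264/AVC (4x4, integer entries).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform(x):
    """Y = CF . X . CF^T in integer arithmetic; scaling/quantisation not applied."""
    return matmul(matmul(CF, x), transpose(CF))


flat = [[10] * 4 for _ in range(4)]     # hypothetical constant residual block
print(core_transform(flat))             # only Y[0][0] = 160 is non-zero
print(matmul(CF, transpose(CF)))        # diag(4, 10, 4, 10)
```

CF · CF^T = diag(4, 10, 4, 10): the rows are orthogonal but not equally scaled, which is why the standard compensates in the quantisation step.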
H.264: Structure
[Figure: H.264 encoder block diagram with Coder Control (control data), Transform/Quantizer (quant. transf. coeffs), Deq./Inv. Transform, Decoder, Intra/Inter switch, Motion-Compensated Predictor, Motion Estimator (motion data) and Entropy Coding]
Deblocking Filter
Source: Schäfer HHI [W2]
What else?
o MPEG-7: Multimedia Content Description Interface
- Metadata standard
- Goal: describe multimedia data for search, retrieval and (combined/synchronized) play-out
o MPEG-21: Multimedia Framework (just finishing)
- Metadata standard for multimedia applications
o Proprietary codecs:
- RealNetworks: Helix
- Microsoft: VC-1
- a few more …
- (at most) similar performance, similar 'ideas' visible
- pay per ???