Huffman Encoding

10.07.12 | Komplexität | 272

Let some text be given.

How is such a text usually encoded?

-> e.g. with a subset of the ASCII characters

What might be an "optimal" code?

Assumptions:

•  Every letter s_i in the original text is replaced by a code word of length l_i.

•  We are looking for a code that is optimal in the sense that it minimizes the average code word length.

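The average code word length L is computed as follows:

L = \sum_{i=1}^{n} p_i \, l_i

where p_i is the relative frequency (probability) of letter s_i in the text and l_i is the length of its code word.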
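As a small illustration of this formula, the following Python sketch computes L for the example text "test_string" used in these slides. It assumes that p_i is the relative frequency of each letter in the text and takes the code words from the code table of the example; the variable names are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction

text = "test_string"
# Code words taken from the code table of the example in these slides.
codes = {"_": "000", "e": "001", "g": "010", "n": "011",
         "t": "10", "s": "110", "i": "1110", "r": "1111"}

counts = Counter(text)
n_total = len(text)

# L = sum over all letters of p_i * l_i, with p_i = count_i / n_total.
avg_len = sum(Fraction(counts[ch], n_total) * len(codes[ch]) for ch in counts)

print(avg_len)  # 32/11
```

This is about 2.91 bits per letter, slightly less than the 3 bits per letter that a fixed-length code for 8 different letters would need.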

Rough description of the algorithm:

1.) Count how often each letter occurs in the original text.

2.) Build a so-called Huffman tree.

3.) Build a table of the so-called Huffman codes.

1.) Count how often each letter occurs in the given text

Go through the input text and count the occurrences of each letter.

Example: "test_string"

letters      _  e  g  i  n  r  s  t
occurrences  1  1  1  1  1  1  2  3
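The counting step can be sketched in Python as follows. This is a minimal sketch using `collections.Counter` from the standard library; the function name `count_letters` is an assumption made for this example.

```python
from collections import Counter

def count_letters(text):
    """Return a dict mapping each letter to its number of occurrences."""
    return dict(Counter(text))

counts = count_letters("test_string")

# Print the letters ordered by occurrence count, then alphabetically.
for letter in sorted(counts, key=lambda ch: (counts[ch], ch)):
    print(letter, counts[letter])
```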

2.) Build the so-called Huffman tree

Build the tree as follows: First, each occurring letter is placed in its own tree.

Then, the two trees whose roots have the smallest numbers of occurrences are merged. The sum of the occurrences of the two old roots is written into a new root node. This is repeated until only one tree is left.
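The merging procedure can be sketched in Python with a priority queue. This is only one possible implementation under stated assumptions: the text is non-empty, a tree is represented either as a single letter or as a `(left, right)` pair, and the names `build_huffman_tree` and `code_table` are invented for this sketch. Ties between equal weights may be broken differently than on the slides, so individual code words can differ from the slides' table while the total encoded length stays the same.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Return (weight, tree); a tree is a letter or a (left, right) pair."""
    tie = count()  # unique tie-breaker so the heap never compares trees
    heap = [(w, next(tie), letter)
            for letter, w in sorted(Counter(text).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # the two roots with the
        w2, _, t2 = heapq.heappop(heap)  # smallest weights
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))
    weight, _, tree = heap[0]
    return weight, tree

def code_table(tree, prefix=""):
    """Walk the tree: a left edge appends '0', a right edge appends '1'."""
    if isinstance(tree, str):            # leaf
        return {tree: prefix or "0"}     # "0" covers a one-letter alphabet
    left, right = tree
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table

weight, tree = build_huffman_tree("test_string")
table = code_table(tree)
total_bits = sum(len(table[ch]) for ch in "test_string")
print(weight, total_bits)  # 11 32
```

The root weight equals the length of the text, and the total of 32 bits matches the length of the encoded text given in the example.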

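Example:

[Figure: the initial forest of eight single-node trees, _ (1), e (1), g (1), i (1), n (1), r (1), s (2), t (3). In the first step, _ and e are joined under a new root with weight 2.]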

[Figure: the finished Huffman tree. Leaves: g (1), i (1), n (1), r (1), s (2), t (3), _ (1), e (1). Inner nodes: 2, 2, 2, 4, 4, 7; root: 11.]

3.) Build a table with the final Huffman codes

code   letter
000    _
001    e
010    g
011    n
10     t
110    s
1110   i
1111   r

Encoded text:

10001110100001101011111110011010

Observation: No code word is a prefix of another code word.
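The following Python sketch illustrates why this observation matters: because no code word is a prefix of another, the bit string can be decoded from left to right without separators. The code table is the one from the example; the function names `is_prefix_free`, `encode` and `decode` are assumptions made for this sketch.

```python
CODES = {"_": "000", "e": "001", "g": "010", "n": "011",
         "t": "10", "s": "110", "i": "1110", "r": "1111"}

def is_prefix_free(codes):
    """True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def encode(text, codes):
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    inverse = {word: letter for letter, word in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # unambiguous for a prefix-free code
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)

encoded = encode("test_string", CODES)
print(is_prefix_free(CODES))   # True
print(encoded)                 # 10001110100001101011111110011010
print(decode(encoded, CODES))  # test_string
```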

Let Σ be the alphabet for which the code is to be generated. It contains |Σ| = n letters (characters).

Lemma 1: Every inner node in a minimal prefix tree has two children.

Proof: Assume there exists a minimal tree T that has an inner node with only one child. Then we construct a tree T' with one node fewer:

We remove that inner node and replace it by its only child.

In the new tree, the code words of all letters below the removed node are one bit shorter. This contradicts the assumption that the tree T was minimal.
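The property from Lemma 1 can be checked on a concrete code table. The sketch below rebuilds the prefix tree from the code words of the "test_string" example and tests whether every inner node has two children. It assumes a prefix-free code table; the names `build_prefix_tree` and `is_full`, as well as the table `NOT_MINIMAL`, are hypothetical and serve only as illustration.

```python
# Code table from the "test_string" example (prefix-free).
CODES = {"_": "000", "e": "001", "g": "010", "n": "011",
         "t": "10", "s": "110", "i": "1110", "r": "1111"}

# Hypothetical counter-example: the node reached by "1" has only one child,
# so the code word of c could be shortened from "10" to "1".
NOT_MINIMAL = {"a": "00", "b": "01", "c": "10"}

def build_prefix_tree(codes):
    """Nested dicts: inner nodes map '0'/'1' to children, leaves are letters."""
    root = {}
    for letter, word in codes.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = letter
    return root

def is_full(node):
    """True if every inner node below (and including) node has two children."""
    if isinstance(node, str):  # leaf
        return True
    return len(node) == 2 and all(is_full(child) for child in node.values())

print(is_full(build_prefix_tree(CODES)))        # True
print(is_full(build_prefix_tree(NOT_MINIMAL)))  # False
```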

Lemma 2: Let s_i and s_j be the two letters with the smallest probability of occurrence.

Then s_i and s_j have maximum depth in a minimal prefix tree T.

Proof:

Assumption: there is a letter s at maximum depth that does not have the smallest probability of occurrence, while s_i or s_j lies at a smaller depth.

Then we exchange s with s_i or with s_j and obtain a smaller average code word length.
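The effect of this exchange can be written out explicitly. The notation here is an assumption introduced for this derivation: p(x) is the probability of letter x and d(x) its depth in T, with p(s) > p(s_i) and d(s) > d(s_i). Only the two exchanged letters change their depth, so the average code word length changes by

```latex
\begin{align*}
\Delta L &= \bigl[p(s)\,d(s_i) + p(s_i)\,d(s)\bigr] - \bigl[p(s)\,d(s) + p(s_i)\,d(s_i)\bigr] \\
         &= -\bigl(p(s) - p(s_i)\bigr)\bigl(d(s) - d(s_i)\bigr) \;<\; 0 .
\end{align*}
```

If the probabilities or the depths are equal, the exchange leaves L unchanged, so in that case the letters can be exchanged without loss.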

Optimality of Huffman-Coding

Theorem: Huffman coding has minimal expected code word length.

Proof by induction on |Σ|.

• Base case: for |Σ| ≤ 2 the claim is clear.

• Now let |Σ| > 2 and let T be a tree representing an optimal prefix code for Σ.

• 1st observation: Every inner node in T has two children (otherwise contradiction to optimality).

• 2nd observation: Let s_i and s_j be the letters with the smallest probability of occurrence. Then s_i and s_j are at maximum depth in T (otherwise contradiction to optimality).

• Thus s_i and s_j are placed in T as in the Huffman tree.

• Replace s_i and s_j by a new letter s with Prob(s) = Prob(s_i) + Prob(s_j).

• Induction hypothesis: the remaining Huffman tree for the new, smaller alphabet is optimal.

⇒ induction step
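The induction step rests on a simple relation between the two trees, sketched here under the assumption that s_i and s_j are siblings at depth d in T (this can be arranged by exchanging leaves of equal depth, which does not change L). Let T' be the tree in which the two leaves are replaced by the single new letter s at depth d - 1. All other letters keep their depth, so

```latex
\begin{align*}
L(T) &= L(T') - \bigl(p_i + p_j\bigr)(d - 1) + \bigl(p_i + p_j\bigr)\,d \\
     &= L(T') + p_i + p_j .
\end{align*}
```

The added term p_i + p_j does not depend on the shape of the tree. Hence T is optimal for Σ exactly when T' is optimal for the reduced alphabet, which is what the induction hypothesis provides.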
