Vector Quantization


- 1 - Digitale Signalverarbeitung und Mustererkennung

Motivation: Identification of mixture probability distributions

So far: Assumption that the emission distribution in the HMM states is normal. This is not necessarily the case!

With a sufficient amount of training data, more "flexible" densities make sense.

Idea: Mixture densities

$$p(x \mid \theta) = \sum_{i=1}^{n} c_i \, p_i(x \mid \theta_i)$$

with simple parametric densities $p_i$, e.g. normal densities, weights $c_i$ with $c_i \ge 0$ and $\sum_{i=1}^{n} c_i = 1$, and parameters $\theta = (c_1, \dots, c_n, \theta_1, \dots, \theta_n)$.

Problem: Maximum likelihood estimation of $\theta$ leads to a system of equations which cannot be solved analytically.


Model with a single normal distribution

Model with a weighted sum of two normal distributions (mixture distribution)
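The difference between the two models can be made concrete with a small sketch. It evaluates a single normal density and a weighted sum of two normal densities on a grid. The parameter values and the function names `normal_pdf` and `mixture_pdf` are assumptions chosen for illustration; they are not taken from the slides.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution evaluated at x."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, weights, means, stds):
    """Weighted sum of normal densities; the weights must sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds))

# Hypothetical parameters, chosen only for illustration.
xs = np.linspace(-10.0, 10.0, 2001)
single = normal_pdf(xs, 0.0, 2.0)                               # one normal density
mixture = mixture_pdf(xs, [0.3, 0.7], [-3.0, 2.0], [1.0, 1.5])  # two components
```

With these values the mixture has two separate peaks, which a single normal density cannot represent.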


Task

Categorize the professors you know into groups.

Find a suitable name for each group, e.g.

easy entertainer

hopelessly confused theoretician

shirt-sleeved practitioner

demanding sadist

…


Procedure

Find relevant features, e.g.

average number of anecdotes per hour

perceived workload per hour or per ECTS credit

average grade in exams

Every professor corresponds to a feature vector $x = (x_1, \dots, x_n)$

… and hence to a point in the n-dimensional feature space


Vector Quantization:

Group points into compact cells

A point can then be approximated (quantized) by the mean value of its cell

[Figure: feature space with axes Feature 1 and Feature 2; each point is the feature vector of a professor]


[Figure: feature space with axes Feature 1 and Feature 2, showing the feature vector of each professor and the mean value of each cell]
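The quantization step itself can be sketched in a few lines: every point is mapped to the nearest cell mean. The function name `quantize` and the sample values are hypothetical, and Euclidean distance is assumed as the measure of closeness.

```python
import numpy as np

def quantize(points, codebook):
    """Return, for every point, the index of its nearest codebook vector
    and that codebook vector as the quantized approximation."""
    points = np.asarray(points, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance between every point and every codebook vector.
    diff = points[:, None, :] - codebook[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    idx = np.argmin(dist2, axis=1)
    return idx, codebook[idx]

# Hypothetical 2-D feature vectors (anecdotes per hour, perceived workload).
points = np.array([[0.5, 9.0], [4.0, 2.0], [0.8, 8.0], [3.5, 2.5]])
codebook = np.array([[0.65, 8.5], [3.75, 2.25]])   # means of the two cells
idx, approx = quantize(points, codebook)
```

Here `idx` holds the cell index of each point and `approx` the cell mean that replaces it.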


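Problem statement

Given

N sample vectors $x_1, \dots, x_N$

Wanted

n codebook vectors $y_1, \dots, y_n$ such that

$$E = \sum_{k=1}^{N} \lVert x_k - y(x_k) \rVert^2 \to \min$$

(quantization error), where $y(x_k)$ is the nearest codebook vector to $x_k$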
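As a rough illustration of the objective, the quantization error of a given codebook can be computed directly. This sketch assumes the error is the plain sum of squared Euclidean distances (no normalization by N); the function name `quantization_error` and the data are hypothetical.

```python
import numpy as np

def quantization_error(samples, codebook):
    """Sum of squared distances from each sample to its nearest codebook vector."""
    samples = np.asarray(samples, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dist2 = np.sum((samples[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return float(np.sum(np.min(dist2, axis=1)))

# Four hypothetical points on the corners of a 4 x 2 rectangle.
samples = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
err_short_sides = quantization_error(samples, [[0.0, 1.0], [4.0, 1.0]])
err_long_sides = quantization_error(samples, [[2.0, 0.0], [2.0, 2.0]])
```

Placing the two codebook vectors on the short sides of the rectangle gives a smaller error than placing them on the long sides, so the first codebook is the better quantizer for this sample.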


Iterative approach: Lloyd algorithm

Choose an initial codebook $y_1, \dots, y_n$

Partition the sample into n cells: cell $C_i$ contains those vectors $x_k$ for which $y_i$ is the nearest codebook vector

Replace each codebook vector by the mean of its cell: $y_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$

Iterate

[Figure: Example]
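The iteration can be sketched as follows. This is a minimal version under stated assumptions: Euclidean distance, termination when the partition no longer changes, and an empty cell keeping its old codebook vector (the slides do not say how empty cells are handled). The function name `lloyd` and the data are hypothetical.

```python
import numpy as np

def lloyd(samples, codebook, max_iter=100):
    """Alternate between partitioning and mean updates until the cells stop changing."""
    samples = np.asarray(samples, dtype=float)
    codebook = np.array(codebook, dtype=float)   # copy, so the caller's data is untouched
    cells = None
    for _ in range(max_iter):
        # Partition: assign every sample to its nearest codebook vector.
        dist2 = np.sum((samples[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        new_cells = np.argmin(dist2, axis=1)
        if cells is not None and np.array_equal(new_cells, cells):
            break                                # partition unchanged: converged
        cells = new_cells
        # Update: replace each codebook vector by the mean of its cell.
        for i in range(len(codebook)):
            members = samples[cells == i]
            if len(members) > 0:                 # assumption: empty cell keeps its old vector
                codebook[i] = members.mean(axis=0)
    return codebook, cells

samples = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]])
codebook, cells = lloyd(samples, [[0.0, 0.0], [2.0, 2.0]])
```

For this sample the two codebook vectors move to the means of the two visible groups after one update.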


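The algorithm always converges!

Quantization error for a codebook $Y = \{y_1, \dots, y_n\}$ and cells $C = \{C_1, \dots, C_n\}$:

$$E(Y, C) = \sum_{i=1}^{n} \sum_{x \in C_i} \lVert x - y_i \rVert^2$$

(sum over cells, sum over vectors in a cell; this equals the sum over the sample $\sum_{k=1}^{N} \lVert x_k - y(x_k) \rVert^2$ when every vector lies in the cell of its nearest codebook vector)

Recompute cells, codebook fixed: $E(Y, C') \le E(Y, C)$, where $C$: old cells, $C'$: new cells (minimize the quantization error)

Recompute codebook vectors, cells fixed: $E(Y', C') \le E(Y, C')$, where $Y$: old codebook, $Y'$: new codebook (minimizes the quantization error)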


The quantization error never increases from one iteration to the next and is bounded below by zero.

Hence the algorithm converges.

But:

Convergence only to a local optimum.

The quality of the resulting codebook depends on the initial codebook!


Example with 1-dimensional vectors

[Figure: sample, initial codebook, partitioning, new codebook and optimal codebook on the real line]

Convergence to a suboptimal solution!
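The effect can be reproduced with a small numerical sketch. The sample below is hypothetical (the values from the slide are not recoverable), and `lloyd_1d` is an illustrative helper, not code from the lecture. Two different initial codebooks of size 2 are run on the same data.

```python
import numpy as np

def lloyd_1d(sample, codebook, max_iter=100):
    """Lloyd iteration for scalar samples; returns the codebook and its quantization error."""
    sample = np.asarray(sample, dtype=float)
    codebook = np.array(codebook, dtype=float)
    for _ in range(max_iter):
        cells = np.argmin(np.abs(sample[:, None] - codebook[None, :]), axis=1)
        # Mean of each cell; an empty cell keeps its old value (assumption).
        new = np.array([sample[cells == i].mean() if np.any(cells == i) else codebook[i]
                        for i in range(len(codebook))])
        if np.allclose(new, codebook):
            break
        codebook = new
    cells = np.argmin(np.abs(sample[:, None] - codebook[None, :]), axis=1)
    error = float(np.sum((sample - codebook[cells]) ** 2))
    return codebook, error

sample = [0.0, 1.0, 9.0, 10.0, 20.0]                # hypothetical data, not the slide's sample
bad, bad_error = lloyd_1d(sample, [4.0, 18.0])      # ends at (5, 20), error 82
good, good_error = lloyd_1d(sample, [0.0, 10.0])    # ends at (0.5, 13), error 74.5
```

Both runs stop at a fixed point of the iteration, but only the second one reaches the smaller error. For this sample, checking all four contiguous two-cell partitions shows that 74.5 is the optimum.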


Objective: avoid a bad initial codebook

LBG algorithm (Linde, Buzo, Gray)

Idea:

Start with a trivial codebook of size 1

Iterate:

Optimize the current codebook with Lloyd's algorithm

Choose a codebook vector $y_i$ and replace it with two new codebook vectors $y_i + \varepsilon$ and $y_i - \varepsilon$, where $\varepsilon$ is a (small) random vector.
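A minimal sketch of the splitting scheme is given below. Several details are assumptions rather than part of the slides: the trivial codebook is taken to be the sample mean, the vector to split is the one whose cell holds the most samples, the perturbation is Gaussian noise scaled by `scale`, and an empty cell keeps its old vector. The names `nearest`, `lloyd` and `lbg` are hypothetical.

```python
import numpy as np

def nearest(samples, codebook):
    """Index of the nearest codebook vector for every sample."""
    dist2 = np.sum((samples[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return np.argmin(dist2, axis=1)

def lloyd(samples, codebook, max_iter=100):
    """Plain Lloyd iteration; an empty cell keeps its old codebook vector."""
    codebook = np.array(codebook, dtype=float)
    for _ in range(max_iter):
        cells = nearest(samples, codebook)
        new = codebook.copy()
        for i in range(len(codebook)):
            if np.any(cells == i):
                new[i] = samples[cells == i].mean(axis=0)
        if np.allclose(new, codebook):
            break
        codebook = new
    return codebook, nearest(samples, codebook)

def lbg(samples, size, scale=1e-3, seed=0):
    """Grow a codebook from size 1 to `size` by repeated optimize-and-split steps."""
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    codebook = samples.mean(axis=0, keepdims=True)   # assumed trivial codebook: sample mean
    while True:
        codebook, cells = lloyd(samples, codebook)
        if len(codebook) >= size:
            return codebook
        # Split the vector whose cell holds the most samples (one possible criterion).
        counts = np.bincount(cells, minlength=len(codebook))
        i = int(np.argmax(counts))
        eps = scale * rng.standard_normal(samples.shape[1])
        codebook = np.vstack([np.delete(codebook, i, axis=0),
                              codebook[i] + eps, codebook[i] - eps])

sample = np.array([[0.0], [1.0], [9.0], [10.0], [20.0]])   # hypothetical 1-D data
codebook = lbg(sample, size=2)
```

On this hypothetical sample the split of the mean at 8 separates the points below and above it, and the following Lloyd run settles at the codebook (0.5, 13).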


Example with 1-dimensional vectors (see above)

[Figure: sample on the real line with initial codebook (n=1), partitioning, new codebook, split, partitioning, new codebook]

Optimal solution!


Criteria to choose the codebook vector which is split

All codebook vectors simultaneously (size of the codebook is doubled)

The codebook vector whose cell contains the most vectors

The codebook vector whose cell has the biggest variance

Some combination of these criteria

Further improvements

Merge two codebook vectors if they are close together or if their cells contain only few vectors.

Annealing: Add artificial random noise to the codebook vectors.

Reduce the noise slowly as the computation proceeds.

Objective: avoid getting stuck in a local optimum.

In general, the LBG algorithm also leads only to a local optimum.


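Identification of mixture distributions

Idea: Replace codebook vectors by parametric densities

→ Lloyd algorithm with a distance measure based on probabilities instead of Euclidean distances.

Choose initial densities $p_i(x \mid \theta_i)$, $i = 1, \dots, n$

Partition the sample: cell $C_i$ contains those vectors $x$ for which $p_i(x \mid \theta_i)$ is largest

Reestimate the parameters of the densities: $\theta_i$ = maximum likelihood estimation from the vectors of cell $C_i$

Estimation of a mixture density for the entire sample:

$$p(x) = \sum_{i=1}^{n} c_i \, p_i(x \mid \theta_i), \qquad c_i = \frac{|C_i|}{N}$$

where $c_i$ is the prior probability of cell $i$.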
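For 1-dimensional normal densities the procedure might look like the sketch below. It rests on several assumptions that the slides leave open: each vector is assigned to the density with the highest likelihood (priors are not used in the assignment), the iteration stops when the partition no longer changes, variances are floored at `min_var` to avoid degenerate cells, and an empty cell keeps its old parameters. The function names and the data are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    """Logarithm of the 1-D normal density."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def fit_mixture_hard(sample, means, variances, max_iter=100, min_var=1e-6):
    """Lloyd-style estimation of a normal mixture with hard assignments."""
    sample = np.asarray(sample, dtype=float)
    means = np.array(means, dtype=float)
    variances = np.array(variances, dtype=float)
    n = len(means)
    priors = np.full(n, 1.0 / n)
    cells = None
    for _ in range(max_iter):
        # Partition: each vector goes to the density under which it is most probable.
        scores = log_normal_pdf(sample[:, None], means[None, :], variances[None, :])
        new_cells = np.argmax(scores, axis=1)
        if cells is not None and np.array_equal(new_cells, cells):
            break
        cells = new_cells
        # Re-estimate: maximum likelihood mean and variance from each cell.
        for i in range(n):
            members = sample[cells == i]
            if len(members) > 0:
                means[i] = members.mean()
                variances[i] = max(members.var(), min_var)
        # Prior probability of cell i: fraction of the sample that fell into it.
        priors = np.bincount(cells, minlength=n) / len(sample)
    return priors, means, variances

sample = [-2.5, -2.0, -1.5, 2.0, 3.0, 4.0]          # hypothetical data
priors, means, variances = fit_mixture_hard(sample, [-1.0, 1.0], [1.0, 1.0])
```

The returned priors, means and variances define the mixture density for the entire sample as the prior-weighted sum of the two normal densities.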


Comparison of Lloyd algorithm and Viterbi training

Model

    Lloyd algorithm: Codebook
    Viterbi training: HMM: states with transition and emission probabilities

Training phase 1: Match training data with existing model

    Lloyd algorithm: Codebook fixed. Compute cells: map feature vectors to codebook vectors.
    Viterbi training: HMM fixed. Compute segmentation: map feature vectors to HMM states.

Training phase 2: Adapt model parameters to the training data

    Lloyd algorithm: Cells fixed. Recompute codebook: $y_i$ = maximum likelihood estimation from the vectors of cell $C_i$.
    Viterbi training: Segmentation fixed. Recompute transition and emission distributions with maximum likelihood from the segmentation.


Both the Lloyd algorithm and Viterbi training follow the expectation maximization principle (EM algorithm).