- 1 - Digitale Signalverarbeitung und Mustererkennung
Vector Quantization
Motivation: Identification of mixture probability distributions
So far: Assumption that the emission distribution in the HMM states is normal. This is not necessarily the case!
With a sufficient amount of training data, more "flexible" densities make sense.
Idea: Mixture densities
$p(x) = \sum_{i} c_i \, p_i(x \mid \theta_i)$
with simple parametric densities $p_i$, e.g. normal densities, weights $c_i$ with $c_i \ge 0$, $\sum_i c_i = 1$, and parameters $\theta_i$
Problem
Maximum likelihood estimation of $c_i$ and $\theta_i$ leads to a system of equations which cannot be solved analytically.
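A mixture density is a weighted sum of simple parametric densities. The following minimal sketch evaluates such a density for one-dimensional normal components. The function names and the example weights, means and standard deviations are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, stds):
    """Weighted sum of normal densities; the weights are expected to sum to 1."""
    return sum(c * normal_pdf(x, m, s) for c, m, s in zip(weights, means, stds))

# Hypothetical two-component mixture (values for illustration only).
weights, means, stds = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.0]
print(mixture_pdf(0.0, weights, means, stds))
```

Evaluating the density is easy; the difficulty named on the slide lies in estimating the weights and parameters from data.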
Model with a single normal distribution
Model with a weighted sum of two normal distributions (mixture distribution)
Task
Categorize the professors you know into groups.
Find a suitable name for each group, e.g.
easy entertainer
hopelessly confused theoretician
shirt-sleeved practitioner
demanding sadist
…
Procedure
Find relevant features
e.g.
average number of anecdotes per hour
perceived workload per hour or per ECTS credit
average grade in exams
Every professor corresponds to a feature vector, e.g. x = (anecdotes per hour, workload per ECTS credit, average exam grade)
… and hence to a point in the n-dimensional feature space
Vector Quantization:
Group points into compact cells
A point can then be approximated (quantized) by the mean value of its cell
[Figure: feature space with axes Feature 1 and Feature 2; each point is the feature vector of a professor]
[Figure: feature space with axes Feature 1 and Feature 2, grouped into cells; the feature vectors of the professors and the mean value of each cell are marked]
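Problem statement
Given
N sample vectors $x_1, \dots, x_N$
Wanted
n codebook vectors $y_1, \dots, y_n$ such that
$D = \sum_{k=1}^{N} \| x_k - q(x_k) \|^2 \to \min$ (quantization error), where $q(x_k)$ is the codebook vector nearest to $x_k$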
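The quantization error can be computed directly from its definition. This is a minimal sketch that assumes the squared Euclidean distance and represents vectors as tuples; the function names and the sample values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(x, codebook):
    """Index of the codebook vector closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(x, codebook[i]))

def quantization_error(samples, codebook):
    """Sum of squared distances from each sample to its nearest codebook vector."""
    return sum(sq_dist(x, codebook[nearest(x, codebook)]) for x in samples)

# Illustrative data: two tight pairs of points, one codebook vector per pair.
samples = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
codebook = [(0.5, 0.0), (4.5, 4.0)]
print(quantization_error(samples, codebook))  # 4 * 0.25 = 1.0
```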
Iterative approach: Lloyd algorithm
Choose an initial codebook $y_1, \dots, y_n$
Partition the sample into n cells: cell $C_i$ contains those vectors $x_k$ for which $y_i$ is the nearest codebook vector
Replace each codebook vector $y_i$ by the mean of its cell $C_i$
Iterate the partitioning and the mean update
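The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch, not a reference implementation: the stopping rule (stop when the codebook no longer changes), the handling of empty cells and the sample data are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(vectors) for c in zip(*vectors))

def lloyd(samples, codebook, max_iter=100):
    """Alternate partitioning and mean updates until the codebook is stable."""
    codebook = list(codebook)
    for _ in range(max_iter):
        # Partition: assign every sample to the cell of its nearest codebook vector.
        cells = [[] for _ in codebook]
        for x in samples:
            i = min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))
            cells[i].append(x)
        # Update: replace each codebook vector by the mean of its cell.
        # Assumption: an empty cell keeps its old codebook vector.
        new_codebook = [mean(c) if c else y for c, y in zip(cells, codebook)]
        if new_codebook == codebook:
            break
        codebook = new_codebook
    return codebook, cells

samples = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
codebook, cells = lloyd(samples, [(0.0, 0.0), (1.0, 0.0)])
print(codebook)  # [(0.5, 0.0), (4.5, 4.0)]
```

Here the deliberately poor initial codebook is repaired within two iterations; this is not guaranteed in general.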
Example
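The algorithm always converges!
Quantization error for cells $C = (C_1, \dots, C_n)$ and codebook $Y = (y_1, \dots, y_n)$, written as a sum over the sample or as a sum over the cells of the sum over the vectors in each cell:
$D(C, Y) = \sum_{k=1}^{N} \| x_k - q(x_k) \|^2 = \sum_{i=1}^{n} \sum_{x \in C_i} \| x - y_i \|^2$
Recompute cells, codebook fixed: $D(C^{new}, Y^{old}) \le D(C^{old}, Y^{old})$
$C^{old}$: old cells, $C^{new}$: new cells (minimize the quantization error for the fixed codebook)
Recompute codebook vectors, cells fixed: $D(C^{new}, Y^{new}) \le D(C^{new}, Y^{old})$
$Y^{old}$: old codebook, $Y^{new}$: new codebook (minimizes the quantization error for the fixed cells)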
The quantization error never increases from one iteration to the next and is bounded below by zero.
Hence the sequence of quantization errors converges.
But:
Convergence only to a local optimum.
Quality of the resulting codebook depends on the initial codebook!
Example with 1-dimensional vectors
[Figure: one-dimensional sample with the panels Initial codebook, Partitioning, New codebook and Optimal codebook]
Convergence to a suboptimum!
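The dependence on the initial codebook can be reproduced with a tiny one-dimensional experiment. The sample values and both initial codebooks below are assumptions chosen for illustration; the figure's actual numbers are not recoverable from the text.

```python
def lloyd_1d(samples, codebook, max_iter=100):
    """Lloyd iteration for scalars; returns the final codebook."""
    codebook = list(codebook)
    for _ in range(max_iter):
        cells = [[] for _ in codebook]
        for x in samples:
            i = min(range(len(codebook)), key=lambda j: (x - codebook[j]) ** 2)
            cells[i].append(x)
        # An empty cell keeps its old codebook value (assumption).
        new_codebook = [sum(c) / len(c) if c else y for c, y in zip(cells, codebook)]
        if new_codebook == codebook:
            break
        codebook = new_codebook
    return codebook

def error(samples, codebook):
    """Quantization error with squared distances."""
    return sum(min((x - y) ** 2 for y in codebook) for x in samples)

samples = [0.0, 3.0, 5.0]               # assumed sample
stuck = lloyd_1d(samples, [1.5, 6.0])   # unlucky initial codebook
better = lloyd_1d(samples, [0.0, 4.0])  # lucky initial codebook
print(stuck, error(samples, stuck))     # [1.5, 5.0] 4.5
print(better, error(samples, better))   # [0.0, 4.0] 2.0
```

Both results are fixed points of the iteration, but only the second one attains the smallest possible error for two codebook values on this sample.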
LBG algorithm (Linde, Buzo, Gray)
Objective: avoid a bad initial codebook
Idea:
Start with a trivial codebook of size 1
Iterate
Optimize the current codebook with Lloyd's algorithm
Choose a codebook vector $y$ and replace it with two new codebook vectors $y + \varepsilon$ and $y - \varepsilon$,
where $\varepsilon$ is a (small) random vector.
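A one-dimensional sketch of this splitting scheme follows. Several details are assumptions made to keep the example short and reproducible: the size-1 codebook is the sample mean, the offset `eps` is a fixed small number rather than a random vector, and the vector to split is the one whose cell contributes the largest error.

```python
def partition(samples, codebook):
    """Assign each scalar sample to the cell of its nearest codebook value."""
    cells = [[] for _ in codebook]
    for x in samples:
        i = min(range(len(codebook)), key=lambda j: (x - codebook[j]) ** 2)
        cells[i].append(x)
    return cells

def lloyd_1d(samples, codebook, max_iter=100):
    """Lloyd iteration for scalars; returns the final codebook."""
    codebook = list(codebook)
    for _ in range(max_iter):
        cells = partition(samples, codebook)
        new_codebook = [sum(c) / len(c) if c else y for c, y in zip(cells, codebook)]
        if new_codebook == codebook:
            break
        codebook = new_codebook
    return codebook

def lbg_1d(samples, size, eps=1e-3):
    """Grow the codebook by splitting until it has the requested size."""
    codebook = [sum(samples) / len(samples)]  # size-1 codebook: the sample mean
    while len(codebook) < size:
        cells = partition(samples, codebook)
        # Assumed split criterion: the cell with the largest error contribution.
        errs = [sum((x - y) ** 2 for x in c) for c, y in zip(cells, codebook)]
        y = codebook.pop(errs.index(max(errs)))
        codebook += [y - eps, y + eps]          # split into two nearby values
        codebook = lloyd_1d(samples, codebook)  # optimize the enlarged codebook
    return codebook

print(lbg_1d([0.0, 3.0, 5.0], 2))  # [0.0, 4.0]
```

On this assumed sample the split of the mean lands in the partition {0}, {3, 5}, which has the smallest error for two codebook values.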
Example with 1-dimensional vectors (see above)
[Figure: the same one-dimensional sample with the panels Initial codebook (n=1), Partitioning, New codebook, Split, Partitioning and New codebook]
Optimal solution!
Criteria for choosing the codebook vector to split
All codebook vectors simultaneously (size of the codebook is doubled)
The codebook vector whose cell contains the most vectors
The codebook vector whose cell has the largest variance
Some combination of these criteria
Further improvements
Merge two codebook vectors if they are close together or if their cells contain only a few vectors.
Annealing: Add artificial random noise to the codebook vectors.
Reduce the noise slowly as the computation proceeds.
Objective: avoid getting stuck in a local optimum.
In general, the LBG algorithm also leads only to a local optimum.
Identification of mixture distributions
Idea: Replace codebook vectors by parametric densities
Lloyd algorithm with a distance measure based on probabilities instead of Euclidean distances.
Choose initial densities $p_i(x \mid \theta_i)$
Partition the sample into cells $C_i$
Reestimate the parameters of the densities: $\theta_i$ = maximum likelihood estimate from the vectors of cell $C_i$
Estimation of a mixture density for the entire sample: $p(x) = \sum_i c_i \, p_i(x \mid \theta_i)$, where $c_i$ is the prior probability of cell $i$.
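The procedure can be sketched for one-dimensional normal densities. The following assumptions are not stated on the slide: a sample is assigned to the density with the highest density value, the prior probability of a cell is estimated as its relative frequency, a fixed number of iterations is used, and a lower bound `min_std` keeps the standard deviation away from zero.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_mixture(samples, params, n_iter=20, min_std=1e-3):
    """params: initial (mean, std) pairs. Returns (weights, params)."""
    for _ in range(n_iter):
        # Partition: each sample goes to the density with the highest value.
        cells = [[] for _ in params]
        for x in samples:
            i = max(range(len(params)), key=lambda j: normal_pdf(x, *params[j]))
            cells[i].append(x)
        # Reestimate: maximum likelihood mean and standard deviation per cell.
        new_params = []
        for cell, old in zip(cells, params):
            if not cell:
                new_params.append(old)  # assumption: keep density of an empty cell
                continue
            m = sum(cell) / len(cell)
            var = sum((x - m) ** 2 for x in cell) / len(cell)
            new_params.append((m, max(math.sqrt(var), min_std)))
        params = new_params
    # Prior probability of cell i, estimated as the fraction of samples in it.
    weights = [len(cell) / len(samples) for cell in cells]
    return weights, params

# Hypothetical sample with two clearly separated groups.
samples = [-2.1, -1.9, -2.0, 0.9, 1.0, 1.1, 1.2]
weights, params = fit_mixture(samples, [(-1.0, 1.0), (0.5, 1.0)])
print(weights, params)
```

The returned weights and parameters define the mixture density as the weighted sum of the fitted normal densities.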
Comparison of Lloyd algorithm and Viterbi training
Model
Lloyd algorithm: Codebook
Viterbi training: HMM: states with transition and emission probabilities
Training phase 1: Match training data with existing model
Lloyd algorithm: Codebook fixed. Compute cells: map feature vectors to codebook vectors
Viterbi training: HMM fixed. Compute segmentation: map feature vectors to HMM states
Training phase 2: Adapt model parameters to the training data
Lloyd algorithm: Cells fixed. Recompute codebook: each codebook vector = maximum likelihood estimate from the vectors of its cell
Viterbi training: Segmentation fixed. Recompute transition and emission distributions with maximum likelihood from the segmentation