Probability Theory

Academic year: 2021


Digital Signal Processing and Pattern Recognition

Probability Theory


Objective

Fewer classification errors.

Better models that depend not only on the mean values of the reference vectors but also on their variances.


Example

Classification of animals as fish or bird.

Only one feature: weight.

Collect some reference birds and reference fish.

Create a model for each class: compute the average weight in both classes. The model of a class consists of a single number.
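This single-number model can be sketched in a few lines. All weights below are made-up illustrative values, not data from the lecture:

```python
# Hypothetical reference weights in kg, chosen only for illustration.
fish_weights = [0.001, 2.0, 150.0, 900.0]   # plankton ... whale
bird_weights = [0.004, 0.5, 4.0, 9.0]       # hummingbird ... albatross

# The model of each class is a single number: the mean weight.
m_fish = sum(fish_weights) / len(fish_weights)
m_bird = sum(bird_weights) / len(bird_weights)

def classify(weight):
    # Assign the class whose mean weight is closer.
    return "fish" if abs(weight - m_fish) < abs(weight - m_bird) else "bird"
```

With these numbers a 20 kg fish ends up closer to the bird mean, which is exactly the kind of error discussed on the next slides.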


Example

[Figure: weight axis with the class means m_fish and m_bird; fish examples range from plankton to whale, bird examples from hummingbird to albatross; the test pattern x lies closer to m_bird.]

x is classified as bird although it is a fish.

What is the reason for this error?


Example

[Figure: the same weight axis with the class means m_fish and m_bird and the test pattern x.]

In order to be classified as bird, x would have to have a much smaller distance to the mean of bird than to the mean of fish.

Weight has a small variance in class bird but a high variance in class fish.

A large deviation from the mean weight is much more likely in class fish than in class bird.

We need a better distance measure which takes this into account!


Example

Classification of vowels and consonants by their signal energy.

[Figure: signal energy axis with the class means m_vowel and m_cons; the test pattern x lies closer to m_cons.]

x is classified as consonant although it is a vowel, even though the classes do not overlap and perfect classification should be possible.

Reason: high variance of the energy in class vowel, low variance in class consonant.

The difference to the mean value is not a sufficient distance measure for classification!


Example

Two-dimensional feature vectors.

[Figure: scatter plot of class A and class B over feature 1 (horizontal) and feature 2 (vertical).]


A very mean example

[Figure: scatter plot of class A and class B.]

The variance of each feature is the same in class A and in class B, but feature 1 has a higher variance than feature 2!

Reason for misclassification: the distance in feature component 2 should be weighted higher than the distance in component 1!
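A minimal sketch of this weighting, with assumed means and variances (not values from the lecture): each component of the squared distance is divided by the variance of that feature.

```python
# Illustrative class means; the variances are assumed equal in both classes.
mean_a = (0.0, 0.0)
mean_b = (4.0, 1.0)
var = (16.0, 0.25)   # feature 1 scatters much more than feature 2

def squared_dist(x, mean):
    # Plain squared Euclidean distance.
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))

def weighted_dist(x, mean):
    # Each component is weighted by the inverse of its variance.
    return sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, var))

x = (3.0, 0.2)
```

For this test vector the plain squared distance prefers class B, while the variance-weighted distance prefers class A, because the small-variance feature 2 now dominates the decision.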


Example

Two-dimensional feature vectors, components correlated.

[Figure: scatter plot of class A and class B over grade in Math (horizontal) and grade in Physics (vertical).]


Example

Two-dimensional feature vectors, class-wise different correlation.

[Figure: scatter plot of "universalists" and "specialists" over grade in Science and grade in Language.]


Contents

Mahalanobis Distance

Improved Distance Measure based on a Normal Distribution Assumption

Random Variables, Probability Density

Random Vectors, Joint Density Function

Application of Probability Theory to Training and Classification


Example

Simple special case: the length of the vector sequence is 1 and the vector dimension is 1, so we need a distance measure for numbers.

Reference pattern class A:

Reference pattern class B:

Test pattern:


Example: More than one reference pattern per class

Reference patterns class A:

Reference patterns class B:

Test pattern:


Example: More than one reference pattern per class

Reference patterns class A:

Reference patterns class B:

Test pattern: class A or class B?

Distance measure: absolute or quadratic distance to the class mean.


Empirical mean value, empirical variance (sample mean, sample variance)

Random sample of reference patterns of a class: x_1, ..., x_N

Sample mean: the average over all samples,

m = (1/N) · (x_1 + ... + x_N)

Sample variance: a measure for the scattering, the average squared deviation from the sample mean,

σ² = (1/N) · ((x_1 − m)² + ... + (x_N − m)²)
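The two definitions translate directly into code; this sketch uses 1/N for both, matching the "average squared deviation" definition above:

```python
def sample_mean(xs):
    # Average over all samples.
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Average squared deviation from the sample mean.
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```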


Class A: Class B:

Sample variance in class B is much higher than in class A!


Mahalanobis Distance

Distance measure so far: the squared distance to the class mean, d(x) = (x − m)².

"Improved" distance measure which takes the variance into account, the Mahalanobis distance:

d(x) = (x − m)² / σ²

Motivation: a "normalized" distance measure, the distance of x to the mean relative to the average squared distance to the mean in the class.
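In one dimension the Mahalanobis distance is just the squared distance divided by the class variance. The class statistics below are assumed illustrative values, not numbers from the lecture:

```python
def mahalanobis_1d(x, mean, var):
    # Squared distance to the mean, normalized by the class variance.
    return (x - mean) ** 2 / var

# Assumed class statistics: A is narrow, B is very wide.
mean_a, var_a = 50.0, 4.0
mean_b, var_b = 70.0, 400.0

d_a = mahalanobis_1d(62.0, mean_a, var_a)   # 12**2 / 4
d_b = mahalanobis_1d(62.0, mean_b, var_b)   # 8**2 / 400
```

Here x = 62 is closer to the mean of A in absolute terms, yet its Mahalanobis distance to B is far smaller; as the next slide shows, this normalization can also overshoot.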


Classification with the Mahalanobis distance

Class A: Class B:

Distance to class A vs. distance to class B: every test pattern is assigned to class B!

"Misclassification" with the Mahalanobis distance: classes with a very high variance are preferred too much!


Probability Density and Random Variables


Probability density

[Figure: samples of class A and class B along the feature axis; the widely scattered class B samples lie on both sides of class A.]

62 should be classified as B, although 62 is closer to the mean of class A!

35 should be classified as A.


Example: density function of the normal distribution

f(x) = 1 / sqrt(2πσ²) · exp(−(x − m)² / (2σ²))

The density function depends on only two parameters, the mean and the variance, which can be estimated empirically from some samples.


Classification result with the normal distribution assumption

Class A: Class B:

Class B, class A, class B, class ...
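A sketch of the density-based decision, using assumed illustrative class statistics: the normalization factor 1/sqrt(2πσ²) penalizes high-variance classes, which the bare Mahalanobis distance does not.

```python
import math

def normal_pdf(x, mean, var):
    # f(x) = exp(-(x - mean)**2 / (2 var)) / sqrt(2 pi var)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Assumed class statistics: A is narrow, B is very wide.
mean_a, var_a = 50.0, 4.0
mean_b, var_b = 70.0, 400.0

def classify(x):
    # The class with the higher density wins.
    return "A" if normal_pdf(x, mean_a, var_a) > normal_pdf(x, mean_b, var_b) else "B"
```

With these numbers, 62 goes to class B while 50 goes to class A, so the high-variance class no longer swallows everything near its mean.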


Contour lines of f(x,y) if X and Y are independent


Contour lines of f(x,y) if X and Y are dependent
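Both contour shapes can be reproduced from the bivariate normal density. The parameterization below (covariance entries sxx, syy, sxy) is a generic sketch, not notation from the lecture:

```python
import math

def gaussian2d_pdf(x, y, mx, my, sxx, syy, sxy):
    # Bivariate normal density with covariance matrix [[sxx, sxy], [sxy, syy]].
    det = sxx * syy - sxy * sxy
    dx, dy = x - mx, y - my
    # Quadratic form (x - m)^T Sigma^{-1} (x - m), written out by hand.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))
```

With sxy = 0 the contour lines are axis-aligned ellipses (independent X and Y); with sxy != 0 they are tilted, e.g. for sxy = 0.5 the density at (1, 1) exceeds the density at (1, -1).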




Example

Sample of reference vectors from a class:

Empirical mean and empirical variance for each component: empirical mean vector, empirical variance vector.
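Component-wise, this is just the scalar estimates applied per dimension; a sketch for reference vectors given as lists:

```python
def mean_vector(vectors):
    # Empirical mean of each component over all reference vectors.
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def variance_vector(vectors):
    # Empirical variance of each component (average squared deviation).
    n, dim = len(vectors), len(vectors[0])
    m = mean_vector(vectors)
    return [sum((v[i] - m[i]) ** 2 for v in vectors) / n for i in range(dim)]
```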


Application to Viterbi Training and Classification

Objective: an improved model for the classification of feature vector sequences.

Example: reference patterns of some class are given, as sequences of length 6, length 7, and length 3; a model for the class is wanted.


Linear segmentation (nothing new)

[Figure: each reference sequence split evenly across the model states.]
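Linear segmentation can be sketched as cutting each training sequence into contiguous chunks of nearly equal length, one per model state (the function name and rounding scheme are my own choices):

```python
def linear_segmentation(sequence, n_states):
    # Split the sequence into n_states contiguous chunks of nearly
    # equal length; chunk i supplies the samples for model state i.
    n = len(sequence)
    bounds = [round(i * n / n_states) for i in range(n_states + 1)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(n_states)]
```

A sequence of length 7 split over 3 states yields chunks of length 2, 3, and 2.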


Initial estimation of the model states


Matching of the reference sequences with the model using the Viterbi algorithm (use the normal distribution based distance measure!)

[Figure: alignment paths of the reference sequences against the model states.]


Reestimation of the model states

Iterate: match the reference sequences with the new model (Viterbi algorithm), then reestimate the model using the new segmentation.
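The reestimation half of this loop can be sketched for 1-D features: given the current segmentation (each frame labelled with its state, e.g. from the Viterbi alignment), recompute each state's mean and variance from the frames assigned to it. Function and variable names are my own.

```python
def reestimate(sequences, segmentations, n_states):
    # Collect, for every state, all frames currently assigned to it.
    buckets = [[] for _ in range(n_states)]
    for seq, seg in zip(sequences, segmentations):
        for frame, state in zip(seq, seg):
            buckets[state].append(frame)
    # New model: per-state empirical mean and variance.
    means, variances = [], []
    for b in buckets:
        m = sum(b) / len(b)
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in b) / len(b))
    return means, variances
```

Matching then uses the normal distribution based distance to these new means and variances, and the two steps alternate until the segmentation stops changing.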

