Probability Theory

Academic year: 2021


Digital Signal Processing and Pattern Recognition

Probability Theory


Objective

Fewer classification errors.

Better models that depend not only on the mean values of the reference vectors but also on their variances.


Example

Classification of animals as fish or bird.

Only one feature: weight.

Collect some reference birds and reference fish.

Create a model for each class: compute the average weight in both classes. The model of a class consists of a single number.
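This single-number model can be sketched in a few lines. All weights below are made-up illustrative values, not data from the lecture:

```python
# Hypothetical reference weights in kg, chosen only for illustration.
fish_weights = [0.001, 2.0, 150.0, 900.0]   # plankton ... whale
bird_weights = [0.004, 0.5, 4.0, 9.0]       # hummingbird ... albatross

# The model of each class is a single number: the mean weight.
m_fish = sum(fish_weights) / len(fish_weights)
m_bird = sum(bird_weights) / len(bird_weights)

def classify(weight):
    # Assign the class whose mean weight is closer.
    return "fish" if abs(weight - m_fish) < abs(weight - m_bird) else "bird"
```

With these numbers a 20 kg fish ends up closer to the bird mean, which is exactly the kind of error discussed on the next slides.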


Example

[Figure: weight axis with the class means m_fish and m_bird; fish examples range from plankton to whale, bird examples from hummingbird to albatross; the test pattern x lies closer to m_bird.]

x is classified as bird although it is a fish.

What is the reason for this error?


Example

[Figure: the same weight axis with the class means m_fish and m_bird and the test pattern x.]

In order to be classified as bird, x would have to have a much smaller distance to the mean of bird than to the mean of fish.

Weight has a small variance in class bird but a high variance in class fish.

A large deviation from the mean weight is much more likely in class fish than in class bird.

We need a better distance measure which takes this into account!


Example

Classification of vowels and consonants by their signal energy.

[Figure: signal energy axis with the class means m_vowel and m_cons; the test pattern x lies closer to m_cons.]

x is classified as consonant although it is a vowel, even though the classes do not overlap and perfect classification should be possible.

Reason: high variance of the energy in class vowel, low variance in class consonant.

The difference to the mean value is not a sufficient distance measure for classification!


Example

Two-dimensional feature vectors.

[Figure: scatter plot of class A and class B over feature 1 (horizontal) and feature 2 (vertical).]


A very mean example

[Figure: scatter plot of class A and class B.]

The variance of each feature is the same in class A and in class B, but feature 1 has a higher variance than feature 2!

Reason for misclassification: the distance in feature component 2 should be weighted higher than the distance in component 1!
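A minimal sketch of this weighting, with assumed means and variances (not values from the lecture): each component of the squared distance is divided by the variance of that feature.

```python
# Illustrative class means; the variances are assumed equal in both classes.
mean_a = (0.0, 0.0)
mean_b = (4.0, 1.0)
var = (16.0, 0.25)   # feature 1 scatters much more than feature 2

def squared_dist(x, mean):
    # Plain squared Euclidean distance.
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))

def weighted_dist(x, mean):
    # Each component is weighted by the inverse of its variance.
    return sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, var))

x = (3.0, 0.2)
```

For this test vector the plain squared distance prefers class B, while the variance-weighted distance prefers class A, because the small-variance feature 2 now dominates the decision.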


Example

Two-dimensional feature vectors, components correlated.

[Figure: scatter plot of class A and class B over grade in Math (horizontal) and grade in Physics (vertical).]


Example

Two-dimensional feature vectors, class-wise different correlation.

[Figure: scatter plot of "universalists" and "specialists" over grade in Science and grade in Language.]


Contents

Mahalanobis Distance

Improved Distance Measure based on a Normal Distribution Assumption

Random Variables, Probability Density

Random Vectors, Joint Density Function

Application of Probability Theory to Training and Classification


Example

Simple special case: the length of the vector sequence is 1 and the vector dimension is 1, so we need a distance measure for numbers.

Reference pattern class A:

Reference pattern class B:

Test pattern:


Example: More than one reference pattern per class

Reference patterns class A:

Reference patterns class B:

Test pattern:


Example: More than one reference pattern per class

Reference patterns class A:

Reference patterns class B:

Test pattern: class A or class B?

Distance measure: absolute or quadratic distance to the class mean.


Empirical mean value, empirical variance (sample mean, sample variance)

Random sample of reference patterns of a class: x_1, ..., x_N

Sample mean: the average over all samples,

m = (1/N) · (x_1 + ... + x_N)

Sample variance: a measure for the scattering, the average squared deviation from the sample mean,

σ² = (1/N) · ((x_1 − m)² + ... + (x_N − m)²)
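The two definitions translate directly into code; this sketch uses 1/N for both, matching the "average squared deviation" definition above:

```python
def sample_mean(xs):
    # Average over all samples.
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Average squared deviation from the sample mean.
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```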


Class A: Class B:

Sample variance in class B is much higher than in class A!


Mahalanobis Distance

Distance measure so far: the squared distance to the class mean, d(x) = (x − m)².

"Improved" distance measure which takes the variance into account, the Mahalanobis distance:

d(x) = (x − m)² / σ²

Motivation: a "normalized" distance measure, the distance of x to the mean relative to the average squared distance to the mean in the class.
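In one dimension the Mahalanobis distance is just the squared distance divided by the class variance. The class statistics below are assumed illustrative values, not numbers from the lecture:

```python
def mahalanobis_1d(x, mean, var):
    # Squared distance to the mean, normalized by the class variance.
    return (x - mean) ** 2 / var

# Assumed class statistics: A is narrow, B is very wide.
mean_a, var_a = 50.0, 4.0
mean_b, var_b = 70.0, 400.0

d_a = mahalanobis_1d(62.0, mean_a, var_a)   # 12**2 / 4
d_b = mahalanobis_1d(62.0, mean_b, var_b)   # 8**2 / 400
```

Here x = 62 is closer to the mean of A in absolute terms, yet its Mahalanobis distance to B is far smaller; as the next slide shows, this normalization can also overshoot.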


Classification with the Mahalanobis distance

Class A: Class B:

Distance to class A vs. distance to class B: every test pattern is assigned to class B!

"Misclassification" with the Mahalanobis distance: classes with a very high variance are preferred too much!


Probability Density and Random Variables


Probability density

[Figure: samples of class A and class B along the feature axis; the widely scattered class B samples lie on both sides of class A.]

62 should be classified as B, although 62 is closer to the mean of class A!

35 should be classified as A.


Example: density function of the normal distribution

f(x) = 1 / sqrt(2πσ²) · exp(−(x − m)² / (2σ²))

The density function depends on only two parameters, the mean and the variance, which can be estimated empirically from some samples.


Classification result with the normal distribution assumption

Class A: Class B:

Class B, class A, class B, class ...
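A sketch of the density-based decision, using assumed illustrative class statistics: the normalization factor 1/sqrt(2πσ²) penalizes high-variance classes, which the bare Mahalanobis distance does not.

```python
import math

def normal_pdf(x, mean, var):
    # f(x) = exp(-(x - mean)**2 / (2 var)) / sqrt(2 pi var)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Assumed class statistics: A is narrow, B is very wide.
mean_a, var_a = 50.0, 4.0
mean_b, var_b = 70.0, 400.0

def classify(x):
    # The class with the higher density wins.
    return "A" if normal_pdf(x, mean_a, var_a) > normal_pdf(x, mean_b, var_b) else "B"
```

With these numbers, 62 goes to class B while 50 goes to class A, so the high-variance class no longer swallows everything near its mean.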


Contour lines of f(x,y) if X and Y are independent


Contour lines of f(x,y) if X and Y are dependent
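Both contour shapes can be reproduced from the bivariate normal density. The parameterization below (covariance entries sxx, syy, sxy) is a generic sketch, not notation from the lecture:

```python
import math

def gaussian2d_pdf(x, y, mx, my, sxx, syy, sxy):
    # Bivariate normal density with covariance matrix [[sxx, sxy], [sxy, syy]].
    det = sxx * syy - sxy * sxy
    dx, dy = x - mx, y - my
    # Quadratic form (x - m)^T Sigma^{-1} (x - m), written out by hand.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(det))
```

With sxy = 0 the contour lines are axis-aligned ellipses (independent X and Y); with sxy != 0 they are tilted, e.g. for sxy = 0.5 the density at (1, 1) exceeds the density at (1, -1).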




Example

Sample of reference vectors from a class:

Empirical mean and empirical variance for each component: empirical mean vector, empirical variance vector.
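Component-wise, this is just the scalar estimates applied per dimension; a sketch for reference vectors given as lists:

```python
def mean_vector(vectors):
    # Empirical mean of each component over all reference vectors.
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def variance_vector(vectors):
    # Empirical variance of each component (average squared deviation).
    n, dim = len(vectors), len(vectors[0])
    m = mean_vector(vectors)
    return [sum((v[i] - m[i]) ** 2 for v in vectors) / n for i in range(dim)]
```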


Application to Viterbi Training and Classification

Objective: an improved model for the classification of feature vector sequences.

Example: reference patterns of some class are given, as sequences of length 6, length 7, and length 3; a model for the class is wanted.


Linear segmentation (nothing new)

[Figure: each reference sequence split evenly across the model states.]
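Linear segmentation can be sketched as cutting each training sequence into contiguous chunks of nearly equal length, one per model state (the function name and rounding scheme are my own choices):

```python
def linear_segmentation(sequence, n_states):
    # Split the sequence into n_states contiguous chunks of nearly
    # equal length; chunk i supplies the samples for model state i.
    n = len(sequence)
    bounds = [round(i * n / n_states) for i in range(n_states + 1)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(n_states)]
```

A sequence of length 7 split over 3 states yields chunks of length 2, 3, and 2.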


Initial estimation of the model states


Matching of the reference sequences with the model using the Viterbi algorithm (use the normal distribution based distance measure!)

[Figure: alignment paths of the reference sequences against the model states.]


Reestimation of the model states

Iterate: match the reference sequences with the new model (Viterbi algorithm), then reestimate the model using the new segmentation.
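The reestimation half of this loop can be sketched for 1-D features: given the current segmentation (each frame labelled with its state, e.g. from the Viterbi alignment), recompute each state's mean and variance from the frames assigned to it. Function and variable names are my own.

```python
def reestimate(sequences, segmentations, n_states):
    # Collect, for every state, all frames currently assigned to it.
    buckets = [[] for _ in range(n_states)]
    for seq, seg in zip(sequences, segmentations):
        for frame, state in zip(seq, seg):
            buckets[state].append(frame)
    # New model: per-state empirical mean and variance.
    means, variances = [], []
    for b in buckets:
        m = sum(b) / len(b)
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in b) / len(b))
    return means, variances
```

Matching then uses the normal distribution based distance to these new means and variances, and the two steps alternate until the segmentation stops changing.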

