Maximum Likelihood Parameter Estimation

Academic year: 2021

- 1 - Digitale Signalverarbeitung und Mustererkennung

Maximum Likelihood Parameter Estimation

- 2 -

Motivation: Training of HMMs

Given:

Sample of feature vectors produced by an HMM state (training data)

Wanted:

Emission probability density of the state

Approach so far:

Assume normal distribution with independent components

Compute empirical mean and variance vectors from the training data
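The "approach so far" can be sketched in a few lines of Python. This is an illustration, not code from the slides; the feature vectors are made-up toy data, and the variance is the biased (divide-by-n) ML variant used throughout this deck:

```python
# Sketch: empirical mean and variance vectors from training vectors,
# assuming a normal distribution with independent components
# (diagonal covariance). The data values are made up for illustration.

def mean_and_variance_vectors(samples):
    """Per-component empirical mean and (biased, ML) variance vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(x[d] for x in samples) / n for d in range(dim)]
    var = [sum((x[d] - mean[d]) ** 2 for x in samples) / n for d in range(dim)]
    return mean, var

# Example: three 2-dimensional feature vectors
data = [(2.0, 1.0), (5.0, 3.0), (3.0, 2.0)]
mu, sigma2 = mean_and_variance_vectors(data)
```

In an HMM training setting, `data` would hold all feature vectors assigned to one state, and `mu`, `sigma2` would parameterize that state's emission density.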

- 3 -

Simplification: numbers instead of vectors

Given:

Random variable X with unknown density function.

Sample of X: 2, 5, 3

Wanted:

Density function for X.

- 4 -

Simplification: numbers instead of vectors

Given:

Random variable X with unknown density function.

Sample of X: 2, 5, 3

Wanted:

Density function for X.

One candidate: a normal distribution with the empirical mean \hat{\mu} = 10/3 and variance \hat{\sigma}^2 = 14/9 computed from the sample.

- 5 -

Simplification: numbers instead of vectors

Given:

Random variable X with unknown density function.

Sample of X: 2, 5, 3

Wanted:

Density function for X.

But another density function also fits the sample → the solution is not unique.

A density concentrated sharply on the observed values is probably a bad choice: "overfitting".

- 6 -

Correct problem formulation

Given:

Random variable X with parametric density function p(x; \theta)

Sample of X: x_1, x_2, \ldots, x_n

Wanted:

Values for the parameters \theta such that the likelihood of the observed sample is maximum, i.e.

\hat{\theta} = \arg\max_{\theta} L(\theta)

Likelihood function:

L(\theta) = \prod_{i=1}^{n} p(x_i; \theta)

Example: normal distribution with parameters \theta = (\mu, \sigma^2).
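The likelihood function can be made concrete with a short Python sketch (an illustration, not part of the slides): evaluate L(\theta) for the toy sample 2, 5, 3 under a normal density, once with parameters fitted to the sample and once with a clearly wrong mean.

```python
import math

# Sketch: the likelihood L(theta) = prod_i p(x_i; theta) of the sample
# 2, 5, 3 under a normal density with parameters mu, sigma2.

def normal_pdf(x, mu, sigma2):
    """Density of the normal distribution with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def likelihood(sample, mu, sigma2):
    """Product of the density values of all observations."""
    prod = 1.0
    for x in sample:
        prod *= normal_pdf(x, mu, sigma2)
    return prod

sample = [2.0, 5.0, 3.0]
L_fit = likelihood(sample, mu=10/3, sigma2=14/9)  # parameters fitted to the sample
L_off = likelihood(sample, mu=0.0, sigma2=14/9)   # clearly mismatched mean
```

Parameters that fit the sample better yield a larger likelihood (`L_fit > L_off`); maximum likelihood estimation searches for the parameters where this value is largest.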

- 7 -

Task:

Find \theta such that L(\theta) is maximum.

Method:

Set the derivatives to zero, i.e. \frac{\partial}{\partial \theta} L(\theta) = 0.

Problem:

Derivatives of products with many factors lead to complicated terms!

Idea: take the logarithm!

Observation:

L(\theta) and \ln L(\theta) have their maximum for the same value of \theta, as ln is monotonically increasing.

Log likelihood function:

\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln p(x_i; \theta)
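Beyond simpler derivatives, the logarithm has a numerical benefit worth noting: a product of many density values underflows in floating point, while the sum of logs stays finite. A small Python sketch (illustration only; the repeated toy sample and the deliberately bad parameters are made up):

```python
import math

# Sketch: the product of many small density values underflows to 0.0,
# while the log-likelihood (sum of logs) remains well-behaved.
# Both are maximized by the same parameter value.

def normal_pdf(x, mu, sigma2):
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

sample = [5.0] * 2000      # 2000 identical toy observations
mu, sigma2 = 0.0, 1.0      # deliberately mismatched parameters

product = 1.0
for x in sample:
    product *= normal_pdf(x, mu, sigma2)   # underflows to exactly 0.0

log_sum = sum(math.log(normal_pdf(x, mu, sigma2)) for x in sample)
```

After the loop, `product` is exactly `0.0`, so comparing candidate parameters via L(\theta) directly would be impossible, whereas `log_sum` is a finite (large negative) number.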

- 8 -

Example: maximum likelihood estimation of the parameters \mu and \sigma^2 of the normal distribution from a sample x_1, x_2, \ldots, x_n

Likelihood function:

L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)

Log likelihood function:

\ell(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

- 9 -

Log likelihood function:

\ell(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

Partial derivative wrt. \mu:

\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \stackrel{!}{=} 0 \quad\Rightarrow\quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i

- 10 -

Log likelihood function:

\ell(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

Partial derivative wrt. \sigma^2:

\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 \stackrel{!}{=} 0 \quad\Rightarrow\quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2

- 11 -

Maximum likelihood estimation of the parameters \mu and \sigma^2 of the normal distribution from a sample x_1, x_2, \ldots, x_n

Result:

\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2

- 12 -

Example: maximum likelihood estimation of the parameter \lambda of the exponential distribution from a sample x_1, x_2, \ldots, x_n

Density of the exponential distribution:

p(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}

Likelihood function (assume all x_i \ge 0):

L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}

Log likelihood function:

\ell(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i

- 13 -

Log likelihood function:

\ell(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i

Derivative wrt. \lambda:

\frac{d\ell}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i \stackrel{!}{=} 0 \quad\Rightarrow\quad \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i}

The estimate is the inverse of the empirical mean.
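A quick sanity check in Python (illustration only; the simulated data and the chosen true rate are made up): drawing many exponentially distributed values and applying \hat{\lambda} = n / \sum x_i recovers the rate used for the simulation.

```python
import random

# Sketch: the ML estimate for the exponential rate is the inverse of
# the empirical mean; it recovers the rate of simulated data.

random.seed(0)
true_lambda = 2.0
sample = [random.expovariate(true_lambda) for _ in range(100_000)]

lambda_hat = len(sample) / sum(sample)   # n / sum(x_i) = 1 / empirical mean
```

With 100,000 samples, `lambda_hat` lands close to `true_lambda`; the residual deviation shrinks as the sample grows.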

- 14 -

Example: maximum likelihood estimation of the parameters a, b of the uniform distribution from a sample x_1, x_2, \ldots, x_n

Density of the uniform distribution:

p(x; a, b) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}

Likelihood function:

L(a, b) = \begin{cases} \frac{1}{(b-a)^n} & \text{if } a \le \min_i x_i \text{ and } b \ge \max_i x_i \\ 0 & \text{otherwise} \end{cases}

L(a, b) is maximum if the covering interval length b - a is minimum, i.e.

\hat{a} = \min_i x_i, \qquad \hat{b} = \max_i x_i
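The case distinction is easy to verify in Python (an illustration, not part of the slides), again using the toy sample 2, 5, 3: the tightest covering interval wins, any wider interval shrinks the likelihood, and a non-covering interval has likelihood zero.

```python
# Sketch: ML estimation for the uniform distribution on [a, b].

def likelihood(sample, a, b):
    """L(a, b) = 1/(b-a)^n if [a, b] covers all samples, else 0."""
    n = len(sample)
    if a <= min(sample) and max(sample) <= b and a < b:
        return 1.0 / (b - a) ** n
    return 0.0

sample = [2.0, 5.0, 3.0]
a_hat, b_hat = min(sample), max(sample)   # tightest covering interval

L_tight = likelihood(sample, a_hat, b_hat)        # 1/3^3
L_wide = likelihood(sample, a_hat - 1.0, b_hat + 1.0)   # wider -> smaller
L_short = likelihood(sample, a_hat + 0.5, b_hat)        # not covering -> 0
```

Note that here the maximum lies on the boundary of the feasible region, so it cannot be found by setting derivatives to zero; this is why the uniform case is argued directly from the shape of L(a, b).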

- 15 -

Example: maximum likelihood estimation of the parameter p of a Bernoulli experiment from a sample x_1, x_2, \ldots, x_n

Probability distribution (discrete distribution!):

P(X = x; p) = p^x (1-p)^{1-x}, \qquad x \in \{0, 1\}

Likelihood function:

L(p) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i} = p^k (1-p)^{n-k} \quad \text{with } k = \sum_{i=1}^{n} x_i

Log likelihood function:

\ell(p) = k \ln p + (n - k) \ln(1 - p)

- 16 -

Log likelihood function:

\ell(p) = k \ln p + (n - k) \ln(1 - p) \quad \text{with } k = \sum_{i=1}^{n} x_i

Derivative wrt. p:

\frac{d\ell}{dp} = \frac{k}{p} - \frac{n - k}{1 - p} \stackrel{!}{=} 0 \quad\Rightarrow\quad \hat{p} = \frac{k}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i
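The Bernoulli result reduces to counting, which a one-function Python sketch makes explicit (illustration only; the 0/1 sample is made up):

```python
# Sketch: the ML estimate for the Bernoulli parameter p is the
# relative frequency of ones in the sample, p_hat = k / n.

def ml_bernoulli(sample):
    """sample is a list of 0/1 outcomes; returns p_hat = k / n."""
    n = len(sample)
    k = sum(sample)     # number of ones
    return k / n

p_hat = ml_bernoulli([1, 0, 1, 1, 0, 1, 0, 1])   # k = 5, n = 8
```

This is the discrete counterpart of the earlier results: just as \hat{\mu} is the empirical mean of real-valued observations, \hat{p} is the empirical mean of 0/1 observations.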
