DENSITY ESTIMATION
Estimating an Unknown
Probability Density Function
Th&Ko §2.5 / DHS §3.1-3.4
Density Estimation
• Parametric techniques
  • Maximum Likelihood
  • Maximum A Posteriori
  • Bayesian Inference
  • Gaussian Mixture Models (GMM) – EM algorithm
• Non-parametric techniques
  • Histogram
  • Parzen windows
  • k-nearest-neighbor rule
Estimation of an unknown PDF
Task: Estimate the parameters of a pdf with known structure from a set of data.
E.g., $p(x) \equiv p(x;\Theta)$ is known to be Gaussian $N(\mu,\Sigma)$, with unknown $\Theta = [\mu_1,\dots,\mu_l,\sigma_{11},\dots,\sigma_{ll}]^T$.

Formal:
Let $x_1, x_2, \dots, x_N$ be known and independent (i.i.d.) samples, $X = \{x_1, x_2, \dots, x_N\}$.
Let $p(x)$ be known within a vector parameter $\Theta$: $p(x) \equiv p(x;\Theta)$.
$$p(X;\Theta) \equiv p(x_1, x_2, \dots, x_N;\Theta) = \prod_{k=1}^{N} p(x_k;\Theta),$$
which is known as the likelihood of $\Theta$ w.r.t. $X$.

$$\hat{\Theta}_{ML} := \arg\max_{\Theta} \prod_{k=1}^{N} p(x_k;\Theta)$$
Maximum Likelihood Method (ML)

Search for the parameter $\hat{\Theta}_{ML}$ that maximizes $\prod_{k=1}^{N} p(x_k;\Theta)$:
$$\frac{\partial}{\partial\Theta} \prod_{k=1}^{N} p(x_k;\Theta) = 0 \quad\text{(a necessary condition)}$$
Since $\ln$ is monotonic, we can equivalently work with the log-likelihood
$$L(\Theta) \equiv \ln p(X;\Theta) = \sum_{k=1}^{N} \ln p(x_k;\Theta)$$
$$\hat{\Theta}_{ML}: \quad \frac{\partial L(\Theta)}{\partial\Theta} = \sum_{k=1}^{N} \frac{1}{p(x_k;\Theta)} \frac{\partial p(x_k;\Theta)}{\partial\Theta} = 0$$
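The maximization above can be checked numerically. The following is a minimal sketch, assuming a 1-D Gaussian with known variance and toy synthetic data (true mean 2.0 is an assumption of the example): it evaluates the log-likelihood $L(\mu)$ over a grid of candidate means and confirms that the maximizer sits next to the sample mean, the closed-form ML solution derived later.

```python
import math
import random

def log_likelihood(mu, xs, sigma2=1.0):
    # L(mu) = sum_k ln p(x_k; mu) for a 1-D Gaussian N(mu, sigma2)
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in xs)

random.seed(0)
xs = [random.gauss(2.0, 1.0) for _ in range(500)]  # toy data, true mean 2.0

# Brute-force search over a grid of candidate parameters
grid = [i / 100 for i in range(401)]               # mu in [0, 4], step 0.01
mu_hat = max(grid, key=lambda mu: log_likelihood(mu, xs))

print(mu_hat, sum(xs) / len(xs))  # grid maximizer lies next to the sample mean
```

Because $L(\mu)$ is concave in $\mu$, the grid search lands on the grid point nearest the analytical maximizer.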
Properties of the Maximum Likelihood Method

If there is a true value $\Theta_0$ (i.e., $p(x) \equiv p(x;\Theta_0)$), the ML estimate $\hat{\Theta}_{ML}$ has the following properties (no proofs):

a) $\hat{\Theta}_{ML}$ is asymptotically unbiased and converges in the mean:
$$\lim_{N\to\infty} E[\hat{\Theta}_{ML}] = \Theta_0$$
b) $\hat{\Theta}_{ML}$ is asymptotically consistent and converges in probability:
$$\lim_{N\to\infty} \operatorname{prob}\{\hat{\Theta}_{ML} = \Theta_0\} = 1$$
c) $\hat{\Theta}_{ML}$ is asymptotically consistent and converges in mean square:
$$\lim_{N\to\infty} E\left[\|\hat{\Theta}_{ML} - \Theta_0\|^2\right] = 0$$
ML Example 1:

$p(x)\!: N(\mu,\Sigma)$ with $\mu$ unknown, samples $x_1, x_2, \dots, x_N$, and $p(x_k) \equiv p(x_k;\mu)$:
$$p(x_k;\mu) = \frac{1}{(2\pi)^{l/2}|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x_k-\mu)^T \Sigma^{-1}(x_k-\mu)\right)$$
$$L(\mu) = \sum_{k=1}^{N} \ln p(x_k;\mu) = C - \frac{1}{2}\sum_{k=1}^{N}(x_k-\mu)^T \Sigma^{-1}(x_k-\mu)$$
Remember: if $A = A^T$, then $\frac{\partial}{\partial x}\left(x^T A x\right) = 2Ax$.
$$\frac{\partial L(\mu)}{\partial\mu} = \sum_{k=1}^{N} \Sigma^{-1}(x_k-\mu) = 0 \quad\Rightarrow\quad \hat{\mu}_{ML} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
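A quick numerical check of the closed-form result above, on assumed toy 2-D data with identity covariance and true mean $(1, -2)$: the ML estimate of the mean is simply the component-wise sample average.

```python
import random

random.seed(1)
# two-dimensional toy samples around mu = (1, -2), identity covariance
X = [(random.gauss(1.0, 1.0), random.gauss(-2.0, 1.0)) for _ in range(1000)]
N = len(X)

# ML estimate of the mean: mu_hat_ML = (1/N) * sum_k x_k, component-wise
mu_ml = tuple(sum(x[i] for x in X) / N for i in range(2))
print(mu_ml)  # close to the true mean (1, -2)
```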
ML Example 2:

$p(x)\!: N(\mu, \sigma^2 I)$ with $\mu, \sigma^2$ unknown, samples $x_1,\dots,x_N$, and $p(x_k) \equiv p(x_k;\mu,\sigma^2)$:
$$p(x_k;\mu,\sigma^2) = \frac{1}{(2\pi\sigma^2)^{l/2}} \exp\!\left(-\frac{\|x_k-\mu\|^2}{2\sigma^2}\right)$$
$$L(\mu,\sigma^2) = \sum_{k=1}^{N}\ln p(x_k;\mu,\sigma^2) = -\frac{Nl}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^{N}\|x_k-\mu\|^2$$
$$\frac{\partial L}{\partial\mu} = \frac{1}{\sigma^2}\sum_{k=1}^{N}(x_k-\mu) = 0 \quad\Rightarrow\quad \hat{\mu}_{ML} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
$$\frac{\partial L}{\partial\sigma^2} = -\frac{Nl}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{N}\|x_k-\mu\|^2 = 0 \quad\Rightarrow\quad \hat{\sigma}^2_{ML} = \frac{1}{Nl}\sum_{k=1}^{N}\|x_k-\hat{\mu}_{ML}\|^2$$
However, the true $\mu$ is unknown, therefore we have to use $\hat{\mu}_{ML}$, and the resulting estimate is biased:
$$E[\hat{\sigma}^2_{ML}] = \left(1-\frac{1}{N}\right)\sigma^2$$
Maximum likelihood estimates are only asymptotically unbiased, so N should be large enough!
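The bias factor $(1 - 1/N)$ can be seen empirically. A sketch under assumed toy conditions ($l = 1$, true $\sigma^2 = 1$, small sample size $N = 5$): averaging $\hat{\sigma}^2_{ML}$ over many repeated samples approaches $(1 - 1/N)\sigma^2 = 0.8$ rather than the true value $1.0$.

```python
import random

random.seed(2)
N, trials, true_var = 5, 20000, 1.0

# Average the ML variance estimate over many independent samples of size N
acc = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]
    m = sum(xs) / N                              # mu_hat_ML (sample mean)
    acc += sum((x - m) ** 2 for x in xs) / N     # sigma2_hat_ML
mean_est = acc / trials

print(mean_est, (1 - 1 / N) * true_var)  # empirical mean vs. (1 - 1/N) * sigma^2
```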
ML Example 3:

$p(x)\!: N(\mu,\Sigma)$ with $\mu, \Sigma$ unknown, samples $x_1,\dots,x_N$, and $p(x_k) \equiv p(x_k;\mu,\Sigma)$:
$$p(x_k;\mu,\Sigma) = \frac{1}{(2\pi)^{l/2}|\Sigma|^{1/2}}\exp\!\left(-\tfrac{1}{2}(x_k-\mu)^T\Sigma^{-1}(x_k-\mu)\right)$$
$$L(\mu,\Sigma) = \sum_{k=1}^{N}\ln p(x_k;\mu,\Sigma) = -\frac{N}{2}\ln\left((2\pi)^l|\Sigma|\right) - \frac{1}{2}\sum_{k=1}^{N}(x_k-\mu)^T\Sigma^{-1}(x_k-\mu)$$
Setting the derivatives w.r.t. $\mu$ and w.r.t. all entries $\sigma_{11},\dots,\sigma_{ll}$ of $\Sigma$ to zero yields:
$$\hat{\mu}_{ML} = \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad \hat{\Sigma}_{ML} = \frac{1}{N}\sum_{k=1}^{N}(x_k-\hat{\mu}_{ML})(x_k-\hat{\mu}_{ML})^T$$
Maximum likelihood estimates are only asymptotically unbiased, so we need a large N!
WARNING: An unbiased estimator is also no guarantee for a correct result!
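A numerical sanity check of $\hat{\Sigma}_{ML}$ on assumed toy correlated 2-D data: with $x_2 = x_1 + \text{noise}$ and unit-variance components, the construction's covariance is $\begin{pmatrix}1 & 1\\ 1 & 2\end{pmatrix}$, and the averaged outer products recover it.

```python
import random

random.seed(3)
N = 2000
# toy correlated 2-D data: second coordinate = first + independent noise,
# so the true covariance is [[1, 1], [1, 2]]
X = []
for _ in range(N):
    a = random.gauss(0.0, 1.0)
    X.append((a, a + random.gauss(0.0, 1.0)))

mu = [sum(x[i] for x in X) / N for i in range(2)]

# Sigma_hat_ML = (1/N) * sum_k (x_k - mu_hat)(x_k - mu_hat)^T
S = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / N
      for j in range(2)] for i in range(2)]
print(S)  # approximately [[1, 1], [1, 2]]
```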
Maximum A Posteriori Probability estimation (MAP)

(In ML, $\Theta$ was considered as a parameter.) Here we shall look at $\Theta$ as a random vector described by a pdf $p(\Theta)$, assumed to be known.

Given $X = \{x_1, x_2, \dots, x_N\}$, compute the maximum of $p(\Theta|X)$.
From Bayes' theorem:
$$p(\Theta)\,p(X|\Theta) = p(X)\,p(\Theta|X) \quad\text{or}\quad p(\Theta|X) = \frac{p(\Theta)\,p(X|\Theta)}{p(X)}$$

The method:
$$\hat{\Theta}_{MAP} = \arg\max_{\Theta} p(\Theta|X) \quad\text{or}\quad \hat{\Theta}_{MAP}: \frac{\partial}{\partial\Theta}\left(p(\Theta)\,p(X|\Theta)\right) = 0$$
If $p(\Theta)$ is uniform or broad enough, $\hat{\Theta}_{MAP} \approx \hat{\Theta}_{ML}$.
MAP Example:

$p(x)\!: N(\mu, \sigma^2 I)$ with $\mu$ unknown, $X = \{x_1,\dots,x_N\}$, and a Gaussian prior on $\mu$:
$$p(\mu) = \frac{1}{(2\pi)^{l/2}\sigma_\mu^{\,l}}\exp\!\left(-\frac{\|\mu-\mu_0\|^2}{2\sigma_\mu^2}\right)$$
$$\hat{\mu}_{MAP}: \quad \frac{\partial}{\partial\mu}\ln\!\left(p(\mu)\prod_{k=1}^{N}p(x_k|\mu)\right) = 0 \quad\text{or}\quad \sum_{k=1}^{N}\frac{1}{\sigma^2}(x_k-\hat{\mu}_{MAP}) - \frac{1}{\sigma_\mu^2}(\hat{\mu}_{MAP}-\mu_0) = 0$$
$$\Rightarrow\quad \hat{\mu}_{MAP} = \frac{\mu_0 + \frac{\sigma_\mu^2}{\sigma^2}\sum_{k=1}^{N} x_k}{1 + \frac{\sigma_\mu^2}{\sigma^2}N}$$
For $\frac{\sigma_\mu^2}{\sigma^2} \gg 1$, or for $N\to\infty$:
$$\hat{\mu}_{MAP} \approx \hat{\mu}_{ML} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
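A minimal sketch of the closed-form MAP estimate above, in 1-D with assumed toy values for the prior mean $\mu_0$, the data variance $\sigma^2$, and the prior variance $\sigma_\mu^2$: the MAP estimate is pulled from the ML estimate (the sample mean) toward the prior mean, and the pull vanishes as $N$ grows.

```python
import random

random.seed(4)
sigma2, sigma_mu2, mu0 = 1.0, 0.5, 0.0   # assumed likelihood/prior parameters
xs = [random.gauss(3.0, 1.0) for _ in range(10)]  # toy data, true mean 3.0
N = len(xs)

mu_ml = sum(xs) / N
# MAP estimate for a Gaussian prior N(mu0, sigma_mu2) on mu:
# mu_map = (mu0 + (sigma_mu2/sigma2) * sum_k x_k) / (1 + N * sigma_mu2/sigma2)
mu_map = (mu0 + (sigma_mu2 / sigma2) * sum(xs)) / (1 + N * sigma_mu2 / sigma2)

print(mu_ml, mu_map)  # mu_map lies between the prior mean mu0 and mu_ml
```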