Markov Models and Hidden Markov Models for sequence analysis
Michael Seifert
IPK Gatersleben
16.01.07 - 17.01.07 seifert@ipk-gatersleben.de
Overview
1 CpG Islands
2 Markov Models
3 Hidden Markov Models
CpG Islands - Motivation
genome segments with a high frequency of the dinucleotide CG
segment size: 100 up to 1000 bp
in different organisms (human, fly, mouse, worm, arabidopsis, ... )
CpG Islands - Evolution
C in the dinucleotide CG is often methylated
methylated C is frequently mutated to T: CG → TG
suppression of methylation in promoter regions of certain genes (e.g. housekeeping genes)
there the mutation of C to T is suppressed
the dinucleotide CG occurs more frequently in promoter regions than in other genome regions ⇒ CpG Islands
CpG Islands - Classification of short DNA segments
set of short DNA segments {o^1, ..., o^M}
Question
How can we decide for each DNA segment om if it is from a CpG Island or not?
CpG Islands - Basics of Modeling
dinucleotides over the DNA alphabet {A,C,G,T}: AA, AC, AG, ..., TT
DNA segments of CpG Islands
DNA segments of background (not CpG Islands)
Modeling DNA segments
random vector O = O_1, ..., O_T
O_t random variable over {A,C,G,T}
CpG Islands - Using Markov Models for Classification
First-order homogeneous Markov Model for DNA
Model Assumptions
O_1 is independent of O_2, ..., O_T
O_{t+1} depends only on O_t
homogeneous: one probability distribution for all O_{t+1} that depend on O_t
Markov Model λ = (S, π, A)
set of states S = {A,C,G,T}
start distribution π = (π_A, π_C, π_G, π_T)
stochastic transition matrix A = (a_ij)_{i,j∈S}
CpG Islands - Using Markov Models for Classification
First-order homogeneous Markov Model for DNA
Likelihood
P[O = o | λ] = P[O_1 = o_1 | λ] · ∏_{t=1}^{T−1} P[O_{t+1} = o_{t+1} | O_t = o_t, λ]
             = π_{o_1} · ∏_{t=1}^{T−1} a_{o_t o_{t+1}}
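As a minimal sketch (not part of the original slides), the likelihood above can be computed in log space; the start distribution pi and transition matrix A are assumed to be given as NumPy arrays indexed in the fixed order A, C, G, T.

```python
import numpy as np

NUCS = "ACGT"  # state set S of the Markov model, in fixed index order

def markov_log_likelihood(segment, pi, A):
    """log P[O = o | lambda] = log pi_{o_1} + sum_{t=1}^{T-1} log a_{o_t o_{t+1}}."""
    o = [NUCS.index(c) for c in segment]
    logp = np.log(pi[o[0]])               # start probability of the first nucleotide
    for t in range(len(o) - 1):
        logp += np.log(A[o[t], o[t + 1]])  # transition probability o_t -> o_{t+1}
    return logp
```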
CpG Islands - Graphical Representation of Markov Models
CpG Islands - Classifier for short DNA segments
Create two Markov Models
λ_CpG = (S, π, A_CpG) for CpG Islands
λ_¬CpG = (S, π, A_¬CpG) for background
Make a Maximum Likelihood estimation
for λ_CpG using CpG Island training data
for λ_¬CpG using background training data
Decide for each of the short DNA segments {o^1, ..., o^M} whether it belongs to CpG Islands or background
using the score S(o^m) = log( P[O = o^m | λ_CpG] / P[O = o^m | λ_¬CpG] )
S(o^m) > ε: o^m is a CpG Island
S(o^m) < −ε: o^m is background
CpG Islands - Estimated Transition Matrices
A_CpG = (a^CpG_ij):
       A      C      G      T
A    0.180  0.274  0.426  0.120
C    0.171  0.308  0.274  0.188
G    0.161  0.339  0.375  0.125
T    0.079  0.355  0.389  0.182

A_¬CpG = (a^¬CpG_ij):
       A      C      G      T
A    0.300  0.205  0.285  0.210
C    0.322  0.298  0.078  0.302
G    0.248  0.246  0.298  0.208
T    0.177  0.239  0.292  0.292
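A hedged sketch of the classifier with the transition matrices from this slide. The start distribution π is not given here, so a uniform π is assumed for illustration (it cancels in the score because both models share the same π); the threshold ε and the example segments are likewise illustrative.

```python
import numpy as np

NUCS = "ACGT"
A_CPG = np.array([[0.180, 0.274, 0.426, 0.120],
                  [0.171, 0.308, 0.274, 0.188],
                  [0.161, 0.339, 0.375, 0.125],
                  [0.079, 0.355, 0.389, 0.182]])
A_BG = np.array([[0.300, 0.205, 0.285, 0.210],
                 [0.322, 0.298, 0.078, 0.302],
                 [0.248, 0.246, 0.298, 0.208],
                 [0.177, 0.239, 0.292, 0.292]])
PI = np.full(4, 0.25)  # assumption: uniform start distribution (not given on the slide)

def log_lik(segment, pi, A):
    o = [NUCS.index(c) for c in segment]
    return np.log(pi[o[0]]) + sum(np.log(A[o[t], o[t + 1]]) for t in range(len(o) - 1))

def classify(segment, eps=0.0):
    """S(o) = log P[O=o | lambda_CpG] - log P[O=o | lambda_background]."""
    s = log_lik(segment, PI, A_CPG) - log_lik(segment, PI, A_BG)
    if s > eps:
        return s, "CpG Island"
    if s < -eps:
        return s, "background"
    return s, "undecided"

print(classify("CGCGCGGC"))  # positive score -> classified as CpG Island
print(classify("ATATATAT"))  # negative score -> classified as background
```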
CpG Islands - Detection in large DNA segments
Large DNA segments
can contain different numbers of CpG Islands
Markov Models
classify short DNA segments
cannot model transitions between CpG Islands and background
Markov Models for large DNA segments
segment a large DNA segment o into short DNA segments o^1, ..., o^M
classify each short DNA segment using the Markov Models λ_CpG and λ_¬CpG
CpG Islands - Detection in large DNA segments
Problems
CpG Islands have variable lengths
How should we segment large DNA segments?
We require another model for analyzing large DNA segments!
use large DNA segments without segmentation
model CpG Islands and background in one model
detection of CpG Island and background segments
CpG Islands - Detection in large DNA segments
New model for large DNA segments
two states
CpG+ for CpG Islands
CpG− for background
transition probabilities
CpG+ → CpG+: extend CpG Island
CpG+ → CpG−: change from CpG Island to background
CpG− → CpG+: change from background to CpG Island
CpG− → CpG−: extend background
start probabilities for CpG+ and CpG−
CpG Islands - Detection in large DNA segments
We have defined a Markov Model!
But how can this model work with large DNA segments over the alphabet {A,C,G,T}?
state CpG+ gets an emission distribution for {A,C,G,T}
state CpG− gets an emission distribution for {A,C,G,T}
e.g. useful for CpG Islands: the probability of nucleotide o_{t+1} could depend on o_t, but not in this lecture
Now we have motivated a Hidden Markov Model!
CpG Islands - Hidden Markov Models
Modeling
emission sequence: random vector O = O_1, ..., O_T
O_t random variable over {A,C,G,T}
state sequence: random vector Q = Q_1, ..., Q_T
Q_t random variable over {CpG+, CpG−}
Hidden Markov Model for large DNA segments
Model Assumptions
O_t is independent of all other O_d with d ≠ t
O_t depends on Q_t
Q_1 is independent of Q_2, ..., Q_T
Q_{t+1} depends only on Q_t
emission sequence o is known and state sequence q is unknown (hidden)
CpG Islands - Hidden Markov Models
Hidden Markov Model for large DNA segments
λ = (Σ, S, π, A, B)
emission alphabet Σ = {A,C,G,T}
set of states S = {CpG+, CpG−}
start distribution π = (π_CpG+, π_CpG−)
stochastic transition matrix A = (a_ij)_{i,j∈S}
stochastic emission matrix B = (b_i(v))_{i∈S, v∈Σ}
CpG Islands - Hidden Markov Models
Hidden Markov Model for large DNA segments
Complete-Data-Likelihood
P[O = o, Q = q | λ] = P[Q_1 = q_1 | λ] · ∏_{t=1}^{T−1} P[Q_{t+1} = q_{t+1} | Q_t = q_t, λ] · ∏_{t=1}^{T} P[O_t = o_t | Q_t = q_t, λ]
                    = π_{q_1} · ∏_{t=1}^{T−1} a_{q_t q_{t+1}} · ∏_{t=1}^{T} b_{q_t}(o_t)
runtime: O(T)
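A minimal sketch of this formula in log space, assuming emissions and states are already encoded as integer indices into π, A and B (these array names are illustrative, not from the slides).

```python
import numpy as np

def complete_data_log_likelihood(o, q, pi, A, B):
    """log P[O=o, Q=q | lambda]; runs in O(T) as stated on the slide."""
    logp = np.log(pi[q[0]]) + np.log(B[q[0], o[0]])  # start state and first emission
    for t in range(1, len(o)):
        logp += np.log(A[q[t - 1], q[t]])            # state transition q_{t-1} -> q_t
        logp += np.log(B[q[t], o[t]])                # emission o_t from state q_t
    return logp
```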
CpG Islands - Hidden Markov Models
Central Questions
1 Likelihood of emission sequence o under HMM λ?
2 Probability of state i at time step t for given emission sequence o?
3 Probability of a transition from state i to state j at time step t for given emission sequence o?
4 Most probable state sequence q∗ for a given emission sequence o under HMM λ?
5 Maximum Likelihood estimation of HMM λ?
Likelihood of emission sequence o under HMM λ?
Naive Approach
emission sequence o = o_1, ..., o_T
use the Complete-Data-Likelihood P[O = o, Q = q | λ]
marginalize over all |S|^T state sequences q:
P[O = o | λ] = Σ_{q∈S^T} P[O = o, Q = q | λ]
problem: the number of state sequences grows exponentially with the length of the emission sequence
runtime: O(T · |S|^T)
Likelihood of emission sequence o under HMM λ?
Forward-Algorithm
dynamic programming
Forward-Variable: α_t(i) := P[O_1^t = o_1^t, Q_t = i | λ]
Probability to observe emissions o_1, ..., o_t and to be in state i at time step t under HMM λ.
Algorithm
Initialization: ∀ i ∈ S: α_1(i) = π_i · b_i(o_1)
Iteration: ∀ 1 ≤ t < T ∀ i ∈ S: α_{t+1}(i) = ( Σ_{j∈S} α_t(j) · a_ji ) · b_i(o_{t+1})
Likelihood of emission sequence o under HMM λ?
Forward-Algorithm
Likelihood: P[O = o | λ] = Σ_{i∈S} α_T(i)
Runtime
α_{t+1}(i) in O(|S|)
|S| Forward-Variables α_t(i) per time step t
T time steps in total
Forward-Algorithm requires O(T · |S|^2)
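A sketch of the Forward-Algorithm with NumPy (array names are illustrative). This unscaled version follows the slide directly; for long sequences one would use scaled variables or log probabilities to avoid underflow.

```python
import numpy as np

def forward(o, pi, A, B):
    """alpha[t, i] = P[O_1..O_t = o_1..o_t, Q_t = i | lambda]; runtime O(T * |S|^2)."""
    T, N = len(o), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, o[0]]                          # initialization
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, o[t + 1]]  # iteration
    return alpha

# likelihood of the emission sequence:
# P[O = o | lambda] = forward(o, pi, A, B)[-1].sum()
```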
Probability of state i at time step t for given emission sequence o?
Gamma-Variable
γ_t(i) := P[Q_t = i | O = o, λ]
        = P[O_1^T = o_1^T, Q_t = i | λ] / P[O = o | λ]
        = P[O_{t+1}^T = o_{t+1}^T | O_1^t = o_1^t, Q_t = i, λ] · P[O_1^t = o_1^t, Q_t = i | λ] / P[O = o | λ]
        = P[O_{t+1}^T = o_{t+1}^T | O_1^t = o_1^t, Q_t = i, λ] · α_t(i) / P[O = o | λ]
O_t is independent of all other O_d with d ≠ t, and Q_{t+1} depends only on Q_t, therefore
P[O_{t+1}^T = o_{t+1}^T | O_1^t = o_1^t, Q_t = i, λ] = P[O_{t+1}^T = o_{t+1}^T | Q_t = i, λ]
Probability of state i at time step t for given emission sequence o?
Backward-Algorithm
dynamic programming
Backward-Variable: β_t(i) := P[O_{t+1}^T = o_{t+1}^T | Q_t = i, λ]
Probability to observe emissions o_{t+1}, ..., o_T given state i at time step t
Algorithm
Initialization: ∀ i ∈ S: β_T(i) = 1
Iteration: ∀ T > t ≥ 1 ∀ i ∈ S: β_t(i) = Σ_{j∈S} a_ij · b_j(o_{t+1}) · β_{t+1}(j)
Probability of state i at time step t for given emission sequence o?
Backward-Algorithm
Runtime
β_t(i) in O(|S|)
|S| Backward-Variables β_t(i) per time step t
T time steps in total
Backward-Algorithm requires O(T · |S|^2)
Gamma-Variable
γ_t(i) = α_t(i) · β_t(i) / P[O = o | λ]
P[O = o | λ] = Σ_{i∈S} P[O = o, Q_t = i | λ] = Σ_{i∈S} α_t(i) · β_t(i)
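A matching sketch of the Backward-Algorithm and the Gamma-Variable, reusing the alpha array from the forward sketch above (again an illustrative implementation, not from the slides).

```python
import numpy as np

def backward(o, A, B):
    """beta[t, i] = P[O_{t+1}..O_T | Q_t = i, lambda]; runtime O(T * |S|^2)."""
    T, N = len(o), A.shape[0]
    beta = np.ones((T, N))                            # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, o[t + 1]] * beta[t + 1])  # iteration
    return beta

def gamma(alpha, beta):
    """gamma[t, i] = alpha_t(i) * beta_t(i) / P[O=o | lambda]; each row sums to 1."""
    g = alpha * beta
    return g / g.sum(axis=1, keepdims=True)           # sum_i alpha_t(i)*beta_t(i) = P[O=o|lambda]
```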
Probability of state i at time step t for given emission sequence o?
Gamma-Variable
requires Forward- and Backward-Algorithm for efficient computation
usage
posterior decoding
required for training
test for the implementation
∀ 1 ≤ t ≤ T: Σ_{i∈S} γ_t(i) = 1
Probability of a transition from state i to state j at time step t for given emission sequence o?
Epsilon-Variable
ε_t(i,j) := P[Q_t = i, Q_{t+1} = j | O = o, λ]
          = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P[O = o | λ]
How can this be derived?
P[O = o, Q_t = i, Q_{t+1} = j | λ]
= P[O_{t+2}^T = o_{t+2}^T | Q_{t+1} = j, Q_t = i, O_1^{t+1} = o_1^{t+1}, λ] · P[O_{t+1} = o_{t+1} | Q_{t+1} = j, Q_t = i, O_1^t = o_1^t, λ] · P[Q_{t+1} = j | Q_t = i, O_1^t = o_1^t, λ] · P[O_1^t = o_1^t, Q_t = i | λ]
= P[O_{t+2}^T = o_{t+2}^T | Q_{t+1} = j, λ] · P[O_{t+1} = o_{t+1} | Q_{t+1} = j, λ] · P[Q_{t+1} = j | Q_t = i, λ] · P[O_1^t = o_1^t, Q_t = i | λ]
= β_{t+1}(j) · b_j(o_{t+1}) · a_ij · α_t(i)
Dividing by P[O = o | λ] gives the Epsilon-Variable.
Probability of a transition from state i to state j at time step t for given emission sequence o?
Epsilon-Variable
requires Forward- and Backward-Algorithm for efficient computation
usage
required for training
test for the implementation
∀ 1 ≤ t < T: Σ_{j∈S} ε_t(i,j) = γ_t(i)
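A sketch of the Epsilon-Variable computed from the alpha and beta arrays of the sketches above, both assumed to have shape (T, |S|) (illustrative implementation).

```python
import numpy as np

def epsilon(o, A, B, alpha, beta):
    """eps[t, i, j] = P[Q_t = i, Q_{t+1} = j | O=o, lambda] for 1 <= t < T."""
    T, N = len(o), A.shape[0]
    likelihood = alpha[-1].sum()  # P[O = o | lambda]
    eps = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        eps[t] = alpha[t][:, None] * A * (B[:, o[t + 1]] * beta[t + 1])[None, :]
    return eps / likelihood

# implementation test from the slide: eps[t].sum(axis=1) should equal gamma_t(i)
```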
Most probable state sequence q∗ for a given emission sequence o under HMM λ?
Viterbi-Path
q∗ = argmax_{q∈S^T} P[Q = q | O = o, λ]
   = argmax_{q∈S^T} P[O = o, Q = q | λ] / P[O = o | λ]
   = argmax_{q∈S^T} P[O = o, Q = q | λ]
Most probable state sequence q∗ for a given emission sequence o under HMM λ?
Naive Approach
emission sequence o = o_1, ..., o_T
use the Complete-Data-Likelihood P[O = o, Q = q | λ]
compute the Complete-Data-Likelihood for all |S|^T state sequences q:
q∗ = argmax_{q∈S^T} P[O = o, Q = q | λ]
problem: the number of state sequences grows exponentially with the length of the emission sequence
runtime: O(T · |S|^T)
Most probable state sequence q∗ for a given emission sequence o under HMM λ?
Viterbi-Algorithm
dynamic programming
most probable state sequence q∗ consists of most probable subsequences
basics
Delta-Variable: probability of the most probable subsequence q_1, ..., q_{t−1}, i for emissions o_1, ..., o_t
δ_t(i) = max_{q_1^{t−1} ∈ S^{t−1}} P[O_1^t = o_1^t, Q_1^{t−1} = q_1^{t−1}, Q_t = i | λ]
Psi-Variable: pointer for the trace back
Ψ_t(i) = argmax_{j∈S} δ_{t−1}(j) · a_ji
Most probable state sequence q∗ for a given emission sequence o under HMM λ?
Viterbi-Algorithm
Algorithm
Initialization: ∀ i ∈ S: δ_1(i) = π_i · b_i(o_1)
Iteration: ∀ 1 ≤ t < T ∀ i ∈ S:
δ_{t+1}(i) = max_{j∈S} ( δ_t(j) · a_ji ) · b_i(o_{t+1})
Ψ_{t+1}(i) = argmax_{j∈S} δ_t(j) · a_ji
Most probable state sequence q∗ for a given emission sequence o under HMM λ?
Reconstruction of the Viterbi-Path
use the pointers for trace back
Initialization: q∗_T = argmax_{j∈S} δ_T(j)
Iteration: ∀ T ≥ t ≥ 2: q∗_{t−1} = Ψ_t(q∗_t)
Most probable state sequence q∗ for a given emission sequence o under HMM λ?
Viterbi-Algorithm
Runtime
δ_t(i) and Ψ_t(i) in O(|S|)
|S| δ_t(i)'s and Ψ_t(i)'s per time step, T time steps in total
reconstruction in O(T)
Viterbi-Algorithm computes the Viterbi-Path in O(T · |S|^2 + T)
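A sketch of the Viterbi-Algorithm in log space (an illustrative implementation; log probabilities avoid underflow for long sequences but are otherwise equivalent to the slide's formulation).

```python
import numpy as np

def viterbi(o, pi, A, B):
    """Most probable state sequence q* = argmax_q P[O=o, Q=q | lambda]."""
    T, N = len(o), len(pi)
    log_delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    log_delta[0] = np.log(pi) + np.log(B[:, o[0]])      # initialization
    for t in range(1, T):
        scores = log_delta[t - 1][:, None] + np.log(A)  # scores[j, i] = log(delta_{t-1}(j) * a_ji)
        psi[t] = scores.argmax(axis=0)                  # trace-back pointers Psi_t(i)
        log_delta[t] = scores.max(axis=0) + np.log(B[:, o[t]])
    q = np.zeros(T, dtype=int)                          # reconstruction of the Viterbi-Path
    q[-1] = log_delta[-1].argmax()
    for t in range(T - 1, 0, -1):
        q[t - 1] = psi[t, q[t]]
    return q
```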
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is known
emission sequence o = o_1, ..., o_T
state sequence q = q_1, ..., q_T
maximize the Complete-Data-Likelihood P[O = o, Q = q | λ]
Maximum Likelihood Estimators
π_i = |{i : q_1 = i}|
a_ij = |{t : 1 ≤ t < T ∧ q_t = i ∧ q_{t+1} = j}| / |{t : 1 ≤ t < T ∧ q_t = i}|
b_i(j) = |{t : 1 ≤ t ≤ T ∧ o_t = j ∧ q_t = i}| / |{t : 1 ≤ t ≤ T ∧ q_t = i}|
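A sketch of these count-based estimators for a single labelled sequence; states and emissions are assumed to be integer-encoded, and with only one sequence π becomes an indicator of the first state. Pseudocounts (next slide) would avoid zero counts for states that never occur in the training data.

```python
import numpy as np

def estimate_from_labels(o, q, n_states, n_symbols):
    """Count-based ML estimators for pi, A and B from a known state sequence q."""
    pi = np.zeros(n_states)
    A = np.zeros((n_states, n_states))
    B = np.zeros((n_states, n_symbols))
    pi[q[0]] = 1.0                         # |{i : q_1 = i}| for a single sequence
    for t in range(len(o) - 1):
        A[q[t], q[t + 1]] += 1             # count transitions i -> j
    for t in range(len(o)):
        B[q[t], o[t]] += 1                 # count emissions of symbol j from state i
    A /= A.sum(axis=1, keepdims=True)      # normalize rows to probabilities
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B
```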
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is known
pseudocounts or Maximum A Posteriori Estimation
Try it with M independent pairs o^m and q^m!
P[O^1 = o^1, Q^1 = q^1, ..., O^M = o^M, Q^M = q^M | λ] = ∏_{m=1}^{M} P[O = o^m, Q = q^m | λ]
Problem!!!
the state sequence q is not known in most cases
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
emission sequence o = o_1, ..., o_T
maximize the Likelihood P[O = o | λ]
λ∗ = argmax_λ P[O = o | λ] = argmax_λ log(P[O = o | λ])
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
Likelihood and Log-Likelihood
P[O = o | λ] = Σ_{q∈S^T} P[O = o, Q = q | λ]
log(P[O = o | λ]) = log( Σ_{q∈S^T} P[O = o, Q = q | λ] )
the logarithm of a sum is hard to maximize directly
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
iterative training
Baum-Welch algorithm
special case of the EM algorithm
finds a local maximum
initial HMM λ^1 leads to a stepwise series of HMMs λ^1, λ^2, ..., λ∗ that fulfill
P[O = o | λ^1] ≤ P[O = o | λ^2] ≤ ... ≤ P[O = o | λ∗]
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
Baum-Welch algorithm
Start with HMM λ^1 and determine HMM λ^2
Use HMM λ^2 like HMM λ^1 and determine λ^3
iterate; stop if the change in Likelihood is less than a threshold
the Likelihood series converges to a local maximum
try different initial HMMs λ^1
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
assume we know a state sequence q
P[Q = q | O = o, λ] = P[O = o, Q = q | λ] / P[O = o | λ]
now we have another formula for the Log-Likelihood
log(P[O = o | λ]) = log(P[O = o, Q = q | λ]) − log(P[Q = q | O = o, λ])
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if state sequence is unknown
now assume that we have the HMM λ^h of training step h (recall λ^1 is the initial HMM)
basis
log(P[O = o | λ]) = log(P[O = o, Q = q | λ]) − log(P[Q = q | O = o, λ])
multiply with P[Q = q | O = o, λ^h]
P[Q = q | O = o, λ^h] · log(P[O = o | λ]) = P[Q = q | O = o, λ^h] · log(P[O = o, Q = q | λ]) − P[Q = q | O = o, λ^h] · log(P[Q = q | O = o, λ])
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
marginalize over all state sequences q ∈ S^T
log(P[O = o | λ]) = Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(P[O = o, Q = q | λ]) − Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(P[Q = q | O = o, λ])
                  = Q(λ|λ^h) − Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(P[Q = q | O = o, λ])
with the Quasi-Log-Likelihood Q(λ|λ^h)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if state sequence is unknown
the difference of Log-Likelihoods must be positive to improve the Log-Likelihood of HMM λ in comparison with the Log-Likelihood of HMM λ^h
log(P[O = o | λ]) − log(P[O = o | λ^h]) ≥! 0
rewrite the difference of Log-Likelihoods
log(P[O = o | λ]) − log(P[O = o | λ^h]) = Q(λ|λ^h) − Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(P[Q = q | O = o, λ]) − Q(λ^h|λ^h) + Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(P[Q = q | O = o, λ^h])
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
rewrite the difference
log(P[O = o | λ]) − log(P[O = o | λ^h]) = Q(λ|λ^h) − Q(λ^h|λ^h) + Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log( P[Q = q | O = o, λ^h] / P[Q = q | O = o, λ] )
the last sum is a relative entropy and therefore ≥ 0, so
log(P[O = o | λ]) − log(P[O = o | λ^h]) ≥ Q(λ|λ^h) − Q(λ^h|λ^h)
Q(λ^h|λ^h) is constant
choose HMM λ^{h+1} = argmax_λ Q(λ|λ^h)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if state sequence is unknown
the theoretical basics of the Baum-Welch algorithm are known :-)
now we maximize the Quasi-Log-Likelihood function Q(λ|λ^h)
Q(λ|λ^h) = Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(P[O = o, Q = q | λ])
splitting into three independent functions
start parameters
transition parameters
emission parameters
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
Q(λ|λ^h) = Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(P[O = o, Q = q | λ])
         = Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log( π_{q_1} · ∏_{t=1}^{T−1} a_{q_t q_{t+1}} · ∏_{t=1}^{T} b_{q_t}(o_t) )
         = Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(π_{q_1})
         + Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log( ∏_{t=1}^{T−1} a_{q_t q_{t+1}} )
         + Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log( ∏_{t=1}^{T} b_{q_t}(o_t) )
         =: Q_π(λ|λ^h) + Q_a(λ|λ^h) + Q_b(λ|λ^h)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
rewrite the function for the start parameters
Q_π(λ|λ^h) = Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(π_{q_1})
           = Σ_{i∈S} Σ_{q∈S^T, q_1=i} P[Q = q | O = o, λ^h] · log(π_{q_1})
           = Σ_{i∈S} log(π_i) · P[Q_1 = i | O = o, λ^h]
           = Σ_{i∈S} log(π_i) · γ_1(i)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
rewrite the function for the transition parameters
Q_a(λ|λ^h) = Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log( ∏_{t=1}^{T−1} a_{q_t q_{t+1}} )
           = Σ_{t=1}^{T−1} Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(a_{q_t q_{t+1}})
           = Σ_{i∈S} Σ_{j∈S} Σ_{t=1}^{T−1} Σ_{q∈S^T, q_t=i, q_{t+1}=j} P[Q = q | O = o, λ^h] · log(a_{q_t q_{t+1}})
           = Σ_{i∈S} Σ_{j∈S} Σ_{t=1}^{T−1} log(a_ij) · P[Q_t = i, Q_{t+1} = j | O = o, λ^h]
           = Σ_{i∈S} Σ_{j∈S} log(a_ij) · Σ_{t=1}^{T−1} ε_t(i,j)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
rewrite the function for the emission parameters
Q_b(λ|λ^h) = Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log( ∏_{t=1}^{T} b_{q_t}(o_t) )
           = Σ_{t=1}^{T} Σ_{q∈S^T} P[Q = q | O = o, λ^h] · log(b_{q_t}(o_t))
           = Σ_{i∈S} Σ_{j∈Σ} Σ_{t=1, o_t=j}^{T} Σ_{q∈S^T, q_t=i} P[Q = q | O = o, λ^h] · log(b_{q_t}(o_t))
           = Σ_{i∈S} Σ_{j∈Σ} log(b_i(j)) · Σ_{t=1, o_t=j}^{T} P[Q_t = i | O = o, λ^h]
           = Σ_{i∈S} Σ_{j∈Σ} log(b_i(j)) · Σ_{t=1, o_t=j}^{T} γ_t(i)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if state sequence is unknown
independent maximization under stochastic side conditions
function for the start parameters Q_π(λ|λ^h)
function for the transition parameters Q_a(λ|λ^h)
function for the emission parameters Q_b(λ|λ^h)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
Function for the Start Parameters (Lagrange multiplier κ for the constraint Σ_{i∈S} π_i = 1)
Q_π(λ|λ^h) = Σ_{i∈S} log(π_i) γ_1(i) − κ ( −1 + Σ_{i∈S} π_i )
∂Q_π(λ|λ^h) / ∂π_i = γ_1(i)/π_i − κ =! 0  ⇒  π_i = γ_1(i)/κ
1 = Σ_{i∈S} γ_1(i)/κ  ↔  κ = Σ_{i∈S} γ_1(i)
π_i^{h+1} = γ_1(i) / Σ_{i∈S} γ_1(i) = γ_1(i)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
Function for the Transition Parameters
Q_a(λ|λ^h) = Σ_{i∈S} Σ_{j∈S} log(a_ij) Σ_{t=1}^{T−1} ε_t(i,j) − κ_i ( −1 + Σ_{j∈S} a_ij )
∂Q_a(λ|λ^h) / ∂a_ij = Σ_{t=1}^{T−1} ε_t(i,j) / a_ij − κ_i =! 0  ⇒  a_ij = Σ_{t=1}^{T−1} ε_t(i,j) / κ_i
1 = Σ_{j∈S} Σ_{t=1}^{T−1} ε_t(i,j) / κ_i  ↔  κ_i = Σ_{t=1}^{T−1} γ_t(i)
a_ij^{h+1} = Σ_{t=1}^{T−1} ε_t(i,j) / Σ_{t=1}^{T−1} γ_t(i)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
Function for the Emission Parameters
Q_b(λ|λ^h) = Σ_{i∈S} Σ_{j∈Σ} log(b_i(j)) Σ_{t=1, o_t=j}^{T} γ_t(i) − κ_i ( −1 + Σ_{j∈Σ} b_i(j) )
∂Q_b(λ|λ^h) / ∂b_i(j) = Σ_{t=1, o_t=j}^{T} γ_t(i) / b_i(j) − κ_i =! 0  ⇒  b_i(j) = Σ_{t=1, o_t=j}^{T} γ_t(i) / κ_i
1 = Σ_{j∈Σ} Σ_{t=1, o_t=j}^{T} γ_t(i) / κ_i  ↔  κ_i = Σ_{j∈Σ} Σ_{t=1, o_t=j}^{T} γ_t(i) = Σ_{t=1}^{T} γ_t(i)
b_i(j)^{h+1} = Σ_{t=1, o_t=j}^{T} γ_t(i) / Σ_{t=1}^{T} γ_t(i)
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
Maximization
π_i^{h+1} maximizes Q_π(λ|λ^h)
a_ij^{h+1} maximizes Q_a(λ|λ^h)
b_i(j)^{h+1} maximizes Q_b(λ|λ^h)
proof: Hessian matrix
Baum-Welch algorithm for M independent emission sequences o^1, ..., o^M
P[O^1 = o^1, ..., O^M = o^M | λ] = ∏_{m=1}^{M} P[O^m = o^m | λ]
Maximum Likelihood estimation of HMM λ?
Parameter Estimation if the state sequence is unknown
Baum-Welch algorithm - Overview
Initialization
Initialize the model parameters of HMM λ^1
Iteration
Compute ε_t(i,j) and γ_t(i) for emission sequence o
Compute π_i^{h+1}, a_ij^{h+1} and b_i(j)^{h+1}
Compute the Likelihood P[O = o | λ^{h+1}] under HMM λ^{h+1}
Stop
If P[O = o | λ^{h+1}] − P[O = o | λ^h] ≤ α or a given number of iterations is reached
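A sketch of one Baum-Welch re-estimation step, taking the gamma and epsilon arrays from the sketches above as input; the surrounding loop (forward/backward, likelihood check against the threshold α) is only indicated in comments (illustrative implementation, single emission sequence).

```python
import numpy as np

def baum_welch_update(o, gam, eps, n_symbols):
    """Re-estimate (pi, A, B) from gamma_t(i) and eps_t(i,j) of the current model."""
    o = np.asarray(o)
    pi_new = gam[0]                                           # pi_i^{h+1} = gamma_1(i)
    A_new = eps.sum(axis=0) / gam[:-1].sum(axis=0)[:, None]   # sum_t eps_t(i,j) / sum_t gamma_t(i)
    B_new = np.stack([gam[o == j].sum(axis=0) for j in range(n_symbols)], axis=1)
    B_new /= gam.sum(axis=0)[:, None]                         # sum_{t: o_t=j} gamma_t(i) / sum_t gamma_t(i)
    return pi_new, A_new, B_new

# outer loop (sketch):
#   repeat: alpha = forward(...); beta = backward(...); gam = gamma(alpha, beta); eps = epsilon(...)
#           pi, A, B = baum_welch_update(o, gam, eps, n_symbols)
#   until the likelihood increase is <= alpha or the iteration limit is reached
```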
CpG Islands - Detection using HMMs
Two-State HMM
Use first-order emission probabilities
O_1 is independent of O_2, ..., O_T
O_1 depends on Q_1
O_{t+1} depends on O_t and Q_{t+1}
Try to modify the standard algorithms using the Complete-Data-Likelihood
P[O = o, Q = q | λ] = π_{q_1} · ∏_{t=1}^{T−1} a_{q_t q_{t+1}} · b_{q_1}(o_1) · ∏_{t=1}^{T−1} b_{q_{t+1}}(o_{t+1} | o_t)
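A sketch of this Complete-Data-Likelihood with first-order (conditional) emissions; the emission parameters are assumed to be stored as a 3-dimensional array B1[i, prev, cur] = b_i(cur | prev) plus a separate distribution B0 for the first symbol (these names are illustrative, not from the slides).

```python
import numpy as np

def complete_data_log_likelihood_order1(o, q, pi, A, B0, B1):
    """log P[O=o, Q=q | lambda] for the two-state HMM with first-order emissions."""
    logp = np.log(pi[q[0]]) + np.log(B0[q[0], o[0]])  # start state and first emission
    for t in range(len(o) - 1):
        logp += np.log(A[q[t], q[t + 1]])             # transition q_t -> q_{t+1}
        logp += np.log(B1[q[t + 1], o[t], o[t + 1]])  # emission o_{t+1} given o_t in state q_{t+1}
    return logp
```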
CpG Islands - Detection using HMMs
Two-State HMM
choose initial model parameters from known CpG Island and background sequences
detect CpG Islands in emission sequence o^m using the Viterbi-Algorithm
Viterbi-Path q∗ = q∗_1, ..., q∗_T
all q∗_t = CpG+ are potential CpG Islands
Literature
Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
L. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257-286, February 1989.