Knowledge Discovery in Databases (Wissensentdeckung in Datenbanken)

Probabilistic Graphical Models II

Nico Piatkowski and Uwe Ligges

Computer Science (Artificial Intelligence), Computational Statistics, Technische Universität Dortmund

27.06.2017

Overview

Previously:

Model classes
Loss functions
Numerical optimization
Regularization
Overfitting
SQL, frequent itemsets
SVM, xDA, trees, ...
Graphical models

Today:

Graphical models: theory and algorithms


Overview

Sufficient statistics
Maximum entropy
Gradient
Marginal distribution
Belief propagation
Gibbs sampling

Graph

[Figure: undirected graph on the vertices 1, 2, 3, 4]

$G = (V, E)$ with vertex set $V$ and edge set $E$.

Here: $V = \{1, 2, 3, 4\}$, $E = \{\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{3,4\}\}$

Cliques: $\mathcal{C}(G) = V \cup E \cup \{\{1,2,3\}, \{1,3,4\}\}$ (each vertex counts as a one-element clique)
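The clique set stated for the example graph can be checked by brute force. The following is a minimal sketch; the helper names `is_clique` and `all_cliques` are illustrative and do not come from the lecture.

```python
from itertools import combinations

# Example graph from the slide.
V = {1, 2, 3, 4}
E = {frozenset(e) for e in [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]}

def is_clique(nodes):
    """A node set is a clique if every pair of its nodes is joined by an edge."""
    return all(frozenset(pair) in E for pair in combinations(nodes, 2))

def all_cliques(vertices):
    """Enumerate all non-empty cliques by brute force (fine for tiny graphs)."""
    found = set()
    for size in range(1, len(vertices) + 1):
        for nodes in combinations(sorted(vertices), size):
            if is_clique(nodes):
                found.add(frozenset(nodes))
    return found

cliques = all_cliques(V)
# V ∪ E ∪ {{1,2,3}, {1,3,4}}, with vertices written as singleton sets.
expected = ({frozenset({v}) for v in V} | E
            | {frozenset({1, 2, 3}), frozenset({1, 3, 4})})
print(cliques == expected)  # True
print(len(cliques))         # 11 = 4 vertices + 5 edges + 2 triangles
```

The sets $\{1,2,4\}$ and $\{2,3,4\}$ are not cliques because the edge $\{2,4\}$ is missing.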


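Sufficient statistic

Data $\mathcal{D}$, model with parameter $\beta$.

A function $\phi$ is a sufficient statistic $\Leftrightarrow$ $\beta \perp\!\!\!\perp \mathcal{D} \mid \phi(\mathcal{D})$, with $\phi(\mathcal{D}) = \sum_{x \in \mathcal{D}} \phi(x)$.

For discrete $\mathcal{X}$:

$\phi$ is always given by the indicator functions

\[ \phi_{C=y_C}(x_C) = \prod_{v \in C} \mathbb{1}_{\{x_v = y_v\}} \]

with $C \in \mathcal{C}(G)$, $x_C \in \mathcal{X}_C$, $x \in \mathcal{X}$. Collecting the indicators over all joint states $y_C$ of a clique gives the vector

\[ \phi_C(x_C) = \left( \phi_{C=y_C^1}(x_C), \phi_{C=y_C^2}(x_C), \dots \right) = \left( \phi_{C=y_C}(x_C) \right)_{y_C \in \mathcal{X}_C} \]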
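The indicator construction can be written down directly. The sketch below assumes a hypothetical toy setting with two binary variables on vertices 1 and 2; the function names and the data are made up for illustration.

```python
from itertools import product

def phi_indicator(C, y_C, x):
    """phi_{C=y_C}(x_C): 1 if x agrees with y_C on every vertex in C, else 0."""
    return int(all(x[v] == y for v, y in zip(C, y_C)))

def phi_clique(C, domains, x):
    """One-hot vector (phi_{C=y_C}(x_C)) over all joint states y_C of clique C."""
    states = product(*(domains[v] for v in C))
    return [phi_indicator(C, y_C, x) for y_C in states]

def phi_data(C, domains, data):
    """phi(D) = sum over x in D of phi(x), here restricted to clique C."""
    vectors = [phi_clique(C, domains, x) for x in data]
    return [sum(column) for column in zip(*vectors)]

# Hypothetical toy data: each sample maps vertex -> state.
domains = {1: [0, 1], 2: [0, 1]}
data = [{1: 0, 2: 0}, {1: 1, 2: 1}, {1: 1, 2: 1}]
counts = phi_data((1, 2), domains, data)
print(counts)  # [1, 0, 0, 2]: counts of the joint states (0,0), (0,1), (1,0), (1,1)
```

For one-hot features, the summed statistic $\phi(\mathcal{D})$ is simply the table of joint state counts per clique.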


Example: sufficient statistic

$V = \{1, 2, 3, 4\}$, $E = \{\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{3,4\}\}$, $\mathcal{C}(G) = V \cup E \cup \{\{1,2,3\}, \{1,3,4\}\}$

\[ \phi_C(x_C) = \left( \phi_{C=y_C^1}(x_C), \phi_{C=y_C^2}(x_C), \dots \right) = \left( \phi_{C=y_C}(x_C) \right)_{y_C \in \mathcal{X}_C} \]

$\mathcal{X}_1 = \{1, 2, 3, 4, 5\}$, $\mathcal{X}_2 = \{-,\ \}$,

$\mathcal{X}_3 = \{-, \text{Punk}, \text{Pop}, \dots\}$, $\mathcal{X}_4 = \{\blacksquare, \blacksquare, \blacksquare\}$

$x = (2, -, -, \blacksquare)$

$x' = (1, -, \text{Pop}, \blacksquare)$

$x'' = (1,\ , \text{Punk}, \blacksquare)$

...


Example: sufficient statistic II

$C = \{1, 2, 3\}$, $\mathcal{X}_C = \mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{X}_3$

$\mathcal{X}_1 = \{1, 2, 3, 4, 5\}$, $\mathcal{X}_2 = \{-,\ \}$, $\mathcal{X}_3 = \{-, \text{Punk}, \text{Pop}, \text{Rock}, \text{Schlager}\}$

$\mathcal{X}_C = \{(1, -, -), (2, -, -), \dots, (5, -, -), (1,\ , -), \dots, (5,\ , -), (1,\ , \text{Punk}), \dots, (5,\ , \text{Schlager})\}$

\[ \phi_{\{1,2,3\}}(5,\ , \text{Schlager}) = (0, 0, \dots, 0, \dots, 0, 1)^\top \]

\[ \phi_{\{1,2,3\}}(2,\ , -) = (0, 0, \dots, 1, \dots, 0, 0)^\top \]
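The two one-hot vectors can be reproduced by enumerating $\mathcal{X}_C$ explicitly. This sketch rests on two assumptions: the second value of $\mathcal{X}_2$ is not legible in the source, so the string `"?"` is a placeholder for it, and the states are ordered with $x_1$ varying fastest, which matches the start and end of the listing above.

```python
from itertools import product

# State spaces from the slide. "?" is a placeholder (assumption) for the
# second value of X_2, whose symbol is not recoverable from the source.
X1 = [1, 2, 3, 4, 5]
X2 = ["-", "?"]
X3 = ["-", "Punk", "Pop", "Rock", "Schlager"]

# Enumerate X_C with x_1 varying fastest: (1,-,-), (2,-,-), ..., (5,-,-), ...
# product() varies its last factor fastest, so iterate over (X3, X2, X1)
# and flip each tuple back into (x1, x2, x3) order.
X_C = [(x1, x2, x3) for x3, x2, x1 in product(X3, X2, X1)]

def phi_123(x_C):
    """One-hot encoding of a joint state of the clique {1, 2, 3}."""
    return [int(x_C == y_C) for y_C in X_C]

vec_last = phi_123((5, "?", "Schlager"))
vec_mid = phi_123((2, "?", "-"))
print(len(X_C), vec_last.index(1), vec_mid.index(1))  # 50 49 6
```

With $5 \cdot 2 \cdot 5 = 50$ joint states, the vector has 50 entries. Under the assumed ordering, the first state is encoded by a 1 in the last position and the second by a 1 in the seventh position (index 6).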


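Exponential family

\[ P_\beta(x) = \frac{1}{Z(\beta)} \exp\left( \langle \beta_{\{1,2,3\}}, \phi_{\{1,2,3\}}(x_{\{1,2,3\}}) \rangle \right) \exp\left( \langle \beta_{\{1,3,4\}}, \phi_{\{1,3,4\}}(x_{\{1,3,4\}}) \rangle \right) = \exp\left( \langle \beta, \phi(x) \rangle - A(\beta) \right) \]

\[ A(\beta) = \log Z(\beta) = \log \sum_{x \in \mathcal{X}} \exp\left( \langle \beta, \phi(x) \rangle \right) \]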
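For a model this small, the log-partition function $A(\beta)$ can be computed by enumerating all states. The sketch below assumes, purely for illustration, that all four variables are binary and that the parameter values are arbitrary; neither comes from the lecture.

```python
import math
from itertools import product

# Assumption for illustration: all four variables are binary, and the model
# uses only the two maximal cliques of the example graph.
CLIQUES = [(1, 2, 3), (1, 3, 4)]
STATES = [0, 1]

def phi(x):
    """Concatenated one-hot vectors phi_C(x_C); x maps vertex -> state."""
    vec = []
    for C in CLIQUES:
        x_C = tuple(x[v] for v in C)
        vec.extend(int(x_C == y_C) for y_C in product(STATES, repeat=len(C)))
    return vec

def score(beta, x):
    """Inner product <beta, phi(x)>."""
    return sum(b * f for b, f in zip(beta, phi(x)))

def log_partition(beta):
    """A(beta) = log sum_x exp(<beta, phi(x)>), enumerating all 2^4 states."""
    configs = (dict(zip((1, 2, 3, 4), vals)) for vals in product(STATES, repeat=4))
    return math.log(sum(math.exp(score(beta, x)) for x in configs))

def prob(beta, x):
    """P_beta(x) = exp(<beta, phi(x)> - A(beta))."""
    return math.exp(score(beta, x) - log_partition(beta))

# Arbitrary (made-up) parameters: 8 weights per clique.
beta = [0.1 * i for i in range(16)]
total = sum(prob(beta, dict(zip((1, 2, 3, 4), vals)))
            for vals in product(STATES, repeat=4))
print(abs(total - 1.0) < 1e-9)  # True: the probabilities sum to one
```

The enumeration grows exponentially in the number of vertices, so this only serves to make the definition concrete.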

Maximum entropy principle: task

Given: data $\mathcal{D}$ (realizations of a random variable $X$) and an arbitrary function $f: \mathcal{X} \to \mathbb{R}^d$

Wanted: $P$ with $\mathbb{E}_P[f(X)] = \tilde{\mathbb{E}}_{\mathcal{D}}[f(X)] = \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} f(x)$

Problem: many $P$ have this property!

Maximum entropy principle: intuition

[Figure: Entropy(p) plotted against p(x=1), horizontal axis from 0 to 1]

Entropy $\mathbb{H}$ of a random variable $X$ with distribution $P$:

\[ \mathbb{H}(X) = -\sum_{x \in \mathcal{X}} P(x) \log P(x) \]
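The entropy formula is easy to evaluate for a binary variable. The sketch below assumes the natural logarithm and a coarse grid of probabilities; both choices are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum p log p (natural log), with 0 log 0 := 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def binary_entropy(p):
    """Entropy of a binary variable with P(x=1) = p."""
    return entropy([p, 1.0 - p])

# Evaluate on the grid 0, 0.1, ..., 1 and pick the maximizer.
grid = [i / 10 for i in range(11)]
best_p = max(grid, key=binary_entropy)
print(best_p, binary_entropy(best_p))
```

On this grid the maximum is attained at $p = 0.5$ with value $\log 2$, and the entropy is zero at $p = 0$ and $p = 1$: the uniform distribution is the least committed one.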

Let $\mathcal{P}$ be the space of all probability mass functions.

\[ \max_{P \in \mathcal{P}} \mathbb{H}(P) \quad \text{s.t.} \quad \mathbb{E}_P[f(X)] = \tilde{\mathbb{E}}_{\mathcal{D}}[f(X)] \qquad (d \text{ constraints}) \]

Maximum entropy principle: solution

Reformulate as a Lagrangian:

\[ \max_{P \in \mathcal{P}} \; \mathbb{H}(P) + \sum_{i=1}^{d} \lambda_i \left( \mathbb{E}_P[f(X)]_i - \tilde{\mathbb{E}}_{\mathcal{D}}[f(X)]_i \right) \]

Differentiating (with respect to $P$!) and setting the derivative to 0 yields

\[ P(x) = \exp\left( \langle \lambda, f(x) \rangle - A(\lambda) \right) \]

So: the parameters of exponential families are the Lagrange multipliers of the constraint(s) $\mathbb{E}_P[f(X)] = \tilde{\mathbb{E}}_{\mathcal{D}}[f(X)]$.
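The differentiation step can be spelled out as follows. This is a sketch under two assumptions that the slide does not state: the normalization constraint $\sum_x P(x) = 1$ is enforced with its own multiplier $\nu$, and $\tilde{\mu}$ abbreviates the empirical mean of $f$ over the data.

```latex
\begin{align*}
L(P, \lambda, \nu) &= -\sum_{x} P(x) \log P(x)
  + \sum_{i=1}^{d} \lambda_i \Big( \sum_{x} P(x) f_i(x) - \tilde{\mu}_i \Big)
  + \nu \Big( \sum_{x} P(x) - 1 \Big) \\
\frac{\partial L}{\partial P(x)} &= -\log P(x) - 1 + \langle \lambda, f(x) \rangle + \nu = 0 \\
\Rightarrow\quad P(x) &= \exp\big( \langle \lambda, f(x) \rangle + \nu - 1 \big) \\
\sum_{x} P(x) = 1 \;\Rightarrow\; 1 - \nu &= \log \sum_{x} \exp \langle \lambda, f(x) \rangle = A(\lambda)
\end{align*}
```

Substituting $1 - \nu = A(\lambda)$ into the third line recovers the exponential family form, with $A(\lambda)$ playing the role of the log-partition function.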


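Parameter learning by gradient descent

Given a data set $\mathcal{D}$ and a (binary) function $\phi$.

Loss function: negative average log-likelihood

\[ \ell(\beta, \mathcal{D}) = -\frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \log P_\beta(x) = -\frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \langle \beta, \phi(x) \rangle + A(\beta) = -\langle \beta, \tilde{\mu} \rangle + A(\beta) \]

with the empirical mean $\tilde{\mu} = \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \phi(x)$.

Partial derivative:

\[ \frac{\partial \ell(\beta, \mathcal{D})}{\partial \beta_i} = -\tilde{\mu}_i + \frac{\partial}{\partial \beta_i} A(\beta) = \hat{\mu}_i - \tilde{\mu}_i \]

where $\hat{\mu}_i = \mathbb{E}_{P_\beta}[\phi_i(X)]$ is the expectation under the model.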
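The gradient $\hat{\mu} - \tilde{\mu}$ leads to a very short training loop when the model expectation can be computed by enumeration. The sketch below assumes a hypothetical model with two binary variables and one-hot features over their four joint states; the data set, step size and iteration count are made up for illustration.

```python
import math
from itertools import product

STATES = list(product([0, 1], repeat=2))  # joint states of two binary variables

def phi(x):
    """One-hot sufficient statistic over the four joint states."""
    return [int(x == y) for y in STATES]

def model_mean(beta):
    """mu_hat = E_beta[phi(X)], computed by enumerating all states."""
    weights = [math.exp(sum(b * f for b, f in zip(beta, phi(x)))) for x in STATES]
    Z = sum(weights)
    return [sum(w * phi(x)[i] for w, x in zip(weights, STATES)) / Z
            for i in range(len(beta))]

def empirical_mean(data):
    """mu_tilde = (1/|D|) sum_x phi(x)."""
    return [sum(phi(x)[i] for x in data) / len(data) for i in range(len(STATES))]

def fit(data, step=1.0, iters=500):
    """Plain gradient descent on the negative average log-likelihood."""
    mu_tilde = empirical_mean(data)
    beta = [0.0] * len(STATES)
    for _ in range(iters):
        # Gradient of the loss: mu_hat - mu_tilde.
        grad = [mh - mt for mh, mt in zip(model_mean(beta), mu_tilde)]
        beta = [b - step * g for b, g in zip(beta, grad)]
    return beta

# Made-up data set for illustration.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0), (1, 1)]
beta = fit(data)
print([round(m, 3) for m in model_mean(beta)])
print(empirical_mean(data))
```

At the optimum the gradient vanishes, so the model expectation matches the empirical mean; the two printed vectors should therefore agree. This is the same moment-matching condition that appears as the constraint in the maximum entropy problem.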


Marginalization

If $\phi$ is binary, then $\mu_i = \mathbb{E}[\phi_i(X)]$ is the probability that $\phi_i(X) = 1$.

Assumption: pairwise model, i.e. only the edge weights are relevant.

\[ \mathbb{P}(X_v = x) = \sum_{y \in \mathcal{X}_{V \setminus \{v\}}} \mathbb{P}(y, x) \]

Exploiting the factorization and distributivity:

\[ \mathbb{P}(X_v = x) = \frac{1}{Z} \sum_{y \in \mathcal{X}_{V \setminus \{v\}}} \prod_{C \in \mathcal{C}(G)} \exp\left( \langle \beta_C, \phi_C(\mathbf{x}_C) \rangle \right) \]

where $\mathbf{x} = (y, x)$.
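The effect of distributivity is easiest to see on a chain. The sketch below assumes a hypothetical pairwise chain 1 - 2 - 3 with binary variables and made-up edge weights, not the 4-vertex example graph, and compares the brute-force marginal with the version in which the inner sum is pushed inside the product.

```python
import math
from itertools import product

STATES = [0, 1]

# Assumption for illustration: a pairwise chain 1 - 2 - 3 with made-up edge
# weights beta[(u, v)][(x_u, x_v)].
beta = {
    (1, 2): {(0, 0): 0.5, (0, 1): -0.2, (1, 0): 0.1, (1, 1): 0.8},
    (2, 3): {(0, 0): -0.3, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.0},
}

def psi(edge, a, b):
    """Edge potential exp(<beta_C, phi_C(x_C)>) for one-hot phi_C."""
    return math.exp(beta[edge][(a, b)])

def marginal_brute_force(x3):
    """P(X_3 = x3): sum the unnormalized joint over all (x1, x2), divide by Z."""
    num = sum(psi((1, 2), x1, x2) * psi((2, 3), x2, x3)
              for x1, x2 in product(STATES, repeat=2))
    Z = sum(psi((1, 2), x1, x2) * psi((2, 3), x2, x)
            for x1, x2, x in product(STATES, repeat=3))
    return num / Z

def marginal_distributive(x3):
    """Same marginal via sum_x2 psi_23(x2, x3) * (sum_x1 psi_12(x1, x2))."""
    message = {x2: sum(psi((1, 2), x1, x2) for x1 in STATES) for x2 in STATES}
    unnorm = {x: sum(psi((2, 3), x2, x) * message[x2] for x2 in STATES)
              for x in STATES}
    return unnorm[x3] / sum(unnorm.values())

for x in STATES:
    print(abs(marginal_brute_force(x) - marginal_distributive(x)) < 1e-12)  # True
```

The inner sum over $x_1$ depends only on $x_2$, so it is computed once per value of $x_2$ and reused. On longer chains and trees this reuse is what turns an exponential sum into a sequence of small local sums.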

