Organization · Contents · Questions · Examples · Overview

Knowledge Discovery in Databases
Organization and Overview

Nico Piatkowski and Uwe Ligges
Computer Science (Artificial Intelligence) and Computational Statistics, Technische Universität Dortmund
18 April 2017
Facts

Team
Lecture: Uwe Ligges, Nico Piatkowski
Tutorials: Sarah Schnackenberg, Sebastian Buschjäger

28 sessions (13/13/2)
Tuesdays, 10:15 to 12:00
Thursdays, 14:15 to 16:00

Public holidays (no session)
25 May (Ascension Day)
15 June (Corpus Christi)
Contents

18.4. L,P  Overview, introduction
20.4. L    Statistics 1
25.4. L    Statistics 2
27.4. L    Sampling, experimental design, data preprocessing
02.5. P    Optimization, model classes, linear model, bias-variance, overfitting 1
04.5. P    Optimization, model classes, linear model, bias-variance, overfitting 2
09.5. P    Data cube, frequent sets, Apriori, FP-growth 1
11.5. P    Data cube, frequent sets, Apriori, FP-growth 2
16.5. L    Introduction to classification, Bayes rule, logistic regression
18.5. L    kNN, similarity measures, model selection
23.5. L    Resampling, assessment of classifiers 2
Contents

30.5. P  SVM
01.6. L  Discriminant analysis (LDA) 1
06.6. L  Discriminant analysis (LDA, QDA, RDA) 2
08.6. L  From decision trees (CART) to forests
13.6. L  Ensemble methods (bagging, boosting)
20.6. L  Continuous models (principal component analysis)
22.6. P  Graphical models 1
27.6. P  Graphical models 2
29.6. P  Non-smooth and stochastic optimization
04.7. P  Feature selection, structure learning, regularization
Contents

06.7. P  Clustering, k-means, Gaussian mixtures, latent Dirichlet allocation 1
11.7. P  Clustering, k-means, Gaussian mixtures, latent Dirichlet allocation 2
13.7. L  Hierarchical clustering; time series 1
18.7. L  Time series 2
20.7. P  Artificial neural networks 1
25.7. P  Artificial neural networks 2
27.7. L,P  Summary; review
Questions and Comments

Our request: think along, comment, and ask questions!

The differing terminology may be confusing at first: "Merkmal", feature, variable, and parameter are used differently in computer science, mathematics, and statistics.
Examples of KDD / Data Mining

Telecommunications: huge data sets, process control, time series analysis, reliability analysis
Call centers: queues, classification of the problem from little information
Loyalty-card analysis: huge data sets (and many variables), prediction of purchasing behavior and assignment to product groups
Genomics: huge data sets (and many variables), much noise, classification for knowledge discovery
Statistical Foundations

Measure, density, expected value, random variable, ...

P(X \in S) = \int_S \frac{\mathrm{d}(P \circ X^{-1})}{\mathrm{d}\nu}\,\mathrm{d}\nu = \int_S p\,\mathrm{d}\nu

p(X = x \mid Y = y) = \frac{p(X = x,\, Y = y)}{p_Y(y)}

\mathbb{E}[\varphi(X)] = \int_{\mathcal{X}} \varphi\,\mathrm{d}\nu
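The conditional-probability formula above can be checked on a small joint table; a minimal Python sketch, with a joint distribution invented purely for illustration:

```python
# p(X=x | Y=y) = p(X=x, Y=y) / p_Y(y), computed from a toy joint table.
joint = {
    (0, 0): 0.2, (0, 1): 0.1,
    (1, 0): 0.3, (1, 1): 0.4,
}

def p_y(y):
    # Marginal p_Y(y): sum the joint distribution over x.
    return sum(p for (x, yy), p in joint.items() if yy == y)

def p_cond(x, y):
    # Conditional probability p(X=x | Y=y).
    return joint[(x, y)] / p_y(y)

print(p_cond(1, 1))  # 0.4 / 0.5 = 0.8
```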
Regression

[Figure: scatter plot of systolic blood pressure (messwert_bp_sys) against age (alter), with fitted regression lines for men with interaction, women with interaction, women with a sex effect only, and women with equal slope.]

[Figure: bias and variance as functions of model complexity, together with the resulting error curve.]
Machine Learning

Optimization, model, loss function, overfitting, ...

\mathbb{E}[(y - \hat{f}(x))^2] = B[\hat{f}(x)]^2 + V[\hat{f}(x)] + \sigma^2

\beta_{t+1} = \beta_t - \eta_t \nabla \ell(\beta_t; \mathcal{D})

\hat{f} = \arg\min_{f \in \mathcal{F}} \ell(f; \mathcal{D})
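The gradient update above can be sketched for a squared loss; a minimal Python illustration in which the data, step size, and iteration count are invented:

```python
# Gradient descent beta_{t+1} = beta_t - eta * grad l(beta_t; D)
# for the squared loss l(beta; D) = sum_i (y_i - beta * x_i)^2.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # toy (x, y) pairs, roughly y = 2x

def grad(beta):
    # d/dbeta sum_i (y_i - beta*x_i)^2 = sum_i -2*x_i*(y_i - beta*x_i)
    return sum(-2 * x * (y - beta * x) for x, y in data)

beta, eta = 0.0, 0.01
for t in range(200):
    beta -= eta * grad(beta)

print(round(beta, 2))  # 2.04, the least-squares minimizer
```

With a quadratic loss, each step is a contraction toward the closed-form minimizer sum(x*y)/sum(x*x), so 200 steps converge to machine precision here.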
Frequent Set Mining

Databases, frequent sets, Apriori, FP-trees, ...

SELECT * FROM transactions WHERE ...

p(A, B) \le \min\{p(A), p(B)\}
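The inequality above is the anti-monotonicity that Apriori exploits: an itemset can only be frequent if all of its subsets are. A minimal support-counting sketch over an invented transaction database:

```python
# Toy transaction database, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    # Fraction of transactions that contain every item of the set.
    return sum(itemset <= t for t in transactions) / len(transactions)

# Anti-monotonicity: the support of a superset never exceeds that of a subset.
s_ab = support({"bread", "milk"})
assert s_ab <= min(support({"bread"}), support({"milk"}))
print(s_ab)  # 0.5
```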
Classification

Logistic regression, k-NN, distance measures, ...

p(y \mid X = x) = \frac{1}{1 + \exp(-(\beta_0 + \langle \beta, x \rangle))}

\mathrm{dist}(x, y) = \|x - y\|_2 = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}
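Both formulas above translate directly into code; a small Python sketch with toy inputs:

```python
import math

def logistic(x, beta0, beta):
    # p(y | X = x) = 1 / (1 + exp(-(beta0 + <beta, x>)))
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

def dist(x, y):
    # Euclidean distance ||x - y||_2
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(logistic([0.0, 0.0], 0.0, [1.0, 1.0]))  # 0.5 (score 0 sits on the boundary)
print(dist([0.0, 0.0], [3.0, 4.0]))           # 5.0
```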
Classification

[Figure: k-NN on the IRIS data (Petal.Length against Sepal.Length), decision regions for k = 1 and k = 7.]
Models and Sampling

Model selection, sampling, classification performance, ...

\mathrm{BIC} = d \log(N) - 2\ell(\beta^*; \mathcal{D})

F_1 = \frac{2 \times \mathrm{PREC} \times \mathrm{REC}}{\mathrm{PREC} + \mathrm{REC}}

x \sim P
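Both criteria above are straightforward to evaluate; a minimal Python sketch with invented inputs:

```python
import math

def bic(d, n, loglik):
    # BIC = d * log(N) - 2 * l(beta*; D), with l the maximized log-likelihood.
    return d * math.log(n) - 2 * loglik

def f1(prec, rec):
    # F1 score: harmonic mean of precision and recall.
    return 2 * prec * rec / (prec + rec)

print(round(f1(0.5, 1.0), 3))        # 0.667
print(round(bic(3, 100, -42.0), 2))  # 97.82
```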
Support Vector Machines

Hyperplane, hinge loss, feature space, kernel, ...

f(x) = b + \langle \beta, \varphi(x) \rangle

K_{\mathrm{Gauss}}(x, y) = \langle \varphi(x), \varphi(y) \rangle = \exp\left(-\tfrac{1}{\gamma} \|x - y\|_2^2\right)

\ell_{\mathrm{Hinge}}(y^*, x, \beta) = \max\{0,\; 1 - y^* \times (b + \langle \beta, \varphi(x) \rangle)\}
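The Gaussian kernel and the hinge loss above, sketched in Python (toy vectors, arbitrary bandwidth):

```python
import math

def k_gauss(x, y, gamma):
    # Gaussian kernel exp(-(1/gamma) * ||x - y||_2^2); gamma is a bandwidth choice.
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / gamma)

def hinge(y_true, score):
    # Hinge loss max{0, 1 - y* * f(x)} for labels y* in {-1, +1}.
    return max(0.0, 1.0 - y_true * score)

print(k_gauss([0, 0], [0, 0], 1.0))  # 1.0 (identical points)
print(hinge(+1, 2.0))                # 0.0 (correct with margin)
print(hinge(-1, 2.0))                # 3.0 (misclassified)
```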
Discriminant Analysis

Linear, quadratic, regularized, ...

h^L_i(x) := (\Sigma^{-1} \mu_i)' x - 0.5\, \mu_i' \Sigma^{-1} \mu_i + \ln(\pi_i)

h^Q_i(x) := -0.5\, (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) + \ln(\pi_i) - 0.5 \ln(\det(\Sigma_i))

\hat{\Sigma}_i(\delta, \lambda) := (1 - \lambda)\, \hat{\Sigma}_i(\delta) + \lambda \cdot \frac{\mathrm{tr}[\hat{\Sigma}_i(\delta)]}{p}\, I
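The regularized covariance estimate can be sketched as follows, assuming the usual RDA shrinkage toward the scaled identity (tr(S)/p)·I; the 2×2 matrix and the value of lambda are invented:

```python
def regularize(S, lam):
    # (1 - lam) * S + lam * (tr(S) / p) * I, shrinking a class covariance
    # matrix toward a scaled identity.
    p = len(S)
    trace = sum(S[i][i] for i in range(p))
    return [[(1 - lam) * S[i][j] + (lam * trace / p if i == j else 0.0)
             for j in range(p)] for i in range(p)]

S = [[4.0, 1.0], [1.0, 2.0]]
print(regularize(S, 0.5))  # [[3.5, 0.5], [0.5, 2.5]]
```

Off-diagonal entries shrink toward zero while the total variance (the trace) is preserved, which keeps the estimate invertible even when a class has few observations.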
Discriminant Analysis

[Figure: pairwise scatter plots of the variables v3, v6, v14, and v17, observations plotted as their class labels (1, 2, 4, 6), each panel annotated with the misclassification error of the corresponding rule (Error: 0.051, 0.064, 0.077, 0.09, 0.103).]
Ensembles

Trees, bagging, boosting, random forests, ...

[Figure: conditional inference tree on a medical classification task (classes pos/neg); inner nodes split on glucose (p < 0.001, cutpoints 130 and 101), pregnant (p = 0.033, cutpoint 9), mass (p = 0.042 and p < 0.001, cutpoints 34 and 29.9), and pedigree (p = 0.017, cutpoint 0.52); leaves show n between 11 and 136 with bar charts of the class proportions.]
Graphical Models

Exponential families, Markov random fields, belief propagation, ...

p_\theta(x) = \exp(\langle \theta, \varphi(x) \rangle - A(\theta))

p_\theta(x) = \frac{1}{Z} \prod_{v \in V} \psi_v(x_v) \prod_{(v,u) \in E} \psi_{vu}(x_v, x_u)

m_{u \to v}(x) = \sum_{y \in \mathcal{X}_u} \psi_u(y)\, \psi_{uv}(x, y) \prod_{w \in N_u \setminus \{v\}} m_{w \to u}(y)

[Figure: entropy of a binary distribution as a function of p(x = 1); regularization path of the parameters.]
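The entropy curve in the figure above is easy to reproduce; a minimal Python sketch for a binary distribution:

```python
import math

def entropy(p1):
    # Entropy of a binary distribution in bits: H(p) = -sum_x p(x) * log2 p(x),
    # with the convention 0 * log 0 = 0.
    probs = [p1, 1.0 - p1]
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy(0.5))  # 1.0, the maximum for a binary variable
print(entropy(0.9))  # strictly smaller: the distribution is less uncertain
```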
Structure Learning

Feature selection, regularization, non-smooth optimization, ...

I(v, u) = \sum_{x,y} p_{vu}(x, y) \log \frac{p_{vu}(x, y)}{p_v(x)\, p_u(y)}

\min_{\beta \in \mathbb{R}^d} \ell(\beta; \mathcal{D}) + \lambda R(\beta)

\mathrm{prox}_{\lambda\|\cdot\|_1}(\beta)_i =
\begin{cases}
\beta_i - \lambda, & \beta_i \ge +\lambda \\
0, & |\beta_i| < \lambda \\
\beta_i + \lambda, & \beta_i \le -\lambda
\end{cases}
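The proximal operator of the L1 norm above is componentwise soft thresholding; a direct Python sketch:

```python
def soft_threshold(beta, lam):
    # prox of lam * ||.||_1: shrink each coordinate toward zero by lam,
    # clipping small coordinates to exactly zero (this produces sparsity).
    out = []
    for b in beta:
        if b >= lam:
            out.append(b - lam)
        elif b <= -lam:
            out.append(b + lam)
        else:
            out.append(0.0)
    return out

print(soft_threshold([2.0, -0.3, -1.5], 0.5))  # [1.5, 0.0, -1.0]
```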
Cluster Analysis

k-means, Gaussian mixtures, latent Dirichlet allocation, ...

\min_{C \subseteq \mathbb{R}^d,\, |C| = k} \sum_{x \in \mathcal{D}} \min_{c \in C} \|x - c\|^2

p(w^d_i \mid z^d_i, \varphi)\, p(z^d_i \mid \theta^d)\, p(\theta^d \mid \alpha)\, p(\varphi \mid \beta)
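The k-means objective above can be evaluated directly: each point contributes its squared distance to the nearest center. A minimal Python sketch with invented points and centers:

```python
def kmeans_cost(data, centers):
    # k-means objective: sum over points of the squared distance
    # to the nearest center.
    def sqdist(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return sum(min(sqdist(x, c) for c in centers) for x in data)

# Two well-separated toy clusters and one candidate center per cluster.
data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centers = [(0.0, 0.5), (5.0, 5.5)]
print(kmeans_cost(data, centers))  # 1.0 (four points, 0.25 each)
```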
Time Series Analysis

Smoothing, from autoregression to SARIMA, oscillations

y_t = \beta_1 + \beta_2 y_{t-1} + \ldots + \beta_{p+1} y_{t-p} + \epsilon_t - \gamma_1 \epsilon_{t-1} - \ldots - \gamma_q \epsilon_{t-q}

y_t = \beta_1 + \sum_{k=1}^{K} \left( \beta_{2k} \cos(2\pi \lambda_k t) + \beta_{2k+1} \sin(2\pi \lambda_k t) \right) + \epsilon_t
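The ARMA equation above can be illustrated by simulating its simplest special case, an AR(1) process (p = 1, q = 0); coefficients and sample size are invented:

```python
import random

def simulate_ar1(beta1, beta2, n, seed=0):
    # Simulate y_t = beta1 + beta2 * y_{t-1} + eps_t with standard normal noise,
    # the AR(1) special case of the ARMA model.
    rng = random.Random(seed)
    y, series = 0.0, []
    for _ in range(n):
        y = beta1 + beta2 * y + rng.gauss(0.0, 1.0)
        series.append(y)
    return series

series = simulate_ar1(1.0, 0.5, 500)
# For |beta2| < 1 the process is stationary with mean beta1 / (1 - beta2) = 2.
print(sum(series) / len(series))
```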
Time Series Analysis

[Figure: monthly wine sales of Australian winemakers (number of bottles < 1 l) from 1980 to 1988, shown as the raw series and as smoothed/fitted versions.]