Intelligent Systems:
Three Practical Questions
Carsten Rother
Exam questions
• Only from the second part of the lecture (Dimitri Schlesinger, Carsten Rother)
• Three types of tasks:
1) Algorithms
2) Definitions and knowledge questions
3) Theoretical derivations
• Questions will be asked in German
1) Algorithms

What would a parallelized ICM algorithm do in the next two steps? Please fill in the states.

Given the energy with labels 𝑥𝑖 ∈ {0,1} on the 2×2 grid

    𝑥1 𝑥2
    𝑥3 𝑥4

with unary potentials

    𝜃1(0) = 0, 𝜃1(1) = 1
    𝜃2(0) = 1, 𝜃2(1) = 1
    𝜃3(0) = 2, 𝜃3(1) = 1
    𝜃4(0) = 1, 𝜃4(1) = 2

and pairwise potentials 𝜃𝑖𝑗(𝑥𝑖, 𝑥𝑗) = |𝑥𝑖 − 𝑥𝑗| for all edges (𝑖, 𝑗).

Initial state:
    𝑥1 = 0, 𝑥2 = 1
    𝑥3 = 1, 𝑥4 = 0

Step 1:
    𝑥1 = ?, 𝑥2 = ?
    𝑥3 = ?, 𝑥4 = ?
(Hint: the dark nodes are not changed in the first step.)

Step 2:
    𝑥1 = ?, 𝑥2 = ?
    𝑥3 = ?, 𝑥4 = ?
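As a sketch of what parallel ICM does here (an illustration, not the official solution): every node of one checkerboard colour is updated simultaneously to the label that minimises its local energy, while all its neighbours stay fixed; the other colour is updated in the next step. The exercise's figure fixes which nodes are "dark"; the sketch below assumes x1 and x4 are the dark nodes fixed in step 1.

```python
# Unary potentials theta_i(x_i) for x_i in {0, 1}, from the exercise
unary = {1: (0, 1), 2: (1, 1), 3: (2, 1), 4: (1, 2)}
# Grid edges of the 2x2 model; the pairwise potential is |x_i - x_j|
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

def neighbours(i):
    return [j for a, b in edges for j in ((b,) if a == i else (a,) if b == i else ())]

def icm_parallel_step(state, update_set):
    """Update all nodes in update_set simultaneously, keeping the rest fixed."""
    new_state = dict(state)
    for i in update_set:
        costs = []
        for label in (0, 1):
            c = unary[i][label] + sum(abs(label - state[j]) for j in neighbours(i))
            costs.append(c)
        new_state[i] = costs.index(min(costs))  # pick the cheaper label
    return new_state

state = {1: 0, 2: 1, 3: 1, 4: 0}          # initial state from the exercise
state = icm_parallel_step(state, {2, 3})  # step 1: light nodes (assumed)
state = icm_parallel_step(state, {1, 4})  # step 2: dark nodes (assumed)
print(state)
```

Swapping which checkerboard colour is updated first changes the result, so the assignment of dark nodes from the original figure matters.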
2) Definitions and knowledge questions

Question: What is a structured model?
Answer: A model in which the outputs are not independent; the dependencies between the outputs are modeled.
3) Theoretical derivations

Compute the probability that the sum of the pips of two independently rolled dice is divisible by 5.
(Similar exercises were covered in the lecture and the exercise sessions.)
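The dice question can be checked by enumerating all 36 equally likely outcomes; exactly the sums 5 and 10 are divisible by 5:

```python
from fractions import Fraction
from itertools import product

# Count outcomes of two independent dice whose sum is divisible by 5:
# sum 5 has 4 outcomes, sum 10 has 3 outcomes
favourable = sum(1 for a, b in product(range(1, 7), repeat=2) if (a + b) % 5 == 0)
p = Fraction(favourable, 36)
print(p)  # → 7/36
```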
Roadmap this lecture
• The question of “the combinatorial explosion”
• The question of “generalization”
• The questions with “Big Data”
Going back to 1973
• Sir James Lighthill report to the British Parliament
The general purpose robot is a mirage
He specifically mentioned the problem of "combinatorial explosion" or "intractability", which implied that many of AI's most successful algorithms would grind to a halt on real world problems and were only suitable for solving "toy" tasks.
(My) View: 2015
• Better computers (faster, more memory, clusters, GPUs)
• More powerful methods that can deal with the combinatorial explosion (e.g. graph cut, search)
• Better sensors (e.g. time-of-flight)
• More (labeled) data can give better, more adaptive methods (e.g. deep learning)
• Humans can do it, and they have a finite-sized brain.
Roadmap this lecture
• The question of “the combinatorial explosion”
• The question of “generalization”
• The questions with “Big Data”
Mushroom example
Measured attributes:
- Size in centimeters
- Color: average "whiteness" of the whole mushroom, e.g. along the diagonal of the RGB cube
[Scatter plot: training mushrooms plotted by size (2cm–15cm) against whiteness (0–255), each labeled "eatable" or "not eatable".]
Task: Build a decision tree that distinguishes eatable from not eatable mushrooms.
Decision Tree
[Same scatter plot with the first split drawn in: Size > 8cm.]
Decision Tree
[Same scatter plot with both splits drawn in (Size > 8cm, white > 180), and the resulting decision tree with leaves labeled "eatable" / "not eatable".]
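The greedy construction behind such a tree — try every axis-aligned threshold and keep the split with the highest information gain — can be written compactly. A minimal sketch on a hypothetical toy sample (the sizes, whiteness values, and labels below are made up, not the lecture's data):

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_split(points, labels):
    """Greedy axis-aligned split maximising information gain.
    Returns (feature index, threshold, gain)."""
    best = (None, None, 0.0)
    for feat in range(len(points[0])):
        for t in sorted({p[feat] for p in points}):
            left = [l for p, l in zip(points, labels) if p[feat] <= t]
            right = [l for p, l in zip(points, labels) if p[feat] > t]
            if not left or not right:
                continue
            gain = entropy(labels) - (len(left) * entropy(left)
                                      + len(right) * entropy(right)) / len(labels)
            if gain > best[2]:
                best = (feat, t, gain)
    return best

# Hypothetical (size_cm, whiteness) samples; label True = eatable
points = [(3, 50), (4, 60), (10, 200), (12, 220), (11, 40), (13, 90)]
labels = [True, True, False, False, False, False]
feat, thresh, gain = best_split(points, labels)
print(feat, thresh, gain)  # feature 0 (size) separates the toy data perfectly
```

Recursing on the two sides of the chosen split yields the full tree; the slides stop at depth 2.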
Decision Tree – Split Criteria

Before the split, measure Shannon's entropy of the label distribution in the node,

    H(S) = − Σ_c p_c log2 p_c .

A candidate split of S into S_L and S_R is scored by the information gain

    I(S) = H(S) − ( |S_L|/|S| · H(S_L) + |S_R|/|S| · H(S_R) ) .

Think of minimizing the entropy after the split; two candidate splits (Split 1, Split 2) are compared next.
Decision Tree – Split Criteria

• We have |S| = 12 with |S_L| = 6 and |S_R| = 6
• In S we have 6 red and 6 blue points (2 classes)
• H(S) = −(0.5 log2 0.5 + 0.5 log2 0.5) = 1
• We look at two possible splits:

1) 50%–50% class split (each side, S_L and S_R, gets 3 red and 3 blue):
   H(S_L) = −(0.5 log2 0.5 + 0.5 log2 0.5) = 1
   H(S_R) = −(0.5 log2 0.5 + 0.5 log2 0.5) = 1
   I(S) = H(S) − (0.5 · 1 + 0.5 · 1) = H(S) − 1 = 0   (lower information gain)

2) 1/6–5/6 (≈17%–83%) class split (left side has 5 red and 1 blue, right side has 5 blue and 1 red):
   H(S_L) = −(1/6 log2 1/6 + 5/6 log2 5/6) ≈ 0.65
   H(S_R) = −(1/6 log2 1/6 + 5/6 log2 5/6) ≈ 0.65
   I(S) = H(S) − (0.5 · 0.65 + 0.5 · 0.65) ≈ 1 − 0.65 = 0.35   (higher information gain)
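The worked numbers above can be checked numerically (base-2 logarithms, as the entropy value 1 for a 50/50 split implies):

```python
import math

def H(probs):
    """Shannon entropy (base 2) of a class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_S = H([0.5, 0.5])                      # 6 red + 6 blue in S -> 1.0
# Split 1: each side gets 3 red and 3 blue
gain_1 = H_S - (0.5 * H([0.5, 0.5]) + 0.5 * H([0.5, 0.5]))
# Split 2: each side gets 5 of one class and 1 of the other
H_side = H([1/6, 5/6])                   # ~0.65
gain_2 = H_S - (0.5 * H_side + 0.5 * H_side)
print(H_S, gain_1, round(gain_2, 2))
```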
Generalization

Training data:
[Scatter plot of the training mushrooms labeled "eatable" / "not eatable", and the learned tree: Size > 8cm, white > 180, with leaves "eatable" / "not eatable".]

Test data:
[Scatter plot of the test mushrooms.]

The system may not be optimal!
Generalization
[Scatter plot: two new test points fall on the wrong side of the learned decision boundary — "But it's eatable!" and "But it's not eatable!".]
Definition

• Over-fitting: the effect that the model perfectly memorizes the training data but does not perform well on test data.
• Generalization: one of the most important aspects of a model is its ability to generalize, i.e. that new (unseen) test data is correctly classified. A model which overfits does not generalize well.
• How to avoid over-fitting?
  • Idea 1: Max-margin
  • Idea 2: Do not make the trees too deep
Max-Margin
[Scatter plot with the shifted splits Size > 9cm and white > 190, leaves labeled "eatable" / "not eatable".]

Place the "decision boundary" such that the margin to the training points is maximized.
Do not make the trees too deep
[Scatter plot with a single split, Size > 9cm; one leaf now contains both "eatable" and "not eatable" training points.]

Output of that leaf:
𝑝(𝑒𝑎𝑡𝑎𝑏𝑙𝑒) = 2/3
𝑝(𝑛𝑜𝑡 𝑒𝑎𝑡𝑎𝑏𝑙𝑒) = 1/3
Do not make the trees too deep - Comparison
[Scatter plot with the test data.]

The system with 2 nodes makes 2 mistakes on the test data.
The system with 1 node makes 2 mistakes on the test data.

This means: the system with one node makes a mistake during training, but performs equally well on the test data.
Example: People tracking
[Pipeline figure: input depth image → body-part labelling (the body is divided into 31 body parts) → clustering (simple centroid computation) → body joint hypotheses, shown in front, side, and top views.]
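The "simple centroid computation" step can be sketched as follows. The part names and coordinates below are made-up placeholders, and the per-pixel labels are assumed to come from the body-part labelling stage:

```python
from collections import defaultdict

def part_centroids(pixels):
    """pixels: list of (part_label, (x, y, z)) pairs from the labelling stage.
    Returns one joint hypothesis per part: the mean 3D position of its pixels."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for label, (x, y, z) in pixels:
        s = sums[label]
        s[0] += x; s[1] += y; s[2] += z; s[3] += 1
    return {label: (s[0] / s[3], s[1] / s[3], s[2] / s[3])
            for label, s in sums.items()}

# Hypothetical labelled depth pixels (part name, 3D position in metres)
pixels = [("head", (0.0, 1.8, 2.0)), ("head", (0.1, 1.9, 2.0)),
          ("l_hand", (0.5, 1.0, 1.9))]
centroids = part_centroids(pixels)
print(centroids)
```

The real system uses mean shift rather than a plain mean, but the centroid gives the idea of turning a per-pixel labelling into one joint hypothesis per part.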
Small model capacity

[Figure: training data → shallow tree model → test time. The tree asks generic questions such as "Is the part round?" or "Is it darker below me?".]
Large model capacity

[Figure: training data → very deep tree (e.g. depth 30) → test time. The tree asks highly specific questions such as "Is the colour of my pixel 130?" alongside generic ones ("Is the part round?", "Is it darker below me?").]

The model perfectly memorizes the training data!
Tree Depth

[Plot: test accuracy vs. number of parameters (tree depth); more training data pushes the over-fitting boundary towards larger depths.]
Infinite Training Data

If we were to have all possible training images that can occur (including noise):

[Figure: training data → very deep tree (e.g. depth 30) → test time, with questions such as "Is the colour of my pixel 130?" / "Is the colour of my pixel 131?".]

The model perfectly memorizes the training data!
A real system that runs on the Xbox!

[Plot: accuracy vs. amount of training data.]

The amount of training data is one of the main factors.
Roadmap this lecture
• The question of “the combinatorial explosion”
• The question of “generalization”
• The questions with “Big Data”
The important factors

Performance depends on:
• Model power (e.g. tree depth)
• Training data (unlabeled and labeled!)

More training data = more powerful models = better performance
The Increase in Data

• 2005: ~100 training images, ~100 parameters, accuracy rate ~20%
• 2014 (ImageNet Challenge): ~10M images, ~60M parameters, accuracy rate ~85%

Deep models have many parameters.
The return of Neural Networks from the ~70s … made them work.
Major leap forward in the ImageNet Challenge.
Where does the training data come from?

1. Annotation by researchers: ~1K training images
2. Search engines: type in "cow"
3. Crowd sourcing: ~1M training images
4. Simulation: ~10M training images
Where does the training data come from?

1. Get images of tigers
2. Search engines: type in "tiger"
3. Crowd sourcing: run e.g. Amazon Mechanical Turk to filter out bad images (start-up companies)
Where does the training data come from?

ImageNet challenge: 1.2 million annotated images; 1000+ classes
Where does the training data come from?

• Simulation: ~10M training images
  1. Record mocap: 100,000s of poses [Vicon]
  2. Retarget to varied body shapes
  3. Render (depth, body parts) pairs + add noise