Intelligent Systems:
Three Practical Questions
Carsten Rother
Exam questions
• Only from the second part of the lecture (Dimitri Schlesinger, Carsten Rother)
• Three types of tasks:
1) Algorithms
2) Definitions and knowledge questions
3) Theoretical derivations
• Questions will be asked in German
1) Algorithms

What would a parallelized ICM algorithm do in the next two steps? Please fill in the states.

Given the energy with labels 𝑥𝑖 ∈ {0,1} on the 2×2 grid

    𝑥1 𝑥2
    𝑥3 𝑥4

with unary potentials

    𝜃1(0) = 0, 𝜃1(1) = 1
    𝜃2(0) = 1, 𝜃2(1) = 1
    𝜃3(0) = 2, 𝜃3(1) = 1
    𝜃4(0) = 1, 𝜃4(1) = 2

and pairwise potentials 𝜃𝑖𝑗(𝑥𝑖, 𝑥𝑗) = |𝑥𝑖 − 𝑥𝑗| for all edges (𝑖, 𝑗).

Initial state:
    𝑥1 = 0, 𝑥2 = 1
    𝑥3 = 1, 𝑥4 = 0

Step 1:
    𝑥1 = ?, 𝑥2 = ?
    𝑥3 = ?, 𝑥4 = ?
(Hint: the dark nodes are not changed in the first step.)

Step 2:
    𝑥1 = ?, 𝑥2 = ?
    𝑥3 = ?, 𝑥4 = ?
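As a sketch of what parallel ICM does here (an illustration, not the official solution): every node of one checkerboard colour is updated simultaneously to the label that minimises its local energy, while all its neighbours stay fixed; the other colour is updated in the next step. The exercise's figure fixes which nodes are "dark"; the sketch below assumes x1 and x4 are the dark nodes fixed in step 1.

```python
# Unary potentials theta_i(x_i) for x_i in {0, 1}, from the exercise
unary = {1: (0, 1), 2: (1, 1), 3: (2, 1), 4: (1, 2)}
# Grid edges of the 2x2 model; the pairwise potential is |x_i - x_j|
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]

def neighbours(i):
    return [j for a, b in edges for j in ((b,) if a == i else (a,) if b == i else ())]

def icm_parallel_step(state, update_set):
    """Update all nodes in update_set simultaneously, keeping the rest fixed."""
    new_state = dict(state)
    for i in update_set:
        costs = []
        for label in (0, 1):
            c = unary[i][label] + sum(abs(label - state[j]) for j in neighbours(i))
            costs.append(c)
        new_state[i] = costs.index(min(costs))  # pick the cheaper label
    return new_state

state = {1: 0, 2: 1, 3: 1, 4: 0}          # initial state from the exercise
state = icm_parallel_step(state, {2, 3})  # step 1: light nodes (assumed)
state = icm_parallel_step(state, {1, 4})  # step 2: dark nodes (assumed)
print(state)
```

Swapping which checkerboard colour is updated first changes the result, so the assignment of dark nodes from the original figure matters.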
2) Definitions and knowledge questions

Question: What is a structured model?
Answer: A model in which the outputs are not independent; the dependencies between the outputs are modeled.
3) Theoretical derivations

Compute the probability that the sum of the pips of two independently rolled dice is divisible by 5.
(Similar exercises were covered in the lecture and the exercise sessions.)
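The dice question can be checked by enumerating all 36 equally likely outcomes; exactly the sums 5 and 10 are divisible by 5:

```python
from fractions import Fraction
from itertools import product

# Count outcomes of two independent dice whose sum is divisible by 5:
# sum 5 has 4 outcomes, sum 10 has 3 outcomes
favourable = sum(1 for a, b in product(range(1, 7), repeat=2) if (a + b) % 5 == 0)
p = Fraction(favourable, 36)
print(p)  # → 7/36
```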
Roadmap this lecture
• The question of “the combinatorial explosion”
• The question of “generalization”
• The questions with “Big Data”
Going back to 1973
• Sir James Lighthill report to the British Parliament
The general purpose robot is a mirage
He specifically mentioned the problem of "combinatorial explosion" or "intractability", which implied that many of AI's most successful algorithms would grind to a halt on real world problems and were only suitable for solving "toy" tasks.
(My) View: 2015
• Better computers (faster, more memory, clusters, GPUs)
• More powerful methods that can deal with the combinatorial explosion (e.g. graph cut, search)
• Better sensors (e.g. time-of-flight)
• More (labeled) data can give better, more adaptive methods (e.g. deep learning)
• Humans can do it, and they have a finite-sized brain.
Roadmap this lecture
• The question of “the combinatorial explosion”
• The question of “generalization”
• The questions with “Big Data”
Mushroom example
Measured attributes:
- Size in centimeters
- Color: average "whiteness" of the whole mushroom, e.g. along the diagonal of the RGB cube
[Scatter plot: training mushrooms plotted by size (2cm–15cm) against whiteness (0–255), each labeled "eatable" or "not eatable".]
Task: Build a decision tree that distinguishes eatable from not eatable mushrooms.
Decision Tree
[Same scatter plot with the first split drawn in: Size > 8cm.]
Decision Tree
[Same scatter plot with both splits drawn in (Size > 8cm, white > 180), and the resulting decision tree with leaves labeled "eatable" / "not eatable".]
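The greedy construction behind such a tree — try every axis-aligned threshold and keep the split with the highest information gain — can be written compactly. A minimal sketch on a hypothetical toy sample (the sizes, whiteness values, and labels below are made up, not the lecture's data):

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_split(points, labels):
    """Greedy axis-aligned split maximising information gain.
    Returns (feature index, threshold, gain)."""
    best = (None, None, 0.0)
    for feat in range(len(points[0])):
        for t in sorted({p[feat] for p in points}):
            left = [l for p, l in zip(points, labels) if p[feat] <= t]
            right = [l for p, l in zip(points, labels) if p[feat] > t]
            if not left or not right:
                continue
            gain = entropy(labels) - (len(left) * entropy(left)
                                      + len(right) * entropy(right)) / len(labels)
            if gain > best[2]:
                best = (feat, t, gain)
    return best

# Hypothetical (size_cm, whiteness) samples; label True = eatable
points = [(3, 50), (4, 60), (10, 200), (12, 220), (11, 40), (13, 90)]
labels = [True, True, False, False, False, False]
feat, thresh, gain = best_split(points, labels)
print(feat, thresh, gain)  # feature 0 (size) separates the toy data perfectly
```

Recursing on the two sides of the chosen split yields the full tree; the slides stop at depth 2.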
Decision Tree – Split Criteria

Before the split, measure Shannon's entropy of the label distribution in the node,

    H(S) = − Σ_c p_c log2 p_c .

A candidate split of S into S_L and S_R is scored by the information gain

    I(S) = H(S) − ( |S_L|/|S| · H(S_L) + |S_R|/|S| · H(S_R) ) .

Think of minimizing the entropy after the split; two candidate splits (Split 1, Split 2) are compared next.
Decision Tree – Split Criteria

• We have |S| = 12 with |S_L| = 6 and |S_R| = 6
• In S we have 6 red and 6 blue points (2 classes)
• H(S) = −(0.5 log2 0.5 + 0.5 log2 0.5) = 1
• We look at two possible splits:

1) 50%–50% class split (each side, S_L and S_R, gets 3 red and 3 blue):
   H(S_L) = −(0.5 log2 0.5 + 0.5 log2 0.5) = 1
   H(S_R) = −(0.5 log2 0.5 + 0.5 log2 0.5) = 1
   I(S) = H(S) − (0.5 · 1 + 0.5 · 1) = H(S) − 1 = 0   (lower information gain)

2) 1/6–5/6 (≈17%–83%) class split (left side has 5 red and 1 blue, right side has 5 blue and 1 red):
   H(S_L) = −(1/6 log2 1/6 + 5/6 log2 5/6) ≈ 0.65
   H(S_R) = −(1/6 log2 1/6 + 5/6 log2 5/6) ≈ 0.65
   I(S) = H(S) − (0.5 · 0.65 + 0.5 · 0.65) ≈ 1 − 0.65 = 0.35   (higher information gain)
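The worked numbers above can be checked numerically (base-2 logarithms, as the entropy value 1 for a 50/50 split implies):

```python
import math

def H(probs):
    """Shannon entropy (base 2) of a class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

H_S = H([0.5, 0.5])                      # 6 red + 6 blue in S -> 1.0
# Split 1: each side gets 3 red and 3 blue
gain_1 = H_S - (0.5 * H([0.5, 0.5]) + 0.5 * H([0.5, 0.5]))
# Split 2: each side gets 5 of one class and 1 of the other
H_side = H([1/6, 5/6])                   # ~0.65
gain_2 = H_S - (0.5 * H_side + 0.5 * H_side)
print(H_S, gain_1, round(gain_2, 2))
```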
Generalization

Training data:
[Scatter plot of the training mushrooms labeled "eatable" / "not eatable", and the learned tree: Size > 8cm, white > 180, with leaves "eatable" / "not eatable".]

Test data:
[Scatter plot of the test mushrooms.]

The system may not be optimal!
Generalization
[Scatter plot: two new test points fall on the wrong side of the learned decision boundary — "But it's eatable!" and "But it's not eatable!".]
Definition

• Over-fitting: the effect that the model perfectly memorizes the training data but does not perform well on test data.
• Generalization: one of the most important aspects of a model is its ability to generalize, i.e. that new (unseen) test data is correctly classified. A model which overfits does not generalize well.
• How to avoid over-fitting?
  • Idea 1: Max-margin
  • Idea 2: Do not make the trees too deep
Max-Margin
[Scatter plot with the shifted splits Size > 9cm and white > 190, leaves labeled "eatable" / "not eatable".]

Place the "decision boundary" such that the margin to the training points is maximized.
Do not make the trees too deep
[Scatter plot with a single split, Size > 9cm; one leaf now contains both "eatable" and "not eatable" training points.]

Output of that leaf:
𝑝(𝑒𝑎𝑡𝑎𝑏𝑙𝑒) = 2/3
𝑝(𝑛𝑜𝑡 𝑒𝑎𝑡𝑎𝑏𝑙𝑒) = 1/3
Do not make the trees too deep - Comparison
[Scatter plot with the test data.]

The system with 2 nodes makes 2 mistakes on the test data.
The system with 1 node makes 2 mistakes on the test data.

This means: the system with one node makes a mistake during training, but performs equally well on the test data.
Example: People tracking
[Pipeline figure: input depth image → body-part labelling (the body is divided into 31 body parts) → clustering (simple centroid computation) → body joint hypotheses, shown in front, side, and top views.]
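The "simple centroid computation" step can be sketched as follows. The part names and coordinates below are made-up placeholders, and the per-pixel labels are assumed to come from the body-part labelling stage:

```python
from collections import defaultdict

def part_centroids(pixels):
    """pixels: list of (part_label, (x, y, z)) pairs from the labelling stage.
    Returns one joint hypothesis per part: the mean 3D position of its pixels."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for label, (x, y, z) in pixels:
        s = sums[label]
        s[0] += x; s[1] += y; s[2] += z; s[3] += 1
    return {label: (s[0] / s[3], s[1] / s[3], s[2] / s[3])
            for label, s in sums.items()}

# Hypothetical labelled depth pixels (part name, 3D position in metres)
pixels = [("head", (0.0, 1.8, 2.0)), ("head", (0.1, 1.9, 2.0)),
          ("l_hand", (0.5, 1.0, 1.9))]
centroids = part_centroids(pixels)
print(centroids)
```

The real system uses mean shift rather than a plain mean, but the centroid gives the idea of turning a per-pixel labelling into one joint hypothesis per part.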
Small model capacity

[Figure: training data → shallow tree model → test time. The tree asks generic questions such as "Is the part round?" or "Is it darker below me?".]
Large model capacity

[Figure: training data → very deep tree (e.g. depth 30) → test time. The tree asks highly specific questions such as "Is the colour of my pixel 130?" alongside generic ones ("Is the part round?", "Is it darker below me?").]

The model perfectly memorizes the training data!
Tree Depth

[Plot: test accuracy vs. number of parameters (tree depth); more training data pushes the over-fitting boundary towards larger depths.]
Infinite Training Data

If we were to have all possible training images that can occur (including noise):

[Figure: training data → very deep tree (e.g. depth 30) → test time, with questions such as "Is the colour of my pixel 130?" / "Is the colour of my pixel 131?".]

The model perfectly memorizes the training data!
A real system that runs on the Xbox!

[Plot: accuracy vs. amount of training data.]

The amount of training data is one of the main factors.
Roadmap this lecture
• The question of “the combinatorial explosion”
• The question of “generalization”
• The questions with “Big Data”
The important factors

Performance depends on:
• Model power (e.g. tree depth)
• Training data (unlabeled and labeled!)

More training data = more powerful models = better performance
The Increase in Data

• 2005: ~100 training images, ~100 parameters, accuracy rate ~20%
• 2014 (ImageNet Challenge): ~10M images, ~60M parameters, accuracy rate ~85%

Deep models have many parameters.
The return of Neural Networks from the ~70s … made them work.
Major leap forward in the ImageNet Challenge.
Where does the training data come from?

1. Annotation by researchers: ~1K training images
2. Search engines: type in "cow"
3. Crowd sourcing: ~1M training images
4. Simulation: ~10M training images
Where does the training data come from?

1. Get images of tigers
2. Search engines: type in "tiger"
3. Crowd sourcing: run e.g. Amazon Mechanical Turk to filter out bad images (start-up companies)
Where does the training data come from?

ImageNet challenge: 1.2 million annotated images; 1000+ classes
Where does the training data come from?

• Simulation: ~10M training images
  1. Record mocap: 100,000s of poses [Vicon]
  2. Retarget to varied body shapes
  3. Render (depth, body parts) pairs + add noise