Intelligent Systems:
Recognition in the wild
Carsten Rother
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Exam Questions
• Only from my part and Dimitri Schlesinger's part.
• Three types of questions:
  1) Algorithms
  2) Definitions and knowledge questions
  3) Theoretical derivations
• Answers can be given in English or German
1) Algorithms
What would a parallelized ICM algorithm do in the next two steps? Please fill in the labels.
Given the energy over binary variables x_i ∈ {0, 1}, i = 1, …, 4, with unary terms
θ_1(0) = 0, θ_1(1) = 1;  θ_2(0) = 1, θ_2(1) = 1;  θ_3(0) = 2, θ_3(1) = 1;  θ_4(0) = 1, θ_4(1) = 2
and pairwise terms θ_ij(x_i, x_j) = |x_i − x_j| for all edges (i, j).
[Figure: graph over the variables x_1, …, x_4; some nodes are drawn dark]
Initial state: x_1 = 0, x_2 = 1, x_3 = 1, x_4 = 0
Step 1: x_1 = ?, x_2 = ?, x_3 = ?, x_4 = ?
Step 2: x_1 = ?, x_2 = ?, x_3 = ?, x_4 = ?
Hint: the dark nodes are not changed in the first step.
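A minimal sketch of how such a parallel (checkerboard-style) ICM update could be implemented. The chain structure x_1 − x_2 − x_3 − x_4 and the choice of the "dark" nodes (x_2, x_4) are my assumptions, not given on the slide:

```python
# Sketch of a parallel (checkerboard-style) ICM update for the exam example.
# Assumptions: the four variables form a chain x1 - x2 - x3 - x4, and the
# "dark" nodes fixed in step 1 are x2 and x4.
theta = {1: [0, 1], 2: [1, 1], 3: [2, 1], 4: [1, 2]}   # theta[i][label]
edges = [(1, 2), (2, 3), (3, 4)]                        # assumed chain structure

def local_energy(i, label, x):
    """Unary term of node i plus pairwise terms |label - x_j| to its neighbours."""
    e = theta[i][label]
    for a, b in edges:
        if i == a:
            e += abs(label - x[b])
        elif i == b:
            e += abs(label - x[a])
    return e

x = {1: 0, 2: 1, 3: 1, 4: 0}                            # initial state from the slide
for step, active in enumerate([(1, 3), (2, 4)], start=1):
    x_new = dict(x)
    for i in active:                                     # active nodes update simultaneously
        x_new[i] = min((0, 1), key=lambda l: local_energy(i, l, x))
    x = x_new
    print("Step", step, x)
```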
2) Definitions and knowledge questions
• What is an "Ising prior"?
Answer from the slides:
ψ_{i,j}(x_i, x_j) = exp{−|x_i − x_j|} is called the "Ising prior"
3) Theoretical derivations
Compute the probability that the sum of the pips of two independently rolled dice is divisible by 5.
(Something similar was considered in the lecture "Probability Theory", slide 14)
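A quick enumeration of all 36 outcomes (my own check, not part of the slide) confirms the answer 7/36 (the favourable sums are 5 and 10):

```python
# Probability that the sum of two independent fair dice is divisible by 5.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) % 5 == 0]
print(len(favourable), "/", len(outcomes))   # 7 / 36
```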
Comment
• Statements on the slides that serve as background information will not be examined "in detail".
Example: Image Retargeting
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_{ij}(x_i, x_j),   binary labels x_i ∈ {0, 1}
[Figure: unary terms θ_i(x_i) force label 0 on one side and label 1 on the other; pairwise terms θ_{ij}(x_i, x_j); a sketched labeling in which the label-0 and label-1 regions are separated by a path from top to bottom]
In this case the problem can be represented in two ways: as a labeling problem or as a path-finding problem.
You can do that as an exercise; please see the details in:
http://www.merl.com/reports/docs/TR2008-064.pdf
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Comments on the last lecture
I added some slides to the last lecture to make it clearer. These slides are also included below. The slides from the last lecture are now online.
Reminder: Definition: Factor Graph models
• Given a graph G = (V, F, E), where V (variable nodes) and F (factor nodes) are the sets of nodes and E is the set of edges
• A factor graph defines a family of distributions:
  p(x) = 1/f ∏_{F ∈ 𝔽} ψ_F(x_{N(F)})   where   f = Σ_x ∏_{F ∈ 𝔽} ψ_F(x_{N(F)})
  f: partition function
  F: a factor
  𝔽: set of all factors
  N(F): neighbourhood of a factor
  ψ_F: function (not a distribution) depending on x_{N(F)}   (ψ_F: K^{|N(F)|} → ℝ, where x_i ∈ K)
Reminder: Definition: Undirected Graphical models
• Given an undirected graph G = (V, E), where V is the set of nodes and E the set of edges
• An undirected graphical model defines a family of distributions:
  p(x) = 1/f ∏_{C ∈ C(G)} ψ_C(x_C)   where   f = Σ_x ∏_{C ∈ C(G)} ψ_C(x_C)
  f: partition function
  C(G): set of all cliques
  C: a clique, i.e. a subset of variable indices
  ψ_C: factor (not a distribution) depending on x_C   (ψ_C: K^{|C|} → ℝ, where x_i ∈ K)
Definition: a clique is a set of nodes in which all nodes are pairwise linked by an edge.
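To make the definition concrete, here is a small illustrative sketch (mine, not from the slides) that evaluates p(x) = (1/f) ∏_C ψ_C(x_C) by brute force for three binary variables with cliques {1,2} and {2,3}, using the Ising-type factor from the exam question:

```python
# Brute-force evaluation of an undirected graphical model over three binary
# variables with cliques {1,2} and {2,3} and psi(x_i, x_j) = exp{-|x_i - x_j|}.
from itertools import product
from math import exp

def psi(xi, xj):
    return exp(-abs(xi - xj))

def unnormalised(x1, x2, x3):
    return psi(x1, x2) * psi(x2, x3)

f = sum(unnormalised(*x) for x in product([0, 1], repeat=3))   # partition function
for x in product([0, 1], repeat=3):
    print(x, unnormalised(*x) / f)    # the probabilities sum to one
```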
Comment on definition
In some books the set C(G) is defined as the set of all maximal cliques only.
[Figure: undirected graphical model over the variables x_1, …, x_5]
The two resulting families of distributions are equivalent.
For instance, a factor ψ(x_i, x_j) = (x_i + x_j) · x_i can also be written as the product of two factors, ψ_a(x_i, x_j) = x_i + x_j and ψ_b(x_i) = x_i.
Using all cliques:
p(x_1, …, x_5) = 1/f ψ(x_1,x_2,x_4) ψ(x_2,x_3,x_4) ψ(x_1,x_2) ψ(x_1,x_2) ψ(x_1,x_4) ψ(x_4,x_2) ψ(x_3,x_2) ψ(x_3,x_4) ψ(x_4,x_5) ψ(x_1) ψ(x_2) ψ(x_3) ψ(x_4) ψ(x_5)
Using maximal cliques only:
p(x_1, …, x_5) = 1/f ψ(x_1,x_2,x_4) ψ(x_2,x_3,x_4) ψ(x_4,x_5)
Easy to convert between the two representations
Convert a factor graph in such a way that the family of distributions of the undirected graphical model covers all possible distributions of this factor graph:
make sure that every factor is represented by a clique.
[Figure: a factor graph over x_1, …, x_5 and the corresponding undirected graphical model; the family of distributions with this factor graph vs. the family of distributions with this undirected graphical model]
Easy to convert between the two representations
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model:
[Figure: an undirected graphical model over x_1, …, x_5 and the corresponding factor graph]
The family of distributions of this undirected graphical model and of this factor graph is the same.
Easy to convert between the two representations
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model:
make sure that every clique has an associated factor.
[Figure: an undirected graphical model over x_1, …, x_5 and a factor graph with one factor per clique]
The family of distributions of this undirected graphical model and of this factor graph is the same.
Comment: this is also correct, but not a minimal representation.
Easy to convert between the two representations
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model:
[Figure: an undirected graphical model over x_1, …, x_5 and an attempted factor-graph conversion]
"The family of distributions of this undirected graphical model and of this factor graph is the same."
Comment: this is not correct.
Reminder: Last lecture Road Map
• Define: Structured Models
• Formulate applications as discrete labeling problems
• Discrete Inference:
  • Pixel-based: Iterated Conditional Modes (ICM)
  • Line-based: Dynamic Programming (DP)
  • Field-based: Graph Cut and Alpha-Expansion
• Interactive Image Segmentation
• From generative models to discriminative models to discriminative functions
Reminder: Generative Model
Models explicitly (or implicitly) the distribution of the input z and the output x.
Joint probability: p(x, z) = p(z|x) p(x)   (likelihood × prior)
Comments:
1. The joint distribution does not necessarily have to be decomposed into likelihood and prior, but in practice it (nearly) always is.
2. Generative models are used successfully when input z and output x are very related, e.g. image denoising.
Pros:
1. Possible to sample both x and z.
2. Can be used quite easily for many applications (since prior and likelihood are modeled separately).
3. In some applications, e.g. biology, people want to model likelihood and prior explicitly, since they want to understand the model as much as possible.
4. The probability can be used in bigger systems.
Cons:
1. It might not always be possible to write down the full distribution (it involves a distribution over images z).
Reminder: Discriminative model
Models that model the posterior directly are discriminative models.
In computer vision we mostly use the Gibbs distribution with an energy E:
p(x|z) = 1/f exp{−E(x, z)}   where   f = Σ_x exp{−E(x, z)}
These models are also called "conditional random fields".
Pros:
1. Simpler to write down than a generative model (no need to model z) and goes directly for the desired output x.
2. More flexible, since the energy is arbitrary.
3. The probability can be used in bigger systems.
Cons: we can no longer sample images z.
Reminder: Discriminative model
• Relation between posterior and joint: p(x|z) = 1/p(z) · p(x, z)
• p(x, z), p(x|z) and E(x, z) all have the same optimal solution x* given z:
  • x* = argmax_x p(x, z) given z
  • x* = argmax_x p(x|z) given z   (since p(x|z) = 1/p(z) · p(x, z))
  • x* = argmin_x E(x, z)   (since −log p(x|z) = log f + E(x, z))
Comment on Generative Models
One may also write the joint distribution p(x, z) as a Gibbs distribution:
p(x, z) = 1/f exp{−E(x, z)}   where   f = Σ_{x,z} exp{−E(x, z)}
If likelihood and prior are no longer modelled separately:
• sampling x, z gets more difficult
• we can no longer learn prior and likelihood separately (as in de-noising)
• we train p(x, z) = 1/f exp{−E(x, z)} and p(x|z) = 1/f exp{−E(x, z)} in quite a similar way
[Figure: samples of z]
The advantages of a generative model over a discriminative model are gone.
But … it has lost the meaning of a "generative" model, since we no longer have a likelihood which says how the data was "generated".
Discriminative functions
Models that model the classification problem via a function E(x, z), which maps a labeling x (for a given input z) to a real value.
Examples:
• an energy
• support vector machines
• nearest neighbour classifier
x* = argmin_x E(x, z)
Pros: most direct approach to model the problem
Cons: no probabilities
This is the most used approach in computer vision!
Recap
Modelling a problem:
• The input data is z and the desired output is x.
We can identify three different approaches [see details in Bishop, page 42ff]:
• Generative (probabilistic) models: p(x, z)
• Discriminative (probabilistic) models: p(x|z)
• Discriminative functions: f(x, z)
The key differences are:
• Probabilistic or non-probabilistic model
• Generative models also model the data z
• Differences in training (see previous lectures)
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Slide credits
• Bernt Schiele
• Li Fei-Fei
• Rob Fergus
• Kristen Grauman
• Derek Hoiem
• Stefan Roth
• Jamie Shotton
• Antonio Criminisi
Recognition / classification is a fundamental part of many "intelligent systems"
• Robots / intelligent cars
• Stereo image recognition
• Classification of sensor data
• Search in documents
• Biology / medicine: classify cells, DNA, etc.
• Language / hand drawings
Google input: "Show me frogs"
Image Recognition – What is the Goal?
• Object instance recognition (more precisely: known object instance recognition)
  • We know exactly the instance
• Object class recognition (also called: generic object recognition)
  • Different instances of the same class
[Figure: prototype images and a test image; train-set, test-set, result]
Class versus instance – a gray zone: same instance or not?
Class-based recognition: Level of Detail
• Image categorization
  • One or more categories per image (e.g. "frog, branch")
• Object class detection
  • Also find the bounding box (a 2D bounding box for each frog)
• Part-based object detection
  • Find the parts of the object (and in this way the full object)
• Semantic segmentation (see last lecture; segmentation implies pixel-wise accuracy)
  • Object-class segmentation
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Random Forest
Slide credits: Jamie Shotton and Antonio Criminisi
• Proven very capable, especially for real-time applications
  • e.g. keypoint recognition, Kinect body part classification
• High accuracy with very low computational cost
  • can exploit low-level features (e.g. raw pixel values)
  • feature vector computed sparsely, on demand
  • generalization through randomization
• Outputs confidence values
• Flexible, non-parametric model
• Easy to implement and parallelize
What can random forests do? Tasks
• Classification / decision forests
• Regression forests
• Density forests
• Manifold forests
• Semi-supervised forests
What can random forests do? Applications
• Classification / decision forests: e.g. semantic segmentation, DNA classification
• Regression forests: e.g. object localization
• Density forests: e.g. novelty detection
• Manifold forests: e.g. dimensionality reduction
• Semi-supervised forests: e.g. semi-supervised semantic segmentation
Reminder: Last Lecture
Global optimum: x* = argmin_x E(x, z)
The user defined with brush strokes what is object and what is background.
Now we want to do that automatically.
Semantic Segmentation
The desired output: label each pixel with one out of 21 classes.
[TextonBoost; Shotton et al., '06]
[Figure: failure cases]
The model optimizes an energy (details not important):
E(x, Θ) = Σ_i [ θ_i(x_i, z_i, Θ) + π_i(x_i) + λ_i(x_i, i) ] + Σ_{i,j} ψ_{i,j}(x_i, x_j)
with unary terms for the class information, the colour model and the location prior, and an edge-aware smoothness prior as the pairwise term.
Location prior: [figure: location priors for the classes "grass" and "sky"]
Class information: each pixel gets a distribution over the 21 classes,
θ_i(x_i = k, z) = p(x_i = k | z),   x_i ∈ {1, …, K}.
Many options; here we use a Random Forest.
Semantic Segmentation
[Figure: results with class and location terms only vs. with edges and the colour model added]
In the following we concentrate on how to obtain the per-pixel class term.
Object Class Segmentation using Random Forests
Let us talk about features and simple (weak) classifiers
1) Get a d-dimensional feature, depending on some parameters θ:
• 1-dimensional: we want to classify the white pixel.
  Feature: colour (e.g. red channel) of the green pixel.
  Parameters: a 2D offset vector.
• 1-dimensional: we want to classify the white pixel.
  Feature: average colour (e.g. red channel) in a rectangle.
  Parameters: 4D (offset vector + size of the rectangle).
Let us talk about features and simple (weak) classifiers
1) Get a d-dimensional feature, depending on some parameters θ:
• 2-dimensional: we want to classify the white pixel.
  Feature: colour (e.g. red channel) of the green pixel and colour (e.g. red channel) of the red pixel.
  Parameters: 4D (two offset vectors).
We will visualize it in this way: [2D scatter plot of the feature responses]
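As a concrete illustration of such offset features, here is a sketch of my own (not from the slides); the function name, the parameter layout and the clamping behaviour at the image border are assumptions:

```python
# Offset-pixel feature: to classify pixel (r, c), read a colour channel at
# positions displaced by two offset vectors.
import numpy as np

def offset_feature(img, r, c, theta):
    """2D feature: channel values at two offset positions relative to (r, c).

    img:   H x W x 3 array
    theta: (dr1, dc1, dr2, dc2, channel) -- the feature parameters
    """
    dr1, dc1, dr2, dc2, ch = theta
    h, w = img.shape[:2]
    # clamp offsets that reach outside the image (one possible convention)
    r1, c1 = np.clip(r + dr1, 0, h - 1), np.clip(c + dc1, 0, w - 1)
    r2, c2 = np.clip(r + dr2, 0, h - 1), np.clip(c + dc2, 0, w - 1)
    return np.array([img[r1, c1, ch], img[r2, c2, ch]])

img = np.random.randint(0, 256, size=(64, 64, 3))
print(offset_feature(img, 32, 32, theta=(5, -3, -8, 2, 0)))   # 2-dimensional feature
```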
Let us talk about features and simple (weak) classifiers
Examples of weak learners (feature response for a 2D example):
• Weak learner: axis-aligned.
  The classifier has 2 parameters: which axis, and a continuous threshold τ (x_1 <> τ or x_2 <> τ).
• Weak learner: oriented line, i.e. a generic line in homogeneous coordinates.
  The classifier has 2/3 parameters: a·x_1 + b·x_2 + c <> 0.
• Weak learner: conic section, i.e. a matrix representing a conic.
  The classifier has 5/6 parameters: a·x_1² + b·x_2² + c·x_1·x_2 + d·x_1 + e·x_2 + f <> 0.
In general one may select only a very small subset of the features.
Let us talk about features and simple (weak) classifiers
• We put all the parameters of the classifier and of the features into one vector θ:
  axis-aligned linear classifier; linear classifier (a·x_1 + b·x_2 + c <> 0); conic classifier (a·x_1² + b·x_2² + c·x_1·x_2 + d·x_1 + e·x_2 + f <> 0)
• They are called weak classifiers since they will be used to build a stronger classifier (here a random forest, later Boosting).
• We denote the classifier as h(θ, v) ∈ {true (> 0), false (< 0)}.
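A small sketch of my own (not from the slides) of the three weak-classifier families as functions of a 2D feature v; the parameter layout is an assumption:

```python
# Three weak-classifier families on a 2D feature v = (v1, v2);
# theta collects all their parameters.
import numpy as np

def h_axis_aligned(v, theta):
    axis, tau = theta                      # which axis, continuous threshold
    return v[axis] > tau

def h_linear(v, theta):
    a, b, c = theta                        # generic line in homogeneous coordinates
    return a * v[0] + b * v[1] + c > 0

def h_conic(v, theta):
    a, b, c, d, e, f = theta               # coefficients of a conic section
    return a * v[0]**2 + b * v[1]**2 + c * v[0] * v[1] + d * v[0] + e * v[1] + f > 0

v = np.array([0.3, -1.2])
print(h_axis_aligned(v, (0, 0.0)), h_linear(v, (1.0, 1.0, 0.5)), h_conic(v, (1, 1, 0, 0, 0, -1)))
```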
Decision Tree (Classification)
[Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9:1545–1588, 1997]
A general tree structure: a root node (0), internal (split) nodes and terminal (leaf) nodes.
[Figure: binary tree with nodes numbered 0–14]
A decision tree: "Is the top part blue?" → "Is the bottom part green?" / "Is the bottom part blue?"
Decision Tree – Test Time
Input: a test point (input data in feature space).
At each split node, go left or right according to the node test; the reached leaf returns the class distribution p(c).
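A minimal sketch of this test-time traversal (my own, not from the slides); the tiny hand-built tree, its split tests and its leaf distributions are made up for illustration:

```python
# Test-time traversal: each split node applies a weak classifier h(theta, v);
# we descend until a leaf, which stores a class distribution p(c).
def predict(tree, v):
    node = tree
    while "p" not in node:                       # internal (split) node
        node = node["right"] if node["h"](v) else node["left"]
    return node["p"]                             # leaf distribution p(c)

# tiny hand-built tree over a 2D feature v
tree = {
    "h": lambda v: v[0] > 0.5,                   # axis-aligned test on v[0]
    "left":  {"p": {"red": 0.75, "blue": 0.25}},
    "right": {"h": lambda v: v[1] > 0.0,
              "left":  {"p": {"red": 0.10, "blue": 0.90}},
              "right": {"p": {"red": 0.50, "blue": 0.50}}},
}
print(predict(tree, (0.7, -0.3)))                # -> {'red': 0.10, 'blue': 0.90}
```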
Decision Tree – Train Time
Input: all training points (input data in feature space); each point has a class label.
The set of all labelled training points, here 35 red and 23 blue.
Split the training set at each node.
Measure p(c) at each leaf; it could be, e.g., 3 red and 1 blue,
i.e. p(red) = 0.75, p(blue) = 0.25 (remember, the feature space is also optimized with θ).
Random Forests – Training of features (illustration)
What does it mean to optimize over θ?
• For each pixel the same feature test (at one split node) will be done.
• One has to define what happens with feature tests that reach outside the image.
Image labeling task (2 classes, red and blue).
Goal during training: separate red pixels (class 1) from blue pixels (class 2).
Feature:
• Value x_1: the value of the green colour channel (could also be red or blue) if you look o_1 pixels right and o_2 pixels up, i.e. at position pos + (o_1, o_2).
• Value x_2: the value of the green colour channel (could also be red or blue) if you look o_3 pixels right and o_4 pixels down, i.e. at position pos + (o_3, o_4).
Goal: find a θ = (o_1, o_2, o_3, o_4) such that it separates the data best.
Decision Tree – Split Criteria
Node training: compare the training set before the split with the candidate splits (split 1, split 2), using Shannon's entropy H.
Information gain: I(S) = H(S) − Σ_{i ∈ {L,R}} (|S_i| / |S|) · H(S_i)
Think of it as minimizing the (weighted) entropy of the children.
Example Calculation
• We have |S| = 12 points, with |S_L| = 6 and |S_R| = 6.
• In S we have 6 red and 6 blue points (2 classes).
• We look at two possible splits (logarithms base 2):
1) 50%–50% class split (each side, S_L and S_R, gets 3 red and 3 blue):
   H(S_L) = −(0.5 log 0.5 + 0.5 log 0.5) = 1
   H(S_R) = −(0.5 log 0.5 + 0.5 log 0.5) = 1
   I(S) = H(S) − (0.5 · 1 + 0.5 · 1) = H(S) − 1 = 0
   (lower information gain)
2) 16%–84% class split (the left side has 5 red and 1 blue, the right side has 5 blue and 1 red):
   H(S_L) = −(1/6 · log 1/6 + 5/6 · log 5/6) ≈ 0.65
   H(S_R) = −(1/6 · log 1/6 + 5/6 · log 5/6) ≈ 0.65
   I(S) = H(S) − (0.5 · 0.65 + 0.5 · 0.65) = H(S) − 0.65 = 0.35
   (higher information gain)
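A quick numerical check of this example (my own, using base-2 logarithms):

```python
# Verify the entropies and information gains of the two candidate splits.
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

H_S = entropy([6, 6])                                  # parent: 6 red, 6 blue -> 1.0
for left, right in [([3, 3], [3, 3]), ([5, 1], [1, 5])]:
    w_l, w_r = sum(left) / 12, sum(right) / 12
    gain = H_S - (w_l * entropy(left) + w_r * entropy(right))
    print(round(entropy(left), 2), round(gain, 2))     # 1.0 0.0  and  0.65 0.35
```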
Decision Forest
The ensemble model: trees t = 1, 2, 3, …
Forest output probability: p(c) = (1/T) Σ_t p_t(c), where T is the number of trees.
Randomness in the training set
Bagging (randomizing the training set): out of the full training set, a randomly sampled subset of the training data is made available for each tree t.
Forest training
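A compact sketch of my own (not from the slides) of these two ingredients, bagging and averaging of the per-tree distributions p_t(c); the depth-1 "trees" and the toy data are made up for illustration:

```python
# Bagging + ensemble averaging with depth-1 trees (stumps) on toy 2D data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(58, 2))                      # toy training set in feature space
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # two classes

def train_stump(X, y):
    """A depth-1 'tree': one axis-aligned split; leaves store class frequencies."""
    axis, tau = rng.integers(2), rng.normal()     # randomly chosen node test
    def leaf(mask):
        return np.bincount(y[mask], minlength=2) / max(mask.sum(), 1)
    left, right = leaf(X[:, axis] <= tau), leaf(X[:, axis] > tau)
    return lambda v: right if v[axis] > tau else left

T = 10
trees = []
for t in range(T):
    bag = rng.integers(0, len(X), size=len(X))    # bagging: random subset for tree t
    trees.append(train_stump(X[bag], y[bag]))

v = np.array([0.4, 0.2])
p = sum(tree(v) for tree in trees) / T            # ensemble: p(c) = 1/T sum_t p_t(c)
print(p)
```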
Example: two classes; axis-aligned linear classifier
[Figure: training points; training and testing of the different trees in the forest]
Example: two classes; linear classifier
[Figure: training points; training and testing of the different trees in the forest]
Example: two classes; conic classifier
[Figure: training points; training and testing of the different trees in the forest]
Decision Tree – Randomization
Randomized node optimization: instead of the full set of all possible node test parameters, each node only considers a randomly sampled subset of them; node training then picks the best weak-learner parameters from that subset.
ρ: randomness control parameter.
• ρ = size of the full parameter set: no randomness and maximum tree correlation.
• ρ = 1: maximum randomness and minimum tree correlation.
The effect of ρ: a small value of ρ gives little tree correlation; a large value of ρ gives large tree correlation.
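A minimal sketch of my own of randomized node optimization: only ρ randomly sampled node tests are evaluated and the one with the highest information gain is kept; the axis-aligned test and the toy data are made up for illustration:

```python
# Randomized node optimization: try rho random node tests, keep the best one.
import numpy as np

rng = np.random.default_rng(1)

def entropy(labels):
    p = np.bincount(labels, minlength=2) / len(labels)
    return -sum(pi * np.log2(pi) for pi in p if pi > 0)

def train_node(X, y, rho=10):
    best = None
    for _ in range(rho):                               # try rho random node tests
        axis, tau = rng.integers(X.shape[1]), rng.normal()
        mask = X[:, axis] > tau
        if mask.all() or not mask.any():
            continue
        gain = entropy(y) - (mask.mean() * entropy(y[mask])
                             + (~mask).mean() * entropy(y[~mask]))
        if best is None or gain > best[0]:
            best = (gain, axis, tau)
    return best                                        # (information gain, axis, threshold)

X = rng.normal(size=(50, 2)); y = (X[:, 0] > 0.3).astype(int)
print(train_node(X, y, rho=25))
```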
Decision Forest – the choices
• What is the depth of the trees?
• How many trees?
• Choice of the randomness parameter ρ?
• What are the features?
• What type of classifier (linear, conic, etc.)?
• What split criterion (other than information gain)?
A crucial factor is the tree depth.
Definition
• Over-fitting: the effect that the model perfectly memorizes the training data but does not perform well on test data.
• Generalization: one of the most important aspects of a model is its ability to generalize, i.e. that new (unseen) test data is correctly classified. A model which overfits does not generalize well.
Example: four classes; conic classifier
[Figure: training points; training and testing of the different trees in the forest]
Examples
[Figure: training points (4-class spiral; 4-class spiral with large gaps; 4-class spiral with larger gaps) and the corresponding testing posteriors]
Examples – overfitting
[Figure: training points (4-class mixed) and testing posteriors for T=200 trees with depth D=3, D=6 and D=15, weak learner = conic]
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Body Tracking with Kinect Camera
… what runs on Microsoft Xbox
Reminder: Kinect Camera
[Figure: example depth image of a person, top view and side view]
RGB vs. depth for pose estimation
• RGB: only works when well lit; background clutter; scale unknown; clothing & skin colour
• Depth: works in low light; person "pops" out from the background; scale known; uniform texture
Body Tracking: Pipeline overview
input depth image → body part labelling → clustering → body joint hypotheses
(shown as front view, side view, top view)
The body is divided into 31 body parts.
The clustering step is a simple centroid computation per body part.
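A simple illustration of that clustering step (mine, not the actual Kinect implementation): per-body-part centroids computed from a per-pixel label image; the label image here is random, just to make the sketch runnable:

```python
# Turn per-pixel body-part labels into one joint hypothesis per part
# by computing the centroid of the pixels assigned to each part.
import numpy as np

H, W, NUM_PARTS = 240, 320, 31
rng = np.random.default_rng(0)
part_labels = rng.integers(0, NUM_PARTS, size=(H, W))     # stand-in for the classifier output

def part_centroids(labels, num_parts):
    """Return one (row, col) hypothesis per body part: the centroid of its pixels."""
    centroids = {}
    for part in range(num_parts):
        rows, cols = np.nonzero(labels == part)
        if len(rows):
            centroids[part] = (rows.mean(), cols.mean())
    return centroids

print(part_centroids(part_labels, NUM_PARTS)[0])           # hypothesis for body part 0
```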
Create lots and lots of training data
Model all sorts of variations:
• record mocap, 100,000s of poses [Vicon]
• retarget to varied body shapes
• render (depth, body parts) pairs + add noise