Intelligent Systems:
Recognition in the wild
Carsten Rother
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Exam Questions
• Only from my part and Dimitri Schlesinger's part.
• Three types of questions:
  1) Algorithms
  2) Definitions and knowledge questions
  3) Theoretical derivations
• Answers can be given in English or German
1) Algorithms
What would a parallelized ICM algorithm do in the next two steps? Please fill in the labels.
Given the energy over binary variables x_i ∈ {0, 1}, i = 1, …, 4, with unary terms
θ_1(0) = 0, θ_1(1) = 1;  θ_2(0) = 1, θ_2(1) = 1;  θ_3(0) = 2, θ_3(1) = 1;  θ_4(0) = 1, θ_4(1) = 2
and pairwise terms θ_ij(x_i, x_j) = |x_i − x_j| for all edges (i, j).
[Figure: graph over the variables x_1, …, x_4; some nodes are drawn dark]
Initial state: x_1 = 0, x_2 = 1, x_3 = 1, x_4 = 0
Step 1: x_1 = ?, x_2 = ?, x_3 = ?, x_4 = ?
Step 2: x_1 = ?, x_2 = ?, x_3 = ?, x_4 = ?
Hint: the dark nodes are not changed in the first step.
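A minimal sketch of how such a parallel (checkerboard-style) ICM update could be implemented. The chain structure x_1 − x_2 − x_3 − x_4 and the choice of the "dark" nodes (x_2, x_4) are my assumptions, not given on the slide:

```python
# Sketch of a parallel (checkerboard-style) ICM update for the exam example.
# Assumptions: the four variables form a chain x1 - x2 - x3 - x4, and the
# "dark" nodes fixed in step 1 are x2 and x4.
theta = {1: [0, 1], 2: [1, 1], 3: [2, 1], 4: [1, 2]}   # theta[i][label]
edges = [(1, 2), (2, 3), (3, 4)]                        # assumed chain structure

def local_energy(i, label, x):
    """Unary term of node i plus pairwise terms |label - x_j| to its neighbours."""
    e = theta[i][label]
    for a, b in edges:
        if i == a:
            e += abs(label - x[b])
        elif i == b:
            e += abs(label - x[a])
    return e

x = {1: 0, 2: 1, 3: 1, 4: 0}                            # initial state from the slide
for step, active in enumerate([(1, 3), (2, 4)], start=1):
    x_new = dict(x)
    for i in active:                                     # active nodes update simultaneously
        x_new[i] = min((0, 1), key=lambda l: local_energy(i, l, x))
    x = x_new
    print("Step", step, x)
```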
2) Definitions and knowledge questions
• What is an "Ising prior"?
Answer from the slides:
ψ_{i,j}(x_i, x_j) = exp{−|x_i − x_j|} is called the "Ising prior"
3) Theoretical derivations
Compute the probability that the sum of the pips of two independently rolled dice is divisible by 5.
(Something similar was considered in the lecture "Probability Theory", slide 14)
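A quick enumeration of all 36 outcomes (my own check, not part of the slide) confirms the answer 7/36 (the favourable sums are 5 and 10):

```python
# Probability that the sum of two independent fair dice is divisible by 5.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) % 5 == 0]
print(len(favourable), "/", len(outcomes))   # 7 / 36
```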
Comment
• Statements on the slides that serve as background information will not be examined "in detail".
Example: Image Retargeting
E(x) = Σ_i θ_i(x_i) + Σ_{i,j} θ_{ij}(x_i, x_j),   binary labels x_i ∈ {0, 1}
[Figure: unary terms θ_i(x_i) force label 0 on one side and label 1 on the other; pairwise terms θ_{ij}(x_i, x_j); a sketched labeling in which the label-0 and label-1 regions are separated by a path from top to bottom]
In this case the problem can be represented in two ways: as a labeling problem or as a path-finding problem.
You can do that as an exercise; please see the details in:
http://www.merl.com/reports/docs/TR2008-064.pdf
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Comments on the last lecture
I added some slides to the last lecture to make it clearer. These slides are also included below. The slides from the last lecture are now online.
Reminder: Definition: Factor Graph models
• Given a graph G = (V, F, E), where V (variable nodes) and F (factor nodes) are the sets of nodes and E is the set of edges
• A factor graph defines a family of distributions:
  p(x) = 1/f ∏_{F ∈ 𝔽} ψ_F(x_{N(F)})   where   f = Σ_x ∏_{F ∈ 𝔽} ψ_F(x_{N(F)})
  f: partition function
  F: a factor
  𝔽: set of all factors
  N(F): neighbourhood of a factor
  ψ_F: function (not a distribution) depending on x_{N(F)}   (ψ_F: K^{|N(F)|} → ℝ, where x_i ∈ K)
Reminder: Definition: Undirected Graphical models
• Given an undirected graph G = (V, E), where V is the set of nodes and E the set of edges
• An undirected graphical model defines a family of distributions:
  p(x) = 1/f ∏_{C ∈ C(G)} ψ_C(x_C)   where   f = Σ_x ∏_{C ∈ C(G)} ψ_C(x_C)
  f: partition function
  C(G): set of all cliques
  C: a clique, i.e. a subset of variable indices
  ψ_C: factor (not a distribution) depending on x_C   (ψ_C: K^{|C|} → ℝ, where x_i ∈ K)
Definition: a clique is a set of nodes in which all nodes are pairwise linked by an edge.
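To make the definition concrete, here is a small illustrative sketch (mine, not from the slides) that evaluates p(x) = (1/f) ∏_C ψ_C(x_C) by brute force for three binary variables with cliques {1,2} and {2,3}, using the Ising-type factor from the exam question:

```python
# Brute-force evaluation of an undirected graphical model over three binary
# variables with cliques {1,2} and {2,3} and psi(x_i, x_j) = exp{-|x_i - x_j|}.
from itertools import product
from math import exp

def psi(xi, xj):
    return exp(-abs(xi - xj))

def unnormalised(x1, x2, x3):
    return psi(x1, x2) * psi(x2, x3)

f = sum(unnormalised(*x) for x in product([0, 1], repeat=3))   # partition function
for x in product([0, 1], repeat=3):
    print(x, unnormalised(*x) / f)    # the probabilities sum to one
```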
Comment on definition
In some books the set C(G) is defined as the set of all maximal cliques only.
[Figure: undirected graphical model over the variables x_1, …, x_5]
The two resulting families of distributions are equivalent.
For instance, a factor ψ(x_i, x_j) = (x_i + x_j) · x_i can also be written as the product of two factors, ψ_a(x_i, x_j) = x_i + x_j and ψ_b(x_i) = x_i.
Using all cliques:
p(x_1, …, x_5) = 1/f ψ(x_1,x_2,x_4) ψ(x_2,x_3,x_4) ψ(x_1,x_2) ψ(x_1,x_2) ψ(x_1,x_4) ψ(x_4,x_2) ψ(x_3,x_2) ψ(x_3,x_4) ψ(x_4,x_5) ψ(x_1) ψ(x_2) ψ(x_3) ψ(x_4) ψ(x_5)
Using maximal cliques only:
p(x_1, …, x_5) = 1/f ψ(x_1,x_2,x_4) ψ(x_2,x_3,x_4) ψ(x_4,x_5)
Easy to convert between the two representations
Convert a factor graph in such a way that the family of distributions of the undirected graphical model covers all possible distributions of this factor graph:
make sure that every factor is represented by a clique.
[Figure: a factor graph over x_1, …, x_5 and the corresponding undirected graphical model; the family of distributions with this factor graph vs. the family of distributions with this undirected graphical model]
Easy to convert between the two representations
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model:
[Figure: an undirected graphical model over x_1, …, x_5 and the corresponding factor graph]
The family of distributions of this undirected graphical model and of this factor graph is the same.
Easy to convert between the two representations
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model:
make sure that every clique has an associated factor.
[Figure: an undirected graphical model over x_1, …, x_5 and a factor graph with one factor per clique]
The family of distributions of this undirected graphical model and of this factor graph is the same.
Comment: this is also correct, but not a minimal representation.
Easy to convert between the two representations
Convert an undirected graphical model in such a way that the family of distributions of the factor graph covers all possible distributions of this undirected graphical model:
[Figure: an undirected graphical model over x_1, …, x_5 and an attempted factor-graph conversion]
"The family of distributions of this undirected graphical model and of this factor graph is the same."
Comment: this is not correct.
Reminder: Last lecture Road Map
• Define: Structured Models
• Formulate applications as discrete labeling problems
• Discrete Inference:
  • Pixel-based: Iterated Conditional Modes (ICM)
  • Line-based: Dynamic Programming (DP)
  • Field-based: Graph Cut and Alpha-Expansion
• Interactive Image Segmentation
• From generative models to discriminative models to discriminative functions
Reminder: Generative Model
Models explicitly (or implicitly) the distribution of the input z and the output x.
Joint probability: p(x, z) = p(z|x) p(x)   (likelihood × prior)
Comments:
1. The joint distribution does not necessarily have to be decomposed into likelihood and prior, but in practice it (nearly) always is.
2. Generative models are used successfully when input z and output x are very related, e.g. image denoising.
Pros:
1. Possible to sample both x and z.
2. Can be used quite easily for many applications (since prior and likelihood are modeled separately).
3. In some applications, e.g. biology, people want to model likelihood and prior explicitly, since they want to understand the model as much as possible.
4. The probability can be used in bigger systems.
Cons:
1. It might not always be possible to write down the full distribution (it involves a distribution over images z).
Reminder: Discriminative model
Models that model the posterior directly are discriminative models.
In computer vision we mostly use the Gibbs distribution with an energy E:
p(x|z) = 1/f exp{−E(x, z)}   where   f = Σ_x exp{−E(x, z)}
These models are also called "conditional random fields".
Pros:
1. Simpler to write down than a generative model (no need to model z) and goes directly for the desired output x.
2. More flexible, since the energy is arbitrary.
3. The probability can be used in bigger systems.
Cons: we can no longer sample images z.
Reminder: Discriminative model
• Relation between posterior and joint: p(x|z) = 1/p(z) · p(x, z)
• p(x, z), p(x|z) and E(x, z) all have the same optimal solution x* given z:
  • x* = argmax_x p(x, z) given z
  • x* = argmax_x p(x|z) given z   (since p(x|z) = 1/p(z) · p(x, z))
  • x* = argmin_x E(x, z)   (since −log p(x|z) = log f + E(x, z))
Comment on Generative Models
One may also write the joint distribution p(x, z) as a Gibbs distribution:
p(x, z) = 1/f exp{−E(x, z)}   where   f = Σ_{x,z} exp{−E(x, z)}
If likelihood and prior are no longer modelled separately:
• sampling x, z gets more difficult
• we can no longer learn prior and likelihood separately (as in de-noising)
• we train p(x, z) = 1/f exp{−E(x, z)} and p(x|z) = 1/f exp{−E(x, z)} in quite a similar way
[Figure: samples of z]
The advantages of a generative model over a discriminative model are gone.
But … it has lost the meaning of a "generative" model, since we no longer have a likelihood which says how the data was "generated".
Discriminative functions
Models that model the classification problem via a function E(x, z), which maps a labeling x (for a given input z) to a real value.
Examples:
• an energy
• support vector machines
• nearest neighbour classifier
x* = argmin_x E(x, z)
Pros: most direct approach to model the problem
Cons: no probabilities
This is the most used approach in computer vision!
Recap
Modelling a problem:
• The input data is z and the desired output is x.
We can identify three different approaches [see details in Bishop, page 42ff]:
• Generative (probabilistic) models: p(x, z)
• Discriminative (probabilistic) models: p(x|z)
• Discriminative functions: f(x, z)
The key differences are:
• Probabilistic or non-probabilistic model
• Generative models also model the data z
• Differences in training (see previous lectures)
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Slide credits
• Bernt Schiele
• Li Fei-Fei
• Rob Fergus
• Kristen Grauman
• Derek Hoiem
• Stefan Roth
• Jamie Shotton
• Antonio Criminisi
Recognition / classification is a fundamental part of many "intelligent systems"
• Robots / intelligent cars
• Stereo image recognition
• Classification of sensor data
• Search in documents
• Biology / medicine: classify cells, DNA, etc.
• Language / hand drawings
Google input: "Show me frogs"
Image Recognition – What is the Goal?
• Object instance recognition (more precisely: known object instance recognition)
  • We know exactly the instance
• Object class recognition (also called: generic object recognition)
  • Different instances of the same class
[Figure: prototype images and a test image; train-set, test-set, result]
Class versus instance – a gray zone: same instance or not?
Class-based recognition: Level of Detail
• Image categorization
  • One or more categories per image (e.g. "frog, branch")
• Object class detection
  • Also find the bounding box (a 2D bounding box for each frog)
• Part-based object detection
  • Find the parts of the object (and in this way the full object)
• Semantic segmentation (see last lecture; segmentation implies pixel-wise accuracy)
  • Object-class segmentation
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Random Forest
Slide credits: Jamie Shotton and Antonio Criminisi
• Proven very capable, especially for real-time applications
  • e.g. keypoint recognition, Kinect body part classification
• High accuracy with very low computational cost
  • can exploit low-level features (e.g. raw pixel values)
  • feature vector computed sparsely, on demand
  • generalization through randomization
• Outputs confidence values
• Flexible, non-parametric model
• Easy to implement and parallelize
What can random forests do? Tasks
• Classification / decision forests
• Regression forests
• Density forests
• Manifold forests
• Semi-supervised forests
What can random forests do? Applications
• Classification / decision forests: e.g. semantic segmentation, DNA classification
• Regression forests: e.g. object localization
• Density forests: e.g. novelty detection
• Manifold forests: e.g. dimensionality reduction
• Semi-supervised forests: e.g. semi-supervised semantic segmentation
Reminder: Last Lecture
Global optimum: x* = argmin_x E(x, z)
The user defined with brush strokes what is object and what is background.
Now we want to do that automatically.
Semantic Segmentation
The desired output: label each pixel with one out of 21 classes.
[TextonBoost; Shotton et al., '06]
[Figure: failure cases]
The model optimizes an energy (details not important):
E(x, Θ) = Σ_i [ θ_i(x_i, z_i, Θ) + π_i(x_i) + λ_i(x_i, i) ] + Σ_{i,j} ψ_{i,j}(x_i, x_j)
with unary terms for the class information, the colour model and the location prior, and an edge-aware smoothness prior as the pairwise term.
Location prior: [figure: location priors for the classes "grass" and "sky"]
Class information: each pixel gets a distribution over the 21 classes,
θ_i(x_i = k, z) = p(x_i = k | z),   x_i ∈ {1, …, K}.
Many options; here we use a Random Forest.
Semantic Segmentation
[Figure: results with class and location terms only vs. with edges and the colour model added]
In the following we concentrate on how to obtain the per-pixel class term.
Object Class Segmentation using Random Forests
Let us talk about features and simple (weak) classifiers
1) Get a d-dimensional feature, depending on some parameters θ:
• 1-dimensional: we want to classify the white pixel.
  Feature: colour (e.g. red channel) of the green pixel.
  Parameters: a 2D offset vector.
• 1-dimensional: we want to classify the white pixel.
  Feature: average colour (e.g. red channel) in a rectangle.
  Parameters: 4D (offset vector + size of the rectangle).
Let us talk about features and simple (weak) classifiers
1) Get a d-dimensional feature, depending on some parameters θ:
• 2-dimensional: we want to classify the white pixel.
  Feature: colour (e.g. red channel) of the green pixel and colour (e.g. red channel) of the red pixel.
  Parameters: 4D (two offset vectors).
We will visualize it in this way: [2D scatter plot of the feature responses]
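As a concrete illustration of such offset features, here is a sketch of my own (not from the slides); the function name, the parameter layout and the clamping behaviour at the image border are assumptions:

```python
# Offset-pixel feature: to classify pixel (r, c), read a colour channel at
# positions displaced by two offset vectors.
import numpy as np

def offset_feature(img, r, c, theta):
    """2D feature: channel values at two offset positions relative to (r, c).

    img:   H x W x 3 array
    theta: (dr1, dc1, dr2, dc2, channel) -- the feature parameters
    """
    dr1, dc1, dr2, dc2, ch = theta
    h, w = img.shape[:2]
    # clamp offsets that reach outside the image (one possible convention)
    r1, c1 = np.clip(r + dr1, 0, h - 1), np.clip(c + dc1, 0, w - 1)
    r2, c2 = np.clip(r + dr2, 0, h - 1), np.clip(c + dc2, 0, w - 1)
    return np.array([img[r1, c1, ch], img[r2, c2, ch]])

img = np.random.randint(0, 256, size=(64, 64, 3))
print(offset_feature(img, 32, 32, theta=(5, -3, -8, 2, 0)))   # 2-dimensional feature
```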
Let us talk about features and simple (weak) classifiers
Examples of weak learners (feature response for a 2D example):
• Weak learner: axis-aligned.
  The classifier has 2 parameters: which axis, and a continuous threshold τ (x_1 <> τ or x_2 <> τ).
• Weak learner: oriented line, i.e. a generic line in homogeneous coordinates.
  The classifier has 2/3 parameters: a·x_1 + b·x_2 + c <> 0.
• Weak learner: conic section, i.e. a matrix representing a conic.
  The classifier has 5/6 parameters: a·x_1² + b·x_2² + c·x_1·x_2 + d·x_1 + e·x_2 + f <> 0.
In general one may select only a very small subset of the features.
Let us talk about features and simple (weak) classifiers
• We put all the parameters of the classifier and of the features into one vector θ:
  axis-aligned linear classifier; linear classifier (a·x_1 + b·x_2 + c <> 0); conic classifier (a·x_1² + b·x_2² + c·x_1·x_2 + d·x_1 + e·x_2 + f <> 0)
• They are called weak classifiers since they will be used to build a stronger classifier (here a random forest, later Boosting).
• We denote the classifier as h(θ, v) ∈ {true (> 0), false (< 0)}.
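A small sketch of my own (not from the slides) of the three weak-classifier families as functions of a 2D feature v; the parameter layout is an assumption:

```python
# Three weak-classifier families on a 2D feature v = (v1, v2);
# theta collects all their parameters.
import numpy as np

def h_axis_aligned(v, theta):
    axis, tau = theta                      # which axis, continuous threshold
    return v[axis] > tau

def h_linear(v, theta):
    a, b, c = theta                        # generic line in homogeneous coordinates
    return a * v[0] + b * v[1] + c > 0

def h_conic(v, theta):
    a, b, c, d, e, f = theta               # coefficients of a conic section
    return a * v[0]**2 + b * v[1]**2 + c * v[0] * v[1] + d * v[0] + e * v[1] + f > 0

v = np.array([0.3, -1.2])
print(h_axis_aligned(v, (0, 0.0)), h_linear(v, (1.0, 1.0, 0.5)), h_conic(v, (1, 1, 0, 0, 0, -1)))
```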
Decision Tree (Classification)
[Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9:1545–1588, 1997]
A general tree structure: a root node (0), internal (split) nodes and terminal (leaf) nodes.
[Figure: binary tree with nodes numbered 0–14]
A decision tree: "Is the top part blue?" → "Is the bottom part green?" / "Is the bottom part blue?"
Decision Tree – Test Time
Input: a test point (input data in feature space).
At each split node, go left or right according to the node test; the reached leaf returns the class distribution p(c).
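A minimal sketch of this test-time traversal (my own, not from the slides); the tiny hand-built tree, its split tests and its leaf distributions are made up for illustration:

```python
# Test-time traversal: each split node applies a weak classifier h(theta, v);
# we descend until a leaf, which stores a class distribution p(c).
def predict(tree, v):
    node = tree
    while "p" not in node:                       # internal (split) node
        node = node["right"] if node["h"](v) else node["left"]
    return node["p"]                             # leaf distribution p(c)

# tiny hand-built tree over a 2D feature v
tree = {
    "h": lambda v: v[0] > 0.5,                   # axis-aligned test on v[0]
    "left":  {"p": {"red": 0.75, "blue": 0.25}},
    "right": {"h": lambda v: v[1] > 0.0,
              "left":  {"p": {"red": 0.10, "blue": 0.90}},
              "right": {"p": {"red": 0.50, "blue": 0.50}}},
}
print(predict(tree, (0.7, -0.3)))                # -> {'red': 0.10, 'blue': 0.90}
```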
Decision Tree – Train Time
Input: all training points (input data in feature space); each point has a class label.
The set of all labelled training points, here 35 red and 23 blue.
Split the training set at each node.
Measure p(c) at each leaf; it could be, e.g., 3 red and 1 blue,
i.e. p(red) = 0.75, p(blue) = 0.25 (remember, the feature space is also optimized with θ).
Random Forests – Training of features (illustration)
What does it mean to optimize over θ?
• For each pixel the same feature test (at one split node) will be done.
• One has to define what happens with feature tests that reach outside the image.
Image labeling task (2 classes, red and blue).
Goal during training: separate red pixels (class 1) from blue pixels (class 2).
Feature:
• Value x_1: the value of the green colour channel (could also be red or blue) if you look o_1 pixels right and o_2 pixels up, i.e. at position pos + (o_1, o_2).
• Value x_2: the value of the green colour channel (could also be red or blue) if you look o_3 pixels right and o_4 pixels down, i.e. at position pos + (o_3, o_4).
Goal: find a θ = (o_1, o_2, o_3, o_4) such that it separates the data best.
Decision Tree – Split Criteria
Node training: compare the training set before the split with the candidate splits (split 1, split 2), using Shannon's entropy H.
Information gain: I(S) = H(S) − Σ_{i ∈ {L,R}} (|S_i| / |S|) · H(S_i)
Think of it as minimizing the (weighted) entropy of the children.
Example Calculation
• We have |S| = 12 points, with |S_L| = 6 and |S_R| = 6.
• In S we have 6 red and 6 blue points (2 classes).
• We look at two possible splits (logarithms base 2):
1) 50%–50% class split (each side, S_L and S_R, gets 3 red and 3 blue):
   H(S_L) = −(0.5 log 0.5 + 0.5 log 0.5) = 1
   H(S_R) = −(0.5 log 0.5 + 0.5 log 0.5) = 1
   I(S) = H(S) − (0.5 · 1 + 0.5 · 1) = H(S) − 1 = 0
   (lower information gain)
2) 16%–84% class split (the left side has 5 red and 1 blue, the right side has 5 blue and 1 red):
   H(S_L) = −(1/6 · log 1/6 + 5/6 · log 5/6) ≈ 0.65
   H(S_R) = −(1/6 · log 1/6 + 5/6 · log 5/6) ≈ 0.65
   I(S) = H(S) − (0.5 · 0.65 + 0.5 · 0.65) = H(S) − 0.65 = 0.35
   (higher information gain)
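A quick numerical check of this example (my own, using base-2 logarithms):

```python
# Verify the entropies and information gains of the two candidate splits.
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

H_S = entropy([6, 6])                                  # parent: 6 red, 6 blue -> 1.0
for left, right in [([3, 3], [3, 3]), ([5, 1], [1, 5])]:
    w_l, w_r = sum(left) / 12, sum(right) / 12
    gain = H_S - (w_l * entropy(left) + w_r * entropy(right))
    print(round(entropy(left), 2), round(gain, 2))     # 1.0 0.0  and  0.65 0.35
```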
Decision Forest
The ensemble model: trees t = 1, 2, 3, …
Forest output probability: p(c) = (1/T) Σ_t p_t(c), where T is the number of trees.
Randomness in the training set
Bagging (randomizing the training set): out of the full training set, a randomly sampled subset of the training data is made available for each tree t.
Forest training
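A compact sketch of my own (not from the slides) of these two ingredients, bagging and averaging of the per-tree distributions p_t(c); the depth-1 "trees" and the toy data are made up for illustration:

```python
# Bagging + ensemble averaging with depth-1 trees (stumps) on toy 2D data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(58, 2))                      # toy training set in feature space
y = (X[:, 0] + X[:, 1] > 0).astype(int)           # two classes

def train_stump(X, y):
    """A depth-1 'tree': one axis-aligned split; leaves store class frequencies."""
    axis, tau = rng.integers(2), rng.normal()     # randomly chosen node test
    def leaf(mask):
        return np.bincount(y[mask], minlength=2) / max(mask.sum(), 1)
    left, right = leaf(X[:, axis] <= tau), leaf(X[:, axis] > tau)
    return lambda v: right if v[axis] > tau else left

T = 10
trees = []
for t in range(T):
    bag = rng.integers(0, len(X), size=len(X))    # bagging: random subset for tree t
    trees.append(train_stump(X[bag], y[bag]))

v = np.array([0.4, 0.2])
p = sum(tree(v) for tree in trees) / T            # ensemble: p(c) = 1/T sum_t p_t(c)
print(p)
```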
Example: two classes; axis-aligned linear classifier
[Figure: training points; training and testing of the different trees in the forest]
Example: two classes; linear classifier
[Figure: training points; training and testing of the different trees in the forest]
Example: two classes; conic classifier
[Figure: training points; training and testing of the different trees in the forest]
Decision Tree – Randomization
Randomized node optimization: instead of the full set of all possible node test parameters, each node only considers a randomly sampled subset of them; node training then picks the best weak-learner parameters from that subset.
ρ: randomness control parameter.
• ρ = size of the full parameter set: no randomness and maximum tree correlation.
• ρ = 1: maximum randomness and minimum tree correlation.
The effect of ρ: a small value of ρ gives little tree correlation; a large value of ρ gives large tree correlation.
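A minimal sketch of my own of randomized node optimization: only ρ randomly sampled node tests are evaluated and the one with the highest information gain is kept; the axis-aligned test and the toy data are made up for illustration:

```python
# Randomized node optimization: try rho random node tests, keep the best one.
import numpy as np

rng = np.random.default_rng(1)

def entropy(labels):
    p = np.bincount(labels, minlength=2) / len(labels)
    return -sum(pi * np.log2(pi) for pi in p if pi > 0)

def train_node(X, y, rho=10):
    best = None
    for _ in range(rho):                               # try rho random node tests
        axis, tau = rng.integers(X.shape[1]), rng.normal()
        mask = X[:, axis] > tau
        if mask.all() or not mask.any():
            continue
        gain = entropy(y) - (mask.mean() * entropy(y[mask])
                             + (~mask).mean() * entropy(y[~mask]))
        if best is None or gain > best[0]:
            best = (gain, axis, tau)
    return best                                        # (information gain, axis, threshold)

X = rng.normal(size=(50, 2)); y = (X[:, 0] > 0.3).astype(int)
print(train_node(X, y, rho=25))
```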
Decision Forest – the choices
• What is the depth of the trees?
• How many trees?
• Choice of the randomness parameter ρ?
• What are the features?
• What type of classifier (linear, conic, etc.)?
• What split criterion (other than information gain)?
A crucial factor is the tree depth.
Definition
• Over-fitting: the effect that the model perfectly memorizes the training data but does not perform well on test data.
• Generalization: one of the most important aspects of a model is its ability to generalize, i.e. that new (unseen) test data is correctly classified. A model which overfits does not generalize well.
Example: four classes; conic classifier
[Figure: training points; training and testing of the different trees in the forest]
Examples
[Figure: training points (4-class spiral; 4-class spiral with large gaps; 4-class spiral with larger gaps) and the corresponding testing posteriors]
Examples – overfitting
[Figure: training points (4-class mixed) and testing posteriors for T=200 trees with depth D=3, D=6 and D=15, weak learner = conic]
Roadmap this lecture
• Example: Exam Questions
• Finishing off last lecture
• Recognition:
  • Define the problem
  • Decision Forests
• Person tracking … what runs on Microsoft Xbox
Body Tracking with Kinect Camera
… what runs on Microsoft Xbox
Reminder: Kinect Camera
[Figure: example depth image of a person, top view and side view]
RGB vs. depth for pose estimation
• RGB: only works when well lit; background clutter; scale unknown; clothing & skin colour
• Depth: works in low light; person "pops" out from the background; scale known; uniform texture
Body Tracking: Pipeline overview
input depth image → body part labelling → clustering → body joint hypotheses
(shown as front view, side view, top view)
The body is divided into 31 body parts.
The clustering step is a simple centroid computation per body part.
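A simple illustration of that clustering step (mine, not the actual Kinect implementation): per-body-part centroids computed from a per-pixel label image; the label image here is random, just to make the sketch runnable:

```python
# Turn per-pixel body-part labels into one joint hypothesis per part
# by computing the centroid of the pixels assigned to each part.
import numpy as np

H, W, NUM_PARTS = 240, 320, 31
rng = np.random.default_rng(0)
part_labels = rng.integers(0, NUM_PARTS, size=(H, W))     # stand-in for the classifier output

def part_centroids(labels, num_parts):
    """Return one (row, col) hypothesis per body part: the centroid of its pixels."""
    centroids = {}
    for part in range(num_parts):
        rows, cols = np.nonzero(labels == part)
        if len(rows):
            centroids[part] = (rows.mean(), cols.mean())
    return centroids

print(part_centroids(part_labels, NUM_PARTS)[0])           # hypothesis for body part 0
```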
Create lots and lots of training data
Model all sorts of variations:
• record mocap, 100,000s of poses [Vicon]
• retarget to varied body shapes
• render (depth, body parts) pairs + add noise