Computer Vision I -
Algorithms and Applications:
Recognition
Carsten Rother
Computer Vision I: Recognition 04/02/2014
Comment on last lecture
• Given observed data 𝒛 and unobserved variables 𝒙:
• We introduced the generative model $P(\mathbf{x},\mathbf{z}) = P(\mathbf{z}\mid\mathbf{x})\,P(\mathbf{x})$ and said that this decomposition is (nearly) always done
• We introduced the discriminative model $P(\mathbf{x}\mid\mathbf{z}) = \frac{1}{f}\exp\{-E(\mathbf{x},\mathbf{z})\}$ as a Gibbs distribution with energy $E(\mathbf{x},\mathbf{z})$, where $f = \sum_{\mathbf{x}} \exp\{-E(\mathbf{x},\mathbf{z})\}$
• They are related by $P(\mathbf{x}\mid\mathbf{z}) = \frac{1}{P(\mathbf{z})}\,P(\mathbf{x},\mathbf{z})$ (hence the same optimal solution)
• We showed in a concrete segmentation example how to write $P(\mathbf{x},\mathbf{z}) = P(\mathbf{z}\mid\mathbf{x})\,P(\mathbf{x})$ in the form of a $P(\mathbf{x}\mid\mathbf{z})$ such that they have the same optimal solution $\mathbf{x}^* = \operatorname{argmax}_{\mathbf{x}} P(\mathbf{x},\mathbf{z}) = \operatorname{argmax}_{\mathbf{x}} P(\mathbf{x}\mid\mathbf{z})$
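This relation can be checked numerically. A tiny sketch (with a made-up energy table, purely for illustration): for a fixed observation 𝒛, the joint and the conditional distribution have the same maximizing 𝒙.

```python
import numpy as np

# Made-up energy E(x, z) over 4 states of x (rows) and 3 states of z (columns)
rng = np.random.default_rng(0)
E = rng.random((4, 3))

joint = np.exp(-E) / np.exp(-E).sum()                        # P(x, z): normalized over x and z
cond = np.exp(-E) / np.exp(-E).sum(axis=0, keepdims=True)    # P(x | z): normalized over x only

z = 1                                                        # fix an observation
print(joint[:, z].argmax(), cond[:, z].argmax())             # the same optimal x*
```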
Comment on Generative Models
One may also write the joint distribution 𝑃(𝒙, 𝒛) as a Gibbs distribution:
$P(\mathbf{x},\mathbf{z}) = \frac{1}{f}\exp\{-E(\mathbf{x},\mathbf{z})\}$ where $f = \sum_{\mathbf{x},\mathbf{z}} \exp\{-E(\mathbf{x},\mathbf{z})\}$
If likelihood and prior are no longer modelled separately:
• sampling 𝒙, 𝒛 gets very difficult
• we can no longer learn prior and likelihood separately (as in de-noising)
• We train the joint model $P(\mathbf{x},\mathbf{z}) = \frac{1}{f}\exp\{-E(\mathbf{x},\mathbf{z})\}$ in a similar way to the conditional model $P(\mathbf{x}\mid\mathbf{z}) = \frac{1}{f}\exp\{-E(\mathbf{x},\mathbf{z})\}$ (see CV 2 lectures).
The advantages of a generative model over a discriminative model are then mostly gone. But it also loses the meaning of a "generative" model, since we no longer have a likelihood which says how the data was "generated".
Another Comment: last lecture
We had:
• I think a better name is: a simple procedure for GMM learning / fitting
• Correct k-means does not deal with Gaussians; it just does a nearest-neighbour assignment (see this lecture)
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach
Slides credits
• Bernt Schiele
• Li Fei-Fei
• Rob Fergus
• Kristen Grauman
• Derek Hoiem
• Stefan Roth
• Jamie Shotton
• Antonio Criminisi
Recognition – What is the Goal?
• Object instance recognition (more precisely: known object instance recognition)
• We know the exact instance
• Object class recognition (also called: generic object recognition)
• Different instances of the same class
Techniques (see lecture 7):
• Robust (RANSAC) matching with 𝐹, 𝐻
• Sparse points (Harris)
• Geometry- and illumination-invariant features (SIFT)
(Figure: prototype images from the train set are matched against a test image from the test set to produce the recognition result.)
Class versus Instance – a gray zone
Object class: coke cans
Same instance or not?
Class-based recognition: Level of Detail
• Image Categorization
• One or more categories per image
• Object Class Detection
• Also find bounding box
• Part-based Object Detection
• Find parts of the object
(and in this way the full object)
• Semantic Segmentation (see last lecture) (segmentation implies pixel-wise accuracy)
• Object-class segmentation
(Example: image categorization gives the labels "frog, branch"; object class detection gives a 2D bounding box for each frog.)
Recognition – Many Variants and extensions
• Range of object classes: from "stuff" to "things" (see below)
• One can also add attributes:
• tall, flat, looks nice, "can be used for sitting on", material, etc.
Examples along this range: grass, sky → forest → tree → people, animals.
For so-called "stuff" (e.g. grass, sky) the class is defined predominantly by texture; for so-called "things" (e.g. people, animals) the class is defined predominantly by the outline (segmentation), the individual parts, and the layout of parts.
Recognition – Many variants and possibilities
• Context is important (reminder first lecture)
• Model-based recognition
• Cues: texture, curves, etc. (reminder: lecture 1)
• Input Image:
• 2D (image), 2.5D (Kinect Camera), 3D scan
The Pascal VOC Challenge (FYI)
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
20 classes, ~10,000 labelled images
20 classes, ~27,000 labelled images
The Pascal VOC Challenge (FYI)
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
10,000 classes, ~10,000,000 labelled images
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach
Random Forest – In a nutshell
Slide credits: Jamie Shotton and Antonio Criminisi
• Proven very capable, especially for real-time applications
• e.g. keypoint recognition, Kinect body part classification
• High accuracy with very low computational cost
• can exploit low-level features (e.g. raw pixel values)
• feature vector computed sparsely on-demand
• generalization through randomization
• Gives out confidence values
• Flexible, non-parametric model
• Easy to implement and parallelize
What can random forests do? Tasks
• Classification forests
• Regression forests
• Density forests
• Manifold forests
• Semi-supervised forests
What can random forests do? Applications
• Classification forests: e.g. semantic segmentation
• Regression forests: e.g. object localization
• Density forests: e.g. novelty detection
• Manifold forests: e.g. dimensionality reduction
• Semi-supervised forests: e.g. semi-supervised semantic segmentation
A brief history of Decision Forests
[ L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. Classification and Regression Trees (CART). 1984 ]
[ Y. Amit and D. Geman. Randomized enquiries about shape; An application to handwritten digit recognition. Technical Report 1994]
[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. 1997 ]
[ L. Breiman. Random forests. 1999, 2001 ]
[ V. Lepetit and P. Fua. Keypoint recognition using randomized trees. 2005, 2006 ]
[ F. Moosmann, B. Triggs, F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. 2006 ]
[ G. Rogez, J. Rihan, S. Ramalingam, C. Orrite, P. Torr. Randomized trees for human pose detection. 2008 ]
[ C. Leistner, A. Saffari, J. Santner, H. Bischof. Semi-supervised random forests. 2009 ]
[ A. Saffari, C. Leistner, J. Santner, M. Godec, H. Bischof. On-line random forests. 2009 ]
[ S. Nowozin, C. Rother, S. Bagon, T. Sharp, B. Yao, and P. Kohli. Decision tree fields. 2011 ]
Reminder: Semantic Segmentation
The desired output
Label each pixel with one out of 21 classes
[TextonBoost; Shotton et al, ‘06]
Reminder: TextonBoost: How it is done
Define the energy (with $x_i \in \{1, \dots, K\}$, here $K = 21$ classes):
$E(\mathbf{x},\Theta) = \sum_i \big[\,\theta_i(x_i, z_i, \Theta) + \theta_i(x_i) + \theta_i(x_i, \mathbf{z})\,\big] + \sum_{i,j} \theta_{ij}(x_i, x_j)$
The three unary terms are a colour model, a location prior and a class term; the pairwise term is an edge-aware smoothness prior.
• Colour model $\theta_i(x_i, z_i, \Theta)$: as in GrabCut, each object class has an associated GMM.
• Location prior $\theta_i(x_i)$: a per-class prior on where in the image the class tends to occur (illustrated for grass and sky).
• Class term $\theta_i(x_i, \mathbf{z})$: each pixel gets a distribution over the 21 classes, $\theta_i(x_i = c, \mathbf{z}) = P(x_i = c \mid \mathbf{z})$, computed using boosting (explained next lecture) or a random forest (explained next).
• Smoothness prior: as in GrabCut, $\theta_{ij}(x_i, x_j) = w_{ij}\,|x_i - x_j|$.
[TextonBoost; Shotton et al., '06]
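As a minimal illustration of evaluating an energy of this form (not the original TextonBoost implementation; the unary tables and contrast weights below are placeholder arrays):

```python
import numpy as np

def energy(x, theta_color, theta_location, theta_class, w):
    """E(x) = sum_i [colour + location + class unaries] + sum_{i,j} w_ij |x_i - x_j|.
    x: (H, W) label image; theta_*: (H, W, K) unaries; w: (H, W, 2) contrast weights."""
    rows, cols = np.indices(x.shape)
    E = (theta_color[rows, cols, x] +
         theta_location[rows, cols, x] +
         theta_class[rows, cols, x]).sum()
    # Pairwise terms over horizontal and vertical neighbour pairs
    E += (w[:, :-1, 0] * np.abs(x[:, 1:] - x[:, :-1])).sum()
    E += (w[:-1, :, 1] * np.abs(x[1:, :] - x[:-1, :])).sum()
    return E

H, W, K = 4, 5, 21
rng = np.random.default_rng(0)
x = rng.integers(0, K, size=(H, W))
unaries = [rng.random((H, W, K)) for _ in range(3)]
w = rng.random((H, W, 2))
print(energy(x, *unaries, w))
```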
Reminder: TextonBoost
(Results shown for: class and location terms only; + edges; + colour model.)
[TextonBoost; Shotton et al, ‘06]
A variant of TextonBoost using Random Forests
Object Class Segmentation using Random Forests
Florian Schroff, Antonio Criminisi, and Andrew Zisserman, BMVC 2008
Let us talk about features and simple (weak) classifiers
1) Get 𝑑-dimensional feature, depending on some parameters 𝜃:
1-dimensional: we want to classify the white pixel. Feature: colour (e.g. red channel) of the green pixel. Parameters: 2D offset vector.
1-dimensional: we want to classify the white pixel. Feature: average colour (e.g. red channel) in the rectangle. Parameters: 4D (offset vector + size of rectangle).
Let us talk about features and simple (weak) classifiers
1) Get 𝑑-dimensional feature, depending on some parameters 𝜃:
2-dimensional: we want to classify the white pixel. Feature: colour (e.g. red channel) of the green pixel and colour (e.g. red channel) of the red pixel. Parameters: 4D (two offset vectors). We will visualize it in this way:
Let us talk about features and simple (weak) classifiers
Examples of weak learners (feature response shown for a 2D example; in general a weak learner may select only a very small subset of features):
• Axis-aligned: the classifier has 2 parameters (which axis, and a continuous threshold); an axis-aligned linear classifier.
• Oriented line: a generic line in homogeneous coordinates, $a x_1 + b x_2 + c \gtrless 0$; a linear classifier with 2/3 parameters.
• Conic section: a matrix representing a conic, $a x_1^2 + b x_2^2 + c x_1 x_2 + d x_1 + e x_2 + f \gtrless 0$; a conic classifier with 5/6 parameters.
Let us talk about features and simple (weak) classifiers
• We put all the parameters of the classifier and of the features into one vector 𝜽 (this holds for the axis-aligned, linear and conic classifiers above)
• They are called weak classifiers since they will be used to build a stronger classifier (here random forest, later Boosting)
• We denote the classifier as $h(\boldsymbol{\theta}, \mathbf{v}) \in \{\text{true}\,(>0),\ \text{false}\,(<0)\}$
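A minimal sketch of such weak classifiers $h(\boldsymbol{\theta}, \mathbf{v})$, assuming a 2D feature vector $\mathbf{v} = (x_1, x_2)$; the parameter layout is illustrative, not the exact one used in the lecture's experiments.

```python
import numpy as np

def h_axis_aligned(theta, v):
    """theta = (axis index, threshold); true if v[axis] > threshold."""
    axis, tau = theta
    return v[int(axis)] > tau

def h_linear(theta, v):
    """theta = (a, b, c); a generic line in homogeneous coordinates."""
    a, b, c = theta
    return a * v[0] + b * v[1] + c > 0

def h_conic(theta, v):
    """theta = (a, b, c, d, e, f); a conic section in the 2D feature space."""
    a, b, c, d, e, f = theta
    x1, x2 = v
    return a * x1**2 + b * x2**2 + c * x1 * x2 + d * x1 + e * x2 + f > 0

v = np.array([0.3, -1.2])
print(h_axis_aligned((0, 0.0), v), h_linear((1.0, 1.0, 0.5), v), h_conic((1, 1, 0, 0, 0, -1), v))
```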
Decision Tree (Classification)
[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation. 9:1545--1588, 1997]
[ L. Breiman. Random forests. Machine Learning. 45(1):5--32, 2001]
A general tree structure: a root node (node 0), internal (split) nodes and terminal (leaf) nodes; with breadth-first numbering the nodes are 0; 1, 2; 3, 4, 5, 6; 7, ..., 14.
A decision tree asks a question at each split node, e.g. "Is the top part blue?", then "Is the bottom part green?" or "Is the bottom part blue?".
Decision Tree – Test Time
Input data lives in a feature space. An input test point traverses the tree from the root, going left or right at each split node according to the node's test. The prediction at the reached leaf (classification) is the stored class distribution $p(c)$.
Decision Tree – Train Time
Input: all training points in feature space, where each point has a class label; here the set of all labelled training points contains 35 red and 23 blue points.
The training set is split at each node. At each leaf we measure $p(c)$: e.g. if a leaf receives 3 red and 1 blue point, then $p(\text{red}) = 0.75$ and $p(\text{blue}) = 0.25$. (Remember: the feature space is also optimized via $\theta$.)
Random Forests – Training of features (illustration)
What does it mean to optimize over 𝜃?
• Setting: an image labelling task with 2 classes (red and blue); the goal during training is to separate red pixels (class 1) from blue pixels (class 2).
• For each pixel the same feature test (at one split node) is applied.
• One has to define what happens with feature tests that reach outside the image.
Feature (see the sketch below):
• Value $x_1$: the value of the green colour channel (could also be red or blue) at pos + $(\theta_1, \theta_2)$, i.e. $\theta_1$ pixels right and $\theta_2$ pixels up.
• Value $x_2$: the value of the green colour channel (could also be red or blue) at pos + $(\theta_3, \theta_4)$, i.e. $\theta_3$ pixels right and $\theta_4$ pixels down.
Goal: find a 𝜃 that best separates the data (compare one choice of 𝜃 with another choice of 𝜃).
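A minimal sketch of such an offset colour feature, assuming the image is a NumPy array of shape (H, W, 3) and clamping offsets at the image border (one possible choice for tests that reach outside the image; names and conventions are illustrative):

```python
import numpy as np

def offset_feature(img, row, col, dx_right, dy_up, channel=1):
    """Value of `channel` (default: green) at the pixel displaced from (row, col)
    by dx_right pixels to the right and dy_up pixels up; clamped at the border."""
    r = int(np.clip(row - dy_up, 0, img.shape[0] - 1))   # "up" decreases the row index
    c = int(np.clip(col + dx_right, 0, img.shape[1] - 1))
    return img[r, c, channel]

def feature_vector(img, row, col, theta):
    """2D feature (x1, x2) from theta = (t1, t2, t3, t4): right/up and right/down offsets."""
    x1 = offset_feature(img, row, col, theta[0],  theta[1])
    x2 = offset_feature(img, row, col, theta[2], -theta[3])
    return np.array([x1, x2])

img = np.random.default_rng(0).integers(0, 256, size=(20, 30, 3))
print(feature_vector(img, 10, 15, (3, 2, 4, 1)))
```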
Decision Tree – Split Criteria
Node training: compare the class distribution in a node before the split with the distributions after candidate splits (split 1, split 2, ...).
Shannon's entropy: $H(S) = -\sum_c p(c) \log_2 p(c)$
Information gain: $I = H(S) - \sum_{i \in \{L, R\}} \frac{|S_i|}{|S|} H(S_i)$
Think of maximizing the information gain as minimizing the entropy of the child nodes.
Example Calculation
• We have $|S| = 12$ with $|S_L| = 6$ and $|S_R| = 6$
• In $S$ we have 6 red and 6 blue points (2 classes), so $H(S) = 1$
• We look at two possible splits (see the worked computation below):
1) 50%-50% class split (each side, $S_L$ and $S_R$, gets 3 red and 3 blue):
$H(S_L) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$, $H(S_R) = 1$
$I(S) = H(S) - (0.5 \cdot 1 + 0.5 \cdot 1) = H(S) - 1 = 0$ (lower information gain)
2) Uneven class split (left side has 5 red and 1 blue, right side has 5 blue and 1 red):
$H(S_L) = -(\tfrac{1}{6}\log_2\tfrac{1}{6} + \tfrac{5}{6}\log_2\tfrac{5}{6}) \approx 0.65$, $H(S_R) \approx 0.65$
$I(S) = H(S) - (0.5 \cdot 0.65 + 0.5 \cdot 0.65) = 1 - 0.65 = 0.35$ (higher information gain)
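A small script reproducing this calculation (base-2 entropy; the class counts are those from the example above):

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (base 2) of a class-count vector."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, left, right):
    n_l, n_r = sum(left), sum(right)
    n = n_l + n_r
    return entropy(parent) - (n_l / n) * entropy(left) - (n_r / n) * entropy(right)

parent = [6, 6]                                   # 6 red, 6 blue points
print(information_gain(parent, [3, 3], [3, 3]))   # split 1 -> 0.0
print(information_gain(parent, [5, 1], [1, 5]))   # split 2 -> ~0.35
```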
Decision Forest
Trees $t = 1, 2, 3, \dots, T$
The ensemble model: the forest output probability is the average of the tree posteriors,
$p(c) = \frac{1}{T} \sum_{t=1}^{T} p_t(c)$
𝑇 is the number of trees
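A tiny sketch of this ensemble rule, assuming each trained tree is represented by a function that maps a feature vector to its leaf's class posterior:

```python
import numpy as np

def forest_posterior(trees, v):
    """Average the per-tree posteriors p_t(c) for the input v."""
    return np.mean([tree(v) for tree in trees], axis=0)

# Illustrative "trees" with constant posteriors over 3 classes
trees = [lambda v: np.array([0.7, 0.2, 0.1]),
         lambda v: np.array([0.5, 0.4, 0.1]),
         lambda v: np.array([0.6, 0.3, 0.1])]
print(forest_posterior(trees, v=None))   # -> [0.6, 0.3, 0.1]
```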
Randomness in the training set
Bagging (randomizing the training set): from the full training set, a randomly sampled subset of the training data is made available to each tree t during forest training.
Example: Two classes; axis aligned linear classifier
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=2, weak learner = aligned, leaf model = probabilistic Training points
Example: Two classes; linear classifier
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=2, weak learner = linear, leaf model = probabilistic Training points
Example: Two classes; conic classifier
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=2, weak learner = conic, leaf model = probabilistic Training points
Decision Tree - Randomization
Randomized node optimization: let $\mathcal{T}$ denote the full set of all possible node test parameters, and for each node let $\mathcal{T}_j \subseteq \mathcal{T}$ be the set of randomly sampled features considered at that node. The randomness control parameter is $\rho = |\mathcal{T}_j|$:
• $\rho = |\mathcal{T}|$: no randomness and maximum tree correlation.
• $\rho = 1$: maximum randomness and minimum tree correlation.
The effect of $\rho$: a small value of $\rho$ gives little tree correlation; a large value of $\rho$ gives large tree correlation.
At each node, training then optimizes the node weak learner over the sampled node test parameters (a sketch follows below).
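A minimal sketch of randomized node optimization under these definitions: sample $\rho$ candidate axis-aligned tests and keep the one with the highest information gain (the candidate generation and data layout are illustrative assumptions):

```python
import numpy as np

def entropy(labels, n_classes):
    p = np.bincount(labels, minlength=n_classes) / max(len(labels), 1)
    p = p[p > 0]
    return -(p * np.log2(p)).sum() if len(p) else 0.0

def train_node(X, y, n_classes, rho, rng):
    """Keep the best of rho randomly sampled axis-aligned tests (axis, threshold)."""
    best, best_gain = None, -np.inf
    H = entropy(y, n_classes)
    for _ in range(rho):
        axis = rng.integers(X.shape[1])
        tau = rng.uniform(X[:, axis].min(), X[:, axis].max())
        left = X[:, axis] > tau
        gain = H - (left.mean() * entropy(y[left], n_classes)
                    + (~left).mean() * entropy(y[~left], n_classes))
        if gain > best_gain:
            best, best_gain = (axis, tau), gain
    return best, best_gain

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = (X[:, 0] > 0.5).astype(int)
print(train_node(X, y, n_classes=2, rho=10, rng=rng))
```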
Decision Forest – the choices
• What is the depth of the trees?
• How many trees?
• Choice of 𝜌 ?
• What are the features?
• What type of classifier (linear, conic, etc.) ?
• What split criteria? (other than information gain)
A crucial factor is tree depth
Definitions
• Over-fitting: the effect that the model perfectly memorizes the training data but does not perform well on test data.
• Generalization: one of the most important aspects of a model is its ability to generalize, i.e. that new (unseen) test data is correctly classified. A model which overfits does not generalize well.
Half way slide
2 Minutes break
Example: four classes; conic classifier
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=3, weak learner = conic, leaf model = probabilistic Training points
Examples
Parameters: T=200, D=13, w. l. = conic, predictor = prob.
Training points: 4-class spiral; 4-class spiral with large gaps; 4-class spiral with larger gaps
Testing posteriors
Examples - overfitting
Effect of the maximum tree depth D (training points: 4-class mixed): T=200, weak learner = conic, with D=3 (underfitting), D=6, and D=15 (overfitting).
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach
Body Tracking with Kinect Camera
[ J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake.
Real-Time Human Pose Recognition in Parts from a Single Depth Image. In Proc. IEEE CVPR, June 2011.]
… what runs on Microsoft Xbox
Reminder: Kinect Camera
Example Depth: Person
(Figure: the person's depth points shown from the top view and the side view.)
RGB vs depth for pose estimation
• RGB:
• only works when well lit
• background clutter
• scale unknown
• clothing & skin colour
• Depth:
• works in low light
• person 'pops' out from the background
• scale known
• uniform texture
Body Tracking: Pipeline overview
Pipeline: input depth image → body part labelling (the body is divided into 31 body parts) → clustering (a simple centroid computation) → body joint hypotheses, shown in front, side and top views. A sketch of the clustering step follows below.
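A minimal sketch of the "simple centroid computation": one joint hypothesis per body part as the centroid of that part's 3D points (the actual system uses a more sophisticated, confidence-weighted clustering; the data layout here is an assumption).

```python
import numpy as np

def joint_hypotheses(part_labels, points3d, n_parts=31):
    """part_labels: (N,) per-pixel body-part labels in {0, ..., n_parts-1};
    points3d: (N, 3) corresponding 3D points back-projected from the depth map."""
    hypotheses = {}
    for part in range(n_parts):
        mask = part_labels == part
        if mask.any():
            hypotheses[part] = points3d[mask].mean(axis=0)   # centroid of this part
    return hypotheses

labels = np.array([0, 0, 1, 1, 1])
pts = np.array([[0.0, 0.0, 2.0], [0.0, 1.0, 2.0], [1.0, 0.0, 2.1], [1.0, 1.0, 2.1], [1.0, 2.0, 2.1]])
print(joint_hypotheses(labels, pts, n_parts=2))
```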
Create lots and lots of training data
Model all sorts of variations:
• Record mocap: 100,000s of poses [Vicon]
• Retarget to varied body shapes
• Render (depth, body parts) pairs + add noise
Train on synthetic data – test on real data
Synthetic (graphics) Real (hand-labelled)
Decision Forest
Each leaf stores a distribution over the 31 body parts:
Very Fast Features
A super simple feature that can be computed very fast (illustrated on the input depth image at several probe pixels $\mathbf{p}$ with offset $\Delta$):
$x_i(\mathbf{p}) = J(\mathbf{p}) - J(\mathbf{p} + \Delta)$
where $J$ is the depth image, $\mathbf{p}$ an image coordinate and $x_i(\mathbf{p})$ the offset depth feature response. The offset scales with depth:
$\Delta = \mathbf{r}_i / J(\mathbf{p})$
• 1D feature
• 2 parameters ($\mathbf{r}_i$)
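A minimal sketch of this depth-difference feature, assuming `J` is a 2D NumPy depth image in metres and clamping the probe at the image border (a possible convention; the lecture does not specify the border handling):

```python
import numpy as np

def depth_feature(J, p, r):
    """x_i(p) = J(p) - J(p + r / J(p)); the offset r is scaled by the depth at p.
    J: (H, W) depth image, p: (row, col) pixel, r: (dr, dc) offset parameters."""
    d = J[p[0], p[1]]
    q_row = int(np.clip(round(p[0] + r[0] / d), 0, J.shape[0] - 1))
    q_col = int(np.clip(round(p[1] + r[1] / d), 0, J.shape[1] - 1))
    return d - J[q_row, q_col]

J = np.full((240, 320), 2.0)     # flat background at 2 m
J[100:140, 150:200] = 1.0        # a person-like blob at 1 m
print(depth_feature(J, p=(120, 170), r=(0.0, 60.0)))   # probes person vs. background
```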
Number of Trees
(Figure: ground truth vs. inferred body parts (most likely) for 1, 3 and 6 trees; test performance plot of average per-class accuracy, roughly 40-55%, against the number of trees, 1-6.)
Depth of Trees
(Figure: input depth, ground truth parts and inferred parts (soft), shown for tree depths 1 to 18.)
Avoid Over-fitting
The more (diverse) training images, the better.
Results – Posterior Distributions
Body Parts Distribution: 𝑃(𝑐)
Results - Tracking
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach
Image categorization – Bag of Words Approach
Also used in document search
Bag of Words
Bag of Words - Overview
Object Representation
Feature Detection and Representation
Feature Detection and Representation
Feature Detection and Representation
Take all training images
Feature Detection and Representation
Codeword dictionary formation
Reminder: K-means
Reminder: K-means
Reminder: K-means
Reminder: K-means
("Repeat" means: go back to step 3.)
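Since the k-means figures are not reproduced here, a minimal sketch of k-means as used for codeword dictionary formation (assuming the descriptors are row vectors, e.g. 128-dimensional patch descriptors):

```python
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    """Plain k-means: alternate nearest-centre assignment and mean update."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # assignment step: each descriptor goes to its nearest centre
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # update step: each centre becomes the mean of its assigned descriptors
        for j in range(k):
            if (assign == j).any():
                centres[j] = X[assign == j].mean(axis=0)
    return centres, assign

X = np.random.default_rng(1).random((300, 128))   # 300 descriptors of dimension 128
codebook, assignments = kmeans(X, k=174)          # K = 174 codewords as on the slide
print(codebook.shape, assignments.shape)
```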
Codeword dictionary visualization
K = 174 (averaged patches for each cluster)
[from Fei-Fei Li]
Image Patch examples of Codewords
[from Josef Sivic]
Examples of image patches which are assigned to the same codeword.
Bag of Words – Image Representation
K = 174
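The image is then represented by a normalized K-bin histogram of codeword assignments over its descriptors; a minimal sketch (the codebook here is random for illustration, with K = 174 as on the slide):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return the
    normalized K-bin histogram representing the image."""
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.random.default_rng(1).random((174, 128))          # K = 174 codewords
image_descriptors = np.random.default_rng(2).random((80, 128))  # descriptors of one image
hist = bow_histogram(image_descriptors, codebook)
print(hist.shape, hist.sum())                                   # (174,), 1.0
```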
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach (will be continued in next lecture)