Computer Vision I -
Algorithms and Applications:
Recognition
Carsten Rother
Computer Vision I: Recognition 04/02/2014
Comment on last lecture
• Given observed data 𝒛 and unobserved variables 𝒙:
• We introduced the generative model $P(\mathbf{x},\mathbf{z}) = P(\mathbf{z}\mid\mathbf{x})\,P(\mathbf{x})$ and said that this decomposition is (nearly) always done
• We introduced the discriminative model $P(\mathbf{x}\mid\mathbf{z}) = \frac{1}{f}\exp\{-E(\mathbf{x},\mathbf{z})\}$ as a Gibbs distribution with energy $E(\mathbf{x},\mathbf{z})$, where $f = \sum_{\mathbf{x}} \exp\{-E(\mathbf{x},\mathbf{z})\}$
• They are related by $P(\mathbf{x}\mid\mathbf{z}) = \frac{1}{P(\mathbf{z})}\,P(\mathbf{x},\mathbf{z})$ (hence the same optimal solution)
• We showed in a concrete segmentation example how to write $P(\mathbf{x},\mathbf{z}) = P(\mathbf{z}\mid\mathbf{x})\,P(\mathbf{x})$ in the form of a $P(\mathbf{x}\mid\mathbf{z})$ such that they have the same optimal solution $\mathbf{x}^* = \operatorname{argmax}_{\mathbf{x}} P(\mathbf{x},\mathbf{z}) = \operatorname{argmax}_{\mathbf{x}} P(\mathbf{x}\mid\mathbf{z})$
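This relation can be checked numerically. A tiny sketch (with a made-up energy table, purely for illustration): for a fixed observation 𝒛, the joint and the conditional distribution have the same maximizing 𝒙.

```python
import numpy as np

# Made-up energy E(x, z) over 4 states of x (rows) and 3 states of z (columns)
rng = np.random.default_rng(0)
E = rng.random((4, 3))

joint = np.exp(-E) / np.exp(-E).sum()                        # P(x, z): normalized over x and z
cond = np.exp(-E) / np.exp(-E).sum(axis=0, keepdims=True)    # P(x | z): normalized over x only

z = 1                                                        # fix an observation
print(joint[:, z].argmax(), cond[:, z].argmax())             # the same optimal x*
```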
Comment on Generative Models
One may also write the joint distribution 𝑃(𝒙, 𝒛) as a Gibbs distribution:
$P(\mathbf{x},\mathbf{z}) = \frac{1}{f}\exp\{-E(\mathbf{x},\mathbf{z})\}$ where $f = \sum_{\mathbf{x},\mathbf{z}} \exp\{-E(\mathbf{x},\mathbf{z})\}$
If likelihood and prior are no longer modelled separately:
• sampling 𝒙, 𝒛 gets very difficult
• we can no longer learn prior and likelihood separately (as in de-noising)
• We train the joint model $P(\mathbf{x},\mathbf{z}) = \frac{1}{f}\exp\{-E(\mathbf{x},\mathbf{z})\}$ in a similar way to the conditional model $P(\mathbf{x}\mid\mathbf{z}) = \frac{1}{f}\exp\{-E(\mathbf{x},\mathbf{z})\}$ (see CV 2 lectures).
The advantages of a generative model over a discriminative model are then mostly gone. But it also loses the meaning of a "generative" model, since we no longer have a likelihood which says how the data was "generated".
Another Comment: last lecture
We had:
• I think a better name is: a simple procedure for GMM learning / fitting
• Correct k-means does not deal with Gaussians; it just does a nearest-neighbour assignment (see this lecture)
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach
Slides credits
• Bernt Schiele
• Li Fei-Fei
• Rob Fergus
• Kristen Grauman
• Derek Hoiem
• Stefan Roth
• Jamie Shotton
• Antonio Criminisi
Recognition – What is the Goal?
• Object instance recognition (more precisely: known object instance recognition)
• We know the exact instance
• Object class recognition (also called: generic object recognition)
• Different instances of the same class
Techniques (see lecture 7):
• Robust (RANSAC) matching with 𝐹, 𝐻
• Sparse points (Harris)
• Geometry- and illumination-invariant features (SIFT)
(Figure: prototype images from the train set are matched against a test image from the test set to produce the recognition result.)
Class versus Instance – a gray zone
Object class: coke cans
Same instance or not?
Class-based recognition: Level of Detail
• Image Categorization
• One or more categories per image
• Object Class Detection
• Also find bounding box
• Part-based Object Detection
• Find parts of the object
(and in this way the full object)
• Semantic Segmentation (see last lecture) (segmentation implies pixel-wise accuracy)
• Object-class segmentation
(Example: image categorization gives the labels "frog, branch"; object class detection gives a 2D bounding box for each frog.)
Recognition – Many Variants and extensions
• Range of object classes: from "stuff" to "things" (see below)
• One can also add attributes:
• tall, flat, looks nice, "can be used for sitting on", material, etc.
Examples along this range: grass, sky → forest → tree → people, animals.
For so-called "stuff" (e.g. grass, sky) the class is defined predominantly by texture; for so-called "things" (e.g. people, animals) the class is defined predominantly by the outline (segmentation), the individual parts, and the layout of parts.
Recognition – Many variants and possibilities
• Context is important (reminder first lecture)
• Model-based recognition
• Cues: texture, curves, etc. (reminder: lecture 1)
• Input Image:
• 2D (image), 2.5D (Kinect Camera), 3D scan
The Pascal VOC Challenge (FYI)
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
20 classes, ~10,000 labelled images
20 classes, ~27,000 labelled images
The Pascal VOC Challenge (FYI)
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
10,000 classes, ~10,000,000 labelled images
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach
Random Forest – In a nutshell
Slide credits: Jamie Shotton and Antonio Criminisi
• Proven very capable, especially for real-time applications
• e.g. keypoint recognition, Kinect body part classification
• High accuracy with very low computational cost
• can exploit low-level features (e.g. raw pixel values)
• feature vector computed sparsely on-demand
• generalization through randomization
• Gives out confidence values
• Flexible, non-parametric model
• Easy to implement and parallelize
What can random forests do? Tasks
• Classification forests
• Regression forests
• Density forests
• Manifold forests
• Semi-supervised forests
What can random forests do? Applications
• Classification forests: e.g. semantic segmentation
• Regression forests: e.g. object localization
• Density forests: e.g. novelty detection
• Manifold forests: e.g. dimensionality reduction
• Semi-supervised forests: e.g. semi-supervised semantic segmentation
A brief history of Decision Forests
[ L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. Classification and Regression Trees (CART). 1984 ]
[ Y. Amit and D. Geman. Randomized enquiries about shape; An application to handwritten digit recognition. Technical Report 1994]
[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. 1997 ]
[ L. Breiman. Random forests. 1999, 2001 ]
[ V. Lepetit and P. Fua. Keypoint recognition using randomized trees. 2005, 2006 ]
[ F. Moosmann, B. Triggs, F. Jurie. Fast discriminative visual codebooks using randomized clustering forests. 2006 ]
[ G. Rogez, J. Rihan, S. Ramalingam, C. Orrite, P. Torr. Randomized trees for human pose detection. 2008 ]
[ C. Leistner, A. Saffari, J. Santner, H. Bischof. Semi-supervised random forests. 2009 ]
[ A. Saffari, C. Leistner, J. Santner, M. Godec, H. Bischof. On-line random forests. 2009 ]
[ S. Nowozin, C. Rother, S. Bagon, T. Sharp, B. Yao, and P. Kohli. Decision tree fields. 2011 ]
Reminder: Semantic Segmentation
The desired output
Label each pixel with one out of 21 classes
[TextonBoost; Shotton et al, ‘06]
Reminder: TextonBoost: How it is done
Define the energy (with $x_i \in \{1, \dots, K\}$, here $K = 21$ classes):
$E(\mathbf{x},\Theta) = \sum_i \big[\,\theta_i(x_i, z_i, \Theta) + \theta_i(x_i) + \theta_i(x_i, \mathbf{z})\,\big] + \sum_{i,j} \theta_{ij}(x_i, x_j)$
The three unary terms are a colour model, a location prior and a class term; the pairwise term is an edge-aware smoothness prior.
• Colour model $\theta_i(x_i, z_i, \Theta)$: as in GrabCut, each object class has an associated GMM.
• Location prior $\theta_i(x_i)$: a per-class prior on where in the image the class tends to occur (illustrated for grass and sky).
• Class term $\theta_i(x_i, \mathbf{z})$: each pixel gets a distribution over the 21 classes, $\theta_i(x_i = c, \mathbf{z}) = P(x_i = c \mid \mathbf{z})$, computed using boosting (explained next lecture) or a random forest (explained next).
• Smoothness prior: as in GrabCut, $\theta_{ij}(x_i, x_j) = w_{ij}\,|x_i - x_j|$.
[TextonBoost; Shotton et al., '06]
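As a minimal illustration of evaluating an energy of this form (not the original TextonBoost implementation; the unary tables and contrast weights below are placeholder arrays):

```python
import numpy as np

def energy(x, theta_color, theta_location, theta_class, w):
    """E(x) = sum_i [colour + location + class unaries] + sum_{i,j} w_ij |x_i - x_j|.
    x: (H, W) label image; theta_*: (H, W, K) unaries; w: (H, W, 2) contrast weights."""
    rows, cols = np.indices(x.shape)
    E = (theta_color[rows, cols, x] +
         theta_location[rows, cols, x] +
         theta_class[rows, cols, x]).sum()
    # Pairwise terms over horizontal and vertical neighbour pairs
    E += (w[:, :-1, 0] * np.abs(x[:, 1:] - x[:, :-1])).sum()
    E += (w[:-1, :, 1] * np.abs(x[1:, :] - x[:-1, :])).sum()
    return E

H, W, K = 4, 5, 21
rng = np.random.default_rng(0)
x = rng.integers(0, K, size=(H, W))
unaries = [rng.random((H, W, K)) for _ in range(3)]
w = rng.random((H, W, 2))
print(energy(x, *unaries, w))
```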
Reminder: TextonBoost
(Results shown for: class and location terms only; + edges; + colour model.)
[TextonBoost; Shotton et al, ‘06]
A variant of TextonBoost using Random Forests
Object Class Segmentation using Random Forests
Florian Schroff, Antonio Criminisi, and Andrew Zisserman, BMVC 2008
Let us talk about features and simple (weak) classifiers
1) Get 𝑑-dimensional feature, depending on some parameters 𝜃:
1-dimensional: we want to classify the white pixel. Feature: colour (e.g. red channel) of the green pixel. Parameters: 2D offset vector.
1-dimensional: we want to classify the white pixel. Feature: average colour (e.g. red channel) in the rectangle. Parameters: 4D (offset vector + size of rectangle).
Let us talk about features and simple (weak) classifiers
1) Get 𝑑-dimensional feature, depending on some parameters 𝜃:
2-dimensional: we want to classify the white pixel. Feature: colour (e.g. red channel) of the green pixel and colour (e.g. red channel) of the red pixel. Parameters: 4D (two offset vectors). We will visualize it in this way:
Let us talk about features and simple (weak) classifiers
Examples of weak learners (feature response shown for a 2D example; in general a weak learner may select only a very small subset of features):
• Axis-aligned: the classifier has 2 parameters (which axis, and a continuous threshold); an axis-aligned linear classifier.
• Oriented line: a generic line in homogeneous coordinates, $a x_1 + b x_2 + c \gtrless 0$; a linear classifier with 2/3 parameters.
• Conic section: a matrix representing a conic, $a x_1^2 + b x_2^2 + c x_1 x_2 + d x_1 + e x_2 + f \gtrless 0$; a conic classifier with 5/6 parameters.
Let us talk about features and simple (weak) classifiers
• We put all the parameters of the classifier and of the features into one vector 𝜽 (this holds for the axis-aligned, linear and conic classifiers above)
• They are called weak classifiers since they will be used to build a stronger classifier (here random forest, later Boosting)
• We denote the classifier as $h(\boldsymbol{\theta}, \mathbf{v}) \in \{\text{true}\,(>0),\ \text{false}\,(<0)\}$
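A minimal sketch of such weak classifiers $h(\boldsymbol{\theta}, \mathbf{v})$, assuming a 2D feature vector $\mathbf{v} = (x_1, x_2)$; the parameter layout is illustrative, not the exact one used in the lecture's experiments.

```python
import numpy as np

def h_axis_aligned(theta, v):
    """theta = (axis index, threshold); true if v[axis] > threshold."""
    axis, tau = theta
    return v[int(axis)] > tau

def h_linear(theta, v):
    """theta = (a, b, c); a generic line in homogeneous coordinates."""
    a, b, c = theta
    return a * v[0] + b * v[1] + c > 0

def h_conic(theta, v):
    """theta = (a, b, c, d, e, f); a conic section in the 2D feature space."""
    a, b, c, d, e, f = theta
    x1, x2 = v
    return a * x1**2 + b * x2**2 + c * x1 * x2 + d * x1 + e * x2 + f > 0

v = np.array([0.3, -1.2])
print(h_axis_aligned((0, 0.0), v), h_linear((1.0, 1.0, 0.5), v), h_conic((1, 1, 0, 0, 0, -1), v))
```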
Decision Tree (Classification)
[ Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation. 9:1545--1588, 1997]
[ L. Breiman. Random forests. Machine Learning. 45(1):5--32, 2001]
A general tree structure: a root node (node 0), internal (split) nodes and terminal (leaf) nodes; with breadth-first numbering the nodes are 0; 1, 2; 3, 4, 5, 6; 7, ..., 14.
A decision tree asks a question at each split node, e.g. "Is the top part blue?", then "Is the bottom part green?" or "Is the bottom part blue?".
Decision Tree – Test Time
Input data lives in a feature space. An input test point traverses the tree from the root, going left or right at each split node according to the node's test. The prediction at the reached leaf (classification) is the stored class distribution $p(c)$.
Decision Tree – Train Time
Input: all training points in feature space, where each point has a class label; here the set of all labelled training points contains 35 red and 23 blue points.
The training set is split at each node. At each leaf we measure $p(c)$: e.g. if a leaf receives 3 red and 1 blue point, then $p(\text{red}) = 0.75$ and $p(\text{blue}) = 0.25$. (Remember: the feature space is also optimized via $\theta$.)
Random Forests – Training of features (illustration)
What does it mean to optimize over 𝜃?
• Setting: an image labelling task with 2 classes (red and blue); the goal during training is to separate red pixels (class 1) from blue pixels (class 2).
• For each pixel the same feature test (at one split node) is applied.
• One has to define what happens with feature tests that reach outside the image.
Feature (see the sketch below):
• Value $x_1$: the value of the green colour channel (could also be red or blue) at pos + $(\theta_1, \theta_2)$, i.e. $\theta_1$ pixels right and $\theta_2$ pixels up.
• Value $x_2$: the value of the green colour channel (could also be red or blue) at pos + $(\theta_3, \theta_4)$, i.e. $\theta_3$ pixels right and $\theta_4$ pixels down.
Goal: find a 𝜃 that best separates the data (compare one choice of 𝜃 with another choice of 𝜃).
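A minimal sketch of such an offset colour feature, assuming the image is a NumPy array of shape (H, W, 3) and clamping offsets at the image border (one possible choice for tests that reach outside the image; names and conventions are illustrative):

```python
import numpy as np

def offset_feature(img, row, col, dx_right, dy_up, channel=1):
    """Value of `channel` (default: green) at the pixel displaced from (row, col)
    by dx_right pixels to the right and dy_up pixels up; clamped at the border."""
    r = int(np.clip(row - dy_up, 0, img.shape[0] - 1))   # "up" decreases the row index
    c = int(np.clip(col + dx_right, 0, img.shape[1] - 1))
    return img[r, c, channel]

def feature_vector(img, row, col, theta):
    """2D feature (x1, x2) from theta = (t1, t2, t3, t4): right/up and right/down offsets."""
    x1 = offset_feature(img, row, col, theta[0],  theta[1])
    x2 = offset_feature(img, row, col, theta[2], -theta[3])
    return np.array([x1, x2])

img = np.random.default_rng(0).integers(0, 256, size=(20, 30, 3))
print(feature_vector(img, 10, 15, (3, 2, 4, 1)))
```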
Decision Tree – Split Criteria
Node training: compare the class distribution in a node before the split with the distributions after candidate splits (split 1, split 2, ...).
Shannon's entropy: $H(S) = -\sum_c p(c) \log_2 p(c)$
Information gain: $I = H(S) - \sum_{i \in \{L, R\}} \frac{|S_i|}{|S|} H(S_i)$
Think of maximizing the information gain as minimizing the entropy of the child nodes.
Example Calculation
• We have $|S| = 12$ with $|S_L| = 6$ and $|S_R| = 6$
• In $S$ we have 6 red and 6 blue points (2 classes), so $H(S) = 1$
• We look at two possible splits (see the worked computation below):
1) 50%-50% class split (each side, $S_L$ and $S_R$, gets 3 red and 3 blue):
$H(S_L) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1$, $H(S_R) = 1$
$I(S) = H(S) - (0.5 \cdot 1 + 0.5 \cdot 1) = H(S) - 1 = 0$ (lower information gain)
2) Uneven class split (left side has 5 red and 1 blue, right side has 5 blue and 1 red):
$H(S_L) = -(\tfrac{1}{6}\log_2\tfrac{1}{6} + \tfrac{5}{6}\log_2\tfrac{5}{6}) \approx 0.65$, $H(S_R) \approx 0.65$
$I(S) = H(S) - (0.5 \cdot 0.65 + 0.5 \cdot 0.65) = 1 - 0.65 = 0.35$ (higher information gain)
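A small script reproducing this calculation (base-2 entropy; the class counts are those from the example above):

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (base 2) of a class-count vector."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, left, right):
    n_l, n_r = sum(left), sum(right)
    n = n_l + n_r
    return entropy(parent) - (n_l / n) * entropy(left) - (n_r / n) * entropy(right)

parent = [6, 6]                                   # 6 red, 6 blue points
print(information_gain(parent, [3, 3], [3, 3]))   # split 1 -> 0.0
print(information_gain(parent, [5, 1], [1, 5]))   # split 2 -> ~0.35
```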
Decision Forest
Trees $t = 1, 2, 3, \dots, T$
The ensemble model: the forest output probability is the average of the tree posteriors,
$p(c) = \frac{1}{T} \sum_{t=1}^{T} p_t(c)$
𝑇 is the number of trees
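A tiny sketch of this ensemble rule, assuming each trained tree is represented by a function that maps a feature vector to its leaf's class posterior:

```python
import numpy as np

def forest_posterior(trees, v):
    """Average the per-tree posteriors p_t(c) for the input v."""
    return np.mean([tree(v) for tree in trees], axis=0)

# Illustrative "trees" with constant posteriors over 3 classes
trees = [lambda v: np.array([0.7, 0.2, 0.1]),
         lambda v: np.array([0.5, 0.4, 0.1]),
         lambda v: np.array([0.6, 0.3, 0.1])]
print(forest_posterior(trees, v=None))   # -> [0.6, 0.3, 0.1]
```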
Randomness in the training set
Bagging (randomizing the training set): from the full training set, a randomly sampled subset of the training data is made available to each tree t during forest training.
Example: Two classes; axis aligned linear classifier
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=2, weak learner = aligned, leaf model = probabilistic Training points
Example: Two classes; linear classifier
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=2, weak learner = linear, leaf model = probabilistic Training points
Example: Two classes; conic classifier
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=2, weak learner = conic, leaf model = probabilistic Training points
Decision Tree - Randomization
Randomized node optimization: let $\mathcal{T}$ denote the full set of all possible node test parameters, and for each node let $\mathcal{T}_j \subseteq \mathcal{T}$ be the set of randomly sampled features considered at that node. The randomness control parameter is $\rho = |\mathcal{T}_j|$:
• $\rho = |\mathcal{T}|$: no randomness and maximum tree correlation.
• $\rho = 1$: maximum randomness and minimum tree correlation.
The effect of $\rho$: a small value of $\rho$ gives little tree correlation; a large value of $\rho$ gives large tree correlation.
At each node, training then optimizes the node weak learner over the sampled node test parameters (a sketch follows below).
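A minimal sketch of randomized node optimization under these definitions: sample $\rho$ candidate axis-aligned tests and keep the one with the highest information gain (the candidate generation and data layout are illustrative assumptions):

```python
import numpy as np

def entropy(labels, n_classes):
    p = np.bincount(labels, minlength=n_classes) / max(len(labels), 1)
    p = p[p > 0]
    return -(p * np.log2(p)).sum() if len(p) else 0.0

def train_node(X, y, n_classes, rho, rng):
    """Keep the best of rho randomly sampled axis-aligned tests (axis, threshold)."""
    best, best_gain = None, -np.inf
    H = entropy(y, n_classes)
    for _ in range(rho):
        axis = rng.integers(X.shape[1])
        tau = rng.uniform(X[:, axis].min(), X[:, axis].max())
        left = X[:, axis] > tau
        gain = H - (left.mean() * entropy(y[left], n_classes)
                    + (~left).mean() * entropy(y[~left], n_classes))
        if gain > best_gain:
            best, best_gain = (axis, tau), gain
    return best, best_gain

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = (X[:, 0] > 0.5).astype(int)
print(train_node(X, y, n_classes=2, rho=10, rng=rng))
```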
Decision Forest – the choices
• What is the depth of the trees?
• How many trees?
• Choice of 𝜌 ?
• What are the features?
• What type of classifier (linear, conic, etc.) ?
• What split criteria? (other than information gain)
A crucial factor is tree depth
Definitions
• Over-fitting: the effect that the model perfectly memorizes the training data but does not perform well on test data.
• Generalization: one of the most important aspects of a model is its ability to generalize, i.e. that new (unseen) test data is correctly classified. A model which overfits does not generalize well.
Half way slide
2 Minutes break
Example: four classes; conic classifier
Training different trees in the forest
Testing different trees in the forest
Parameters: T=200, D=3, weak learner = conic, leaf model = probabilistic Training points
Examples
Parameters: T=200, D=13, w. l. = conic, predictor = prob.
Training points: 4-class spiral; 4-class spiral with large gaps; 4-class spiral with larger gaps
Testing posteriors
Examples - overfitting
Effect of the maximum tree depth D (training points: 4-class mixed): T=200, weak learner = conic, with D=3 (underfitting), D=6, and D=15 (overfitting).
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach
Body Tracking with Kinect Camera
[ J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake.
Real-Time Human Pose Recognition in Parts from a Single Depth Image. In Proc. IEEE CVPR, June 2011.]
… what runs on Microsoft Xbox
Reminder: Kinect Camera
Example Depth: Person
(Figure: the person's depth points shown from the top view and the side view.)
RGB vs depth for pose estimation
• RGB:
• only works when well lit
• background clutter
• scale unknown
• clothing & skin colour
• Depth:
• works in low light
• person 'pops' out from the background
• scale known
• uniform texture
Body Tracking: Pipeline overview
Pipeline: input depth image → body part labelling (the body is divided into 31 body parts) → clustering (a simple centroid computation) → body joint hypotheses, shown in front, side and top views. A sketch of the clustering step follows below.
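A minimal sketch of the "simple centroid computation": one joint hypothesis per body part as the centroid of that part's 3D points (the actual system uses a more sophisticated, confidence-weighted clustering; the data layout here is an assumption).

```python
import numpy as np

def joint_hypotheses(part_labels, points3d, n_parts=31):
    """part_labels: (N,) per-pixel body-part labels in {0, ..., n_parts-1};
    points3d: (N, 3) corresponding 3D points back-projected from the depth map."""
    hypotheses = {}
    for part in range(n_parts):
        mask = part_labels == part
        if mask.any():
            hypotheses[part] = points3d[mask].mean(axis=0)   # centroid of this part
    return hypotheses

labels = np.array([0, 0, 1, 1, 1])
pts = np.array([[0.0, 0.0, 2.0], [0.0, 1.0, 2.0], [1.0, 0.0, 2.1], [1.0, 1.0, 2.1], [1.0, 2.0, 2.1]])
print(joint_hypotheses(labels, pts, n_parts=2))
```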
Create lots and lots of training data
Model all sorts of variations:
• Record mocap: 100,000s of poses [Vicon]
• Retarget to varied body shapes
• Render (depth, body parts) pairs + add noise
Train on synthetic data – test on real data
Synthetic (graphics) Real (hand-labelled)
Decision Forest
Each leaf stores a distribution over the 31 body parts:
Very Fast Features
A super simple feature that can be computed very fast (illustrated on the input depth image at several probe pixels $\mathbf{p}$ with offset $\Delta$):
$x_i(\mathbf{p}) = J(\mathbf{p}) - J(\mathbf{p} + \Delta)$
where $J$ is the depth image, $\mathbf{p}$ an image coordinate and $x_i(\mathbf{p})$ the offset depth feature response. The offset scales with depth:
$\Delta = \mathbf{r}_i / J(\mathbf{p})$
• 1D feature
• 2 parameters ($\mathbf{r}_i$)
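A minimal sketch of this depth-difference feature, assuming `J` is a 2D NumPy depth image in metres and clamping the probe at the image border (a possible convention; the lecture does not specify the border handling):

```python
import numpy as np

def depth_feature(J, p, r):
    """x_i(p) = J(p) - J(p + r / J(p)); the offset r is scaled by the depth at p.
    J: (H, W) depth image, p: (row, col) pixel, r: (dr, dc) offset parameters."""
    d = J[p[0], p[1]]
    q_row = int(np.clip(round(p[0] + r[0] / d), 0, J.shape[0] - 1))
    q_col = int(np.clip(round(p[1] + r[1] / d), 0, J.shape[1] - 1))
    return d - J[q_row, q_col]

J = np.full((240, 320), 2.0)     # flat background at 2 m
J[100:140, 150:200] = 1.0        # a person-like blob at 1 m
print(depth_feature(J, p=(120, 170), r=(0.0, 60.0)))   # probes person vs. background
```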
Number of Trees
(Figure: ground truth vs. inferred body parts (most likely) for 1, 3 and 6 trees; test performance plot of average per-class accuracy, roughly 40-55%, against the number of trees, 1-6.)
Depth of Trees
(Figure: input depth, ground truth parts and inferred parts (soft), shown for tree depths 1 to 18.)
Avoid Over-fitting
The more (diverse) training images, the better.
Results – Posterior Distributions
Body Parts Distribution: 𝑃(𝑐)
Results - Tracking
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach
Image categorization – Bag of Words Approach
Also used in document search
Bag of Words
Bag of Words - Overview
Object Representation
Feature Detection and Representation
Feature Detection and Representation
Feature Detection and Representation
Take all training images
Feature Detection and Representation
Codeword dictionary formation
Reminder: K-means
Reminder: K-means
Reminder: K-means
Reminder: K-means
("Repeat" means: go back to step 3.)
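Since the k-means figures are not reproduced here, a minimal sketch of k-means as used for codeword dictionary formation (assuming the descriptors are row vectors, e.g. 128-dimensional patch descriptors):

```python
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    """Plain k-means: alternate nearest-centre assignment and mean update."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # assignment step: each descriptor goes to its nearest centre
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # update step: each centre becomes the mean of its assigned descriptors
        for j in range(k):
            if (assign == j).any():
                centres[j] = X[assign == j].mean(axis=0)
    return centres, assign

X = np.random.default_rng(1).random((300, 128))   # 300 descriptors of dimension 128
codebook, assignments = kmeans(X, k=174)          # K = 174 codewords as on the slide
print(codebook.shape, assignments.shape)
```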
Codeword dictionary visualization
K = 174 (averaged patches for each cluster)
[from Fei-Fei Li]
Image Patch examples of Codewords
[from Josef Sivic]
Examples of image patches which are assigned to the same codeword.
Bag of Words – Image Representation
K = 174
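The image is then represented by a normalized K-bin histogram of codeword assignments over its descriptors; a minimal sketch (the codebook here is random for illustration, with K = 174 as on the slide):

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return the
    normalized K-bin histogram representing the image."""
    d = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.random.default_rng(1).random((174, 128))          # K = 174 codewords
image_descriptors = np.random.default_rng(2).random((80, 128))  # descriptors of one image
hist = bow_histogram(image_descriptors, codebook)
print(hist.shape, hist.sum())                                   # (174,), 1.0
```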
Roadmap this lecture (chapter 14 in book)
• Defining the Problem
• Semantic segmentation
• Random Forests
• People tracking … what runs on Microsoft Xbox
• Image categorization: Generative versus Discriminative Approach (will be continued in next lecture)