Computer Vision I -
Algorithms and Applications:
Semantic Segmentation
Carsten Rother
Roadmap this lecture (chapter 14.4.3, 5.5 in book)
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative functions
• Image Segmentation using GrabCut
• Semantic Segmentation
Probabilities - Reminder
• Discrete probability distribution: p(x) satisfies Σ_x p(x) = 1, where x ∈ {0, …, K}
• Joint distribution of two variables: p(x, z)
• Conditional distribution: p(x | z)
• Sum rule: p(z) = Σ_x p(x, z)
• Product rule: p(x, z) = p(z | x) p(x)
• Bayes' rule: p(x | z) = p(z | x) p(x) / p(z)
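A quick numeric sanity check of these rules, with a made-up 2×2 joint distribution (illustration only, not from the slides):

```python
import numpy as np

# Made-up joint distribution p(x, z) over x in {0,1} (rows) and z in {0,1} (columns)
p_xz = np.array([[0.1, 0.3],
                 [0.2, 0.4]])

p_z = p_xz.sum(axis=0)                     # sum rule:     p(z) = sum_x p(x, z)
p_x = p_xz.sum(axis=1)                     # sum rule:     p(x) = sum_z p(x, z)
p_z_given_x = p_xz / p_x[:, None]          # product rule: p(x, z) = p(z | x) p(x)
p_x_given_z = (p_z_given_x * p_x[:, None]) / p_z[None, :]   # Bayes' rule

assert np.isclose(p_xz.sum(), 1.0)
assert np.allclose(p_x_given_z, p_xz / p_z[None, :])        # consistent with p(x|z) = p(x,z)/p(z)
```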
A Machine Learning View on Models
Modelling a problem:
• The data is z and the desired output x
We can identify three different approaches [see details in Bishop, page 42ff]:
• Generative (probabilistic) models: p(x, z)
• Discriminative (probabilistic) models: p(x | z)
• Discriminative functions: f(x, z)
Generative Model
Models explicitly (or implicitly) the distribution of both the input z and the output x.

Joint probability: p(x, z) = p(z | x) p(x)   (likelihood × prior)

Comments:
1. The joint distribution does not necessarily have to be decomposed into likelihood and prior, but in practice it (nearly) always is.
2. Generative models are used successfully when input z and output x are closely related, e.g. image denoising.

Pros:
1. Possible to sample both x and z.
2. Can quite easily be used for many applications (since prior and likelihood are modelled separately).
3. In some applications, e.g. biology, people want to model likelihood and prior explicitly, since they want to understand the model as much as possible.
4. The probability can be used in bigger systems.

Cons:
1. It might not always be possible to write down the full distribution (it involves a distribution over images z).
Generative Model – Example De-noising

Joint probability: p(x, z) = p(z | x) p(x)   (likelihood × prior)

Pixel-wise likelihood (pixel-independent Gaussian noise):
p(z | x) = ∏_i N(z_i; x_i, σ) ∝ ∏_i exp{ −(z_i − x_i)² / (2σ²) }

[Figure: data z, label x, and a sketched Gaussian N(z_i; x_i, σ)]
Generative Model – Example De-noising

Prior: p(x) = (1/f) exp{ − Σ_{i,j ∈ N} |x_i − x_j| }

Robust prior: p(x) = (1/f) exp{ − Σ_{i,j ∈ N} min(|x_i − x_j|, τ) }

[Figure (sketched): the pairwise potential as a function of x_i − x_j; it follows the statistics of gradients in natural images]

Joint probability: p(x, z) = p(z | x) p(x)   (likelihood × prior)

Pixel-wise likelihood: p(z | x) = ∏_i N(z_i; x_i, σ) ∝ ∏_i exp{ −(z_i − x_i)² / (2σ²) }

Results of more advanced prior models:
• Field of Experts (FoE): learned prior on 5 × 5 patches [Roth et al. IJCV 2008]
• [Komodakis et al. CVPR 2009]
Change of application: in-painting
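To make the de-noising model concrete, a minimal numpy sketch (my own, not from the slides; the values of σ and τ are placeholders) that evaluates the negative log-joint −log p(x, z), up to constants, for a candidate restoration x:

```python
import numpy as np

def denoise_neg_log_joint(x, z, sigma=10.0, tau=50.0):
    """-log p(x, z) (up to constants) for the de-noising model above.

    x: (H, W) candidate clean image, z: (H, W) noisy observation.
    Likelihood: Gaussian per pixel; prior: robust pairwise term min(|x_i - x_j|, tau).
    """
    x = x.astype(np.float64); z = z.astype(np.float64)
    data_term = np.sum((z - x) ** 2) / (2.0 * sigma ** 2)   # -log likelihood
    dh = np.abs(x[:, 1:] - x[:, :-1])                       # horizontal neighbor differences
    dv = np.abs(x[1:, :] - x[:-1, :])                       # vertical neighbor differences
    prior_term = np.sum(np.minimum(dh, tau)) + np.sum(np.minimum(dv, tau))
    return data_term + prior_term                           # lower value = more probable
```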
Same prior, new likelihood (in-painting):

Joint probability: p(x, z) = p(z | x) p(x)   (likelihood × prior)

Pixel-wise likelihood (de-noising): p(z | x) = ∏_i N(z_i; x_i, σ) ∝ ∏_i exp{ −(z_i − x_i)² / (2σ²) }

Pixel-wise likelihood (in-painting):
p(z_i | x_i) = const            for pixels covered by the (red) text
p(z_i | x_i) = δ(x_i = z_i)     otherwise

[Figure: data z and label x]
Generative Model for image segmentation
Interactive Segmentation

Goal: given z, derive a binary labeling x.
Optimal solution: x* = argmax_x p(x, z) for the fixed, given z
(user-specified pixels are not optimized over)

z_i = (R, G, B)    (color image)
x_i ∈ {0, 1}       (binary label per pixel)

Statistical model p(x, z) for both the image z and the labels x
(we later come to p(x | z) and f(x, z))
Generative Model for image segmentation – likelihood

Joint probability: p(x, z) = p(z | x) p(x)   (likelihood × prior)

The red brush strokes give training data for foreground pixels; the blue brush strokes give training data for background pixels.

[Figure: user-labelled pixels and the fitted Gaussian Mixture Models, plotted in the red–green color plane]
Gaussian Mixture Model (GMM)
• Mixture Model: p(z) = Σ_{k=1}^K p(k) p(z | k)
• "k" is a latent variable we are not interested in
• k ∈ {1, …, K} indexes the K mixture components
• Each mixture component k is a 3D Gaussian distribution N(z; μ_k, Σ_k), where μ_k is a 3D vector and Σ_k a 3 × 3 (positive-semidefinite) matrix called the covariance matrix:

  N(z; μ, Σ) = 1 / ( (2π)^{d/2} |Σ|^{1/2} ) · exp{ −½ (z − μ)^T Σ^{−1} (z − μ) }

• p(z) = Σ_{k=1}^K π_k N(z; μ_k, Σ_k), where the π_k are the mixture coefficients
Gaussian Mixture Model (GMM)
• GMM probability: p(z) = Σ_{k=1}^K π_k N(z; μ_k, Σ_k)
• Unknown parameters: Θ = (π_1, …, π_K, μ_1, …, μ_K, Σ_1, …, Σ_K)
• How to learn Θ given data {z_i}:
  • Maximum likelihood estimation using EM (see machine learning lecture ML 1)
  • Next: a simpler procedure to learn GMMs (close to k-means)

A simple procedure for GMM learning / fitting:
Introduce an assignment variable for each data point (pixel) saying to which Gaussian it belongs: k_1, …, k_n with k_i ∈ {1, …, K}, where

  k_i = argmax_k N(z_i; μ_k, Σ_k)
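A minimal numpy sketch of this hard-assignment fitting procedure (the function name and the small covariance regularization are my own choices; the assignment step uses the slide's rule k_i = argmax_k N(z_i; μ_k, Σ_k)):

```python
import numpy as np

def fit_gmm_hard(z, K, n_iters=20, seed=0):
    """Simple hard-assignment GMM fitting (k-means-like), as on the slide.

    z: (n, d) array of data points (e.g. RGB pixel colors), K: number of Gaussians.
    Returns mixture weights pi, means mu, covariances sigma.
    """
    rng = np.random.default_rng(seed)
    n, d = z.shape
    mu = z[rng.choice(n, K, replace=False)].astype(np.float64)   # init means with random points
    sigma = np.stack([np.cov(z.T) + 1e-3 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iters):
        # Hard assignment: k_i = argmax_k N(z_i; mu_k, Sigma_k)  (done in the log domain)
        logp = np.stack([
            -0.5 * np.einsum('nd,dc,nc->n', z - mu[k],
                             np.linalg.inv(sigma[k]), z - mu[k])
            - 0.5 * np.linalg.slogdet(sigma[k])[1]
            for k in range(K)
        ], axis=1)
        k_i = logp.argmax(axis=1)

        # Re-estimate parameters from the hard assignments
        for k in range(K):
            zk = z[k_i == k]
            if len(zk) == 0:
                continue
            pi[k] = len(zk) / n
            mu[k] = zk.mean(axis=0)
            sigma[k] = np.cov(zk.T) + 1e-3 * np.eye(d)   # regularize for stability
    return pi, mu, sigma
```

Including log π_k in the assignment step would bring this closer to (hard) EM; the slide's rule uses the Gaussian densities only.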
Extensions of K-means
• Choose K automatically
• Go to a probabilistic version using Expectation Maximization (EM).
  Now the assignments are probabilistic (soft) assignments to all Gaussians (not only one)
• Faster versions:
  • Fit a GMM to all data points and then only change the mixture coefficients
  • Use histograms instead of GMMs
Illustration EM
Soft assignment: p(k_i)
[Bishop page 437]
Some comments on clustering
• More in CV 2
• Clustering without spatial constraints (k-means, mean-shift, etc.)
• Clustering with spatial constraints (super-pixels, normalized cut, etc.)
• Gestalt Theory
[Figure (sketched): normalized cut]
Joint Probability - Likelihood
Joint probability: p(x, z) = p(z | x) p(x)   (likelihood × prior)

Likelihood:
p(z | x) = ∏_i p(z_i | x_i) = ∏_i ( Σ_{k=1}^K π_k^{x_i} N(z_i; μ_k^{x_i}, Σ_k^{x_i}) )

Θ = (π_1^0, …, π_K^0, μ_1^0, …, μ_K^0, Σ_1^0, …, Σ_K^0, π_1^1, …, π_K^1, μ_1^1, …, μ_K^1, Σ_1^1, …, Σ_K^1)

All parameters with superscript 0 belong to the background and all with superscript 1 to the foreground.

[Figure: likelihoods p(z_i | x_i = 1) and p(z_i | x_i = 0) evaluated for a new query image pixel z_i]

Maximum likelihood estimation (likelihood only, no prior):
x* = argmax_x p(z | x) = argmax_x ∏_i p(z_i | x_i)
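A minimal sketch of this likelihood-only (ML) classification, assuming foreground and background GMM parameters have been fitted (e.g. with fit_gmm_hard above); the helper names are my own:

```python
import numpy as np

def gmm_loglik(z, pi, mu, sigma):
    """log( sum_k pi_k N(z; mu_k, Sigma_k) ) for each row of z, an (n, d) array."""
    n, d = z.shape
    comp = []
    for k in range(len(pi)):
        diff = z - mu[k]
        maha = np.einsum('nd,dc,nc->n', diff, np.linalg.inv(sigma[k]), diff)
        logdet = np.linalg.slogdet(sigma[k])[1]
        comp.append(np.log(pi[k]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi)))
    return np.logaddexp.reduce(np.stack(comp, axis=1), axis=1)

def ml_segmentation(pixels, fg_params, bg_params):
    """Label each pixel by the more likely GMM (likelihood only, no prior).

    pixels: (n, 3) RGB values; fg_params / bg_params: (pi, mu, sigma) tuples.
    """
    return (gmm_loglik(pixels, *fg_params) > gmm_loglik(pixels, *bg_params)).astype(int)
```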
Joint Probability – Prior

Joint probability: p(x, z) = p(z | x) p(x)   (likelihood × prior)

Prior: p(x) = (1/f) ∏_{i,j ∈ N_4} θ_ij(x_i, x_j)

θ_ij(x_i, x_j) = exp{ −|x_i − x_j| }   called the "Ising prior"
(exp{−1} = 0.37; exp{0} = 1)

f = Σ_x ∏_{i,j ∈ N_4} θ_ij(x_i, x_j)   Partition function: a sum over all possible labelings x
Joint Probability – Prior (4×4 grid example)

Pure prior model: p(x) = (1/f) ∏_{i,j ∈ N_4} exp{ −|x_i − x_j| }

[Figure: the best and the worst solutions, sorted by probability]
"The smoothness prior needs the likelihood"

[Figure: the distribution and samples from it, plotted over all 2^16 configurations of the 4×4 grid]
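The 4×4 grid is small enough to evaluate the Ising prior exactly; a brute-force sketch (my own code, matching the formula above) over all 2^16 labelings:

```python
import itertools
import numpy as np

def ising_prior_4x4():
    """Enumerate all 2^16 labelings of a 4x4 grid and compute the Ising prior
    p(x) = (1/f) * prod_{(i,j) in N4} exp(-|x_i - x_j|) by brute force."""
    H = W = 4
    # 4-connected neighbor pairs
    pairs = [((r, c), (r, c + 1)) for r in range(H) for c in range(W - 1)] + \
            [((r, c), (r + 1, c)) for r in range(H - 1) for c in range(W)]

    configs = list(itertools.product([0, 1], repeat=H * W))   # 65536 labelings
    scores = np.empty(len(configs))
    for idx, flat in enumerate(configs):
        x = np.array(flat).reshape(H, W)
        disagree = sum(abs(int(x[a]) - int(x[b])) for a, b in pairs)
        scores[idx] = np.exp(-disagree)            # unnormalized prior
    f = scores.sum()                               # partition function
    return configs, scores / f

configs, p = ising_prior_4x4()
best = np.argsort(-p)
print([configs[i] for i in best[:2]])   # the two most probable labelings: all-0 and all-1
```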
Joint Probability β Result
Global optimum: x* = argmax_x p(x, z)
ML solution:    x* = argmax_x p(z | x)

Joint probability:
p(x, z) = p(z | x) p(x) = ∏_i ( Σ_{k=1}^K π_k^{x_i} N(z_i; μ_k^{x_i}, Σ_k^{x_i}) ) · (1/f) ∏_{i,j ∈ N_4} exp{ −|x_i − x_j| }

Hard constraint (user-specified pixels): p(x_i = 0) = 0; p(x_i = 1) = 1

Sample from the model
[Figure: samples z from the model p(x, z) = p(z | x) p(x), the true image, and the most likely image]

Why does it still work?
• We only evaluate x for a given z
[Figure (sketched): the global optimum; other likely solutions will look similar]
Best Prior Models for Images
Best prior models for images p(x) can give results like this:
[Figure: sampled x — looks good at the texture level but not at the global level (e.g. scene layout)]
[Field of Experts, Roth et al. IJCV 2008]

Simple model for segmentations:
Recall de-noising: p(x, z) = p(z | x) p(x)
Pixel-wise likelihood: p(z | x) = ∏_i N(z_i; x_i, σ) ∝ ∏_i exp{ −(z_i − x_i)² / (2σ²) }
Is it the best we can do?
4-connected segmentation
[Figure: zoom-in on the image]
Reminder: Going to 8-connectivity
Larger connectivity can model the true Euclidean length (other metrics are also possible)
[Table/figure: lengths of example paths under the Euclidean metric vs. the 4-connected and 8-connected grid metrics (values 5.65, 8, 6.28, 6.75, 5.08, 1)]
[Boykov et al. '03; '05]
Going to 8-connectivity
[Figure: 4-connected vs. 8-connected (MRF) results, each compared with the Euclidean ideal; zoom-in on the image]
Modelling edges
• How can we put this into our model?
• p(x) cannot depend on the data!
• p(z | x) = ∏_{i,j ∈ N_4} p(z_i, z_j | x_i, x_j) must be extended to model all possible pairwise transitions from training data (e.g. with a 6D Gaussian). But:
  • this is difficult for the user to label
  • it is hard to get from other images
• There is a much simpler way: model only p(x | z)
  • a transition is likely when two neighboring pixels have different colors
Half way slide
3 Minutes break
Roadmap this lecture (chapter 14.4.3, 5.5 in book)
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative functions
• Image Segmentation using GrabCut
• Semantic Segmentation
Discriminative model
p(x | z) = (1/f(z)) exp{ −E(x, z) }   where   f(z) = Σ_x exp{ −E(x, z) }

Models that model the posterior directly are discriminative models.
In Computer Vision we mostly use the Gibbs distribution with an energy E.
These models are also called "Conditional Random Fields".

Pros:
1. Simpler to write down than a generative model (no need to model z), and goes directly for the desired output x
2. More flexible, since the energy is arbitrary
3. The probability can be used in bigger systems

Cons: we can no longer sample images z
Discriminative model
• Relation between posterior and joint: p(x | z) = (1/p(z)) p(x, z)
• p(x, z), p(x | z) and E(x, z) all have the same optimal solution x* given z:
  • x* = argmax_x p(x, z) given z
  • x* = argmax_x p(x | z) given z   (since p(x | z) = (1/p(z)) p(x, z))
  • x* = argmin_x E(x, z)            (since −log p(x | z) = log f(z) + E(x, z))
What does E look like for our segmentation example?

• So that p(x | z) and p(x, z) have the same optimal solution x*, we need:
  −log p(x, z) = E(x, z) + constant
  with E(x, z) = Σ_i θ_i(x_i, z) + Σ_{i,j ∈ N_4} θ_ij(x_i, x_j, z)
• p(x, z) ∝ p(x | z) = (1/f) exp{ −E(x, z) }   (∝ means equal up to scale)

Comment on Generative Models
One may also write the joint distribution π(π, π) as a Gibbs distribution:
p(x, z) = (1/f) exp{ −E(x, z) }   where   f = Σ_{x,z} exp{ −E(x, z) }

If likelihood and prior are no longer modelled separately:
• sampling x, z gets very difficult
• we can no longer learn prior and likelihood separately (as in de-noising)
• we train p(x, z) = (1/f) exp{ −E(x, z) } and p(x | z) = (1/f(z)) exp{ −E(x, z) } in a similar way (see CV 2 lectures)

[Figure: samples z]

The advantages of a generative model over a discriminative model are then gone.
But … it also loses the meaning of a "generative" model, since we no longer have a likelihood which says how the data was "generated".
Adding a contrast term
E(x, z) = Σ_i θ_i(x_i, z) + Σ_{i,j ∈ N_4} θ_ij(x_i, x_j, z)

θ_ij(x_i, x_j, z) = |x_i − x_j| · exp( −β ||z_i − z_j||² )

β = ( 2 ⟨ ||z_i − z_j||² ⟩ )^(−1)   (average taken over all neighboring pairs i, j ∈ N_4)

[Figure: the contrast weight exp( −β ||z_i − z_j||² ) as a function of the color difference]
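A minimal numpy sketch for computing β and the contrast weights exp(−β ||z_i − z_j||²) on a 4-connected grid (the function name and the small numerical guard are my own choices):

```python
import numpy as np

def contrast_weights(img):
    """Contrast-sensitive pairwise weights on a 4-connected grid.

    img: (H, W, 3) float array. Returns horizontal and vertical weights
    w = exp(-beta * ||z_i - z_j||^2) with beta = 1 / (2 * <||z_i - z_j||^2>).
    """
    img = img.astype(np.float64)
    dh = np.sum((img[:, 1:] - img[:, :-1]) ** 2, axis=2)   # horizontal color differences
    dv = np.sum((img[1:, :] - img[:-1, :]) ** 2, axis=2)   # vertical color differences
    beta = 1.0 / (2.0 * np.mean(np.concatenate([dh.ravel(), dv.ravel()])) + 1e-12)
    w_h = np.exp(-beta * dh)   # weight for edge (r, c) - (r, c+1)
    w_v = np.exp(-beta * dv)   # weight for edge (r, c) - (r+1, c)
    return w_h, w_v, beta
```

These weights enter the pairwise term θ_ij = w_ij |x_i − x_j|: transitions are cheap across strong color edges and expensive in homogeneous regions.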
Roadmap this lecture (chapter 14.4.3, 5.5 in book)
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative functions
• Image Segmentation using GrabCut
• Semantic Segmentation
Discriminative functions
E(x, z): 𝒦^n → ℝ   (for a given z, assigns a real value to every labeling x)

Models that address the classification problem directly via a function.

Examples:
• Energy
• Support vector machines
• Nearest neighbour classifier

Pros: most direct approach to model the problem
Cons: no probabilities

x* = argmin_x E(x, z)

This is the most used approach in computer vision!
Recap
Modelling a problem:
• The input data is z and the desired output x
We can identify three different approaches [see details in Bishop, page 42ff]:
• Generative (probabilistic) models: p(x, z)
• Discriminative (probabilistic) models: p(x | z)
• Discriminative functions: f(x, z)

The key differences are:
• Probabilistic or non-probabilistic model
• Generative models also model the data z
• Differences in training (see CV 2)
Simple example: Learning Discriminative functions
[Figure: segmentation results for different smoothness weights λ = 0, 10, 40, 200]

E(x, z) = Σ_i θ_i(x_i, z) + λ Σ_{i,j ∈ N_4} θ_ij(x_i, x_j, z)

Simple example: Learning Discriminative functions

Training phase: infer λ from a set of training images { (x^t, z^t) } → λ*,
where t runs over all training images (here around 50 images)
[Figure: training pairs (z^t, x^t) for t = 1, 2, …]

Testing phase: apply λ* to segment a new test image
A simple procedure: Learning Discriminative functions

1. Iterate λ = 0, …, 500
2. Compute x*,t for all training images (z^t, x^t)
3. Compute the average error: Error(λ) = (1/T) Σ_t C(x^t, x*,t)
   with loss/cost function C(x, x') = Σ_i |x_i − x'_i| (called the Hamming error)
4. Take the λ with the smallest Error

Hamming error: number of misclassified pixels

Questions:
• Is this the best and only way?
• Can we over-fit to the training data?

[Figure: training error as a function of λ]
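A sketch of this grid search in Python; `segment` stands in for a solver returning x* = argmin_x E(x, z; λ) (e.g. via graph cut) and is a placeholder of mine, not something defined in the slides:

```python
import numpy as np

def hamming_error(x_true, x_pred):
    """C(x, x') = sum_i |x_i - x'_i|: number of misclassified pixels."""
    return np.sum(np.abs(x_true.astype(int) - x_pred.astype(int)))

def learn_lambda(training_set, segment, lambdas=range(0, 501)):
    """Grid search over the smoothness weight, as in the procedure above.

    training_set: list of (z_t, x_t) image / ground-truth pairs.
    segment: hypothetical solver, segment(z, lam) -> predicted 0/1 labeling.
    """
    best_lam, best_err = None, np.inf
    for lam in lambdas:
        err = np.mean([hamming_error(x_t, segment(z_t, lam))
                       for z_t, x_t in training_set])
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err
```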
Big Picture: Learning
Probabilistic Learning (for generative and discriminative models):
1. Training: fit the distribution p(x, z) or p(x | z) to a set of training images (using e.g. maximum likelihood learning)
2. Test: make a decision according to some cost (loss) function C
   (depending on the cost function one computes the optimal solution, marginals, etc.)

Loss-based Learning (for discriminative functions):
1. Training: fit f(x, z) given a certain cost (loss) function (see above)
2. Test: compute the optimal value x* of the function wrt the test image

These are only high-level comments; we dive into this in CV 2!
Roadmap this lecture (chapter 14.4.3, 5.5 in book)
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative functions
• Image Segmentation using GrabCut
• Semantic Segmentation
Are we done?
[GrabCut, Rother et al. Siggraph 2004]
GrabCut Segmentation
Image z and user input
Global optimal solution
How to prevent the trivial solution?
E(x) = Σ_i θ_i(x_i) + λ Σ_{i,j ∈ N_4} w_ij |x_i − x_j|

So far we had both foreground and background brushes.

Hard constraints:
θ_i(x_i = 0) = ∞,  θ_i(x_i = 1) = 0    (foreground brush pixels)
θ_i(x_i = 0) = 0,  θ_i(x_i = 1) = ∞    (background brush pixels)
What is a good segmentation?
Objects (fore- and background) are self-similar wrt appearance
[Figure: input image and three segmentation options, each split into foreground and background, with Energy = 460000, 482000, 483000]

Energy(x, θ^F, θ^B) = −log p(z | x, θ^F, θ^B)
                    = Σ_i [ −log p(z_i | θ^F, x_i = 1) · x_i − log p(z_i | θ^B, x_i = 0) · (1 − x_i) ]
Full GrabCut functional
Gibbs distribution with energy:

E(x, θ^F, θ^B) = Σ_i [ −log p(z_i | θ^F, x_i = 1) · x_i − log p(z_i | θ^B, x_i = 0) · (1 − x_i) ]
               + λ Σ_{i,j ∈ N_4} exp{ −β ||z_i − z_j||² } · |x_i − x_j|

Goal is to compute the optimal solution (we could also marginalize over Θ):
x* = argmin_x ( min_{θ^F, θ^B} E(x, θ^F, θ^B) )

• So far, Θ was determined from the brush strokes (training data)
• Now we estimate Θ from the segmentation x:

p(z_i | x_i = 0, θ^B) = Σ_{k=1}^K π_k^B N(z_i; μ_k^B, Σ_k^B)
p(z_i | x_i = 1, θ^F) = Σ_{k=1}^K π_k^F N(z_i; μ_k^F, Σ_k^F)
Full GrabCut functional
[Figure: image z and user input; output x ∈ {0, 1} and output GMMs θ^F, θ^B shown in the red–green color plane]

Problem: the joint optimization of x, θ^F, θ^B is NP-hard.

Goal is to compute the optimal solution:
x* = argmin_x ( min_{θ^F, θ^B} E(x, θ^F, θ^B) )
Comment: Using histograms as color models, one can transform the problem into a higher-order Random Field model which can (sometimes) be solved globally optimally for all unknowns, segmentation x and Θ, with Dual Decomposition [Vicente, Kolmogorov, Rother, ICCV '09] (see CV 2).
GrabCut - optimization
Iterate two steps:
1. GMM fitting to the current segmentation:  (θ^F, θ^B) ← argmin_{θ^F, θ^B} E(x, θ^F, θ^B)
2. Graph cut to infer the segmentation:      x ← argmin_x E(x, θ^F, θ^B)

[Figure: image z and user input; initial segmentation x]
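A sketch of this alternation in Python, re-using the helper sketches from earlier (fit_gmm_hard, gmm_loglik, contrast_weights); `graph_cut` is a placeholder for a min-cut solver, and `lam`, `K`, `n_iters` are illustrative defaults of mine, not the values from the paper:

```python
import numpy as np

def grabcut_alternation(img, init_mask, graph_cut, lam=50.0, n_iters=5, K=5):
    """Sketch of GrabCut's alternating minimization (not the authors' implementation).

    img: (H, W, 3) image, init_mask: (H, W) initial 0/1 segmentation from the user input.
    graph_cut: placeholder solver taking per-pixel unary costs (H, W, 2) and
               contrast-weighted pairwise weights, returning a 0/1 mask.
    """
    x = init_mask.copy()
    pixels = img.reshape(-1, 3).astype(np.float64)
    w_h, w_v, _ = contrast_weights(img)                   # pairwise weights (fixed)

    for _ in range(n_iters):
        # Step 1: fit fg/bg GMMs theta^F, theta^B to the current segmentation x
        fg = fit_gmm_hard(pixels[x.ravel() == 1], K)
        bg = fit_gmm_hard(pixels[x.ravel() == 0], K)

        # Step 2: unary costs -log p(z_i | theta), then graph cut for the new x
        unary = np.stack([-gmm_loglik(pixels, *bg),       # cost of label 0 (background)
                          -gmm_loglik(pixels, *fg)],      # cost of label 1 (foreground)
                         axis=1).reshape(*x.shape, 2)
        x = graph_cut(unary, lam * w_h, lam * w_v)
    return x
```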
GrabCut - Optimization
[Figure: energy after each iteration (0, 1, 2, 3, 4) and the corresponding result]
GrabCut - Optimization
[Figure: color models in the red–green plane — at initialization the foreground model still covers foreground & background; in the end foreground and background are separated]
Comparison
input image
Roadmap this lecture (chapter 14.4.3, 5.5 in book)
• Interactive Image Segmentation
• From Generative models to
• Discriminative models to
• Discriminative functions
• Image Segmentation using GrabCut
• Semantic Segmentation
Semantic Segmentation
The desired output
Label each pixel with one out of 21 classes
[TextonBoost; Shotton et al. '06]
Failure cases
TextonBoost: How it is done
Define the energy as a sum of four terms: a class term, a color model, a location prior, and an edge-aware smoothness prior.
[TextonBoost; Shotton et al. '06]
Location prior: e.g. "grass" tends to occur in certain image regions.
Smoothness prior, as in GrabCut: ψ_ij(x_i, x_j) = w_ij |x_i − x_j|