• Keine Ergebnisse gefunden

Algorithms and Applications:

N/A
N/A
Protected

Academic year: 2022

Aktie "Algorithms and Applications:"

Copied!
63
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Computer Vision I -

Algorithms and Applications:

Semantic Segmentation

Carsten Rother

(2)

Roadmap this lecture (chapter 14.4.3, 5.5 in book)

• Interactive Image Segmentation

• From Generative models to

• Discriminative models to

• Discriminative functions

• Image Segmentation using GrabCut

• Semantic Segmentation

(3)

Roadmap this lecture (chapter 14.4.3, 5.5 in book)

• Interactive Image Segmentation

• From Generative models to

• Discriminative models to

• Discriminative functions

• Image Segmentation using GrabCut

• Semantic Segmentation

(4)

Probabilities - Reminder

• Discrete probability distribution: P(x) satisfies Σ_x P(x) = 1, where x ∈ {0, …, K}

• Joint distribution of two variables: P(x, z)

• Conditional distribution: P(x|z)

• Sum rule: P(z) = Σ_x P(x, z)

• Product rule: P(x, z) = P(z|x) P(x)

• Bayes' rule: P(x|z) = P(z|x) P(x) / P(z)
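These rules can be checked numerically on a small made-up joint distribution (the table values below are illustrative only):

```python
import numpy as np

# Made-up joint distribution P(x, z) over x in {0,1,2} (rows) and z in {0,1} (cols)
P_xz = np.array([[0.10, 0.20],
                 [0.15, 0.25],
                 [0.05, 0.25]])
assert np.isclose(P_xz.sum(), 1.0)          # normalization

P_z = P_xz.sum(axis=0)                      # sum rule: P(z) = sum_x P(x, z)
P_x = P_xz.sum(axis=1)
P_x_given_z = P_xz / P_z                    # conditional: P(x|z) = P(x, z) / P(z)
P_z_given_x = P_xz / P_x[:, None]

# Bayes' rule: P(x|z) = P(z|x) P(x) / P(z)
bayes = P_z_given_x * P_x[:, None] / P_z
assert np.allclose(bayes, P_x_given_z)
```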

(5)

A Machine Learning View on Models

Modelling a problem:

• The input data is z and the desired output x

We can identify three different approaches [see details in Bishop, page 42ff]:

• Generative (probabilistic) models: P(x, z)

• Discriminative (probabilistic) models: P(x|z)

• Discriminative functions: f(x, z)

(6)

Generative Model

Models explicitly (or implicitly) the distribution of the input z and the output x.

Joint probability: P(x, z) = P(z|x) P(x)   (likelihood · prior)

Comments:

1. The joint distribution does not necessarily have to be decomposed into likelihood and prior, but in practice it (nearly) always is.

2. Generative models are used successfully when input z and output x are closely related, e.g. image denoising.

Pros:

1. It is possible to sample both x and z.

2. They can quite easily be used for many applications (since prior and likelihood are modeled separately).

3. In some applications, e.g. biology, people want to model likelihood and prior explicitly, since they want to understand the model as much as possible.

4. The probability can be used in bigger systems.

Cons:

1. It might not always be possible to write down the full distribution (it involves a distribution over images z).

(7)

Generative Model – Example De-noising

Joint probability: P(z, x) = P(z|x) P(x)   (likelihood · prior)

Pixel-wise likelihood (pixel-independent Gaussian noise):

P(z|x) = ∏_i N(x_i; z_i, σ) ∝ ∏_i exp{−(z_i − x_i)² / (2σ²)}

(Figure: data z, label x, and the per-pixel Gaussian N(x_i; z_i, σ), sketched.)
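The pixel-independent Gaussian likelihood can be evaluated directly; a minimal sketch (the signal and σ are made-up illustrative values):

```python
import numpy as np

def log_likelihood(z, x, sigma=0.1):
    """log P(z|x) for pixel-independent Gaussian noise, up to an additive
    constant: sum_i -(z_i - x_i)^2 / (2 sigma^2)."""
    return -np.sum((z - x) ** 2) / (2 * sigma ** 2)

x = np.array([0.0, 0.0, 1.0, 1.0])        # clean "image"
rng = np.random.default_rng(0)
z = x + rng.normal(0, 0.1, size=x.shape)  # noisy observation
# the clean image explains z better than a shifted one
print(log_likelihood(z, x) > log_likelihood(z, x + 0.5))
```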

(8)

Generative Model – Example De-noising

Joint probability: P(z, x) = P(z|x) P(x)   (likelihood · prior)

Pixel-wise likelihood: P(z|x) = ∏_i N(x_i; z_i, σ) ∝ ∏_i exp{−(z_i − x_i)² / (2σ²)}

Prior: P(x) = (1/f) exp{−Σ_{ij∈N4} |x_i − x_j|}   ("sketched" as a function of x_i − x_j)

Robust prior: P(x) = (1/f) exp{−Σ_{ij∈N4} min(|x_i − x_j|, τ)}

The robust pairwise term P(x_i, x_j) follows the statistics of gradients in natural images.

(9)

Result of more advanced prior models

[Komodakis et al. CVPR 2009]

(10)

Result of more advanced prior models

FoE: learned prior on 5 × 5 patches

(11)

Change application: in-painting

[Fields of Experts, Roth et al. IJCV 2008]

FoE

(12)

Change application: in-painting

Joint probability: P(z, x) = P(z|x) P(x)   (likelihood · prior)

For de-noising, the pixel-wise likelihood was:

P(z|x) = ∏_i N(x_i; z_i, σ) ∝ ∏_i exp{−(z_i − x_i)² / (2σ²)}

For in-painting (data z, label x), the likelihood changes: P(z_i|x_i) = const for pixels inside the masked region (the red text), and P(z_i|x_i) = δ(x_i = z_i) otherwise.

(13)

Generative Model for image segmentation

Goal

Given z, derive the binary labeling x:

z = (R, G, B)^n,  x = {0, 1}^n

Optimal solution: x* = argmax_x P(x, z) for a fixed z
(user-specified pixels are not optimized over)

Interactive Segmentation: a statistical model P(x, z) for both the image z and the labeling x
(we then later come to P(x|z) and f(x, z))

(14)

Generative Model for image segmentation - likelihood

Joint probability: P(z, x) = P(z|x) P(x)   (likelihood · prior)

The red brush strokes give training data for foreground pixels; the blue brush strokes give training data for background pixels.

(Figure: user-labelled pixels in the red–green plane and the Gaussian Mixture Model fit.)

(15)

Gaussian Mixture Model (GMM)

• Mixture model: p(z) = Σ_{k=1}^K p(k) p(z|k)

• "k" is a latent variable we are not interested in

• k ∈ {1, …, K} indexes the K mixture components.

• Each mixture component k is a 3D Gaussian distribution N_k(z; μ_k, Σ_k), where μ_k is a 3D vector and Σ_k a 3 × 3 positive-semidefinite matrix, the covariance matrix:

N(z; μ, Σ) = (2π)^{−p/2} |Σ|^{−1/2} exp{−(1/2) (z − μ)ᵀ Σ⁻¹ (z − μ)}

• p(z) = Σ_{k=1}^K π_k N_k(z; μ_k, Σ_k), where π_k is the mixture coefficient
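The mixture density above is straightforward to evaluate; a minimal sketch in Python/NumPy, with made-up 3D (RGB) component parameters:

```python
import numpy as np

def gaussian_pdf(z, mu, cov):
    """Multivariate normal density N(z; mu, cov)."""
    p = len(mu)
    diff = z - mu
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def gmm_pdf(z, pis, mus, covs):
    """GMM density p(z) = sum_k pi_k N_k(z; mu_k, Sigma_k)."""
    return sum(pi * gaussian_pdf(z, mu, cov)
               for pi, mu, cov in zip(pis, mus, covs))

# Two illustrative 3D (RGB) components
pis = [0.4, 0.6]
mus = [np.array([0.2, 0.3, 0.1]), np.array([0.7, 0.6, 0.8])]
covs = [0.05 * np.eye(3), 0.1 * np.eye(3)]

print(gmm_pdf(np.array([0.25, 0.3, 0.15]), pis, mus, covs))
```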

(16)

Gaussian Mixture Model (GMM)

• GMM probability: p(z) = Σ_{k=1}^K π_k N_k(z; μ_k, Σ_k)

• Unknown parameters: Θ = (π_1, …, π_K, μ_1, …, μ_K, Σ_1, …, Σ_K)

• How to learn Θ given data {z}:

• Maximum likelihood estimation using EM (see machine learning lecture ML 1)

• Next: a simpler procedure for learning GMMs (close to k-means)

Example: (figure)

(17)

A simple procedure for GMM learning / fitting

Let us introduce, for each data point (pixel), an assignment variable that says which Gaussian it belongs to: k_1, …, k_n, where k_i ∈ {1, …, K}:

k_i = argmax_k N_k(z_i; μ_k, Σ_k)

(18)

Extensions of K-means

• Choose K automatically

• Go to the probabilistic version using Expectation Maximization (EM). Now the k_i are probabilistic (soft) assignments to all Gaussians (not only one)

• Faster versions:

• Fit the GMM to all data points and then only change the mixture coefficients

• Use histograms instead of GMMs

(19)

Illustration EM

Soft assignment: p(a_i)

[Bishop page 437]

(20)

Some comments on clustering

• More in CV2

• Clustering without spatial constraints: (k-means, mean-shift, etc.)

• Clustering with spatial constraints: (super-pixels, normalized cut, etc.)

• Gestalt theory

(Figure: normalized cut, sketched.)

(21)

Joint Probability - Likelihood

Joint probability: P(z, x) = P(z|x) P(x)   (likelihood · prior)

Likelihood:

P(z|x) = ∏_i P(z_i|x_i) = ∏_i ( Σ_{k=1}^K π_k^{x_i} N_k^{x_i}(z_i; μ_k^{x_i}, Σ_k^{x_i}) )

Θ = (π_1^0, …, π_K^0, μ_1^0, …, μ_K^0, Σ_1^0, …, Σ_K^0, π_1^1, …, π_K^1, μ_1^1, …, μ_K^1, Σ_1^1, …, Σ_K^1)

All parameters with superscript 0 belong to the background and all with superscript 1 to the foreground; a new query image z is evaluated pixel-wise under both models.

(22)

Joint Probability - Likelihood

Maximum likelihood estimation:

x* = argmax_x P(z|x) = argmax_x ∏_i P(z_i|x_i)

For each pixel of a new query image, compare P(z_i|x_i = 1) with P(z_i|x_i = 0).
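Per-pixel maximum likelihood labeling then reduces to comparing the two likelihoods at every pixel. A minimal sketch with 1D grayscale Gaussians instead of the RGB GMMs (the parameters are illustrative):

```python
import numpy as np

def gauss(z, mu, sigma):
    """1D Gaussian likelihood N(z; mu, sigma)."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def ml_segment(z, fg, bg):
    """x_i = 1 where P(z_i | x_i=1) > P(z_i | x_i=0), independently per pixel."""
    return (gauss(z, *fg) > gauss(z, *bg)).astype(int)

z = np.array([0.1, 0.2, 0.8, 0.9, 0.15, 0.85])
x = ml_segment(z, fg=(0.9, 0.1), bg=(0.1, 0.1))
print(x)  # dark pixels -> background (0), bright pixels -> foreground (1)
```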

(23)

Joint Probability - Prior

Joint probability: P(z, x) = P(z|x) P(x)

Prior: P(x) = (1/f) ∏_{i,j∈N4} Θ_ij(x_i, x_j)

Θ_ij(x_i, x_j) = exp{−|x_i − x_j|}, called the "Ising prior"
(exp{−1} ≈ 0.37; exp{0} = 1)

f = Σ_x ∏_{i,j∈N4} Θ_ij(x_i, x_j) is the partition function, a sum over all possible labelings x.

(24)

Joint Probability – Prior (4x4 Grid example)

(Figure: best and worst solutions sorted by probability.)

Pure prior model: P(x) = (1/f) ∏_{i,j∈N4} exp{−|x_i − x_j|}

"The smoothness prior needs the likelihood."

(25)

Joint Probability – Prior (4x4 Grid example)

(Figure: the probability distribution over all 2^16 configurations of the 4 × 4 grid, and samples from it.)

Pure prior model: P(x) = (1/f) ∏_{i,j∈N4} exp{−|x_i − x_j|}

(26)

Joint Probability – Result

Joint probability:

P(z, x) = P(z|x) P(x) = ∏_i ( Σ_{k=1}^K π_k^{x_i} N_k^{x_i}(z_i; μ_k^{x_i}, Σ_k^{x_i}) ) · (1/f) ∏_{i,j∈N4} exp{−|x_i − x_j|}

Global optimum: x* = argmax_x P(z, x)

ML solution: x* = argmax_x P(z|x)

Hard constraint (user-labelled pixels): P(x_i = 0) = 0; P(x_i = 1) = 1

(27)

Sample from the model

(Figure: true image, samples (x, z) from the model P(z, x) = P(z|x) P(x), and the most likely solution.)

(28)

Why does it still work?

• We only evaluate x for a given z

• The global optimum and other likely solutions will look similar (sketched)

(29)

Best Prior Models for Images

Best prior models for images P(x) can give such results: a sampled x looks good at the texture level but not at the global level (e.g. scene layout). [Fields of Experts, Roth et al. IJCV 2008]

A simple model for segmentations; remember de-noising:

P(z, x) = P(z|x) P(x)

Pixel-wise likelihood: P(z|x) = ∏_i N(x_i; z_i, σ) ∝ ∏_i exp{−(z_i − x_i)² / (2σ²)}

(30)

Is it the best we can do?

4-connected segmentation (figure: zoom-in on the image).

(31)

Reminder: Going to 8-connectivity

Larger connectivity can model the true Euclidean length (other metrics are also possible).

(Figure: lengths of example paths measured with the Euclidean metric, 4-connectivity, and 8-connectivity; e.g. Euclidean 5.65 vs. 8 under 4-connectivity, while 8-connectivity comes much closer.)

[Boykov et al. '03; '05]

(32)

Going to 8-connectivity

(Figure: zoom-in comparing 4-connected and 8-connected Euclidean (MRF) segmentations.)

(33)

Modelling edges

• How do we put this into our model?

• P(x) cannot depend on the data!

• The likelihood P(z|x) = ∏_{i,j∈N4} P(z_ij|x_ij) would have to be extended to model all possible pairwise transitions from training data (e.g. with a 6D Gaussian). But:

• this is difficult for the user to label

• it is hard to get from other images

• There is a much simpler way: model only P(x|z)

• A transition is likely when two neighboring pixels have different colors.

(34)

Half way slide

3 Minutes break

(35)

Roadmap this lecture (chapter 14.4.3, 5.5 in book)

• Interactive Image Segmentation

• From Generative models to

• Discriminative models to

• Discriminative functions

• Image Segmentation using GrabCut

• Semantic Segmentation

(36)

Discriminative model

Models that model the posterior directly are discriminative models. In computer vision we mostly use the Gibbs distribution with an energy E:

P(x|z) = (1/f) exp{−E(x, z)}, where f = Σ_x exp{−E(x, z)}

These are also called "conditional random fields".

Pros:

1. Simpler to write down than a generative model (no need to model z), and goes directly for the desired output x

2. More flexible, since the energy is arbitrary

3. The probability can be used in bigger systems

Cons: we can no longer sample images z.

(37)

Discriminative model

• Relation between posterior and joint: P(x|z) = (1/P(z)) P(x, z)

• P(x, z), P(x|z) and E(x, z) all have the same optimal solution x* for a given z:

• x* = argmax_x P(x, z) given z

• x* = argmax_x P(x|z) given z (since P(x|z) = (1/P(z)) P(x, z))

• x* = argmin_x E(x, z) (since −log P(x|z) = log f + E(x, z))

(38)

How does E look for our segmentation example?

• So that P(x|z) and P(x, z) have the same optimal solution x*, we need:

−log P(x, z) = E(x, z) + constant = Σ_i Θ_i(x_i, z) + Σ_{i,j∈N4} Θ_ij(x_i, x_j, z) + constant

since P(x, z) ~ P(x|z) = (1/f) exp{−E(x, z)}   (~ means equal up to scale)

(39)

Comment on Generative Models

One may also write the joint distribution P(x, z) as a Gibbs distribution:

P(x, z) = (1/f) exp{−E(x, z)}, where f = Σ_{x,z} exp{−E(x, z)}

If likelihood and prior are no longer modelled separately:

• sampling (x, z) gets very difficult

• we can no longer learn prior and likelihood separately (as in de-noising)

• we train P(x, z) = (1/f) exp{−E(x, z)} and P(x|z) = (1/f) exp{−E(x, z)} in a similar way (see CV 2 lectures)

The advantages of a generative model over a discriminative model are gone. But it has also lost the meaning of a "generative" model, since we no longer have a likelihood that says how the data was "generated".

(40)

Adding a contrast term

E(x, z) = Σ_i Θ_i(x_i, z) + Σ_{i,j∈N4} Θ_ij(x_i, x_j, z)

Θ_ij(x_i, x_j, z) = |x_i − x_j| exp{−β ‖z_i − z_j‖²}

β = ( 2 ⟨‖z_i − z_j‖²⟩_{N4} )⁻¹, i.e. the inverse of twice the average squared color difference over all 4-neighbor pairs.

(41)

Roadmap this lecture (chapter 14.4.3, 5.5 in book)

• Interactive Image Segmentation

• From Generative models to

• Discriminative models to

• Discriminative functions

• Image Segmentation using GrabCut

• Semantic Segmentation

(42)

Discriminative functions

Models that model the classification problem via a function:

E(x, z): Kⁿ → R,   x* = argmin_x E(x, z)

Examples:

- energies
- support vector machines
- nearest neighbour classifiers

Pros: the most direct approach to model the problem.
Cons: no probabilities.

This is the most used approach in computer vision!

(43)

Recap

Modelling a problem:

• The input data is z and the desired output x

We can identify three different approaches [see details in Bishop, page 42ff]:

• Generative (probabilistic) models: P(x, z)

• Discriminative (probabilistic) models: P(x|z)

• Discriminative functions: f(x, z)

The key differences are:

• Probabilistic or non-probabilistic model

• Generative models also model the data z

• Differences in training (see CV 2)

(44)

Simple example: Learning Discriminative functions

πœ” =0 πœ” =10

πœ” =200 πœ” =40

𝐸 𝒙, 𝒛 = Θ (π‘₯ , 𝒛) + πœ” Θ (π‘₯ , π‘₯ , 𝒛)

(45)

Simple example: Learning Discriminative functions

Training phase: infer ω given a set of training images: {x^t, z^t} → ω, where t indexes the training images (here around 50 images).

Testing phase: (z, ω) → x*

(46)

A simple procedure: Learning Discriminative functions

1. Iterate ω = 0, …, 500

2. Compute x^{*,t} for all training images {x^t, z^t}

3. Compute the average error: Error = (1/T) Σ_t C(x^t, x^{*,t}), with the loss/cost function C(x, x′) = Σ_i |x_i − x_i′| (called the Hamming error: the number of misclassified pixels)

4. Take the ω with the smallest Error

Questions:

- Is this the best and only way?

- Can we over-fit to the training data?

(47)

Big Picture: Learning

Probabilistic learning (for generative and discriminative models):

1. Training: fit the distribution P(x, z) or P(x|z) to a set of training images (using e.g. maximum likelihood learning)

2. Test: make a decision according to some cost (loss) function Δ (depending on the cost function one computes the optimal solution, marginals, etc.)

Loss-based learning (for discriminative functions):

1. Training: fit f(x, z) given a certain cost (loss) function (see above)

2. Test: compute the optimal value x* of the function w.r.t. the test image

These are only high-level comments; we dive into this in CV 2!

(48)

Roadmap this lecture (chapter 14.4.3, 5.5 in book)

• Interactive Image Segmentation

• From Generative models to

• Discriminative models to

• Discriminative functions

• Image Segmentation using GrabCut

• Semantic Segmentation

(49)

Are we done?

[GrabCut, Rother et al. SIGGRAPH 2004]

(50)

GrabCut Segmentation

Image z and user input → globally optimal solution.

E(x) = Σ_i Θ_i(x_i) + ω Σ_{i,j∈N4} w_ij |x_i − x_j|

So far we had both foreground and background brushes. How do we prevent the trivial solution?

Hard constraints:

Θ_i(x_i = 0) = ∞; Θ_i(x_i = 1) = 0 (foreground brush)

Θ_i(x_i = 0) = 0; Θ_i(x_i = 1) = ∞ (background brush)

(51)

What is a good segmentation?

Objects (fore- and background) are self-similar w.r.t. appearance.

(Figure: input image and three segmentation options with E_unary = 460000, 482000 and 483000, each with its own foreground/background models Θ^F, Θ^B.)

E_unary(x, Θ^F, Θ^B) = −log P(z|x, Θ^F, Θ^B) = Σ_i [ −log P(z_i|Θ^F, x_i = 1) x_i − log P(z_i|Θ^B, x_i = 0) (1 − x_i) ]

(52)

Full GrabCut functional

Gibbs distribution with energy:

E(x, Θ^F, Θ^B) = Σ_i [ −log P(z_i|Θ^F, x_i = 1) x_i − log P(z_i|Θ^B, x_i = 0) (1 − x_i) ] + Σ_{ij} exp{−β‖z_i − z_j‖²} |x_i − x_j|

Goal is to compute the optimal solution (we could also marginalize over Θ):

x* = argmin_x ( min_{Θ^F,Θ^B} E(x, Θ^F, Θ^B) )

• So far, Θ was determined from the brush strokes (training data)

• Now we estimate Θ from the segmentation x

P(z_i|x_i = 0, Θ^B) = Σ_{k=1}^K π_k^B N_k^B(z_i; μ_k^B, Σ_k^B)

P(z_i|x_i = 1, Θ^F) = Σ_{k=1}^K π_k^F N_k^F(z_i; μ_k^F, Σ_k^F)

(53)

Full GrabCut functional

(Figure: image z with user input, output x ∈ {0,1}, and the output GMMs θ^F, θ^B in the red–green plane.)

Goal is to compute the optimal solution:

x* = argmin_x ( min_{Θ^F,Θ^B} E(x, Θ^F, Θ^B) )

Problem: the joint optimization of x, θ^F, θ^B is NP-hard.

Comment: using histograms for the color models, one can transform the problem into a higher-order random field model which can (sometimes) be solved globally optimally for all unknowns, the segmentation x and Θ, with dual decomposition [Vicente, Kolmogorov, Rother, ICCV '09] (see CV 2).

(54)

GrabCut - optimization

Starting from an initial segmentation x given the image z and the user input, alternate two steps:

1. GMM fitting to the current segmentation: min over θ^F, θ^B of E(x, θ^F, θ^B)

2. Graph cut to infer the segmentation: min over x of E(x, θ^F, θ^B)
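A toy sketch of this alternating scheme on a 1D grayscale signal, with single fixed-variance Gaussians instead of GMMs and an ICM sweep standing in for the graph cut; both are simplifying assumptions, so this illustrates the alternation, not GrabCut itself:

```python
import numpy as np

def unary(z, mu_f, mu_b):
    """Quadratic data costs, i.e. -log of fixed-variance Gaussians (up to const)."""
    return (z - mu_f) ** 2, (z - mu_b) ** 2

def icm_sweep(z, x, mu_f, mu_b, w=0.05):
    """Greedy per-pixel relabeling under unary + Ising smoothness
    (a crude stand-in for the graph cut step)."""
    uf, ub = unary(z, mu_f, mu_b)
    for i in range(len(x)):
        for lab in (0, 1):
            cost = uf[i] if lab == 1 else ub[i]
            if i > 0:
                cost += w * abs(lab - x[i - 1])
            if i < len(x) - 1:
                cost += w * abs(lab - x[i + 1])
            if lab == 0:
                cost0 = cost
            else:
                x[i] = 1 if cost < cost0 else 0
    return x

def grabcut_1d(z, x, iters=5):
    for _ in range(iters):
        mu_f = z[x == 1].mean() if (x == 1).any() else 1.0  # refit "models"
        mu_b = z[x == 0].mean() if (x == 0).any() else 0.0
        x = icm_sweep(z, x, mu_f, mu_b)                     # re-segment
    return x

z = np.array([0.1, 0.2, 0.1, 0.8, 0.9, 0.85, 0.2, 0.1])
x0 = np.zeros(len(z), int)
x0[3] = 1                                                   # crude user init
print(grabcut_1d(z, x0.copy()))
```

Starting from a single "foreground" pixel, the alternation grows the label to cover the whole bright region.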

(55)

GrabCut - Optimization

(Figure: result and energy after each iteration 0–4.)

(56)

GrabCut - Optimization

(Figure: red–green color models. At initialization the "foreground" model still covers foreground & background; in the end foreground and background are separated.)

(57)

Comparison

input image

(58)

Roadmap this lecture (chapter 14.4.3, 5.5 in book)

• Interactive Image Segmentation

• From Generative models to

• Discriminative models to

• Discriminative functions

• Image Segmentation using GrabCut

• Semantic Segmentation

(59)

Semantic Segmentation

The desired output

Label each pixel with one out of 21 classes

[TextonBoost; Shotton et al. '06]

(60)

Failure cases

(61)

TextonBoost: How it is done

Define the energy [TextonBoost; Shotton et al. '06]:

E(x, Θ) = Σ_i [ θ_i(x_i, z_i, Θ) + θ_i(x_i) + θ_i(x_i, z) ] + Σ_{i,j} θ_ij(x_i, x_j)
(color model) (location prior) (class) (edge-aware smoothness prior)

with x_i ∈ {1, …, K}.

Color model: as in GrabCut, each object class has an associated GMM.

Location prior: e.g. "sky" is likely at the top, "grass" at the bottom.

Class information: each pixel gets a distribution over the 21 classes, θ_i(x_i = c, z) = P(x_i = c|z),
- using boosting (explained next lecture)
- using random forests (explained next)

Edge-aware smoothness prior: as in GrabCut, θ_ij(x_i, x_j) = w_ij |x_i − x_j|.

(62)

TextonBoost: Energy

(Figure: results with class and location terms only, then additionally with edges and the color model.)

(63)

Roadmap this lecture (chapter 14.4.3, 5.5 in book)

• Interactive Image Segmentation

• From Generative models to

• Discriminative models to

• Discriminative functions

• Image Segmentation using GrabCut

• Semantic Segmentation
