Computer Vision I -
Basics of Image Processing – Part 2
Carsten Rother
07/11/2014
Roadmap: Basics of Digital Image Processing
• What is an Image?
• Point operators (ch. 3.1)
• Filtering: (ch. 3.2, ch 3.3, ch. 3.4) – main focus
• Linear filtering
• Non-linear filtering
• Multi-scale image representation (ch. 3.5) (will be done in SS15 as part of Image Processing)
• Edge detection and linking (ch. 4.2)
• Line detection and vanishing point detection (ch. 4.3) (will be done in SS15 as part of Image Processing)
• Interest Point detection (ch. 4.1.1)
Reminder: Convolution (linear filter)
• Replace each pixel by a linear combination of its neighbours and itself.
• 2D convolution (discrete): 𝑔 = 𝑓 ∗ ℎ

  g(x, y) = Σ_{k,l} f(x − k, y − l) h(k, l)

  f(x, y): input image; h(x, y): filter kernel, centred at (0, 0); g(x, y): output image
“the image f is implicitly mirrored”
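The formula above can be implemented directly in NumPy. A minimal sketch with zero padding at the border (the `conv2d` helper and the padding choice are illustrative, not from the lecture):

```python
import numpy as np

def conv2d(f, h):
    """Discrete 2D convolution g = f * h with zero padding.

    Implements g(x, y) = sum_{k,l} f(x - k, y - l) h(k, l); the kernel is
    flipped relative to correlation, which is why "the image f is
    implicitly mirrored".
    """
    kh, kw = h.shape
    ph, pw = kh // 2, kw // 2
    fp = np.pad(f, ((ph, ph), (pw, pw)))   # zero padding at the border
    g = np.zeros_like(f, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            patch = fp[x:x + kh, y:y + kw]
            # flipped kernel: convolution, not correlation
            g[x, y] = np.sum(patch * h[::-1, ::-1])
    return g
```

Convolving an impulse image with any kernel reproduces the kernel around the impulse, which is a handy sanity check.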
Reminder: Median Filter
Replace each pixel with the median in a neighbourhood:
Input patch:    Output patch:
5  6  5         5  6  5
4 20  5         4  5  5
4  6  5         4  6  5
• No strong smoothing effect since values are not averaged
• Median filter: order the values and take the middle one
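The median filter above can be sketched in a few lines of NumPy (the `median_filter` helper and the reflected-border choice are illustrative):

```python
import numpy as np

def median_filter(f, size=3):
    """Replace each pixel by the median of its size x size neighbourhood
    (border pixels use reflected values)."""
    p = size // 2
    fp = np.pad(f, p, mode='reflect')
    g = np.empty_like(f)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            # order the neighbourhood values and take the middle one
            g[x, y] = np.median(fp[x:x + size, y:y + size])
    return g

# The 3x3 patch from the slide: the outlier 20 becomes the median 5
patch = np.array([[5, 6, 5],
                  [4, 20, 5],
                  [4, 6, 5]])
```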
Reminder: Motivation – Bilateral Filter
Original + Gaussian noise Gaussian filtered Bilateral filtered
Edge over-smoothed Edge not over-smoothed
Reminder: Bilateral Filter – in pictures
Centre pixel
Gaussian Filter weights
Noisy input
Output (sketched)
Reminder: Bilateral Filter – in equations
The filter looks at: a) the distance to the surrounding pixels (as a Gaussian does), and b) the intensity of the surrounding pixels.
Problem: computation is slow, 𝑂(𝑁𝑤); approximations run in 𝑂(𝑁)
See a tutorial on: http://people.csail.mit.edu/sparis/bf_course/
Same as Gaussian filter Consider intensity
Linear combination
Bilateral filter is a non-linear filter
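A brute-force grayscale sketch of the bilateral filter (the parameter values `sigma_s`, `sigma_r`, `radius` are illustrative assumptions, not from the lecture):

```python
import numpy as np

def bilateral_filter(f, sigma_s=2.0, sigma_r=0.1, radius=3):
    """Brute-force bilateral filter for a grayscale image.

    Each output pixel is a weighted average of its neighbours, where the
    weight combines spatial distance (as in a Gaussian filter) and
    intensity difference (which preserves edges).
    """
    h, w = f.shape
    g = np.zeros_like(f, dtype=float)
    ks = np.arange(-radius, radius + 1)
    # spatial Gaussian weights, the same for every pixel
    spatial = np.exp(-(ks[:, None]**2 + ks[None, :]**2) / (2 * sigma_s**2))
    fp = np.pad(f, radius, mode='edge')
    for x in range(h):
        for y in range(w):
            patch = fp[x:x + 2*radius + 1, y:y + 2*radius + 1]
            # range weight: penalize intensity differences -> edges kept
            rng = np.exp(-(patch - f[x, y])**2 / (2 * sigma_r**2))
            wgt = spatial * rng
            g[x, y] = np.sum(wgt * patch) / np.sum(wgt)
    return g
```

On a step edge, pixels on the other side of the edge receive a near-zero range weight, so the edge is not over-smoothed.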
Image: X = Y = [0 1 0]

Linear filters satisfy additivity: 𝑻[𝑿 + 𝒀] = 𝑻[𝑿] + 𝑻[𝒀]

1) Operator 𝑻 is the (linear) box filter [1/3 1/3 1/3].
Compute the filter output for the central element of the image:
𝑇[𝑋 + 𝑌] = 2/3
𝑇[𝑋] + 𝑇[𝑌] = 2/3

2) Operator 𝑻 is the (non-linear) bilateral filter with 𝑤(𝑖, 𝑗, 𝑘, 𝑙) = exp(−(𝑓(𝑖, 𝑗) − 𝑓(𝑘, 𝑙))²).
Compute the filter output for the central element of the image:
𝑇[𝑋 + 𝑌] = (0 · 0.02 + 2 · 1 + 0 · 0.02) / (0.02 + 1 + 0.02) ≈ 1.92
𝑇[𝑋] + 𝑇[𝑌] = 2 · [(0 · 0.36 + 1 · 1 + 0 · 0.36) / (0.36 + 1 + 0.36)] ≈ 1.15 ≠ 1.92
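The non-linearity can be verified numerically. A small sketch, assuming the range weight w = exp(−(f(i,j) − f(k,l))²) from the example and uniform spatial weights; with exact (unrounded) weights the two sides come out to about 1.93 and 1.15, so T[X+Y] ≠ T[X] + T[Y]:

```python
import numpy as np

def bilateral_1d_center(f):
    """Bilateral filter value at the center of a 3-vector, with
    w(k) = exp(-(f(center) - f(k))^2) and uniform spatial weights."""
    c = f[1]
    w = np.exp(-(f - c)**2)
    return np.sum(w * f) / np.sum(w)

X = np.array([0.0, 1.0, 0.0])
T_sum = bilateral_1d_center(X + X)       # T[X + Y] with Y = X
sum_T = 2 * bilateral_1d_center(X)       # T[X] + T[Y]
print(round(T_sum, 2), round(sum_T, 2))  # prints 1.93 1.15 -> not equal
```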
Application: Bilateral Filter
Cartoonization
HDR compression
(Tone mapping)
Joint Bilateral Filter
Same as Gaussian      Consider intensity
f is the input image – which is processed
f̃ is the guidance image – where we look for pixel similarity
Application: combine Flash and No-Flash
[Petschnigg et al. Siggraph ‘04]
Input image f      Guidance image f̃
We don't care about absolute colors
Joint Bilateral Filter
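The joint bilateral filter differs from the plain bilateral filter only in where the range weights come from. A minimal sketch (the `joint_bilateral` helper and its parameter values are illustrative):

```python
import numpy as np

def joint_bilateral(f, guide, sigma_s=2.0, sigma_r=0.1, radius=2):
    """Joint bilateral filter: smooth the input f, but compute the range
    weights from the guidance image, i.e. pixel similarity is judged in
    the guide (e.g. a sharp flash photo guiding a noisy no-flash photo)."""
    ks = np.arange(-radius, radius + 1)
    spatial = np.exp(-(ks[:, None]**2 + ks[None, :]**2) / (2 * sigma_s**2))
    fp = np.pad(f, radius, mode='edge')
    gp = np.pad(guide, radius, mode='edge')
    out = np.zeros_like(f, dtype=float)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            fpatch = fp[x:x + 2*radius + 1, y:y + 2*radius + 1]
            gpatch = gp[x:x + 2*radius + 1, y:y + 2*radius + 1]
            # range weight from the GUIDANCE image, not from f itself
            rng = np.exp(-(gpatch - guide[x, y])**2 / (2 * sigma_r**2))
            w = spatial * rng
            out[x, y] = np.sum(w * fpatch) / np.sum(w)
    return out
```

Edges present in the guidance image are preserved in the filtered output even when f itself is noisy or poorly exposed.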
Reminder: Application: Cost Volume Filtering
Goal
Given z, derive binary x:
𝒛 = (𝑅, 𝐺, 𝐵)ⁿ (input image)
𝒙 = {0, 1}ⁿ (binary labelling)
Reminder from first Lecture: Interactive Segmentation
Model: Energy function 𝑬(𝒙) = Σᵢ 𝜃ᵢ(𝑥ᵢ) + Σ_{𝑖,𝑗} 𝜃ᵢⱼ(𝑥ᵢ, 𝑥ⱼ)
Reminder: Unary term
Optimum with unary terms only
Dark means likely background
Dark means likely foreground
𝜃ᵢ(𝑥ᵢ = 0)    𝜃ᵢ(𝑥ᵢ = 1)
New query image 𝑧
Cost Volume for Binary Segmentation
𝜃ᵢ(𝑥ᵢ = 0)    𝜃ᵢ(𝑥ᵢ = 1)
Image (x, y)
Label space (here 2)
For two labels, we can also look at the ratio image 𝑓:
Application: Cost Volume Filtering
Results are very similar: This is an alternative to energy minimization !
Filtered cost volume
Energy minimization Cost Volume filtering
(Winner takes all Result)
The ratio cost-volume is the input image 𝑓
Guidance image 𝑓̃ (also showing the user brush strokes)
[C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, Fast Cost-Volume Filtering for Visual Correspondence and Beyond, CVPR 11]
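The winner-takes-all step after filtering the cost volume can be sketched as follows. The paper uses an edge-preserving guided filter; a plain box filter is used here as a simplification for illustration, and the helper names are assumptions:

```python
import numpy as np

def box_filter(c, r=1):
    """Mean filter with a (2r+1)^2 window, edge-padded."""
    cp = np.pad(c, r, mode='edge')
    out = np.zeros_like(c, dtype=float)
    for dx in range(2*r + 1):
        for dy in range(2*r + 1):
            out += cp[dx:dx + c.shape[0], dy:dy + c.shape[1]]
    return out / (2*r + 1)**2

def cost_volume_filtering(costs, r=1):
    """costs: (labels, H, W) cost volume. Smooth each label's cost map,
    then pick the cheapest label per pixel (winner takes all)."""
    smoothed = np.stack([box_filter(c, r) for c in costs])
    return np.argmin(smoothed, axis=0)
```

Smoothing the per-label cost maps removes isolated noisy decisions before the per-pixel argmin, which is why the result resembles energy minimization with a smoothness prior.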
Morphological operations
Two steps:
1. Perform convolution with a “structural element”:
binary mask (e.g. circle or square)
2. Threshold the continuous output 𝑓 to recover a binary image (black is 1, white is 0)
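The two-step recipe (convolve with a structural element, then threshold) yields dilation or erosion depending on the threshold. A minimal sketch (the `morph` helper and its thresholds are illustrative):

```python
import numpy as np

def morph(binary, struct, op='dilate'):
    """Morphological dilation/erosion via the two-step recipe:
    1. convolve the binary image with the structural element
       (for a symmetric element, convolution equals correlation),
    2. threshold: any overlap (> 0) for dilation,
       full overlap (== struct.sum()) for erosion."""
    sh, sw = struct.shape
    ph, pw = sh // 2, sw // 2
    bp = np.pad(binary, ((ph, ph), (pw, pw)))
    conv = np.zeros(binary.shape, dtype=int)
    for x in range(binary.shape[0]):
        for y in range(binary.shape[1]):
            conv[x, y] = np.sum(bp[x:x + sh, y:y + sw] * struct)
    if op == 'dilate':
        return (conv > 0).astype(int)
    return (conv == struct.sum()).astype(int)
```

Dilating a single pixel with a 3×3 square grows it to a 3×3 block; eroding that block shrinks it back to the single pixel.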
Other non-linear operations on binary images
• Distance transform of a binary image
• Skeleton
• Connected components of a binary input image
Roadmap: Basics of Digital Image Processing
• What is an Image?
• Point operators (ch. 3.1)
• Filtering: (ch. 3.2, ch 3.3, ch. 3.4) – main focus
• Linear filtering
• Non-linear filtering
• Multi-scale image representation (ch. 3.5) (will be done in SS15 as part of Image Processing)
• Edge detection and linking (ch. 4.2)
• Line detection and vanishing point detection (ch. 4.3) (will be done in SS15 as part of Image Processing)
• Interest Point detection (ch. 4.1.1)
Goal: Find long edge chains
What do we want:
• Good detection: we want to find edges not noise
• Good localization: find true edge
• Single response: one per edge
(independent of edge sharpness)
• Long edge-chains
Idealized edge types (we focus on one of these)
What are edges ?
• Edges correspond to fast changes in the image
• The magnitude of the derivative is large
Image of 2 step edges – slice through the image
Image of 2 ramp edges – slice through the image
What are fast changes in the image?
Image
Scanline 250
Texture or many edges?
Edges defined
after smoothing
Edges and Derivatives
We will look at this first
Edge filters in 1D
We can implement this as a linear filter:
Forward differences: [−1 1]
Central differences: ½ · [−1 0 1]
Reminder: Separable Filters
¼ [1 2 1]ᵀ ∗ ½ [−1 0 1] – the second factor is the centralized difference operator
Edge Filter in 1D: Example
Based on 1st derivative
• Smooth with Gaussian – to filter out noise
• Calculate derivative
• Find its optima
Edge Filtering in 1D
Simplification:
(saves one operation)
Derivative of Gaussian
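The simplification above rests on the identity (G ∗ f)′ = (G′) ∗ f: instead of smoothing and then differentiating, convolve once with the derivative of the Gaussian. A NumPy sketch of the 1D case (the signal, σ and kernel radius are illustrative choices):

```python
import numpy as np

def gaussian(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def conv1d(f, h):
    """'same'-size 1D convolution with zero padding."""
    r = len(h) // 2
    fp = np.pad(f, r)
    return np.array([np.sum(fp[i:i + len(h)] * h[::-1])
                     for i in range(len(f))])

sigma, radius = 2.0, 6
g = gaussian(sigma, radius)
# central difference, written so convolution computes (f(x+1) - f(x-1))/2
d = np.array([0.5, 0.0, -0.5])

signal = np.zeros(40); signal[20:] = 1.0        # a step edge

# two-step: smooth with the Gaussian, then differentiate
two_step = conv1d(conv1d(signal, g), d)
# one-step: convolve once with the derivative of the Gaussian
one_step = conv1d(signal, conv1d(g, d))
```

Away from the signal borders the two results agree (up to kernel truncation), and the response peaks at the step edge.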
Edge Filtering in 2D
Edge Filter in 2D: Example
Edge Filter in 2D
x-derivatives with different Gaussian smoothing
What is a Gradient?
Example – Gradient magnitude image
But our goal was to get thin edge chains!
First smoothed with a Gaussian / first smoothed with a broad Gaussian
How to get edge chains
1. Compute a robust gradient image: ( (𝐷ₓ ∗ 𝐺) ∗ 𝐼 , (𝐷ᵧ ∗ 𝐺) ∗ 𝐼 )
2. Find edge-points (“edgels”): non-maximum suppression
3. Link up edge-points to get chains
4. Do hysteresis to clean up chains
Rough Outline of a good edge detector (such as Canny)
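Step 1 of this outline can be sketched in NumPy: smooth with a Gaussian, take x/y central differences, and form the gradient magnitude (the σ, radius and helper name are illustrative assumptions):

```python
import numpy as np

def gradient_magnitude(img, sigma=1.0, radius=3):
    """Smooth with a separable 2D Gaussian, then take central
    differences D_x, D_y and form the gradient magnitude."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g1 = np.exp(-x**2 / (2 * sigma**2)); g1 /= g1.sum()
    G = np.outer(g1, g1)                 # separable 2D Gaussian
    p = radius
    ip = np.pad(img, p, mode='edge')
    S = np.zeros_like(img, dtype=float)
    for dx in range(2*p + 1):            # smoothing (G is symmetric,
        for dy in range(2*p + 1):        # so correlation == convolution)
            S += G[dx, dy] * ip[dx:dx + img.shape[0], dy:dy + img.shape[1]]
    Ix = np.zeros_like(S); Iy = np.zeros_like(S)
    Ix[1:-1, :] = (S[2:, :] - S[:-2, :]) / 2   # central differences
    Iy[:, 1:-1] = (S[:, 2:] - S[:, :-2]) / 2
    return np.sqrt(Ix**2 + Iy**2), Ix, Iy
```

On a vertical step edge the magnitude is large along the edge and zero in the flat regions, which is the input to non-maximum suppression.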
Non-maximum suppression
1. Check whether the pixel is a “local maximum” along the gradient direction (+/− 180°); for this, interpolate the values at p and r
2. Accept the edge-point if it is above a threshold
Which pixel is an edge point? (non-maximum suppression)
Edge point / not an edge point
Magnitude image – zoom on pixel-grid
Link up edge-points to get chains
Clean up chains with hysteresis
Keep a chain if it starts above a high threshold and stays above a low threshold along the chain
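Hysteresis can be sketched as a flood fill from the strong pixels through the weak ones (the threshold values and 8-connectivity are illustrative choices):

```python
import numpy as np
from collections import deque

def hysteresis(mag, high, low):
    """Keep an edge chain if it contains a pixel above the high threshold;
    follow it through pixels above the low threshold (8-connected)."""
    strong = mag >= high
    weak = mag >= low
    keep = np.zeros_like(mag, dtype=bool)
    keep[strong] = True
    q = deque(zip(*np.nonzero(strong)))   # BFS seeded at strong pixels
    while q:
        x, y = q.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < mag.shape[0] and 0 <= ny < mag.shape[1]
                        and weak[nx, ny] and not keep[nx, ny]):
                    keep[nx, ny] = True
                    q.append((nx, ny))
    return keep
```

A weak chain that never rises above the high threshold is discarded entirely, while a chain with one strong pixel survives along its full weak extent.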
Final Result
Image / not much smoothing (fine scale)
Future Lecture: Segmentation
• So far we looked at “jumps” in gray-scale images
• Humans perceive edges very differently (edges depend on semantics)
“average human drawing”
[from Martin, Fowlkes and Malik 2004]
Hard for computer vision methods
without semantic reasoning
Try to learn semantically meaningful image edges
Half-way break
3 minutes break
Roadmap: Basics of Digital Image Processing
• What is an Image?
• Point operators (ch. 3.1)
• Filtering: (ch. 3.2, ch 3.3, ch. 3.4) – main focus
• Linear filtering
• Non-linear filtering
• Multi-scale image representation (ch. 3.5) (will be done in SS15 as part of Image Processing)
• Edge detection and linking (ch. 4.2)
• Line detection and vanishing point detection (ch. 4.3) (will be done in SS15 as part of Image Processing)
• Interest Point detection (ch. 4.1.1)
What region should we try to match?
Look for a region that is unique, i.e. not ambiguous
We want to find a few regions where this image pair matches: Applications later
Goal: Interest Point Detection
• Goal: predict a few “interest points” in order to remove redundant data efficiently
Should be invariant against:
a. Geometric transformation – scaling, rotation, translation, affine transformation, projective transformation etc.
b. Color transformation – additive (lighting change)
Points versus Lines
“Aperture problem”
Lines are not as good as points
Harris Detector
Local measure of feature uniqueness:
Shifting the window in any direction: how does the content change?
Harris Detector
How similar is the image to itself?
Autocorrelation function:
c(x, y; Δx, Δy) = Σ_{(u,v) ∈ W(x,y)} w(u, v) (f(u, v) − f(u + Δx, v + Δy))²
W(x, y) is a small window around (x, y)
w is a convolution kernel used to decrease the influence of pixels far from (x, y), e.g. the Gaussian
For simplicity we use 𝑤(𝑢, 𝑣) = 1
Harris Detector
One is interested in the properties of c at each position (x, y). Let us look at a linear approximation of f.
Taylor expansion around (u, v):
f(u + Δx, v + Δy) ≈ f(u, v) + f_x(u, v) Δx + f_y(u, v) Δy + ε(Δx, Δy)
f_x, f_y: gradient at (𝑢, 𝑣)
Harris Detector
Put it together:
c(x, y; Δx, Δy) ≈ (Δx, Δy) Q(x, y) (Δx, Δy)ᵀ
with Q(x, y) = Σ_{(u,v) ∈ W(x,y)} ( f_x²  f_x f_y ; f_x f_y  f_y² )
Q is called the Structure Tensor
We compute this at every image location (𝑥, 𝑦)
Harris Detector
The autocorrelation function c is (after the approximation) a quadratic function in Δx and Δy:
• Isolines are ellipses (Q is symmetric and positive definite)
• Eigenvector x₁ with the (larger) eigenvalue λ₁ is the direction of fastest change of the function 𝑐
• Eigenvector x₂ with the (smaller) eigenvalue λ₂ is the direction of slowest change of the function 𝑐
Harris Detector
Some examples – isolines of c:
(a) Flat (b) Edges (c) Corners
a. Homogeneous regions: both λs are small
b. Edges: one λ is small, the other one is large
c. Corners: both λs are large (this is what we are looking for!)
Harris Detector
Eigenvalue maps λ₁ and λ₂ (smaller and larger eigenvalue) and the input image
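The corner cases above can be detected without computing eigenvalues explicitly. A sketch of the Harris response using det(Q) − k·trace(Q)², which is large only when both eigenvalues are large (the constant k = 0.05 and the window radius are illustrative choices; the closed-form response is the standard Harris trick, not spelled out on these slides):

```python
import numpy as np

def harris_response(img, k=0.05, r=1):
    """Harris corner response from the structure tensor Q."""
    Ix = np.zeros_like(img, dtype=float)
    Iy = np.zeros_like(img, dtype=float)
    Ix[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2   # central differences
    Iy[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2

    def window_sum(a):
        # sum over a (2r+1)^2 window, i.e. w(u, v) = 1 as on the slide
        ap = np.pad(a, r)
        out = np.zeros_like(a)
        for dx in range(2*r + 1):
            for dy in range(2*r + 1):
                out += ap[dx:dx + a.shape[0], dy:dy + a.shape[1]]
        return out

    A = window_sum(Ix * Ix)   # Q = [[A, C], [C, B]]
    B = window_sum(Iy * Iy)
    C = window_sum(Ix * Iy)
    return A * B - C * C - k * (A + B)**2   # det(Q) - k * trace(Q)^2
```

On a white square against black, the response is positive at the corners, negative along the edges (one eigenvalue small), and zero in flat regions.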