Computer Vision I -
Algorithms and Applications:
Multi-View 3D reconstruction
Carsten Rother
09/12/2013
Roadmap this lecture
• Two-view reconstruction
• From projective to metric space (e.g. self-calibration)
• Multi-view reconstruction
  • Iterative projective cameras
  • Closed form: affine cameras
  • Closed form: reference plane
Next lecture:
• Dense labeling problems in computer vision:
  • stereo matching
  • ICP, KinectFusion
3D reconstruction - Definitions
• Sparse Structure from Motion (SfM)
• SLAM in robotics: Simultaneous Localization and Mapping:
  "Place a robot in an unknown location and in an unknown environment and have the robot incrementally build a map of this environment while simultaneously using this map to compute its location"
• Dense Multi-view reconstruction
First: Two-View reconstruction
Epipolar Geometry (Reminder)
Fundamental matrix F: p₂ᵀ F p₁ = 0
2-view reconstruction
• We have x′ᵀ F x = 0
• Can we get 3×4 cameras P, P′ such that
  x = P X ; x′ = P′ X and x′ᵀ F x = 0 for all 3D points X?
• Derivation (blackboard), see HZ page 256:
  P = [I₃ₓ₃ | 0] ; P′ = [[e′]ₓ F | e′]
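The HZ page 256 result can be sketched in code. A minimal numpy sketch, assuming the standard left-epipole extraction via SVD (function names are illustrative):

```python
import numpy as np

def skew(v):
    """[v]_x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    """Canonical projective camera pair P = [I | 0], P' = [[e']_x F | e']."""
    U, _, _ = np.linalg.svd(F)
    e2 = U[:, -1]                      # left epipole: e'^T F = 0
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([skew(e2) @ F, e2[:, None]])
    return P1, P2
```

Any pair obtained this way is consistent with F: projections of the same 3D point satisfy x′ᵀ F x = 0.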
Triangulation - algebraic
• Input: x, x′, P, P′
• Output: X ∈ ℙ³
• Triangulation is also called intersection
• Simple algebraic solution:
  1) λx = P X and λ′x′ = P′ X
  2) Eliminate λ, λ′ by taking ratios. This gives 4 linearly independent equations for 3 unknowns X = (X₁, X₂, X₃, X₄)ᵀ with ‖X‖ = 1.
     An example ratio is:
     x₁/x₂ = (p₁X₁ + p₂X₂ + p₃X₃ + p₄X₄) / (p₅X₁ + p₆X₂ + p₇X₃ + p₈X₄)
  3) This gives (as usual) a least-squares optimization problem:
     A X = 0 with ‖X‖ = 1, where A is of size 4 × 4.
     This can be solved in closed form using SVD.
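A minimal numpy sketch of the algebraic solution (written in the equivalent cross-product form of the ratio equations; the function name is illustrative):

```python
import numpy as np

def triangulate_dlt(x1, x2, P1, P2):
    """Algebraic triangulation: stack the linear equations from
    x1 ~ P1 X and x2 ~ P2 X into A X = 0, solve with SVD (||X|| = 1)."""
    A = np.array([
        x1[0] * P1[2] - P1[0],   # each image gives 2 independent equations
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])                           # A has size 4 x 4
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                   # singular vector of the smallest singular value
    return X / X[3]              # de-homogenize
```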
Triangulation - geometric
Minimize the re-projection error with fixed fundamental matrix:
(x̂, x̂′) = argmin d(x, x̂)² + d(x′, x̂′)²  subject to x̂′ᵀ F x̂ = 0
Triangulation - geometric
• The solution of this minimization can be expressed as a 6-degree polynomial in t
• It has up to 6 solutions, which can be computed as the roots of the polynomial
Triangulation - uncertainty
• Large baseline: smaller uncertainty area
• Smaller baseline: larger uncertainty area
• Very small baseline: very large uncertainty area
3D reconstruction
• Given: m cameras and n static 3D points
• Formally: x_ij = P_i X_j for i = 1…m; j = 1…n
• Important: in practice we do not have all points visible in all views, i.e. the number of observed x_ij is ≤ nm
• Goal: find all P_i's and X_j's
Example: "Visibility" matrix
Iterative multi-view reconstruction – Method 1
Three views of an un-calibrated or calibrated camera.
Resectioning is a similar problem to intersection (and the same as camera calibration):
λ x = P X and λ′ x′ = P X′
With 6 points you get 12 constraints, and we have 11 unknowns (the 3×4 matrix P up to scale).
You can iterate between intersection and resectioning to get all points and cameras reconstructed (in projective or metric space).
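The resectioning step can be sketched as a DLT, matching the counting above (2 equations per point, 11 unknowns); a minimal numpy sketch with an illustrative function name:

```python
import numpy as np

def resection_dlt(xs, Xs):
    """Camera resectioning: solve lambda x = P X for the 3x4 matrix P.
    Each point pair gives 2 independent linear equations in the 12
    entries of P; 6 points give 12 constraints for 11 unknowns."""
    A = []
    for x, X in zip(xs, Xs):
        x = x / x[2]                                  # normalize image point
        A.append(np.concatenate([X, np.zeros(4), -x[0] * X]))
        A.append(np.concatenate([np.zeros(4), X, -x[1] * X]))
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)                       # P up to scale
```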
Iterative multi-view reconstruction – Method 2
Three views of an un-calibrated or calibrated camera.
• Reconstruct points and cameras 1 and 2
• Reconstruct points and cameras 2 and 3
Zip the two reconstructions together:
1. They share a camera and 7+ points (needed for the F-matrix).
2. Get the 4×4 H from X₁₋₇ = H X′₁₋₇ (here points X′₁₋₇ are in the second reconstruction and X₁₋₇ in the first).
3. Zip them together: P₃ = P′₃ H⁻¹ ; X_j = H X′_j (here P′₃ is the camera in the second reconstruction).
The full pipeline
[See page 453 HZ]
Comment: for 3 and 4 views there exist the trifocal and quadrifocal tensors (the analogue of the fundamental matrix for 2 views); not discussed in this lecture.
Bundle adjustment
• Global refinement of structure (points) and cameras jointly
• Minimize the geometric error:
  argmin_{P_i, X_j} Σ_ij α_ij d(P_i X_j, x_ij)²
  where α_ij is 1 if X_j is visible in view P_i (otherwise 0)
• Non-linear optimization, e.g. with Levenberg-Marquardt
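As an illustration of this objective, here is a toy bundle adjustment in numpy (the parameterization by camera centers, and the plain Gauss-Newton with a numeric Jacobian standing in for Levenberg-Marquardt, are simplifying assumptions):

```python
import numpy as np

def project(P, X):
    """Pinhole projection of a homogeneous point X by a 3x4 camera P."""
    x = P @ X
    return x[:2] / x[2]

def ba_residuals(params, Ks, Rs, n_cams, pts2d, vis):
    """Stacked re-projection residuals d(P_i X_j, x_ij) over visible pairs.
    params = [camera centers C_i (3 each) | 3D points X_j (3 each)]."""
    Cs = params[:3 * n_cams].reshape(n_cams, 3)
    Xs = params[3 * n_cams:].reshape(-1, 3)
    res = []
    for i in range(n_cams):
        P = Ks[i] @ Rs[i] @ np.hstack([np.eye(3), -Cs[i][:, None]])
        for j, X in enumerate(Xs):
            if vis[i][j]:                          # alpha_ij
                res.extend(project(P, np.append(X, 1.0)) - pts2d[i][j])
    return np.asarray(res)

def gauss_newton(f, x0, n_iter=10, eps=1e-6):
    """Plain Gauss-Newton with a numeric Jacobian; lstsq absorbs the
    gauge rank deficiency of the bundle adjustment problem."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        r = f(x)
        J = np.empty((r.size, x.size))
        for k in range(x.size):
            xp = x.copy()
            xp[k] += eps
            J[:, k] = (f(xp) - r) / eps
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x += step
    return x
```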
Example
The main problem of iterative methods is drift.
Solutions: 1) look for "loop closure" if possible; 2) global methods (next).
Reminder: affine cameras (from lecture 4)
• An affine camera has 8 DoF:
  (x, y, 1)ᵀ = [a₁ a₂ a₃ a₄; a₅ a₆ a₇ a₈; 0 0 0 1] (X, Y, Z, 1)ᵀ
• In short: x̃ = M X + t, with M of size 2×3 and x̃, t of size 2×1
Reminder: affine cameras (from lecture 4)
"Close to parallel projection" (very large focal length, compared to a normal focal length)
Parallel 3D lines map to parallel 2D lines (since points at infinity stay at infinity)
Multi-View Reconstruction for affine cameras
(derivation on blackboard)
W − M S = 0, i.e. the measurement matrix factorizes as W = M S
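The closed-form factorization can be sketched as follows: a Tomasi-Kanade-style sketch, assuming all points are visible in all views (variable names are illustrative):

```python
import numpy as np

def affine_factorization(W):
    """Factorization for affine cameras. W is 2m x n (m views, n points).
    Centering each row removes the translations t_i; a rank-3 SVD then
    gives W0 ~= M S (M: stacked 2x3 affine cameras, S: 3 x n structure),
    up to a 3x3 affine ambiguity."""
    W0 = W - W.mean(axis=1, keepdims=True)   # center each coordinate row
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    M = U[:, :3] * s[:3]                     # 2m x 3 motion
    S = Vt[:3]                               # 3 x n structure
    return M, S
```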
Comments / Extensions
β’ Main restriction is that all points have to be visible in all views.
(can be used for a subset of views and then zipping subviews together)
β’ Extensions to missing data have been done (see HZ chapter 18)
β’ Extensions to projective cameras have been done (ch. 18.4)
β’ Extensions to non-rigidly moving scenes (ch. 18.3)
Direct reference plane approach (DRP)
• H∞ = K R is called the infinity homography since it is the mapping from the plane at infinity to the image:
  x = K R (I | −C̃) (X, Y, Z, 0)ᵀ = K R (X, Y, Z)ᵀ = H∞ (X, Y, Z)ᵀ
• Idea: simply define an arbitrary plane as π∞ = (0,0,0,1)ᵀ (this can be done in projective space)
Direct reference plane approach (DRP)
Derivation on blackboard
[Rother PhD Thesis 2003]
Result
How to get infinity homographies
• A real plane in the scene
• Fixed / known K and R, e.g. a translating camera with fixed camera intrinsics
• Orthogonal scene directions and a square-pixel camera (see above); we get out K, R (up to a small ambiguity)
Result: University Stockholm
(Show video)
Scale ambiguity
Is the pumpkin 5 meters or 30 cm high?
Scale ambiguity
Structure from Motion Ambiguity
• We can always write:
  x = P X = (1/λ · P)(λ X)
• It is impossible to recover the absolute scale of the scene!
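A two-line numeric check of the identity above (the particular P, X, λ are arbitrary illustrative values):

```python
import numpy as np

# Any rescaling lam cancels: x = P X = (1/lam * P)(lam * X)
P = np.array([[800.0, 0.0, 320.0, 10.0],
              [0.0, 800.0, 240.0, -5.0],
              [0.0, 0.0, 1.0, 2.0]])
X = np.array([0.5, -0.2, 3.0, 1.0])
lam = 7.3
x_a = P @ X
x_b = (P / lam) @ (lam * X)
```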
Projective ambiguity
We can write (most general): x_ij = P_i X_j = P_i Q⁻¹ Q X_j = P′_i X′_j
• Q has 15 DoF (projective ambiguity)
• If we do not have any additional information about the cameras or points, then we cannot recover Q.
• Possible information (we will see details later):
  • known calibration matrix
  • calibration matrix is the same for all cameras
  • external constraints: given 3D points
Projective ambiguity
This is a "projectively" correct reconstruction ... but not a nice-looking one.
3D points map to image points.
Affine ambiguity
We can write (most general): x_ij = P_i X_j = P_i Q⁻¹ Q X_j = P′_i X′_j
• Q has now 12 DoF (affine ambiguity)
• With affine cameras we get only an affine ambiguity
• Q leaves the plane at infinity π∞ = (0,0,0,1)ᵀ in place: any point on π∞ moves like Q (X, Y, Z, 0)ᵀ = (X′, Y′, Z′, 0)ᵀ.
  Therefore parallel 3D lines stay parallel for any Q.
From Projective to Affine
The red directions point to places where the projection of a vanishing point lies (vanishing points v₁, v₂, v₃).
Step: take points v₁₋₃ = (x₁₋₃, y₁₋₃, z₁₋₃, 1)ᵀ and move them to points at infinity (x′₁₋₃, y′₁₋₃, z′₁₋₃, 0)ᵀ.
This is the same as having the plane at infinity in its canonical position π∞ = (0,0,0,1)ᵀ.
Affine ambiguity
3D points at infinity stay at infinity.
Similarity Ambiguity (Metric space)
We can write (most general): x_ij = P_i X_j = P_i Q⁻¹ Q X_j = P′_i X′_j
• Q has now 7 DoF (similarity ambiguity)
• Q preserves angles, ratios of lengths, etc.
• For visualization purposes this ambiguity is sufficient: we don't need to know what 1m, 1cm, 1mm, etc. means
• Note: if we don't care about the choice of Q, we can for instance set the camera center of the first camera to 0
From Projective/Affine to Metric
Essentially, we need the true camera calibration matrices K (see details later).
One option (see HZ page 277): with known K, operate with the essential matrix and get out R, C̃ of all cameras; a similarity ambiguity remains.
How to "upgrade" a reconstruction
• Camera is calibrated
• Calibration from external constraints
• Calibration from a mix of in- and external constraints
• Calibration from internal constraints, called self/auto-calibration
Illustrating some ways to upgrade from Projective to Affine and Metric (see details in HZ page 270ff and chapter 19):
• Find the plane at infinity and move it to canonical position:
  • One of the cameras is affine (see HZ page 271)
  • 3 non-collinear 3D vanishing points
  • Translational motion (HZ page 268)
x_ij = P_i X_j = P_i Q⁻¹ Q X_j = P′_i X′_j
Projective to Affine – Three 3D vanishing points
Goal: find the plane at infinity and move it to π∞ = (0,0,0,1)ᵀ. The red directions point to places where the projection of a vanishing point lies.
Possible method:
1. Identify three pairs of lines in the 3D reconstruction that have to be parallel, respectively.
2. Compute the three 3D intersection points v₁₋₃.
3. Move v₁₋₃ somewhere on the plane at infinity π∞ = (0,0,0,1)ᵀ. Define the equations v′₁₋₃ = Q v₁₋₃, where v′₁₋₃ = (1,0,0,0)ᵀ, (0,1,0,0)ᵀ, (0,0,1,0)ᵀ.
4. Compute a Q using SVD.
5. Update all points and cameras: P_i = P_i Q⁻¹ ; X_j = Q X_j.
Projective to Metric – Direct Method
Given: five known 3D points. Compute Q:
1) Q X_j = X′_j (each 3D point gives 3 linearly independent equations)
2) 5 points give 15 equations, enough to compute Q using SVD
Upgrade cameras and points: P′_i = P_i Q⁻¹ and X′_j = Q X_j
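The five-point direct method can be sketched as a linear solve. A minimal numpy sketch; the pairwise elimination of the per-point scale below is one assumed way to linearize Q X_j ~ X′_j:

```python
import numpy as np

def metric_upgrade_Q(Xs, Xs_metric):
    """Solve Q X_j ~ X'_j (up to a per-point scale) for the 4x4 upgrade Q
    from >= 5 homogeneous point pairs. Each pair contributes the equations
    X'_a (Q X)_b - X'_b (Q X)_a = 0 for all index pairs (a, b)."""
    rows = []
    for X, Xp in zip(Xs, Xs_metric):
        for a in range(4):
            for b in range(a + 1, 4):
                row = np.zeros((4, 4))      # coefficients on the entries of Q
                row[b] = Xp[a] * X
                row[a] = -Xp[b] * X
                rows.append(row.ravel())
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(4, 4)             # Q up to scale
```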
πβ’ For a camera π = πΎ [πΌ | 0] the ray outwards is:
π₯ = π π hence π = πΎ
β1π₯
β’ The angle Ξ is computed as the normalized rays π
1, π
2:
β’ We define the matrix: π = πΎ
βππΎ
β1β’ Comment: (πΎ
β1)
π= (πΎ
π)
β1=: πΎ
βπBut without external knowledge?
cos Ξ = π
1ππ
2π
1ππ
1π
2ππ
2= πΎ
β1π₯
1 ππΎ
β1π₯
2β πΎ
β1π₯
1 ππΎ
β1π₯
1β πΎ
β1π₯
2 ππΎ
β1π₯
2= π₯
1ππ π₯
2π₯
1πππ₯
1π₯
2πππ₯
But without external knowledge?
• We have: cos Θ = x₁ᵀ ω x₂ / (√(x₁ᵀ ω x₁) √(x₂ᵀ ω x₂))
• If we knew ω, then we could compute the angle Θ
• K can be derived from ω = K⁻ᵀ K⁻¹ using Cholesky decomposition (see HZ page 582)
• Comment: if Θ = 90° then we have x₁ᵀ ω x₂ = 0
• How do we get ω?
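A quick numeric check that ω gives the same angle as the explicit rays (the K and pixel values are illustrative assumptions):

```python
import numpy as np

# a hypothetical calibration matrix
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
omega = np.linalg.inv(K).T @ np.linalg.inv(K)   # omega = K^-T K^-1

def angle_via_omega(x1, x2, omega):
    """cos(Theta) between viewing rays, computed purely in the image."""
    num = x1 @ omega @ x2
    return num / np.sqrt((x1 @ omega @ x1) * (x2 @ omega @ x2))

# cross-check against the explicit rays d = K^-1 x
x1 = np.array([400.0, 300.0, 1.0])
x2 = np.array([250.0, 200.0, 1.0])
d1, d2 = np.linalg.inv(K) @ x1, np.linalg.inv(K) @ x2
cos_direct = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
```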
ω is the Image of the Absolute Conic (IAC)
• ω is called the image of the absolute conic Ω∞ = I₃ₓ₃ on the plane at infinity π∞ = (0,0,0,1)ᵀ
• Remember, a conic C is defined as xᵀ C x = 0. Here it is: (x, y, 1) Ω∞ (x, y, 1)ᵀ = 0
  That is x² + y² = −1: an imaginary circle with radius i; the points (i, 0) and (0, i) lie on the conic.
(figure: H∞ maps the plane at infinity to the image, where Ω∞ maps to ω)
ω is the Image of the Absolute Conic (IAC)
• Proof:
1. The homography H∞ = K R is the mapping from the plane at infinity to the image plane, since x = K R [I | −C̃] (X, Y, Z, 0)ᵀ is x = K R (X, Y, Z)ᵀ.
2. The conic C = Ω∞ = I₃ₓ₃ on the plane at infinity maps to the image as:
   ω = H∞⁻ᵀ C H∞⁻¹ = (K R)⁻ᵀ I (K R)⁻¹ = K⁻ᵀ R⁻ᵀ R⁻¹ K⁻¹ = K⁻ᵀ K⁻¹
• Note, ω depends on K only and not on R, C̃. Hence it plays a central role in auto-calibration.
Degrees of Freedom of ω
• We have, with K⁻¹ = [a b c; 0 d e; 0 0 1] (upper triangular):
  ω = (K⁻¹)ᵀ K⁻¹ = [a 0 0; b d 0; c e 1] [a b c; 0 d e; 0 0 1]
    = [a², ab, ac; ab, b²+d², bc+de; ac, bc+de, c²+e²+1]
    =: [ω₁, ω₂, ω₃; ω₂, ω₄, ω₅; ω₃, ω₅, ω₆]
  For the camera matrix K = [f s p_x; 0 mf p_y; 0 0 1], the values a, b, c, d, e depend on f, m, s, p_x, p_y.
• This means that ω has 5 DoF (the scale is not unique)
How to compute the IAC?
• External constraints: orthogonal directions, etc.
• Internal constraints:
  • Square pixels (i.e. m = 1, s = 0 in K)
  • Camera matrix is the same in two or more views (e.g. a video sequence with no zooming)
• If only internal constraints are used, then it is called auto/self-calibration.
Camera matrix: K = [f s p_x; 0 mf p_y; 0 0 1]
How to compute the IAC?
We need 5 constraints on ω to determine it uniquely.
Example: internal + external constraints
• Square-pixel cameras (i.e. m = 1, s = 0 in K) give two constraints: ω₁ = ω₄ and ω₂ = 0
• Then ω = [ω₁ 0 ω₃; 0 ω₁ ω₅; ω₃ ω₅ ω₆] with only 3 DoF
• Given 3 image points v₁, v₂, v₃ that point to orthogonal directions, respectively:
  v₁ᵀ ω v₂ = 0; v₁ᵀ ω v₃ = 0; v₂ᵀ ω v₃ = 0
• This gives an equation system A w = 0 with A of size 3 × 4. Hence ω can be obtained with SVD.
• K can be derived from ω using Cholesky decomposition.
Example: internal constraints (practically important)
• Assume two cameras with s = 0, and m, p_x, p_y known
• The only unknowns are the two focal lengths f₀, f₁
• Then you get the so-called Kruppa equations:
  u₁ᵀ ω₀⁻¹ u₁ / (r₀² v₀ᵀ ω₁⁻¹ v₀) = u₀ᵀ ω₀⁻¹ u₁ / (r₀ r₁ v₀ᵀ ω₁⁻¹ v₁) = u₀ᵀ ω₀⁻¹ u₀ / (r₁² v₁ᵀ ω₁⁻¹ v₁)
  where the SVD of F is F = [u₀ u₁ u₂] diag(r₀, r₁, 0) [v₀ v₁ v₂]ᵀ
  and ω_i⁻¹ = (K_i⁻ᵀ K_i⁻¹)⁻¹ = K_i K_iᵀ = diag(f_i², f_i², 1)
• This can be solved in closed form (see next slide)
Sketch of the solution for f₀, f₁
Example: internal constraints (sketch)
• Assume K is constant over 3+ frames; then K can be computed
• We know (lecture 6) that we can get K, R, C̃ from P = K R (I₃ₓ₃ | −C̃)
• We already have P₁, P₂, P₃, and it is:
  x_j1 = P₁ X_j = P₁ Q⁻¹ Q X_j = P′₁ X′_j
  x_j2 = P₂ X_j = P₂ Q⁻¹ Q X_j = P′₂ X′_j
  x_j3 = P₃ X_j = P₃ Q⁻¹ Q X_j = P′₃ X′_j
• Try to find a Q such that all P₁, P₂, P₃ have the same K but different R₁₋₃ and C̃₁₋₃
• See details in chapter 19 HZ
How to "upgrade" a reconstruction
• Camera is calibrated
• Calibration from external constraints (example: 5 known 3D points)
• Calibration from a mix of in- and external constraints (example: single camera, 3 orthogonal vanishing points and a square-pixel camera)
• Calibration from internal constraints, called self/auto-calibration (example: 3+ views with the same K; or two cameras with unknown focal length)
Illustrating some ways to upgrade from Projective to Affine and Metric (see details in HZ page 270ff and chapter 19):
• Find the plane at infinity and move it to canonical position:
  • One of the cameras is affine (see HZ page 271)
  • 3 non-collinear 3D vanishing points
  • Translational motion (HZ page 268)