Computer Vision I – Multi-View 3D Reconstruction
Carsten Rother
24/01/2015
Roadmap this lecture
• Multi-view reconstruction – general scenario (calibrated, un-calibrated cameras)
• From projective to metric space: auto-calibration
• Multi-view reconstruction – special scenarios
  • affine cameras
  • reference plane
3D reconstruction – Problem definition
• Given image observations in N cameras of M static 3D points
• Formally: x_ij = P_i X_j for i = 1 … N; j = 1 … M
• Important: in practice we do not have all points visible in all views, i.e. the number of observations x_ij is ≤ NM (this is captured by the "visibility matrix")
• Goal: find all P_i's and X_j's
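The bookkeeping behind the visibility matrix can be sketched in a few lines (a toy example with made-up observations; all names are illustrative):

```python
import numpy as np

# Toy sketch of the "visibility" matrix: entry (i, j) is True if
# 3D point X_j was observed as x_ij in camera i.
n_cameras, n_points = 3, 4
observations = {(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)}  # (i, j) pairs

V = np.zeros((n_cameras, n_points), dtype=bool)
for i, j in observations:
    V[i, j] = True

# Fewer observations than N*M: not every point is seen in every view.
assert V.sum() <= n_cameras * n_points
```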
Example: "Visibility" matrix
Names:
1) 3D reconstruction / Sparse Structure from Motion (SfM)
   In Robotics it is known as SLAM (Simultaneous Localization and Mapping):
   "Place a robot in an unknown location in an unknown environment and have the robot incrementally build a map of this environment while simultaneously using the map to compute the vehicle location"
2) Dense Multi-view reconstruction
Example: Dense Reconstruction
Reconstruction Algorithm
Generic Outline (calibrated and un-calibrated cameras)
1) Compute robust F/E-matrix between each pair of neighboring views
2) Compute initial reconstruction of each pair of views
3) Compute an initial full 3D reconstruction
4) Bundle-Adjustment to minimize overall geometric error
5) If cameras are not calibrated then perform auto-calibration (also known as self-calibration)
[See page 453 HZ]
Reconstruct in step 2): (P1, P2); (P2, P3); (P3, P4) …
Step 2: Compute initial reconstruction of each pair of views
Input:
• Calibrated cameras: E-matrix, K, K′, 5+ matching points (x_i, x′_i)
• Un-calibrated cameras: F-matrix, 7+ matching points (x_i, x′_i)
Output: P, P′, X_i such that the geometric error between P X_i and x_i, and between P′ X_i and x′_i, is small
2-Step Method:
1. Derive P, P′
2. Compute X_i (called Triangulation)
Derive P, P′: calibrated case
• We have seen that we can get R, t (up to scale) from E
• In the previous lecture we set the camera matrices to:
  x_0 = K_0 [I | 0] X and x_1 = K_1 R^{-1} [I | −t] X
  (we have done this already)
Derive P, P′: un-calibrated case
• Derivation (blackboard), see HZ page 256:
  P = [I_{3×3} | 0];  P′ = [[e′]_× F | e′]
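This canonical camera pair can be checked numerically (a small sketch assuming numpy; function names are made up): the epipole e′ is the left null vector of F, i.e. Fᵀ e′ = 0, and P′ = [[e′]_× F | e′].

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cameras_from_F(F):
    """Canonical projective camera pair for a fundamental matrix F:
    P = [I | 0],  P' = [[e']_x F | e'],  where F^T e' = 0 (cf. HZ p. 256)."""
    # e' is the left null vector of F: the right null vector of F^T.
    _, _, Vt = np.linalg.svd(F.T)
    e_prime = Vt[-1]
    P = np.hstack([np.eye(3), np.zeros((3, 1))])
    P_prime = np.hstack([skew(e_prime) @ F, e_prime.reshape(3, 1)])
    return P, P_prime
```

Since Fᵀe′ = 0, the identity [e′]ₓ² F = −‖e′‖² F holds, which is a quick consistency check of the construction.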
Compute X_i (Triangulation) – algebraic error
• Input: x, x′, P, P′
• Output: X_i
• Triangulation is also called intersection
• Simple algebraic solution:
  1) λ x = P X and λ′ x′ = P′ X
  2) Eliminate λ by taking ratios. This gives 4 linearly independent equations for the 4 unknowns X = (X_1, X_2, X_3, X_4), where ‖X‖ = 1.
     An example ratio is: x_1 / x_2 = (p_1 X_1 + p_2 X_2 + p_3 X_3 + p_4 X_4) / (p_5 X_1 + p_6 X_2 + p_7 X_3 + p_8 X_4)
  3) This gives (as usual) a least-squares optimization problem:
     A X = 0 with ‖X‖ = 1, where A is of size 4 × 4 (each P is a 3 × 4 matrix).
     This can be solved in closed form using SVD.
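The same 4 × 4 system can be assembled from the cross product x × (P X) = 0, which is an equivalent way of eliminating λ (a sketch assuming inhomogeneous image points; the function name is illustrative):

```python
import numpy as np

def triangulate_dlt(x1, x2, P1, P2):
    """Linear triangulation: each view contributes two rows from
    x cross (P X) = 0; the system A X = 0 is solved by SVD."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / np.linalg.norm(X)   # homogeneous 4-vector with ||X|| = 1
```

For noiseless correspondences the 4 × 4 matrix A has a one-dimensional null space, and the returned vector is the 3D point up to scale.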
Compute X_i (Triangulation) – geometric error
Minimize the re-projection error with fixed fundamental matrix F (for the essential matrix E this can be done in a smart way):
(x̂, x̂′) = argmin_{x̂, x̂′} d(x, x̂)² + d(x′, x̂′)²  subject to  x̂′ᵀ F x̂ = 0
Compute X_i (Triangulation) – geometric error
Minimize the re-projection error with fixed fundamental matrix F:
(x̂, x̂′) = argmin_{x̂, x̂′} d(x, x̂)² + d(x′, x̂′)²  subject to  x̂′ᵀ F x̂ = 0
• The solution can be expressed as a polynomial of degree 6 in t
• This gives up to 6 solutions, which can be computed as the roots of the polynomial
• If you now put these x̂, x̂′ into the algebraic error computation, we get the true 3D point X (1D null-space), since everything is "geometrically correct"
Triangulation - uncertainty
• Large baseline: smaller uncertainty area
• Smaller baseline: larger uncertainty area
• Very small baseline: very large uncertainty area
Step 3: Compute initial reconstruction
Three views of an un-calibrated or calibrated camera:
• Reconstruct points and cameras 1 and 2: (X_j, P_1, P_2)
• Reconstruct points and cameras 2 and 3: (X_j′, P_2′, P_3′). We denote the second reconstruction with a dash.
• Both reconstructions share 5+ 3D points and one camera (here P_2, P_2′)
• Why are X_j, X_j′ not the same? In general we have the following ambiguity:
  x_ij = P_i X_j = (P_i H^{-1})(H X_j) = P_i′ X_j′
• Our goal: make X_j = X_j′ and P_2 = P_2′, so that x_ij = P_i X_j holds in one common frame
Step 3: Compute initial reconstruction
Three views of an un-calibrated or calibrated camera:
Method:
• Compute H such that X_{1–5} = H X′_{1–5}
• This can be done from 5+ 3D points in the usual least-squares way
  (minimize ‖X_{1–5} − H X′_{1–5}‖), since each point gives 3 equations and H has 15 DoF
• Convert the second reconstruction into the first one:
  P_{2,3} = P′_{2,3} H^{-1};  X_j = H X_j′   (note: x_ij = P_i X_j = (P_i H^{-1})(H X_j))
• In this way you can "zip" all reconstructions into one in sequential order
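The H-fitting step can be sketched as a DLT: since X_j and H X′_j are homogeneous 4-vectors that only have to agree up to scale, each point pair contributes equations X_k (H X′)_l − X_l (H X′)_k = 0, and the stacked system A h = 0 is solved by SVD (a sketch under the assumption of noiseless, generic points; names are illustrative):

```python
import numpy as np

def fit_projective_transform(X, Xp):
    """Estimate the 4x4 transform H (up to scale) with X_j ~ H X'_j from
    5+ homogeneous 3D point pairs.  Each pair gives the equations
    X_k (H X')_l - X_l (H X')_k = 0, linear in the 16 entries of H."""
    rows = []
    for Xj, Xpj in zip(X, Xp):
        for k in range(4):
            for l in range(k + 1, 4):
                r = np.zeros(16)
                r[4 * l:4 * (l + 1)] = Xj[k] * Xpj   # coefficient of row l of H
                r[4 * k:4 * (k + 1)] = -Xj[l] * Xpj  # coefficient of row k of H
                rows.append(r)
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(4, 4)   # null vector of A, reshaped to H
```

With exact data, 5 generic points give 15 independent equations for the 16 entries of H, so the null space is one-dimensional, matching the 15-DoF count on the slide.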
Bundle adjustment
• Global refinement of structure (points) and cameras jointly
• Minimize the geometric error:
  argmin_{P_i, X_j} Σ_i Σ_j α_ij d(P_i X_j, x_ij)²
  where α_ij is 1 if X_j is visible in view P_i (otherwise 0)
• Non-linear optimization, e.g. with Levenberg-Marquardt
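The objective can be sketched as a residual vector; a Levenberg-Marquardt solver (e.g. scipy.optimize.least_squares) would then minimize its squared norm jointly over cameras and points. The array shapes below are assumptions of this sketch:

```python
import numpy as np

def reprojection_residuals(Ps, Xs, xs, vis):
    """Stacked geometric errors d(P_i X_j, x_ij) over all visible pairs.
    Ps: (N, 3, 4) cameras, Xs: (M, 4) homogeneous points,
    xs: (N, M, 2) observations, vis: (N, M) boolean visibility matrix."""
    res = []
    for i in range(len(Ps)):
        for j in range(len(Xs)):
            if vis[i, j]:
                p = Ps[i] @ Xs[j]
                res.append(p[:2] / p[2] - xs[i, j])   # reprojection error
    return np.concatenate(res)
```

Bundle adjustment then searches for the cameras and points that drive this vector toward zero; the visibility mask plays exactly the role of α_ij above.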
Auto-calibration
• Everything is consistent: x_ij = P_i X_j for i = 1 … N; j = 1 … M
• But does the reconstruction already look nice?
Roadmap this lecture
• Multi-view reconstruction – general scenario (calibrated, un-calibrated cameras)
• From projective to metric space: auto-calibration
• Multi-view reconstruction – special scenarios
  • affine cameras
  • reference plane
Scale ambiguity
Is the pumpkin 5m or 30cm tall?
Structure from Motion ambiguity
• We can always write: x = P X = (1/s · P)(s X) for any scale s ≠ 0
• It is therefore impossible to recover the absolute scale of the scene
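The scale ambiguity is easy to verify numerically (a toy check with an arbitrary camera):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((3, 4))          # arbitrary camera
X = np.array([1.0, 2.0, 5.0, 1.0])       # homogeneous 3D point
s = 10.0                                  # unknown global scale

# Rescale the scene by s and compensate in the camera:
# the image measurement is exactly unchanged.
H = np.diag([s, s, s, 1.0])
x_original = P @ X
x_rescaled = (P @ np.linalg.inv(H)) @ (H @ X)

assert np.allclose(x_original, x_rescaled)
```

The same pattern (P H⁻¹)(H X) with a general H is what produces the projective, affine and similarity ambiguities on the following slides.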
Scale ambiguity
Projective ambiguity
We can write (most general): x_ij = P_i X_j = (P_i H^{-1})(H X_j) = P_i′ X_j′
• H has 15 DoF (projective ambiguity)
• If we do not have any additional information about the cameras or points, then we cannot recover H.
• Possible information (we will see details later):
  • Calibration matrix is the same for all cameras
  • External constraints: orthogonal vanishing points
Projective ambiguity
This is a "projectively" correct reconstruction (3D points map to the correct image points) … but not a nice-looking one
Affine ambiguity
We can write (most general): x_ij = P_i X_j = (P_i H^{-1})(H X_j) = P_i′ X_j′
• H has now 12 DoF (affine ambiguity)
• H leaves the plane at infinity π_∞ = (0, 0, 0, 1)ᵀ in place:
  any point on π_∞ moves as H (X, Y, Z, 0)ᵀ = (X′, Y′, Z′, 0)ᵀ
• Therefore parallel 3D lines stay parallel for any such H
Affine ambiguity
3D Points at infinity stay at infinity
Similarity Ambiguity (Metric space)
We can write (most general): x_ij = P_i X_j = (P_i H^{-1})(H X_j) = P_i′ X_j′
• H has now 7 DoF (similarity ambiguity)
• H preserves angles, ratios of lengths, etc.
• For visualization purposes this ambiguity is sufficient (we do not need to know whether the reconstruction is 1 m or 1 cm in size)
• Note: if we do not care about the particular choice of H, we can fix the remaining freedom, for instance by fixing the coordinate frame to the first camera
Similarity Ambiguity
How to "upgrade" a reconstruction
• Camera is calibrated
• Calibration from external constraints (Example 1: 5 known 3D points)
• Calibration from a mix of internal and external constraints
  (Example 2: single camera, 3 orthogonal vanishing points and a square-pixel camera)
• Calibration from internal constraints only (known as auto-calibration)
  (Example 3: 2 views with unknown focal lengths)
Illustrating some ways to upgrade from projective to affine and then to metric space (see details in HZ page 270ff and chapter 19)
• Find the plane at infinity and move it to its canonical position:
  • One of the cameras is affine (the 3rd row of an affine camera matrix is the plane at infinity; see HZ page 271)
  • 3 non-collinear 3D vanishing points
  • Translational motion (HZ page 268)
Projective to Metric: Direct Method (Example 1)
Given: five known 3D points (e.g. measured)
Compute H:
1) H X_j = X_j′ (each 3D point gives 3 linearly independent equations)
2) 5 points give 15 equations, enough to compute H (15 DoF) using SVD
Upgrade cameras and points:
P_i′ = P_i H^{-1} and X_j′ = H X_j (remember: x_ij = P_i X_j = (P_i H^{-1})(H X_j))
(Same method as above: "Step 3: Compute initial reconstruction")
But without external knowledge?
• For a camera P = K [I | 0] the outgoing ray of an image point x is:
  x = K d, hence d = K^{-1} x
• The angle θ between two rays is computed from the normalized rays d_1, d_2:
  cos θ = (d_1ᵀ d_2) / (√(d_1ᵀ d_1) √(d_2ᵀ d_2))
        = ((K^{-1}x_1)ᵀ (K^{-1}x_2)) / (√((K^{-1}x_1)ᵀ(K^{-1}x_1)) √((K^{-1}x_2)ᵀ(K^{-1}x_2)))
        = (x_1ᵀ ω x_2) / (√(x_1ᵀ ω x_1) √(x_2ᵀ ω x_2))
• We define the matrix ω = K^{-T} K^{-1}
• Comment: (K^{-1})ᵀ = (Kᵀ)^{-1} =: K^{-T}
But without external knowledge?
• We have: cos θ = (x_1ᵀ ω x_2) / (√(x_1ᵀ ω x_1) √(x_2ᵀ ω x_2))
• If we knew ω, then we could compute the angle θ (comment: if θ = 90°, then x_1ᵀ ω x_2 = 0)
• K can be derived from ω = K^{-T} K^{-1} using a Cholesky decomposition (see HZ page 582)
• Note: ω depends only on K, not on R, C. Hence it plays a central role in auto-calibration.
• How do we get ω?
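These relations can be sketched as follows (assuming numpy; the helper names are made up). Since numpy's Cholesky routine returns a lower-triangular factor but K is upper-triangular, `K_from_omega` conjugates with the row/column-reversal permutation, a standard trick for the factorization referenced on the slide (HZ page 582):

```python
import numpy as np

def omega_from_K(K):
    """Image of the absolute conic: omega = K^{-T} K^{-1} = (K K^T)^{-1}."""
    Kinv = np.linalg.inv(K)
    return Kinv.T @ Kinv

def K_from_omega(omega):
    """Recover upper-triangular K (normalized so K[2,2] = 1) from omega."""
    A = np.linalg.inv(omega)       # = K K^T up to scale
    J = np.eye(3)[::-1]            # antidiagonal permutation
    L = np.linalg.cholesky(J @ A @ J)
    U = J @ L @ J                  # upper-triangular factor with U U^T = A
    return U / U[2, 2]

def angle_between_rays(x1, x2, omega):
    """cos(theta) = x1^T omega x2 / sqrt(x1^T omega x1) / sqrt(x2^T omega x2)."""
    return (x1 @ omega @ x2) / np.sqrt((x1 @ omega @ x1) * (x2 @ omega @ x2))
```

For example, with image points of the rays (1, 0, 1) and (0, 0, 1), `angle_between_rays` returns cos θ = 1/√2, i.e. θ = 45°, without ever undoing the projection explicitly.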
Degrees of freedom of ω
• With
  K = [f s p_x; 0 mf p_y; 0 0 1]  it is  K^{-1} = [a b c; 0 d e; 0 0 1]
  where a, b, c, d, e are some values that depend on f, s, m, p_x, p_y
• Then:
  ω = (K^{-1})ᵀ K^{-1} = [a 0 0; b d 0; c e 1] [a b c; 0 d e; 0 0 1]
    = [a² ab ac; ab b²+d² bc+de; ac bc+de c²+e²+1]
    = [w_1 w_2 w_3; w_2 w_4 w_5; w_3 w_5 w_6]
• This means that ω has 5 DoF (the overall scale is not unique)
Degrees of freedom of ω (special case)
• Assume we have a "square-pixel" camera, i.e. m = 1 and s = 0 (practically this is often the case)
• With
  K = [f 0 p_x; 0 f p_y; 0 0 1]  it is  K^{-1} = [f^{-1} 0 b; 0 f^{-1} c; 0 0 1]
  where b, c are some values that depend on f, p_x, p_y
• Then:
  ω = (K^{-1})ᵀ K^{-1} = [f^{-1} 0 0; 0 f^{-1} 0; b c 1] [f^{-1} 0 b; 0 f^{-1} c; 0 0 1]
    = [f^{-2} 0 f^{-1}b; 0 f^{-2} f^{-1}c; f^{-1}b f^{-1}c b²+c²+1]
    = [w_1 0 w_2; 0 w_1 w_3; w_2 w_3 w_4]
• This means that ω now has 3 DoF (the overall scale is not unique)
Single camera: internal + external constraints (Example 2)
• A square-pixel camera (i.e. m = 1, s = 0 in K) gives
  ω = [w_1 0 w_2; 0 w_1 w_3; w_2 w_3 w_4] with only 3 DoF
• Given 3 image points v_1, v_2, v_3 that correspond to orthogonal directions, we know:
  v_1ᵀ ω v_2 = 0;  v_1ᵀ ω v_3 = 0;  v_2ᵀ ω v_3 = 0
  (from cos θ = 0 for orthogonal rays)
• This gives a linear system of equations A w = 0 with A of size 3 × 4.
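A sketch of this linear system for the square-pixel case (the parameterization w = (w_1, w_2, w_3, w_4) follows the slide; the function name is illustrative):

```python
import numpy as np

def omega_from_vanishing_points(v1, v2, v3):
    """Square-pixel camera: omega = [[w1, 0, w2], [0, w1, w3], [w2, w3, w4]].
    Each pair of orthogonal vanishing points gives one linear equation
    v_i^T omega v_j = 0; three pairs give A w = 0, solved by SVD."""
    def row(a, b):
        return [a[0] * b[0] + a[1] * b[1],   # coefficient of w1
                a[0] * b[2] + a[2] * b[0],   # coefficient of w2
                a[1] * b[2] + a[2] * b[1],   # coefficient of w3
                a[2] * b[2]]                 # coefficient of w4
    A = np.array([row(v1, v2), row(v1, v3), row(v2, v3)])
    _, _, Vt = np.linalg.svd(A)
    w1, w2, w3, w4 = Vt[-1]
    return np.array([[w1, 0.0, w2], [0.0, w1, w3], [w2, w3, w4]])
```

For a generic rotation the 3 × 4 system has a one-dimensional null space, so ω is recovered up to scale; K then follows by the Cholesky step mentioned earlier.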
Auto-calibration: only internal constraints
• Chapter 19 HZ
• Insight: multiple views automatically give extra constraints (not discussed here in great detail)
• Remember: we have 5 intrinsic parameters:
  K = [f s p_x; 0 mf p_y; 0 0 1]
Example – Reconstruction from a Video
Building Rome in a Day – Reconstruction from Flickr
[Agarwal, Snavely, Simon, Seitz, Szeliski; ICCV '09]
The main problem of iterative methods is drift
Solutions: 1) look for "loop closure" if possible (not discussed)
This is a different, probabilistic system with additional uncertainty, but it illustrates the main problem of iterative reconstruction methods (before bundle adjustment): "drift"
Roadmap this lecture
• Multi-view reconstruction – general scenario (calibrated, un-calibrated cameras)
• From projective to metric space: auto-calibration
• Multi-view reconstruction – special scenarios
  • affine cameras
  • reference plane
Reminder: affine cameras (from previous lecture)
• An affine camera has 8 DoF:
  (x, y, 1)ᵀ = [a b c d; e f g h; 0 0 0 1] (X, Y, Z, 1)ᵀ
  (x, y, 0)ᵀ = [a b c d; e f g h; 0 0 0 1] (X, Y, Z, 0)ᵀ
• Parallel 3D lines map to parallel 2D lines (since points at infinity stay at infinity)
• In short: x = M X + t, with M of size 2 × 3 and t of size 2 × 1
Reminder: Affine cameras (from previous lecture)
"Close to parallel projection" (very large focal length) vs. perspective projection (normal focal length)
Affine Cameras give affine reconstruction
Assume we have reconstructed the scene with affine cameras:
  P_i = [a b c d; e f g h; 0 0 0 1]
Then the transformation H has to be an affine transformation (not a general projective one) in order to keep the cameras affine:
  x_ij = P_i X_j = (P_i H^{-1})(H X_j) = P_i′ X_j′
Multi-View Reconstruction for affine cameras
(derivation on blackboard)
Note, the Frobenius norm: ‖A‖_F = (Σ_i Σ_j a_ij²)^{1/2}
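The blackboard derivation is the classic factorization idea; a sketch in the spirit of Tomasi-Kanade, assuming all points are visible in all views (names are illustrative):

```python
import numpy as np

def affine_factorization(W):
    """Factorization sketch: W is the (2N x M) matrix of image coordinates
    of M points seen in all N affine views.  Subtracting each row's mean
    removes the translations t_i; an SVD then gives the best rank-3
    factorization of the centered matrix in the Frobenius norm, i.e. an
    affine reconstruction: stacked cameras (2N x 3) and structure (3 x M)."""
    t = W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    M_cams = U[:, :3] * s[:3]      # stacked 2x3 affine camera matrices
    S = Vt[:3]                     # 3 x M structure (up to an affine ambiguity)
    return M_cams, S, t
```

With noiseless affine data, the centered measurement matrix has rank 3, so the truncated SVD reproduces it exactly; with noise it is the Frobenius-optimal rank-3 fit, which is exactly the norm quoted above.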
Comments / Extensions
• The main restriction is that all points have to be visible in all views
  (the method can be applied to subsets of views, which are then "zipped" together)
• Extensions to missing data exist (see HZ ch. 18)
• Extensions to projective cameras exist (see HZ ch. 18.4)
• Extensions to non-rigidly moving scenes exist (see HZ ch. 18.3)
Roadmap this lecture
• Multi-view reconstruction – general scenario (calibrated, un-calibrated cameras)
• From projective to metric space: auto-calibration
• Multi-view reconstruction – special scenarios
  • affine cameras
  • reference plane (see extra slides below)
The following slides contain additional information, which is not relevant for the exam.
Direct reference plane approach (DRP)
• H_∞ = K R is called the infinity homography, since it is the mapping from the plane at infinity to the image:
  x = H_∞ (I | −C) (x, y, z, 0)ᵀ = H_∞ (x, y, z)ᵀ
• Basic idea: simply define any plane as the plane at infinity π_∞ = (0, 0, 0, 1)ᵀ (this can be done in projective space)
Direct reference plane approach (DRP)
Derivation on blackboard
[Rother PhD Thesis 2003]
Results
How to get infinity homographies
• A real plane in the scene
• Fixed / known K and R, e.g. a translating camera with fixed camera intrinsics
• Orthogonal scene directions and a square-pixel camera:
  we can get K, R (up to a small, discrete ambiguity)
Results: University Stockholm
• Between 2 views we have the so-called Kruppa equations (see HZ ch. 19.4):
  (u_1ᵀ ω_0^{-1} u_1) / (σ_0² v_0ᵀ ω_1^{-1} v_0) = (u_0ᵀ ω_0^{-1} u_1) / (σ_0 σ_1 v_0ᵀ ω_1^{-1} v_1) = (u_0ᵀ ω_0^{-1} u_0) / (σ_1² v_1ᵀ ω_1^{-1} v_1)
  where the SVD of F is F = (u_0 u_1 u_2) diag(σ_0, σ_1, 0) (v_0 v_1 v_2)ᵀ
  and ω_i^{-1} = (K_i^{-T} K_i^{-1})^{-1} = K_i K_iᵀ = diag(f_i², f_i², 1)
• This can be solved for f_0, f_1 in closed form (see blackboard)
Practically most important case (Example 3)
• Assume two cameras with: s = 0, m = 1, and p_x, p_y known
• Let us shift the images to get p_x = 0, p_y = 0:
  with K = [f s p_x; 0 mf p_y; 0 0 1] and T = [1 0 −p_x; 0 1 −p_y; 0 0 1]
  we get: T x = (T K) R (I_{3×3} | −C) X with T K = [f s 0; 0 mf 0; 0 0 1]
See HZ, example 19.8 (page 472)
The solution for f_0, f_1 (see blackboard)
Constant intrinsic parameters (sketch only)
• Assume K is constant over 3+ frames; then K can be computed
• We know that we can get K, R, C from P = K R (I_{3×3} | −C)
• We have P_1, P_2, P_3, and it is
  x_j1 = P_1 X_j = (P_1 H^{-1})(H X_j) = P_1′ X_j′
  x_j2 = P_2 X_j = (P_2 H^{-1})(H X_j) = P_2′ X_j′
  x_j3 = P_3 X_j = (P_3 H^{-1})(H X_j) = P_3′ X_j′
• Try to find an H such that all of P_1′, P_2′, P_3′ have the same K but different R_{1–3} and C_{1–3}
• See details in chapter 19 HZ
• (Note: this does not work if the camera zooms during capture)
Side comment: Where does ω come from?
• There is a "strange thing" called the absolute conic Ω_∞ = I_{3×3} that lives on the plane at infinity π_∞ = (0, 0, 0, 1)ᵀ
• The absolute conic is an "imaginary circle with radius i":
  (x, y, 1) Ω_∞ (x, y, 1)ᵀ = 0, hence x² + y² = −1
• ω is called the "image of the absolute conic", since it is the mapping of the absolute conic onto the image plane
• Proof:
  1. The homography H_∞ = K R is the mapping from the plane at infinity to the image plane, since
     x = K R (I | −C) (x, y, z, 0)ᵀ, hence x = K R (x, y, z)ᵀ
  2. The conic Ω_∞ = I_{3×3} on the plane at infinity maps to the image as:
     H_∞^{-T} Ω_∞ H_∞^{-1} = (K R)^{-T} I (K R)^{-1} = K^{-T} R^{-T} R^{-1} K^{-1} = K^{-T} K^{-1} = ω
     (using R^{-T} R^{-1} = R Rᵀ = I for a rotation R)