
The estimation of the inverse virtual depth $z$ is performed for each pixel $\mathbf{x}_R = [x_R, y_R]^T$ in the raw (sensor) image $I_{ML}(\mathbf{x}_R)$. As already mentioned, the depth observations are obtained starting from the shortest baseline and working up to the largest possible baseline. With each new observation, the inverse depth hypothesis $Z_{ML}(\mathbf{x}_R)$ of a raw image pixel $\mathbf{x}_R$ is updated and thus becomes more reliable.

$$Z_{ML}(\mathbf{x}_R) \sim \mathcal{N}\!\left(z(\mathbf{x}_R),\, \sigma_z^2(\mathbf{x}_R)\right) \tag{5.3}$$

To reduce computational effort, the first step is to check for each baseline whether the pixel under consideration $\mathbf{x}_R$ has a sufficiently high intensity gradient along the epipolar line, as defined in eq. (5.4).

$$\left| g_I(\mathbf{x}_R)^T \mathbf{e}_p \right| \geq T_H \tag{5.4}$$

Here $g_I(\mathbf{x}_R)$ represents the intensity gradient vector at the coordinates $\mathbf{x}_R$ (eq. (5.5)) and $T_H$ represents a predefined threshold.

$$g_I(\mathbf{x}_R) = g_I(x_R, y_R) = \left[ \frac{\partial I(x_R, y_R)}{\partial x_R},\; \frac{\partial I(x_R, y_R)}{\partial y_R} \right]^T \tag{5.5}$$

Figure 5.2b shows an example of the matching cost (cf. eq. (5.8)) as a function of the inverse virtual depth $z$ of a certain image point for three different epipolar line angles. In this specific example the intensity gradient $g_I(\mathbf{x}_R)$ is almost orthogonal to $\mathbf{e}_p^{(1)} = [1, 0]^T$. Therefore, no minimum is obtained in the cost function and hence depth estimation does not have to be performed for this baseline. However, for the same point $\mathbf{x}_R$, both $\mathbf{e}_p^{(2)} = [0.5, -\sqrt{0.75}]^T$ and $\mathbf{e}_p^{(3)} = [0.5, \sqrt{0.75}]^T$ result in unique minima.
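The gradient test of eqs. (5.4) and (5.5) can be sketched as follows. This is a minimal illustration assuming a grayscale raw image stored as a list of rows and central differences for the gradient; the function name and the threshold value are hypothetical, not taken from the original implementation.

```python
# Sketch of the epipolar-gradient test of eq. (5.4); illustrative names.

def passes_gradient_test(img, x_r, y_r, e_p, threshold=5.0):
    """Check |g_I(x_R)^T e_p| >= T_H with central differences (eq. (5.5))."""
    gx = (img[y_r][x_r + 1] - img[y_r][x_r - 1]) / 2.0  # dI / dx_R
    gy = (img[y_r + 1][x_r] - img[y_r - 1][x_r]) / 2.0  # dI / dy_R
    return abs(gx * e_p[0] + gy * e_p[1]) >= threshold

# A vertical intensity edge: strong gradient along e_p = (1, 0), none
# along the orthogonal direction (0, 1), so only the first test passes.
img = [[0.0] * 4 + [100.0] * 4 for _ in range(8)]
print(passes_gradient_test(img, 4, 4, (1.0, 0.0)))  # True
print(passes_gradient_test(img, 4, 4, (0.0, 1.0)))  # False
```

This mirrors the example of Figure 5.2b: a gradient orthogonal to the epipolar line yields no usable cost minimum, so such pixels are skipped for that baseline.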

5.3.1 Stereo Matching

To find the pixel in a certain micro image which corresponds to the pixel of interest $\mathbf{x}_R$, we search for the minimum intensity difference along the epipolar line in the corresponding micro image.

If no inverse virtual depth observation for the pixel of interest $\mathbf{x}_R$ has been obtained yet, an exhaustive search along the epipolar line must be performed. In this case, the search range is limited by the micro lens border on the one end and by the coordinates of $\mathbf{x}_R$ with respect to the micro lens center on the other end. A pixel on the micro lens border results in the maximum observable disparity $\mu$ and thus in the minimum observable virtual depth $v$, while a pixel at the same coordinates as the pixel of interest in the corresponding micro image equals a disparity $\mu = 0$ and thus also equals a virtual depth $v = \infty$. Furthermore, the virtual depth range is


lower bounded by the total covering plane (TCP). The TCP defines the plane closest to the MLA for which it can be guaranteed that any virtual image point $\mathbf{x}_V$ on the plane appears focused in at least one micro image. For a multi-focus plenoptic camera with a hexagonally arranged MLA consisting of three different types of micro lenses, the TCP is at a virtual depth $v = 2$ (see Perwaß and Wietzke [2012]).

If an inverse virtual depth hypothesis $Z_{ML}(\mathbf{x}_R)$ for a point $\mathbf{x}_R$ already exists, the search range can be limited to $z(\mathbf{x}_R) \pm n\sigma_z(\mathbf{x}_R)$, where $n$ is usually chosen to be $n = 2$. In the following we define the search range along the epipolar line as given in eq. (5.6).

$$\mathbf{x}_R^s(\mu) = \mathbf{x}_{R0}^s - \mu \cdot \mathbf{e}_p \tag{5.6}$$

Here $\mathbf{x}_{R0}^s$ is defined as the coordinate of a point on the epipolar line at the disparity $\mu = 0$, as given in eq. (5.7).

$$\mathbf{x}_{R0}^s = \mathbf{x}_R + \Delta c_{ML} \cdot \mathbf{e}_p \tag{5.7}$$

Within the search range, the sum of squared intensity differences $e_{ISS}$ over a 1D pixel patch ($1 \times N$) along the epipolar line is calculated, as defined in eq. (5.8).

$$e_{ISS}(\mu) = \sum_{k=-\frac{N-1}{2}}^{\frac{N-1}{2}} \left[ I_{ML}(\mathbf{x}_R + k\mathbf{e}_p) - I_{ML}(\mathbf{x}_R^s(\mu) + k\mathbf{e}_p) \right]^2 \tag{5.8}$$

The resulting estimate is the disparity $\mu$ which minimizes $e_{ISS}(\mu)$. For the implementation we found $N = 5$ to be a good choice for the patch size. It is important to emphasize that $\mu$ is estimated with sub-pixel accuracy by interpolating linearly between the samples of the intensity image $I_{ML}(\mathbf{x}_R)$; therefore the disparity is not restricted to any regular grid. In the following, the estimated disparity is denoted by $\hat{\mu}$, which defines the corresponding pixel coordinate $\mathbf{x}_R^s(\hat{\mu})$.
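The matching procedure of eqs. (5.6)–(5.8) can be sketched in a reduced 1-D form: candidate disparities are scanned over the search range, intensities at fractional positions are obtained by linear interpolation, and the SSD cost over an $N = 5$ patch is minimized. All names, the step size, and the 1-D simplification are illustrative assumptions; the actual implementation operates on the 2-D micro images and, given an existing hypothesis, restricts the range to $z(\mathbf{x}_R) \pm 2\sigma_z$.

```python
# 1-D sketch of epipolar-line stereo matching (eqs. (5.6)-(5.8)).

def sample(signal, pos):
    """Linearly interpolated intensity at a fractional position."""
    i = int(pos)
    frac = pos - i
    return signal[i] * (1.0 - frac) + signal[i + 1] * frac

def match_disparity(ref, target, x0, mu_min, mu_max, n=5, step=0.05):
    """Return the disparity minimizing the 1-D SSD cost e_ISS (eq. (5.8)),
    with x_s(mu) = x0 - mu as in eq. (5.6)."""
    half = (n - 1) // 2
    best_mu, best_cost = None, float("inf")
    mu = mu_min
    while mu <= mu_max:
        cost = sum(
            (ref[x0 + k] - sample(target, x0 - mu + k)) ** 2
            for k in range(-half, half + 1)
        )
        if cost < best_cost:
            best_mu, best_cost = mu, cost
        mu += step
    return best_mu

# The target signal equals the reference shifted by 1.5 pixels, so with
# x_s = x0 - mu the estimated disparity should come out close to 1.5.
ref = [float(x) for x in range(20)]
target = [x + 1.5 for x in ref]
print(match_disparity(ref, target, 10, -3.0, 3.0))  # close to 1.5
```

Because the interpolation is linear, the cost is piecewise smooth in $\mu$ and the minimum is not tied to integer pixel positions, matching the sub-pixel property stated above.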

Observation Uncertainty

For each inverse virtual depth observation, an observation uncertainty, represented by the inverse virtual depth variance $\sigma_z^2$, is calculated. Hence, the variance $\sigma_z^2$ defines a measure of the certainty with which an estimate represents the corresponding real value. For the determination of the variance $\sigma_z^2$, on the one hand a photometric component is considered, which models the effect of sensor noise, and on the other hand a focus component is considered, which models the effect of differently focused micro images, as they occur in a multi-focus plenoptic camera (Perwaß and Wietzke [2012]).

Photometric Disparity Error  The effect of sensor noise $\epsilon_n$ on the estimated disparity $\mu$ is modeled as a photometric disparity error, defined by the variance $\sigma_{\mu(I)}^2$. This error source is inspired by Engel et al. [2013] and is modeled in a similar way.

The variance of the sensor noise $\sigma_n^2$ can be considered to be the same for each pixel $\mathbf{x}_R$. In doing so, the effect of vignetting correction is neglected: to compensate for the vignetting of the micro lenses, pixel intensities at the micro image boundaries, and accordingly the additive noise components, are amplified.

In the following it is derived how $\sigma_n^2$ affects the disparity estimation. To do this, we formulate the stereo matching as the minimization problem given in eq. (5.9), where the estimated disparity $\hat{\mu}$ is the one which minimizes the squared intensity difference $e_I(\mu)^2$. In the interest of temporarily simplifying the expressions, the sum over a number of pixels, as defined for $e_{ISS}(\mu)$ in eq. (5.8), is omitted.

$$\hat{\mu} = \arg\min_{\mu}\, e_I(\mu)^2 = \arg\min_{\mu} \left[ \left( I_{ML}(\mathbf{x}_R) - I_{ML}(\mathbf{x}_R^s(\mu)) \right)^2 \right] \tag{5.9}$$

To find the minimum, the first derivative of $e_I(\mu)^2$ with respect to $\mu$ is calculated and set to zero. This results in eq. (5.11) as long as $g_I(\mu) \neq 0$ holds true (which is guaranteed by eq. (5.4)).

$$\frac{\partial e_I(\mu)^2}{\partial \mu} = \frac{\partial \left[ I_{ML}(\mathbf{x}_R) - I_{ML}(\mathbf{x}_R^s(\mu)) \right]^2}{\partial \mu} = 2 \left[ I_{ML}(\mathbf{x}_R) - I_{ML}(\mathbf{x}_R^s(\mu)) \right] \cdot \left[ -g_I(\mu) \right] \tag{5.10}$$

$$0 \overset{!}{=} I_{ML}(\mathbf{x}_R) - I_{ML}(\mathbf{x}_R^s(\mu)) \tag{5.11}$$

In eq. (5.10) the intensity gradient along the epipolar line $g_I(\mu)$ is defined as follows:

$$g_I(\mu) = g_I(\mathbf{x}_R^s(\mu)) = \frac{\partial I_{ML}(\mathbf{x}_{R0}^s - \mu \mathbf{e}_p)}{\partial \mu} = g_I(\mathbf{x}_R^s(\mu))^T \mathbf{e}_p \tag{5.12}$$

After approximating eq. (5.11) by its first-order Taylor series at the disparity $\mu = \mu_0$, which is close to $\hat{\mu}$, it can be solved for $\hat{\mu}$ as given in eq. (5.13).

$$\hat{\mu} = \frac{I_{ML}(\mathbf{x}_R) - I_{ML}(\mathbf{x}_R^s(\mu_0))}{g_I(\mathbf{x}_R^s(\mu_0))} + \mu_0 \tag{5.13}$$

If one now considers $I_{ML}(\mathbf{x}_R)$ in eq. (5.13) to be disturbed by AWGN, the variance $\sigma_{\mu(I)}^2$ of the disparity $\mu$ can be derived as given in eq. (5.14).

$$\sigma_{\mu(I)}^2 = \frac{\mathrm{Var}\{I_{ML}(\mathbf{x}_R)\} + \mathrm{Var}\{I_{ML}(\mathbf{x}_R^s(\mu_0))\}}{g_I(\mathbf{x}_R^s(\mu_0))^2} = \frac{2\sigma_n^2}{g_I(\mathbf{x}_R^s(\mu_0))^2} \tag{5.14}$$

Figure 5.3 illustrates how the gradient $g_I$ affects the estimation of $\mu$. The blue line represents the tangent at the disparity $\mu_0$, at which the intensity values are projected onto the disparities.
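Eq. (5.14) translates directly into code. A minimal sketch, assuming the sensor-noise variance $\sigma_n^2$ and the epipolar gradient are already known; the function name is illustrative.

```python
# Direct transcription of eq. (5.14); illustrative name and values.

def photometric_disparity_variance(sigma_n_sq, g_i):
    """sigma_mu(I)^2 = 2 * sigma_n^2 / g_I^2: a small epipolar gradient
    gives a large disparity uncertainty, as visualized in Figure 5.3."""
    return 2.0 * sigma_n_sq / g_i ** 2

# Doubling the gradient reduces the variance by a factor of four.
print(photometric_disparity_variance(4.0, 2.0))  # 2.0
print(photometric_disparity_variance(4.0, 4.0))  # 0.5
```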

Focus Disparity Error  In contrast to regular cameras, which are generally focused to infinity, the micro images of the plenoptic camera cannot be considered to be in focus for the entire operating range, especially not for a multi-focus plenoptic camera (Perwaß and Wietzke [2012]). Hence, the focusing itself also affects the stereo observation. In the following the variance $\sigma_{\mu(v,k,j)}^2$ of the focus disparity error is derived.

Let $k$ be the index of the micro image for which the mapping is performed, while $j$ is the index of the stereo reference. To model the focus disparity error, it is considered that the real edge $I_{real}(x)$, which is observed in the micro images, is a perfect Heaviside step function $h(x)$ with amplitude $A$ and offset $B$ along the epipolar line:

$$I_{real}(x) = A \cdot h(x) + B \tag{5.15}$$


Figure 5.3: Visualization of the photometric disparity error. Sensor noise $\epsilon_n$ can be considered additive white Gaussian noise (AWGN) which disturbs the intensity values $I_{ML}(\mathbf{x}_R)$ and thus affects the disparity observation. As shown on the left, for a low image gradient along the epipolar line the influence of the sensor noise is stronger than for a high image gradient, as shown on the right.

The variable $x$ is the position on the respective epipolar line in relation to the step position $\mu_{0i}$ ($i \in \{k, j\}$):

$$x = \mu_k - \mu_{0k} = \mu_j - \mu_{0j} \tag{5.16}$$

Therefore, the correct disparity is defined by $\mu_{0j} - \mu_{0k}$. During the imaging process the edge $I_{real}(x)$ is filtered by a Gaussian filter with variance $\sigma_i^2$, which in turn depends on the virtual depth $v$, the micro lens type, and the aperture of a pixel¹.

Thus, on the sensor one receives the following intensity functions along the epipolar line:

$$I_{MLi}(x) = B + \frac{A}{2} \left( 1 + \mathrm{erf}\!\left( \frac{x}{\sigma_i \sqrt{2}} \right) \right), \quad i \in \{k, j\} \tag{5.17}$$

The estimated disparity $\hat{\mu} = \mu_f$ is the one for which both intensities have the same value and therefore the following condition is fulfilled:

$$I_{MLk}(x) \overset{!}{=} I_{MLj}(x - \epsilon_f) \tag{5.18}$$

$$\mathrm{erf}\!\left( \frac{x}{\sigma_k \sqrt{2}} \right) = \mathrm{erf}\!\left( \frac{x - \epsilon_f}{\sigma_j \sqrt{2}} \right) \tag{5.19}$$

Since the error function $\mathrm{erf}(\cdot)$ cannot be solved analytically, eq. (5.19) is linearized and one obtains, after some rearranging, the following relationship between the focus disparity error $\epsilon_f$ and the position on the edge $x$:

$$\epsilon_f = x \cdot \left( 1 - \frac{\sigma_j}{\sigma_k} \right) \tag{5.20}$$
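The relationship of eq. (5.20) can be checked numerically. The sketch below (hypothetical names, illustrative values) builds the blurred edges of eq. (5.17) with `math.erf`, solves the matching condition of eq. (5.18) by bisection, and compares the result with eq. (5.20).

```python
# Numerical check of the focus disparity error model (eqs. (5.17)-(5.20)).
import math

def edge(x, sigma, a=1.0, b=0.0):
    """Gaussian-blurred step edge along the epipolar line (eq. (5.17))."""
    return b + a / 2.0 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def focus_error(x, sigma_k, sigma_j, lo=-5.0, hi=5.0):
    """Find eps_f with I_MLk(x) == I_MLj(x - eps_f) (eq. (5.18))."""
    target = edge(x, sigma_k)
    for _ in range(80):  # bisection: edge() is monotonic in its argument
        mid = (lo + hi) / 2.0
        if edge(x - mid, sigma_j) > target:
            lo = mid  # j-profile still too bright -> shift further
        else:
            hi = mid
    return (lo + hi) / 2.0

x, sk, sj = 0.4, 1.0, 0.6
print(focus_error(x, sk, sj))  # numeric solution, close to 0.16
print(x * (1.0 - sj / sk))     # eq. (5.20), approx. 0.16
```

For this single-edge model the numeric solution agrees closely with eq. (5.20), and for $\sigma_k = \sigma_j$ the error vanishes for every $x$, as stated for Figure 5.4.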

Figure 5.4 visualizes the focus disparity error $\epsilon_f$, by way of example, for two different positions ($x_1$ and $x_2$) in relation to the real edge position on the epipolar line. Here, the red and blue curves represent the intensity along the epipolar line in two different micro images with different blur radii. For a certain position $x$ the estimated disparity $\hat{\mu}$ will be the one for which both curves have the same value and therefore will be shifted by $\epsilon_f$ with respect to the real disparity $\mu$, as

¹ The pixel aperture rather results in a sinc response function. However, the Gaussian kernel gives a good approximation, as defocus blur will generally be dominant.

given in eq. (5.18). For the case that both images have the same blur radius, both curves are perfectly overlaid and therefore $\epsilon_f = 0$ will hold for any position $x$.

Figure 5.4: Visualization of the focus disparity error $\epsilon_f$ for two different positions ($x_1$ and $x_2$), in relation to the real edge position, on the epipolar line. The red and blue curves represent the intensity along the epipolar line in two different micro images with different blur radii. For a certain position $x$ on the edge the estimated disparity $\hat{\mu}$ will be the one for which both curves have the same value and therefore will be shifted by $\epsilon_f$ with respect to the real disparity $\mu$.

Considering the position in relation to the real edge $x$ as a random variable with variance $\sigma_x^2$, the variance $\sigma_{\mu(v,k,j)}^2$ of the focus disparity error is as follows:

$$\sigma_{\mu(v,k,j)}^2 = \sigma_x^2 \cdot \left( 1 - \frac{\sigma_j}{\sigma_k} \right)^2 \tag{5.21}$$

The standard deviation $\sigma_i$ ($i \in \{k, j\}$) depends on the blur diameter $s_i$ of the respective micro image, which is calculated based on the virtual depth $v$ and the micro lens parameters ($D_M$, $f_M$, $B$) using the thin lens equation:

$$s_i = D_M \cdot \left( \frac{1}{v_i} + \frac{B}{f_{Mi}} - 1 \right) \tag{5.22}$$

Since the minimum blur radius is limited by the pixel pitch and the overall optical system, the blur radius has a lower boundary $s_0$. Thus the variances $\sigma_k^2$ and $\sigma_j^2$ result as:

$$\sigma_k^2 = \beta^2 \cdot \max\{s_k^2, s_0^2\}, \qquad \sigma_j^2 = \beta^2 \cdot \max\{s_j^2, s_0^2\} \tag{5.23}$$

The constant parameter $\beta$ models the scaling from blur diameter to the standard deviation of the Gaussian filter.
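Eqs. (5.22) and (5.23) can be sketched as below. All parameter values are purely illustrative; the lower bound $s_0$ is applied so that the blur standard deviation never falls below $\beta \cdot s_0$, reflecting the lower boundary described above.

```python
# Sketch of the blur model of eqs. (5.22)-(5.23); illustrative values.

def blur_diameter(v, d_m, f_m, b):
    """Blur diameter s_i from the thin lens equation (eq. (5.22))."""
    return d_m * (1.0 / v + b / f_m - 1.0)

def blur_variance(v, d_m, f_m, b, beta, s0):
    """sigma_i^2 = beta^2 * max{s_i^2, s0^2} (eq. (5.23))."""
    s = blur_diameter(v, d_m, f_m, b)
    return beta ** 2 * max(s * s, s0 * s0)

# At the virtual depth where this hypothetical micro lens is in focus,
# the defocus term vanishes and the bound s0 dominates the variance.
print(blur_variance(v=2.0, d_m=0.1, f_m=1.0, b=0.5, beta=1.0, s0=0.02))
```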

Considering the two error sources, photometric disparity error and focus disparity error, as independent random variables, the complete observation uncertainty for the inverse virtual depth $z$ can be defined as given in eq. (5.24):

$$\sigma_z^2 = \Delta c_{ML}^{-2} \cdot \left( \sigma_{\mu(I)}^2 + \sigma_{\mu(v,k,j)}^2 \right) \tag{5.24}$$
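Eq. (5.24) then combines both variances. A minimal sketch with illustrative values, where `delta_c_ml` stands for the baseline $\Delta c_{ML}$:

```python
# Combined observation uncertainty per eq. (5.24); illustrative values.

def inverse_depth_variance(sigma_mu_photo_sq, sigma_mu_focus_sq, delta_c_ml):
    """sigma_z^2 = delta_c_ML^-2 * (sigma_mu(I)^2 + sigma_mu(v,k,j)^2)."""
    return (sigma_mu_photo_sq + sigma_mu_focus_sq) / delta_c_ml ** 2

# Longer baselines (larger delta_c_ml) shrink the inverse-depth variance,
# which is why estimation proceeds from short to long baselines.
print(inverse_depth_variance(0.08, 0.02, 2.0))  # 0.025
```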