
Error Propagation


The idea of error propagation — or propagation of statistical properties — is the following: given a vector of measurements $x \in \mathbb{R}^n$ with pdf $p_x$, and a derived vector $y \in \mathbb{R}^m$ such that

$$g: x \mapsto y = g(x), \qquad (3.40)$$

find the pdf $p_y$ for $y$ depending on $x$ and $p_x$.

An example might make this more transparent: calculating the line $\ell$ through two points $p_1$ and $p_2$. Displacing one or both of the points will generally change the position of the line. So if we know that the two points (which could, e. g., be measured image coordinates) are random variables with probability distributions $p_{p_1}$ and $p_{p_2}$, it suggests itself to ask what the resulting line's probability distribution $p_\ell$ will be.

3.3.1 Principle

For simplicity, let us consider the one-dimensional case first, i. e. random variables $\mathbf{x} = x \in \mathbb{R}$ and $\mathbf{y} = y \in \mathbb{R}$. We would expect that the probability for any event $x_0$ to fall into a small region $dx$ around $x$ should be equal to the probability of an event $y_0$ falling into the corresponding region $dy$ around $y$ in the limit $dx \to 0$. This can be written as

$$p_x(x)\,|dx| = p_y(y)\,|dy|. \qquad (3.41)$$

If we assume that the inverse function

$$g^{-1}: y \mapsto x = g^{-1}(y) \qquad (3.42)$$

is defined, we can write

$$p_x(x)\,|dx| = p_x\!\left(g^{-1}(y)\right)\left|\frac{\partial g^{-1}(y)}{\partial y}\right| |dy| = p_y(y)\,|dy| \qquad (3.43)$$

or

$$p_y(y) = p_x\!\left(g^{-1}(y)\right)\left|\frac{\partial g^{-1}(y)}{\partial y}\right|. \qquad (3.44)$$

Taking the absolute value in Equations (3.43) and (3.44) ensures the correct sign of $p_y(y)$. This is not a problem, since both $g(x)$ and $g^{-1}(y)$ are monotonic (otherwise they would not be invertible).
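To make Equation (3.44) concrete, here is a minimal numerical check in Python, assuming $g(x) = e^x$ with $x \sim N(0, 1)$; the example function and all names are illustrative choices, not from the thesis. Since $e^x$ maps a standard normal to a standard lognormal variable, the transformed density must coincide with the lognormal pdf.

```python
# A minimal check of Equation (3.44), assuming g(x) = exp(x), x ~ N(0, 1).
import numpy as np
from scipy.stats import norm, lognorm

def p_y(y):
    """p_y(y) = p_x(g^{-1}(y)) * |d g^{-1}(y)/dy| with g^{-1}(y) = ln y."""
    return norm.pdf(np.log(y)) * np.abs(1.0 / y)

y = np.linspace(0.1, 5.0, 50)
# (3.44) must reproduce the standard lognormal density exactly.
assert np.allclose(p_y(y), lognorm.pdf(y, s=1.0))
```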

The extension to the multidimensional case $x, y \in \mathbb{R}^n$ is straightforward [100]; it is

$$p_y(y) = p_x\!\left(g^{-1}(y)\right) |J_{xy}|, \qquad (3.45)$$

where $|J_{xy}| = |\partial x / \partial y|$ is the determinant of the Jacobian of the inverse transformation $x = g^{-1}(y)$ with respect to $y$.
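A small sketch of Equation (3.45) with assumed example values: for an invertible affine map $g(x) = Ax + b$ the Jacobian determinant is constant, and the result can be cross-checked against the known normal density of $y$.

```python
# Sketch of Equation (3.45) for an invertible affine g(x) = A x + b;
# the matrix A and offset b are made-up example values.
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.5],
              [0.3, 1.0]])                   # invertible, so g^{-1} exists
b = np.array([1.0, -1.0])
p_x = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))

def p_y(y):
    x = np.linalg.solve(A, y - b)            # x = g^{-1}(y)
    det_Jxy = 1.0 / np.linalg.det(A)         # |J_xy| = |dx/dy| = 1/det(A)
    return p_x.pdf(x) * abs(det_Jxy)

# For this linear g the exact answer is N(b, A A^T); (3.45) must agree.
p_y_exact = multivariate_normal(mean=b, cov=A @ A.T)
y_test = np.array([1.5, 0.2])
assert np.isclose(p_y(y_test), p_y_exact.pdf(y_test))
```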

The difficulty with the propagation of distributions is that we have to assume the existence of the inverse function. This means in particular that usually no solution is possible for $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $m \neq n$, where, in general, no inverse transformation exists. This is, for example, the case with our example of the line through two points, since it is not possible to infer the positions of the points from the parameters of the line. The necessity to use the inverse function severely limits the usefulness of propagation of statistical properties. And even if Equations (3.44) or (3.45) can be applied, the resulting pdf will, in general, become arbitrarily complex.


3.3.2 Linear Case

All these problems do not exist if we concentrate on linear functions only, i. e. functions of the form $y = Ax + b$ with $x \in \mathbb{R}^n$, $y, b \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. Linear functions will, in general, not change the particular type of distribution [100]. What is more, we have already seen from Equations (3.31) and (3.35) that

$$E(y) = E(Ax + b) = A\,E(x) + b \qquad (3.46)$$
$$\Sigma_y = \Sigma_{Ax+b} = A\,\Sigma_x A^T \qquad (3.47)$$

and in the case of a multidimensional normal distribution, we can directly write down the new distribution.³
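As a hedged illustration, Equations (3.46) and (3.47) are two lines of matrix arithmetic; the values of $A$, $b$, $\mu_x$ and $\Sigma_x$ below are made up for the example, and a large random sample serves as a cross-check.

```python
# Equations (3.46) and (3.47) in code, with invented example values.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])    # maps R^3 -> R^2
b = np.array([0.5, -0.5])
mu_x = np.array([1.0, 2.0, 3.0])
Sigma_x = np.diag([0.1, 0.2, 0.3])

mu_y = A @ mu_x + b                 # Equation (3.46)
Sigma_y = A @ Sigma_x @ A.T         # Equation (3.47)

# Cross-check against a large sample of transformed normal vectors.
x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
y = x @ A.T + b
assert np.allclose(mu_y, y.mean(axis=0), atol=1e-2)
assert np.allclose(Sigma_y, np.cov(y.T), atol=1e-2)
```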

Unfortunately, most interesting functions are not linear. It is therefore necessary to find a way to apply the above equations to an arbitrary, nonlinear function $g(x)$. This is usually done by approximating $g(x)$ by a linear function $f(x)$. Series expansion, and Taylor series expansion⁴ in particular, is generally used for this purpose. The next section gives a short introduction to Taylor series expansion and the resulting laws for the propagation of statistical properties.

3.3.3 Explicit Functions

Any $C^1$ function $y = g(x)$ can be written as

$$y = g(x_0 + \Delta x) = g(x_0) + J_{yx_0}\,\Delta x + O_2(\|\Delta x\|^2). \qquad (3.48)$$

In the vicinity of $x_0$ this can usually be approximated by a linear function $f(x)$ with

$$y = g(x_0 + \Delta x) \approx y' = f(x_0 + \Delta x) = g(x_0) + J_{yx_0}\,\Delta x = y_0 + J_{yx_0}\,\Delta x, \qquad (3.49)$$

where $J_{yx_0} = \left.\partial y / \partial x\right|_{x_0}$ is the Jacobian of $g(x)$ with respect to $x$ at the point $x_0$. Note that the statistical properties are now associated with $\Delta x$ instead of $x$; it is

$$E(\Delta x) = E(x - x_0) = E(x) - E(x_0) = \mu_x - x_0 \qquad (3.50)$$
$$\Sigma_{\Delta x} = \Sigma_{x - x_0} = \Sigma_x. \qquad (3.51)$$

The linearisation will usually introduce an error. In order to keep this error small, it is customary to set $x_0 = \mu_x$ and therefore $E(\Delta x) = 0$. From there and Equation (3.49) we can calculate the mean and variance for the linearised function $y' = f(x)$ as

$$E(y') = E(y_0 + J_{yx_0}\,\Delta x) = E(y_0) + J_{yx_0}\,E(\Delta x) = y_0 \qquad (3.52)$$
$$\Sigma_{y'} = \Sigma_{y_0 + J_{yx_0}\Delta x} = J_{yx_0}\,\Sigma_{\Delta x}\,J^T_{yx_0} = J_{yx_0}\,\Sigma_x\,J^T_{yx_0} \qquad (3.53)$$

³ It is possible to derive similar equations for higher-order central moments, independent of the distribution [100].

⁴ It should be noted that a Taylor series will not necessarily converge, and even if it does, it will not necessarily converge towards the function $g(x)$ it is meant to represent — although it normally does.


[Figure: curves $g(x)$ and $f(x)$ over axes $x$, $y$.]

Figure 3.1: Bias and deformation of the pdf due to nonlinearities. The true transformed pdf (grey) is deformed and also slightly offset with respect to the linear approximation (black).

with $y_0 = g(x_0)$. Note that, in general, $\mu_{y'} = g(\mu_x) \neq \mu_y$, that is, the linearisation introduces a bias. This is always the case if $g(x)$ is not well approximated by its tangent within the region of dispersion of its random variables, as seen in Figure 3.1.

This bias will always inflate the estimated covariance matrix, while the truncation of the Taylor series could either increase or decrease the result [29].
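The bias is easy to exhibit numerically. The following sketch assumes $g(x) = x^2$ and a normal $x$, a case where the exact moments are known in closed form; the parameter values are invented for the example.

```python
# Illustration of the bias in Equation (3.52), assuming g(x) = x^2
# and x ~ N(mu, sigma^2); example values are not from the thesis.
import numpy as np

mu, sigma = 2.0, 0.5
# First order propagation: linearise at x0 = mu.
J = 2.0 * mu                          # dg/dx at mu
mean_lin = mu**2                      # g(mu), Equation (3.52)
var_lin = J**2 * sigma**2             # Equation (3.53)

# Exact moments of x^2 for a normal x (known in closed form):
mean_true = mu**2 + sigma**2          # mean_lin underestimates by sigma^2
var_true = 4 * mu**2 * sigma**2 + 2 * sigma**4

print(f"mean: linearised {mean_lin:.3f} vs true {mean_true:.3f}")
print(f"var : linearised {var_lin:.3f}  vs true {var_true:.3f}")
```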

Just how good or bad this approximation is within a given region $\|x - x_0\|_2 \leq \Delta x$ can be computed by calculating an upper bound on the remainder $O_2(\|\Delta x\|^2)$. Four different forms for $x, g(x) \in \mathbb{R}$ are given in [65], and these are easily extended to $x \in \mathbb{R}^n$, compare e. g. [22, p. 279]; in this thesis I will, however, only deal with first order approximations.

It is worth noting that, although the application of Equation (3.52) and in particular Equation (3.53) can result in extremely complex expressions, generating and using these expressions is a purely mechanical task which can in theory⁵ be done by any computer algebra program. For our example of a line $\ell \in \mathbb{R}^3$ through two points $p_1, p_2 \in \mathbb{R}^3$ this can be done as follows.

We know from Equation (2.25) that $\ell$ can be calculated as $\ell = p_1 \times p_2$, or alternatively as $\ell = p_{1\times}\,p_2 = -p_{2\times}\,p_1$ with

$$p_{i\times} = \begin{pmatrix} 0 & -z_i & y_i \\ z_i & 0 & -x_i \\ -y_i & x_i & 0 \end{pmatrix}. \qquad (3.54)$$

⁵ I have yet to meet a computer algebra program which produces C code that will not happily divide by zero or take the square root of a negative argument.

The Jacobian is therefore $J_{\ell p} = (-p_{2\times}, p_{1\times})$. If we know that the two points are normally distributed with covariance matrices $\Sigma_{p_1}, \Sigma_{p_2} \in \mathbb{R}^{3 \times 3}$, we can calculate the line's covariance matrix as $\Sigma_\ell = J_{\ell p}\,\Sigma_{(p_1, p_2)}\,J^T_{\ell p}$ using Equation (3.53). The mean is the line $\ell$ itself. Note that the covariance matrices are of course all singular, since both points and lines only have 2 DOF each. This will be discussed in more detail in Section 4.
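For illustration, the whole line example fits in a few lines of code. The point coordinates and covariances below are invented, and the two points are assumed uncorrelated so that $\Sigma_{(p_1, p_2)}$ is block diagonal.

```python
# Sketch of the line example: Sigma_l from Equation (3.53), with
# illustrative point coordinates and covariances.
import numpy as np

def skew(p):
    """Cross-product matrix p_x from Equation (3.54)."""
    x, y, z = p
    return np.array([[0.0,  -z,   y],
                     [  z, 0.0,  -x],
                     [ -y,   x, 0.0]])

p1 = np.array([1.0, 2.0, 1.0])        # homogeneous image points
p2 = np.array([3.0, 1.0, 1.0])
Sigma_p1 = np.diag([0.01, 0.01, 0.0]) # no uncertainty in w: singular
Sigma_p2 = np.diag([0.02, 0.02, 0.0])

l = np.cross(p1, p2)                  # l = p1 x p2, the mean line
J = np.hstack([-skew(p2), skew(p1)])  # J_lp = (-p2_x, p1_x), 3 x 6
Sigma_x = np.block([[Sigma_p1, np.zeros((3, 3))],
                    [np.zeros((3, 3)), Sigma_p2]])  # points uncorrelated
Sigma_l = J @ Sigma_x @ J.T           # Equation (3.53); rank-deficient
```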

3.3.4 Implicit Functions

In practice, a result $y_0 \in \mathbb{R}^m$ is often not calculated by an explicit function, but rather found as the set of parameters which extremises some function of the original data, i. e. the function $g(x)$ is not explicitly known (and its Jacobian therefore cannot be calculated as usual). What is known instead is a cost function $C(x, y) \in \mathbb{R}$ which we are trying to minimise. The necessary condition for an extremum is $\left.\partial C(x, y)/\partial y\right|_{x_0} = 0$, and the implicit function theorem (compare [29, 43]) gives us the Jacobian $J_{y_0 x}$ for an unknown function $y = g(x)$ as

$$J_{y_0 x} = -\left(\frac{\partial^2 C}{\partial y^2}\right)^{-1} \frac{\partial^2 C}{\partial y\,\partial x}. \qquad (3.55)$$

Using the idea of Lagrange multipliers, we can extend the above even further to constrained minimisation, see [29, 43] for more details.
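As a sketch of how this Jacobian might be evaluated, consider the toy cost $C(x, y) = \sum_i (y - x_i)^2$, whose minimiser is the mean of $x$; the cost function and data are assumptions chosen so the result is easy to verify by hand.

```python
# Hedged sketch of the implicit-function Jacobian: y = argmin_y C(x, y)
# with the toy cost C(x, y) = sum_i (y - x_i)^2 (minimiser: mean of x).
import numpy as np

x = np.array([1.0, 2.0, 4.0])
n = len(x)

# Derivatives of dC/dy = 2 * sum_i (y - x_i) at the minimum:
d2C_dy2 = 2.0 * n                     # scalar Hessian in y
d2C_dydx = -2.0 * np.ones((1, n))     # mixed second derivatives

# J_{y0 x} = -(d^2C/dy^2)^{-1} (d^2C/dy dx), here (1/n, ..., 1/n):
J = -np.linalg.solve(np.atleast_2d(d2C_dy2), d2C_dydx)
assert np.allclose(J, np.full((1, n), 1.0 / n))

# With Sigma_x known, Equation (3.53) then gives the result's uncertainty.
Sigma_x = np.diag([0.1, 0.1, 0.1])
Sigma_y = J @ Sigma_x @ J.T           # variance of the mean: 0.1 / 3
```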

Note that using Equation (3.53) a result's uncertainty can be calculated even if the result itself has not been obtained as the solution to some optimal algorithm minimising both the error and uncertainty of a particular calculation (a problem which often has no closed form solution), but rather by some faster but less accurate algorithm. However, using a faster, closed form solution might introduce a considerable bias and blow up $\Sigma_{y_0}$, and the selection of a function $g(x)$ that is both fast and accurate enough can become somewhat of an art form. Section 4 practises this art for a number of common computer-vision constructs, mainly the ones introduced in Section 2.

3.3.5 Monte-Carlo Simulations

A second method for the propagation of statistical properties which is completely different from the analytical method given above is the Monte Carlo simulation. The basic idea is simple: given a function $y = g(x)$ and a vector $x$ (assumed perfectly known), a large number of corrupted vectors $x_i = x + r_i$ is created, where the $r_i$ are distributed according to the measurement error's pdf $p_{\Delta x}$ (assumed known). This large population is then used to estimate the pdf $p_y$ of the samples $y_i = g(x_i)$.

In particular, the mean $\mu_y$ and covariance matrix $\Sigma_y$ can be approximated by the sample mean $\bar{y}$ and sample covariance matrix $S_y$ using Equations (3.37) and (3.38).

Note that $g(x)$ does not need to be given explicitly; the $y_i$ could just as well have been found by minimisation or any other technique.

The quality of Monte Carlo simulation relies only on the number of samples — between 10,000 and 1,000,000 samples are not unusual — and the quality of the pseudo random number generator, which should have a period at least 10 times greater than the number of samples required [29].
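A minimal Monte Carlo run for the earlier $g(x) = x^2$ example might look as follows; the sample count, seed and parameters are arbitrary choices.

```python
# Monte Carlo propagation as described above, sketched for the
# nonlinear example g(x) = x^2 with x ~ N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 2.0, 0.5
n_samples = 100_000

x_i = mu + rng.normal(0.0, sigma, n_samples)   # x_i = x + r_i
y_i = x_i**2                                   # y_i = g(x_i)

# Sample mean and variance approximate mu_y and Sigma_y; compare
# with the first order predictions mu^2 and (2 mu sigma)^2:
print(f"sample mean {y_i.mean():.3f} (first order: {mu**2:.3f})")
print(f"sample var  {y_i.var(ddof=1):.3f} (first order: {(2*mu*sigma)**2:.3f})")
```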

The two methods — analytical first order error propagation and Monte Carlo simulation — complement each other. First order error propagation is a fast and — for sufficiently small errors — reliable method which gives an analytical, closed form solution for $p_y$. However, the equations used can become unwieldy, and it relies on the goodness of the linear approximation (which could in theory be assessed by calculating an upper bound for the remainder $O_2(\|\Delta x\|^2)$, compare Section 3.3.3).

Monte Carlo simulation, on the other hand, makes no assumptions about the function $g(x)$ or the resulting pdf $p_y$ and is easy to program; these advantages are offset by an extremely long execution time (several minutes or even hours). Also, the result of a Monte Carlo simulation is just a large number of points $y_i$ instead of a closed-form probability density function.

In practice, Monte Carlo simulation is therefore often used to assess the goodness of the analytical solution found by first order error propagation: either visually, by plotting both the points found by Monte Carlo simulation and the confidence regions found by first order error propagation, or analytically, where for example the $\chi^2$ distribution described in the next section is used to compute the probability that the points $y_i$ follow the distribution $p_y$.
