
Error Propagation


The idea of error propagation — or propagation of statistical properties — is the following: given a vector of measurements $x \in \mathbb{R}^n$ with pdf $p_x$, and a derived vector $y \in \mathbb{R}^m$ such that

$$g: x \mapsto y = g(x), \qquad (3.40)$$

find the pdf $p_y$ for $y$ depending on $x$ and $p_x$.

An example might make this more transparent: calculating the line $\ell$ through two points $p_1$ and $p_2$. Displacing one or both of the points will generally change the position of the line. So if we know that the two points (which could, e. g., be measured image coordinates) are random variables with probability distributions $p_{p_1}$ and $p_{p_2}$, it suggests itself to ask what the resulting line's probability distribution $p_\ell$ will be.

3.3.1 Principle

For simplicity, let us consider the one-dimensional case first, i. e. random variables $\mathbf{x} = x \in \mathbb{R}$ and $\mathbf{y} = y \in \mathbb{R}$. We would expect that the probability for any event $x_0$ to fall into a small region $dx$ around $x$ should be equal to the probability of an event $y_0$ falling into the corresponding region $dy$ around $y$ in the limit $dx \to 0$. This can be written as

$$p_x(x)\,|dx| = p_y(y)\,|dy|. \qquad (3.41)$$

If we assume that the inverse function

$$g^{-1}: y \mapsto x = g^{-1}(y) \qquad (3.42)$$

is defined, we can write

$$p_x(x)\,|dx| = p_x\!\left(g^{-1}(y)\right)\left|\frac{\partial g^{-1}(y)}{\partial y}\right| |dy| = p_y(y)\,|dy| \qquad (3.43)$$

or

$$p_y(y) = p_x\!\left(g^{-1}(y)\right)\left|\frac{\partial g^{-1}(y)}{\partial y}\right|. \qquad (3.44)$$

Taking the absolute value in Equations (3.43) and (3.44) ensures the correct sign of $p_y(y)$. This is not a problem, since both $g(x)$ and $g^{-1}(y)$ are monotonic (otherwise they would not be invertible).
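To make Equation (3.44) concrete, here is a minimal numerical check in Python, assuming $g(x) = e^x$ with $x \sim N(0, 1)$; the example function and all names are illustrative choices, not from the thesis. Since $e^x$ maps a standard normal to a standard lognormal variable, the transformed density must coincide with the lognormal pdf.

```python
# A minimal check of Equation (3.44), assuming g(x) = exp(x), x ~ N(0, 1).
import numpy as np
from scipy.stats import norm, lognorm

def p_y(y):
    """p_y(y) = p_x(g^{-1}(y)) * |d g^{-1}(y)/dy| with g^{-1}(y) = ln y."""
    return norm.pdf(np.log(y)) * np.abs(1.0 / y)

y = np.linspace(0.1, 5.0, 50)
# (3.44) must reproduce the standard lognormal density exactly.
assert np.allclose(p_y(y), lognorm.pdf(y, s=1.0))
```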

The extension to the multidimensional case $x, y \in \mathbb{R}^n$ is straightforward [100]; it is

$$p_y(y) = p_x\!\left(g^{-1}(y)\right) |J_{xy}|, \qquad (3.45)$$

where $|J_{xy}| = |\partial x / \partial y|$ is the determinant of the Jacobian of the inverse transformation $x = g^{-1}(y)$ with respect to $y$.
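A small sketch of Equation (3.45) with assumed example values: for an invertible affine map $g(x) = Ax + b$ the Jacobian determinant is constant, and the result can be cross-checked against the known normal density of $y$.

```python
# Sketch of Equation (3.45) for an invertible affine g(x) = A x + b;
# the matrix A and offset b are made-up example values.
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.5],
              [0.3, 1.0]])                   # invertible, so g^{-1} exists
b = np.array([1.0, -1.0])
p_x = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))

def p_y(y):
    x = np.linalg.solve(A, y - b)            # x = g^{-1}(y)
    det_Jxy = 1.0 / np.linalg.det(A)         # |J_xy| = |dx/dy| = 1/det(A)
    return p_x.pdf(x) * abs(det_Jxy)

# For this linear g the exact answer is N(b, A A^T); (3.45) must agree.
p_y_exact = multivariate_normal(mean=b, cov=A @ A.T)
y_test = np.array([1.5, 0.2])
assert np.isclose(p_y(y_test), p_y_exact.pdf(y_test))
```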

The difficulty with the propagation of distributions is that we have to assume the existence of the inverse function. This means in particular that usually no solution is possible for $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $m \neq n$, where, in general, no inverse transformation exists. This is, for example, the case with our example of the line through two points, since it is not possible to infer the positions of the points from the parameters of the line. The necessity to use the inverse function severely limits the usefulness of propagation of statistical properties. And even if Equations (3.44) or (3.45) can be applied, the resulting pdf will, in general, become arbitrarily complex.


3.3.2 Linear Case

All these problems do not exist if we concentrate on linear functions only, i. e. functions of the form $y = Ax + b$ with $x \in \mathbb{R}^n$, $y, b \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. Linear functions will, in general, not change the particular type of distribution [100]. What is more, we have already seen from Equations (3.31) and (3.35) that

$$E(y) = E(Ax + b) = A\,E(x) + b \qquad (3.46)$$
$$\Sigma_y = \Sigma_{Ax+b} = A\,\Sigma_x A^T \qquad (3.47)$$

and in the case of a multidimensional normal distribution, we can directly write down the new distribution.³
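As a hedged illustration, Equations (3.46) and (3.47) are two lines of matrix arithmetic; the values of $A$, $b$, $\mu_x$ and $\Sigma_x$ below are made up for the example, and a large random sample serves as a cross-check.

```python
# Equations (3.46) and (3.47) in code, with invented example values.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])    # maps R^3 -> R^2
b = np.array([0.5, -0.5])
mu_x = np.array([1.0, 2.0, 3.0])
Sigma_x = np.diag([0.1, 0.2, 0.3])

mu_y = A @ mu_x + b                 # Equation (3.46)
Sigma_y = A @ Sigma_x @ A.T         # Equation (3.47)

# Cross-check against a large sample of transformed normal vectors.
x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
y = x @ A.T + b
assert np.allclose(mu_y, y.mean(axis=0), atol=1e-2)
assert np.allclose(Sigma_y, np.cov(y.T), atol=1e-2)
```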

Unfortunately, most interesting functions are not linear. It is therefore necessary to find a way to apply the above equations to an arbitrary, nonlinear function $g(x)$. This is usually done by approximating $g(x)$ by a linear function $f(x)$. Series expansion, and Taylor series expansion⁴ in particular, is generally used for this purpose. The next section gives a short introduction to Taylor series expansion and the resulting laws for the propagation of statistical properties.

3.3.3 Explicit Functions

Any $C^1$ function $y = g(x)$ can be written as

$$y = g(x_0 + \Delta x) = g(x_0) + J_{yx_0}\,\Delta x + O_2(\|\Delta x\|^2). \qquad (3.48)$$

In the vicinity of $x_0$ this can usually be approximated by a linear function $f(x)$ with

$$y = g(x_0 + \Delta x) \approx y' = f(x_0 + \Delta x) = g(x_0) + J_{yx_0}\,\Delta x = y_0 + J_{yx_0}\,\Delta x, \qquad (3.49)$$

where $J_{yx_0} = \left.\partial y / \partial x\right|_{x_0}$ is the Jacobian of $g(x)$ with respect to $x$ at the point $x_0$. Note that the statistical properties are now associated with $\Delta x$ instead of $x$; it is

$$E(\Delta x) = E(x - x_0) = E(x) - E(x_0) = \mu_x - x_0 \qquad (3.50)$$
$$\Sigma_{\Delta x} = \Sigma_{x - x_0} = \Sigma_x. \qquad (3.51)$$

The linearisation will usually introduce an error. In order to keep this error small, it is customary to set $x_0 = \mu_x$ and therefore $E(\Delta x) = 0$. From there and Equation (3.49) we can calculate the mean and variance for the linearised function $y' = f(x)$ as

$$E(y') = E(y_0 + J_{yx_0}\,\Delta x) = E(y_0) + J_{yx_0}\,E(\Delta x) = y_0 \qquad (3.52)$$
$$\Sigma_{y'} = \Sigma_{y_0 + J_{yx_0}\Delta x} = J_{yx_0}\,\Sigma_{\Delta x}\,J^T_{yx_0} = J_{yx_0}\,\Sigma_x\,J^T_{yx_0} \qquad (3.53)$$

³ It is possible to derive similar equations for higher-order central moments, independent of the distribution [100].

⁴ It should be noted that a Taylor series will not necessarily converge, and even if it does, it will not necessarily converge towards the function $g(x)$ it is meant to represent — although it normally does.


[Figure: curves $g(x)$ and $f(x)$ over axes $x$, $y$.]

Figure 3.1: Bias and deformation of the pdf due to nonlinearities. The true transformed pdf (grey) is deformed and also slightly offset with respect to the linear approximation (black).

with $y_0 = g(x_0)$. Note that, in general, $\mu_{y'} = g(\mu_x) \neq \mu_y$, that is, the linearisation introduces a bias. This is always the case if $g(x)$ is not well approximated by its tangent within the region of dispersion of its random variables, as seen in Figure 3.1.

This bias will always inflate the estimated covariance matrix, while the truncation of the Taylor series could either increase or decrease the result [29].
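The bias is easy to exhibit numerically. The following sketch assumes $g(x) = x^2$ and a normal $x$, a case where the exact moments are known in closed form; the parameter values are invented for the example.

```python
# Illustration of the bias in Equation (3.52), assuming g(x) = x^2
# and x ~ N(mu, sigma^2); example values are not from the thesis.
import numpy as np

mu, sigma = 2.0, 0.5
# First order propagation: linearise at x0 = mu.
J = 2.0 * mu                          # dg/dx at mu
mean_lin = mu**2                      # g(mu), Equation (3.52)
var_lin = J**2 * sigma**2             # Equation (3.53)

# Exact moments of x^2 for a normal x (known in closed form):
mean_true = mu**2 + sigma**2          # mean_lin underestimates by sigma^2
var_true = 4 * mu**2 * sigma**2 + 2 * sigma**4

print(f"mean: linearised {mean_lin:.3f} vs true {mean_true:.3f}")
print(f"var : linearised {var_lin:.3f}  vs true {var_true:.3f}")
```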

Just how good or bad this approximation is within a given region $\|x - x_0\|_2 \leq \Delta x$ can be computed by calculating an upper bound on the remainder $O_2(\|\Delta x\|^2)$. Four different forms for $x, g(x) \in \mathbb{R}$ are given in [65], and these are easily extended to $x \in \mathbb{R}^n$, compare e. g. [22, p. 279]; in this thesis I will, however, only deal with first order approximations.

It is worth noting that, although the application of Equation (3.52) and in particular Equation (3.53) can result in extremely complex expressions, generating and using these expressions is a purely mechanical task which can in theory⁵ be done by any computer algebra program. For our example of a line $\ell \in \mathbb{R}^3$ through two points $p_1, p_2 \in \mathbb{R}^3$ this can be done as follows.

We know from Equation (2.25) that $\ell$ can be calculated as $\ell = p_1 \times p_2$, or alternatively as $\ell = p_{1\times}\,p_2 = -p_{2\times}\,p_1$ with

$$p_{i\times} = \begin{pmatrix} 0 & -z_i & y_i \\ z_i & 0 & -x_i \\ -y_i & x_i & 0 \end{pmatrix}. \qquad (3.54)$$

⁵ I have yet to meet a computer algebra program which produces C code that will not happily divide by zero or take the square root of a negative argument.

The Jacobian is therefore $J_{\ell p} = (-p_{2\times}, p_{1\times})$. If we know that the two points are normally distributed with covariance matrices $\Sigma_{p_1}, \Sigma_{p_2} \in \mathbb{R}^{3 \times 3}$, we can calculate the line's covariance matrix as $\Sigma_\ell = J_{\ell p}\,\Sigma_{(p_1, p_2)}\,J^T_{\ell p}$ using Equation (3.53). The mean is the line $\ell$ itself. Note that the covariance matrices are of course all singular, since both points and lines only have 2 DOF each. This will be discussed in more detail in Section 4.
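For illustration, the whole line example fits in a few lines of code. The point coordinates and covariances below are invented, and the two points are assumed uncorrelated so that $\Sigma_{(p_1, p_2)}$ is block diagonal.

```python
# Sketch of the line example: Sigma_l from Equation (3.53), with
# illustrative point coordinates and covariances.
import numpy as np

def skew(p):
    """Cross-product matrix p_x from Equation (3.54)."""
    x, y, z = p
    return np.array([[0.0,  -z,   y],
                     [  z, 0.0,  -x],
                     [ -y,   x, 0.0]])

p1 = np.array([1.0, 2.0, 1.0])        # homogeneous image points
p2 = np.array([3.0, 1.0, 1.0])
Sigma_p1 = np.diag([0.01, 0.01, 0.0]) # no uncertainty in w: singular
Sigma_p2 = np.diag([0.02, 0.02, 0.0])

l = np.cross(p1, p2)                  # l = p1 x p2, the mean line
J = np.hstack([-skew(p2), skew(p1)])  # J_lp = (-p2_x, p1_x), 3 x 6
Sigma_x = np.block([[Sigma_p1, np.zeros((3, 3))],
                    [np.zeros((3, 3)), Sigma_p2]])  # points uncorrelated
Sigma_l = J @ Sigma_x @ J.T           # Equation (3.53); rank-deficient
```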

3.3.4 Implicit Functions

In practice, a result $y_0 \in \mathbb{R}^m$ is often not calculated by an explicit function, but rather found as the set of parameters which extremises some function of the original data, i. e. the function $g(x)$ is not explicitly known (and its Jacobian therefore cannot be calculated as usual). What is known instead is a cost function $C(x, y) \in \mathbb{R}$ which we are trying to minimise. The necessary condition for an extremum is $\left.\partial C(x, y)/\partial y\right|_{x_0} = 0$, and the implicit function theorem (compare [29, 43]) gives us the Jacobian $J_{y_0 x}$ for an unknown function $y = g(x)$ as

$$J_{y_0 x} = -\left(\frac{\partial^2 C}{\partial y^2}\right)^{-1} \frac{\partial^2 C}{\partial y\,\partial x}. \qquad (3.55)$$

Using the idea of Lagrange multipliers, we can extend the above even further to constrained minimisation, see [29, 43] for more details.
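As a sketch of how this Jacobian might be evaluated, consider the toy cost $C(x, y) = \sum_i (y - x_i)^2$, whose minimiser is the mean of $x$; the cost function and data are assumptions chosen so the result is easy to verify by hand.

```python
# Hedged sketch of the implicit-function Jacobian: y = argmin_y C(x, y)
# with the toy cost C(x, y) = sum_i (y - x_i)^2 (minimiser: mean of x).
import numpy as np

x = np.array([1.0, 2.0, 4.0])
n = len(x)

# Derivatives of dC/dy = 2 * sum_i (y - x_i) at the minimum:
d2C_dy2 = 2.0 * n                     # scalar Hessian in y
d2C_dydx = -2.0 * np.ones((1, n))     # mixed second derivatives

# J_{y0 x} = -(d^2C/dy^2)^{-1} (d^2C/dy dx), here (1/n, ..., 1/n):
J = -np.linalg.solve(np.atleast_2d(d2C_dy2), d2C_dydx)
assert np.allclose(J, np.full((1, n), 1.0 / n))

# With Sigma_x known, Equation (3.53) then gives the result's uncertainty.
Sigma_x = np.diag([0.1, 0.1, 0.1])
Sigma_y = J @ Sigma_x @ J.T           # variance of the mean: 0.1 / 3
```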

Note that using Equation (3.53) a result's uncertainty can be calculated even if the result itself has not been obtained as the solution to some optimal algorithm minimising both the error and uncertainty of a particular calculation (a problem which often has no closed form solution), but rather by some faster but less accurate algorithm. However, using a faster, closed form solution might introduce a considerable bias and blow up $\Sigma_{y_0}$, and the selection of a function $g(x)$ that is both fast and accurate enough can become somewhat of an art form. Section 4 practises this art for a number of common computer-vision constructs, mainly the ones introduced in Section 2.

3.3.5 Monte-Carlo Simulations

A second method for the propagation of statistical properties which is completely different from the analytical method given above is the Monte Carlo simulation. The basic idea is simple: given a function $y = g(x)$ and a vector $x$ (assumed perfectly known), a large number of corrupted vectors $x_i = x + r_i$ is created, where the $r_i$ are distributed according to the measurement error's pdf $p_{\Delta x}$ (assumed known). This large population is then used to estimate the pdf $p_y$ of the samples $y_i = g(x_i)$.

In particular, the mean $\mu_y$ and covariance matrix $\Sigma_y$ can be approximated by the sample mean $\bar{y}$ and sample covariance matrix $S_y$ using Equations (3.37) and (3.38).

Note that $g(x)$ does not need to be given explicitly; the $y_i$ could just as well have been found by minimisation or any other technique.

The quality of Monte Carlo simulation relies only on the number of samples — between 10,000 and 1,000,000 samples are not unusual — and the quality of the pseudo random number generator, which should have a period at least 10 times greater than the number of samples required [29].
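A minimal Monte Carlo run for the earlier $g(x) = x^2$ example might look as follows; the sample count, seed and parameters are arbitrary choices.

```python
# Monte Carlo propagation as described above, sketched for the
# nonlinear example g(x) = x^2 with x ~ N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 2.0, 0.5
n_samples = 100_000

x_i = mu + rng.normal(0.0, sigma, n_samples)   # x_i = x + r_i
y_i = x_i**2                                   # y_i = g(x_i)

# Sample mean and variance approximate mu_y and Sigma_y; compare
# with the first order predictions mu^2 and (2 mu sigma)^2:
print(f"sample mean {y_i.mean():.3f} (first order: {mu**2:.3f})")
print(f"sample var  {y_i.var(ddof=1):.3f} (first order: {(2*mu*sigma)**2:.3f})")
```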

The two methods — analytical first order error propagation and Monte Carlo simulation — complement each other. First order error propagation is a fast and — for sufficiently small errors — reliable method which gives an analytical, closed form solution for $p_y$. However, the equations used can become unwieldy, and it relies on the goodness of the linear approximation (which could in theory be assessed by calculating an upper bound for the remainder $O_2(\|\Delta x\|^2)$, compare Section 3.3.3).

Monte Carlo simulation, on the other hand, makes no assumptions about the function $g(x)$ or the resulting pdf $p_y$ and is easy to program; these advantages are offset by an extremely long execution time (several minutes or even hours). Also, the result of a Monte Carlo simulation is just a large number of points $y_i$ instead of a closed-form probability density function.

In practice, Monte Carlo simulation is therefore often used to assess the goodness of the analytical solution found by first order error propagation: either visually, by plotting both the points found by Monte Carlo simulation and the confidence regions found by first order error propagation, or analytically, where for example the $\chi^2$ distribution described in the next section is used to compute the probability that the points $y_i$ follow the distribution $p_y$.
