
Error Propagation


The idea of error propagation (or propagation of statistical properties) is the following: given a vector of measurements $\mathbf{x} \in \mathbb{R}^n$ with pdf $p_x$, and a derived vector $\mathbf{y} \in \mathbb{R}^m$ such that

$$ g: \mathbf{x} \mapsto \mathbf{y} = g(\mathbf{x}), \tag{3.40} $$

find the pdf $p_y$ for $\mathbf{y}$ depending on $\mathbf{x}$ and $p_x$.

An example might make this more transparent: calculating the line $\boldsymbol{\ell}$ passing through the two points $\mathbf{p}_1$ and $\mathbf{p}_2$. Displacing one or both of the points will generally change the position of the line. So if we know that the two points (which could, e. g., be measured image coordinates) are random variables with probability distributions $p_{p_1}$ and $p_{p_2}$, it is natural to ask what the resulting line's probability distribution $p_\ell$ will be.


3.3.1 Principle

For simplicity, let us consider the one-dimensional case first, i. e. random variables $\mathbf{x} = x \in \mathbb{R}$ and $\mathbf{y} = y \in \mathbb{R}$. We would expect that the probability for an event $x$ to fall into a small region $\mathrm{d}x$ around $x$ should be equal to the probability of the event $y$ falling into the corresponding region $\mathrm{d}y$ around $y$, in the limit $\mathrm{d}x \to 0$.

This can be written as

$$ p_x(x)\,|\mathrm{d}x| = p_y(y)\,|\mathrm{d}y|. \tag{3.41} $$

If we assume that the inverse function

$$ g^{-1}: y \mapsto x = g^{-1}(y) \tag{3.42} $$

is defined, we can write

$$ p_x(x)\,|\mathrm{d}x| = p_x\!\left(g^{-1}(y)\right) \left|\frac{\partial g^{-1}(y)}{\partial y}\right| |\mathrm{d}y| = p_y(y)\,|\mathrm{d}y| \tag{3.43} $$

or

$$ p_y(y) = p_x\!\left(g^{-1}(y)\right) \left|\frac{\partial g^{-1}(y)}{\partial y}\right|. \tag{3.44} $$

Taking the absolute value in Equations (3.43) and (3.44) ensures the correct sign of $p_y(y)$. This is not a problem, since both $g(x)$ and $g^{-1}(y)$ are monotonic (otherwise they would not be invertible).
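To see Equation (3.44) in action, here is a minimal numerical sketch (my own illustration, not from the thesis; the choice $g(x) = e^x$ and all parameters are arbitrary). It propagates a standard normal density through the monotonic function $y = e^x$ and checks the analytically transformed pdf against a histogram of transformed samples.

```python
import numpy as np

# x ~ N(0,1) and y = g(x) = exp(x); g is monotonic, so g^{-1}(y) = log(y) exists.
def p_x(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def p_y(y):
    # Equation (3.44): p_y(y) = p_x(g^{-1}(y)) * |d g^{-1}(y)/dy| = p_x(log y) / y
    return p_x(np.log(y)) / y

rng = np.random.default_rng(0)
samples = np.exp(rng.standard_normal(100_000))

# Empirical density estimate from the samples, compared with the analytic pdf.
counts, edges = np.histogram(samples, bins=100, range=(0.01, 5.0))
width = edges[1] - edges[0]
density = counts / (samples.size * width)
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(density - p_y(centres))))  # small, up to sampling noise
```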

The extension to the multidimensional case with $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ is straightforward [100]; it is

$$ p_y(\mathbf{y}) = p_x\!\left(g^{-1}(\mathbf{y})\right) |J_{xy}|, \tag{3.45} $$

where $|J_{xy}| = |\partial \mathbf{x}/\partial \mathbf{y}|$ is the determinant of the Jacobian of the inverse transformation $\mathbf{x} = g^{-1}(\mathbf{y})$ with respect to $\mathbf{y}$.

The difficulty with the propagation of distributions is that we have to assume the existence of the inverse function. In particular, this means that usually no solution is possible for $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y} \in \mathbb{R}^m$, $m \neq n$, where, in general, no inverse transformation exists. This is, for example, the case with our example of the line through two points, since it is not possible to infer the positions of the points from the parameters of the line. The necessity of using the inverse function severely limits the usefulness of this way of propagating statistical properties. And even where Equations (3.44) or (3.45) can be applied, the resulting pdf will, in general, become arbitrarily complex.


3.3.2 Linear Case

None of these problems exist if we restrict ourselves to linear functions, i. e. functions of the form $\mathbf{y} = A\mathbf{x} + \mathbf{y}_0$ with $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y}, \mathbf{y}_0 \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. Linear functions will, in general, not change the particular type of distribution [100].

What is more, we have already seen from Equations (3.31) and (3.35) that

$$ E(\mathbf{y}) = E(A\mathbf{x} + \mathbf{b}) = A\,E(\mathbf{x}) + \mathbf{b} \tag{3.46} $$

$$ \Sigma_y = \Sigma_{A\mathbf{x}+\mathbf{b}} = A\,\Sigma_x A^{\mathsf{T}}, \tag{3.47} $$

and in the case of a multidimensional normal distribution, we can directly write down the new distribution.³
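As a quick numerical illustration of Equations (3.46) and (3.47), here is a minimal sketch; the matrices and moments below are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Example moments for x in R^3 and a linear map y = A x + b with y in R^2.
mu_x    = np.array([1.0, -2.0, 0.5])
Sigma_x = np.array([[0.5, 0.1, 0.0],
                    [0.1, 0.3, 0.0],
                    [0.0, 0.0, 0.2]])
A = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -1.0]])
b = np.array([0.5, 1.0])

# Equations (3.46) and (3.47): the moments transform exactly, no approximation.
mu_y    = A @ mu_x + b
Sigma_y = A @ Sigma_x @ A.T

# Sanity check against samples.
x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
y = x @ A.T + b
print(mu_y, y.mean(axis=0))              # should agree closely
print(Sigma_y, np.cov(y, rowvar=False))  # likewise
```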

Unfortunately, most interesting functions are not linear. It is therefore necessary to find a way to apply the above equations to an arbitrary, nonlinear function $g(\mathbf{x})$. This is usually done by approximating $g(\mathbf{x})$ by a linear function $f(\mathbf{x})$. Series expansion, and Taylor series expansion⁴ in particular, is generally used for this purpose. The next section gives a short introduction to Taylor series expansion and the resulting laws for the propagation of statistical properties.

3.3.3 Explicit Functions

Any $C^1$ function $\mathbf{y} = g(\mathbf{x})$ can be written as

$$ \mathbf{y} = g(\mathbf{x}_0 + \Delta\mathbf{x}) = g(\mathbf{x}_0) + J_{yx_0}\Delta\mathbf{x} + O_2(\|\Delta\mathbf{x}\|^2). \tag{3.48} $$

In the vicinity of $\mathbf{x}_0$ this can usually be approximated by a linear function $f(\mathbf{x})$ with

$$ \mathbf{y} = g(\mathbf{x}_0 + \Delta\mathbf{x}) \approx \bar{\mathbf{y}} = f(\mathbf{x}_0 + \Delta\mathbf{x}) = g(\mathbf{x}_0) + J_{yx_0}\Delta\mathbf{x} = \mathbf{y}_0 + J_{yx_0}\Delta\mathbf{x}, \tag{3.49} $$

where $J_{yx_0} = \left.\partial\mathbf{y}/\partial\mathbf{x}\right|_{\mathbf{x}_0}$ is the Jacobian of $g(\mathbf{x})$ with respect to $\mathbf{x}$ at the point $\mathbf{x}_0$. Note that the statistical properties are now associated with $\Delta\mathbf{x}$ instead of $\mathbf{x}$; it is

$$ E(\Delta\mathbf{x}) = E(\mathbf{x} - \mathbf{x}_0) = E(\mathbf{x}) - E(\mathbf{x}_0) = \boldsymbol{\mu}_x - \mathbf{x}_0 \tag{3.50} $$

$$ \Sigma_{\Delta x} = \Sigma_{x - x_0} = \Sigma_x. \tag{3.51} $$

³ It is possible to derive similar equations for higher-order central moments, independent of the distribution [100].

⁴ It should be noted that a Taylor series will not necessarily converge, and even if it does, it will not necessarily converge towards the function $g(\mathbf{x})$ it is meant to represent, although it normally does.


[Figure 3.1: Bias and deformation of the pdf due to nonlinearities. The true transformed pdf (grey) is deformed and also slightly offset with respect to the linear approximation (black). The plot shows the function $g(x)$ and its linearisation $f(x)$ over the $x$–$y$ plane.]

The linearisation will usually introduce an error. In order to keep this error small, it is customary to set $\mathbf{x}_0 = \boldsymbol{\mu}_x$ and therefore $E(\Delta\mathbf{x}) = \mathbf{0}$. From there and Equation (3.49) we can calculate the mean and variance of the linearised function $\bar{\mathbf{y}} = f(\mathbf{x})$ as

$$ E(\bar{\mathbf{y}}) = E(\mathbf{y}_0 + J_{yx_0}\Delta\mathbf{x}) = E(\mathbf{y}_0) + J_{yx_0}E(\Delta\mathbf{x}) = \mathbf{y}_0 = \boldsymbol{\mu}_{\bar y} \tag{3.52} $$

$$ \Sigma_{\bar y} = \Sigma_{\mathbf{y}_0 + J_{yx_0}\Delta\mathbf{x}} = J_{yx_0}\Sigma_{\Delta x}J_{yx_0}^{\mathsf{T}} = J_{yx_0}\Sigma_x J_{yx_0}^{\mathsf{T}} \tag{3.53} $$

with $\mathbf{y}_0 = g(\mathbf{x}_0)$. Note that, in general, $\boldsymbol{\mu}_y \neq \boldsymbol{\mu}_{\bar y} = g(\boldsymbol{\mu}_x)$; that is, the linearisation introduces a bias. This is always the case if $g(\mathbf{x})$ is not well approximated by its tangent within the region of dispersion of its random variables, as seen in Figure 3.1.

This bias will always inflate the estimated covariance matrix, while the truncation of the Taylor series could either increase or decrease the result [29].
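To make Equations (3.52) and (3.53) and this bias concrete, here is a small sketch; the polar-to-Cartesian map and all numbers are my own illustrative choices, not from the thesis.

```python
import numpy as np

def g(x):
    # Nonlinear example function: polar (r, theta) -> Cartesian (x, y).
    r, theta = x[..., 0], x[..., 1]
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

def jacobian(x0):
    r, theta = x0
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])

mu_x    = np.array([1.0, 0.0])    # linearise at x0 = mu_x, so E(dx) = 0
Sigma_x = np.diag([0.01, 0.25])   # large angular noise -> visible bias

# Equations (3.52) and (3.53): first-order mean and covariance.
y0      = g(mu_x)
J       = jacobian(mu_x)
Sigma_y = J @ Sigma_x @ J.T

# Monte Carlo reference: the true mean is pulled away from g(mu_x).
rng = np.random.default_rng(2)
y   = g(rng.multivariate_normal(mu_x, Sigma_x, size=200_000))
print("g(mu_x)     :", y0)
print("sample mean :", y.mean(axis=0))  # differs from y0: linearisation bias
print("first-order Sigma_y:\n", Sigma_y)
print("sample Sigma_y:\n", np.cov(y, rowvar=False))
```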

Just how good or bad this approximation is within a given region $\|\mathbf{x} - \mathbf{x}_0\|_2 \leq \Delta x$ can be computed by calculating an upper bound on the remainder $O_2(\|\Delta\mathbf{x}\|^2)$. Four different forms for $x, g(x) \in \mathbb{R}$ are given in [65], and these are easily extended to $\mathbf{x} \in \mathbb{R}^n$; compare e. g. [22, p. 279]. In this thesis, however, I will only deal with first-order approximations.

It is worth noting that, although the application of Equation (3.52) and in particular Equation (3.53) can result in extremely complex expressions, generating and using these expressions is a purely mechanical task which can in theory⁵ be done by any computer algebra program. For our example of a line $\boldsymbol{\ell} \in \mathbb{R}^3$ through two points $\mathbf{p}_1, \mathbf{p}_2 \in \mathbb{R}^3$ this can be done as follows.

Writing the line as the cross product of the two (homogeneous) points, the Jacobian is $J_\ell = (\mathbf{p}_{2\times}, \mathbf{p}_{1\times})$. If we know that the two points are normally distributed with covariance matrices $\Sigma_{p_1}, \Sigma_{p_2} \in \mathbb{R}^{3 \times 3}$, we can calculate the line's covariance matrix as

$$ \Sigma_\ell = (\mathbf{p}_{2\times}, \mathbf{p}_{1\times}) \begin{pmatrix} \Sigma_{p_1} & 0 \\ 0 & \Sigma_{p_2} \end{pmatrix} \begin{pmatrix} \mathbf{p}_{2\times}^{\mathsf{T}} \\ \mathbf{p}_{1\times}^{\mathsf{T}} \end{pmatrix} $$

using Equation (3.53). The mean is the line $\boldsymbol{\ell}$ itself. Note that the covariance matrices are of course all singular, since both points and lines only have 2 DOF each. This will be discussed in more detail in Section 4.
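A minimal numpy sketch of this computation (my own illustration; it assumes the standard homogeneous construction $\boldsymbol{\ell} = \mathbf{p}_1 \times \mathbf{p}_2$, and the point coordinates and covariances are invented):

```python
import numpy as np

def skew(p):
    """Skew-symmetric matrix p_x with skew(p) @ q == np.cross(p, q)."""
    return np.array([[ 0.0,  -p[2],  p[1]],
                     [ p[2],  0.0,  -p[0]],
                     [-p[1],  p[0],  0.0]])

# Two homogeneous image points (illustrative values); the third coordinate is
# treated as noise-free here, hence the rank-2 point covariances.
p1 = np.array([1.0, 2.0, 1.0])
p2 = np.array([4.0, 3.0, 1.0])
Sigma_p1 = np.diag([0.01, 0.01, 0.0])
Sigma_p2 = np.diag([0.02, 0.02, 0.0])

l = np.cross(p1, p2)                      # the mean: the line itself

# Equation (3.53) with the block-diagonal covariance of (p1, p2);
# for l = p1 x p2 the Jacobian is (-skew(p2), skew(p1)).
J = np.hstack([-skew(p2), skew(p1)])
Sigma_x = np.block([[Sigma_p1, np.zeros((3, 3))],
                    [np.zeros((3, 3)), Sigma_p2]])
Sigma_l = J @ Sigma_x @ J.T

print(l)
print(Sigma_l)                            # covariance of the homogeneous line
```

Note that whether the resulting $3 \times 3$ covariance matrix is exactly singular depends on the chosen normalisation of the homogeneous vectors; the thesis's statement refers to its own normalisation, discussed in Section 4.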

3.3.4 Implicit Functions

In practice, a result $\mathbf{y}_0 \in \mathbb{R}^m$ is often not calculated by an explicit function, but rather found as the set of parameters which extremises some function of the original data, i. e. the function $g(\mathbf{x})$ is not explicitly known (and its Jacobian therefore cannot be calculated as usual). What is known instead is a cost function $C(\mathbf{x}, \mathbf{y}) \in \mathbb{R}$ which we are trying to minimise. The necessary condition for an extremum is $\left.\partial C(\mathbf{x}, \mathbf{y})/\partial \mathbf{y}\right|_{\mathbf{x}_0} = \mathbf{0}$, and the implicit function theorem (compare [29, 43]) gives us the Jacobian $J_{y_0 x}$ of the unknown function $\mathbf{y} = g(\mathbf{x})$ as

$$ J_{y_0 x} = -\left(\frac{\partial^2 C}{\partial \mathbf{y}^2}\right)^{-1} \frac{\partial^2 C}{\partial \mathbf{y}\,\partial \mathbf{x}}, $$

which can then be used in Equation (3.53) as before. Using the idea of Lagrange multipliers, we can extend the above even further to constrained minimisation; see [29, 43] for more details.
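The following toy check (entirely my own construction) illustrates this result: for the linear least-squares cost $C(\mathbf{x}, y) = \sum_i (y\,t_i - x_i)^2$, the minimising slope has the closed-form solution $y = \mathbf{t}^{\mathsf{T}}\mathbf{x} / \mathbf{t}^{\mathsf{T}}\mathbf{t}$, so the Jacobian obtained via the implicit function theorem can be compared against the Jacobian of the explicit solution.

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0])             # fixed abscissae (toy example)
x = np.array([1.1, 1.9, 3.2])             # measured data

# Cost C(x, y) = sum_i (y*t_i - x_i)^2, minimised over the scalar slope y.
y_hat = (t @ x) / (t @ t)                 # explicit minimiser, for reference

# Implicit-function Jacobian: J = -(d2C/dy2)^{-1} * d2C/(dy dx).
d2C_dy2  = 2.0 * (t @ t)                  # scalar Hessian in y
d2C_dydx = -2.0 * t                       # mixed second derivatives (1 x n)
J_implicit = -d2C_dydx / d2C_dy2          # = t / (t @ t)

# Jacobian of the explicit solution y = (t.x)/(t.t), for comparison.
J_explicit = t / (t @ t)
print(np.allclose(J_implicit, J_explicit))   # True

# Error propagation, Equation (3.53), for the estimated slope:
Sigma_x  = 0.01 * np.eye(3)
sigma2_y = J_implicit @ Sigma_x @ J_implicit
print(sigma2_y)
```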

⁵ I have yet to come across a computer algebra program which produces C code that will not happily divide by zero or take the square root of a negative argument.


Note that, using Equation (3.53), a result's uncertainty can be calculated even if the result itself has not been obtained as the solution of some optimal algorithm minimising both the error and uncertainty of a particular calculation (a problem which often has no closed-form solution), but rather by some faster but less accurate algorithm. However, using a faster, closed-form solution might introduce a considerable bias and blow up $\Sigma_y$, and the selection of a function $g(\mathbf{x})$ that is both fast and accurate enough can become somewhat of an art form. Section 4 practises this art for a number of common computer-vision constructs, mainly the ones introduced in Section 2.

3.3.5 Monte-Carlo Simulations

A second method for the propagation of statistical properties, completely different from the analytical method given above, is Monte Carlo simulation.

The basic idea is simple: given a function $\mathbf{y} = g(\mathbf{x})$ and a vector $\mathbf{x}$ (assumed perfectly known), a large number of corrupted vectors $\mathbf{x}_i = \mathbf{x} + \mathbf{r}_i$ is created, where the $\mathbf{r}_i$ are distributed according to the measurement error's pdf $p_{\Delta x}$ (assumed known). This large population is then used to estimate the pdf $p_y$ of the samples $\mathbf{y}_i = g(\mathbf{x}_i)$. In particular, the mean $\boldsymbol{\mu}_y$ and covariance matrix $\Sigma_y$ can be approximated by the sample mean $\bar{\mathbf{y}}$ and sample covariance matrix $S_y$ using Equations (3.37) and (3.38). Note that $g(\mathbf{x})$ does not need to be given explicitly; the $\mathbf{y}_i$ could just as well have been found by minimisation or any other technique.
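A minimal Monte Carlo sketch of this procedure (the function, noise level, and sample count are arbitrary choices of mine), reusing the polar-to-Cartesian example from Section 3.3.3:

```python
import numpy as np

def g(x):
    # Example nonlinear function (polar to Cartesian); could equally well be
    # the result of a minimisation or any other black-box computation.
    r, theta = x[..., 0], x[..., 1]
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

x_true  = np.array([1.0, 0.0])            # assumed perfectly known
Sigma_x = np.diag([0.01, 0.25])           # measurement error pdf (here normal)

rng = np.random.default_rng(3)
n_samples = 100_000                       # 10^4 .. 10^6 samples are typical
x_i = x_true + rng.multivariate_normal(np.zeros(2), Sigma_x, size=n_samples)
y_i = g(x_i)

# Sample mean and sample covariance, Equations (3.37) and (3.38),
# as estimates of mu_y and Sigma_y.
mean_y = y_i.mean(axis=0)
S_y    = np.cov(y_i, rowvar=False)
print(mean_y)
print(S_y)
```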

The quality of a Monte Carlo simulation relies only on the number of samples (between 10,000 and 1,000,000 samples are not unusual) and the quality of the pseudo-random number generator, which should have a period at least 10 times greater than the number of samples required [29].

The two methods, analytical first-order error propagation and Monte Carlo simulation, complement each other. First-order error propagation is a fast and, for sufficiently small errors, reliable method which gives an analytical, closed-form solution for $p_y$. However, the equations used can become unwieldy, and it relies on the goodness of the linear approximation (which could in theory be assessed by calculating an upper bound for the remainder $O_2(\|\Delta\mathbf{x}\|^2)$, compare Section 3.3.3).

Monte Carlo simulation, on the other hand, makes no assumptions about the function $g(\mathbf{x})$ or the resulting pdf $p_y$ and is easy to program; these advantages are offset by an extremely long execution time (several minutes or even hours). Also, the result of a Monte Carlo simulation is just a large number of points $\mathbf{y}_i$ instead of a closed-form probability density function.


In practice, Monte Carlo simulation is therefore often used to assess the goodness of the analytical solution found by first-order error propagation; either visually, by plotting both the points found by the Monte Carlo simulation and the confidence regions found by first-order error propagation, or analytically, where for example the $\chi^2$ distribution described in the next section is used to compute the probability that the points $\mathbf{y}_i$ follow the distribution $p_y$.
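A sketch of such an analytic comparison (a simplified consistency check of my own, not the thesis's exact procedure): if the $\mathbf{y}_i$ really follow $N(\boldsymbol{\mu}_y, \Sigma_y)$, their squared Mahalanobis distances are $\chi^2$-distributed with $m$ degrees of freedom, so the fraction of points inside e.g. the 95% confidence region should be close to 0.95.

```python
import numpy as np
from scipy.stats import chi2

def fraction_inside(y_i, mu_y, Sigma_y, level=0.95):
    """Fraction of samples inside the `level` confidence region of N(mu_y, Sigma_y)."""
    d   = y_i - mu_y
    # Squared Mahalanobis distances; chi^2 with m DOF under the model.
    md2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma_y), d)
    return np.mean(md2 <= chi2.ppf(level, df=len(mu_y)))

# Self-test with exactly normal samples: the fraction should be close to 0.95.
rng     = np.random.default_rng(4)
mu_y    = np.array([1.0, 0.0])
Sigma_y = np.array([[0.02,  0.005],
                    [0.005, 0.25]])
y_i = rng.multivariate_normal(mu_y, Sigma_y, size=100_000)
print(fraction_inside(y_i, mu_y, Sigma_y))   # ~0.95

# With y_i from a Monte Carlo run and (mu_y, Sigma_y) from first-order error
# propagation, a fraction far from 0.95 signals a poor linearisation.
```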
