
Error Propagation


The idea of error propagation (or propagation of statistical properties) is the following: given a vector of measurements $\mathbf{x} \in \mathbb{R}^n$ with pdf $p_x$, and a derived vector $\mathbf{y} \in \mathbb{R}^m$ such that

$$ g: \mathbf{x} \mapsto \mathbf{y} = g(\mathbf{x}), \tag{3.40} $$

find the pdf $p_y$ for $\mathbf{y}$ depending on $\mathbf{x}$ and $p_x$.

An example might make this more transparent: calculating the line $\boldsymbol{\ell}$ passing through the two points $\mathbf{p}_1$ and $\mathbf{p}_2$. Displacing one or both of the points will generally change the position of the line. So if we know that the two points (which could, e. g., be measured image coordinates) are random variables with probability distributions $p_{p_1}$ and $p_{p_2}$, it is natural to ask what the resulting line's probability distribution $p_\ell$ will be.


3.3.1 Principle

For simplicity, let us consider the one-dimensional case first, i. e. random variables $\mathbf{x} = x \in \mathbb{R}$ and $\mathbf{y} = y \in \mathbb{R}$. We would expect that the probability for an event $x$ to fall into a small region $\mathrm{d}x$ around $x$ should be equal to the probability of the event $y$ falling into the corresponding region $\mathrm{d}y$ around $y$, in the limit $\mathrm{d}x \to 0$.

This can be written as

$$ p_x(x)\,|\mathrm{d}x| = p_y(y)\,|\mathrm{d}y|. \tag{3.41} $$

If we assume that the inverse function

$$ g^{-1}: y \mapsto x = g^{-1}(y) \tag{3.42} $$

is defined, we can write

$$ p_x(x)\,|\mathrm{d}x| = p_x\!\left(g^{-1}(y)\right) \left|\frac{\partial g^{-1}(y)}{\partial y}\right| |\mathrm{d}y| = p_y(y)\,|\mathrm{d}y| \tag{3.43} $$

or

$$ p_y(y) = p_x\!\left(g^{-1}(y)\right) \left|\frac{\partial g^{-1}(y)}{\partial y}\right|. \tag{3.44} $$

Taking the absolute value in Equations (3.43) and (3.44) ensures the correct sign of $p_y(y)$. This is not a problem, since both $g(x)$ and $g^{-1}(y)$ are monotonic (otherwise they would not be invertible).
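To see Equation (3.44) in action, here is a minimal numerical sketch (my own illustration, not from the thesis; the choice $g(x) = e^x$ and all parameters are arbitrary). It propagates a standard normal density through the monotonic function $y = e^x$ and checks the analytically transformed pdf against a histogram of transformed samples.

```python
import numpy as np

# x ~ N(0,1) and y = g(x) = exp(x); g is monotonic, so g^{-1}(y) = log(y) exists.
def p_x(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def p_y(y):
    # Equation (3.44): p_y(y) = p_x(g^{-1}(y)) * |d g^{-1}(y)/dy| = p_x(log y) / y
    return p_x(np.log(y)) / y

rng = np.random.default_rng(0)
samples = np.exp(rng.standard_normal(100_000))

# Empirical density estimate from the samples, compared with the analytic pdf.
counts, edges = np.histogram(samples, bins=100, range=(0.01, 5.0))
width = edges[1] - edges[0]
density = counts / (samples.size * width)
centres = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(density - p_y(centres))))  # small, up to sampling noise
```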

The extension to the multidimensional case with $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ is straightforward [100]; it is

$$ p_y(\mathbf{y}) = p_x\!\left(g^{-1}(\mathbf{y})\right) |J_{xy}|, \tag{3.45} $$

where $|J_{xy}| = |\partial \mathbf{x}/\partial \mathbf{y}|$ is the determinant of the Jacobian of the inverse transformation $\mathbf{x} = g^{-1}(\mathbf{y})$ with respect to $\mathbf{y}$.

The difficulty with the propagation of distributions is that we have to assume the existence of the inverse function. In particular, this means that usually no solution is possible for $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y} \in \mathbb{R}^m$, $m \neq n$, where, in general, no inverse transformation exists. This is, for example, the case with our example of the line through two points, since it is not possible to infer the positions of the points from the parameters of the line. The necessity of using the inverse function severely limits the usefulness of this way of propagating statistical properties. And even where Equations (3.44) or (3.45) can be applied, the resulting pdf will, in general, become arbitrarily complex.


3.3.2 Linear Case

None of these problems exist if we restrict ourselves to linear functions, i. e. functions of the form $\mathbf{y} = A\mathbf{x} + \mathbf{y}_0$ with $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{y}, \mathbf{y}_0 \in \mathbb{R}^m$, and $A \in \mathbb{R}^{m \times n}$. Linear functions will, in general, not change the particular type of distribution [100].

What is more, we have already seen from Equations (3.31) and (3.35) that

$$ E(\mathbf{y}) = E(A\mathbf{x} + \mathbf{b}) = A\,E(\mathbf{x}) + \mathbf{b} \tag{3.46} $$

$$ \Sigma_y = \Sigma_{A\mathbf{x}+\mathbf{b}} = A\,\Sigma_x A^{\mathsf{T}}, \tag{3.47} $$

and in the case of a multidimensional normal distribution, we can directly write down the new distribution.³
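As a quick numerical illustration of Equations (3.46) and (3.47), here is a minimal sketch; the matrices and moments below are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(1)

# Example moments for x in R^3 and a linear map y = A x + b with y in R^2.
mu_x    = np.array([1.0, -2.0, 0.5])
Sigma_x = np.array([[0.5, 0.1, 0.0],
                    [0.1, 0.3, 0.0],
                    [0.0, 0.0, 0.2]])
A = np.array([[1.0, 2.0,  0.0],
              [0.0, 1.0, -1.0]])
b = np.array([0.5, 1.0])

# Equations (3.46) and (3.47): the moments transform exactly, no approximation.
mu_y    = A @ mu_x + b
Sigma_y = A @ Sigma_x @ A.T

# Sanity check against samples.
x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
y = x @ A.T + b
print(mu_y, y.mean(axis=0))              # should agree closely
print(Sigma_y, np.cov(y, rowvar=False))  # likewise
```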

Unfortunately, most interesting functions are not linear. It is therefore necessary to find a way to apply the above equations to an arbitrary, nonlinear function $g(\mathbf{x})$. This is usually done by approximating $g(\mathbf{x})$ by a linear function $f(\mathbf{x})$. Series expansion, and Taylor series expansion⁴ in particular, is generally used for this purpose. The next section gives a short introduction to Taylor series expansion and the resulting laws for the propagation of statistical properties.

3.3.3 Explicit Functions

Any $C^1$ function $\mathbf{y} = g(\mathbf{x})$ can be written as

$$ \mathbf{y} = g(\mathbf{x}_0 + \Delta\mathbf{x}) = g(\mathbf{x}_0) + J_{yx_0}\Delta\mathbf{x} + O_2(\|\Delta\mathbf{x}\|^2). \tag{3.48} $$

In the vicinity of $\mathbf{x}_0$ this can usually be approximated by a linear function $f(\mathbf{x})$ with

$$ \mathbf{y} = g(\mathbf{x}_0 + \Delta\mathbf{x}) \approx \bar{\mathbf{y}} = f(\mathbf{x}_0 + \Delta\mathbf{x}) = g(\mathbf{x}_0) + J_{yx_0}\Delta\mathbf{x} = \mathbf{y}_0 + J_{yx_0}\Delta\mathbf{x}, \tag{3.49} $$

where $J_{yx_0} = \left.\partial\mathbf{y}/\partial\mathbf{x}\right|_{\mathbf{x}_0}$ is the Jacobian of $g(\mathbf{x})$ with respect to $\mathbf{x}$ at the point $\mathbf{x}_0$. Note that the statistical properties are now associated with $\Delta\mathbf{x}$ instead of $\mathbf{x}$; it is

$$ E(\Delta\mathbf{x}) = E(\mathbf{x} - \mathbf{x}_0) = E(\mathbf{x}) - E(\mathbf{x}_0) = \boldsymbol{\mu}_x - \mathbf{x}_0 \tag{3.50} $$

$$ \Sigma_{\Delta x} = \Sigma_{x - x_0} = \Sigma_x. \tag{3.51} $$

³ It is possible to derive similar equations for higher-order central moments, independent of the distribution [100].

⁴ It should be noted that a Taylor series will not necessarily converge, and even if it does, it will not necessarily converge towards the function $g(\mathbf{x})$ it is meant to represent, although it normally does.


[Figure 3.1: Bias and deformation of the pdf due to nonlinearities. The true transformed pdf (grey) is deformed and also slightly offset with respect to the linear approximation (black). The plot shows the function $g(x)$ and its linearisation $f(x)$ over the $x$–$y$ plane.]

The linearisation will usually introduce an error. In order to keep this error small, it is customary to set $\mathbf{x}_0 = \boldsymbol{\mu}_x$ and therefore $E(\Delta\mathbf{x}) = \mathbf{0}$. From there and Equation (3.49) we can calculate the mean and variance of the linearised function $\bar{\mathbf{y}} = f(\mathbf{x})$ as

$$ E(\bar{\mathbf{y}}) = E(\mathbf{y}_0 + J_{yx_0}\Delta\mathbf{x}) = E(\mathbf{y}_0) + J_{yx_0}E(\Delta\mathbf{x}) = \mathbf{y}_0 = \boldsymbol{\mu}_{\bar y} \tag{3.52} $$

$$ \Sigma_{\bar y} = \Sigma_{\mathbf{y}_0 + J_{yx_0}\Delta\mathbf{x}} = J_{yx_0}\Sigma_{\Delta x}J_{yx_0}^{\mathsf{T}} = J_{yx_0}\Sigma_x J_{yx_0}^{\mathsf{T}} \tag{3.53} $$

with $\mathbf{y}_0 = g(\mathbf{x}_0)$. Note that, in general, $\boldsymbol{\mu}_y \neq \boldsymbol{\mu}_{\bar y} = g(\boldsymbol{\mu}_x)$; that is, the linearisation introduces a bias. This is always the case if $g(\mathbf{x})$ is not well approximated by its tangent within the region of dispersion of its random variables, as seen in Figure 3.1.

This bias will always inflate the estimated covariance matrix, while the truncation of the Taylor series could either increase or decrease the result [29].
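To make Equations (3.52) and (3.53) and this bias concrete, here is a small sketch; the polar-to-Cartesian map and all numbers are my own illustrative choices, not from the thesis.

```python
import numpy as np

def g(x):
    # Nonlinear example function: polar (r, theta) -> Cartesian (x, y).
    r, theta = x[..., 0], x[..., 1]
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

def jacobian(x0):
    r, theta = x0
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])

mu_x    = np.array([1.0, 0.0])    # linearise at x0 = mu_x, so E(dx) = 0
Sigma_x = np.diag([0.01, 0.25])   # large angular noise -> visible bias

# Equations (3.52) and (3.53): first-order mean and covariance.
y0      = g(mu_x)
J       = jacobian(mu_x)
Sigma_y = J @ Sigma_x @ J.T

# Monte Carlo reference: the true mean is pulled away from g(mu_x).
rng = np.random.default_rng(2)
y   = g(rng.multivariate_normal(mu_x, Sigma_x, size=200_000))
print("g(mu_x)     :", y0)
print("sample mean :", y.mean(axis=0))  # differs from y0: linearisation bias
print("first-order Sigma_y:\n", Sigma_y)
print("sample Sigma_y:\n", np.cov(y, rowvar=False))
```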

Just how good or bad this approximation is within a given region $\|\mathbf{x} - \mathbf{x}_0\|_2 \leq \Delta x$ can be computed by calculating an upper bound on the remainder $O_2(\|\Delta\mathbf{x}\|^2)$. Four different forms for $x, g(x) \in \mathbb{R}$ are given in [65], and these are easily extended to $\mathbf{x} \in \mathbb{R}^n$; compare e. g. [22, p. 279]. In this thesis, however, I will only deal with first-order approximations.

It is worth noting that, although the application of Equation (3.52) and in particular Equation (3.53) can result in extremely complex expressions, generating and using these expressions is a purely mechanical task which can in theory⁵ be done by any computer algebra program. For our example of a line $\boldsymbol{\ell} \in \mathbb{R}^3$ through two points $\mathbf{p}_1, \mathbf{p}_2 \in \mathbb{R}^3$ this can be done as follows.

Writing the line as the cross product of the two (homogeneous) points, the Jacobian is $J_\ell = (\mathbf{p}_{2\times}, \mathbf{p}_{1\times})$. If we know that the two points are normally distributed with covariance matrices $\Sigma_{p_1}, \Sigma_{p_2} \in \mathbb{R}^{3 \times 3}$, we can calculate the line's covariance matrix as

$$ \Sigma_\ell = (\mathbf{p}_{2\times}, \mathbf{p}_{1\times}) \begin{pmatrix} \Sigma_{p_1} & 0 \\ 0 & \Sigma_{p_2} \end{pmatrix} \begin{pmatrix} \mathbf{p}_{2\times}^{\mathsf{T}} \\ \mathbf{p}_{1\times}^{\mathsf{T}} \end{pmatrix} $$

using Equation (3.53). The mean is the line $\boldsymbol{\ell}$ itself. Note that the covariance matrices are of course all singular, since both points and lines only have 2 DOF each. This will be discussed in more detail in Section 4.
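A minimal numpy sketch of this computation (my own illustration; it assumes the standard homogeneous construction $\boldsymbol{\ell} = \mathbf{p}_1 \times \mathbf{p}_2$, and the point coordinates and covariances are invented):

```python
import numpy as np

def skew(p):
    """Skew-symmetric matrix p_x with skew(p) @ q == np.cross(p, q)."""
    return np.array([[ 0.0,  -p[2],  p[1]],
                     [ p[2],  0.0,  -p[0]],
                     [-p[1],  p[0],  0.0]])

# Two homogeneous image points (illustrative values); the third coordinate is
# treated as noise-free here, hence the rank-2 point covariances.
p1 = np.array([1.0, 2.0, 1.0])
p2 = np.array([4.0, 3.0, 1.0])
Sigma_p1 = np.diag([0.01, 0.01, 0.0])
Sigma_p2 = np.diag([0.02, 0.02, 0.0])

l = np.cross(p1, p2)                      # the mean: the line itself

# Equation (3.53) with the block-diagonal covariance of (p1, p2);
# for l = p1 x p2 the Jacobian is (-skew(p2), skew(p1)).
J = np.hstack([-skew(p2), skew(p1)])
Sigma_x = np.block([[Sigma_p1, np.zeros((3, 3))],
                    [np.zeros((3, 3)), Sigma_p2]])
Sigma_l = J @ Sigma_x @ J.T

print(l)
print(Sigma_l)                            # covariance of the homogeneous line
```

Note that whether the resulting $3 \times 3$ covariance matrix is exactly singular depends on the chosen normalisation of the homogeneous vectors; the thesis's statement refers to its own normalisation, discussed in Section 4.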

3.3.4 Implicit Functions

In practice, a result $\mathbf{y}_0 \in \mathbb{R}^m$ is often not calculated by an explicit function, but rather found as the set of parameters which extremises some function of the original data, i. e. the function $g(\mathbf{x})$ is not explicitly known (and its Jacobian therefore cannot be calculated as usual). What is known instead is a cost function $C(\mathbf{x}, \mathbf{y}) \in \mathbb{R}$ which we are trying to minimise. The necessary condition for an extremum is $\left.\partial C(\mathbf{x}, \mathbf{y})/\partial \mathbf{y}\right|_{\mathbf{x}_0} = \mathbf{0}$, and the implicit function theorem (compare [29, 43]) gives us the Jacobian $J_{y_0 x}$ of the unknown function $\mathbf{y} = g(\mathbf{x})$ as

$$ J_{y_0 x} = -\left(\frac{\partial^2 C}{\partial \mathbf{y}^2}\right)^{-1} \frac{\partial^2 C}{\partial \mathbf{y}\,\partial \mathbf{x}}, $$

which can then be used in Equation (3.53) as before. Using the idea of Lagrange multipliers, we can extend the above even further to constrained minimisation; see [29, 43] for more details.
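The following toy check (entirely my own construction) illustrates this result: for the linear least-squares cost $C(\mathbf{x}, y) = \sum_i (y\,t_i - x_i)^2$, the minimising slope has the closed-form solution $y = \mathbf{t}^{\mathsf{T}}\mathbf{x} / \mathbf{t}^{\mathsf{T}}\mathbf{t}$, so the Jacobian obtained via the implicit function theorem can be compared against the Jacobian of the explicit solution.

```python
import numpy as np

t = np.array([1.0, 2.0, 3.0])             # fixed abscissae (toy example)
x = np.array([1.1, 1.9, 3.2])             # measured data

# Cost C(x, y) = sum_i (y*t_i - x_i)^2, minimised over the scalar slope y.
y_hat = (t @ x) / (t @ t)                 # explicit minimiser, for reference

# Implicit-function Jacobian: J = -(d2C/dy2)^{-1} * d2C/(dy dx).
d2C_dy2  = 2.0 * (t @ t)                  # scalar Hessian in y
d2C_dydx = -2.0 * t                       # mixed second derivatives (1 x n)
J_implicit = -d2C_dydx / d2C_dy2          # = t / (t @ t)

# Jacobian of the explicit solution y = (t.x)/(t.t), for comparison.
J_explicit = t / (t @ t)
print(np.allclose(J_implicit, J_explicit))   # True

# Error propagation, Equation (3.53), for the estimated slope:
Sigma_x  = 0.01 * np.eye(3)
sigma2_y = J_implicit @ Sigma_x @ J_implicit
print(sigma2_y)
```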

⁵ I have yet to come across a computer algebra program which produces C code that will not happily divide by zero or take the square root of a negative argument.


Note that, using Equation (3.53), a result's uncertainty can be calculated even if the result itself has not been obtained as the solution of some optimal algorithm minimising both the error and uncertainty of a particular calculation (a problem which often has no closed-form solution), but rather by some faster but less accurate algorithm. However, using a faster, closed-form solution might introduce a considerable bias and blow up $\Sigma_y$, and the selection of a function $g(\mathbf{x})$ that is both fast and accurate enough can become somewhat of an art form. Section 4 practises this art for a number of common computer-vision constructs, mainly the ones introduced in Section 2.

3.3.5 Monte-Carlo Simulations

A second method for the propagation of statistical properties, completely different from the analytical method given above, is Monte Carlo simulation.

The basic idea is simple: given a function $\mathbf{y} = g(\mathbf{x})$ and a vector $\mathbf{x}$ (assumed perfectly known), a large number of corrupted vectors $\mathbf{x}_i = \mathbf{x} + \mathbf{r}_i$ is created, where the $\mathbf{r}_i$ are distributed according to the measurement error's pdf $p_{\Delta x}$ (assumed known). This large population is then used to estimate the pdf $p_y$ of the samples $\mathbf{y}_i = g(\mathbf{x}_i)$. In particular, the mean $\boldsymbol{\mu}_y$ and covariance matrix $\Sigma_y$ can be approximated by the sample mean $\bar{\mathbf{y}}$ and sample covariance matrix $S_y$ using Equations (3.37) and (3.38). Note that $g(\mathbf{x})$ does not need to be given explicitly; the $\mathbf{y}_i$ could just as well have been found by minimisation or any other technique.
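A minimal Monte Carlo sketch of this procedure (the function, noise level, and sample count are arbitrary choices of mine), reusing the polar-to-Cartesian example from Section 3.3.3:

```python
import numpy as np

def g(x):
    # Example nonlinear function (polar to Cartesian); could equally well be
    # the result of a minimisation or any other black-box computation.
    r, theta = x[..., 0], x[..., 1]
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

x_true  = np.array([1.0, 0.0])            # assumed perfectly known
Sigma_x = np.diag([0.01, 0.25])           # measurement error pdf (here normal)

rng = np.random.default_rng(3)
n_samples = 100_000                       # 10^4 .. 10^6 samples are typical
x_i = x_true + rng.multivariate_normal(np.zeros(2), Sigma_x, size=n_samples)
y_i = g(x_i)

# Sample mean and sample covariance, Equations (3.37) and (3.38),
# as estimates of mu_y and Sigma_y.
mean_y = y_i.mean(axis=0)
S_y    = np.cov(y_i, rowvar=False)
print(mean_y)
print(S_y)
```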

The quality of a Monte Carlo simulation relies only on the number of samples (between 10,000 and 1,000,000 samples are not unusual) and the quality of the pseudo-random number generator, which should have a period at least 10 times greater than the number of samples required [29].

The two methods, analytical first-order error propagation and Monte Carlo simulation, complement each other. First-order error propagation is a fast and, for sufficiently small errors, reliable method which gives an analytical, closed-form solution for $p_y$. However, the equations used can become unwieldy, and it relies on the goodness of the linear approximation (which could in theory be assessed by calculating an upper bound for the remainder $O_2(\|\Delta\mathbf{x}\|^2)$, compare Section 3.3.3).

Monte Carlo simulation, on the other hand, makes no assumptions about the function $g(\mathbf{x})$ or the resulting pdf $p_y$ and is easy to program; these advantages are offset by an extremely long execution time (several minutes or even hours). Also, the result of a Monte Carlo simulation is just a large number of points $\mathbf{y}_i$ instead of a closed-form probability density function.


In practice, Monte Carlo simulation is therefore often used to assess the goodness of the analytical solution found by first-order error propagation; either visually, by plotting both the points found by the Monte Carlo simulation and the confidence regions found by first-order error propagation, or analytically, where for example the $\chi^2$ distribution described in the next section is used to compute the probability that the points $\mathbf{y}_i$ follow the distribution $p_y$.
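A sketch of such an analytic comparison (a simplified consistency check of my own, not the thesis's exact procedure): if the $\mathbf{y}_i$ really follow $N(\boldsymbol{\mu}_y, \Sigma_y)$, their squared Mahalanobis distances are $\chi^2$-distributed with $m$ degrees of freedom, so the fraction of points inside e.g. the 95% confidence region should be close to 0.95.

```python
import numpy as np
from scipy.stats import chi2

def fraction_inside(y_i, mu_y, Sigma_y, level=0.95):
    """Fraction of samples inside the `level` confidence region of N(mu_y, Sigma_y)."""
    d   = y_i - mu_y
    # Squared Mahalanobis distances; chi^2 with m DOF under the model.
    md2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma_y), d)
    return np.mean(md2 <= chi2.ppf(level, df=len(mu_y)))

# Self-test with exactly normal samples: the fraction should be close to 0.95.
rng     = np.random.default_rng(4)
mu_y    = np.array([1.0, 0.0])
Sigma_y = np.array([[0.02,  0.005],
                    [0.005, 0.25]])
y_i = rng.multivariate_normal(mu_y, Sigma_y, size=100_000)
print(fraction_inside(y_i, mu_y, Sigma_y))   # ~0.95

# With y_i from a Monte Carlo run and (mu_y, Sigma_y) from first-order error
# propagation, a fraction far from 0.95 signals a poor linearisation.
```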
