Remote Sensing of Atmospheric State Variables:

(1)

IMK-ASF-SAT

Remote Sensing of Atmospheric State Variables:

An Introduction

Thomas von Clarmann DATAR

CC BY-NC 4.0

(2)

In some places, direct measurements are

•  inconvenient

•  expensive

•  risky

•  technically unfeasible

(3)

Remote sensing is often more practicable and offers more bang for the buck

(4)

Now I interrupt for a commercial …

(5)

Example:

MIPAS on ENVISAT:

Measurements of atmospheric

temperature and composition

(6)

Limb sounding

Infrared spectral region

Emission spectroscopy

Polar sun-synchronous LEO

(7)

MIPAS Biomass Burning

(8)

MIPAS Solar Proton Event

(9)

MIPAS Age of stratospheric air

(10)

Back to the regular programme:

Today:

  No details on atmospheric research topics,

  Instead: overarching, general, methodical thoughts

on remote sensing of the atmosphere

(11)

The ancient Greeks …

(12)

The ancient Greeks had no remote sensing projects as we have today,

  because they had no satellites…

(13)

The ancient Greeks had no remote sensing projects as we have today,

  because they had no satellites…

  because they did not know differential calculus…

(14)

The ancient Greeks had no remote sensing projects as we have today,

  because they had no satellites…

  because they did not know differential calculus…

  because Aristotle forgot to invent what we call

today “abduction”…

(15)

The ancient Greeks had no remote sensing projects as we have today,

  because they had no satellites…

  because they did not know differential calculus…

  because Aristotle forgot to invent what we call today “abduction”…

His concept of logic included only deduction and

induction

(16)

What is Abduction?

(as opposed to deduction and induction)

From Aristotle’s logic we know

  “Deduction”

(17)

Deduction

An inference in which a general rule is applied to a particular (less general) case:

  We know a rule or a law or a theory;

  We know the antecedent (input data; initial condition, etc);

  With this we infer (“deduce”) a particular statement.

(18)

Deduction: Example in the context of remote sensing of the atmosphere

  We know the radiative transfer equation;

  We know the atmospheric state (pressure, temperature, composition);

  With this we calculate the radiance field.

In remote sensing, the solution of the so-called forward problem is deductive. The solution thus is unambiguous.
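To make the deductive step concrete, here is a minimal sketch of a forward calculation. It is not the radiative transfer model used for any real instrument: the layer model, the function names `planck` and `toy_forward_model`, and all numbers are assumptions chosen for illustration (no scattering, no background source, local thermodynamic equilibrium).

```python
import numpy as np

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K


def planck(wavenumber, temperature):
    """Planck radiance per unit wavenumber (wavenumber in 1/m, temperature in K)."""
    exponent = H * C * wavenumber / (KB * np.asarray(temperature, dtype=float))
    return 2.0 * H * C**2 * wavenumber**3 / np.expm1(exponent)


def toy_forward_model(temperature, optical_depth, wavenumber):
    """Radiance at the observer from a stack of homogeneous layers.

    Illustrative assumptions: no scattering, no background source,
    layers ordered from the observer outwards.
    """
    temperature = np.asarray(temperature, dtype=float)
    optical_depth = np.asarray(optical_depth, dtype=float)
    # Optical depth between the observer and the near edge of each layer.
    depth_in_front = np.concatenate(([0.0], np.cumsum(optical_depth)[:-1]))
    layer_emissivity = 1.0 - np.exp(-optical_depth)
    contributions = (planck(wavenumber, temperature)
                     * layer_emissivity * np.exp(-depth_in_front))
    return float(np.sum(contributions))


# Known state (hypothetical numbers): an isothermal, optically thick path.
temperature = np.full(10, 250.0)    # K
optical_depth = np.full(10, 5.0)    # optical depth per layer
radiance = toy_forward_model(temperature, optical_depth, wavenumber=1.0e5)
```

Given the law and the state, the radiance follows without ambiguity: for this opaque isothermal path it equals the Planck function at 250 K.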

(19)

What is Abduction?

(as opposed to deduction and induction)

From Aristotle’s logic we know

  “Deduction”

  “Induction”

(20)

Induction

The inference of a general law from singular observations;

  We know the resultant (output);

  We know the antecedent (input);

  From these we infer (“induce”) the law which

connects the antecedent with the resultant.

(21)

Induction: Example in the context of remote sensing of the atmosphere

  We measure the radiance field;

  We know the atmospheric state (pressure, temperature, composition);

  With this we infer the laws of radiative transfer.

(22)

Induction: Example in the context of remote sensing of the atmosphere

  We measure the radiance field;

  We know the atmospheric state (pressure, temperature, composition);

  With this we infer the laws of radiative transfer.

In 1739 David Hume showed that induction is

logically not conclusive!

(23)

What about the third scheme of inference?

  We know the (causal) law.

  We know the resultant.

  We infer the antecedent.

(24)


What about the third scheme of inference?

  We know the (causal) law.

  We know the resultant.

  We infer the antecedent.

For example:

  We know the radiative transfer equation;

  We know the measured radiances;

  From this we infer the state of the atmosphere

(temperature, composition etc).

(25)

  This scheme of inference was first discussed systematically

by C.S. Peirce (1839-1914) and called “abduction”.

  This scheme of inference is the

basis of remote sensing!

(26)

Indirect measurements: we do not measure the composition of the atmosphere; we measure radiances, transmittances, or similar quantities. From these we infer the composition of the atmosphere.

(27)

From radiances to the atmospheric state:

•  We have a forward model y=F(x);

(28)

•  We have a forward model $y = F(x)$;

•  We linearize this: $F(x) = y_0 + K(x - x_0)$;

(29)

•  We have a forward model $y = F(x)$;

•  We linearize this: $F(x) = y_0 + K(x - x_0)$;

•  The pdf of y for a given x:

$\mathrm{pdf}(y|x) = c \exp\left[-\tfrac{1}{2} (y - F(x))^T S_y^{-1} (y - F(x))\right]$

(30)

•  We have a forward model $y = F(x)$;

•  We linearize this: $F(x) = y_0 + K(x - x_0)$;

•  The pdf of y for a given x:

$\mathrm{pdf}(y|x) = c \exp\left[-\tfrac{1}{2} (y - F(x))^T S_y^{-1} (y - F(x))\right]$

We assume a Gaussian distribution of measurement errors

(31)

•  We have a forward model $y = F(x)$;

•  We linearize this: $F(x) = y_0 + K(x - x_0)$;

•  The pdf of y for a given x:

$\mathrm{pdf}(y|x) = c \exp\left[-\tfrac{1}{2} (y - F(x))^T S_y^{-1} (y - F(x))\right] \approx c \exp\left[-\tfrac{1}{2} (y - y_0 - K(x - x_0))^T S_y^{-1} (y - y_0 - K(x - x_0))\right]$

We use the linearization

(32)

•  We have a forward model $y = F(x)$;

•  We linearize this: $F(x) = y_0 + K(x - x_0)$;

•  The pdf of y for a given x:

$\mathrm{pdf}(y|x) = c \exp\left[-\tfrac{1}{2} (y - F(x))^T S_y^{-1} (y - F(x))\right] \approx c \exp\left[-\tfrac{1}{2} (y - y_0 - K(x - x_0))^T S_y^{-1} (y - y_0 - K(x - x_0))\right]$

•  The most plausible (= “most likely”) x maximizes this for a given y

(33)

•  We have a forward model $y = F(x)$;

•  We linearize this: $F(x) = y_0 + K(x - x_0)$;

•  The pdf of y for a given x:

$\mathrm{pdf}(y|x) = c \exp\left[-\tfrac{1}{2} (y - F(x))^T S_y^{-1} (y - F(x))\right] \approx c \exp\left[-\tfrac{1}{2} (y - y_0 - K(x - x_0))^T S_y^{-1} (y - y_0 - K(x - x_0))\right]$

•  The most plausible (= “most likely”) x maximizes this for a given y

•  We minimize

$\tfrac{1}{2} (y - y_0 - K(x - x_0))^T S_y^{-1} (y - y_0 - K(x - x_0))$

by setting the derivative of this function to zero

(34)

•  We have a forward model $y = F(x)$;

•  We linearize this: $F(x) = y_0 + K(x - x_0)$;

•  The pdf of y for a given x:

$\mathrm{pdf}(y|x) = c \exp\left[-\tfrac{1}{2} (y - F(x))^T S_y^{-1} (y - F(x))\right] \approx c \exp\left[-\tfrac{1}{2} (y - y_0 - K(x - x_0))^T S_y^{-1} (y - y_0 - K(x - x_0))\right]$

•  The most plausible (= “most likely”) x maximizes this for a given y

•  We minimize

$\tfrac{1}{2} (y - y_0 - K(x - x_0))^T S_y^{-1} (y - y_0 - K(x - x_0))$

•  We solve this to get x
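The minimization step can be written out as a short worked derivation. This is a sketch under stated assumptions: $S_y$ is symmetric, $K^T S_y^{-1} K$ is invertible, and $y_0 = F(x_0)$ is the forward model evaluated at the linearization point. The symbol $\chi^2$ for the cost function is introduced here only for convenience.

```latex
\begin{aligned}
\chi^2(x) &= \tfrac{1}{2}\,\bigl(y - y_0 - K(x - x_0)\bigr)^T S_y^{-1}\,\bigl(y - y_0 - K(x - x_0)\bigr) \\
\frac{\partial \chi^2}{\partial x} &= -K^T S_y^{-1}\,\bigl(y - y_0 - K(x - x_0)\bigr) = 0 \\
K^T S_y^{-1} K\,(x - x_0) &= K^T S_y^{-1}\,(y - y_0) \\
x &= x_0 + \bigl(K^T S_y^{-1} K\bigr)^{-1} K^T S_y^{-1}\,\bigl(y - F(x_0)\bigr)
\end{aligned}
```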

(35)

…the “maximum likelihood solution”

of the inverse problem:

$x_{ml} = x_0 + (K^T S_y^{-1} K)^{-1} K^T S_y^{-1} (y - F(x_0))$

(36)

…the “maximum likelihood solution”

of the inverse problem:

$x_{ml} = x_0 + (K^T S_y^{-1} K)^{-1} K^T S_y^{-1} (y - F(x_0))$

R.A. Fisher, 1912

If Fisher had read C.F. Gauss (1809) he would have noticed that an almost identical

approach had long been

known!

(37)

…the “maximum likelihood solution” of the inverse problem:

$x_{ml} = x_0 + (K^T S_y^{-1} K)^{-1} K^T S_y^{-1} (y - F(x_0))$

x_ml: maximum likelihood estimate of the true state x

x₀: initial guess of x

K: Jacobian matrix of the partial derivatives ∂y/∂x

S_y: covariance matrix of measurement errors

y: measured radiances

F: radiative transfer model

(38)


…the “maximum likelihood solution”

of the inverse problem:

$x_{ml} = x_0 + (K^T S_y^{-1} K)^{-1} K^T S_y^{-1} (y - F(x_0))$

If: the measurement errors follow a Gaussian distribution as characterized by S_y,

F is correct and linear, and

y (along with F and K) is the only information we have,

Then: x_ml is the most likely state of the atmosphere.
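The formula can be tried out on a small linear toy problem. The sketch below is illustrative only: the Jacobian `K`, the covariance `S_Y`, the state `x_true` and the function name `ml_estimate` are hypothetical, and the forward model is assumed to be exactly linear, F(x) = Kx.

```python
import numpy as np


def ml_estimate(y, forward, k, x0, s_y):
    """x_ml = x0 + (K^T S_y^-1 K)^-1 K^T S_y^-1 (y - F(x0)); S_y must be symmetric."""
    sy_inv_k = np.linalg.solve(s_y, k)        # S_y^-1 K
    lhs = k.T @ sy_inv_k                      # K^T S_y^-1 K
    rhs = sy_inv_k.T @ (y - forward(x0))      # K^T S_y^-1 (y - F(x0))
    return x0 + np.linalg.solve(lhs, rhs)


# Hypothetical problem: 4 measurements, 2 unknowns, linear forward model.
K = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [1.0, -1.0]])
S_Y = 0.01 * np.eye(4)              # uncorrelated measurement errors
x_true = np.array([1.0, 2.0])
y = K @ x_true                      # noise-free synthetic measurement

x_ml = ml_estimate(y, lambda x: K @ x, K, np.zeros(2), S_Y)
```

With more measurements than unknowns and no noise, the estimate reproduces `x_true`. Solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a numerical design choice, not part of the formula.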

(39)

But:…

…the inversion of (K^TS_y^-1K) could cause problems, because

(40)

But:…

•  there might be more unknowns than measured values;

(41)

But:…

•  the system of equations might be (almost) linearly dependent;

(42)

But:…

•  measurement errors might be too large;

(43)

But:…

•  the signal might be too weak.

(44)

But:…

•  In other words: We might not have enough information.
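Two of the failure modes listed above can be reproduced numerically. The matrices below are hypothetical and chosen only to illustrate the point; they do not describe any real instrument.

```python
import numpy as np

# Case 1: more unknowns (3) than measured values (2).
K_under = np.array([[1.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0]])
S_y = np.eye(2)
normal_under = K_under.T @ np.linalg.solve(S_y, K_under)   # K^T S_y^-1 K, 3x3
rank_under = np.linalg.matrix_rank(normal_under)           # smaller than 3: singular

# Case 2: (almost) linearly dependent equations; the two columns nearly coincide.
K_near = np.array([[1.0, 1.0],
                   [1.0, 1.0 + 1e-6],
                   [2.0, 2.0]])
normal_near = K_near.T @ K_near                            # unit measurement errors assumed
cond_near = np.linalg.cond(normal_near)                    # huge condition number
```

In the first case the matrix to be inverted is singular; in the second it is formally invertible, but its condition number is so large that any measurement noise is amplified enormously in the solution.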

(45)

But:…

•  In other words: We might not have enough information.

Attention:

(46)

(47)

What have you seen?

(48)

What have you seen?

  A white butterfly ... but the details???

  Perhaps a

- bath white (Pontia daplidice, Linnaeus 1758)

(49)

  A white butterfly ... but the details???

  Perhaps a

- black-veined moth (Siona lineata, Scopoli 1763)

(50)

  A white butterfly … but the details ???

  Perhaps a

- black-veined moth (Siona lineata, Scopoli 1763)

- clouded apollo (Parnassius mnemosyne, Linnaeus 1758)

(51)

  Perhaps a

- black-veined white (Aporia crataegi, Linnaeus 1758)

(52)

  Perhaps a

- black-veined white (Aporia crataegi, Linnaeus 1758)

- wood white (Leptidea sinapis, Linnaeus 1758)

(53)

  Perhaps a

- green-veined white (Pieris napi, Linnaeus 1758)

(54)

  Perhaps a

- small cabbage white (Pieris rapae, Linnaeus 1758)

(55)

  Perhaps a

- small cabbage white (Pieris rapae, Linnaeus 1758)

- large cabbage white (Pieris brassicae, Linnaeus 1758)

(56)

Bayesian Statistics:

Thomas Bayes 1701-1761

(57)

Bayesian Statistics

The most frequent butterfly in central Europe is the small cabbage white.

(58)

If I use this scheme of inference very often, most times my result will be correct…

…but sometimes

(59)

…the method fails!

(60)

This time it was a female orange tip (Anthocharis cardamines, Linnaeus 1758)

(61)

Back to remote sensing:

  Often the pure measurements are insufficient.

  We use a priori information, e.g. climatological data.

  The best estimate (the most probable posterior state estimate) is the weighted mean of observations and a priori information, each weighted with its inverse covariance matrix.

(62)

Maximum a posteriori estimates:

aka “optimal estimation”

x_map: best Bayesian estimate of the state variable x

x_a: a priori information on the state variable x

S_a: a priori covariance matrix

Maximize:

$\mathrm{pdf} = c_1 \exp\left[-\tfrac{1}{2} (y - F(x))^T S_y^{-1} (y - F(x))\right] \cdot c_2 \exp\left[-\tfrac{1}{2} (x - x_a)^T S_a^{-1} (x - x_a)\right]$

$= c_3 \exp\left\{-\tfrac{1}{2} \left[(y - F(x))^T S_y^{-1} (y - F(x)) + (x - x_a)^T S_a^{-1} (x - x_a)\right]\right\}$

$x_{map} = x_a + (K^T S_y^{-1} K + S_a^{-1})^{-1} K^T S_y^{-1} (y - F(x_a))$

(cf. e.g. Rodgers 2000)

If the true state is part of the ensemble used to build the climatology (x_a, S_a), then x_map is the most probable state.
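A minimal numerical sketch of this estimator follows. Everything in it is assumed for illustration: a linear forward model F(x) = Kx, a hypothetical underdetermined Jacobian (2 measurements, 3 unknowns), and made-up covariances. The function name `map_estimate` is not taken from any library.

```python
import numpy as np


def map_estimate(y, forward, k, x_a, s_y, s_a):
    """x_map = x_a + (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 (y - F(x_a))."""
    sy_inv_k = np.linalg.solve(s_y, k)                # S_y^-1 K (S_y symmetric)
    lhs = k.T @ sy_inv_k + np.linalg.inv(s_a)         # K^T S_y^-1 K + S_a^-1
    rhs = sy_inv_k.T @ (y - forward(x_a))             # K^T S_y^-1 (y - F(x_a))
    return x_a + np.linalg.solve(lhs, rhs)


# Hypothetical underdetermined problem: the ML normal matrix would be singular.
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
S_Y = 0.01 * np.eye(2)             # measurement error covariance
S_A = np.eye(3)                    # a priori covariance
x_a = np.array([1.0, 1.0, 1.0])    # a priori profile
x_true = np.array([2.0, 1.0, 0.0])
y = K @ x_true                     # noise-free synthetic measurement


def forward(x):
    return K @ x


x_map = map_estimate(y, forward, K, x_a, S_Y, S_A)
```

The prior term makes the matrix invertible even though the measurements alone cannot determine three unknowns. If the measurement agrees with the a priori profile, the estimate stays at `x_a`.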

(63)

Using this estimator, the estimation error will be minimal in the long run…

… but

(64)

“In the long run we are all dead”

(John Maynard Keynes, 1883-1946)

(65)

Using this estimator, the estimation error will be minimal in the long run…

… but

(66)

But sometimes it fails!

(E.g. I could miss the ozone hole, when it appears for the first time and is not part of my a priori

climatology.)

(67)

Sometimes I may be wrong!

(E.g. I might miss the ozone hole when it appears for the first time, since it is then not included in the a priori climatology.)

Can we avoid including so much prior information in the data?

(68)

Back to the butterflies:

Which butterfly is the right one?

Observed characteristics might not be sufficient to determine which species it is.

But we observe it together with a male “orange tip”.

So, isn’t it reasonable to believe that this is a female “orange tip”?

Note: This conclusion does not need an a priori frequency

(69)

We can apply the same rationale to retrieval theory:

  If it is hot at one point in the atmosphere, it is very unlikely that it is very cold in the vicinity of this point.

  If there is much ozone at one altitude, it is very

unlikely that there is little ozone at a similar altitude.

minimize differences of values at

adjacent points
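One common way to put "minimize differences of values at adjacent points" into a formula is a first-difference penalty added to the measurement term. The sketch below assumes a linear forward model y = Kx and a hypothetical weighting parameter `gamma`; it illustrates the idea and is not the specific scheme used for any particular instrument.

```python
import numpy as np


def first_difference_matrix(n):
    """(n-1) x n matrix L whose rows give x[i+1] - x[i]."""
    return np.diff(np.eye(n), axis=0)


def smooth_estimate(y, k, s_y, gamma):
    """Minimize (y - Kx)^T S_y^-1 (y - Kx) + gamma * |Lx|^2 for a linear model."""
    l = first_difference_matrix(k.shape[1])
    sy_inv_k = np.linalg.solve(s_y, k)             # S_y^-1 K (S_y symmetric)
    lhs = k.T @ sy_inv_k + gamma * (l.T @ l)       # K^T S_y^-1 K + gamma L^T L
    rhs = sy_inv_k.T @ y                           # K^T S_y^-1 y
    return np.linalg.solve(lhs, rhs)


# Hypothetical underdetermined problem: 2 measurements, 3 profile points.
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
S_Y = 0.01 * np.eye(2)
x_true = np.array([2.0, 2.0, 2.0])     # a constant profile
y = K @ x_true

x_smooth = smooth_estimate(y, K, S_Y, gamma=1.0)
```

The constraint uses no a priori profile values and no a priori frequency: it only penalizes differences between neighbours, so a constant profile is retrieved without any bias.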

(70)

We need to know how our estimate depends on the prior information. There is a difference between

•  “I believe I have seen a small cabbage white because the butterfly was white, and small cabbage whites are very frequent”

and

•  “I have seen a small cabbage white. The black pattern on the wing tips is an unambiguous indication.”

(71)

We define:

A = (∂x_map/∂x_true)

(72)

We define:

A = (∂x_map/∂x_true)

From this it follows:

I-A = (∂x_map/∂x_a)

(73)

We define:

I-A = (∂x_map/∂x_a)

We use:

x_map = x_a + (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 (y-F(x_a))

(74)

We define:

I-A = (∂x_map/∂x_a)

We use:

x_map = x_a + (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 (y-F(x_a))

And we calculate the derivative:

A = (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 K

(75)

We can now characterize our estimate:

(76)

We can now characterize our estimate:

  How does the estimate depend on the true state?

(77)

We can now characterize our estimate:

A = (K^TS_y^-1K + S_a^-1)^-1K^TS_y^-1K

x_map = Ax + (I-A) x_a
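The relation between the two lines above can be checked numerically. The sketch assumes a linear forward model and a noise-free measurement; the matrices and the function name `averaging_kernel` are hypothetical and serve only as an illustration.

```python
import numpy as np


def averaging_kernel(k, s_y, s_a):
    """A = (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 K."""
    ktsk = k.T @ np.linalg.solve(s_y, k)                   # K^T S_y^-1 K
    return np.linalg.solve(ktsk + np.linalg.inv(s_a), ktsk)


# Hypothetical setup: 2 measurements, 3 profile points.
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
S_Y = 0.01 * np.eye(2)
S_A = np.eye(3)
x_a = np.array([1.0, 1.0, 1.0])
x_true = np.array([2.0, 1.0, 0.0])
y = K @ x_true                                             # noise-free measurement

A = averaging_kernel(K, S_Y, S_A)

# MAP estimate computed directly from the retrieval equation ...
gain = np.linalg.solve(K.T @ np.linalg.solve(S_Y, K) + np.linalg.inv(S_A),
                       K.T @ np.linalg.inv(S_Y))
x_map = x_a + gain @ (y - K @ x_a)

# ... and from the averaging kernel representation.
x_from_kernel = A @ x_true + (np.eye(3) - A) @ x_a
```

Both routes give the same profile. With only two measurements for three unknowns, the trace of `A` stays below 2, which reflects that part of the estimate comes from the prior.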

(78)

We can now characterize our estimate:

A = (K^TS_y^-1K + S_a^-1)^-1K^TS_y^-1K

  Columns: Response of the estimate to a delta perturbation of the true vertical profile;

(79)

We can now characterize our estimate:

A = (K^TS_y^-1K + S_a^-1)^-1K^TS_y^-1K

  Rows: Weights which control how the true values at the different altitudes contribute to the estimate.

(80)


We can now characterize our estimate:

A = (K^TS_y^-1K + S_a^-1)^-1K^TS_y^-1K

  Rows: Weights which control how the true values at the different altitudes contribute to the estimate.

  A is the averaging kernel matrix

(81)

:

  To compare our estimates (= remote sensing results)

which contain a priori information with independent data, we might wish to remove the a priori information from the estimate.

(82)

:

  This is not possible!

(83)

:

  This is not possible

  Instead we have to …

(84)

:

  Instead we have to …

…figure out a different strategy!

(85)

Cannot the influence of the a priori information just be made part of the error bar?

The fraction of a priori information is:

(86)

Cannot the influence of the a priori information just be made part of the error bar?

Climatological variability of x around x_a is S_a

(87)

Cannot the influence of the a priori information just be made part of the error bar?

Climatological variability of x around x_a is S_a

Gaussian error estimation yields:

S_smoothing = (I-A) S_a (I-A)^T (cf. Rodgers 2000)

(88)

Cannot the influence of the a priori information just be made part of the error bar?

Climatological variability of x around x_a is S_a

Gaussian error estimation yields:

S_smoothing = (I-A) S_a (I-A)^T (cf. Rodgers 2000)

Thus it seems that the influence of the a priori information can be included in the error budget.

(89)

Attention: there is a trap!

  Transformation of x_map onto a finer grid gives:

x_fine = Wx_map

(90)

Attention: there is a trap!

  Transformation of x_map onto a finer grid gives:

x_fine = Wx_map

  If S_smoothing is really, in its essence, an error covariance matrix, then Gaussian error propagation must hold:

S_smoothing,fine = W S_smoothing W^T

(91)

But:

The smoothing error calculated on the fine grid,

S_smoothing,fine = (I_fine-A_fine) S_a,fine (I_fine-A_fine)^T

(92)

But:

The smoothing error calculated on the fine grid,

S_smoothing,fine = (I_fine-A_fine) S_a,fine (I_fine-A_fine)^T

is much larger than the one we get by Gaussian error propagation:

S_smoothing,fine = W S_smoothing W^T

(93)

But:

The smoothing error calculated on the fine grid,

S_smoothing,fine = (I_fine-A_fine) S_a,fine (I_fine-A_fine)^T

is much larger than the one we get by Gaussian error propagation:

S_smoothing,fine = W S_smoothing W^T

Something is wrong!

(94)

von Clarmann, AMT, 2014

(95)

Reductio ad absurdum:

  Is there something wrong with Gaussian error propagation?

No!

(96)

Reductio ad absurdum:

No!

  Is interpolation too non-linear to justify Gaussian error propagation? No! It is exactly linear.

(97)

Reductio ad absurdum:

No!

  Is interpolation too non-linear to justify Gaussian error propagation? No! It is exactly linear.

  Thus S_smoothing cannot be an error covariance matrix; the smoothing error cannot be just included in the error bar!

(98)

Why is this?

  By an error we understand the difference between an estimate and the true value, or a statistical estimate of this difference.

(99)

Why is this?

  By an error we understand the difference between an estimate and the true value.

S_smoothing represents the difference between the estimate and a representation of the truth on some finite grid (but not the truth).

(100)

Why is this?

  By an error we understand the difference between an estimate and the true value.

S_smoothing represents the difference between the estimate and a representation of the truth on some finite grid (but not the truth).

  This subtle difference is the source of the problem!

(101)

My personal consequence:

S_smoothing = (I-A) S_a (I-A)^T

(102)

:

(103)

:

  This is not possible!

(104)

:

  We cannot just put the related uncertainty into the error bar

(105)

:

  We cannot just put the related uncertainty into the error bar

  Instead we can apply our a priori information to the independent data

x_degraded = Ax_reference + (I-A) x_a

(106)

:

  We cannot just put the related uncertainty into the error bar

  Instead we can apply our a priori information to the independent data

x_degraded = Ax_reference + (I-A) x_a

With this the independent data are “seen with the eyes of our remote sensing systems” and the data become comparable.
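In code, this degradation is a single matrix operation. The sketch below assumes that the independent reference profile has already been brought onto the retrieval grid; the function name `degrade` and the example averaging kernel are hypothetical.

```python
import numpy as np


def degrade(x_reference, a, x_a):
    """x_degraded = A x_reference + (I - A) x_a."""
    a = np.asarray(a, dtype=float)
    identity = np.eye(a.shape[0])
    return a @ np.asarray(x_reference, dtype=float) + (identity - a) @ np.asarray(x_a, dtype=float)


# Hypothetical averaging kernel of a retrieval that smooths over neighbours.
A = np.array([[0.6, 0.3, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.6]])
x_a = np.array([1.0, 1.0, 1.0])            # a priori profile of the retrieval
x_reference = np.array([1.0, 4.0, 1.0])    # sharply peaked independent profile

x_degraded = degrade(x_reference, A, x_a)
```

The sharp peak of the reference profile is flattened in `x_degraded`, in the same way as the retrieval would have flattened it. Two limiting cases serve as a sanity check: with A = I the reference is returned unchanged, and with A = 0 only the a priori profile remains.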

(107)

Is this important?

(108)

Jackman et al. ACP 2008

(109)

(110)

Remotely sensed data are often presented in fancy figures …

MIPAS C₂H₆ at 275 hPa, autumn 2003

(111)

…but for quantitative work you cannot avoid this tedious algebra.

(112)

Summary:

  Depending on the target quantity to be remotely measured, different measurement systems (geometry, frequency range, emission vs.

absorption, platform) can be adequate.

(113)

Summary:

  To get the state of the atmosphere from the measurements, the inverse solution of the radiative transfer equation is sought.

(114)

Summary:

  Maximum likelihood solutions are often unstable because the pure measurement information is insufficient.

(115)

Summary:

  Retrievals using prior information are more stable, but the a priori information they contain can be misleading.

(116)

Summary:

  The averaging kernel matrix helps to understand which fraction of the retrieval is measurement information and which is a priori assumption.

(117)

Summary:

  The averaging kernel matrix helps to understand which fraction of the retrieval is measurement information and which is a priori assumption.

  Application of the averaging kernel matrix to high resolution profiles makes them comparable to low resolution profiles.

(118)

…but for quantitative work you cannot avoid this tedious algebra.

As a compensation for the torture with the matrices I have some more butterflies for you…

(119)

(120)

(121)

Picture credits:

Astronaut: DLR, CC-By 3.0
Envisat: ESA
Aristotle: Public domain
David Hume: Public domain
C.S. Peirce: Public domain
R. A. Fisher: Public domain
Orange tip: Teun Spaans, CC By-SA 3.0
Thomas Bayes: Public domain
Small cabbage white: Darkone CC SA 2.5 Generic
Black-veined white: Olaf Leillinger CC BY SA 2.0
Wood white: Friedrich Böhringer, CC SA 2.5 Gen.
Southern small white: Public domain
Green-veined white: Jörg Hempel, CC 2.0 Deutschland
Brimstone: Richard Bartz CC SA 2.5 Generic
Large cabbage white: Quartl CC BY-SA 3.0
Apollo: Kristian Peters GNU 1.2
Marbled white: Leviathan1983 CC BY-SA 3.0
Orange tip: Jean-Pierre Hamon CC BY-SA-3.0
John Maynard Keynes: Public Domain
Swallowtail: Jean-Pierre Hamon CC BY-SA-3.0
Red admiral: Samashy GNU 1.2
Burnet moth: Bernd Haynold, CC BY-SA 2.5
Peacock butterfly: Jörg Hempel, CC BY-SA 2.0-de
Marbled white: Michael Apel, CC BY-SA 2.5
Common blue: Luc Viatour CC BY-SA 3.0
