IMK-ASF-SAT
Remote Sensing of Atmospheric State Variables:
An Introduction
Thomas von Clarmann DATAR
CC BY-NC 4.0
In some places, direct measurements are
• inconvenient
• expensive
• risky
• technically unfeasible
Remote sensing is often more practicable and offers more bang for the buck
Now I interrupt for a commercial …
Example:
MIPAS on ENVISAT:
Measurements of atmospheric
temperature and composition
Limb sounding
Infrared spectral region Emission Spectroscopy
Polar sun-synchronous LEO
MIPAS Biomass Burning
MIPAS Solar Proton Event
MIPAS Age of stratospheric air
Back to the regular programme:
Today:
No details on atmospheric research topics,
Instead: overarching, general, methodical thoughts
on remote sensing of the atmosphere
The Ancient Greeks had no remote sensing projects as we have today,
because they had no satellites…
because they did not know differential calculus…
because Aristotle forgot to invent what we today call “abduction”…
His concept of logic included only deduction and induction.
What is Abduction?
(as opposed to deduction and induction)
From Aristotle’s logic we know
“Deduction”
Deduction
An inference in which a general rule is applied to a particular (less general) case:
We know a rule or a law or a theory;
We know the antecedent (input data; initial condition, etc);
With this we infer (“deduce”) a particular statement.
Deduction: Example in the context of remote sensing of the atmosphere
We know the radiative transfer equation;
We know the atmospheric state (pressure, temperature, composition);
With this we calculate the radiance field.
In remote sensing, the solution of the so-called forward problem is deductive. The solution thus is unambiguous.
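The deductive character of the forward problem can be illustrated with a minimal sketch. It does not solve the full radiative transfer equation; as a stand-in it assumes simple Beer-Lambert absorption through a stack of homogeneous layers. The function name `transmittance` and all numbers are hypothetical and chosen for illustration only.

```python
import numpy as np

def transmittance(number_density, cross_section, layer_thickness):
    """Beer-Lambert transmittance through a stack of homogeneous layers.

    Simplified stand-in for a forward model F(x): known law + known state
    -> one unambiguous result.
    """
    optical_depth = np.sum(cross_section * number_density * layer_thickness)
    return np.exp(-optical_depth)

# Hypothetical three-layer atmosphere (illustrative values only)
n = np.array([1.0e12, 5.0e11, 1.0e11])   # absorber number density per layer [cm^-3]
sigma = 1.0e-18                           # absorption cross section [cm^2]
dz = np.array([1.0e5, 1.0e5, 1.0e5])      # layer thickness [cm]

tau = transmittance(n, sigma, dz)
print(tau)
```

The same state always yields the same transmittance; this is what makes the forward problem unambiguous.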
From Aristotle’s logic we know “Deduction”
“Induction”
Induction
The inference of a general law from singular observations;
We know the resultant (output);
We know the antecedent (input);
From these we infer (“induce”) the law which
connects the antecedent with the resultant.
Induction: Example in the context of remote sensing of the atmosphere
We measure the radiance field;
We know the atmospheric state (pressure, temperature, composition);
With this we infer the laws of radiative transfer.
In 1739 David Hume showed that induction is
logically not conclusive!
What about the third scheme of inference?
We know the (causal) law.
We know the resultant.
We infer the antecedent.
For example:
We know the radiative transfer equation;
We know the measured radiances;
From this we infer the state of the atmosphere
(temperature, composition etc).
This scheme of inference was first discussed systematically
by C.S. Peirce (1839-1914) and called “abduction”.
This scheme of inference is the
basis of remote sensing!
Indirect measurements: we do not measure the composition of the atmosphere directly; we measure radiances, transmittances, or similar quantities, and from these we infer the composition of the atmosphere.
From radiances to the atmospheric state:
• We have a forward model $y = F(x)$;
• We linearize this around an initial guess $x_0$: $F(x) \approx y_0 + K(x - x_0)$, with $y_0 = F(x_0)$;
• We assume a Gaussian distribution of measurement errors, so the pdf of y for a given x is
$\mathrm{pdf}(y|x) = c \exp\left[-\tfrac{1}{2}(y - F(x))^T S_y^{-1} (y - F(x))\right]$
• We use the linearization:
$\mathrm{pdf}(y|x) \approx c \exp\left[-\tfrac{1}{2}(y - y_0 - K(x - x_0))^T S_y^{-1} (y - y_0 - K(x - x_0))\right]$
• The most plausible (= “most likely”) x maximizes this for a given y;
• Equivalently, we minimize
$\tfrac{1}{2}(y - y_0 - K(x - x_0))^T S_y^{-1} (y - y_0 - K(x - x_0))$
by setting the derivative of this function with respect to x to zero:
$K^T S_y^{-1} (y - y_0 - K(x - x_0)) = 0$
• We solve this to get x
…the “maximum likelihood solution”
of the inverse problem:
$x_{ml} = x_0 + (K^T S_y^{-1} K)^{-1} K^T S_y^{-1} (y - F(x_0))$
R.A. Fisher, 1912
If Fisher had read C.F. Gauss (1809) he would have noticed that an almost identical
approach had long been
known!
$x_{ml}$: best maximum likelihood estimate of the true state x
$x_0$: initial guess of x
$K$: Jacobian matrix of the partial derivatives $\partial y / \partial x$
$S_y$: covariance matrix of measurement errors
$y$: measured radiances
$F$: radiative transfer model
If: the measurement errors follow a Gaussian distribution as characterized by $S_y$,
F is correct and linear, and
y (along with F and K) is the only information we have,
Then: $x_{ml}$ is the most likely state of the atmosphere.
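The maximum likelihood formula can be tried out on a toy problem. The following is a minimal sketch, assuming a linear forward model F(x) = Kx and noise-free synthetic measurements; the function name `ml_solution`, the matrix `K` and all values are hypothetical.

```python
import numpy as np

def ml_solution(y, forward, K, S_y, x0):
    """One linear step: x0 + (K^T Sy^-1 K)^-1 K^T Sy^-1 (y - F(x0))."""
    S_y_inv = np.linalg.inv(S_y)
    normal_matrix = K.T @ S_y_inv @ K
    rhs = K.T @ S_y_inv @ (y - forward(x0))
    # Solve the normal equations instead of forming the inverse explicitly
    return x0 + np.linalg.solve(normal_matrix, rhs)

# Hypothetical toy problem: 3 measurements, 2 unknowns
K = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
forward = lambda x: K @ x             # linear forward model (assumption)
x_true = np.array([2.0, 3.0])
y = forward(x_true)                   # noise-free synthetic measurement
S_y = 0.01 * np.eye(3)                # measurement error covariance
x0 = np.zeros(2)                      # initial guess

x_ml = ml_solution(y, forward, K, S_y, x0)
print(x_ml)
```

With more measurements than unknowns, linearly independent columns of K, and no noise, the estimate reproduces the true state. For a nonlinear F the step would have to be iterated, which this sketch does not do.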
But:…
…the inversion of $(K^T S_y^{-1} K)$ could cause problems, because
• there might be more unknowns than measured values;
• the system of equations might be (almost) linearly dependent;
• measurement errors might be too large;
• the signal might be too weak.
• In other words: We might not have enough information.
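The first two failure modes in the list above can be made visible numerically. This is a minimal sketch with hypothetical toy Jacobians; it inspects only the rank and condition number of $K^T S_y^{-1} K$ and assumes unit measurement error covariance.

```python
import numpy as np

# Case 1: more unknowns (3) than measured values (2)
K_under = np.array([[1.0, 1.0, 0.0],
                    [0.0, 1.0, 1.0]])
N_under = K_under.T @ np.eye(2) @ K_under      # K^T Sy^-1 K with Sy = I
rank_under = np.linalg.matrix_rank(N_under)    # smaller than 3: not invertible

# Case 2: two almost linearly dependent equations
K_dep = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1.0e-6]])
N_dep = K_dep.T @ np.eye(2) @ K_dep
cond_dep = np.linalg.cond(N_dep)               # huge: errors are strongly amplified

print(rank_under, cond_dep)
```

In the first case the matrix is singular; in the second it is formally invertible, but tiny measurement errors are amplified so strongly that the solution is of little use.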
Attention:
What have you seen?
A white butterfly … but the details???
Perhaps a
- bath white (Pontia daplidice, Linnaeus 1758)
- black-veined moth (Siona lineata, Scopoli 1763)
- apollo (Parnassius mnemosyne, Linnaeus 1758)
- black-veined white (Aporia crataegi, Linnaeus 1758)
- wood white (Leptidea sinapis, Linnaeus 1758)
- green-veined white (Pieris napi, Linnaeus 1758)
- small cabbage white (Pieris rapae, Linnaeus 1758)
- large cabbage white (Pieris brassicae, Linnaeus 1758)
Bayesian Statistics:
Thomas Bayes 1701-1761
The most frequent butterfly in central Europe is the small cabbage white, so I conclude that this is what I have seen.
If I use this scheme of inference very often, my result will be correct most of the time…
…but sometimes
…the method fails!
This time it was a female orange tip (Anthocharis cardamines, Linnaeus 1758)
Back to remote sensing:
Often the measurements alone are insufficient.
We use a priori information, e.g. climatological data.
The best estimate (the most probable posterior state estimate) is the weighted mean of observations and a priori information, each weighted with its inverse covariance matrix.
Maximum a posteriori estimates:
aka “optimal estimation”
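$x_{map}$: best Bayesian estimate of state variable x
$x_a$: a priori information on state variable x
$S_a$: a priori covariance matrix
Maximize:
$\mathrm{pdf} = c_1 \exp\left[-\tfrac{1}{2}(y - F(x))^T S_y^{-1} (y - F(x))\right] \cdot c_2 \exp\left[-\tfrac{1}{2}(x - x_a)^T S_a^{-1} (x - x_a)\right]$
$= c_3 \exp\left[-\tfrac{1}{2}\left((y - F(x))^T S_y^{-1} (y - F(x)) + (x - x_a)^T S_a^{-1} (x - x_a)\right)\right]$
$x_{map} = x_a + (K^T S_y^{-1} K + S_a^{-1})^{-1} K^T S_y^{-1} (y - F(x_a))$
(cf. e.g. Rodgers 2000)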
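The maximum a posteriori formula can be sketched for a case in which the measurements alone would not suffice. The example assumes a linear forward model F(x) = Kx with two measurements and three unknowns; `map_solution` and all numbers are hypothetical.

```python
import numpy as np

def map_solution(y, forward, K, S_y, x_a, S_a):
    """x_a + (K^T Sy^-1 K + Sa^-1)^-1 K^T Sy^-1 (y - F(x_a)), one linear step."""
    S_y_inv = np.linalg.inv(S_y)
    S_a_inv = np.linalg.inv(S_a)
    lhs = K.T @ S_y_inv @ K + S_a_inv      # invertible thanks to Sa^-1
    rhs = K.T @ S_y_inv @ (y - forward(x_a))
    return x_a + np.linalg.solve(lhs, rhs)

# Hypothetical toy problem: 2 measurements, 3 unknowns
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
forward = lambda x: K @ x                  # linear forward model (assumption)
x_true = np.array([1.0, 2.0, 3.0])
y = forward(x_true)                        # noise-free synthetic measurement
S_y = 0.01 * np.eye(2)
x_a = np.array([1.5, 1.5, 1.5])            # a priori profile (e.g. climatology)
S_a = np.eye(3)                            # a priori covariance

x_map = map_solution(y, forward, K, S_y, x_a, S_a)
print(x_map)
```

The a priori term makes the matrix invertible although K alone has fewer rows than columns. The result fits the measurements better than the a priori profile does, but it is not guaranteed to equal the true state.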
If the true state is part of the ensemble used to build the climatology ($x_a$, $S_a$), then $x_{map}$ is the most probable state.
Using this estimator, the estimation error will be minimal in the long run…
… but
“In the long run we are all dead”
(John Maynard Keynes, 1883-1946)
But sometimes it fails!
(E.g. I could miss the ozone hole when it appears for the first time and is therefore not part of my a priori climatology.)
Can we avoid including so much prior information in the data?
Back to the butterflies:
Which butterfly is the right one?
Observed characteristics might not be sufficient to determine which species it is.
But we observe it together with a male “orange tip”.
So, isn’t it reasonable to believe that this is a female “orange tip”?
Note: This conclusion does not need an a priori frequency.
We can apply the same rationale to retrieval theory:
If it is hot at one point in the atmosphere, it is very unlikely that it is very cold in the vicinity of this point.
If there is much ozone at one altitude, it is very
unlikely that there is little ozone at a similar altitude.
Hence: minimize the differences between values at adjacent points.
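One common way to penalize differences between adjacent points is a first-order difference (Tikhonov-type) constraint that replaces $S_a^{-1}$ in the estimator. The slides do not name the operator, so the matrix `L1`, the strength `gamma`, and the function names below are assumptions made for this sketch; the forward model is again taken to be linear.

```python
import numpy as np

def smoothing_constraint(n, gamma):
    """R = gamma * L1^T L1, where L1 takes differences of adjacent elements."""
    L1 = np.diff(np.eye(n), axis=0)        # rows like [-1, 1, 0, ...]
    return gamma * L1.T @ L1

def constrained_solution(y, K, S_y, x_a, R):
    """x_a + (K^T Sy^-1 K + R)^-1 K^T Sy^-1 (y - K x_a) for linear F(x) = K x."""
    S_y_inv = np.linalg.inv(S_y)
    lhs = K.T @ S_y_inv @ K + R
    rhs = K.T @ S_y_inv @ (y - K @ x_a)
    return x_a + np.linalg.solve(lhs, rhs)

# Hypothetical toy problem: 2 measurements, 3 unknowns
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
S_y = 0.01 * np.eye(2)
y = K @ np.array([1.0, 2.0, 3.0])
R = smoothing_constraint(3, gamma=1.0)

# Two a priori profiles that differ only by a constant offset
x_flat = constrained_solution(y, K, S_y, np.full(3, 1.5), R)
x_shifted = constrained_solution(y, K, S_y, np.full(3, 5.0), R)
print(x_flat, x_shifted)
```

Because R leaves constant profiles unpenalized, both a priori profiles lead to the same estimate: only the shape is constrained, not the absolute values. This mirrors the butterfly argument, which needed no a priori frequency.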
We need to know how our estimate depends on the prior information. There is a difference between
• “I believe I have seen a small cabbage white, because the butterfly was white and small cabbage whites are very frequent”
and
• “I have seen a small cabbage white. The black pattern on the wing tips is an unambiguous indication.”
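We define:
$A = \partial x_{map} / \partial x_{true}$
From this follows:
$I - A = \partial x_{map} / \partial x_a$
We use:
$x_{map} = x_a + (K^T S_y^{-1} K + S_a^{-1})^{-1} K^T S_y^{-1} (y - F(x_a))$
And we calculate the derivative:
$A = (K^T S_y^{-1} K + S_a^{-1})^{-1} K^T S_y^{-1} K$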
We can now characterize our estimate:
How does the estimate depend on the true state?
$A = (K^T S_y^{-1} K + S_a^{-1})^{-1} K^T S_y^{-1} K$
$x_{map} = A x + (I - A) x_a$
Columns: Response of the estimate to a delta perturbation of the true vertical profile;
Rows: Weights which control how the true values at the different altitudes contribute to the estimate.
A is the averaging kernel matrix.
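The relation between the estimate, the true state and the a priori can be checked numerically. This is a minimal sketch under the assumptions of a linear forward model and noise-free measurements; the function name `averaging_kernel` and all values are hypothetical.

```python
import numpy as np

def averaging_kernel(K, S_y, S_a):
    """A = (K^T Sy^-1 K + Sa^-1)^-1 K^T Sy^-1 K."""
    KtSyK = K.T @ np.linalg.inv(S_y) @ K
    return np.linalg.solve(KtSyK + np.linalg.inv(S_a), KtSyK)

# Hypothetical toy problem: 2 measurements, 3 unknowns
K = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
S_y = 0.01 * np.eye(2)
S_a = np.eye(3)
x_true = np.array([1.0, 2.0, 3.0])
x_a = np.array([1.5, 1.5, 1.5])
y = K @ x_true                               # noise-free, linear forward model

A = averaging_kernel(K, S_y, S_a)
I = np.eye(3)

# MAP estimate computed directly from the measurement ...
S_y_inv = np.linalg.inv(S_y)
gain = np.linalg.solve(K.T @ S_y_inv @ K + np.linalg.inv(S_a), K.T @ S_y_inv)
x_map = x_a + gain @ (y - K @ x_a)

# ... and the same estimate expressed through the averaging kernel
x_from_A = A @ x_true + (I - A) @ x_a
print(A)
```

Each row of `A` shows how the true values at all levels are mixed into one element of the estimate; `I - A` shows how much of the a priori profile remains.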
To compare our estimates (= remote sensing results), which contain a priori information, with independent data, we might wish to remove the a priori information from the estimate.
This is not possible!
Instead we have to …
…figure out a different strategy!
Can the influence of the a priori information not simply be made part of the error bar?
The fraction of a priori information is:
$I - A = \partial x_{map} / \partial x_a$
The climatological variability of x around $x_a$ is $S_a$.
Gaussian error estimation yields:
$S_{smoothing} = (I - A) S_a (I - A)^T$ (cf. Rodgers 2000)
Thus it seems that the influence of the a priori information can be included in the error budget.
Attention: there is a trap!
Transformation of $x_{map}$ onto a finer grid gives:
$x_{fine} = W x_{map}$
If $S_{smoothing}$ is really, in its essence, an error covariance matrix, then Gaussian error propagation must hold:
$S_{smoothing,fine} = W S_{smoothing} W^T$
But: the smoothing error calculated on the fine grid,
$S_{smoothing,fine} = (I_{fine} - A_{fine}) S_{a,fine} (I_{fine} - A_{fine})^T$,
is much larger than the one we get by Gaussian error propagation:
$S_{smoothing,fine} = W S_{smoothing} W^T$
Something is wrong!
von Clarmann, AMT, 2014
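The discrepancy can be reproduced with the smallest possible toy case: one coarse-grid value retrieved from one measurement, then interpolated to a fine grid of two points. The construction below is an assumption of this sketch and not taken from the slides: the fine-grid averaging kernel is built as `W @ G @ K_fine`, and the coarse a priori covariance is obtained from the fine one via the pseudo-inverse of `W`. All names and numbers are hypothetical.

```python
import numpy as np

def smoothing_covariance(A, S_a):
    """(I - A) S_a (I - A)^T."""
    I = np.eye(A.shape[0])
    return (I - A) @ S_a @ (I - A).T

W = np.array([[1.0], [1.0]])           # interpolation: 1 coarse value -> 2 fine values
V = np.linalg.pinv(W)                  # fine -> coarse (averaging), V @ W = 1
K_fine = np.array([[1.0, 1.0]])        # one measurement seeing both fine levels
S_y = np.array([[1.0]])
S_a_fine = np.eye(2)                   # variability on the fine grid

K_coarse = K_fine @ W
S_a_coarse = V @ S_a_fine @ V.T
S_y_inv = np.linalg.inv(S_y)
G = np.linalg.solve(K_coarse.T @ S_y_inv @ K_coarse + np.linalg.inv(S_a_coarse),
                    K_coarse.T @ S_y_inv)          # gain of the coarse-grid retrieval
A_coarse = G @ K_coarse
A_fine = W @ G @ K_fine                # response of the interpolated result to the fine truth

# Route 1: evaluate on the coarse grid, then propagate with W
S_propagated = W @ smoothing_covariance(A_coarse, S_a_coarse) @ W.T
# Route 2: evaluate directly on the fine grid
S_direct = smoothing_covariance(A_fine, S_a_fine)

trace_propagated = np.trace(S_propagated)
trace_direct = np.trace(S_direct)
print(trace_propagated, trace_direct)
```

In this toy case the directly evaluated fine-grid smoothing error has ten times the total variance of the propagated one, because the coarse-grid evaluation cannot see the variability between the two fine levels.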
Reductio ad absurdum:
Is there something wrong with Gaussian error propagation?
No!
Is interpolation too non-linear to justify Gaussian error propagation? No! It is exactly linear.
Thus $S_{smoothing}$ cannot be an error covariance matrix; the smoothing error cannot simply be included in the error bar!
Why is this?
By an error we understand the difference between an estimate and the true value, or its statistical estimate.
$S_{smoothing}$ represents the difference between the estimate and a representation of the truth on some finite grid (but not the truth itself).
This subtle difference is the source of the problem!
My personal consequence:
$S_{smoothing} = (I - A) S_a (I - A)^T$
To compare our estimates (= remote sensing results), which contain a priori information, with independent data, we might wish to remove the a priori information from the estimate.
This is not possible.
We cannot just put the related uncertainty into the error bar.
Instead we can apply our a priori information to the independent data:
$x_{degraded} = A x_{reference} + (I - A) x_a$
With this the independent data are “seen with the eyes of our remote sensing system” and the data become comparable.
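Applying the averaging kernel to independent data takes only a few lines. This is a minimal sketch which assumes that the reference profile has already been sampled on the retrieval grid; the function name `degrade` and the matrix `A` are hypothetical examples, not MIPAS values.

```python
import numpy as np

def degrade(x_reference, A, x_a):
    """x_degraded = A x_reference + (I - A) x_a."""
    I = np.eye(A.shape[0])
    return A @ x_reference + (I - A) @ x_a

# Hypothetical averaging kernel of a three-level retrieval
A = np.array([[0.6, 0.2, 0.0],
              [0.2, 0.5, 0.2],
              [0.0, 0.2, 0.6]])
x_reference = np.array([1.0, 4.0, 1.0])   # well-resolved independent profile
x_a = np.array([2.0, 2.0, 2.0])           # a priori profile of the retrieval

x_degraded = degrade(x_reference, A, x_a)
print(x_degraded)
```

The sharp peak of the reference profile is smeared out and pulled towards the a priori, just as it would be in the remotely sensed profile. With A equal to the identity the reference profile would pass through unchanged.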
Is this important?
Jackman et al. ACP 2008
Remotely sensed data are often presented in fancy figures …
MIPAS C2H6 at 275 hPa, autumn 2003
…but for quantitative work you cannot avoid this tedious algebra.
Summary:
Depending on the target quantity to be remotely measured, different measurement systems (geometry, frequency range, emission vs. absorption, platform) can be adequate.
To get the state of the atmosphere from the measurements, the inverse solution of the radiative transfer equation is sought.
Maximum likelihood solutions are often unstable because the measurement information alone is insufficient.
Retrievals using prior information are more stable, but the a priori information they contain can be misleading.
The averaging kernel matrix helps to understand which fraction of the retrieval is measurement information and which is a priori assumption.
Application of the averaging kernel matrix to high-resolution profiles makes them comparable to low-resolution profiles.
As compensation for the torture with the matrices, I have some more butterflies for you…
Picture credits:
Astronaut: DLR, CC BY 3.0
Envisat: ESA
Aristotle: public domain
David Hume: public domain
C.S. Peirce: public domain
R.A. Fisher: public domain
Orange tip (Aurorafalter): Teun Spaans, CC BY-SA 3.0
Thomas Bayes: public domain
Small cabbage white (Kleiner Kohlweißling): Darkone, CC SA 2.5 Generic
Black-veined white (Baumweißling): Olaf Leillinger, CC BY-SA 2.0
Wood white (Senfweißling): Friedrich Böhringer, CC SA 2.5 Generic
Southern small white (Karstweißling): public domain
Green-veined white (Rapsweißling): Jörg Hempel, CC 2.0 Deutschland
Brimstone (Zitronenfalter): Richard Bartz, CC SA 2.5 Generic
Large cabbage white (Großer Kohlweißling): Quartl, CC BY-SA 3.0
Apollo (Roter Apollofalter): Kristian Peters, GNU 1.2
Marbled white (Schachbrettfalter): Leviathan1983, CC BY-SA 3.0
Orange tip (Aurorafalter): Jean-Pierre Hamon, CC BY-SA 3.0
John Maynard Keynes: public domain
Swallowtail (Schwalbenschwanz): Jean-Pierre Hamon, CC BY-SA 3.0
Red admiral (Admiral): Samashy, GNU 1.2
Burnet moth (Widderchen): Bernd Haynold, CC BY-SA 2.5
Peacock (Pfauenauge): Jörg Hempel, CC BY-SA 2.0-de
Marbled white (Schachbrettfalter): Michael Apel, CC BY-SA 2.5
Common blue (Hauhechel-Bläuling): Luc Viatour, CC BY-SA 3.0