NOT FOR QUOTATION WITHOUT PERMISSION OF THE AUTHOR

SOME ASPECTS OF MODEL TUNING PROCEDURE: INFORMATION-THEORETIC ANALYSIS

April 1985
WP-85-24

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS
2361 Laxenburg, Austria
I would like to thank Susanne Stock for her help in preparing the manuscript.

SOME ASPECTS OF MODEL TUNING PROCEDURE: INFORMATION-THEORETIC ANALYSIS
1. INTRODUCTION

Computer or mathematical models are not exact representations of reality: lack of knowledge, technical restrictions and particular modeling goals make it necessary to approximate the real system in various ways. Nevertheless, the procedures by which models are adjusted to observed data are often based on the assumption that the real system has the same structure as the model and differs only in the values of certain parameters. These particular values usually should be included in the feasible set of the parameter values, and this fact, together with some additional conditions, usually provides the convergence property for many individual algorithms [1].

However, in reality all of these assumptions are generally false. Even if the structure of the system corresponds to the structure of the model, the real parameter values often do not belong to the presupposed feasible set. Moreover, mathematicians often consciously diminish this set in order to simplify the estimation algorithms. For instance, they approximate the bounded compact set of parameter values by a set consisting of a finite number of points, thus increasing the chances that the real parameter values will be excluded.

It is therefore both remarkable and surprising to find that despite these false assumptions and approximations, the parameter estimation algorithms often still converge! The model resulting from this tuning procedure will of course not coincide with the real system, and this raises the natural question: how far is this computer model from reality?

When considering this question it is necessary to have some way of measuring the distance between individual models. One measure of divergence was introduced by Bhattacharya [2]; Kullback [3] also formulated a measure of information distance. However, these measures were not proper metrics. Baram and Sandell [4] later introduced a modified version of the Kullback measure, which has been shown to be a proper distance metric. They applied this approach to linear Gaussian systems and models; in this paper it is generalized to a wider class of systems.
2. NOTATIONS AND DEFINITIONS

Assume that the variety of models of the real system may be characterized by a parameter $\beta$, which takes values from the parameter set $B$. In view of the Bayesian formulation of the problem, we will assume $\beta$ to be a random variable defined on some probability space $(\Omega, H, P)$. Let $\xi_n(\omega)$, $n \ge 0$, be some random process (observation) adapted to some nondecreasing family of $\sigma$-algebras $H = (H_n)_{n \ge 0}$, $H_\infty = H$, in $\Omega$. We shall denote by $\bar H = (\bar H_n)_{n \ge 0}$, $\bar H_\infty = \bar H$, the family of $\sigma$-algebras generated by the process $\xi_n$, $n \ge 0$, where $\bar H_n$ is the $\sigma$-algebra in $\Omega$ generated by the process $\xi$ up to time $n$.
In the case of a continuous-time observation process $\xi_t$, $t \ge 0$, we assume the nondecreasing right-continuous family of $\sigma$-algebras $H = (H_t)_{t \ge 0}$ to be given, where $H_\infty = H$ and $H_0$ is completed by the $P$-zero sets from $H$. We also introduce the family of $\sigma$-algebras $\bar H = (\bar H_t)_{t \ge 0}$, where $\bar H_t$ is the $\sigma$-algebra generated by $\xi_u$, $u \le t$.

If the set of the parameter values is finite or denumerable, we will denote by $\pi_j(n)$ (or $\pi_j(t)$) the a posteriori probabilities of the events $\{\beta = \beta_j\}$, $j \in B$, given the observations $\xi_k$, $k \le n$ ($\xi_u$, $u \le t$).

For any $A \in H$, $z \in B$ we denote by $P^z(A)$ the family of probability measures

$$P^z(A) = P(A \mid \beta = z).$$

Let $\bar P_n^z(A)$, $P_n^z(A)$, $z \in B$, $n \ge 0$, be the restrictions of $P^z(A)$ to the $\sigma$-algebras $\bar H_n$, $H_n$, respectively. Assume also that for any $z, y \in B$ we have $\bar P_n^z \sim \bar P_n^y$. Define $\alpha_n^{z,y}$ as the Radon-Nikodym derivative

$$\alpha_n^{z,y} = \frac{d\bar P_n^z}{d\bar P_n^y}$$

and let

$$a_n^{z,y} = \alpha_n^{z,y} \, (\alpha_{n-1}^{z,y})^{-1}.$$

It is easy to see that if the $\bar H_{n-1}$-conditional distributions of $\xi_n$, $n \ge 0$, have densities $f^z(x \mid \bar H_{n-1})$, $z \in B$, then

$$a_n^{z,y} = \frac{f^z(\xi_n \mid \bar H_{n-1})}{f^y(\xi_n \mid \bar H_{n-1})}.$$

3. SOME BAYESIAN PARAMETER ESTIMATION ALGORITHMS

Before deriving our main results, we will first consider some Bayesian parameter estimation algorithms for different observation schemes.
a) Assume that $\xi_n$, $n \ge 0$, is given by the formula

[equation not legible in the source]

where $\vartheta_n$ satisfies the recursive stochastic equation

[equation not legible in the source]

Here $\varepsilon_{1,n}$, $\varepsilon_{2,n}$, $n \ge 0$, are sequences of independent Gaussian random variables with zero mean and variance equal to one, and $\beta$ is an unknown parameter. Assuming that $\beta$ takes its values from some finite set $B_k = \{\beta_1, \beta_2, \ldots, \beta_k\}$, the a posteriori probabilities are

[equation not legible in the source]

where the estimates entering the formula are the Kalman estimates of $\vartheta_n$ given $\{\beta = \beta_j\}$, and $D_j(n)$ are functions of the conditional variance $\gamma_j(n)$ [5].

b) Consider the continuous (in time) observation process $\xi_t$ given by the stochastic differential equation

[equation not legible in the source]

where $W_t$, $t \ge 0$, is the $H$-adapted Wiener process, $\beta$ is an unknown parameter and $C_t$ is an $H$-adapted positive function. Assuming again that the number of parameter values is finite, we have for $\pi_j(t) = P(\beta = \beta_j \mid \bar H_t)$ [6]

[equation not legible in the source]

where

[equation not legible in the source]

c) Consider an observation made by a continuous-state jumping process with unknown transition intensities $\lambda_{tj}$. Once again assuming a finite number of values for $\beta$, we have the following equations for the a posteriori probability $\pi_i(t)$ [7]

[equation not legible in the source]

where

[equation not legible in the source]
The necessary and sufficient conditions for convergence with probability one of the a posteriori probabilities to the respective indicators were given in the papers [1, 8, 9] in terms of absolute continuity and singularity of some special families of probability distributions. These papers demonstrated applications of the general theory to various particular forms of random processes.
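The convergence of a posteriori probabilities to an indicator can be illustrated numerically. The following is a minimal sketch, not one of the algorithms a) to c): it assumes independent unit-variance Gaussian observations with an unknown mean, a uniform prior, and a finite candidate set that contains the true value. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_logpdf(x, mean, std=1.0):
    """Log-density of N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * std * std) - 0.5 * ((x - mean) / std) ** 2


def update_posterior(log_post, x, candidates, logpdf):
    """One recursive Bayes step: pi_j(n) is proportional to pi_j(n-1) * f^j(x)."""
    new = [lp + logpdf(x, b) for lp, b in zip(log_post, candidates)]
    top = max(new)  # subtract the maximum before exponentiating, for stability
    norm = top + math.log(sum(math.exp(v - top) for v in new))
    return [v - norm for v in new]


def run(candidates, true_mean, n_steps, seed=0):
    """Simulate observations and return the posterior probabilities after n_steps."""
    rng = random.Random(seed)
    log_post = [math.log(1.0 / len(candidates))] * len(candidates)  # uniform prior
    for _ in range(n_steps):
        x = rng.gauss(true_mean, 1.0)
        log_post = update_posterior(log_post, x, candidates, gaussian_logpdf)
    return [math.exp(lp) for lp in log_post]


# Assumed setting: the true mean 1.0 belongs to the candidate set.
posterior = run(candidates=[-1.0, 0.0, 1.0], true_mean=1.0, n_steps=500)
print(posterior)
```

In this well-specified setting the weight of the true candidate approaches one, which is the behaviour the cited convergence conditions describe.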
One of the central places in the proof of the main convergence result in [5, 9, 1] was the relation between the a posteriori probabilities and the likelihood ratio in the case of a denumerable or finite number of parameter values. More exactly, the following lemma is true:

Lemma 1. Let for any $i \ne j$ and $n \ge 0$ the measure $\bar P_n^i$ be equivalent to the measure $\bar P_n^j$. Then $P$-a.s. the following equality is true:

[equation (1) not legible in the source]

The proof of this lemma follows from the definition of the likelihood ratio $\alpha_n^{i,j}$. The equality (1) yields that

[equation not legible in the source]
According to the papers [1, 8, 9] this property guarantees the following convergence result (recall that we still deal with the case when the parameter value corresponding to the real system belongs to the feasible set of parameter values $B$).

Theorem 1. Let for any $i \ne j$, $n \ge 0$, $\bar P_n^i \sim \bar P_n^j$. Then the condition

[condition not legible in the source]

is equivalent to the condition

$$\lim_{n \to \infty} \pi_j(n) = I(\beta = \beta_j), \quad P\text{-a.s.}$$

The proof of this theorem is based on the property that the singularity set for the two measures involved [symbols not legible in the source] coincides almost surely with the set $\{\beta = \beta_j\}$.
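For orientation, the relation between a posteriori probabilities and likelihood ratios that Lemma 1 refers to can be written in a standard form. This is a sketch under assumed notation, not necessarily the form of equation (1): $\pi_j(0) = P(\beta = \beta_j)$ denotes the prior weights (a symbol introduced here), and $\alpha_n^{i,j}$ is the likelihood ratio of the observations up to time $n$ under $\beta_i$ relative to $\beta_j$.

```latex
\pi_j(n) = \frac{\pi_j(0)}{\sum_{i \in B} \pi_i(0)\, \alpha_n^{i,j}},
\qquad
\frac{\pi_i(n)}{\pi_j(n)} = \frac{\pi_i(0)}{\pi_j(0)}\, \alpha_n^{i,j},
\qquad
\alpha_n^{i,j} = \frac{d\bar P_n^i}{d\bar P_n^j}.
```

Both identities are Bayes' formula for a finite or denumerable parameter set. They show that $\pi_j(n) \to 1$ exactly when $\alpha_n^{i,j} \to 0$ for every $i \ne j$ with positive prior weight (for a finite set $B$).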
If the real parameter value $\beta_k$ does not belong to the feasible set, the variables $\pi_i(n)$, $i \in B$, calculated in Section 3 are no longer the a posteriori probabilities, but some functionals of the observable process $\xi_n$.

Taking them as a posteriori probabilities, the observer expects one of the $\pi_i(n)$, $i \in B$ (say $\pi_{i_0}(n)$), to converge to 1 and interprets this result as if the real parameter value were equal to $i_0$. However, this is actually a false conclusion. The questions which arise in this connection are: When does convergence of some of the $\pi_i(n)$, $i \in B$, really take place? What does it mean when $\pi_{i_0}(n)$ tends to 1 for some $i_0 \in B$? In order to answer these questions we need some auxiliary results.

Assume that the real system corresponds to a parameter value $k$ such that $k \notin B$. Introduce the function $I_n^k(z,y) = E_k \ln a_n^{z,y}$ [4] and define the measure of distance

[equation not legible in the source]

Lemma 2. The function $d_n(i,j)$ is a pseudo-metric. That is, the following inequalities hold (only the triangle inequality is legible in the source):

$$d_n(z,k) + d_n(k,y) \ge d_n(z,y).$$

The proof of this lemma is given in [4].

Lemma 3. For any $z, y \in B$, $n \ge 0$ we have $I_n^z(z,y) \ge 0$.

Proof. Since $I_n^z(z,y) = E_z \ln a_n^{z,y}$ is the expectation of $E_z(\ln a_n^{z,y} \mid \bar H_{n-1})$, it suffices to show that this conditional expectation is nonnegative. By a change of measure,

$$E_z(\ln a_n^{z,y} \mid \bar H_{n-1}) = E_y(a_n^{z,y} \ln a_n^{z,y} \mid \bar H_{n-1}) = E_y(\psi(a_n^{z,y}) \mid \bar H_{n-1}),$$

where $\psi(t) = t \ln t$. According to the mean value theorem, $\psi(a_n^{z,y})$ can be represented as follows:

$$\psi(a_n^{z,y}) = (a_n^{z,y} - 1) + \frac{(a_n^{z,y} - 1)^2}{2\,\delta_n^{z,y}},$$

where $\delta_n^{z,y}$ varies between $a_n^{z,y}$ and 1. Since $E_y(a_n^{z,y} \mid \bar H_{n-1}) = 1$, it is not difficult to see that

$$E_y(\psi(a_n^{z,y}) \mid \bar H_{n-1}) = \frac{1}{2} E_y\left( \frac{(a_n^{z,y} - 1)^2}{\delta_n^{z,y}} \,\Big|\, \bar H_{n-1} \right) \ge 0.$$
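The change of measure used in the proof of Lemma 3 can be checked numerically in the simplest possible case. The sketch below assumes two discrete distributions `p` and `q` on three points (hypothetical values, with no conditioning on the past) and compares $E_p \ln(p/q)$ with $E_q\,\psi(p/q)$, where $\psi(t) = t \ln t$.

```python
import math


def psi(t):
    """psi(t) = t * ln(t), with the usual convention psi(0) = 0."""
    return t * math.log(t) if t > 0.0 else 0.0


def kl_direct(p, q):
    """E_p[ln(p/q)] for two discrete distributions with common support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_via_psi(p, q):
    """E_q[psi(p/q)]: the same quantity after the change of measure."""
    return sum(qi * psi(pi / qi) for pi, qi in zip(p, q))


# Hypothetical example distributions.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

direct = kl_direct(p, q)
changed = kl_via_psi(p, q)
print(direct, changed)
```

The two expressions agree up to rounding and are nonnegative, and the value is zero when the two distributions coincide.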
Lemma 4. Let $d_n(k,z) \le d_n(k,y)$. Then $I_n^k(z,y) \ge 0$.

Proof. From the definition of $I_n^k(z,y)$, we can write

$$I_n^k(z,y) = E_k \ln a_n^{z,y} = E_k \ln f^z(\xi_n \mid \bar H_{n-1}) - E_k \ln f^y(\xi_n \mid \bar H_{n-1}).$$

From Lemma 3, for any $z \in B$,

$$E_k \ln a_n^{k,z} \ge 0,$$

and thus

[equation not legible in the source]
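One way to see how Lemma 3 leads to Lemma 4 is sketched below. It rests on an assumption that is not recovered from the text: that the distance, in the spirit of [4], is the absolute value of the expected log-ratio, $d_n(z,y) = |I_n^k(z,y)|$ with $I_n^k(z,y) = E_k \ln a_n^{z,y}$.

```latex
I_n^k(z,y) = E_k \ln a_n^{z,y}
           = E_k \ln a_n^{k,y} - E_k \ln a_n^{k,z},
\qquad
d_n(k,z) = \left| I_n^k(k,z) \right| = E_k \ln a_n^{k,z} \ge 0
\quad \text{(Lemma 3)},
\qquad
I_n^k(z,y) = d_n(k,y) - d_n(k,z).
```

Under this reading, $d_n(k,z) \le d_n(k,y)$ gives $I_n^k(z,y) \ge 0$, which is the statement of Lemma 4, and the strict inequality $d_n(k,z) > d_n(k,y)$ gives $I_n^k(z,y) < 0$.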
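5. RESULTS

Assume that the process $\ln a_n^{z,y}$ is ergodic, i.e.,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \ln a_m^{z,y} = I^k(z,y), \quad P\text{-a.s.}$$

Theorem 2. If $d(k,z) > d(k,y)$ then $\alpha_n^{z,y} \to 0$, $P$-a.s. If it is known that $\alpha_n^{z,y} \to 0$ $P$-a.s., then $d(k,z) \ge d(k,y)$.

Proof. Note that from Lemma 4, the inequality $d(k,z) > d(k,y)$ yields $I^k(z,y) < 0$ and consequently

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \ln a_m^{z,y} < 0, \quad P\text{-a.s.}$$

This means that

$$\sum_{m=1}^{n} \ln a_m^{z,y} \to -\infty$$

and consequently

$$\alpha_n^{z,y} = \alpha_0^{z,y} \prod_{m=1}^{n} a_m^{z,y} \to 0,$$

thus proving the first part of the theorem.

In order to prove the second part of the theorem we assume that $\alpha_n^{z,y} \to 0$ but that $d(k,z) < d(k,y)$. This yields

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} \ln a_m^{z,y} > 0, \quad P\text{-a.s.},$$

from which

$$\alpha_n^{z,y} \to \infty,$$

and the theorem is proved by contradiction.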
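The first part of Theorem 2 can be illustrated by simulation. The sketch below makes several assumptions that are not in the text: observations are independent unit-variance Gaussians, the real mean is `k`, the two candidate means are `z` and `y`, the information distance is taken as $d(k,z) = E_k \ln(f^k/f^z) = (k - z)^2/2$, and $\alpha_0^{z,y} = 1$. All names and values are hypothetical.

```python
import random


def kl_gauss(mean_a, mean_b):
    """E_a ln(f_a / f_b) for unit-variance Gaussians: (mean_a - mean_b)^2 / 2."""
    return 0.5 * (mean_a - mean_b) ** 2


def log_ratio_increment(x, z, y):
    """ln a^{z,y} = ln f^z(x) - ln f^y(x) for unit-variance Gaussian densities."""
    return -0.5 * (x - z) ** 2 + 0.5 * (x - y) ** 2


def simulate(k, z, y, n, seed=1):
    """Return ln alpha_n^{z,y} for n observations drawn under the real mean k."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += log_ratio_increment(rng.gauss(k, 1.0), z, y)
    return total


# Assumed values: candidate z is farther from the real mean k than candidate y.
k, z, y, n = 0.3, 1.0, 0.0, 20000
log_alpha = simulate(k, z, y, n)
drift = kl_gauss(k, y) - kl_gauss(k, z)  # I^k(z,y) = d(k,y) - d(k,z), about -0.2 here
print(log_alpha / n, drift)
```

The time average of the log-ratio increments settles near the negative drift, so $\ln \alpha_n^{z,y}$ decreases roughly linearly in $n$ and $\alpha_n^{z,y}$ tends to zero.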
Example. Assume that the sequence $\xi_n$ is a finite-state ergodic Markov chain on any of the probability spaces $(\Omega, H, P^i)$, $i \in B$, where $B$ is a finite set. Let $p^i_{lm}$, $l, m = 1, \ldots, K$, be the one-step transition probabilities. It is not difficult to find (see also [8]) that $a_n^{i,j}$ is given by the formula

$$a_n^{i,j} = \frac{p^i_{\xi_{n-1} \xi_n}}{p^j_{\xi_{n-1} \xi_n}}.$$

Well-known results from Markov chain theory (see [10] for instance) show that the process $\ln a_n^{i,j}$ is ergodic. Thus if the Bayesian algorithm for $\pi_j(n)$ converges to 1 for some particular $j_0$, this means that $j_0$ is the point of $B$ that is nearest (in the sense of the information distance $d(k,z)$) to the real parameter value $k$.
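The Markov chain example can be sketched numerically as follows. Everything specific here is an assumption made for illustration: a two-state chain, a real transition matrix `P_true` that is deliberately outside the candidate set, two hypothetical candidate matrices, a uniform prior, and the information distance taken as the stationary Kullback rate $\sum_l \mu_l \sum_m p^k_{lm} \ln(p^k_{lm} / p^i_{lm})$.

```python
import math
import random


def simulate_chain(P, n, seed=2):
    """Sample n transitions of a finite-state Markov chain with transition matrix P."""
    rng = random.Random(seed)
    state, path = 0, [0]
    for _ in range(n):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path


def log_posteriors(path, candidates):
    """Log weights under a uniform prior, treating each candidate as if it could be true.

    The initial state is ignored because it contributes equally to every candidate.
    """
    logw = [0.0] * len(candidates)
    for prev, cur in zip(path, path[1:]):
        for j, P in enumerate(candidates):
            logw[j] += math.log(P[prev][cur])
    top = max(logw)
    norm = top + math.log(sum(math.exp(w - top) for w in logw))
    return [w - norm for w in logw]


def stationary_two_state(P):
    """Stationary distribution of a two-state chain."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]


def kl_rate(P_true, P_model):
    """sum_l mu_l sum_m p_lm ln(p_lm / q_lm), with mu stationary under P_true."""
    mu = stationary_two_state(P_true)
    return sum(mu[l] * P_true[l][m] * math.log(P_true[l][m] / P_model[l][m])
               for l in range(2) for m in range(2))


P_true = [[0.8, 0.2], [0.3, 0.7]]       # real system, not in the candidate set
candidates = [
    [[0.75, 0.25], [0.35, 0.65]],       # close to P_true
    [[0.40, 0.60], [0.60, 0.40]],       # far from P_true
]
path = simulate_chain(P_true, n=2000)
posterior = [math.exp(w) for w in log_posteriors(path, candidates)]
rates = [kl_rate(P_true, P) for P in candidates]
print(posterior, rates)
```

In this sketch the weight concentrates on the candidate with the smaller Kullback rate, even though neither candidate is the real transition matrix.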
REFERENCES

1. A.I. Yashin, Bayesian Approach to Parameter Estimation: Convergence Analysis, WP-83-67, International Institute for Applied Systems Analysis, Laxenburg, Austria (July 1983).
2. A. Bhattacharya, "On Measure of Divergence Between Two Statistical Populations Defined by Probability Distributions," Bulletin of the Calcutta Mathematical Society 35, pp. 99-104 (1943).
3. S. Kullback, Information Theory and Statistics, Wiley, New York (1959).
4. Y. Baram and N.R. Sandell, "An Information Theoretic Approach to Dynamical System Modeling and Identification," IEEE Transactions on Automatic Control AC-23(1), pp. 61-66 (1978).
5. N.M. Kuznetsov, A.V. Lubkov, and A.I. Yashin, "About Consistency of Bayesian Estimates in Adaptive Kalman Filtration Scheme," Automatic and Remote Control (translated from Russian) (4), pp. 47-56 (1981).
6. R.S. Liptzer and A.N. Shiryaev, Statistics of Random Processes, Springer-Verlag, Berlin and New York (1978).
7. A.I. Yashin, "Filtering of Jumping Processes," Automatic and Remote Control 5, pp. 52-58 (1970).
8. A.I. Yashin, "Sostoyatelnost Bayesovskich Otcenok Parametrov (Consistency of Bayesian Parameter Estimates)," Problemi Peredachi Informacii (in Russian) (1), pp. 62-72 (1981).
9. N.M. Kuznetsov and A.I. Yashin, "On the Conditions of the Identifiability of Partially Observed Systems," Docladi Akademii Nauk SSSR (in Russian) 259(4), pp. 790-793 (1981).
10. S. Karlin, A First Course in Stochastic Processes, Academic Press, New York and London (1968).