
Some Aspects of Model Tuning Procedure: Information-Theoretic Analysis


NOT FOR QUOTATION WITHOUT PERMISSION OF THE AUTHOR

April 1985
WP-85-24

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS
2361 Laxenburg, Austria

I would like to thank Susanne Stock for her help in preparing the manuscript.


1. INTRODUCTION

Computer or mathematical models are not exact representations of reality: lack of knowledge, technical restrictions and particular modeling goals make it necessary to approximate the real system in various ways. Nevertheless, the procedures by which models are adjusted to observed data are often based on the assumption that the real system has the same structure as the model and differs only in the values of certain parameters. These particular values are usually required to belong to the feasible set of parameter values, and this fact, together with some additional conditions, usually provides the convergence property for many individual algorithms [1].

However, in reality all of these assumptions are generally false. Even if the structure of the system corresponds to the structure of the model, the real parameter values often do not belong to the presupposed feasible set. Moreover, mathematicians often consciously shrink this set in order to simplify the estimation algorithms. For instance, they approximate the bounded compact set of parameter values by a set consisting of a finite number of points, thus increasing the chances that the real parameter values will be excluded.

It is therefore both remarkable and surprising to find that, despite these false assumptions and approximations, the parameter estimation algorithms often still converge! The model resulting from this tuning procedure will of course not coincide with the real system, and this raises the natural question: how far is this computer model from reality?

When considering this question it is necessary to have some way of measuring the distance between individual models. One measure of divergence was introduced by Bhattacharya [2]; Kullback [3] also formulated a measure of information distance. However, these measures are not proper metrics. Baram and Sandell [4] later introduced a modified version of Kullback's measure, which has been shown to be a proper distance metric. They applied this approach to linear Gaussian systems and models; in this paper it is generalized to a wider class of systems.

2. NOTATIONS AND DEFINITIONS

Assume that the variety of models of the real system may be characterized by a parameter $\beta$, which takes values from the parameter set $B$. In view of the Bayesian formulation of the problem, we will assume $\beta$ to be a random variable defined on some probability space $(\Omega, H, P)$.

Let $\xi_n(\omega)$, $n \ge 0$, be some random process (observation) adapted to some nondecreasing family of $\sigma$-algebras $H = (H_n)_{n \ge 0}$, $H_\infty = H$, in $\Omega$. We shall denote by $\bar{H} = (\bar{H}_n)_{n \ge 0}$ the family of $\sigma$-algebras generated by the process $\xi_n$, $n \ge 0$, where $\bar{H}_n$ is the $\sigma$-algebra in $\Omega$ generated by the process $\xi$ up to time $n$.

In the case of a continuous-time observation process $\xi_t$, $t \ge 0$, we assume the nondecreasing right-continuous family of $\sigma$-algebras $H = (H_t)_{t \ge 0}$ to be given, where $H_\infty = H$ and $H_0$ is completed by the $P$-zero sets from $H$. We also introduce the family of $\sigma$-algebras $\bar{H} = (\bar{H}_t)_{t \ge 0}$, where $\bar{H}_t$ is the $\sigma$-algebra generated by the observations $\xi_u$, $u \le t$.

If the set of parameter values is finite or denumerable, we will denote by $\pi_j(n)$ (or $\pi_j(t)$) the a posteriori probabilities of the events $\{\beta = \beta_j\}$, $j \in B$, given the observations $\xi_k$, $k \le n$ ($\xi_u$, $u \le t$).

For any $A \in H$, $x \in B$ we denote by $P^x(A)$ the family of probability measures

$$P^x(A) = P(A \mid \beta = x).$$

Let $P_n^x(A)$, $\bar{P}_n^x(A)$, $x \in B$, $n \ge 0$, be the restrictions of $P^x(A)$ to the $\sigma$-algebras $H_n$, $\bar{H}_n$, respectively. Assume also that for any $x, y \in B$ we have $\bar{P}_n^x \sim \bar{P}_n^y$. Define $\alpha_n^{x,y}$ as the Radon-Nikodym derivative

$$\alpha_n^{x,y} = \frac{d\bar{P}_n^x}{d\bar{P}_n^y}$$

and let $a_n^{x,y} = \alpha_n^{x,y} (\alpha_{n-1}^{x,y})^{-1}$.

It is easy to see that if the $\bar{H}_{n-1}$-conditional distributions of $\xi_n$, $n \ge 0$, have densities $f^x(z \mid \bar{H}_{n-1})$, $x \in B$, then

$$a_n^{x,y} = \frac{f^x(\xi_n \mid \bar{H}_{n-1})}{f^y(\xi_n \mid \bar{H}_{n-1})}.$$
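As a concrete illustration of the quantities $a_n^{x,y}$ and $\alpha_n^{x,y}$, the following Python sketch assumes independent Gaussian observations with unit variance whose mean plays the role of the parameter. This observation model, the function names and the convention $\alpha_0 = 1$ are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def gaussian_density(z, mean, std=1.0):
    """Density of N(mean, std^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def likelihood_ratios(observations, x, y):
    """Return the lists (a_n, alpha_n) for candidate means x and y.

    a_n     = f^x(xi_n) / f^y(xi_n)   (one-step density ratio)
    alpha_n = a_1 * a_2 * ... * a_n   (cumulative ratio, assuming alpha_0 = 1)
    """
    increments, cumulative = [], []
    alpha = 1.0
    for xi in observations:
        a = gaussian_density(xi, x) / gaussian_density(xi, y)
        alpha *= a
        increments.append(a)
        cumulative.append(alpha)
    return increments, cumulative

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(5)]   # hypothetical observations
a, alpha = likelihood_ratios(data, x=0.0, y=1.0)
```

Each `a[n]` is the ratio of the two candidate densities at the newest observation, and `alpha[n]` accumulates these ratios, mirroring the relation $a_n^{x,y} = \alpha_n^{x,y} (\alpha_{n-1}^{x,y})^{-1}$.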

3. SOME BAYESIAN PARAMETER ESTIMATION ALGORITHMS

Before deriving our main results, we will first consider some Bayesian parameter estimation algorithms for different observation schemes.

a) Assume that $\xi_n$, $n \ge 0$, is given by the formula

[equation missing in the source]

where $\theta_n$ satisfies the recursive stochastic equation

[equation missing in the source]

Here $\varepsilon_{1n}$, $\varepsilon_{2n}$, $n \ge 0$, are sequences of independent Gaussian random variables with zero mean and unit variance, and $\beta$ is an unknown parameter. Assuming that $\beta$ takes its values from some finite set $B_k = \{\beta_1, \beta_2, \dots, \beta_k\}$, the a posteriori probabilities are

[equation missing in the source]

where $\hat{\theta}_n^j$ are the Kalman estimates of $\theta_n$ given $\{\beta = \beta_j\}$ and $D_j(n)$ are functions of the conditional variance $\gamma_j(n)$ [5].
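Scheme a) can be illustrated with a bank of Kalman filters, one per candidate value of $\beta$. The sketch below assumes the scalar model $\theta_{n+1} = \beta\theta_n + \varepsilon_{1n}$, $\xi_n = \theta_n + \varepsilon_{2n}$ with $\theta_0 \sim N(0,1)$. This specific model and all names in the code are assumptions chosen for illustration and need not coincide with the equations used in [5].

```python
import math
import random

def kalman_bank_posteriors(observations, betas, prior=None):
    """Posterior probabilities over candidate betas for the assumed model
        theta_{n+1} = beta * theta_n + eps1_n,   xi_n = theta_n + eps2_n,
    with unit-variance Gaussian noises (an illustrative assumption).
    """
    k = len(betas)
    log_w = [math.log(p) for p in (prior or [1.0 / k] * k)]
    mean = [0.0] * k       # predicted state mean for each candidate
    var = [1.0] * k        # predicted state variance for each candidate
    for xi in observations:
        for j, beta in enumerate(betas):
            s = var[j] + 1.0                       # innovation variance
            innov = xi - mean[j]                   # innovation
            log_w[j] += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
            gain = var[j] / s                      # Kalman gain
            filt_mean = mean[j] + gain * innov     # filtered mean
            filt_var = (1.0 - gain) * var[j]       # filtered variance
            mean[j] = beta * filt_mean             # one-step prediction
            var[j] = beta * beta * filt_var + 1.0
        top = max(log_w)                           # rescale in log space
        log_w = [w - top for w in log_w]
    weights = [math.exp(w) for w in log_w]
    total = sum(weights)
    return [w / total for w in weights]

def simulate(beta, n, seed=0):
    """Generate n observations from the assumed model with the given beta."""
    rng = random.Random(seed)
    theta, out = rng.gauss(0.0, 1.0), []
    for _ in range(n):
        out.append(theta + rng.gauss(0.0, 1.0))
        theta = beta * theta + rng.gauss(0.0, 1.0)
    return out

betas = [0.1, 0.5, 0.9]                            # hypothetical feasible set
post = kalman_bank_posteriors(simulate(0.9, 2000), betas)
```

The weights are kept in log space so that long observation records do not underflow; each filter contributes the Gaussian likelihood of its own innovation at every step.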

b) Consider the continuous (in time) observation process $\xi_t$ given by the stochastic differential equation

[equation missing in the source]

where $W_t$, $t \ge 0$, is an $H$-adapted Wiener process, $\beta$ is an unknown parameter and $C_t$ is an $H$-adapted positive function. Assuming again that the number of parameter values is finite, we have for $\pi_j(t) = P(\beta = \beta_j \mid \bar{H}_t)$ [6]

[equation missing in the source]

where

[equation missing in the source]

c) Consider an observation made by a continuous-state jumping process with unknown transition intensities $\lambda_{ij}$. Once again assuming a finite number of values for $\beta$, we have the following equations for the a posteriori probability $\pi_i(t)$ [7]

[equation missing in the source]

where

[equation missing in the source]

Necessary and sufficient conditions for the convergence with probability one of the a posteriori probabilities to the respective indicators were given in the papers [1, 8, 9] in terms of absolute continuity and singularity of some special families of probability distributions. These papers demonstrated applications of the general theory to various particular forms of the random processes.

A central step in the proof of the main convergence result in [5, 9, 1] was the relation between the a posteriori probabilities and the likelihood ratio in the case of a denumerable or finite number of parameter values. More exactly, the following lemma is true:

Lemma 1. Let, for any $i \ne j$ and $n \ge 0$, the measure $\bar{P}_n^i$ be equivalent to the measure $\bar{P}_n^j$. Then $P$-a.s. the following equality holds:

[equation (1) missing in the source]

The proof of this lemma follows from the definition of the likelihood ratio $\alpha_n^{i,j}$. The equality (1) yields that

[equation missing in the source]
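A relation of this kind between posterior probabilities and likelihood ratios can be checked numerically. The sketch below assumes the standard form $\pi_j(n)/\pi_i(n) = (\pi_j(0)/\pi_i(0))\,\alpha_n^{j,i}$ for a finite parameter set, which may differ in presentation from equality (1), and uses hypothetical independent unit-variance Gaussian observations whose mean is the parameter.

```python
import math
import random

def normal_pdf(z, mean):
    """Density of N(mean, 1) evaluated at z."""
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)

def bayes_posteriors(observations, means, prior):
    """Recursive Bayes update over a finite set of candidate means."""
    post = list(prior)
    for xi in observations:
        weighted = [p * normal_pdf(xi, m) for p, m in zip(post, means)]
        total = sum(weighted)
        post = [w / total for w in weighted]
    return post

def likelihood_ratio(observations, mean_j, mean_i):
    """alpha_n^{j,i}: product of one-step density ratios f^j / f^i."""
    alpha = 1.0
    for xi in observations:
        alpha *= normal_pdf(xi, mean_j) / normal_pdf(xi, mean_i)
    return alpha

rng = random.Random(1)
means = [-1.0, 0.0, 2.0]          # hypothetical candidate values beta_1..beta_3
prior = [0.2, 0.5, 0.3]           # hypothetical prior probabilities pi_j(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20)]

post = bayes_posteriors(data, means, prior)
lhs = post[0] / post[1]                                             # pi_1(n) / pi_2(n)
rhs = prior[0] / prior[1] * likelihood_ratio(data, means[0], means[1])
```

Up to rounding, `lhs` and `rhs` agree, which is the sense in which the posterior odds are governed entirely by the prior odds and the likelihood ratio.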

According to the papers [1, 8, 9], this property guarantees the following convergence result (recall that we are still dealing with the case in which the parameter value corresponding to the real system belongs to the feasible set of parameter values $B$).

Theorem 1. Let, for any $i \ne j$, $n \ge 0$, $\bar{P}_n^i \sim \bar{P}_n^j$. Then the condition

$$\bar{P}^i \perp \bar{P}^j$$

is equivalent to the condition

$$\lim_{n \to \infty} \pi_j(n) = I(\beta = \beta_j), \quad P\text{-a.s.}$$

The proof of this theorem is based on the property that the singularity set for the measures $\bar{P}^j$ and $\bar{P}$ coincides $P$-a.s. with the set $\{\beta = \beta_j\}$.

If the real parameter value $\beta_k$ does not belong to the feasible set, the variables $\pi_i(n)$, $i \in B$, calculated in Section 3 are no longer a posteriori probabilities, but merely some functionals of the observable process $\xi_n$.

Taking them for a posteriori probabilities, the observer expects one of the $\pi_i(n)$, $i \in B$ (say $\pi_{i_0}(n)$), to converge to 1 and interprets this result as if the real parameter value were equal to $i_0$. However, this is actually a false conclusion. The questions that arise in this connection are: When does convergence of some of the $\pi_i(n)$, $i \in B$, really take place? What does it mean when $\pi_{i_0}(n)$ tends to 1 for some $i_0 \in B$? In order to answer these questions we need some auxiliary results.

Assume that the real system corresponds to a parameter value $k$ such that $k \notin B$. Introduce the function $I_n^k(x,y) = E^k \ln a_n^{x,y}$ [4] and define the measure of distance

$$d_n(x,y) = |I_n^k(x,y)|.$$

Lemma 2. The function $d_n(i,j)$ is a pseudo-metric. That is, the following relations hold:

$$d_n(x,x) = 0, \qquad d_n(x,y) = d_n(y,x),$$

$$d_n(x,k) + d_n(k,y) \ge d_n(x,y).$$

The proof of this lemma is given in [4].
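The pseudo-metric properties can be checked numerically on a small example. The sketch below assumes that the distance takes the form $d(x,y) = |E^k \ln(f^x/f^y)|$, in the spirit of Baram and Sandell [4], and uses hypothetical distributions on a three-point alphabet; the names and numbers are illustrative only.

```python
import math
from itertools import product

def info_gap(true_p, px, py):
    """I^k(x, y) = E^k ln(f^x / f^y) for distributions on a finite alphabet."""
    return sum(pk * math.log(fx / fy) for pk, fx, fy in zip(true_p, px, py))

def distance(true_p, px, py):
    """Assumed distance d(x, y) = |I^k(x, y)|."""
    return abs(info_gap(true_p, px, py))

true_p = [0.5, 0.3, 0.2]                       # hypothetical real system k
models = {
    "x": [0.6, 0.2, 0.2],
    "y": [0.2, 0.5, 0.3],
    "z": [0.3, 0.3, 0.4],
    "k": true_p,
}

# Triangle inequality d(a, b) <= d(a, c) + d(c, b) over all triples of models.
triangle_ok = all(
    distance(true_p, models[a], models[b])
    <= distance(true_p, models[a], models[c]) + distance(true_p, models[c], models[b]) + 1e-12
    for a, b, c in product(models, repeat=3)
)
# Symmetry d(a, b) = d(b, a) over all pairs of models.
symmetric_ok = all(
    math.isclose(distance(true_p, models[a], models[b]),
                 distance(true_p, models[b], models[a]), abs_tol=1e-12)
    for a, b in product(models, repeat=2)
)
```

Under this assumed form the triangle inequality follows from the additivity $I^k(a,b) = I^k(a,c) + I^k(c,b)$. Two different models may be at zero distance from each other, which is why the function is only a pseudo-metric.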

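Lemma 3. For any $x, y \in B$, $n \ge 0$ we have

$$I_n^x(x,y) \ge 0.$$

Proof. From the definition of $I_n^x(x,y)$,

$$I_n^x(x,y) = E^x(\ln a_n^{x,y} \mid \bar{H}_{n-1}) = E^y(a_n^{x,y} \ln a_n^{x,y} \mid \bar{H}_{n-1}) = E^y(\varphi(a_n^{x,y}) \mid \bar{H}_{n-1}),$$

where $\varphi(t) = t \ln t$. According to the mean value theorem, $\varphi(a_n^{x,y})$ can be represented as follows:

$$\varphi(a_n^{x,y}) = (a_n^{x,y} - 1) + \frac{(a_n^{x,y} - 1)^2}{2\,\delta_n^{x,y}},$$

where $\delta_n^{x,y}$ varies between $a_n^{x,y}$ and $1$. Since $E^y(a_n^{x,y} \mid \bar{H}_{n-1}) = 1$, it is not difficult to see that

$$E^y(\varphi(a_n^{x,y}) \mid \bar{H}_{n-1}) = \frac{1}{2} E^y\left(\frac{(a_n^{x,y} - 1)^2}{\delta_n^{x,y}} \,\Big|\, \bar{H}_{n-1}\right) \ge 0.$$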
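The steps of this proof can be verified on a small discrete example. The sketch below uses two hypothetical densities on a three-point alphabet (an assumption made purely for illustration) and checks that $E^y a = 1$, that $E^x \ln a = E^y \varphi(a)$, and that the intermediate point $\delta$ of the expansion lies between $a$ and $1$.

```python
import math

def phi(t):
    """phi(t) = t ln t, the convex function used in the proof."""
    return t * math.log(t)

px = [0.6, 0.3, 0.1]     # hypothetical density f^x on a three-point alphabet
py = [0.2, 0.5, 0.3]     # hypothetical density f^y

ratios = [fx / fy for fx, fy in zip(px, py)]                        # a = f^x / f^y
mean_ratio = sum(fy * a for fy, a in zip(py, ratios))               # E^y a
info_under_x = sum(fx * math.log(a) for fx, a in zip(px, ratios))   # E^x ln a
info_under_y = sum(fy * phi(a) for fy, a in zip(py, ratios))        # E^y phi(a)

# Intermediate point delta solving phi(a) = (a - 1) + (a - 1)^2 / (2 delta);
# valid here because none of the ratios equals 1.
deltas = [(a - 1.0) ** 2 / (2.0 * (phi(a) - (a - 1.0))) for a in ratios]
```

Because the linear term $a - 1$ has zero mean under $f^y$, only the non-negative quadratic remainder survives the expectation, which is exactly why the information quantity cannot be negative.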

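Lemma 4. Let $d_n(k,x) \le d_n(k,y)$. Then

$$I_n^k(x,y) \ge 0.$$

Proof. From the definition of $I_n^k(x,y)$, we can write

$$I_n^k(x,y) = E^k \ln a_n^{x,y} = E^k \ln f^x(\xi_n \mid \bar{H}_{n-1}) - E^k \ln f^y(\xi_n \mid \bar{H}_{n-1}).$$

From Lemma 3, for any $x \in B$,

$$E^k \ln a_n^{k,x} \ge 0,$$

and thus

$$I_n^k(x,y) = E^k \ln a_n^{k,y} - E^k \ln a_n^{k,x} = d_n(k,y) - d_n(k,x) \ge 0.$$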

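5. RESULTS

Assume that the process $\ln a_n^{x,y}$ is ergodic, i.e.,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \ln a_i^{x,y} = I^k(x,y), \quad P\text{-a.s.}$$

Theorem 2. If $d(k,x) > d(k,y)$ then $\alpha_n^{x,y} \to 0$, $P$-a.s. If it is known that $\alpha_n^{x,y} \to 0$, $P$-a.s., then $d(k,x) \ge d(k,y)$.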

Proof. Note that, by Lemma 4, the inequality $d(k,x) > d(k,y)$ yields $I^k(x,y) < 0$ and consequently

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \ln a_i^{x,y} < 0, \quad P\text{-a.s.}$$

This means that

$$\sum_{i=1}^{n} \ln a_i^{x,y} \to -\infty$$

and consequently

$$\alpha_n^{x,y} = \alpha_0^{x,y} \prod_{i=1}^{n} a_i^{x,y} \to 0,$$

thus proving the first part of the theorem.

In order to prove the second part of the theorem we assume that $\alpha_n^{x,y} \to 0$ but that $d(k,x) < d(k,y)$. This yields

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \ln a_i^{x,y} > 0, \quad P\text{-a.s.},$$

from which

$$\alpha_n^{x,y} \to \infty,$$

and the theorem is proved by contradiction.
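The first part of the theorem can be illustrated by simulation. The sketch below assumes independent unit-variance Gaussian observations whose mean is the parameter, with a real value $k$ lying outside the two candidates $x$ and $y$; for this hypothetical model $d(k,x) = (k-x)^2/2$, so the limit of the averaged log-ratio is known in closed form. All numerical values are illustrative assumptions.

```python
import random

def kl_gaussian_means(m_true, m_model):
    """E^k ln(f^k / f^model) for unit-variance Gaussians: (difference)^2 / 2."""
    return 0.5 * (m_true - m_model) ** 2

def mean_log_ratio(observations, x, y):
    """(1/n) * sum of ln a_i^{x,y} for unit-variance Gaussian densities."""
    total = 0.0
    for xi in observations:
        total += -0.5 * (xi - x) ** 2 + 0.5 * (xi - y) ** 2
    return total / len(observations)

k, x, y = 0.3, 1.0, 0.0                       # hypothetical values; k is outside {x, y}
rng = random.Random(42)
data = [rng.gauss(k, 1.0) for _ in range(20000)]

d_kx = kl_gaussian_means(k, x)                # distance from k to x (larger)
d_ky = kl_gaussian_means(k, y)                # distance from k to y (smaller)
limit = d_ky - d_kx                           # I^k(x, y), negative here
average = mean_log_ratio(data, x, y)          # empirical counterpart of the limit
log_alpha = average * len(data)               # ln alpha_n^{x,y}, assuming alpha_0 = 1
```

Since $d(k,x) > d(k,y)$ in this setup, the averaged log-ratio settles near a negative constant and `log_alpha` drifts to large negative values, that is, $\alpha_n^{x,y}$ tends to zero.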

Example. Assume that the sequence $\xi_n$ is a finite-state ergodic Markov chain on any of the probability spaces $(\Omega, H, P^i)$, $i \in B$, where $B$ is a finite set. Let $p_{lm}^i$, $l, m = 1, \dots, k$, be the one-step transition probabilities. It is not difficult to find (see also [8]) that $a_n^{i,j}$ is given by the formula

$$a_n^{i,j} = \frac{p^i_{\xi_{n-1}\xi_n}}{p^j_{\xi_{n-1}\xi_n}}.$$

Well-known results from Markov chain theory (see [10], for instance) show that the process $\ln a_n^{i,j}$ is ergodic. Thus, if the Bayesian algorithm for $\pi_j(n)$ converges to 1 for some particular $j_0$, it means that this $j_0$ is the point of $B$ that is nearest (in the sense of the information distance $d(k,x)$) to the real parameter value $k$.
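The conclusion of the example can be sketched in code. The following Python fragment assumes a hypothetical two-state chain as the real system and three hypothetical candidate transition matrices, none of which equals the real one. It takes the information distance to be the stationary average of the row-wise Kullback divergence, which is an assumption about the form of $d(k,x)$ in this setting, and compares the nearest candidate with the one favoured by the accumulated log-likelihood (equivalently, by the posterior under a uniform prior).

```python
import math
import random

def stationary(P, iterations=1000):
    """Stationary distribution of an ergodic chain via power iteration."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(iterations):
        dist = [sum(dist[l] * P[l][m] for l in range(len(P))) for m in range(len(P))]
    return dist

def info_distance(P_true, P_model):
    """Assumed d(k, j): stationary average of the row-wise Kullback divergence."""
    pi = stationary(P_true)
    return sum(
        pi[l] * P_true[l][m] * math.log(P_true[l][m] / P_model[l][m])
        for l in range(len(P_true)) for m in range(len(P_true))
    )

def log_likelihoods(path, models):
    """Sum of ln p^j_{lm} along the observed transitions, for each model j."""
    return [
        sum(math.log(P[l][m]) for l, m in zip(path, path[1:]))
        for P in models
    ]

def simulate(P, n, seed=0):
    """Simulate n transitions of the chain P starting from state 0."""
    rng = random.Random(seed)
    state, path = 0, [0]
    for _ in range(n):
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path

P_true = [[0.7, 0.3], [0.4, 0.6]]                 # hypothetical real system k
models = [
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.6, 0.4], [0.5, 0.5]],
    [[0.2, 0.8], [0.8, 0.2]],
]                                                  # hypothetical feasible set B

distances = [info_distance(P_true, P) for P in models]
scores = log_likelihoods(simulate(P_true, 5000), models)
nearest = distances.index(min(distances))          # candidate closest to k
selected = scores.index(max(scores))               # candidate whose posterior dominates
```

In this setup the candidate selected by the data coincides with the candidate at the smallest information distance from the real chain, even though the real chain itself is not in the feasible set.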

REFERENCES

1. A.I. Yashin, Bayesian Approach to Parameter Estimation: Convergence Analysis, WP-83-67, International Institute for Applied Systems Analysis, Laxenburg, Austria (July 1983).

2. A. Bhattacharya, "On Measure of Divergence Between Two Statistical Populations Defined by Probability Distributions," Bulletin Calcutta Mathematical Society 35, pp. 99-104 (1943).

3. S. Kullback, Information Theory and Statistics, Wiley, New York (1959).

4. Y. Baram and N.R. Sandell, "An Information Theoretic Approach to Dynamical System Modeling and Identification," IEEE Transactions Automatic Control AC-23(1), pp. 61-66 (1978).

5. N.M. Kuznetsov, A.V. Lubkov, and A.I. Yashin, "About Consistency of Bayesian Estimates in Adaptive Kalman Filtration Scheme," Automatic and Remote Control (translated from Russian) (4), pp. 47-56 (1981).

6. R.S. Liptzer and A.N. Shiryaev, Statistics of Random Processes, Springer-Verlag, Berlin and New York (1978).

7. A.I. Yashin, "Filtering of Jumping Processes," Automatic and Remote Control 5, pp. 52-58 (1970).

8. A.I. Yashin, "Sostoyatelnost Bayesovskich Otcenok Parametrov (Consistency of Bayesian Parameter Estimates)," Problemi Peredachi Informacii (in Russian) (1), pp. 62-72 (1981).

9. N.M. Kuznetsov and A.I. Yashin, "On the Conditions of the Identifiability of Partially Observed Systems," Docladi Akademii Nauk SSSR (in Russian) 259(4), pp. 790-793 (1981).

10. S. Karlin, A First Course in Stochastic Processes, Academic Press, New York and London (1968).
