Bayesian Approach to Parameter Estimation: Convergence Analysis


NOT FOR QUOTATION WITHOUT PERMISSION OF THE AUTHOR

BAYESIAN APPROACH TO PARAMETER ESTIMATION: CONVERGENCE ANALYSIS

Anatoli Yashin

Adaptive Resource Policies

July 1983 WP-83-67

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS

A-2361 Laxenburg, Austria


CONTENTS

INTRODUCTION

PRELIMINARIES

THE STRONG CONSISTENCY PROPERTY IN THE CASE OF A DENUMERABLE SET OF PARAMETER VALUES

SOME PROPERTIES OF CONDITIONAL MEASURES

THE PROCESS $\xi_t$, $t \ge 0$, AS A SEMIMARTINGALE

LOCAL ABSOLUTE CONTINUITY AND SINGULARITY OF PROBABILISTIC MEASURES

CONSISTENCY CONDITIONS FOR BAYESIAN ESTIMATIONS WHEN OBSERVATIONS ARE SEMIMARTINGALES

EXAMPLES

THE UNCOUNTABLE SET OF PARAMETER VALUES

CONCLUSION

REFERENCES


1. INTRODUCTION

In spite of evident success in the analysis of many aspects of natural phenomena, uncertainty is still one of the most important features of the relations between human beings and natural systems. The absence of exact knowledge about the structures, regularities, and peculiarities of system functions, the variety of unknown links between subsystems, errors of measurement, and sometimes the practical impossibility of measuring on the one hand, and the need to decide on appropriate control actions under incomplete information on the other, have prompted attempts to arrive at formal descriptions of uncertainties and to analyze their dynamic properties.

One of the most highly developed formal ways of dealing with the dynamic aspects of uncertainties is the theory of random processes, but the practical application of this formal theory is often accompanied by long and informal procedures to identify when and how the basic assumptions and axioms of the theory may correspond to real situations. This aspect of the probabilistic method becomes especially important when we are dealing with statistical inference or data-processing problems. The Bayesian approach to statistical inference provides a way of taking into account the informal experience and intuition of the person dealing with a particular problem.

A detailed discussion of the Bayesian approach can be found in Savage (1954), Edwards et al. (1963), Box and Tiao (1973), Lindley (1974), and Peterka (1980). Theoretical research in this field has been stimulated mainly by the problems of estimation and control under incomplete information. The main conceptual difficulty in applying the Bayesian method is related to the interpretation of the a priori probability, although this difficulty is often overcome by using subjective measures of belief in a "rationally and consistently reasoning person" (Peterka 1980).

With this approach statistical analysis becomes a part of the man-machine interaction procedure. Such a concept widens the scope of the implementation of probabilistic methods for many situations with uncertainties, and provides a rational basis for decision-making. In particular, it is often used to solve identification problems as a first step in adaptive policy design for large-scale systems.

By systematically applying the Bayesian approach it is possible to produce a consistent theory with a formal structure from systems identification. For example, combining the Bayesian approach with the results and methods of the general theory of processes developed during the last decade has enabled abstract theoretical results of the martingale theory to be applied, and their best implementation in practice to be determined (see Meyer 1966, 1976; Dellacherie 1972; Jacod 1979; Liptser and Shirjaev 1977; etc.).

One of the important characteristics of the Bayesian estimation procedure is its consistency property in parameter estimation, which often provides high-quality adaptive control algorithms.

Many papers have been devoted to convergence analysis of Bayesian estimators (see, for example, Kiefer and Wolfowitz 1956; Kraft 1955; Le Cam and Schwartz 1960; Wald 1949; Ljung 1978; Freedman 1963; Doob 1949; Le Cam 1953, 1958).

Sufficient conditions for the convergence of such estimation algorithms were the subject of a paper by Baram and Sandell (1978), which included various assumptions about the properties of the observation process, parameter sets, and the correlation of parameters with the measuring process. Necessary and sufficient conditions of consistency of the parameter estimations for the diffusion observation process were the subject of a paper by Kitsul (1980).

The necessary and sufficient conditions of consistency for the discrete-time observation process and a denumerable set of parameter values were investigated in Yashin (1981). It turns out that the strong consistency property is often equivalent to the property of mutual singularity for some special family of probabilistic measures. The details of singularity conditions given in Kabanov et al. (1978) make it possible to obtain convenient conditions of convergence of the Bayesian estimation algorithm in adaptive filtration schemes (Kuznetzov and Yashin 1981; Kuznetzov et al. 1981). The main advantage of these conditions is that they can be checked before the measurement or observation process is begun.

The development of the results of Yashin (1981) will be twofold: first, an extension of the conditions in that paper to the wide class of continuous-time random observation processes; and second, an investigation of the consistency property for an uncountable set of parameter values.

This paper is devoted to the investigation of both of these problems. It turns out that in the case of continuous-time random processes, results similar to those of Yashin (1981) are true. However, the proof of the consistency property for an uncountable set of parameter values requires some additional restrictive conditions.

2. PRELIMINARIES

Let $(\Omega, \mathcal{H}, H, P)$ be the probabilistic space, where $H = (\mathcal{H}_t)_{t \ge 0}$ is a nondecreasing right-continuous family of σ-algebras $\mathcal{H}_t$, $t \ge 0$, $\mathcal{H}_t \subseteq \mathcal{H}$, $\mathcal{H}_\infty = \mathcal{H}$, and $\mathcal{H}_0$ is completed by the sets with a P-probability equal to zero. Consider the $\mathcal{H}_0$-measurable integrable random variable $\beta(\omega)$, which takes its values in some interval $I$ of the real line. We will interpret $\beta$ as the unknown, unobservable parameter of some dynamic system.

Let $\xi_t(\omega)$, $t \ge 0$, be an H-adapted continuous-time random process, taking its values in $R^m$, with right-continuous sampling paths having left limits. Denote by $\bar{H} = (\bar{\mathcal{H}}_t)_{t \ge 0}$, where

$$\bar{\mathcal{H}}_t = \bigcap_{u > t} \sigma\{\xi_s,\ s \le u\}.$$

For simplicity we will use $\bar{\mathcal{H}}$ to denote the σ-algebra $\bar{\mathcal{H}}_\infty$.

The process $\xi_t(\omega)$ will be interpreted as an observation process or the results of the measurement of some system variables.

Definition 1. The $\bar{H}$-adapted random process $\hat\beta_t(\omega)$, $t \ge 0$, is said to be a consistent (strongly consistent) estimation of the random variable $\beta$ if $P\text{-}\lim_{t\to\infty} \hat\beta_t = \beta$ ($\lim_{t\to\infty} \hat\beta_t = \beta$ P-a.s.).

We will deal with the properties of the conditional mathematical expectations $\bar\beta_t = E(\beta \mid \bar{\mathcal{H}}_t)$ as an estimation of $\beta$. The simple necessary and sufficient conditions of consistency of $\bar\beta_t = E(\beta \mid \bar{\mathcal{H}}_t)$ may be formulated in terms of the $\bar{\mathcal{H}}$-measurability of the random variable $\beta$.

Theorem 1. The estimation $\bar\beta_t$ is strongly consistent if and only if the random variable $\beta(\omega)$ is $\bar{\mathcal{H}}$-measurable.

The proof of this theorem follows from the Levy theorem about the asymptotic behavior of regular martingales and an evident property of $\bar{\mathcal{H}}$-measurable functions.

The relation between the consistency property of the Bayesian estimation of the random variable $\beta$ and the same property of an arbitrary estimation may be determined by the following theorem.

Theorem 2. Let $\hat\beta_t$ be some arbitrary consistent $\bar{H}$-adapted estimation of $\beta$. Then the estimation $\bar\beta_t$ is strongly consistent.

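Proof. According to the theorem's condition $P\text{-}\lim_{t\to\infty} \hat\beta_t = \beta$. For any $t \ge 0$, the random variable $\hat\beta_t$ is $\bar{\mathcal{H}}$-measurable; consequently $\beta$, being the P-a.s. limit of a subsequence of $\hat\beta_t$, will also be $\bar{\mathcal{H}}$-measurable. According to the Levy theorem

$$\lim_{t\to\infty} \bar\beta_t = E(\beta \mid \bar{\mathcal{H}}) \quad P\text{-a.s.}$$

and consistency follows from the $\bar{\mathcal{H}}$-measurability of $\beta$.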

If there is no information on the convergence of some non-Bayesian estimation, the proof of the $\bar{\mathcal{H}}$-measurability of $\beta$ becomes more difficult. Fortunately there is another way of proving this property for $\beta$, as can be seen from the following.

Theorem 3. Let $\{\beta^n\}$, $n = 0, 1, 2, \ldots$, be the sequence of random variables such that $P\text{-}\lim_{n\to\infty} \beta^n = \beta$. Then, if the estimations of $\beta^n$ are consistent for any $n = 0, 1, 2, \ldots$, the Bayesian estimation of $\beta$ is strongly consistent.

The proof of the theorem is a simple consequence of the properties of measurable functions.

Using the result of Theorem 3 we should concentrate our efforts on finding an appropriate sequence $\{\beta^n\}$, $n = 0, 1, 2, \ldots$, and establishing consistency properties for $\bar\beta^n_t = E(\beta^n \mid \bar{\mathcal{H}}_t)$ for any $n = 0, 1, 2, \ldots$.

It is clear that the variables $\beta^n(\omega)$, $n = 0, 1, \ldots$, should have a simpler structure than $\beta(\omega)$. We will use as such variables piecewise constant functions with denumerable sets of values. It is well known that for any integrable random variable $\beta(\omega)$ there always exists a sequence of such integrable random variables $(\beta^n)$, $n = 0, 1, 2, \ldots$, converging to $\beta(\omega)$. The problem therefore is to prove the consistency property for all such $\beta^n$, $n = 0, 1, 2, \ldots$.
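The approximation step can be illustrated with a small sketch. It assumes, purely for illustration, that the approximating variables are obtained by rounding $\beta$ down to a dyadic grid; the paper does not fix a particular construction, and the helper name `dyadic_approximation` is hypothetical. Each resulting $\beta^n$ is piecewise constant in $\omega$, takes a denumerable set of values, and satisfies $0 \le \beta - \beta^n \le 2^{-n}$.

```python
import math


def dyadic_approximation(beta: float, n: int) -> float:
    """Round beta down to the grid {k / 2**n : k integer}.

    Illustrative assumption: this is one possible choice of the
    approximating sequence beta^n, not the construction used in the paper.
    """
    step = 2.0 ** (-n)
    return math.floor(beta / step) * step


if __name__ == "__main__":
    beta = 0.7  # a single realization of the parameter
    for n in range(6):
        approx = dyadic_approximation(beta, n)
        print(n, approx, beta - approx)
```

The approximation error is bounded by the grid step, so the sequence converges to $\beta$ for every $\omega$ and hence also in probability, which is the form of convergence required in Theorem 3.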

We start with the investigation of this property for one random variable with a denumerable set of values in a continuous-time random observation process.

3. THE STRONG CONSISTENCY PROPERTY IN THE CASE OF A DENUMERABLE SET OF PARAMETER VALUES

Assume that parameter $\beta$ takes the denumerable set of values $\{\beta_i\}$, $i \in N$, and the σ-algebra $\mathcal{H}_t$ is generated by $\sigma(\beta)$ and $\bar{\mathcal{H}}_t$, where $\sigma(\beta)$ is the σ-algebra in $\Omega$ generated by $\beta$, and $N$ is some denumerable set.

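Let $P_t$, $\bar P_t$, and $\bar P$ be the restrictions of measure $P$ to the σ-algebras $\mathcal{H}_t$, $\bar{\mathcal{H}}_t$, and $\bar{\mathcal{H}}$, respectively. Define the probabilistic measures $P^i(\cdot)$, $i \in N$, on the measurable space $(\Omega, \mathcal{H})$ using

$$P^i(A) = \frac{P(A \cap \{\beta = \beta_i\})}{p_i}, \qquad A \in \mathcal{H},$$

where $p_i = P(\beta = \beta_i)$, $i \in N$, $\sum_{i \in N} p_i = 1$.

We will use $P^i_t$, $\bar P^i_t$, and $\bar P^i$ to denote the restrictions of measure $P^i$ to the σ-algebras $\mathcal{H}_t$, $\bar{\mathcal{H}}_t$, and $\bar{\mathcal{H}}$, respectively, and $Z^{kj}_t$ and $\bar Z^{kj}_t$ to denote the derivatives $\dfrac{dP^k_t}{dP^j_t}(\xi)$ and $\dfrac{d\bar P^k_t}{d\bar P^j_t}(\xi)$ if they exist.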

Denote by $\bar\Lambda_t(\beta)$ the random process, defined by

Let $\pi_j(t) = P(\beta = \beta_j \mid \bar{\mathcal{H}}_t)$ be the a posteriori probability of the event $\{\beta = \beta_j\}$, given $\bar{\mathcal{H}}_t$, and let $J^0 = -\sum_{i \in N} p_i \ln p_i$ be the entropy of the random variable $\beta$.
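The entropy $J^0 = -\sum_i p_i \ln p_i$ of the a priori distribution can be computed directly. The following is a minimal sketch for a finite prior; the function name `prior_entropy` and the finite-list input are illustrative assumptions (a denumerable prior would need truncation or a closed form).

```python
import math


def prior_entropy(p):
    """Entropy J0 = -sum_i p_i ln p_i of a finite prior (natural logarithm).

    Terms with p_i = 0 contribute nothing, following the usual convention
    0 * ln 0 = 0.
    """
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability vector")
    return -sum(x * math.log(x) for x in p if x > 0)


if __name__ == "__main__":
    print(prior_entropy([0.5, 0.5]))            # uniform prior on two values
    print(prior_entropy([0.7, 0.1, 0.1, 0.1]))  # a more concentrated prior
```

A uniform prior maximizes this quantity for a given number of values, while a prior concentrated on one value has zero entropy.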

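The basic theorem about the consistency properties of the Bayesian estimation of $\beta$ establishes the relationship between the following conditions:

(A) $\lim_{t\to\infty} \bar\beta_t = \beta$ P-a.s.

(B) $\lim_{t\to\infty} \pi_j(t) = I(\beta = \beta_j)$ P-a.s., $j \in N$

(C) $\bar P^k_t \sim \bar P^j_t$, $\forall k, j \in N$, $t \ge 0$

(D) $\bar P^k \perp \bar P^j$, $\forall k, j \in N$, $k \ne j$

Theorem 4. Let (C) be true. Then

(E) $\Leftrightarrow$ (D) $\Leftrightarrow$ (B) $\Leftrightarrow$ (A)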

Note that the analogs of Theorems 1-4 may be formulated for discrete-time stochastic processes $\xi_n$, $n \ge 0$. We will give here the formulation of Theorem 4 only.

Let $(\Omega, \mathcal{H}', H', P)$ be the probabilistic space where $H' = (\mathcal{H}'_n)_{n \ge 0}$ is a nondecreasing family of σ-algebras $\mathcal{H}'_n$, $n \ge 0$, $\mathcal{H}'_n \subseteq \mathcal{H}'_{n+1} \subseteq \ldots \subseteq \mathcal{H}'$, and $\mathcal{H}'_0$ is completed by the P-zero sets. Assume that $\beta(\omega)$ is the $\mathcal{H}'_0$-measurable random variable which takes its values in some denumerable set $\{\beta_i\}$, $i \in N$. Let $\xi_n$, $n \ge 0$, be an $H'$-adapted discrete-time stochastic process taking its values in $R^m$. Denote by $\bar{H}' = (\bar{\mathcal{H}}'_n)_{n \ge 0}$, where $\bar{\mathcal{H}}'_n = \sigma\{\xi_m(\omega),\ m \le n\}$. Let $P_n$, $\bar P_n$, and $\bar P$ be the restrictions of measure $P$ to the σ-algebras $\mathcal{H}'_n$, $\bar{\mathcal{H}}'_n$, and $\bar{\mathcal{H}}' = \bar{\mathcal{H}}'_\infty$, respectively. The notations $P^i_n$, $\bar P^i_n$, $\bar P^i$, $\bar\Lambda_n$, and $\pi_j(n)$ are as defined above, with the natural changing of the index $t$ to index $n$. The conditions (A), (B), (C), (D), and (E) may be rewritten as follows:

(A') $\lim_{n\to\infty} \bar\beta_n = \beta$ P-a.s.

(B') $\lim_{n\to\infty} \pi_j(n) = I(\beta = \beta_j)$ P-a.s., $j \in N$

(D') $\bar P^k \perp \bar P^j$, $\forall k, j \in N$

Theorem 4'. Let (C') be true. Then

(E') $\Leftrightarrow$ (D') $\Leftrightarrow$ (B') $\Leftrightarrow$ (A')

Theorem 4' becomes a simple corollary of Theorem 4 if we define the σ-algebras $\mathcal{H}_t$, $t \ge 0$, by

$$\mathcal{H}_t = \mathcal{H}'_n \quad \text{for } n \le t < n+1,\ n \ge 0.$$

To prove Theorem 4 some additional results will be useful.
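The convergence of the a posteriori probabilities in the discrete-time setting can be illustrated numerically. The sketch below is only a toy model: it assumes a finite parameter set and independent Bernoulli observations with success probability $\beta$, neither of which comes from the paper, and the names `posterior_path` and `simulate` are hypothetical. In this model the measures generated by different parameter values are mutually singular on the infinite sequence space, and the a posteriori probability of the true value is seen to approach one.

```python
import math
import random


def posterior_path(candidates, prior, observations):
    """Return the a posteriori probabilities pi_j(n) after each observation.

    Assumed toy model: observations are independent Bernoulli variables
    with success probability equal to the unknown parameter.
    """
    log_post = [math.log(p) for p in prior]
    path = []
    for x in observations:
        for i, theta in enumerate(candidates):
            log_post[i] += math.log(theta if x == 1 else 1.0 - theta)
        m = max(log_post)  # subtract the maximum for numerical stability
        weights = [math.exp(v - m) for v in log_post]
        total = sum(weights)
        path.append([w / total for w in weights])
    return path


def simulate(true_index=1, n=2000, seed=0):
    """Simulate n observations under the true parameter and filter them."""
    rng = random.Random(seed)
    candidates = [0.2, 0.5, 0.8]
    prior = [1.0 / 3.0] * 3
    theta = candidates[true_index]
    observations = [1 if rng.random() < theta else 0 for _ in range(n)]
    return candidates, posterior_path(candidates, prior, observations)


if __name__ == "__main__":
    candidates, path = simulate()
    final = path[-1]
    posterior_mean = sum(c * p for c, p in zip(candidates, final))
    print(final, posterior_mean)
```

The posterior mean printed at the end plays the role of the Bayesian estimation $\bar\beta_n$; working with log-weights avoids underflow when the number of observations is large.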

4. SOME PROPERTIES OF CONDITIONAL MEASURES

The next assertion establishes a remarkable property of absolutely continuous probability distributions.

Lemma 1. Let (C) be true. Then the following assertions are true:

(a) $\bar P^k$- and $\bar P^j$-a.s. the following limits exist

(b) The measures $\bar P^k$, $k \in N$, have the Lebesgue representations

(c) The following conditions are equivalent

The proof of the assertions of Lemma 1 may be done in a similar way as in Kabanov et al. (1978), taking into account the equivalence property $\bar P^k_t \sim \bar P^j_t$.

Lemma 2. Let (C) be true. Then measures $\bar P_t$ and $\bar P^k_t$ are equivalent and

Proof. The property $P^i \ll P$ follows from the definition of the measures $P^i(\cdot)$, $i \in N$. The properties $P^i_t \ll P_t$ and $\bar P^i_t \ll \bar P_t$ follow from the evident property $P^i \ll P$. The definition of $P^i(\cdot)$ yields the following formula for

Let $y_t(\omega)$ be an arbitrary bounded $\bar{\mathcal{H}}_t$-measurable function. We will use $E^k$ to denote the operation of mathematical expectation with respect to measure $P^k$. We have

The arbitrariness of $y_t$ yields equality (1).

The absolute continuity of $\bar P_t$ with respect to $\bar P^k_t$, $k \in N$, follows from the evident representation

Indeed, let $k \in N$ be an arbitrary index and $A \in \bar{\mathcal{H}}_t$ be such that $\bar P^k_t(A) = 0$. According to condition (C), for any other index $j \in N$, $\bar P^j_t(A) = 0$, and consequently, according to formula (2), $\bar P_t(A) = 0$, thus completing the proof.

Lemma 3. Let (C) be true. Then $\bar P$-a.s. for $k, j \in N$, $k \ne j$, $t \ge 0$

and

Proof. From the definition of the processes $\bar Z^{kj}_t$ we get

and formula (3) is true.

The equivalence of the measures $\bar P^k_t$ and $\bar P^j_t$, $k, j \in N$, $k \ne j$, yields

which yields (4).

Proof of Theorem 4.

(B) $\Rightarrow$ (D). From (B) and the condition $\sum_{i \in N} \pi_i(t) = 1$ it follows that

Taking (5) into account we can get

Using the Lebesgue representation [part (b) of Lemma 1] we get for $\bar P^k\{\bar Z^{kj}_\infty = 0\}$

and using (6) we get

Comparing (6) and (7) we get (D).

(D) $\Rightarrow$ (B). Let $\Gamma_{kj}$ be the singularity set for measures $\bar P^k$ and $\bar P^j$ such that $\bar P^k(\Gamma_{kj}) = 0$, and consequently $\bar P^j(\Gamma_{kj}) = 1$. Using the Lebesgue representation of measure $\bar P^k(\Gamma_{kj})$ we get

It follows from (8) that

$$\bar P^j(\bar Z^{kj}_\infty = 0) = 1, \qquad k, j \in N,\ k \ne j.$$

The Lebesgue representation of the measure $\bar P^k(\bar Z^{kj}_\infty = 0)$ yields

Consequently, the sets $\{\bar Z^{kj}_\infty = 0\}$ coincide with the sets $\Gamma_{kj}$ $\bar P^k$- and $\bar P^j$-a.s.

It follows from (5) and (9) that for any $k, j \in N$, $k \ne j$,

$$\lim_{t\to\infty} \frac{\pi_k(t)}{\pi_j(t)} = 0 \quad \bar P^j\text{-a.s.},$$

$$\lim_{t\to\infty} \frac{\pi_k(t)}{\pi_j(t)}\, I(\beta = \beta_j) = 0 \quad P\text{-a.s.}, \qquad k, j \in N,\ k \ne j.$$

Property (B) follows from the condition

(B) $\Rightarrow$ (E). It follows from (2) and (B) that the following is true

Summing both parts of (11) over $i$ yields

Averaging both parts of (12) over $P$ yields

$$E \ln \bar\Lambda_\infty(\beta) = -E \sum_{i \in N} I(\beta = \beta_i) \ln p_i = E \lim_{k\to\infty} \sum_{i=1}^{k} \bigl[-I(\beta = \beta_i) \ln p_i\bigr] \qquad (13)$$

Since the variables $J_k = \sum_{i=1}^{k} \bigl[-I(\beta = \beta_i) \ln p_i\bigr]$ increase monotonically as $k$ grows, it is possible to change the orders of integration and to go to the limit in (13). This yields (E) because

$$E \ln \bar\Lambda_\infty(\beta) = -\lim_{k\to\infty} \sum_{i=1}^{k} p_i \ln p_i = J^0.$$

(E) $\Rightarrow$ (B). Condition (E) and formula (5) yield

Since

it follows from (14) that

It is clear that equality (15) may be true if and only if

Taking into account the equality $\sum_{i \in N} \pi_i(\infty) = 1$ we get property (B).

(B) $\Rightarrow$ (A). Property (B) yields that the indicators $I(\beta = \beta_i)$, $i \in N$, are $\bar{\mathcal{H}}$-measurable, and consequently the random variable $\beta$ is $\bar{\mathcal{H}}$-measurable. According to the Levy theorem for regular martingales

Since $\beta$ is $\bar{\mathcal{H}}$-measurable,

(A) $\Rightarrow$ (B). Property (A) yields that the random variable $\beta$ is $\bar{\mathcal{H}}$-measurable and consequently $I(\beta = \beta_i)$, $i \in N$, are $\bar{\mathcal{H}}$-measurable random variables. The processes

are $\bar{H}$-adapted regular martingales. Consequently, $\lim_{t\to\infty} \pi_j(t) = \pi_j(\infty)$, $j \in N$, exists P-a.s. The $\bar{\mathcal{H}}$-measurability of the indicators $I(\beta = \beta_j)$ yields (B) and completes the proof of Theorem 4.

The results of Theorem 4 are too general to be implemented directly in practical convergence analysis of Bayesian algorithms. The applied statistician expects from statistical theory more convenient conditions, formulated in terms of the parameters and probabilistic characteristics of the systems and processes under study. As will be seen later, such forms of conditions stem immediately from our results if we have some additional information about the observation process. We will consider here the situation in which this information is concentrated in the semimartingale properties of the observable process $\xi_t$.

5. THE PROCESS $\xi_t$, $t \ge 0$, AS A SEMIMARTINGALE

The semimartingale is one of the key concepts of modern martingale theory. It accumulates the common properties of a wide class of random processes, which can be investigated in the framework of martingale techniques. This idea appeals to human intuition, which is inclined to represent dynamic processes describing natural phenomena as the sum of two components: slow (trend) and quick (noise). Before giving a formal definition we will introduce several new concepts.

Let the notations $H$, $\mathcal{H}_t$, $P$ be as defined above in Section 2. We will use $M(H,P)$ to denote the class of H-adapted martingales with respect to measure $P$ with regular (i.e., right-continuous and having left limits) sampling paths. The class of H-adapted, nondecreasing processes having a P-integrable variation with regular sampling paths will be denoted by $A^+(H,P)$. The notation $A(H,P) = A^+(H,P) - A^+(H,P)$ will be used for the class of arbitrary H-adapted regular processes with an integrable variation. In a similar way we can introduce the notation $V(H,P)$ for the class of H-adapted processes with a bounded variation. The class of continuous sampling path martingales will be denoted by $M^c(H,P)$. The notations $M_{loc}(H,P)$, $M^c_{loc}(H,P)$, $A_{loc}(H,P)$, and $V_{loc}(H,P)$ will be used for the classes of local martingales, continuous local martingales, the processes of locally integrable variation, and locally bounded variation, respectively. The predictable σ-algebra in $\Omega \times R_+$ generated by H-adapted processes will be denoted by $\Pi(H)$, and the σ-algebra $\Pi(H) \times B(R^m)$ in $\Omega \times R_+ \times R^m$ will be denoted by $\tilde\Pi(H)$. $\Pi(H)$-measurable processes will also be called H-predictable.

Definition 2. A random process $\xi = (\xi_t, \mathcal{H}_t)$ is called a semimartingale if one can identify the processes $V$ and $M$ such that

We will also use the concept of H-predictable projection of the random process.

Definition 3. The H-adapted process ${}^pX = ({}^pX_t)_{t \ge 0}$ is said to be an H-predictable projection of process $X$ if, for any H-predictable non-negative function $y_t$ and arbitrary H-predictable non-decreasing process $A$, the following holds

The class of H-adapted semimartingales with respect to measure $P$ will be denoted by $S(H,P)$.

It is not hard to see that local martingales, supermartingales, and submartingales are semimartingales. Arbitrary processes with stationary independent increments are semimartingales. A process $X$ with independent increments will be a semimartingale if

is a function of locally bounded variation for any $\lambda \in R$ (Shirjaev 1980). The concept of a semimartingale is applicable to many processes governed by stochastic differential and integro-differential equations.

The class of semimartingales is invariant with respect to equivalent transformations of probabilistic measures and random time-change transformations (Shirjaev 1980).

If $X \in S(H,P)$ and $f = f(x)$, $x \in R$, is a twice continuously differentiable function, then the process

is also a semimartingale. Finally, any stochastic discrete-time process is a semimartingale too.

In the next section we will give the singularity conditions for some probabilistic measures corresponding to semimartingales.

6. LOCAL ABSOLUTE CONTINUITY AND SINGULARITY OF PROBABILISTIC MEASURES

We start this section with an analysis of the properties of absolute continuity and singularity for locally absolutely continuous probability distributions (Kabanov et al. 1978).

Let probabilistic measures $\tilde P$ and $P$ be defined on the measurable space $(\Omega, \mathcal{H}, H)$, where all notations are the same as in Section 2.

Assume that measures $\tilde P$ and $P$ are locally equivalent ($\tilde P \overset{loc}{\ll} P$), and the local density is given by

which is the Radon-Nikodym derivative of measure $\tilde P_t$ with respect to $P_t$, where $\tilde P_t$ and $P_t$ are the restrictions of $\tilde P$ and $P$ to the σ-algebras $\mathcal{H}_t$, $t \ge 0$. Notice that for any $t \ge 0$, $\tilde P(Z_t > 0) = P(Z_t > 0) = 1$.

We now introduce the process

It is easy to see that the process $M_t$, $t \ge 0$, is an H-local martingale and, by definition,

Let $\mu(dt,dx)$ be the integer-valued random measure corresponding to the jumps of $M$, and let $\nu(dt,dx)$ be its dual H(P)-predictable projection. Define

The following theorem was proved in Kabanov et al. (1978).

Theorem 5. Assume that $\tilde P \overset{loc}{\ll} P$. Then

where $B_\infty(M) = \lim_{t\to\infty} B_t(M)$.

The equivalent formulations of the theorem are as follows:

or passing from $M$ to $Z$,

where

and $\nu^{(Z)}$ is the dual H-predictable projection of measure $\mu^{(Z)}$ corresponding to jumps of $Z_t$, $t \ge 0$.

These general results become more accessible for applications if they are reformulated in terms of characteristics and parameters corresponding to some particular processes. We will give these conditions for semimartingales in terms of their predictable characteristics (Kabanov et al. 1978).

Assume that the observable process $\xi_t$, $t \ge 0$, is a semimartingale on the probabilistic space $(\Omega, \mathcal{H}, H, P)$, where the σ-algebra $\mathcal{H}$ and the family $H = (\mathcal{H}_t)_{t \ge 0}$ are as defined above in Section 2.

According to Kabanov et al. (1978) any H-adapted semimartingale may be represented in the form

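where $\mu(ds,dx)$ is the measure of jumps of $\xi_t$, and $\nu(ds,dx)$ is its dual H-predictable projection with respect to measure $P$.

Assume that the process $\xi_t$, $t \ge 0$, is also a semimartingale with respect to probabilistic measure $\tilde P$, that is, on the probabilistic space $(\Omega, \mathcal{H}, H, \tilde P)$, and consequently may be represented by

where $\tilde\nu(dt,dx)$ is the dual H-predictable projection of $\mu(ds,dx)$ with respect to probabilistic measure $\tilde P$.

Let, as above, $\bar{H} = (\bar{\mathcal{H}}_t)$, where $\bar{\mathcal{H}}_t = \sigma\{\xi_s,\ s \le t\}$, and let $\bar P_t$ and $\bar{\tilde P}_t$ be the restrictions of $P$ and $\tilde P$ to the σ-algebra $\bar{\mathcal{H}}_t$, $t \ge 0$. Denote by $\langle m \rangle_t$ ($\langle \tilde m \rangle_t$) the H-predictable square characteristic of the martingales $m^c_t$ ($\tilde m^c_t$), respectively.

Let $(\tau_n)_{n \ge 0}$ be a sequence of stopping times with respect to $H$ such that $\tau_n \uparrow \infty$ P-a.s. The processes $(\xi_{t \wedge \tau_n}, \mathcal{H}_t, P)_{t \ge 0}$ and $(\xi_{t \wedge \tau_n}, \mathcal{H}_t, \tilde P)$ are also semimartingales with triples of characteristics $(a^{\tau_n}, \langle m \rangle^{\tau_n}, \nu^{\tau_n})$ and $(\tilde a^{\tau_n}, \langle \tilde m \rangle^{\tau_n}, \tilde\nu^{\tau_n})$.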

Definition 4. The measure $\tilde P$ is said to have the property of $(\tau_n)$-uniqueness if the triples $(\tilde a^{\tau_n}, \langle \tilde m \rangle^{\tau_n}, \tilde\nu^{\tau_n})$ uniquely determine the restrictions $\tilde P_{\tau_n}$ of measure $\tilde P$ to the σ-algebras $\bar{\mathcal{H}}_{\tau_n}$.

The next conditions will be useful in an analysis of the absolute continuity and singularity properties of probabilistic measures $\tilde P$ and $P$ (see Kabanov et al. 1978).

II. There exists a $\tilde\Pi(H)$-measurable function $Y(t,x)$ such that

(a) $d\tilde\nu = Y\, d\nu$

(b) $\nu(\{t\}, E) = 1 \Rightarrow \tilde\nu(\{t\}, E) = 1$, $t \ge 0$

(c) $\langle m \rangle_t = \langle \tilde m \rangle_t$, $t \ge 0$.

There exists an H-predictable process $\gamma_s$ such that

Define the H-predictable process $B_t$ as follows:

$$B_t = \ldots + \sum_{s \le t} I(0 < a_s < 1) \left(1 - \sqrt{\frac{1 - \tilde a_s}{1 - a_s}}\right)^2 (1 - a_s)$$

where

Define the stopping times $\tau_n$ by

$$\tau_n = \inf\{t \ge 0 : B_t \ge n\}.$$

IV. The measure $\tilde P$ is $(\tau_n)$-unique.

Theorem 6. (Kabanov et al. 1978) The following statements hold for the semimartingales $(\xi_t, \mathcal{H}_t, P)$ and $(\xi_t, \mathcal{H}_t, \tilde P)$:

3) If I, II, IIIa, and IV hold, then IIIc $\Leftrightarrow$ $\tilde P \perp P$.

The proof of this theorem may be found in Kabanov et al. (1978; Theorem 13).

The results of Theorem 6 are very useful in specifying the strong consistency conditions, as we will do in the next section.

7. CONSISTENCY CONDITIONS FOR BAYESIAN ESTIMATIONS WHEN OBSERVATIONS ARE SEMIMARTINGALES

The conditions of absolute continuity and singularity of probabilistic measures $\tilde P$ and $P$ formulated in Theorem 6 are given in terms of measure $\tilde P$, that is, in terms of the upper measure in the likelihood ratio

$$Z_t = \frac{d\tilde P_t}{dP_t}, \qquad t \ge 0,$$

when it exists.

In practical situations, however, the properties of observable processes are usually defined by the measure $P$, which is the lower measure in the likelihood ratio $Z_t$. In order to reformulate the results of Theorem 6 in terms of measure $P$, some auxiliary information about local martingale properties will be relevant.

Let $m_t \in M(H,P)$, $m_t > 0$ P-a.s., $t \ge 0$, and $E(m_t^{-1}) < \infty$ for any $t \ge 0$. Denote by $\mu(dt,dx)$ the integer-valued random measure corresponding to jumps of $m_t$, and let $\nu(dt,dx)$ be the dual H(P)-predictable projection of $\mu(dt,dx)$. Denote also by $\mu'(dt,dx)$ and $\nu'(dt,dx)$ the integer-valued random measure and the dual H(P)-predictable projection for the process $m'_t = m_t^{-1}$, and let $\langle m'^c \rangle_t$, $t \ge 0$, be the local square H(P)-predictable characteristic of the continuous part of the process $m'_t$, $t \ge 0$. The formulas for the local H(P)-predictable characteristics of the process $m'_t$ can be given as follows.

Lemma 4. The process $m'_t = m_t^{-1}$ is an H(P)-submartingale. The process $\langle m'^c \rangle_t$, $t \ge 0$, and the measure $\nu'(dt,dx)$ are characterized by

and

$$\int_0^t\!\!\int f(s,x)\, \nu(ds,dx) = \int_0^t\!\!\int f\!\left(s, \frac{-x}{(m'_{s-} + x)\, m'_{s-}}\right) \nu'(ds,dx).$$

Proof. The submartingale property of $m'_t$, $t \ge 0$, follows easily from the Jensen inequality for conditional mathematical expectations.

Using the Itô stochastic differentiation formula for $m'_t = m_t^{-1}$ we get

It follows that

and consequently

This proves the first part of the lemma.

In order to prove the second part of the lemma, consider an arbitrary bounded $\mathcal{H}_t$-measurable random variable $\eta_t$ and the $\tilde\Pi(H)$-measurable function $f(t,x)$, such that

for any $t \ge 0$.

We have

Notice that the jumps of processes $m'_t$ and $m_t$ are related by

$$\Delta m_t = -\frac{\Delta m'_t}{(m'_{t-} + \Delta m'_t)\, m'_{t-}}.$$

Taking this into account for $L_t$, we can get

$$L_t = E \int_0^t\!\!\int E(\eta_t \mid \mathcal{H}_{s-})\, f\!\left(s, \frac{-x}{(m'_{s-} + x)\, m'_{s-}}\right) \mu'(ds,dx).$$

The arbitrariness of $\eta_t$ yields the proof of the second part of the lemma.

8. EXAMPLES

(1) Assume that the observation process is a sequence of random variables $[X_n(\omega)]_{n \ge 0}$, taking their values in $R$, adapted to some nondecreasing family of σ-algebras $H' = \{\mathcal{H}'_n\}$, $n = 0, 1, 2, \ldots$.

Introduce the family of σ-algebras $H = (\mathcal{H}_t)_{t \ge 0}$ and the process $\xi_t(\omega)$ by

$$\xi_t(\omega) = X_n(\omega) \quad \text{for } n \le t < n+1.$$

Let $\beta(\omega)$ be an $\mathcal{H}_0$-measurable integrable random variable taking its values in the set of non-negative integer numbers. $\bar{\mathcal{H}}_t$ is defined in the normal way.
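The embedding of a discrete-time sequence into a continuous-time, right-continuous piecewise constant process can be sketched as follows. The helper names `embed` and `jumps` are hypothetical, and the finite list stands in for the first few terms of the sequence $X_n$; the jump sizes are the increments $\Delta X_m = X_m - X_{m-1}$ that occur at integer times.

```python
import math


def embed(sequence):
    """Return xi with xi(t) = X_n for n <= t < n + 1 (right-continuous)."""
    def xi(t):
        if t < 0:
            raise ValueError("t must be non-negative")
        n = math.floor(t)
        if n >= len(sequence):
            raise ValueError("t is beyond the simulated horizon")
        return sequence[n]
    return xi


def jumps(sequence):
    """List the pairs (m, X_m - X_{m-1}) for the times m where xi jumps."""
    return [
        (m, sequence[m] - sequence[m - 1])
        for m in range(1, len(sequence))
        if sequence[m] != sequence[m - 1]
    ]


if __name__ == "__main__":
    X = [3.0, 3.0, 5.0, 4.0]
    xi = embed(X)
    print(xi(0), xi(1.999), xi(2), xi(3.5))
    print(jumps(X))
```

Because the path changes only at integer times, the measure of jumps of the embedded process is supported on those times, which is why the conditions for this example reduce to statements about the increments of the original sequence.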

Denote by $\mu(ds,dx)$ the integer-valued random measure of jumps of the process $\xi_t$. The problem is to define the necessary and sufficient conditions for consistency of the estimation $\bar\beta_t = E(\beta \mid \bar{\mathcal{H}}_t)$. Let $\nu(ds,dx)$ be the dual H-predictable projection of $\mu$. It can be easily shown that

where $\Delta X_m = X_m - X_{m-1}$.

Denote by $Q^k_m(A,B)$ the probabilistic measure on $[R_0 \times \Omega,\ \sigma(R_0) \otimes \bar{\mathcal{H}}_{m-1}]$ which is defined as

Assume that the measures $Q^k_m(\cdot,\cdot)$ and $Q^j_m(\cdot,\cdot)$ are equivalent and denote by $Y^{kj}(m,x)$ the derivative

where we omit for simplicity the symbol $\omega$ in $Y^{kj}(m,x)$. Let

and

where $I(a^j_m = 1)$ is the indicator of the event $\{a^j_m = 1\}$. Assume that $\bar P^j$-a.s. the following inequality is true for any $t \ge 0$,

Let also the measures $\bar P^k_0$ and $\bar P^j_0$ be equivalent for any $k, j \in N$ and the event $\{a^k_m = 0\}$ yield the event $\{a^j_m = 0\}$ for any $k, j \in N$.

Then, from the results of Kabanov et al. (1978), it follows that $\bar P^k_t(\cdot)$ and $\bar P^j_t(\cdot)$ are equivalent for any $k, j \in N$ and $t \ge 0$. The conditions of singularity for the measures $\bar P^k(\cdot)$ and $\bar P^j(\cdot)$ may also be represented with the help of the results in Kabanov et al. (1978), taking into account the equivalence of measures $\bar P^k_t$ and $\bar P^j_t$:

for any $k, j \in N$, $k \ne j$, $\bar P^j$-a.s. This is also a condition of consistency of the Bayesian estimation $\bar\beta_t$.

(2) Let the process $\xi_t$ be the Markovian jumping process on any probabilistic space $(\Omega, \mathcal{H}, P^j)$, $j \in N$, which is characterized by the family of functions $\lambda^j_{\alpha\gamma}(t)$, $\alpha, \gamma \in \Gamma$, where $\Gamma$ is some denumerable set on $R$, $j \in N$.

Let the functions $\lambda^j_{\alpha\gamma}(t)$ be measurable functions of $t$ for any $\alpha, \gamma \in \Gamma$, $j \in N$, and let the following conditions be true:

iii) $\sup_{\alpha \in \Gamma} \int_0^t |\lambda^j_{\alpha\alpha}(s)|\, ds < \infty$

Assume also that measures $\bar P^k_0(\cdot)$ and $\bar P^j_0(\cdot)$ are equivalent and the following conditions are true for any $t \ge 0$ and $k, j \in N$, $\bar P^j$-a.s.

where

Then the condition of singularity of measures $\bar P^k$ and $\bar P^j$ will be

According to Theorem 4 this is equivalent to the almost certain convergence of the Bayesian estimation.

(3) Let the observation be a continuous-time diffusion-type process:

where $w_s$ is the Wiener process on $(\Omega, \mathcal{H}, P)$, which is H-adapted and, as before, $\mathcal{H}_t = \sigma(\beta) \vee \bar{\mathcal{H}}_t$.

Assume that for any $k, j \in N$ the measures $\bar P^k_0(\cdot)$ and $\bar P^j_0(\cdot)$ are equivalent and $\bar P^j$-a.s. the following inequality is true for any $t \ge 0$ and $k, j \in N$

Then for any $t \ge 0$ the measures $\bar P^k_t$ and $\bar P^j_t$ are equivalent and the strong consistency property is equivalent to the $\bar P^j$-a.s. divergence of the integral (Kitsul 1980)
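A toy version of the diffusion example can be simulated to see the posterior concentrate. The sketch below is not the model of the paper, whose equation is not reproduced here: it assumes, for illustration only, the simplest special case $d\xi_t = \beta\, dt + dw_t$ with a constant drift $\beta$ taken from a finite set, and the function names are hypothetical. For this assumed model the likelihood of drift $\theta$ relative to the Wiener measure is $\exp(\theta \xi_T - \theta^2 T / 2)$, and the squared difference of two distinct drifts integrated over $[0, T]$ grows linearly in $T$.

```python
import math
import random


def simulate_endpoint(drift, horizon, dt, rng):
    """Simulate xi_T for d(xi) = drift * dt + dw on [0, horizon] (assumed model)."""
    steps = int(round(horizon / dt))
    x = 0.0
    for _ in range(steps):
        x += drift * dt + rng.gauss(0.0, math.sqrt(dt))
    return x


def posterior(drifts, prior, xi_T, horizon):
    """A posteriori probabilities of constant drifts given the endpoint xi_T.

    Uses the likelihood ratio with respect to the Wiener measure,
    exp(theta * xi_T - theta**2 * T / 2), valid for constant drifts.
    """
    log_w = [
        math.log(p) + theta * xi_T - 0.5 * theta * theta * horizon
        for theta, p in zip(drifts, prior)
    ]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


if __name__ == "__main__":
    rng = random.Random(1)
    T = 200.0
    xi_T = simulate_endpoint(1.0, T, 0.05, rng)
    print(posterior([0.0, 1.0], [0.5, 0.5], xi_T, T))
```

With constant drifts the endpoint $\xi_T$ is a sufficient statistic, which keeps the sketch short; time-dependent or state-dependent drifts would require accumulating the stochastic integral along the path.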

(34)

(4) Assume that $\xi_t(\omega)$ is the multivariate point process, that is, the sequence $(T_n, X_n)_{n \ge 1}$, where $T_n$ are the stopping times with respect to $H = (\mathcal{H}_t)_{t \ge 0}$, $\mathcal{H}_t = \sigma(\beta) \vee \mathcal{H}_t^{\xi}$, such that the following conditions hold

[…]

and $X_n$ are $\mathcal{H}_{T_n}$-measurable random variables taking their values in $[R, \mathcal{B}(R)]$.

The random variable $\beta$ is as defined above.

The multivariate point process can be represented with the help of the integer-valued random measure $\mu(\cdot)$ on $(]0,\infty[ \times R)$.

Let $\nu^i(dt,dx)$ be the dual $H$-predictable projection of $\mu$ on $(\Omega, \mathcal{H}, P^i)$, $i \in H$. Denote $a_t^j = \nu^j(\{t\}, R \setminus \{0\})$ and assume that for any $k, j \in H$ the event $\{a_t^k = 0\}$ implies the event $\{a_t^j = 0\}$ and $\bar{P}_0^k$ is equivalent to $\bar{P}_0^j$. Assume that there is a function $y^{kj}(\omega,t,x)$ such that

$$\nu^k(dt,dx) = y^{kj}(\omega,t,x)\,\nu^j(dt,dx) \quad P^j\text{-a.s.}$$

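and for any $t \ge 0$

$$\int_0^t \int_{R \setminus \{0\}} \left(1 - \sqrt{y^{kj}(s,x)}\right)^2 \nu^j(ds,dx) + \sum_{s \le t} I(0 < a_s^j < 1) \left(1 - \sqrt{\frac{1 - a_s^k}{1 - a_s^j}}\right)^2 (1 - a_s^j) < \infty.$$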


Then it follows from Kabanov et al. (1978) that the measure $\bar{P}_t^k$ is equivalent to $\bar{P}_t^j$ for any $k, j \in H$.

The condition that is equivalent to the almost certain convergence of $\bar{\beta}_t$ to $\beta$ is

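$$\int_0^\infty \int_{R \setminus \{0\}} \left(1 - \sqrt{y^{kj}(s,x)}\right)^2 \nu^j(ds,dx) + \sum_{s < \infty} I(0 < a_s^j < 1) \left(1 - \sqrt{\frac{1 - a_s^k}{1 - a_s^j}}\right)^2 (1 - a_s^j) = \infty$$

$P^j$-a.s. for any $k, j \in H$.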
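To make the condition of example (4) concrete, the sketch below treats a homogeneous Poisson process whose intensity is one of finitely many candidate values. This special case is an assumption chosen for illustration: the compensators are $\lambda_k\,dt$, there are no fixed jump times (so the sum over $s$ vanishes), and the density of $\nu^k$ with respect to $\nu^j$ is the constant $\lambda_k/\lambda_j$. The integral term then equals $(1 - \sqrt{\lambda_k/\lambda_j})^2 \lambda_j t$, which diverges as $t \to \infty$ exactly when $\lambda_k \ne \lambda_j$. The function names and numerical values are hypothetical.

```python
import numpy as np


def hellinger_integral(rate_k, rate_j, t):
    """(1 - sqrt(rate_k / rate_j))**2 * rate_j * t for constant intensities."""
    return (1.0 - np.sqrt(rate_k / rate_j)) ** 2 * rate_j * t


def poisson_posterior(rates, prior, n_events, t):
    """Posterior over candidate intensities after n_events jumps on [0, t]."""
    rates = np.asarray(rates, dtype=float)
    # Log-likelihood of a Poisson path up to terms common to all candidates.
    log_w = np.log(prior) + n_events * np.log(rates) - rates * t
    log_w -= log_w.max()  # stabilise the exponentials
    w = np.exp(log_w)
    return w / w.sum()


rng = np.random.default_rng(1)
rates, true_index, t = [1.0, 2.0, 4.0], 1, 500.0
n_events = rng.poisson(rates[true_index] * t)  # number of jumps up to time t
post = poisson_posterior(rates, [1 / 3] * 3, n_events, t)
print(hellinger_integral(1.0, 4.0, 10.0), post)
```

Only the number of jumps on $[0, t]$ enters the posterior here, because for constant intensities the jump times carry no further information about the parameter.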

9. THE UNCOUNTABLE SET OF PARAMETER VALUES

Consider now the case when $\beta$ takes its values in some interval $I$ of the real line. Let $(\beta^n)$ be the sequence of piecewise constant functions of $\omega$ such that

[…]

Denote by $\bar{P}^x(\cdot)$, $x \in I$, the family of probabilistic measures on $\mathcal{H}$ which are defined by the equalities

[…]

Denote by $\bar{P}_t^x(\cdot)$ the restrictions of $\bar{P}^x$ on $\mathcal{H}_t$.

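Theorem 5. For any $x, y \in I$ let the measures $\bar{P}_t^x(\cdot)$ and $\bar{P}_t^y(\cdot)$ be equivalent, and for any sets $A, B \in \mathcal{B}(I)$, $A \cap B = \emptyset$, let the measures

$$\bar{P}^A(\cdot) = \frac{1}{\lambda(A)} \int_A \bar{P}^x(\cdot)\,\lambda(dx) \quad \text{and} \quad \bar{P}^B(\cdot) = \frac{1}{\lambda(B)} \int_B \bar{P}^x(\cdot)\,\lambda(dx)$$

be orthogonal. Then the estimation $\bar{\beta}_t$ is strongly consistent.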


Before proving this theorem we will give some additional statements.

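Lemma 4. For any $x, y \in I$ let the measures $\bar{P}_t^x$ and $\bar{P}_t^y$ be equivalent and let $\lambda(\cdot)$ be some probabilistic measure on $(I, \mathcal{B}(I))$. Then for any sets $A, B \in \mathcal{B}(I)$, $A \cap B = \emptyset$, the measures

$$\bar{P}_t^A(\cdot) = \frac{1}{\lambda(A)} \int_A \bar{P}_t^x(\cdot)\,\lambda(dx) \quad \text{and} \quad \bar{P}_t^B(\cdot) = \frac{1}{\lambda(B)} \int_B \bar{P}_t^x(\cdot)\,\lambda(dx)$$

are equivalent.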

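Proof. Let $\Gamma \in \mathcal{H}_t$ be such that $\bar{P}_t^A(\Gamma) = 0$. Then $\bar{P}_t^x(\Gamma) = 0$ $\lambda$-a.s. on $A$ and hence, since the measures $\bar{P}_t^x$, $x \in I$, are mutually equivalent, for every $x \in I$. Consequently $\bar{P}_t^B(\Gamma) = 0$. Exchanging the roles of $A$ and $B$ gives the converse.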
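The statement of Lemma 4 can be checked by hand in a finite setting. The sketch below is only an analogue, under the assumption that the measures live on a five-point space and that the parameter takes four values: distributions are equivalent exactly when their supports coincide, and mixtures over disjoint index sets inherit the common support. The helper names `mixture` and `same_null_sets` and the numbers are hypothetical.

```python
import numpy as np


def mixture(components, weights, index_set):
    """Normalised mixture of the component distributions indexed by index_set."""
    idx = sorted(index_set)
    comp = np.asarray([components[i] for i in idx], dtype=float)
    w = np.asarray([weights[i] for i in idx], dtype=float)
    return (w[:, None] * comp).sum(axis=0) / w.sum()


def same_null_sets(p, q):
    """Two distributions on a finite set are equivalent iff their supports coincide."""
    return bool(np.array_equal(np.asarray(p) > 0, np.asarray(q) > 0))


# Four mutually equivalent distributions on {0, ..., 4}: all vanish on {3, 4}.
components = np.array([
    [0.7, 0.2, 0.1, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0, 0.0],
    [0.3, 0.3, 0.4, 0.0, 0.0],
    [0.5, 0.1, 0.4, 0.0, 0.0],
])
weights = [0.1, 0.2, 0.3, 0.4]  # prior masses, playing the role of lambda
p_a = mixture(components, weights, {0, 1})  # mixture over the index set A
p_b = mixture(components, weights, {2, 3})  # mixture over the disjoint set B
print(same_null_sets(p_a, p_b))
```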

Lemma 5. Let the conditions of Theorem 5 be true. Then for any $n$ the estimations $\bar{\beta}_t^n$ are strongly consistent.

Proof. According to the choice of the sequence $(\beta^n)$, for any $n$ the random variable $\beta^n$ has a denumerable set of values. According to the conditions of Theorem 5 the measures $\bar{P}^{ni}(\cdot)$ and $\bar{P}^{nk}(\cdot)$, where

$$\bar{P}^{ni}(\cdot) = \frac{P(\cdot \cap \{\beta^n = \beta_i^n\})}{P(\beta^n = \beta_i^n)},$$

are orthogonal. It follows from Lemma 4 that the measures $\bar{P}_t^{ni}(\cdot)$ and $\bar{P}_t^{nk}(\cdot)$ are equivalent. The result of Theorem 5 then follows from Theorem 4.
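The reduction to the denumerable case rests on approximating $\beta$ by variables $\beta^n$ with countably many values. The defining condition of the sequence is not legible in this copy, so the sketch below assumes one natural choice, which may differ from the author's: $\beta^n$ is the left endpoint of the level-$n$ dyadic cell of $I = [a, b)$ containing $\beta$, so that $|\beta^n - \beta| \le (b - a)/2^n$. The function name `dyadic_approximation` and the numerical values are hypothetical.

```python
import math


def dyadic_approximation(beta, a, b, n):
    """Left endpoint of the level-n dyadic cell of [a, b) that contains beta."""
    cells = 2 ** n
    width = (b - a) / cells
    # Clamp the index so that values at the right edge fall in the last cell.
    index = min(int(math.floor((beta - a) / width)), cells - 1)
    return a + index * width


a, b = 0.0, 1.0
beta = 0.6180339887
approximations = [dyadic_approximation(beta, a, b, n) for n in range(1, 11)]
errors = [abs(beta - x) for x in approximations]
print(approximations[-1], errors[-1])
```

Each $\beta^n$ built this way takes at most $2^n$ values, so the results for a denumerable parameter set apply to it directly.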

10. CONCLUSION

This paper presents results on the strong consistency property of Bayesian estimation in two cases, a denumerable and an uncountable parameter set, for a wide class of continuous-time stochastic observation processes. In the case of a denumerable set of parameter values, necessary and sufficient conditions of consistency are formulated in terms of absolute continuity and singularity of some special family of conditional probabilistic measures. In the case of an uncountable parameter set, a sufficient condition of strong consistency is formulated. The consistency results may be made more specific when more details of the properties of the random observation processes are available.


REFERENCES

Baram, Y., and N.R. Sandell (1978) Consistent estimation of finite parameter sets with application to linear system identification. IEEE Trans. Automat. Contr. 23(3).

Box, G.E.P., and G.C. Tiao (1973) Bayesian Inference in Statistical Analysis. New York: Addison-Wesley.

Dellacherie, C. (1972) Capacités et Processus Stochastiques (Capacities and Stochastic Processes). Berlin: Springer.

Doob, J.L. (1949) Le Calcul des Probabilités et ses Applications (Application of the theory of martingales). Colloque International du CNRS 13:22-28.

Edwards, W., H. Lindman, and L.J. Savage (1963) Bayesian statistical inference for psychological research. Psychol. Rev. 70:193-242.

Freedman, D.A. (1963) On the asymptotic behaviour of Bayes estimates in the discrete case. Ann. Math. Statistics 34:1386-1403.

Jacod, J. (1979) Calcul Stochastique et Problèmes de Martingales (Stochastic Calculus and Problems of Martingales). Lecture Notes in Math. 714. Berlin: Springer.

Kabanov, Y.M., R.S. Liptser, and A.N. Shirjaev (1978) Absolutnaya Neprerivnost i Singularnost Localno Absolutno Neprerivnich Verojatnostnich Raspredeleny (Absolute Continuity and Singularity of Locally Absolutely Continuous Probabilistic Distributions). Mathematichesky Sbornik 107:149, 3:11.


Kiefer, J., and J. Wolfowitz (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statistics 27:887-906.

Kitsul, P.I. (1980) Necessary and sufficient conditions of consistency estimations of the parameters of diffusion processes. Proc. XIII European Meeting of Statisticians. Brighton: Stonebridge Press.

Kraft, C. (1955) Some conditions for consistency and uniform consistency of statistical procedures. University of California Pub. Statist. 2:125-142.

Kuznetzov, N.A., A.V. Lubkov, and A.I. Yashin (1981) About consistency of Bayesian estimates in adaptive Kalman filtration scheme. Automatic and Remote Control 4:47-56.

Kuznetzov, N.A., and A.I. Yashin (1981) On consistent parameter estimation in adaptive filtering. Problems of Control and Information Theory 10(5):317-327.

Le Cam, L. (1958) Les Propriétés Asymptotiques des Solutions de Bayes (The Asymptotic Properties of Bayesian Solutions). Publ. de l'Inst. de Statist. de l'Univ. de Paris 7:17-35.

Le Cam, L. (1953) On some asymptotic properties of maximum likelihood estimates and related Bayes' estimates. University of California Publ. Statist. 1:277-330.

Le Cam, L., and L. Schwartz (1960) A necessary and sufficient condition for the existence of the consistent estimates. Ann. Math. Statistics 31:140-150.

Lindley, D.V. (1975) The future of statistics: a Bayesian 21st century (lecture at the Conference on Directions for Mathematical Statistics, Canada 1974). Supp. Adv. Appl. Prob. 7:106-115.

Liptser, R.S., and A.N. Shirjaev (1977) Statistics of Random Processes. I and II. Berlin: Springer.

Ljung, L. (1978) Convergence analysis of parametric identification methods. IEEE Trans. Automat. Contr. 23(4).

Meyer, P.A. (1966) Probability and Potentials. Waltham: Blaisdell.

Meyer, P.A. (1976) Un Cours sur les Intégrales Stochastiques (A Course on Stochastic Integrals), Séminaire de Probabilités X (University of Strasbourg, 1974/5). Lecture Notes in Math. 531:245-224. Berlin: Springer.

Peterka, V. (1980) Bayesian approach to system identification. Trends and Progress in System Identification, edited by P. Eykhoff. Oxford: Pergamon Press.


Savage, L.J. (1954) The Foundations of Statistics. New York: Wiley.

Shirjaev, A. (1980) Martingales: recent results and applications. Proc. XIII European Meeting of Statisticians. Brighton: Stonebridge Press.

Wald, A. (1949) Note on the consistency of the maximum likelihood estimate. Ann. Math. Statistics 20:595-601.

Yashin, A.I. (1981) Sostojatelnost Bayesovskich Otcenok Parametrov (Consistency of Bayesian Parameter Estimates). Problemy Peredachy Informacii 1:62-72.
