
A Bayesian Approach to Analyzing Uncertainty Among Stochastic Models


Eric F. Wood

September 1974

Research Reports are publications reporting on the work of the author. Any views or conclusions are those of the author, and do not necessarily reflect those of IIASA.


Abstract

The statistical uncertainty that results from not knowing which model represents a given stochastic process is analyzed. This analysis of model uncertainty leads to a composite Bayesian distribution. The composite Bayesian distribution is a linear combination of the Bayesian probability distributions of the individual models, weighted by the posterior probability that each model is the true model. The composite Bayesian probability model accounts for all sources of statistical uncertainty: both parameter uncertainty and model uncertainty. It is the model that should be used in applied problems of decision analysis, for it best represents the decision maker's knowledge (or lack of it) about future events of the process.

Introduction

Applied scientists are often confronted with the problem of choosing one statistical model from many contending models. Hydrologists frequently encounter this selection problem in flood frequency analysis, and the examples and applications in this paper are drawn from that problem.

Consider the problem of the hydrologist who must choose among a number of alternative designs intended to prevent or reduce the occurrence of future floods. His first task is to make inferences about the underlying process that generates these events, but in addressing this problem he faces a number of sources of uncertainty.

These sources of uncertainty are often grouped into three categories [1]:

1. Natural uncertainty. This is the uncertainty inherent in the stochastic process itself, here the occurrence of extreme streamflows, q.

2. Statistical uncertainty. This arises in estimating the parameters of the model of the stochastic process from limited data.

3. Model uncertainty. This arises because a particular probabilistic model of the stochastic process may not be the true model.

Most hydrologic processes are so complex that no model yet devised may be the true model; indeed, hydrologic events may follow no particular model.

Many models fit the available data well, yet they often lead to different inferences and decisions.

In recent years, considerable progress has been made in developing statistical procedures for comparing alternative models; examples are Gaver and Geisel [3], Smallwood [8], and Leamer [4], who all used Bayesian statistical procedures, and Dumonceaux et al. [2] and Pesaran [5], who applied "classical" statistical procedures of hypothesis testing.


Composite Bayesian Distribution

For a particular model of flood events, parameter uncertainty can be accounted for by considering the Bayesian pdf of flood events, which is

$$\bar{f}(q) = \int_{A} f(q \mid A)\, f''(A)\, dA \qquad (1)$$

where $\bar{f}(q)$ is the Bayesian pdf for $q$, $f(q \mid A)$ is the "modelled" pdf of $q$, conditional upon the uncertain parameter set $A$, and $f''(A)$ is the posterior pdf for the parameter set $A$.
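The integral for $\bar{f}(q)$ can be illustrated numerically. The sketch below is a minimal Python example under assumptions that are not from the paper: a hypothetical Exponential "modelled" pdf $f(q \mid a) = a e^{-aq}$ and a Gamma posterior $f''(a)$ with made-up shape and rate. For this pair the integral also has a closed form, so the midpoint-rule result can be checked against it.

```python
import math


def gamma_pdf(a, shape, rate):
    """Gamma density in (shape, rate) form, standing in for the posterior f''(a)."""
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(a)
                    - rate * a - math.lgamma(shape))


def exponential_pdf(q, a):
    """Hypothetical 'modelled' pdf f(q | a): Exponential with rate a."""
    return a * math.exp(-a * q)


def bayesian_pdf_numeric(q, shape, rate, steps=20000):
    """Integrate f(q | a) f''(a) over a with the midpoint rule."""
    upper = (shape + 40.0 * math.sqrt(shape)) / rate  # far into the Gamma tail
    h = upper / steps
    total = 0.0
    for k in range(steps):
        a = (k + 0.5) * h
        total += exponential_pdf(q, a) * gamma_pdf(a, shape, rate)
    return total * h


def bayesian_pdf_exact(q, shape, rate):
    """Closed form of the same integral: shape * rate**shape / (rate + q)**(shape + 1)."""
    return shape * rate ** shape / (rate + q) ** (shape + 1.0)
```

For example, `bayesian_pdf_numeric(1.0, 5.0, 10.0)` and `bayesian_pdf_exact(1.0, 5.0, 10.0)` agree to several significant figures. The resulting Bayesian pdf decays polynomially in $q$, more slowly than the Exponential pdf at any single value of $a$, which is how parameter uncertainty shows up in the tail.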

Model uncertainty can be considered by defining a composite model of the form

$$f(q \mid \theta, A) = \sum_{i=1}^{n} \theta_i\, f_i(q \mid A_i) \qquad (2)$$

where the composite model, $f(q \mid \theta, A)$, is conditioned upon a set of unknown model parameters $A$ and an unknown composite model parameter set $\theta$. The densities $f_1(q \mid A_1), \ldots, f_n(q \mid A_n)$ are the probabilistic models that make up the composite model; they are conditioned upon the general unknown parameter set $A = (A_1, \ldots, A_n)$. The parameters $\theta_1, \ldots, \theta_n$ take on a value of either 0 or 1, and their values are uncertain. If $\theta_i = 1$, then model $f_i(q \mid A_i)$ is the true model. The constraint

$$\sum_{i=1}^{n} \theta_i = 1 \qquad (3)$$

is imposed, which implies that one and only one model is the true model.

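For notational simplicity, consider the case where $n = 2$. The likelihood function for a set of observations $Q$ is just

$$L(\theta, A \mid Q) = \theta_1 L_1(A_1 \mid Q) + \theta_2 L_2(A_2 \mid Q) \qquad (4)$$

There are no cross products of the models, because of the restriction on the values that $\theta_i$ can take and the constraint on $\theta$: when the product of $\theta_1 f_1 + \theta_2 f_2$ over the observations is expanded, every cross term carries the factor $\theta_1 \theta_2 = 0$, and $\theta_i^k = \theta_i$ for any power $k$. Here $L_i(A_i \mid Q)$ is just the likelihood function of model $i$, conditional upon the observations $Q$.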
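The absence of cross products in the two-model likelihood above can be checked numerically. The following minimal Python sketch uses two hypothetical component models (a Normal and an Exponential density) and made-up observations; it compares the product over the observations of $\theta_1 f_1 + \theta_2 f_2$ with $\theta_1 L_1 + \theta_2 L_2$.

```python
import math


def normal_pdf(q, mu, sigma):
    """Normal density."""
    return math.exp(-0.5 * ((q - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(q, rate):
    """Exponential density (zero for negative q)."""
    return rate * math.exp(-rate * q) if q >= 0.0 else 0.0


def composite_likelihood(theta, data, f1, f2):
    """Likelihood of the composite model: product over observations of theta1*f1 + theta2*f2."""
    theta1, theta2 = theta
    result = 1.0
    for q in data:
        result *= theta1 * f1(q) + theta2 * f2(q)
    return result


def separated_likelihood(theta, data, f1, f2):
    """theta1 * L1 + theta2 * L2: the form with no cross products."""
    theta1, theta2 = theta
    L1 = math.prod(f1(q) for q in data)
    L2 = math.prod(f2(q) for q in data)
    return theta1 * L1 + theta2 * L2


def model1(q):
    """f_1(q | A_1) with hypothetical parameters A_1 = (1.0, 0.5)."""
    return normal_pdf(q, 1.0, 0.5)


def model2(q):
    """f_2(q | A_2) with hypothetical parameter A_2 = 1.0."""
    return exponential_pdf(q, 1.0)


observations = [0.5, 1.2, 2.0, 0.3]  # hypothetical data Q
```

For $\theta = (1, 0)$ or $(0, 1)$ the two functions agree; for a value such as $(0.5, 0.5)$, which the 0/1 restriction rules out, they differ, because the cross terms no longer vanish.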

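Define now a composite prior distribution on the parameters $A$ and $\theta$. The prior will be of the form

$$f'(A, \theta) = \theta_1 f'_1(A_1 \mid \theta_1 = 1)\, p'(\theta_1 = 1) + \theta_2 f'_2(A_2 \mid \theta_2 = 1)\, p'(\theta_2 = 1) \qquad (5)$$

where $f'_i(A_i \mid \theta_i = 1)$ is the prior distribution on the parameter set $A_i$, conditional upon $\theta_i = 1$, and $p'(\theta_i = 1)$ is the prior probability that model $i$ is the "true" model.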

Bayes' rule can be written as

$$f''(b \mid \text{data}) = \frac{1}{K}\, L(b \mid \text{data})\, f'(b) \qquad (6)$$

where $f''(b \mid \text{data})$ is the posterior distribution of $b$, conditional upon the data; $L(b \mid \text{data})$ is the likelihood function for $b$; $f'(b)$ is the prior distribution of $b$; and $K$ is a normalizing constant.

The normalizing constant $K$ is often called, in the econometrics literature, the marginal density of the observations or the marginal likelihood [12], and can be found by

$$K = \int L(b \mid \text{data}, \text{model})\, f'(b \mid \text{model})\, db \qquad (7)$$

$K_i$, the marginal likelihood function for model $i$, can be thought of as the probability (density) of observing the data, given model $i$.
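Equation (7) can be evaluated by numerical integration when no closed form is at hand. A minimal Python sketch, using a hypothetical model that is not one of the paper's: each year's flood either exceeds a base level (1) or not (0), with exceedance probability $p$ and a Beta$(a, b)$ prior on $p$. This case has a known closed form, $B(a + k, b + n - k)/B(a, b)$, to check against.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(p, a, b):
    """Prior f'(p): Beta(a, b) density on (0, 1)."""
    return math.exp((a - 1.0) * math.log(p) + (b - 1.0) * math.log1p(-p) - log_beta(a, b))


def likelihood(p, data):
    """L(p | data) for a sequence of 0/1 indicators (1 = annual flood above a base level)."""
    k = sum(data)
    return p ** k * (1.0 - p) ** (len(data) - k)


def marginal_likelihood_numeric(data, a, b, steps=20000):
    """K = integral over p of L(p | data) f'(p), midpoint rule on (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        p = (j + 0.5) * h
        total += likelihood(p, data) * beta_pdf(p, a, b)
    return total * h


def marginal_likelihood_exact(data, a, b):
    """Closed form for the Beta-Bernoulli case: B(a + k, b + n - k) / B(a, b)."""
    k, n = sum(data), len(data)
    return math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))


record = [1, 0, 0, 1, 0, 0, 0, 1]  # hypothetical 8-year record of exceedance indicators
```

Because the data here are discrete, $K$ is literally the probability of the observed record: summed over all $2^n$ possible records it equals 1.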

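The posterior density function for $A$ and $\theta$ is calculated from Bayes' rule; it is

$$f''(A, \theta \mid Q) = \frac{1}{K}\left[\theta_1 L_1(A_1 \mid Q)\, f'_1(A_1 \mid \theta_1 = 1)\, p'(\theta_1 = 1) + \theta_2 L_2(A_2 \mid Q)\, f'_2(A_2 \mid \theta_2 = 1)\, p'(\theta_2 = 1)\right] \qquad (8)$$

where $K$ is a normalizing constant equal to

$$K = K_1\, p'(\theta_1 = 1) + K_2\, p'(\theta_2 = 1) \qquad (9)$$

The posterior model probabilities, $p''(\theta_i = 1)$, are

$$p''(\theta_1 = 1) = \frac{K_1\, p'(\theta_1 = 1)}{K} \qquad (10)$$

$$p''(\theta_2 = 1) = \frac{K_2\, p'(\theta_2 = 1)}{K} \qquad (11)$$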

These posterior probabilities for $\theta_i$ are the same as those found by Leamer [4], Gaver and Geisel [3], and Smallwood [8], even though their approaches to the problem were different.
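For more than two models the same reasoning gives $p''(\theta_i = 1) = K_i\, p'(\theta_i = 1) / \sum_j K_j\, p'(\theta_j = 1)$. Marginal likelihoods of realistic samples are products of many densities and can underflow, so a practical implementation works with $\log K_i$. A minimal Python sketch of Equations (10) and (11) generalized to any number of models (the function name is hypothetical):

```python
import math


def posterior_model_probabilities(log_marginals, priors):
    """Posterior model probabilities computed in log space.

    log_marginals[i] = log K_i and priors[i] = p'(theta_i = 1).
    Returns p''(theta_i = 1) = K_i p'(theta_i = 1) / K with K = sum_j K_j p'(theta_j = 1).
    """
    if len(log_marginals) != len(priors):
        raise ValueError("need one prior probability per model")
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("prior model probabilities must sum to 1")
    # log of K_i p'(theta_i = 1); a zero prior gives weight exactly zero
    terms = [lk + math.log(p) if p > 0.0 else -math.inf for lk, p in zip(log_marginals, priors)]
    top = max(terms)
    log_K = top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp
    return [math.exp(t - log_K) for t in terms]
```

For example, with equal priors and marginal likelihoods in the ratio 1 : 3 the posterior model probabilities are 0.25 and 0.75; the log-sum-exp step gives a correct answer even when the $\log K_i$ are around $-1000$, where $e^{\log K_i}$ itself would underflow to zero.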

The composite Bayesian distribution of extreme flood events, $q$, can also be found by applying first principles:

$$\bar{f}(q) = \sum_{\theta} \int f(q \mid \theta, A)\, f''(A, \theta \mid Q)\, dA = p''(\theta_1 = 1)\, \bar{f}_1(q) + p''(\theta_2 = 1)\, \bar{f}_2(q) \qquad (12)$$

where $\bar{f}_i(q) = \int f_i(q \mid A_i)\, f''_i(A_i \mid Q)\, dA_i$ is the Bayesian pdf of model $i$, with $f''_i(A_i \mid Q) = L_i(A_i \mid Q)\, f'_i(A_i \mid \theta_i = 1) / K_i$.

The composite Bayesian distribution is simply the Bayesian distributions of the individual models weighted by the posterior probability that each model is the true model. This result is extremely convenient, because each $\bar{f}_i(q)$ is computed exactly as it would be if model $i$ were known to be true.
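Equation (12) is a finite mixture, so any probability computed from the composite distribution is the same mixture of the model-specific probabilities. The sketch below checks this for an exceedance probability $P(q > q_0)$. It is a minimal Python example: the two component densities are hypothetical stand-ins (a Normal and a Log-Normal with made-up parameters, in cfs) for the model-specific Bayesian pdfs $\bar{f}_i(q)$, and the weights are made-up posterior model probabilities.

```python
import math


def normal_pdf(q, mu, sigma):
    """Normal density."""
    return math.exp(-0.5 * ((q - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(q, mu, sigma):
    """Log-Normal density: Normal density of ln q times the Jacobian 1/q."""
    return normal_pdf(math.log(q), mu, sigma) / q if q > 0.0 else 0.0


def composite_pdf(q, weights, pdfs):
    """Composite density: model-specific pdfs weighted by posterior model probabilities."""
    return sum(w * f(q) for w, f in zip(weights, pdfs))


def composite_exceedance(q0, weights, pdfs, upper, steps=100000):
    """P(q > q0) under the composite density, by the midpoint rule on (q0, upper)."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("posterior model probabilities must sum to 1")
    h = (upper - q0) / steps
    return h * sum(composite_pdf(q0 + (j + 0.5) * h, weights, pdfs) for j in range(steps))


def stand_in_a(q):
    """Hypothetical stand-in for one model's Bayesian pdf (cfs)."""
    return normal_pdf(q, 3000.0, 1000.0)


def stand_in_b(q):
    """Hypothetical stand-in for another model's Bayesian pdf (cfs)."""
    return lognormal_pdf(q, 7.85, 0.95)


weights = [0.3, 0.7]  # hypothetical posterior model probabilities p''(theta_i = 1)
p_mix = composite_exceedance(6000.0, weights, [stand_in_a, stand_in_b], upper=1.0e6)
```

`p_mix` matches $0.3\,P_a(q > 6000) + 0.7\,P_b(q > 6000)$ computed from the component distribution functions; in a decision analysis it is this mixture, not the probability from any single selected model, that would enter the calculation.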

Analytical Derivation of the Marginal Density Function

The marginal density function of a set of observations is calculated from Equation (7) and represents the probability (density) of observing that set of data. The marginal density function depends upon the probability model for the stochastic process, the prior probability density function over the parameters of the model, and the set of observed data. Consider the marginal likelihood function for the following cases:

1. Normal Process

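Let the random variable $q$ be distributed Normal with mean $\mu$ and precision $h$. The probability density for $q$ is

$$f(q \mid \mu, h) = (2\pi)^{-1/2}\, h^{1/2} \exp\left\{-\tfrac{1}{2} h (q - \mu)^2\right\} \qquad (13)$$

Then, given $n$ independent observations of $q$, $Q = (q_1, \ldots, q_n)$, the likelihood function for $\mu$ and $h$ is

$$L(\mu, h \mid Q) = (2\pi)^{-n/2}\, h^{n/2} \exp\left\{-\tfrac{1}{2} h \sum_{i=1}^{n} (q_i - \mu)^2\right\} \qquad (14)$$

Define the following sample statistics:

$$m = \frac{1}{n} \sum_{i=1}^{n} q_i \qquad (15)$$

$$\nu = n - 1, \qquad v = \frac{1}{\nu} \sum_{i=1}^{n} (q_i - m)^2 \qquad (16)$$

then

$$L(\mu, h \mid Q) = (2\pi)^{-n/2}\, h^{n/2} \exp\left\{-\tfrac{1}{2} h \left[\nu v + n (\mu - m)^2\right]\right\} \qquad (17)$$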

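Assume the prior on $(\mu, h)$ is a natural conjugate prior¹ of the form

$$f'(\mu, h) = \left(\frac{h n'}{2\pi}\right)^{1/2} \exp\left\{-\tfrac{1}{2} h n' (\mu - m')^2\right\} \cdot \frac{(\tfrac{1}{2}\nu' v')^{\nu'/2}\, h^{\nu'/2 - 1} \exp\left\{-\tfrac{1}{2}\nu' v' h\right\}}{\Gamma(\tfrac{1}{2}\nu')} \qquad (18)$$

Then the marginal likelihood function for the Normal model,

$$K_N = \int_0^\infty \int_{-\infty}^{\infty} L(\mu, h \mid Q)\, f'(\mu, h)\, d\mu\, dh, \qquad (19)$$

is, from Equations (14) and (18),

$$K_N = (2\pi)^{-n/2} \left(\frac{n'}{2\pi}\right)^{1/2} \frac{(\tfrac{1}{2}\nu' v')^{\nu'/2}}{\Gamma(\tfrac{1}{2}\nu')} \int_0^\infty \int_{-\infty}^{\infty} h^{(\nu'' - 1)/2} \exp\left\{-\tfrac{1}{2} h \left[n'' (\mu - m'')^2 + \nu'' v''\right]\right\} d\mu\, dh \qquad (20)$$

where

$$n'' = n' + n, \qquad m'' = \frac{n' m' + n m}{n''}, \qquad \nu'' = \nu' + n, \qquad \nu'' v'' = \nu' v' + \nu v + n' m'^2 + n m^2 - n'' m''^2$$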

¹For the Normal process, the natural conjugate prior over the mean and precision is Normal-Gamma (Raiffa and Schlaifer [6]).


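Integrating over $\mu$ contributes the factor $(2\pi / h n'')^{1/2}$, and the remaining integral over $h$ is a Gamma integral equal to $\Gamma(\tfrac{1}{2}\nu'') / (\tfrac{1}{2}\nu'' v'')^{\nu''/2}$. Thus

$$K_N = \left(\frac{n'}{n''}\right)^{1/2} (2\pi)^{-n/2}\, \frac{\Gamma(\tfrac{1}{2}\nu'')}{\Gamma(\tfrac{1}{2}\nu')}\, \frac{(\tfrac{1}{2}\nu' v')^{\nu'/2}}{(\tfrac{1}{2}\nu'' v'')^{\nu''/2}} \qquad (21)$$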
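Equation (21) is best evaluated on the log scale, since $K_N$ is a product of many small density values and can underflow for long records. A minimal Python sketch; the function names are hypothetical and the prior parameters follow the notation $(m', n', v', \nu')$ of Equation (18). It computes $\nu'' v''$ through the equivalent identity $\nu' v' + \nu v + \frac{n' n}{n''}(m - m')^2$, which avoids subtracting large squares when discharges are in the thousands of cfs.

```python
import math


def log_marginal_normal(data, m1, n1, v1, nu1):
    """log K_N for a Normal process with a Normal-Gamma prior (m', n', v', nu')."""
    n = len(data)
    m = sum(data) / n                               # sample mean
    ss = sum((q - m) ** 2 for q in data)            # nu * v
    n2 = n1 + n                                     # n''
    nu2 = nu1 + n                                   # nu''
    # nu'' v'', algebraically equal to nu'v' + nu v + n'm'^2 + n m^2 - n''m''^2
    nu2_v2 = nu1 * v1 + ss + (n1 * n / n2) * (m - m1) ** 2
    return (0.5 * math.log(n1 / n2)
            - 0.5 * n * math.log(2.0 * math.pi)
            + math.lgamma(0.5 * nu2) - math.lgamma(0.5 * nu1)
            + 0.5 * nu1 * math.log(0.5 * nu1 * v1)
            - 0.5 * nu2 * math.log(0.5 * nu2_v2))


def student_t_logpdf(x, dof, loc, scale):
    """Log density of a Student-t with the given degrees of freedom, location and scale."""
    z = (x - loc) / scale
    return (math.lgamma(0.5 * (dof + 1.0)) - math.lgamma(0.5 * dof)
            - 0.5 * math.log(dof * math.pi) - math.log(scale)
            - 0.5 * (dof + 1.0) * math.log1p(z * z / dof))
```

A useful check: for a single observation, $K_N$ reduces to the Student-t predictive density with $\nu'$ degrees of freedom, location $m'$ and squared scale $v'(n' + 1)/n'$; for two observations it factors into that density times the predictive density of the second observation given the first.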

2. Log-Normal Process

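Let $x_i = \ln q_i$ be distributed Normal with mean $\mu$ and precision $h$. Then $q_i$ is distributed Log-Normal by definition. The probability density function for $q$ is

$$f(q \mid \mu, h) = (2\pi)^{-1/2}\, h^{1/2}\, q^{-1} \exp\left\{-\tfrac{1}{2} h (\ln q - \mu)^2\right\}, \qquad q > 0 \qquad (22)$$

The likelihood function for $\mu$ and $h$, given $n$ independent observations of $q$, is

$$L(\mu, h \mid Q) = (2\pi)^{-n/2}\, h^{n/2} \left(\prod_{i=1}^{n} q_i\right)^{-1} \exp\left\{-\tfrac{1}{2} h \sum_{i=1}^{n} (\ln q_i - \mu)^2\right\} \qquad (23)$$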

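Assume a Normal-Gamma prior for $\mu$ and $h$ of the same form as Equation (18). The marginal likelihood, $K_{LN}$, is the integral over $\mu$ and $h$ of the product of the likelihood and the prior probability density function:

$$K_{LN} = \int_0^\infty \int_{-\infty}^{\infty} L(\mu, h \mid Q)\, f'(\mu, h)\, d\mu\, dh \qquad (24)$$

Apart from the constant factor $1/\prod q_i$, the integral is of the same form as the marginal likelihood for the Normal model, with $m$, $\nu$, and $v$ computed from $x_i = \ln q_i$. Then, from Equation (21), $K_{LN}$ is:

$$K_{LN} = \frac{1}{\prod_{i=1}^{n} q_i} \left(\frac{n'}{n''}\right)^{1/2} (2\pi)^{-n/2}\, \frac{\Gamma(\tfrac{1}{2}\nu'')}{\Gamma(\tfrac{1}{2}\nu')}\, \frac{(\tfrac{1}{2}\nu' v')^{\nu'/2}}{(\tfrac{1}{2}\nu'' v'')^{\nu''/2}} \qquad (25)$$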
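The Normal-model result carries over to the Log-Normal case by working with $x_i = \ln q_i$ and subtracting $\sum \ln q_i$. A minimal Python sketch, self-contained so it repeats the Normal-case function from Equation (21); the function names and the numerical check are illustrative additions, and the prior parameters are hypothetical and on the log scale. The check integrates $K_{LN}$ for a single discharge over all $q > 0$, which should give 1 because $K_{LN}$ is a density over the discharges themselves.

```python
import math


def log_marginal_normal(data, m1, n1, v1, nu1):
    """log K_N with a Normal-Gamma prior (m', n', v', nu')."""
    n = len(data)
    m = sum(data) / n
    ss = sum((x - m) ** 2 for x in data)                   # nu * v
    n2, nu2 = n1 + n, nu1 + n                               # n'', nu''
    nu2_v2 = nu1 * v1 + ss + (n1 * n / n2) * (m - m1) ** 2  # nu'' v''
    return (0.5 * math.log(n1 / n2) - 0.5 * n * math.log(2.0 * math.pi)
            + math.lgamma(0.5 * nu2) - math.lgamma(0.5 * nu1)
            + 0.5 * nu1 * math.log(0.5 * nu1 * v1) - 0.5 * nu2 * math.log(0.5 * nu2_v2))


def log_marginal_lognormal(discharges, m1, n1, v1, nu1):
    """log K_LN: the Normal result on x_i = ln q_i, minus sum(ln q_i) for the 1/prod(q_i) factor."""
    if any(q <= 0.0 for q in discharges):
        raise ValueError("the Log-Normal model needs positive discharges")
    logs = [math.log(q) for q in discharges]
    return log_marginal_normal(logs, m1, n1, v1, nu1) - sum(logs)


def total_probability_one_obs(m1, n1, v1, nu1, half_width=60.0, steps=24000):
    """Integrate K_LN for one discharge over q > 0, substituting q = e^x (so dq = e^x dx)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        x = m1 - half_width + (j + 0.5) * h
        total += math.exp(log_marginal_lognormal([math.exp(x)], m1, n1, v1, nu1) + x)
    return total * h
```

Because of the factor $1/\prod q_i$, $K_{LN}$ is measured on the same scale as $K_N$ (per unit of discharge, not per unit of $\ln q$), which is what allows the two marginal likelihoods to be compared in Equations (10) and (11).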

3. Exceedance Model

Another model in common use in water resources, especially in the analysis of extreme events, is the Exceedance model (Shane and Lynn [7]; Wood [10]; Todorovic and Zelenhasic [9]). The Exceedance model considers only those extreme events, for example flood discharges, that are greater than a specified base level. Such discharges are called exceedance discharges, and the probability density function of exceedance discharges is assumed to be of an Exponential type. Furthermore, the arrival of exceedance events is assumed to be a Poisson process. The model is fairly general, since the upper tails of many distributions can be approximated by an exponential form.

The second part of this model concerns flood discharges less than the base level. Such discharges are usually of little interest in analyzing extreme events, and their distribution may be quite complex. Here, it will be assumed that they follow a uniform distribution. The use of the uniform density function implies that the posterior probability for the Exceedance model will be underestimated, that is, conservative.

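The probability density function for the Exceedance model is

$$f(q \mid \nu, a) = \begin{cases} \dfrac{1 - \nu}{q_b}, & 0 \le q \lt q_b \\[6pt] \nu\, a \exp\{-a (q - q_b)\}, & q \ge q_b \end{cases} \qquad (26)$$

where $\nu$ is the arrival rate of floods, $a$ is the event magnitude parameter, and $q_b$ is the base level.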

Given a sample of $n$ independent discharges, $Q$, of which $m$ are less than $q_b$ and $n - m$ are greater than or equal to $q_b$, the likelihood function for $\nu$ and $a$ can be shown to be

$$L(\nu, a \mid Q) = \frac{(1 - \nu)^m}{q_b^m}\, \nu^{n-m}\, a^{n-m} \exp\left\{-a \sum_{i=1}^{n-m} (q_i - q_b)\right\} \qquad (27)$$

where the sum runs over the $n - m$ exceedance discharges.
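The likelihood in Equation (27) depends on the sample only through $m$, $n - m$ and the summed exceedances $\sum (q_i - q_b)$. A minimal Python sketch; it assumes, as the factor $(1 - \nu)^m / q_b^m$ indicates, that discharges below $q_b$ are uniform on $(0, q_b)$, and it treats $\nu$ as a probability in $(0, 1)$ as that factor requires. Function names and numbers are hypothetical.

```python
import math


def exceedance_pdf(q, nu, a, qb):
    """Exceedance-model pdf: uniform on (0, qb) with mass 1 - nu, shifted Exponential above qb."""
    if q < 0.0:
        return 0.0
    if q < qb:
        return (1.0 - nu) / qb
    return nu * a * math.exp(-a * (q - qb))


def exceedance_log_likelihood(data, nu, a, qb):
    """log L(nu, a | Q) from the counts below/above qb and the summed exceedances."""
    if not (0.0 < nu < 1.0 and a > 0.0):
        raise ValueError("need 0 < nu < 1 and a > 0")
    above = [q for q in data if q >= qb]
    m = len(data) - len(above)        # discharges below the base level
    k = len(above)                    # n - m exceedance discharges
    return (m * (math.log1p(-nu) - math.log(qb))
            + k * (math.log(nu) + math.log(a))
            - a * sum(q - qb for q in above))


qb, nu, a = 8500.0, 0.3, 1.0 / 3000.0              # hypothetical values (cfs)
d1 = [3000.0, 9000.0, 5000.0, 12000.0, 7000.0]     # hypothetical sample
d2 = [1000.0, 10000.0, 8000.0, 11000.0, 2000.0]    # same counts, same summed exceedances
```

Samples `d1` and `d2` have the same counts and the same total exceedance above $q_b$ (4000 cfs), so they give identical likelihoods: those three quantities are sufficient statistics for $(\nu, a)$ under this model.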

The marginal likelihood function, $K_E$, is defined as

$$K_E = \iint L(\nu, a \mid Q)\, f'(\nu, a)\, d\nu\, da \qquad (28)$$

The conjugate prior density functions for $\nu$ and $a$ are of the form

Therefore, applying Equations (27) and (29) to Equation (28), $K_E$ is simply

The integral over $\nu$ equals

where

$$u'' = u' + n - m, \qquad s'' = s' + T \quad \left(\text{or } s'' = s' + \textstyle\sum_i t_i\right)$$

and the integral over $a$ equals

where

$$v'' = v' + n - m$$

Thus, $K_E$ equals

Some computer experiments were carried out with samples generated from known distributions. As an example, a sample whose size grew from 10 to 200 was generated from a Log-Normal distribution with $\mu_{\ln y} = 7.85$ and $\sigma_{\ln y} = 0.95$, and the marginal likelihoods were evaluated numerically for the Log-Normal and the Exceedance models, assuming diffuse prior distributions on the probability model parameters.

Table 1 shows the values of the marginal likelihoods together with the posterior model probabilities estimated according to Equations (10) and (11), on the assumption of diffuse (equal) prior model probabilities, $p'(\theta_1 = 1) = p'(\theta_2 = 1) = 0.5$.

Extensive experiments are currently being performed to evaluate the worth of data for the problem of model selection, as well as the influence of prior assessments; the results will be reported when available.

An Application to the Blackstone River, U.S.A.

The Blackstone River at Woonsocket, Rhode Island, has been analyzed by Wood and Rodriguez-Iturbe [11], both to assess prior information for the Bayesian probability density function of its flood discharges (for four different probability models) and to solve a decision problem concerning local flood protection. Model uncertainty was not considered in that paper, even though competing models were examined.

This section calculates the posterior model probabilities.

The parameters for the marginal likelihood functions are summarized in Table 2. The values of the marginal likelihoods are

for the Normal, Log-Normal, and Exceedance models, respectively.

Assuming uniform prior probabilities on the three models, the posterior probabilities for the models are


The composite Bayesian distribution of flood discharges is, from Equation (12),

where $\bar{f}_E(q)$ is the Bayesian density function for the Exceedance model and $\bar{f}_{LN}(q)$ is the Bayesian density function for the Log-Normal model.

The composite Bayesian distribution of Equation (36) is the probability model which should be used in making inferences about future flood discharges. The composite Bayesian model rationally accounts for both parameter and model uncertainty.

Note that the form of the composite Bayesian model is not fixed but dynamic: because the posterior model probabilities depend on the marginal likelihoods of the observed data, the weights change as more data become available.

Conclusions

This paper considers the problem of model uncertainty within a Bayesian analysis. When there is a set of competing probability models for flood discharges, Bayesian analysis leads to a composite Bayesian model. The composite Bayesian model is a linear combination of the Bayesian distributions of the individual models, weighted by the posterior probability that each individual model is the true model. The posterior model probabilities are calculated from the marginal likelihood function of the observed data and the prior model probability.

The posterior model probabilities are found by calculating the marginal likelihood function for each competing model. The marginal likelihood function was derived analytically for three commonly used models: a Normal process, a Log-Normal process, and an Exceedance model. The results have been applied to real-world data, with favourable results.


Table 1: Marginal Likelihoods and Posterior Model Probabilities for Samples Generated from a Log-Normal Process with $\mu_{\ln y} = 7.8$ and $\sigma_{\ln y} = 0.95$

Sample Size | Log-Normal Model (Marginal Likelihood, Posterior Model Probability) | Exceedance Model (Marginal Likelihood, Posterior Model Probability)

Table 2: Marginal Likelihood Parameters for Normal, Log-Normal, and Exceedance Models for the Blackstone River, U.S.A.

Normal Model: $n' = 7$ years; $\nu = 36$ years; $v' = 9.22 \times 10^6$ cfs²; $n'' = 44$ years; $\nu'' = 43$ years; $v'' = 24.7 \times 10^6$ cfs²

Log-Normal Model: $n' = 4$ years; $\nu = 36$ years; $v' = 0.22$ (log cfs)²; $n'' = 41$ years; $\nu'' = 40$ years; $v'' = 0.689$ (log cfs)²

Exceedance Model: $q_b = 8500$ cfs; $m = 32$ events; $n - m = 5$ events; $u' = 6$ events; $v' = 3$ events; $s' = 50$ years; $z' = 10850$ cfs; $u'' = 11$ events; $v'' = 8$ events; $s'' = 87$ years ($s'' + m = 119$); $z'' = 49468$ cfs

References

[1] Benjamin, Jack R. and C. Allin Cornell. Probability, Statistics and Decision for Civil Engineers, McGraw-Hill, New York, 1970.

[2] Dumonceaux, Robert, Charles E. Antle, and Gerald Haas. "Likelihood Ratio Test for Discrimination Between Two Models with Unknown Location and Scale Parameters," Technometrics, Vol. 15, No. 1 (February 1973).

[3] Gaver, K. and M. Geisel. "Discriminating Among Alternative Models: Bayesian and Non-Bayesian Methods," to appear in Frontiers of Econometrics, Paul Zarembka, Editor, Academic Press, forthcoming, 1972.

[4] Leamer, E. "Probabilities of Linear Hypothesis," Harvard University, unpublished working paper, 1973.

[5] Pesaran, M. H. "On the General Problem of Model Selection," The Review of Economic Studies, Vol. XLI(2), No. 126 (April 1974).

[6] Raiffa, Howard and Robert Schlaifer. Applied Statistical Decision Theory, MIT Press, Cambridge, Mass., 1961, p. 356.

[7] Shane, R. M. and W. R. Lynn. "Mathematical Model for Flood Risk Evaluation," Journal of the Hydraulics Division, American Society of Civil Engineers, Vol. 90, No. HY6 (June 1964).

[8] Smallwood, R. "A Decision Analysis of Model Selection," IEEE Transactions on Systems Science and Cybernetics, Vol. SSC-4, No. 3 (September 1968).

[9] Todorovic, P. and E. Zelenhasic. "A Stochastic Model for Flood Analysis," Water Resources Research, Vol. 6, No. 6 (December 1970).

[10] Wood, Eric F. "Flood Control Design with Limited Data: A Comparison of the Classical and Bayesian Approaches," Symposium on the Design of Water Resources Projects with Inadequate Data, Madrid, June 1973.

[11] Wood, Eric F. and I. Rodriguez-Iturbe. "Bayesian Inference and Decision Making for Extreme Hydrologic Events," submitted for publication, Water Resources Research, 1974.

[12] Zellner, A. An Introduction to Bayesian Inference in Econometrics, Wiley, New York, 1971.
