Working Paper
The Experimental Design of an Observational Network: Optimization Algorithms of the Exchange Type
V. V. Fedorov
October 1986 WP-86-62
International Institute for Applied Systems Analysis
A-2361 Laxenburg, Austria
NOT FOR QUOTATION WITHOUT THE PERMISSION OF THE AUTHOR
Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute or of its National Member Organizations.
Preface
For many years, designers of environmental monitoring systems have faced the problem of optimal allocation of resources for observational networks (see, for instance, Munn, 1981): where, how frequently and what characteristics have to be measured or observed in order to obtain data that will be sufficient for prognoses or warnings. From the 1970s to the early 1980s, a number of heuristic approaches appeared in the "environmental" literature. Most of them are based on the analysis of space and time correlation structures (usually historical time series are used for their estimation) of the observed entities, with subsequent sieving to keep the less correlated (and hopefully most informative) observational points.
Different procedures for "sieving" have been used in applications: viz., forward and backward versions with various objective functions. These procedures have led to reasonably good results; however, no accurate mathematical analysis was undertaken.
In the present paper, a new approach to optimal allocation of an observational network is proposed and some iterative numerical procedures are considered.
The approach is essentially based on the theory of the optimal design of regression experiments (Ermakov, ed., 1983). Using classical results from the moment spaces theory, the author investigates the properties of optimal allocations and the convergence of the numerical procedures to optimal solutions.
Prof. M. Antonovsky (Environment Program)
1. Introduction
In this paper, numerical procedures of the "exchange" type for the construction of continuous optimal designs with restricted measures (see definitions in Fedorov, 1986; Wynn, 1982) are considered. The "exchange" type procedures are based on a simple heuristic idea: at every step, delete "bad" (less informative) points and include "good" (most informative) ones.
Before giving the accurate mathematical formulation of the problem, and to illuminate the place of the results in experimental practice, let us start with two simple hypothetical examples. "Real" examples, where the considered approach seems to be appropriate, can be found, for instance, in Munn, 1981.
Example 1. Let $X$ be an area where $N$ observational stations have to be located. An optimal (or, at least, admissible) location depends upon models describing the system "object under analysis - observational techniques". The regression models
$$ y_i = \eta(x_i, \theta) + \varepsilon_i, \quad i = 1, \ldots, N, \qquad (1) $$
are commonly used in experimental practice. Here $y_i$ is the result of an observation at the $i$-th station, $\eta(x, \theta)$ is an a priori given function, $\theta$ is a vector of parameters to be estimated and $\varepsilon_i$ is an error which one believes to be random (a more detailed specification will be given later). The optimal location of stations has to provide the minimum of some measure of deviation of the estimates $\hat\theta$ from the true values of $\theta$.
For sufficiently large $N$ the location of stations can be approximately described by some distribution function $\xi(dx)$, and one needs to find an optimal $\xi^*(dx)$. If $X$ is not uniform, then one comes to the restriction that the share $N(\Delta X)/N$ of stations in any given part $\Delta X$ cannot exceed some prescribed level. In terms of distribution functions, it means that
$$ \xi(dx) \le \psi(dx), \qquad (2) $$
where $\psi$ is defined by an experimenter. Here is the crucial feature of the problem considered in this paper.
Example 2. Let some characteristic $y_i$ be observed for members of a sample of size $N$. Every $i$-th member of this sample can be chosen from a group labelled by variables $x_i$. If the sampling is randomized, then the observed characteristic $y_i$ can be described by some distribution of $y$ conditional on $x_i$ and $\theta$.
In many cases, after some manipulations, the initial model can be reduced to (1), where $\eta(x_i, \theta)$ is an average characteristic of the $i$-th group and $\varepsilon_i$ reflects the variation within this group. The size of any group (or the number of units available for sampling) is normally bounded. Passing to a continuous version of the design problem, one can easily repeat the considerations of the previous example and come to model (1), (2).
In what follows, it will be assumed that in model (1), (2):
- the response function is a linear function of the unknown parameters, i.e. $\eta(x, \theta) = \theta^T f(x)$, $\theta \in R^m$, and the functions $f(x)$ are given;
- the errors $\varepsilon_i$ are independent and $E[\varepsilon_i^2] = 1$ (or $E[\varepsilon_i^2] = \lambda(x_i)$, where $\lambda(x)$ is known; this case can be easily transformed to the previous one).
As usual, some objective function $\Phi$ defined on the space of $m \times m$ information matrices
$$ M(\xi) = \int_X f(x) f^T(x)\, \xi(dx) $$
will describe the quality (or accuracy) of a design $\xi$ ($M^{-1}(\xi)$ is the normalized variance-covariance matrix of the least squares estimators of the parameters $\theta$).
The purpose of optimum design of experiments is to find
$$ \xi^* = \mathrm{Arg}\min_{\xi} \Phi[M(\xi)] \qquad (3) $$
subject to
$$ \xi(dx) \le \psi(dx). \qquad (4) $$
Constraint (4) defines the peculiarity of the design problem with respect to standard approaches. Similar to the moment spaces theory (compare with Krein and Nudelman, 1973, Ch. VII), a solution of (3) and (4) will be called a "$(\Phi, \psi)$-optimal design". In practice, $\psi(dx)$ restricts the number of observations in a given space element $dx$ (see the examples).
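As an illustration of the quantities just introduced, the following Python sketch evaluates the information matrix and one possible objective function for a design supported on finitely many points, and checks whether the weights respect a prescribed upper level. The quadratic model $f(x) = (1, x, x^2)^T$, the choice $\Phi(M) = -\ln|M|$, the level 0.25 and all function names are assumptions made for the example; the paper keeps $f$, $\Phi$ and $\psi$ general.

```python
import numpy as np

def f(x):
    """Assumed regression functions f(x) = (1, x, x^2) for the example."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def information_matrix(points, weights):
    """M(xi) = sum_i w_i f(x_i) f(x_i)^T for a finitely supported design."""
    F = f(points)
    return (F * np.asarray(weights, dtype=float)[:, None]).T @ F

def d_criterion(M):
    """Assumed objective Phi(M) = -ln det M (smaller is better)."""
    sign, logdet = np.linalg.slogdet(M)
    return np.inf if sign <= 0 else -logdet

def satisfies_restriction(weights, cap):
    """Discrete analogue of the restriction: no weight exceeds its prescribed level."""
    return bool(np.all(np.asarray(weights) <= np.asarray(cap) + 1e-12))

points = np.array([-1.0, 0.0, 1.0])
weights = np.array([1.0, 1.0, 1.0]) / 3.0
M = information_matrix(points, weights)
phi_value = d_criterion(M)
feasible = satisfies_restriction(weights, cap=0.25)
```

In this hypothetical setting the three-point design has $|M| = 4/27$ but puts weight 1/3 on each point, so it violates a level of 0.25 per point. This is the situation in which the restriction changes the design problem.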
Optimization problem (1), (2) was considered by Wynn, 1982 and Gaivoronsky, 1985. To some extent, they translated a number of classical results from moment spaces theory into experimental design language. Gaivoronsky also analyzed the convergence of the iterative procedure for optimal design construction based on the traditional idea of steepest descent (see, for instance, Ermakov (ed.), 1983; Wu and Wynn, 1978):
$$ \xi_{s+1} = (1 - \alpha_s)\,\xi_s + \alpha_s\,\bar\xi_s, \qquad (5) $$
where $\bar\xi_s$ has to satisfy (4) and some additional linear constraints:
$$ \int_X g(x)\,\bar\xi_s(dx) \le c. $$
Wynn briefly discussed a number of heuristic numerical procedures based on some results from the moment spaces theory.
The main objective of this paper is to consider iterative procedures of exchange type which make extensive use of the nature of optimal designs for problem (3), (4) and therefore promise to be more efficient than the ones mentioned above.
General properties of optimal designs are discussed in Section 2. Section 3 deals with the formulation and basic analysis of the iterative procedure and its modifications.
2. Characterization of $(\Phi, \psi)$-optimal Designs
In this section, the properties of optimal designs will be discussed only to the extent sufficient for the analysis of the proposed iterative procedures. More details can be found in Wynn, 1982.
The set of assumptions used later is the following:
a) $X$ is compact, $X \subset R^k$;
b) $f(x) \in R^m$ are continuous functions in $X$;
c) $\psi(x)$ is atomless;
d) there exists $c < \infty$ such that
$$ \Xi_c(\psi) = \{\xi : \Phi[M(\xi)] \le c < \infty,\ \xi \in \Xi(\psi)\} \ne \emptyset, $$
where $\Xi(\psi)$ is the set of designs satisfying (4);
e) $\Phi(M)$ is a convex function of $M$;
c') $\psi(dx)$ has a continuous density $\psi(x)$;
f') the derivatives $\partial\Phi/\partial M = \dot\Phi$ exist and are bounded for all designs satisfying (d).
Let $\bar\Xi(\psi)$ be the set of measures $\xi$ which either coincide with $\psi$ or are equal to 0.
Theorem 1. If assumptions (a)-(e) hold, then there exists an optimal design $\xi^* \in \bar\Xi(\psi)$.
Proof. The existence of an optimal design follows from (d)-(e) and the compactness of the set of information matrices. The compactness of the latter is provided by (a) and (b). The fact that at least one optimal design has to belong to $\bar\Xi(\psi)$ is a corollary of Liapounov's Theorem on the range of a vector measure (see, for instance, Karlin and Studden, 1966, Ch. VIII; Wynn, 1982).
Note 1. Liapounov's Theorem leads to another result which can be useful in applications: for any design $\xi \in \Xi(\psi)$ there is a design $\bar\xi \in \bar\Xi(\psi)$ such that $M(\xi) = M(\bar\xi)$.
A function $\varphi(x, \xi)$ is said to separate sets $X_1$ and $X_2$ if there is a constant $C$ such that $\varphi(x, \xi) \le C$ (a.e. $\psi$) on $X_1$ and $\varphi(x, \xi) \ge C$ (a.e. $\psi$) on $X_2$; (a.e. $\psi$) means "almost everywhere with respect to the measure $\psi$".
Theorem 2. If assumptions (a)-(f) hold, then a necessary and sufficient condition that $\xi^* \in \bar\Xi(\psi)$ is $(\Phi, \psi)$-optimal is that $\varphi(x, \xi^*)$ separates two sets: $X^* = \mathrm{supp}\,\xi^*$ and $X \setminus X^*$.
This theorem was first formulated by Wynn, 1982, but its proof was not perfect. Therefore, we give a new one which is also more illuminating for the formulation and analysis of the numerical procedures.
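Proof. Necessity. Consider two designs: $\xi^*$ and $\xi \in \bar\Xi(\psi)$. Let
$$ D = \mathrm{supp}\,\xi^* \setminus \mathrm{supp}\,\xi, \qquad E = \mathrm{supp}\,\xi \setminus \mathrm{supp}\,\xi^*. \qquad (7) $$
Assume that $\xi^*$ is $(\Phi, \psi)$-optimal. Then for any design $\xi$ (see (f)):
$$ 0 \le \int_X \varphi(x, \xi^*)\,\xi(dx). $$
From the definition of $\varphi(x, \xi)$:
$$ \int_X \varphi(x, \xi^*)\,\xi^*(dx) = 0, $$
and, therefore, for any $E$ and $D$:
$$ \int_E \varphi(x, \xi^*)\,\psi(dx) \ge \int_D \varphi(x, \xi^*)\,\psi(dx). \qquad (8) $$
This proves necessity.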
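Sufficiency. Consider designs $\xi^*$ and $\xi \in \bar\Xi(\psi)$ satisfying (7) and (8), and assume now that $\xi^*$ is nonoptimal, i.e.
$$ \Phi[M(\xi^*)] \ge \Phi[M(\xi)] + \delta, \quad \delta > 0, \qquad (9) $$
where $\xi$ is now $(\Phi, \psi)$-optimal. Let $\bar\xi = (1 - \alpha)\,\xi^* + \alpha\,\xi$, $\alpha > 0$. Then the convexity of $\Phi$ leads to the inequality
$$ \Phi[M(\bar\xi)] \le (1 - \alpha)\,\Phi[M(\xi^*)] + \alpha\,\Phi[M(\xi)] \le (1 - \alpha)\,\Phi[M(\xi^*)] + \alpha\,\{\Phi[M(\xi^*)] - \delta\} = \Phi[M(\xi^*)] - \alpha\delta. \qquad (10) $$
Assumption (f) and inequality (8) lead to the inequality
$$ \Phi[M(\bar\xi)] = \Phi[M(\xi^*)] + \alpha \int_X \varphi(x, \xi^*)\,\xi(dx) + o(\alpha) = \Phi[M(\xi^*)] + \alpha \Big\{ \int_E \varphi(x, \xi^*)\,\psi(dx) - \int_D \varphi(x, \xi^*)\,\psi(dx) \Big\} + o(\alpha) \ge \Phi[M(\xi^*)] + o(\alpha), \qquad (11) $$
where $E$ and $D$ describe the difference between the supporting sets for $\xi^*$ and $\xi$.
When $\alpha \to 0$, the comparison of (10) and (11) gives a contradiction. This completes the proof.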
Note 1. If instead of (c) one uses (c'), then the necessary and sufficient condition can be formulated in the form of the following inequality:
$$ \max_{x \in X^*} \varphi(x, \xi^*) \le \min_{x \in X \setminus X^*} \varphi(x, \xi^*). \qquad (12) $$
Note 2. If (f) is complemented by (f'), then
$$ \varphi(x, \xi) = \gamma(x, \xi) - \mathrm{tr}\,\dot\Phi(\xi)\,M(\xi), $$
where $\gamma(x, \xi) = f^T(x)\,\dot\Phi(\xi)\,f(x)$, and (12) can be converted to
$$ \max_{x \in X^*} \gamma(x, \xi^*) \le \min_{x \in X \setminus X^*} \gamma(x, \xi^*). \qquad (13) $$
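To make the separation condition concrete, consider the D-criterion $\Phi(M) = -\ln|M|$ (an assumed choice; the paper keeps $\Phi$ general). Then $\dot\Phi = -M^{-1}$, so $\gamma(x, \xi) = -f^T(x) M^{-1}(\xi) f(x)$ and $\varphi = \gamma + m$. The Python sketch below evaluates both functions on a grid of equal-mass cells and tests the max/min form of the condition for a candidate design. The grid, the quadratic model and the helper names are hypothetical.

```python
import numpy as np

def f(x):
    """Assumed regression functions f(x) = (1, x, x^2)."""
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def gamma_and_phi(x, occupied, cell_mass):
    """gamma and phi for Phi(M) = -ln det M on a grid of equal-mass cells."""
    F = f(x)
    M = (F[occupied] * cell_mass).T @ F[occupied]   # design equals psi on occupied cells
    dPhi = -np.linalg.inv(M)                        # derivative of -ln det M
    gamma = np.einsum("ij,jk,ik->i", F, dPhi, F)    # f^T(x) dPhi f(x)
    phi = gamma - np.trace(dPhi @ M)                # phi = gamma - tr(dPhi M)
    return gamma, phi

def separates(gamma, occupied):
    """True if max of gamma on the support <= min of gamma off the support."""
    return bool(gamma[occupied].max() <= gamma[~occupied].min())

x = np.linspace(-1.0, 1.0, 400)
occupied = np.zeros(x.size, dtype=bool)
occupied[150:250] = True                 # central block carrying total mass 1
gamma, phi = gamma_and_phi(x, occupied, cell_mass=1.0 / 100)
central_block_is_optimal = separates(gamma, occupied)
```

For this central-block design the condition fails: cells near the ends of the interval lie outside the support yet have much smaller $\gamma$ than any occupied cell, which indicates that mass should be moved outwards.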
3. Numerical Procedure of Exchange Type
Theorem 2 gives a hint on how to construct optimal designs numerically: if for some given design $\xi$ one can find a couple of sets $D \subset \mathrm{supp}\,\xi$ and $E \subset X \setminus \mathrm{supp}\,\xi$ of equal $\psi$-measure such that
$$ \int_D \varphi(x, \xi)\,\psi(dx) > \int_E \varphi(x, \xi)\,\psi(dx), $$
then it is hoped that the design $\bar\xi$ with
$$ \mathrm{supp}\,\bar\xi = (\mathrm{supp}\,\xi \setminus D) \cup E $$
will be "better" than $\xi$. Repetition of this procedure can lead to an optimal design.
A number of algorithms based on this idea can be easily invented. In this paper one of the simplest algorithms is considered in detail; a thorough consideration of the others from this cluster is a matter of routine technique.
In what follows, the fulfillment of (c') is assumed.
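Algorithm. Let
$$ \lim_{s \to \infty} \delta_s = 0, \quad \lim_{s \to \infty} \sum_{i=1}^{s} \delta_i = \infty \quad \text{and} \quad \lim_{s \to \infty} \sum_{i=1}^{s} \delta_i^2 = k < \infty. \qquad (14) $$
Step a. There is a design $\xi_s \in \bar\Xi(\psi)$. Two sets $D_s$ and $E_s$ with equal measures,
$$ \int_{D_s} \psi(dx) = \int_{E_s} \psi(dx) = \delta_s, \qquad (15) $$
and including, correspondingly, the points
$$ x_{1s} = \mathrm{Arg}\max_{x \in X_{1s}} \gamma(x, \xi_s) \quad \text{and} \quad x_{2s} = \mathrm{Arg}\min_{x \in X_{2s}} \gamma(x, \xi_s), $$
where $X_{1s} = \mathrm{supp}\,\xi_s$ and $X_{2s} = X \setminus X_{1s}$, have to be found.
Step b. The design $\xi_{s+1}$ with the supporting set
$$ \mathrm{supp}\,\xi_{s+1} = X_{1(s+1)} = (X_{1s} \setminus D_s) \cup E_s \qquad (16) $$
is constructed.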
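The two steps above can be sketched in Python on a discretized region. This is a minimal illustration under several assumptions that are not part of the paper: $X = [-1, 1]$ is split into 400 cells of equal $\psi$-mass, each exchanged set $D_s$, $E_s$ is a single cell (so the step size is fixed at the cell mass instead of following a vanishing sequence), the criterion is $\Phi(M) = -\ln|M|$ with the quadratic model $f(x) = (1, x, x^2)^T$, and a swap is accepted only if it strictly improves the criterion. All names are hypothetical.

```python
import numpy as np

def f(x):
    """Assumed regression functions f(x) = (1, x, x^2)."""
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def log_det_info(F, occupied, cell_mass):
    """ln det M(xi) for the design that equals psi on the occupied cells."""
    M = (F[occupied] * cell_mass).T @ F[occupied]
    return np.linalg.slogdet(M)[1]

def exchange(x, occupied, cell_mass, max_iter=10_000):
    """Swap one occupied cell (D_s) for one free cell (E_s) per step."""
    F = f(x)
    occupied = occupied.copy()
    for _ in range(max_iter):
        M = (F[occupied] * cell_mass).T @ F[occupied]
        # gamma(x, xi) = -f^T M^{-1} f for the assumed Phi(M) = -ln det M
        gamma = -np.einsum("ij,jk,ik->i", F, np.linalg.inv(M), F)
        inside, outside = np.flatnonzero(occupied), np.flatnonzero(~occupied)
        j1 = inside[np.argmax(gamma[inside])]     # x_1s: worst occupied cell
        j2 = outside[np.argmin(gamma[outside])]   # x_2s: best free cell
        if gamma[j2] >= gamma[j1]:
            break                                 # gamma already separates the sets
        trial = occupied.copy()
        trial[j1], trial[j2] = False, True
        # Safeguard (not in the paper): accept only strict improvements.
        if log_det_info(F, trial, cell_mass) <= log_det_info(F, occupied, cell_mass):
            break
        occupied = trial
    return occupied

x = np.linspace(-1.0, 1.0, 400)
cell_mass = 1.0 / 100                    # psi-mass of each cell; 100 cells give mass 1
start = np.zeros(x.size, dtype=bool)
start[150:250] = True                    # initial design: a central block
final = exchange(x, start, cell_mass)
```

Because every swap exchanges cells of equal mass, the total mass of the design is preserved. The strict-improvement safeguard is a design choice of this sketch: with a fixed, non-vanishing step the first-order argument alone does not guarantee a decrease of $\Phi$, so the loop simply stops when a swap no longer helps.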
Iterative procedure (14)-(16) is based on the approximation ($\delta \to 0$):
$$ \Phi[M(\xi_{s+1})] \approx \Phi[M(\xi_s)] + \delta_s\,[\gamma(x_{2s}, \xi_s) - \gamma(x_{1s}, \xi_s)]. $$
The analysis of iterative procedure (14)-(16) becomes simpler if
(g) for any design $\xi \in \Xi(\psi)$: $|M(\xi)| \ge \mathrm{const} > 0$.
This assumption is not very restrictive. If, for instance, $\psi(x) \ge q > 0$ and the functions $f(x)$ are linearly independent on any open subset of $X$ of finite measure, then (g) is valid.
For most optimality criteria, (g) leads to the fulfillment of the inequalities (17) for any $\xi \in \Xi(\psi)$. Otherwise (17) is supposed to be included in (g).
Theorem 3. If assumptions (a), (b), (c'), (e)-(g) hold, then
$$ \lim_{s \to \infty} \Phi[M(\xi_s)] = \inf_{\xi} \Phi[M(\xi)] = \Phi^*. $$
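Proof. The approach is standard for optimization theory (in the statistical literature see, for instance, Wu and Wynn, 1978). Therefore, some elementary considerations will be omitted.
Expanding (see (g) and (17)) by a Taylor series in $\delta_s$ gives:
$$ \Phi[M(\xi_{s+1})] = \Phi[M(\xi_s)] + \delta_s\,[\gamma(x_{2s}, \xi_s) - \gamma(x_{1s}, \xi_s)] + K_s \delta_s^2, \qquad (18) $$
where $|K_s| \le K = K(K_1, K_2, K_3)$. Due to this inequality and (14) the sequence $S_{2s} = \{\sum_{i \le s} K_i \delta_i^2\}$ converges. By the definition of $x_{1s}$ and $x_{2s}$, and by Theorem 2 for any non-optimal $\xi_s$:
$$ \gamma(x_{2s}, \xi_s) - \gamma(x_{1s}, \xi_s) \le 0, \qquad (19) $$
and, therefore, the sequence
$$ S_{1s} = \sum_{i \le s} \delta_i\,[\gamma(x_{2i}, \xi_i) - \gamma(x_{1i}, \xi_i)] $$
monotonically decreases.
From (g) and (19):
$$ K_1 \ge \Phi[M(\xi_{s+1})] = \Phi[M(\xi_0)] + S_{1s} + S_{2s} \ge \Phi^*, $$
which leads to the boundedness of $S_{1s}$.
Subsequently, the monotonicity of $\{S_{1s}\}$ provides its convergence and the convergence of $\{\Phi[M(\xi_s)]\}$.
Assume that
$$ \lim_{s \to \infty} \Phi[M(\xi_s)] \ge \Phi^* + a, \quad a > 0. \qquad (20) $$
Then, from Theorem 2 and assumptions (b), (c') it follows that
$$ \gamma(x_{2s}, \xi_s) - \gamma(x_{1s}, \xi_s) \le b < 0 $$
and
$$ \lim_{s \to \infty} S_{1s} \le b \lim_{s \to \infty} \sum_{i \le s} \delta_i = -\infty, \qquad \lim_{s \to \infty} \Phi[M(\xi_s)] = -\infty. \qquad (21) $$
The contradiction between (20) and (21) proves the theorem.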
Note 1. In (14)-(16), there is some uncertainty in the choice of $D_s$ and $E_s$. Somehow, they have to be located around $x_{1s}$ and $x_{2s}$. When $\psi(x) = \mathrm{const}$ (and one arrives at this case by the transformation $dz = \psi(x)\,dx$), then $x_{1s}$ and $x_{2s}$ could be the "geometrical" centers of $D_s$ and $E_s$.
Note 2. The iterative procedure can be more effective (especially in the first steps) if there is a possibility to easily find the sets $D_s$ and $E_s$ given by (22) subject to (23).
Note 3. When $\delta_s$ is sufficiently small and
$$ \int_{D_s} f(x) f^T(x)\,\psi(x)\,dx \approx f(x_{1s}) f^T(x_{1s})\,\delta_s, $$
then the calculations in (14)-(16) can be simplified if one uses the following recursion formula (see, for instance, Fedorov, 1972):
$$ (M \pm \delta f f^T)^{-1} = \left( I \mp \frac{\delta\,M^{-1} f f^T}{1 \pm \delta f^T M^{-1} f} \right) M^{-1}. $$
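The recursion formula is the rank-one (Sherman-Morrison) update of an inverse. A short numerical check in Python is given below; the matrix `M`, the vector `f_vec` and the value of `delta` are arbitrary illustrative choices, and the helper name is hypothetical. The sketch assumes that $M$ is symmetric, as an information matrix is.

```python
import numpy as np

def updated_inverse(M_inv, f, delta, sign=+1):
    """Inverse of M + sign*delta*f f^T computed from M^{-1} (M symmetric)."""
    Mf = M_inv @ f                                   # M^{-1} f
    denom = 1.0 + sign * delta * (f @ Mf)            # 1 +/- delta f^T M^{-1} f
    # For symmetric M, M^{-1} f f^T M^{-1} equals the outer product of Mf with itself.
    return M_inv - sign * delta * np.outer(Mf, Mf) / denom

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
M = A @ A.T + 3.0 * np.eye(3)            # hypothetical positive definite matrix
f_vec = np.array([1.0, 0.5, 0.25])       # hypothetical value of f(x) at one point
delta = 0.01

plus_direct = np.linalg.inv(M + delta * np.outer(f_vec, f_vec))
minus_direct = np.linalg.inv(M - delta * np.outer(f_vec, f_vec))
plus_fast = updated_inverse(np.linalg.inv(M), f_vec, delta, sign=+1)
minus_fast = updated_inverse(np.linalg.inv(M), f_vec, delta, sign=-1)
```

In an exchange step the update would be applied twice, once with the minus sign for the deleted point and once with the plus sign for the included point, which avoids a fresh matrix inversion at every iteration.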
The modified version of the algorithm presented in Note 2 gives a hint for the construction of
Algorithm 2.
Step a. The same as (22), but with (24) instead of (23) (no constraints on the sizes of $D_s$ and $E_s$!).
Step b. Coincides with step b of Algorithm 1.
This algorithm seems to be rather promising for changing the structure of an initial design $\xi_0$ rapidly, but it allows some oscillation regimes, at least in principle.
The author failed to prove its convergence. Probably some combination of both considered algorithms (for instance, the majorization of (24) by some vanishing sequence $\delta_s$) could be useful.
4. Exchange algorithm in the standard design problem
The possibility of applying algorithms similar to (14)-(16) to design problem (3) (without constraint (4)) was somehow overlooked in the design theory. Atwood (1973) proposed a very similar algorithm, but based on (5) and therefore handling all supporting points in the design $\xi_s$.
The simplest analogue of (14)-(16) can be formulated as follows:
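Step a. There is a design $\xi_s$. Two points
$$ x_{1s} = \mathrm{Arg}\max_{x \in X_s} \varphi(x, \xi_s) \quad \text{and} \quad x_{2s} = \mathrm{Arg}\min_{x \in X} \varphi(x, \xi_s), \qquad (25) $$
where $X_s = \mathrm{supp}\,\xi_s$, have to be found.
Step b. The design
$$ \xi_{s+1} = \xi_s - \delta_s\,\xi(x_{1s}) + \delta_s\,\xi(x_{2s}) \qquad (26) $$
is constructed, where $\xi(x)$ is a design with one supporting point $x$.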
1
can b e chosen as in (14). The convergence of t h e algorithm can b e proven similarly t o Theorem 3.It is worthwhile noting that the convergence of procedures (25), ( 2 6 ) , in the discrete case (when 6,
=
K / N , a N - I , where N i s the total number of observa- tions) i s questionable. because proof of Theorem 3 i s essentially based on the fact that 6,4.REFERENCES
Atwood, C.L. (1973) Sequences Converging to D-Optimal Designs of Experiments, Ann. Statist., 1, 342-352.
Ermakov, S. ed. (1983) Mathematical Theory of the Design of Experiments (in Russian), Moscow, Nauka, p. 386.
Fedorov, V. (1986) Optimal Design of Experiments: Numerical Methods, WP-86-55, Laxenburg, Austria, International Institute for Applied Systems Analysis.
Gaivoronsky, A. (1985) Stochastic Optimization Techniques for Finding Optimal Submeasures, WP-85-28, Laxenburg, Austria, International Institute for Applied Systems Analysis.
Karlin, S. and W.J. Studden (1966) Tchebycheff Systems: With Applications in Analysis and Statistics. New York: Wiley & Sons, p. 586.
Krein, M.G. and A.A. Nudelman (1973) Markov Moment Problem and Extremal Problems, Moscow, Nauka, p. 552.
Munn, R.E. (1981) The Design of Air Quality Monitoring Networks, Macmillan Publishers Ltd, London, p. 109.
Wu, C.F. and H. Wynn (1978) The Convergence of General Step-Length Algorithms for Regular Optimum Design Criteria, Ann. Statist., 6, 1273-1285.
Wynn, H. (1982) Optimum Submeasures With Applications to Finite Population Sampling, in "Statistical Decision Theory and Related Topics III", 2, Academic Press, New York, pp. 485-495.