
Let us see, in very brief outline, how the problem of allocating a limited testing budget among both chemicals and possible tests can be cast as a fairly standard problem in Bayesian statistical decision theory.31 We also want to see "how bad" the large numbers problem really is: we want to know how rapidly the computational burden of the optimization exercise proposed below for sequencing tests grows with problem size.

31 See, for example, Blackwell and Girschick (1954), Ferguson (1967), or de Groot (1970). Remember that these are expositions of Bayesian statistical decision theory. The computational implementation of that theory in large-number problems raises additional, and somewhat novel, problems.

Start with our first task, casting the problem as a statistical decision problem. Figure 1 illustrates the way in which our four ways of gathering information might be deployed against a single chemical. Where we have to resort to all four information-generating opportunities, we might successively improve our estimates of the (exposure-adjusted logarithmic)32 potency $k_i$ of that $i$th chemical; the four, presumably successively improved, estimates are $k_i^{(1)}, k_i^{(2)}, k_i^{(3)}, k_i^{(4)}$ in Figure 1, each obtained at a cost.

The $k_i$'s in that diagram are of course heuristic, for we begin with imperfect knowledge of $k_i$, and hopefully improve our estimate as we spend more on information on that chemical. But at each step we have only a more or less narrow probability distribution defined on $k_i$. What follows below is the standard Bayesian calculus for sequential revision of an initial, or prior, probability distribution $f_0(k_i)$ on the (exposure-adjusted logarithmic) potency of a single chemical. We write down that calculus as if there were only one chemical to be tested and as if the four tests were to be made in the sequence indicated on Figure 1. We do so because half the art of applied Bayesian analysis lies in choosing a good probabilistic characterization of the kinds of information one has available;33 literature search and biological test results do not naturally come in the form of joint probability distributions, and the usefulness of such information depends crucially upon choice of an appropriate form. The

32 See note 23 above.

33 Recall the difference between mechanical invocation of Bayes' Theorem and real-world induction; see note 22 above.

reader is warned that, in the general multichemical sequential case, not all chemicals will be subjected to all tests.34 Of course, that would be ruled out in any event since the testing budget is constrained. But the distributions below are the essential building blocks of that general sequential case, and for that reason we have taken care in defining and specifying them.

Introduce notation as follows:

$f_0(k_i)$    Prior probability distribution on the (exposure-adjusted logarithmic) potency $k_i$
$l_1(\mathrm{DATALIT}[i], k_i)$    Joint distribution of DATALIT[i] and $k_i$
$f_1(k_i \mid \mathrm{DATALIT}[i])$    Post-literature-search distribution of $k_i$
$l_2(\mathrm{STRUCTURE}[i], k_i)$    Joint distribution of STRUCTURE[i] and $k_i$
$f_2(k_i \mid \mathrm{STRUCTURE}[i])$    Post-structure-activity-correlation distribution of $k_i$
$l_3(\mathrm{AMES}[i], k_i)$    Joint distribution of AMES[i] and $k_i$
$f_3(k_i \mid \mathrm{AMES}[i])$    Post-short-term-testing distribution of $k_i$
$l_4(\mathrm{BIOASSAY}[i], k_i)$    Joint distribution of BIOASSAY[i] and $k_i$
$f_4(k_i \mid \mathrm{BIOASSAY}[i])$    Post-bioassay distribution of $k_i$

34 Here we are sloughing over many subtleties and many potential problems. The complexity result will depend upon how the problem is cast; there is no best way. The worst-case results typical of complexity theory may not be particularly helpful as guides to the computation problem for real data in this area. In any event, this is work in progress and work to be done.

The successive distributions $f_0, f_1, f_2, f_3, f_4$ of the variable $k_i$ are related by the usual Bayes' Rule revision formulas:

In each successive equation, we have simplified notation by suppressing some of the previous-stage conditioning values: thus $f_1(k_i)$ in the second equation stands for $f_1(k_i \mid \mathrm{DATALIT}[i])$, and so on.
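The chain of revisions from $f_0$ to $f_4$ can be illustrated numerically. What follows is only a minimal sketch under stated assumptions, not the authors' specification: the potency $k_i$ is placed on a hypothetical grid, the prior is taken to be flat, and each of the four information sources is assumed to report $k_i$ plus Gaussian noise with a smaller standard deviation at each stage. The function names, the grid, and the readings are all invented for illustration.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def bayes_update(prior, likelihood):
    """One revision step on a grid: posterior(k) is proportional to likelihood(k) * prior(k)."""
    return normalize([p * l for p, l in zip(prior, likelihood)])

def gaussian_likelihood(grid, reading, sd):
    """ASSUMED test model: the test reports k plus Gaussian noise with standard deviation sd."""
    return [math.exp(-0.5 * ((reading - k) / sd) ** 2) for k in grid]

def variance(grid, dist):
    """Variance of a discrete distribution defined on the grid."""
    mean = sum(k * p for k, p in zip(grid, dist))
    return sum(p * (k - mean) ** 2 for k, p in zip(grid, dist))

# Hypothetical grid of log-potency values from -4 to 4 in steps of 0.5.
grid = [-4 + 0.5 * j for j in range(17)]
f0 = normalize([1.0] * len(grid))  # flat prior, chosen only for illustration

# Hypothetical (reading, noise sd) pairs standing in for the literature search,
# structure-activity correlation, short-term test, and bioassay, in that order.
stages = [(1.0, 3.0), (1.5, 2.0), (0.5, 1.0), (1.0, 0.3)]

posteriors = [f0]
for reading, sd in stages:
    # Each stage uses the previous posterior as its prior.
    posteriors.append(bayes_update(posteriors[-1], gaussian_likelihood(grid, reading, sd)))

for j, dist in enumerate(posteriors):
    print(f"f_{j}: variance = {variance(grid, dist):.3f}")
```

The point of the sketch is only the structure of the loop: the output of one stage is the prior of the next, which is exactly what the suppressed conditioning in the notation expresses. The choice of likelihood is where the real modelling work lies.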

Nothing is easier than writing down formalisms; much more difficult is the prior conceptual work guiding the choice of formalization. What, then, can we say about the appropriate forms of the functions $f_1, f_2, f_3, l_1, l_2, l_3$ which we have so blithely written down above? Elsewhere we have written on this question; here we content ourselves with a few words on the logic of those recommended initial choices, since the real work of implementation will require substantial refinement of those initial choices.

Each such choice reflects a commitment to a theory of, or at least a view of, the process by which the information to be exploited came into existence. It may be plausible to suppose that chemicals to which more individuals are exposed and which are more toxic have drawn more attention from toxicologists and epidemiologists:35 that supposition guides one form of the joint distribution $l_1$. It may be plausible to suppose that structure-activity correlation provides good relative, but poor absolute, information on the $k_i$ variables. Again, that supposition leads immediately to a particular functional form for the joint distribution $l_2$. Similarly, for short-term or bacterial testing, the relevant supposition is that such tests discriminate powerfully between noncarcinogens and carcinogens, but only poorly between carcinogens differing, even by a few orders of magnitude, in carcinogenic potency. For long-term tests, or bioassays, the relevant supposition is that such tests give good information on $k_i$, if at relatively high cost. These latter two suppositions, like the first two, lead naturally to formalizations of the corresponding joint distributions, here $l_3$ and $l_4$.

Now let us remember that our real problem involves a decision about which tests we will apply to which chemicals and in which order. Because of the "large numbers problem", this is naturally posed as a sequential decision problem, but it can be practical only if the computational burdens imposed by the large numbers problem are not overwhelming.36 So let us pose, more or less rigorously, the sequential decision problem we face, and then let us see how rapidly the computational burden grows with the "problem size." The obvious measure of problem size here is, of course, the number of chemicals $N_c$.

35 Again, we take note of the importance of surprises in toxicology; see note 25. The real question remains: how to characterize the existing literature as an information resource, and how to use it efficiently.

36 See, for example, Aho et al. (1974) or Garey and Johnson (1979).

This decision problem, like any other, must be driven by an objective function describing just what we are trying to accomplish with a toxic chemicals testing program. Here is one such objective function; others are possible and may even be better, but one will do for illustrative purposes.37 The testing program optimization problem is taken as

Here we have drawn on our assumption that the benefits associated with individual chemicals are independent and additive; $b$ is the benefit per chemical, net of (internal) production costs, but gross of possible externality costs arising from introduction of that chemical into commerce.

The subscript $n(i)$ is an ordered subset of the integers 0, 1, 2, 3, 4 and indicates those tests which have been run, and the order in which they were run, in the optimum program, on chemical $i$. If none have been run, it consists of the single value 0. The probability distribution $f_{n(i)}(k_i)$ is the result of Bayesian revision in the order in which tests are performed.

The multiplicative coefficient $g(i)$ is 0 or 1, as the chemical is banned from or allowed into commerce. Thus, this objective is nothing but the expected net benefits of chemicals remaining in commerce.
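One way to write an objective matching this verbal description is sketched here, under assumptions and not as the authors' own formula. The externality cost function $e(k_i)$, the per-test costs $c_j$, and the total testing budget $B$ are symbols assumed for illustration. Treating the budget as a constraint, rather than subtracting test costs inside the objective, is likewise an assumption.

```latex
\[
\max_{g,\,n}\ \sum_{i=1}^{N_c} g(i) \int \bigl[\, b - e(k_i) \,\bigr]\, f_{n(i)}(k_i)\, dk_i
\quad \text{subject to} \quad
\sum_{i=1}^{N_c} \ \sum_{j \in n(i),\ j \neq 0} c_j \ \le\ B .
\]
```

In this form each term of the outer sum is the expected net benefit of chemical $i$ under its final posterior, and $g(i) = 0$ simply removes the term for a banned chemical.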

Given this (or any other plausible) objective function, we can turn to the problem of constructing the implied optimum program. The theoretical problem was settled long ago by the work of Wald, Blackwell and Girschick,38 and others.

37 Again, this is work in progress on TSCA implementation at Resources for the Future.

38 See Wald (1947) and Blackwell and Girschick (1954).

Here is a very brief summary of what that line of work tells us. Suppose we are given a loss function for a decision problem. That loss is defined on $A \times S$, with $A$ the space of actions and $S$ the set of states of nature. We do not know which state of nature prevails, but we can, at cost $c_j$, make an observation on a random variable $r_j$ for which the joint distribution of $(r_j, s)$ is known. Then Wald and Blackwell and Girschick tell us how to choose a sequence of observations, how to decide when to stop, and which action from $A$ to take when we do stop.
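The flavor of such a rule can be sketched for a tiny discrete case. The code below is a one-step look-ahead (myopic) simplification, not the full Wald or Blackwell-Girschick optimal stopping rule, which would require backward induction over whole sequences of observations. It compares the smallest expected loss available now with the expected smallest loss after one more observation, and recommends observing only if the difference exceeds the cost $c_j$. The two-state, two-action example and every number in it are hypothetical.

```python
def bayes_risk(belief, loss):
    """Smallest expected loss over actions; loss[a][s] is the loss of action a in state s."""
    return min(
        sum(loss[a][s] * belief[s] for s in range(len(belief)))
        for a in range(len(loss))
    )

def value_of_observation(belief, loss, likelihood):
    """Expected reduction in Bayes risk from one observation.

    likelihood[s][r] is P(observe r | state s).
    """
    expected_after = 0.0
    for r in range(len(likelihood[0])):
        joint = [belief[s] * likelihood[s][r] for s in range(len(belief))]
        p_r = sum(joint)  # marginal probability of outcome r
        if p_r > 0:
            post = [j / p_r for j in joint]  # Bayes' rule for this outcome
            expected_after += p_r * bayes_risk(post, loss)
    return bayes_risk(belief, loss) - expected_after

def should_observe(belief, loss, likelihood, cost):
    """Myopic rule: observe only if the expected risk reduction exceeds the cost."""
    return value_of_observation(belief, loss, likelihood) > cost

# Hypothetical example. States: 0 = harmless, 1 = potent. Actions: 0 = ban, 1 = allow.
loss = [
    [1.0, 0.0],  # ban: forgo a benefit of 1 if harmless, no loss if potent
    [0.0, 5.0],  # allow: no loss if harmless, damage of 5 if potent
]
belief = [0.8, 0.2]
# Test outcomes: 0 = negative, 1 = positive.
likelihood = [
    [0.9, 0.1],  # harmless chemicals usually test negative
    [0.2, 0.8],  # potent chemicals usually test positive
]

print(value_of_observation(belief, loss, likelihood))
print(should_observe(belief, loss, likelihood, cost=0.1))
print(should_observe(belief, loss, likelihood, cost=0.6))
```

In this example the test is worth running at a cost of 0.1 but not at 0.6, since its expected reduction in loss falls between the two.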

Our practical problem is easily seen to be similar: the states of nature are the $[k_i]_{i \le N_c}$, our actions are $[g(i)]_{i \le N_c}$, and our four kinds of tests allow us observation, at some cost, on variables about whose joint distributions with the $k_i$'s we think we know something. The novel feature of our problem is the large numbers problem: how reasonable are the Wald-Blackwell-Girschick rules when the number of chemicals $N_c$ becomes large, say 1,000 or even 10,000? If the time to compute a good testing program is bounded by some fairly low-order polynomial in $|N_c|$, say $|N_c|^3$, things may be tolerable. If the dependence is exponential, say $\exp[|N_c|]$, the scheme described above is obviously of no practical importance. It is easy to show that the bound is polynomial; for the two-test case, it is exactly $|N_c|^3$.39
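To make the contrast concrete, the following short calculation compares cubic and exponential growth at the two problem sizes mentioned. It is plain arithmetic, with "steps" standing in for an unspecified unit of computation. The exponential count is reported as a base-10 logarithm because the number itself is too large for a floating-point value.

```python
import math

def cubic_steps(n):
    """Step count under a cubic bound, n**3."""
    return n ** 3

def exponential_log10_steps(n):
    """Base-10 logarithm of exp(n), computed as n / ln(10)."""
    return n / math.log(10)

for n in (1_000, 10_000):
    print(
        f"N_c = {n}: cubic bound = {cubic_steps(n):.0e} steps, "
        f"exponential bound = about 10^{exponential_log10_steps(n):.0f} steps"
    )
```

At $N_c = 1{,}000$ the cubic bound is $10^9$ steps, while $\exp(1000)$ is a number with more than 430 digits.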

We mention here that the application to the chemicals case of the Wald-Blackwell-Girschick apparatus is not exactly straightforward, in part because the tests we have described give information on many of the $k_i$'s simultaneously.

39 See note 36.

3. PERSPECTIVE 2: DECENTRALIZED INFORMATION-GATHERING AND