• Keine Ergebnisse gefunden

Polynucleotide Evolution and Branching Processes

N/A
N/A
Protected

Academic year: 2022

Aktie "Polynucleotide Evolution and Branching Processes"

Copied!
34
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

NOT

FOR

QUOTATION WITHOUT PERMISSION

OF

THE

AUTHOR

POLYNUCLEOTIDE EVOLUTION AND BRANCHING PROCESSES

Lloyd Demetrius*

Peter Schuster**

Karl Sigmunde**

August 1984 CP-84-36

*Max-Planck-Institut fiir Biophysikalische Chemie, Gcttingen, BRD

**Institut f i r Theoretische Chemie und Strahlenchernie der Universitiit Wien, A-1090 Wien, Austria

***Institut fiir Mathernatik der Universitat Wien, A-1090 Wien, Austria

C o l l a b o r a t i v e P a p e r s report work which h a s not been performed solely a t t h e International Institute for Applied Systems Analysis a n d which has received only limited review. Views o r opinions expressed herein do not necessarily represent those of the Institute, i t s National Member Organizations, or other organizations supportins t h e work.

ThTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS AVALYSIS 2361 Laxenburg, Austria

(2)
(3)

In this paper t h e a u t h o r s a r e concerned with t h e upper bound on t h e length of genomes imposed by t h e e r r o r r a t e (the frequency of inaccu- r a t e replication) of nucleotides. This limiting law was originally derived from a deterministic chemical model by Eigen. The a u t h o r s make a con- nection between these deterministic chemical equations a n d t h e theory of multitype branching processes in order t o study certain properties of the Eigen kinetic equations a n d t o generalize t h e e r r o r threshold cri- t e r i a This generalization is based on a criterion for t h e extinction of branching processes.

This research was carried out as p a r t of t h e Feasibility Study on t h e Dynamics of Macrosystems in t h e System and Decision Sciences Pro- gram.

ANDRZEJ WIERZBICKI Chairman

System and Decision Sciences

(4)
(5)

The theory of multitype branching processes is applied t o t h e kinet- ics of polynucleotide replication. The results obtained a r e compared with t h e solutions of the deterministic differential equations from conven- tional chemical kinetics.

(6)
(7)

POLYNUCNOTIDE EXOLUTION AND BRANCHING PROCESSES

Lloyd &me trius, Peter Schuster and Karl S g m u n d

Allometric relations, which s e t limits t o t h e growth of organisms based on certain physical laws, a r e very common in nature. For example, t h e height of trees is restricted by t h e strength of wood and t h e capacity for water transport of the trunk, t h e size of insects is restricted by t h e r a t e of oxygen transport through capillary diffusion, and t h e body weight of vertebrates is limited by t h e carrying capacity of t h e skeleton. We a r e concerned h e r e with a closely related limitation a t t h e molecular level: an upper bound on the length of genornes imposed by the e r r o r r a t e ( t h e frequency of inaccurate replication) of nucleo- tides. This limiting law was derived from a deterministic chemical kinetic model (Eigen, 1971; Eigen and Schuster, 1979) and is based on the relation between t h e r a t e of production of accurate replicas of molecules and t h e mean total productivity. This paper describes a connection between t h e determinis- tic chemical equations and t h e theory of multitype branching processes. We study this connection, in particular t h e matrix of mean values of the branching process, in order t o look a t certain properties of t h e Eigen kinetic equation and t o generalize t h e e r r o r threshold criteria*. This generalization is based on a criterion for t h e extinction of branching processes.

Eigen (1971) postulated a formal, phenomenological kinetic equation

t o describe t h e evolution of a population of replicating units under t h e ideal- ized experimental conditions of a dialysis reactor (see, e.g., Kiippers, 1979). We shall call these units types and represent t h e m by

11,

..., 1,. We use zi t o denote

*The expectation of a replication error must always remain below a sharply defined threshold (the srmr throshcld) if the information accurnulated in the evolutionary process is not to be lost.

(8)

the relative concentration of type

Ii:

All concentrations z j a r e positive and hence the physically accessible domain of variables is restricted to a unit simplex

The flow term p is given by

The elements wij a r e constants which will be discussed in detail later in the paper. We shall simply note here t h a t they are constructed from rate constants and mutation frequencies in accordance with the replication mechanism. The deterministic equation (1) has been subjected to rigorous mathematical analysis (Thompson and McBride, 1974; Jones e t al.. 1976; Swetina and Schuster, 1982; McCaskill, 1984a; Eigen e t al., 1984), and its solutions have been obtained in terms of the eigenvalues a n d eigenvectors of matrix W

=

twijj.

The deterministic equation (1) has a number of serious drawbacks when applied to realistic experimental systems. In the case where t h e replicating units a r e polynucleotides, t h e case we are basically interested in here, there are three sources of stochasticity which a r e of particular importance:

Finite popdcrfion size. The number of potential types, i.e., t h e number of possible, different polynucleotide sequences, is extremely large (4' for a length containing v bases). Thus, the number of potential types is far greater than the total number of molecules available in any experimental set-up or in nature. Only a tiny fraction of these sequences can possibly be present a t any time t . Thus, the population size truncates the existing mutant distribution and introduces a stochastic element into the dynam- ics of replicating ensembles of polynucleotides. In t h e case of high replica- tion fidelity, this truncation affects the many types of molecules which are present in only small numbers in the stationary mutant distribution. In cases of low replication fidelity, i.e., in systems which replicate with accu- racies below t h e error threshold, the deterministic description given by

(9)

equation (1) fails completely. Indeed, t h e deterministic solution predicts t h a t all types a r e present in equal amounts. This is impossible since we really cannot have less t h a n one molecule of a given type. What we should expect, therefore, is a steadily changing population of polynucleotide sequences, with some dying o u t while others appear through mutations.

No stationary distribution of m u t a n t s can ever exist in the r e a l world (Swetina and Schuster, 1982).

2. Kzmtic degeneracy. Conventional deterministic equations a r e unable to handle cases of kinetic degeneracy, i.e., situations in which two or more types have identical kinetic r a t e constants. In this case t h e relative con- centrations of such molecules a r e determined by random drift (Schuster a n d Sigmund, 1984a,b).

3. Complez dynamics. Sensitive dependence on t h e initial conditions can give rise to a third source of stochasticity in autocatalytic systems.

Although this kind of stochastic dynamics (often described as chaotic behavior) arises in some complicated networks of replication processes, i t does not occur with equation (I), and hence will not be discussed any f u r t h e r here.

The formal description of chemical reactions by stochastic processes has a long tradition (see, for example, t h e review by McQuarrie, 1967). More recently, new analytical techniques for t h e study of chemical master equations have become available and this fact has revived i n t e r e s t in stochastic approaches t o biochemical reaction systems. Equation (1) is essentially multi-dimensional, a n d this makes any analysis of t h e corresponding m a s t e r equation particularly difficult (Ebeling and Feistel, 1977). Some attempts t o study t h e m a s t e r equa- tion of a n ensemble of replicating polynucleotides under r a t h e r radical simpli- fying assumptions have been made by Jones and Leung (1981), Heinrich and Sonntag (1981), and Schuster and Sigmund (l984a). Inagaki (1982) reported a study of replication with random mutations using a Langevin-type equation. So far t h e only error threshold relation derived from an underlying stochastic model was obtained in a very r e c e n t study by McCaskill (1984b), which however makes several drastic approximations.

By contrast, t h e approach described h e r e adopts a less general but very powerful method: t h e theory of branching processes. This theory, which was originally developed to deal with the extinction of family names, has been applied to a great variety of physical and biological problems since t h e forties.

(10)

The mathematical background can be found, for example, in Harris (1961), Athreya and Ney (1978) a n d Jagers (1975); t h e main properties a r e summarized in Section 3. We shall apply this concept to polynucleotide replication and relate branching processes to the deterministic equation (1). In particular, we shall analyze the "freezing in" of fluctuations which makes t h e results of t h e deterministic model so reliable. Section 4 i s concerned with the probability of extinction, and t h e e r r o r threshold relation is derived in a stochastic context.

The original experimental set-up (Eigen, 1971) can t h e n be broadened consider- ably in t h e light of these results.

I t

h a s already been shown t h a t the results hold for most evolving systems (Eigen a n d Schuster, 1979). We a r e now in a position t o extend t h e theoretical predictions even further, to systems with a discontinuously changing environment. Sequential sampling or. more gen- erally, any sequence of alternating phases of growth and sampling is amenable t o a similar analysis, t h u s validating t h e threshold relation for conditions close to those under which molecular evolution o c c u r s in n a t u r e . The paper con- cludes with a discussion of "complexity", which we i n t e r p r e t in a different way to t h e concept of algorithmic complexity used, for example, by Ebeling and Jimenez-Montano (1980). The notion of complexity discussed here is a n entropy-based invariant which describes t h e frequency with which t h e r e a r e mutations back t o t h e "wild type" (see later). The complexity parameter, like t h e extinction p a r a m e t e r on which o u r threshold criterion is based, is a func- tion of the m e a n value m a t r i x of t h e branching process.

2. POLYNUCLEOTIDE REPLICATION AS A

MULTITYPE

BRANCHING PROCESS

If we t r y t o describe polynucleotide replication in t e r m s of elementary s t e p kinetics we obtain a n exceedingly complex reaction network (Biebricher e t al.,

1983). However, in m a n y cases we can dispense with most of t h e details a s long a s we retain certain important steps in a simplified reaction mechanism (see, for example, Gassner and Schuster, 1982). The basic features of selection a n d evolutionary o p t i n ~ i z a t i o n can be derived from a crude dynamical model which represents the whole polymerization process a s one single reaction step. In this simplified model it is only necessary t o distinguish between faithful repli- cation and mutation.

(11)

2.1 Discretetime branching processes

2.1.1 IPransitwn probabilities

Consider a population consisting of m types of polynucleotides, I 1 , . . . , I m . Each polymer of type Ii can g e n e r a t e polymers of t h e s a m e type (Ii -r 2 4 ) by faithful replication o r polymers of different types

(4

+ Ii

+ 4)

by false replica- tion, i.e., m u t a t i o n . Molecule replications a r e assumed t o be homogeneous in t i m e a n d m u t u a l l y independent. We shall first assume t h a t t h e molecules exist in discrete generations. In every generation, each po\ymer of type

4

produces

r

polymers of type 11,

r 2

polymers of type I2 a n d so on up t o rm polymers of type I,, with probability P,(T ,..., 7 , ) .

Let Z , ( n ) denote t h e total number of polymers of type I. in generation n , where t h e vector Z ( n )

=

t Z l ( n ) ,

...,

Z m ( n ) ] is a random variable.

In o r d e r t o illustrate the transition law for t h e stochastic process we shall t a k e Z ( 0 )

= q,

where

q =

( 0 .

...,

1 ,

...,

0 ) is t h e unit vector in t h e direction of type I,. This implies t h a t t h e population consists of a single polymer of type I , a t t i m e n

=

0 . In t h i s c a s e t h e probability generating function of Z ( 1 )

where

p b ) ( ~ ~ , . . . , Z ~ )

=

Prob ( Z l ( n )

= ~ ~ . . . . * z ~

( n )

=

Zm * is of t h e simple form

We m a y make t h e following generalization: if Z ( n )

=

( Z l. . . . , Z m ) r e p r e s e n t s t h e distribution of polymers in generation n , t h e n Z ( n - + 1 ) is t h e s u m of Z l +

. - . +

Zm independent random vectors of which a number Z 1 have gen- erating function f . l , Z 2 have generating function f 2, and so on. We may t h u s dispense with t h e explicit formula which is r a t h e r lengthy and not very infor- mative.

(12)

2.1.2 ?'he mean v a l u e rnatriz

For reasons which a r e physically obvious*, we assume t h a t first moments

exist for all i and j. Thus, mi, is the mean number of polymers of type

4

derived from a polymer of type I, within one generation. In t e r m s of generating functions we have

We a r e clearly dealing with non-negative first moments

T,

2 0. Unless other- wise stated, we shall assume that t h e matrix

M =

tmijj is positively regular.

i.e., there exists a n n

>

0 such that

P

has strictly positive elements. This implies t h a t

Y

is irreducible: each type Ii can be derived from every other type

4

by a series of mutations. (Mutation models which consider only point muta- tions. such as that analyzed by Swetina and Schuster (1982), a r e generally based on non-zero probabilities'of mutation over a sufficiently large number of generations. In more sophisticated models which include deletions and inser- tions i t might be advantageous to have disjoint s e t s of types.)

According to t h e Perron-Frobenius theorem (see, e.g., Karlin, 1974), the matrix

M

has a unique eigenvalue X

>

0 which is dominant i n the sense t h a t Ipl

<

X for every other eigenvalue p of

M.

The eigenvalue A is non-degenerate (or simple): there exist right and left eigenvectors, denoted by

u

and

v,

respec- tively, where

u, >

0 and vi

>

0 for all i

=

1.

...,

rn, such t h a t

Mu=Au and v M = A v

.

( 7 )

Both eigenvectors a r e normalized in a special but very useful manner:

The matrix T

=

itij

=

viujj is idempotent.

T~ =

T , and i n addition we have TM = M T = h T and l i m A * P

=

T

.

n +-

*In

r e d systems we always d e d with finite populations in finite time and in this case the expecta- tions do not diverge.

(13)

No o t h e r eigenvalue p of

M

is associated with a n eigenvector whose components a r e all strictly positive.

2 . 1 . 3 Probabilities of e z t i n c t i o n

A population is said t o become extinct if ~ ( n )

=

0 for some n

>

0 . Let qi denote t h e probability of t h i s event given t h e initial condition Z ( 0 )

=

e i :

pi =

Prob

i3n

s u c h t h a t ~ ( n )

=

0

1

Z ( 0 )

=

ei j

.

( 1 0 )

The vector q

=

( q l , . . . , q m ) is given by t h e smallest non-negative solution of t h e equation

where f ( s )

= t

f I(s),..., f m ( s ) j and t h e f i ( s ) a r e given by ( 4 ) .

Conditions for extinction c a n be formulated in t e r m s of t h e dominant eigenvalue h of

Y:

(i) if h 4 1 t h e n qi

=

1 for all i a n d extinction is certain,

(ii) if A

>

1 t h e n qi

<

1 for all i a n d t h e r e is a positive probability of survival t o infinite time.

2.1.4 A s y m p t o t i c f r e q u e n c i e s

The frequency of type

Ii

in generation n is a random variable defined by

provided t h a t t h e denominator is non-vanishing, i.e., t h a t t h e system does n o t become extinct.

If A

>

1, t h e r e exists a random vector

W =

( W 1 ,

..., W,)

a n d a scalar random variable zu s u c h t h a t with probability 1

(14)

where

u

is t h e right eigenvector of M from (7). I t follows t h a t lim & ( n )

=

ua

n

--

u1

+

. . . +u.,,&

holds almost everywhere provided t h a t t h e population does not become extinct.

Equation (14) a s s e r t s t h a t t h e random variable & ( n ) representing t h e fre- quency of type

4

converges almost surely to a c o n s t a n t (provided t h a t w # 0).

This asymptotic behavior of t h e random vector X ( n ) is in s h a r p c o n t r a s t t o t h a t of t h e population distribution Z ( n ) and t h e total population size

~ ( n )

= xi

& ( n ) . Because of t h e autocatalytic n a t u r e of t h e replication pro- cess, Z ( n ) m a y experience large fluctuations in t h e initial phases which persist a n d even a c c u m u l a t e in subsequent generations (see, e.g., Schuster, 1983). In t h e l a t e r stages of t h e stochastic process t h e system e i t h e r becomes extinct o r grows very large (with probability 1). In t h e l a t t e r case t h e law of large n u m b e r s implies t h a t fluctuations i n relative concentrations will be small.

The behavior of t h e random variable w c a n be described completely by results obtained by Kesten a n d Stigum (1966). We have e i t h e r

(i) w

=

0 with probability 1 (15)

(which is always t h e c a s e if

X c:

1). o r

(ii) E ~ W I Z ( O ) = ~ ~ = V ~ , (16)

where ui is t h e i - t h component of t h e left eigenvector

v

of M (see equation 7).

A necessary and sufficient condition for (16) t o hold is

E ( q ( 1 ) log ~ , ( 1 ) 1 Z(0)

=

e i j

<

= for 1 s i. j s m

.

(17)

This condition of finite population size clearly holds for all real populations. If t h e population initially consists of a single polymer of type

4 ,

t h e distribution of w displays a jump of magnitude q i a t zero and h a s continuous density on t h e s e t of positive numbers.

(15)

2.2 Continuous- time branching processes 2.2.1 R a n s i t i o n probabilities

The assumption of discrete generations generally applies t o populations with external (and sometimes also internal) clocks which prevent t h e mixing of generations. Such conditions a r e often found in n a t u r e , e.g., in populations whose breeding periods a r e fixed by seasonal requirements. In chemical sys- t e m s t h e r e a r e usually n o s u c h regulators. Indeed, if we s t a r t polynucleotide replication in an initially synchronized population t h e synchronization i s lost within a few rounds of replication. Continuous-time multitype branching Mar- kov processes offer a n a c c u r a t e description of polynucleotide replication, b u t one which is technically quite complicated. The basic results a r e similar t o those obtained in t h e discrete case, however, a n d a r e summarized briefly below.

In t h e continuous-time model we suppose t h a t , independently of t h e o t h e r polymers, a polynucleotide of type Ii persists for some random period of t i m e (which h a s a n exponential distribution a n d mean

&T')

a n d t h e n g e n e r a t e s copies by replication a n d mutation according t o a distribution whose g e n e r a t - ing function is f i ( s ) . This i s t h e case if it i s assumed t h a t in a t i m e interval of length At, u p t o probability o(At), t h e polynucleotide m u s t experience one of t h e following:

(i) no change (ii) i t "dies o f f , o r

(iii) i t survives and produces a copy of type

5

( j

=

1,

...,

m).

The time-homogeneous probabilities of events (ii) a n d (iii) a r e proportional t o At. up t o some o (At ). As before, we l e t & ( t ) denote t h e total n u m b e r of polynu- cleotides of type

4

a t t i m e

t ,

and the random vector ~ ( t )

=

( Z l ( t )

,....

Z,(t)) denote t h e distribution of types.

2.2.2 7he m e a n v a l u e m a t r i z

For physical reasons we a s s u m e once again t h a t all t h e first m o m e n t s

a r e finite, for all t 20. The mean value matrix ~ ( t ) satisfies the semigroup property

(16)

and t h e continuity property

where Id i s t h e identity matrix. Conditions (18) a n d (19) imply t h a t t h e r e exists a m a t r i x A s u c h t h a t

for all t 2 0. A is called t h e infinitesimal generator. The e l e m e n t s of A are given by % -

=

k c i j , where cij

=

bij

-

rJU (bij i s t h e Kronecker delta) a n d

3

k a i n we a s s u m e t h a t e a c h type can give rise t o all of t h e others.

I t

follows t h a t mij(t )

>

0 for t

>

0, a n d h e n c e t h a t A i s essentially positive, i.e.,

q, >

0 for all i # j. The Perron-Frobenius t h e o r y t h e n implies t h a t t h e r e exists a unique real eigenvalue A of A which is dominant in t h e sense t h a t i t is larger t h a n t h e real p a r t s of all o t h e r eigenvalues. The eigenvalue A i s simple (non-degenerate) a n d h a s positive r i g h t a n d left eigenvectors

u

a n d

v,

which we again normalize such t h a t

x

vi

= x

uivi

=

1. The dominant eigenvalue of M ( t ) is e M , with

u

and

v

a s associated eigenvectors. Taking tij

=

v i u j we again have lim e-M jM(t)j

= T .

t -.-

2.2.3 Asymptotic behavior

As in t h e d i s c r e t e case, t h e extinction conditions a r e given in t e r m s of A.

If, a s before, q

=

(q l,...,q,) denotes t h e extinction probabilities, t h e n q is t h e unique solution of

where

a n d

~ ~ ( 8 )

=

& ( f i ( s ) - s i )

.

(17)

We have

(i) if A

s

0 t h e n qi

=

1 for all i ; (ii) if A

>

0 t h e n qi

<

1 for all i.

F u r t h e r m o r e , we once again obtain

provided t h a t t h e process does n o t lead to extinction.

3.

EXPECTATIONS AND EIGEN'S SELECTION EQUATION

3.1 Eigen's selection equation

In Eigen (1971) a n d Eigen and S c h u s t e r (1979) t h e evolution of polynucleo- tides in a dialysis reactor was modeled by a differential equation of t h e f o r m

or, in vector notation (with W

=

(wij),

x =

( z l...Z,) a n d 1

=

( 1 ....,1 )).

on t h e u n i t simplex

Here t h e z, a r e t h e concentrations of polynucleotides of type li ( i

=

1,

...,

m ) . The coefficients wii satisfy wjj

=

A,. Qjj

-

Dj a n d wii

=

+Qij (i # j), where A,

>

0 is t h e total r a t e constant for polynucleotide synthesis on t e m p l a t e

4 ,

Dj

is t h e decay r a t e a n d QQ, t h e "quality factor", gives t h e probability t h a t a copy of a rnolecule of type

3

will be of type

I,

( f o r j # i this i s a m u t a t i o n rate). We shall first assume t h a t W is positively regular. The t e r m

is i n t e r p r e t e d as a n externally controlled "dilution flow" which keeps t h e total concentration

Cz,

constant (without loss of generality, equal to 1). The

(18)

p a r a m e t e r (p may be viewed a s t h e "average productivity" of t h e molecular population.

I t

i s easy t o c h e c k t h a t

, S

i s invariant under (26): if x ( 0 ) E

, S

t h e n x ( t ) E

Sm

for all t 2 0.

Equation (26), t h e n , was introduced a s a phenomenological equation describing t h e kinetics of self-reproducing molecules in a dialysis r e a c t o r u n d e r t h e c o n s t r a i n t of constant total population. The aim of t h i s section is t o r e l a t e t h i s equation t o multitype branching processes.

3.2 Preliminary remarks

We begin with a few simple remarks.

1. Let

be a linear differential equation with (essentially) positively regular W. If y (0) E

RF,

t h e n

is well defined a n d in

Sm

for all t r 0. x ( t ) is also a solution of (26).

2.

I t

is also possible t o obtain (26) from (30) by setting

a n d

(see Jones e t al., 1976; Thompson a n d McBride, 1974).

3. The nonlinear equation (26) is therefore easy t o solve. Any equilibrium of (26) m u s t satisfy

a n d therefore be a r i g h t eigenvector of W. There is only one such eigenvector in

Sm,

which is denoted by

u:

t h e corresponding eigenvalue 9 is t h e d o m i n a n t eigenvalue of W. F r o m t h e correspondence between (30) and (26) i t follows t h a t all orbits of (26) in t h e s t a t e space

Sm

converge to

u

(19)

4. We shall use a canonical method to link the difference equation

with the differential equation

Such an extension to continuous time cannot always be justified, of course. But if the generation length is 1 (say) then (35), or I

- v =

F(v)

- v,

implies that

If the generations are not distinct, but blend into each other, then t h e increase v ( l / n )

-

v(0) in time 1/ n is approximately ( l / n ) ( ~ ( v ( ~ ) )

-

v(o)), o r

which, in the limit, implies (36).

3.3

Multitype branching and the selection equation

The relationship between branching processes and the selection equation is summarized in Figure 1. If we start with a discrete multitype branching pro- cess Z ( n ) , then t h e values of the expectation Y(n) satisfy Y ( n )

=

MnY(0), where

M

is the mean value matrix described in Section 2.1.2. Thus Y ( n ) may be obtained by iteration from the difference equation* y'

= My.

This equation can be transformed into the selection equation in two ways:

(i) by first passing to continuous time, i.e., to the differential equation y

= V y

(with V

= M -

I d ) , and then normalizing, a s in ( 2 6 ) , which leads to

; = vx - x ( 1 - vx)

(37)

(ii) by first normalizing the difference equation, thereby obtaining

on

5;,

, and then passing to continuous time, which yields

%s difference equation is similar t o the discrete-time model given by Demetrius (1983e).

(20)
(21)

r =

(Mx

-

x ( l

.

Mx))

-

1 1 Mx

.

Multiplying t h e right-hand side of (39) by t h e factor 1

- Mx

(which does not depend on i and is always strictly positive on S,) corresponds to a change in velocity. The orbits of (39) a r e t h e s a m e a s those of

Since V

= M

- I d , equations (39) and (40) a r e identical on S,. Both are of the s a m e form a s Eigen's selection equation.

The previous discussion took a discrete process as t h e starting point. If, however, we begin with a continuous Markovian multitype branching process Z ( t ) , (t 2 0), we can either reduce i t (by discretization) to t h e discrete branch- ing process Z ( n ) , o r else obtain Y(t)

=

M(t)Y(0) for t h e expectation values Y(t)

=

E f Z ( t ) j (where M(t) is again t h e m e a n value matrix: M(1)

=

M). Y(t) is t h e n the solution of t h e linear differential equation

where

A

=

lim M(t)

-

Id

t -0 t

is t h e infinitesimal generator of t h e semigroup

~ (

). and ~

t

()

=

t eAt

.

Normali-

zation yields

on S,. This equation generally has different dynamics t o (40), but the asymp- totic behavior is t h e same. Indeed, A a n d m

=

eA have t h e same eigenvectors.

Thus

u

is t h e global a t t r a c t o r for both (40) a n d (42).

3.4 The reliability of the deterministic equation

We have seen t h a t t h e r e a r e t h r e e simple ways of getting from branching processes to a n essentially unique version of Eigen's selection equation. There remains t h e question of whether s u c h a reduction from a stochastic to a deter- ministic s y s t e m is of any practical use. The first impression rnay be t h a t i t brings n o obvious advantages. Indeed, going from the random variables t o their

(22)

expectations c a n be quite misleading, because t h e variances grow s o rapidly.

This can be verified most easily for the single-type branching process. If m a n d u2 are t h e m e a n and variance, respectively, of t h e n u m b e r of descendants of a single individual i n t h e first generation, t h e n t h e corresponding m e a n and vari- a n c e in t h e n - t h generation grow in t h e supercritical case ( m

>

1) according t o

m n a n d

u

2 m n ( m n -1) m ( m -1)

so t h a t t h e r a t i o of t h e dispersion (i.e., t h e root of t h e variance) t o t h e mean converges t o a positive constant. Thus t h e "window" of probable values of t h e random variable is r a t h e r large. (For a critical process, t h e situation is even worse: t h e m e a n r e m a i n s c o n s t a n t and t h e variance increases t o infinity.) The situation is similar i n t h e multitype case. Here t h e variance a n d correlation formulae a r e r a t h e r complicated (see Harris, 1961, for the discrete a n d Athreya a n d Ney, 1972, for t h e continuous case) but t h e r e s u l t is, once again, t h a t t h e second m o m e n t s grow so fast t h a t the averages tell us virtually nothing. Nor- malization changes this, however. The transition f r o m expectations to relative frequencies cancels t h e fluctuations. More precisely, if t h e process does not go t o extinction, t h e n t h e relative frequencies of t h e r a n d o m variables

converge almost surely t o the values ui (i

=

l...m). These a r e also t h e limits of t h e relative frequencies of t h e expectations

In this sense, t h e deterministic selection equation yields a description of t h e stochastic evolution process which is more reliable t h a n t h a t given by t h e dynamics of t h e non-normalized means. The qualitative aspects of t h e selec- tion equation r e p r e s e n t t h e "variance free" p a r t of t h e deterministic approach.

3.5 "Freezing in of fluctuations

We should s t r e s s h e r e t h a t t h e initial fluctuations in a supercritical branching process a r e "Frozen in". In order to clarify what we m e a n by this, l e t us compare two s y m m e t r i c random walks on a finite s e t

to,

1,

..., N{

of integers,

(23)

where one of t h e random walks has absorbing boundaries and t h e o t h e r reflecting boundaries. In both c a s e s t h e m e a n r e m a i n s c o n s t a n t , a n d t h e vari- a n c e converges t o s o m e positive value. In t h e absorbing case, however. t h e ini- tial fluctuations play a decisive role. Sooner or l a t e r , t h e walk r e a c h e s a boun- dary and from t h e n on r e m a i n s "frozen in". In t h e reflecting case, on t h e other hand, t h e initial fluctuation will be "forgotten" a f t e r a sufficiently long t i m e has e l a p s e d With t h i s example in mind, we say t h a t t h e initial fluctuations in a sto- chastic process

&

a r e frozen in if for every E

>

0 a n d for all n r N we have

Prob

Ivar E i h l h 1 '

provided t h a t N is sufficiently large. In t h i s s e n s e t h e "deterministic" model (i.e., t h e sequence

E l & { )

is fairly reliable if we wait sufficiently long before starting observations, because by this time t h e fluctuations will have subsided

I t is easy t o c h e c k t h a t t h e above s t a t e m e n t also holds for supercritical branching processes. It is only necessary t o note t h a t for given (large) k

>

0.

with probability 1

-

E , e i t h e r

XN =

0 or

XN >

k if N is sufficiently large. If

XN =

0, t h e n Var

l&

j

=

0 for all n r N; if

XN >

k , t h e n from (43) we have

which is smaller t h a n & provided t h a t k is s ~ ~ c i e n t l y large.

4.

THE

ERROR

THRESHOLD

The p a r a m e t e r A , t h e dominant eigenvalue of t h e m e a n value m a t r i x

M ,

plays a crucial role in t h e branching process i n t h a t extinction is c e r t a i n iff A

<

1. This relation provides both a n i n t e r p r e t a t i o n and a generalization of Eigen's e r r o r threshold relation, as we shall presently see.

4.1 Single- type branching

Let us &st consider a single type of macromolecule. In e a c h generation, a polynucleotide yields a copies before it "dies" by hydrolysis. Here a is an integer-valued random variable with probability distribution

(24)

a n d expectation

We shall a s s u m e t h a t t h e polymer is a chain consistin'g of v nucleotides and t h a t t h e r e is a fixed probability p of a single nucleotide being copied correctly*.

The assumption of some constant, single-component accuracy of replication p which i s independent of t h e molecule type and its position in t h e sequence is, of course, an oversimplification. However, t h i s assumption may be justified on physical grounds

-

for details see Eigen and Schuster ( 1 9 7 9 ) and Schuster ( 1 9 8 1 ) .

In t h i s case t h e probability t h a t a given copy is exact is p V , and t h e proba- bility t h a t

X,

t h e number of c o r r e c t copies i n one generation, is equal to some integer k , is

where m is the total n u m b e r of copies and m

>

k . The m e a n n u m b e r of correct copies is therefore

5 kPlX =

k ]

=

m q m ( ~ ) k p V k ( 1 - P v ) m 4

=

k =O k - 0 mrk

From t h e relation A

<

1 extinction is c e r t a i n iff

a n d h e n c e t h e r e is a strictly positive probability of indefinite survival iff log 0 log

I

u s

-log p 1

- p

'

*In order to avoid confusion with the notation used e!sewhere in this paper, we have chosen the letter "pp" to represent single-digit accuracy of replication. In our previous publications we have generally used q for this quantity (Eigen and Schuster, 1078).

(25)

where the approximation holds when 1 - p ( t h e probability of an inaccurate copy) is small. For fixed 5

>

1, t h i s m e a n s t h a t t h e maximum length of a polynucleotide is inversely proportional t o t h e probability of a replication e r r o r in one of its components. If this length i s exceeded, long-run survival is impos- sible.

The probability or extinction q is t h e smallest positive solution of

where @ ( s ) is t h e probability generating function for t h e random n u m b e r

X

of c o r r e c t copies, i.e.,

Under t h e previous assumptions,

r ( s )

= 5 z

s , ( ~ ) p u k ( 1 - p ~ ) m - k s k k = O m r k

4.2 Single- type branching

-

a variant case

In order t o link t h i s theory with c u r r e n t experimental work on polynucleo- tide replication, i t is useful t o introduce a slight modification. It should be recalled t h a t t h e lifetime of a polynucleotide is not a well-defined constant but r a t h e r a random variable which, t o a first approximation, h a s a n exponential distribution. On t h e o t h e r hand, t h e replication t i m e is fairly well defined, a t least u n d e r appropriate boundary conditions. It is therefore convenient t o view this time, r a t h e r t h a n t h e a c t u a l lifetime, as t h e length of a generation.

Let us assume, t h e n , t h a t in unit time, t h e molecule e i t h e r survives (with probability w ) a n d produces a copy (which is a c c u r a t e with probabilitypY), or is hydrolysed (with probability 1 -w). The survival probability w is constant if

(26)

t h e r e is no "aging" u n d e r t h e experimental conditions. A given molecule t h u s yields 0, 1 or 2 molecules of t h e s a m e type after one unit of t i m e with probabili- ties 1 - w , w ( 1 -pY) a n d wpV, respectively. The m e a n is w ( 1 + p V ) . We therefore have a non-zero probability of survival t o infinite t i m e iff equation (47) is satisfied, where t h e c o n s t a n t

5

now denotes w / 1 - w . The probability of extinc- tion is easily c o m p u t e d from (48):

q

=

m i n

[

1,

A]

UP

So far we have considered only one type of molecule. However, t h e same results also hold in a multitype situation if t h e possibility of "back mutations"

is excluded, i.e., if m u t a t i o n s from

I' (j

# 1) t o

Il

can be neglected.

In t h e g e n e r a l case, i.e., allowing all types of m u t a t i o n t o occur, i t can be difficult t o e s t i m a t e t h e dominant eigenvalue A. We refer t h e r e a d e r t o Thomp- son a n d McBride (1974) a n d Eigen and Schuster (1979) for s o m e useful inequali- ties.

The 2' x 2" m a t r i x

M

introduced by Swetina a n d Schuster (1982) provides a n interesting example. In a somewhat simplified version of t h e replication problem only two classes of components (or digits), say 0 a n d 1, a r e considered.

The polymer i s t h u s a sequence of

v

such digits and, in general, we a r e dealing with 2v different sequences. We shall assume t h a t

$

h a s replication r a t e

A,,

a n d t h a t t h e m u t a t i o n r a t e from

I'

t o Ii depends only on t h e Hamming distance k between t h e two s e q u e n c e s

-

this is t h e m i n i m u m n u m b e r of single-digit mutations needed t o t r a n s f o r m

I j

into

4 .

The elements of t h e m a t r i x

M

can t h e n be expressed by

If A, =

A

(j =

2.3...2v). if A l >>A a n d if ( 1 - p ) 2 c a n be neglected, t h e n second-order perturbation t h e o r y (see Eigen a n d Schuster, 1979; Thompson and McBride, 1974) yields

(27)

Again, t h e condition for positive survival probability r e d u c e s to (47).

4.4 Complementary replication

The basic m e c h a n i s m of polynucleotide replication does n o t lead directly t o copies of t h e templates. From t h e pairing r u l e s (G ++ C a n d A t*

U

or A o T) for t h e individual nucleotides, it is clear t h a t complementary copies a c t a s intermediates. This m e c h a n i s m is common in t h e replication of viral RNA a n d has been studied in g r e a t detail using kinetic m e t h o d s (Biebricher e t al., 1983). The two complementary polynucleotide s e q u e n c e s a r e usually called plus strands a n d minus strands

(I+

a n d I-, respectively). We c a n now apply o u r previous theory with some slight modifications. Let 1 ; . ...,

12

a n d I F . . ..,I; be t h e different types of plus a n d m i n u s strands in t h e reactor. The m a t r i x of m e a n values

M

i s t h e n a 2m x 2m m a t r i x of t h e form

I t

i s easy t o c h e c k t h a t t h e non-vanishing eigenvalues of

M

a r e just t h e s q u a r e r o o t s of t h e non-vanishing eigenvalues of

UV

(or W). Thus, t h e d o m i n a n t eigenvalue of

M

is t h e s q u a r e root of t h e dominant eigenvalue of UV. In partic- ular, if replication i s error-free, i.e., if

a r e diagonal m a t r i c e s , t h e dominant eigenvalue of

M

is m a x

[ d m :

i

=

I.

....

m j ,

From equation ( I ) , t h e deterministic r a t e equations for t h e c o n c e n t r a t i o n s zi+ a n d zi' of

I:

a n d Ii- a r e given by

(28)

where (p

= z

(&+zi-

+

A,-z:)

.

This s e t of equations was first analyzed by Eigen (1971). I t c a n easily be checked t h a t

and hence t h a t

In t h e limiting case z i

=

z;

- (&+/

a n d setting Zi

=

z:

+

z i we have

= 5 i , / ~ - z a ( z z j , / ~ ) ,

za

which is just t h e Eigen selection equation for direct, error-free copying, and leads t o t h e extinction of all pairs

4+, 4-

for which

d m

is not maximal.

4.5 The deterministic error threshold

Equation (47) is very similar to t h e e r r o r threshold relation

derived for t h e deterministic model by Eigen (1971), Eigen and Schuster (1979) and Swetina and Schuster (1982). In t h i s case, t h e molecular species Il is assumed t o be t h e m a s t e r sequence (which m e a n s by definition t h a t m l l

>

mii for all i # 1) and t h e parameter o l is its superiority, which is defined by

Here is t h e mean excess productivity (number produced minus number hydrolysed) of molecules other t h a n t h e m a s t e r sequence, i.e.,

(29)

It is instructive to compare the derivation of (47) with t h a t of (54). Ine- quality (54) is a consequence of

where Qll, t h e r a t e of a c c u r a t e replication of molecules of type 11, is again pV.

Equation (57) is derived from

a n d w l l

=

A l e l l

- D l .

This last inequality s t a t e s t h a t t h e value function, i-e., the rate of produc- tion of a c c u m t e copies of t h e master sequence 11, is higher t h a n t h e average production r a t e of aLL copies (accurate and inaccurate) of all other molecules.

This need not always be true. Let us consider an almost trivial but, nevertheless. illustrative example. Setting A l

=

3, A2

=

A3

=

4, Q l l

=

1, Qzz

=

Qa3

=

Q23

=

Q3Z

=

1/ 2, and all other Qij values and t h e degradation rate constants

Di

equal t o 0, we obtain

This leads to w l l

=

3 and E,l

- =

4. If the zero t e r m s a r e replaced by (more real- istic) small, non-vanishing terms, (58) is still violated. However, relation (58) clearly holds in physically meaningful situations when the mutation t e r m s are small. In particular, in t h e limiting case where all t h e mutation r a t e s vanish (and all the Qs a r e 1). relation (58) is an obvious consequence of w l l

>

wii for

i # 1, i.e.. of the assumption t h a t Il is the m a s t e r species. I t should, however, be noted t h a t

E-l

(and ul) a r e generally functions of z (see Swetina and Schus- t e r , 1982).

I t

i s natural to evaluate w l l and

E-l

a t t h e equilibrium state

u

However, in t h e case with no mutations we have

y =

0 for i

=

2,

...,

m , so t h a t E,l is not properly defined. In this case we consider lim which exists a n d is equal to

t

-+-

the second largest diagonal term. Thus relation (58) also holds under these conditions.

The deterministic e r r o r threshold is based on t h e assumption that, under selection, t h e r a t e of production of a c c u r a t e copies of a molecular species

(30)

becomes equal to t h e mean total productivity of all other species. The m a s t e r species will always replicate with a fidelity above t h e e r r o r threshold provided t h a t t h e mutation t e r m s of a l l o t h e r species a r e sufficiently small. The stochas- tic e r r o r threshold i s based on t h e probability of extinction. Thus, t h e require- m e n t t o operate above t h e stochastic threshold i s always a stronger condition t h a n t h e corresponding requirement in t h e deterministic case.

5. COMPLEXTY

In this section we shall introduce another p a r a m e t e r called t h e cornplez- ity, which, like t h e dominant eigenvalue, is a function of the matrix of m e a n values

M.

5.1

The

parameter

H

We assume once again t h a t the process is positively regular and write

The m a t r i x

P =

(pij) is Markovian. Let IF

=

( n i ) denote t h e stationary distribu- tion of the Markov chain. The cornplezity of t h e branching process is t h e n defined by

There is a simple relation between

H

a n d t h e dominant eigenvalue A, which is given by

where

+ = - C

nipij log

i j

The entropy-like parameter H defined by (60) represents the frequency of m u t a t i o n s back to t h e "wild type". The positive regularity of the process e n s u r e s t h a t

H

is strictly positive.

(31)

5.2

An

illustrative example

As in t h e case of t h e dominant eigenvalue A, i t is generally difficult t o com- pute

H

exactly if we allow all types of mutations to occur. However, an explicit expression can be obtained from t h e Swetina-Schuster m a t r i x (47) with

A l =

A

>

1 and

4 =

1 for j

=

2,...,2v.

H

m e a s u r e s t h e degree t o which correct and erroneous digits a r e incorporated in t h e polynucleotide, a n d assumes its maximum value when p

=

1/ 2. In this c a s e

The corresponding stochastic m a t r i x P h a s all rows equal: t h e y a r e given by the vector

Using (63), (64) and (60), we have

We note t h a t A decreases with sequence length v, while

H

increases with v.

5.3 Genealogies

We define a genealogy a s a sequence (Ik),, k, E 11

....,

r n j s u c h t h a t I is a

kn

direct copy (accurate or i n a c c u r a t e ) of for n

=

1,2,

... .

Demetrius (1983b) h a s used the Shannon-McMillan t h e o r e m on entropy (see, e.g., Billingsley, 1965) to show t h a t t h e s e t of all genealogies generated by a given individual after a sufficiently long time n falls i n t o two classes: a class

S1

in which each genealogy occurs with a high probability, and a class

S2

in which each geneal- ogy occurs with an arbitrarily low frequency. The elements of

S1

a r e called typ-

ical genealogies.

(32)

Let N o ( n ) denote the number of genealogies generated u p t o time n , and

~ ~ ( n ) t h e n u m b e r of typical genealogies. I t is known (cf. Demetrius, 1983a,b) t h a t

Tuljapurkar (1982) used the notion of Kullback distance t o show, in the context of t h e Leslie model of age distributions, t h a t

H

yields a m e a s u r e of the rate a t which a population converges to its stable distribution. Thus t h e complexity

H

yields biologically useful information which is not contained in A.

6.

CONCLUSIONS

The t h e o r y of multitype branching processes has been shown to provide a n appropriate basis for t h e description of replication in biophysics. The main results derived from t h e deterministic differential equations of conventional chemical kinetics a r e valid, on t h e average, for t h e corresponding stochastic processes. This is basically a consequence of t h e i m p o r t a n t principle by which the initial fluctuations a r e "frozen in". After a transition period the supercriti- cal multitype branching process either leads to extinction o r t h e total popula- tion size becomes very large. In t h e former case t h e r e a r e n o fluctuations, while in the l a t t e r t h e law of large numbers becomes applicable.

Both stochastic and deterministic t r e a t m e n t s of replication with e r r o r s yield e r r o r threshold relations which s t a t e t h a t the m a x i m u m lengths of faith- fully replicated sequences a r e roughly inversely proportional t o the single com- ponent e r r o r r a t e . The stochastic threshold t u r n s out t o be a stronger condi- tion t h a n t h e deterministic relation.

ACKNOWLEDGEMEXI'S

We acknowledge fruitful discussions with Prof. Manfred Eigen and Dr. John McCaskill. Financial support of P. Schuster by t h e Austrian Fonds zur Fijrderung d e r wissenschaftlichen Forschung Project No. 4506 a n d 5286 is also gratefully acknowledged.

(33)

Athreya, K.B. and P.E. Ney (1972). Branching P r o c e s s e s . Springer-Verlag, Ber- lin.

Biebricher, C.K.,

M.

Eigen and W.C. Gardiner, Jr. (1983). Kinetics of RNA replica- tion. B i o c h e m i s t r y 22, 2544-2559.

Billingsley, P. (1965). B g o d i c l'heory a n d h f o r m a t i o n . J. Wiley & Sons, New York.

Demetrius, L. (1983a). Statistical mechanics a n d population biology. S a t i s t . Rays. 30, 729-773.

Demetrius, L. (1983b). Selection and evolution in rnacromolecular systems.

J.

m e o r . B b l . 103, 619-643.

Ebeling,

W.

and R. Feistel (1977). Stochastic theory of molecular replication processes with selection character. Ann, f i y s . 34, 81-90.

Ebeling, W. and M.A. Jimenez-Montana (1980). Math. Biosciences 52, 53-71.

Eigen,

M.

(1971). Self-organization of m a t t e r and the evolution of biological macromolecules. N a f u r w i s s e n s c h a f t e n 58, 465-526.

Eigen, M. and P. Schuster (1979). The H p e r c y c l e .

A

Principle o f Natural S l f - O r g a n i z u t i o n . Springer-Verlag, Berlin.

Eigen,

M.,

J.S. McCaskill and P. Schuster (1984). Physics of Evolutionary Optimi- zation. Forthcoming.

Gassner,

B.

a n d P. Schuster (1982). Model studies on RNA replication

I.

The quasiequilibrium assumption and t h e analysis of a simplified mechanism.

MA.

C h e m . 113, 237-263.

Harris, T.E. (1961) 7he l'heory of Branching P r o c e s s e s . Springer-Verlag, Berlin.

Heinrich, R. and T. Sonntag (1981). Analysis of the selection equations for a multivariable population model. Deterministic and stochastic solutions and discussion of t h e approach for populations of self-reproducing biochemical networks.

J.

l h e o r . Biol. 93, 325-361.

I n a g a z ,

H.

(1982) Selection under random mutations in stochastic Eigen model.

M1. Math. H a l . 44, 17-28.

Jagers, P. (1975). Branching P r o c e s s e s w i t h B b l o g i c a l Applications. J. Wiley &

Sons, London.

Jones. B.L., RH. Enns and S.S. Rangnekar (1976). On the theory of selection of coupled macromolecular systems. &LL. Math. H a l . 38, 15-28.

Jones,

B.L.

and H.K. Leung (1981). Stochastic analysis of a nonlinear model for selection of biological macromolecules. &11. Muth. Biol. 43, 665-680.

(34)

Karlin, S. (1974). A E r s t Course in S t o c h a s t i c P r o c e s s e s , 2nd edn. Academic Press, New York.

Kesten, H. and B.P. Stigum (1966). A limit theorem for multidimensional Galton-Watson processes. Ann. Math. S t a t . 37, 121 1-1223.

Kiippers, B.O. (1979). Towards an experimental analysis of molecular self- organization and precellular darwinian evolution. N a t u r u r i s s e n s c h a f l e n 66, 228-243.

McCaskill, J. (1984a). A localization threshold for macromolecular quasispecies

-

from continuously distributed replication rates. Submitted to J. C h e m . F h y s .

McCaskill, J. (1984b). A stochastic theory of macromolecular evolution. Sub- mitted to Biological C y b e r n e t i c s .

McQuarrie, D . k (1967). Stochastic approach to chemical kinetics. J. Appl. P r o b . 4, 413-478.

Schuster, P. (1981). Prebiotic evolution. In H. Gutfreund ( E d ) , B i o c h e m i c a l B o l u t i o n . Cambridge University Press, Cambridge, pp. 15-87.

Schuster, P. (1983). Selection and evolution in molecular systems

-

a com-

bined approach by stochastic and deterministic chemical kinetics. In B.

Gornez, S.M. Moore, AM. Rodriguez-Vargas and A Rueda (Eds.), S t o c h a s t i c R o c e s s e s Applied t o P h y s i c s a n d O t h e r R e l a t e d Fields. World Scientific Publ.

Co., Singapore, pp. 134-159.

Schuster, P. and K. Sigmund (1984a). Random selection and the neutral theory

-

sources of stochasticity in replication. In P. Schuster (Ed.), S t o c h a s t i c F h e n o m e n a a n d C h m t i c B e h a v i o u r in C o m p l e z 3 y s t e m s . Springer-Verlag, Berlin, pp. 186-207.

Schuster. P. and

K.

Sigmund (1984b). Random selection

-

a simple model based on linear birth and death processes. &11. Math. Ri.01. 46, 11-17.

Swetina. J. and P. Schuster (1982). Self-replication with errors

-

a model for polynucleotide replication. B i o p h y s . C h e m . 16, 329-345.

Thompson,

C.J.

and J.L. McBride (1974). On Eigen's theory of the self- organization of m a t t e r and t h e evolution of biological macromolecules.

Math. B i o s c i e n c e s 21, 127-142.

Tuljapurkar, S.D. (1982). Why use population entropy? I t determines the r a t e of convergence.

J.

Math. Biology 13, 235-337.

Referenzen

ÄHNLICHE DOKUMENTE

Stochastic Processes II Summer term 2008 (Stochastische

Stochastic Processes I Winter term 2007/2008 (Stochastik

The content of the box is changed as follows: One ball is drawn randomly, its color is denoted, and this ball is placed back into the box together with c other balls of the same

9.4 (5 points) The following result of Blackwell, Dubins and Hunt combines the Lebesgue’s dominated convergence theorem and the convergence the- orem for uniformly

In these models the balls represent molecules and the urns represent different regions of space between which molecules may pass. Set up these models as Markov chains. In each

During the 24 hour period beginning at this time, a quantity Y n of water flows into the reservoir, and just before noon on each day exactly one unit of water is removed (if this

Gentle (1998), Random Number Generation and Monte Carlo Methods, Springer-Verlag, New York.. L’Ecuyer (1998), Random number generation, in: Handbook on

methods for computing open quantum system dynamics become very inefficient and cumbersome with increasing system size. In this regard, it is a key challenge of this thesis, to