

2.2.2. Gillespie’s algorithm - Probabilities at work

As mentioned in the previous section, solving the master-equation for larger systems is usually not feasible. One way of circumventing the need to do so is to obtain paths (single trajectories) as possible solutions. Sampling such exact solution paths then allows estimating the probability distributions that are the solutions of the master-equation. D. T. Gillespie described such an algorithm in 1977 (Gillespie, 1977), called the “Stochastic Simulation Algorithm” (SSA), which uses a Monte-Carlo approach to simulate exact trajectories of the chemical reaction system. The derivation of the algorithm can be established from the exact same premises as for the master-equation, as shown in Gillespie (1992), which makes the two approaches equivalent to one another despite their distinct outputs.

The technique basically splits the problem into two questions per step: “1. When will the next reaction occur?” and “2. Which reaction R_m will be occurring?”. This is the joint probability distribution P(τ, m|x, t) of the time τ and the reaction R_m given the state x at time t. The time τ until the next reaction occurs is an exponentially distributed random number, as was shown in the original paper. This fact can be proven from the premises of (I) a well-mixed, spatially homogeneous system in which molecules can be regarded as uniformly distributed and (II) the molecules following a Maxwell-Boltzmann velocity distribution. These were exactly the prerequisites of the derivation of the master equation and in fact, both use similar derivations and are equivalent to one another. Making use of the previously stated probabilities, this can be expressed in the following manner: we see the interval [t, t+τ] divided into k evenly spaced subintervals of length ϵ = τ/k, in which the event “no reaction” occurs with the probability described in term (13). These are k independent successive events, and the probabilities can thus be multiplied to give the overall probability of “no reaction” in the whole interval [t, t+τ]. For a shorter notation, we define the sum over the reactions in term (13) to be

a(x) = Σ_{m=1}^{M} c_m h_m(x),

with M being the number of reactions. At the end of this interval, the reaction R_m is supposed to occur and, again referring to the previous section, this probability is the term (14). Thus, for small ϵ, we can write

P(τ, m|x, t) = (1 − a(x)ϵ + o(ϵ))^k · c_m h_m(x)    (18)

and by making the subintervals ϵ = τ/k infinitely small by letting k → ∞ we obtain the central formula for the SSA:

P(τ, m|x, t) = c_m h_m(x) · e^{−a(x)τ} = a(x) e^{−a(x)τ} · (c_m h_m(x) / a(x)).    (21)

The two terms of the product (21) are what is now to be simulated in Monte Carlo fashion.

The first term in (21), a(x) e^{−a(x)τ}, is the exponential distribution from which we sample the time τ. We calculate the quantile function of this distribution (by taking the inverse of its cumulative distribution function) and can thus map a uniform random number r_1 from the interval [0, 1] back to such a τ:

τ = (1/a(x)) · ln(1/r_1).    (22)
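As an illustration, this inverse-transform step amounts to a single line of code. The following is a minimal sketch in Python, assuming NumPy is available and the total propensity a_total = a(x) has already been computed; the function name sample_tau is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng()

def sample_tau(a_total):
    """Draw the waiting time tau from the density a(x) * exp(-a(x) * tau)
    by inverse transform sampling, cf. Eq. (22)."""
    r1 = 1.0 - rng.random()          # uniform random number on (0, 1], avoids log(0)
    return np.log(1.0 / r1) / a_total
```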

Next, the reaction to be chosen is weighted by its probability of occurrence: we divide the unit interval into M fractions, each of a width proportional to the amount that the corresponding reaction R_m contributes to the sum a(x), namely the fraction in the second term of (21). The algorithm does this by defining m as “the smallest integer satisfying”

Σ_{i=1}^{m} c_i h_i(x) > r_2 · a(x),    (23)

with r_2 again being a random number from the uniform distribution on [0, 1].
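In code, this cumulative search could be sketched as follows, assuming the propensities c_m h_m(x) are available as a NumPy array; the function name select_reaction is illustrative, and note that the returned index is zero-based, whereas (23) counts reactions from 1:

```python
import numpy as np

rng = np.random.default_rng()

def select_reaction(propensities):
    """Return the (zero-based) index m of the reaction to fire: the smallest m
    whose cumulative propensity exceeds r2 * a(x), cf. Eq. (23)."""
    r2 = rng.random()                              # uniform random number on [0, 1)
    cumulative = np.cumsum(propensities)           # partial sums of c_i * h_i(x)
    return int(np.searchsorted(cumulative, r2 * cumulative[-1], side="right"))
```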

As can be seen, the specific reaction that will then be chosen to occur at time τ depends both on the state X(t) = x of the system (i.e. the number of molecules of each species) and on the stochastic reaction constants c_m associated with the reactions R_m. Again, the probability of choosing a certain reaction is proportional to the product of the number of possible distinct substrate-molecule combinations h_m(x) and the stochastic rate of the reaction. The rate itself also depends on the volume in which the reaction takes place, since within a smaller space the probability of molecules colliding and reacting increases.²³ After selecting the reaction R_m that occurs, the molecule numbers x are updated according to the reactants and products of the reaction in question by adding the corresponding row ν_m of the stoichiometric matrix. The time is increased by τ and, unless the desired simulation time is reached, these steps are repeated. The algorithm can be summarized as follows (a code sketch is given below the list):

1. Determine the time τ of the next reaction event
2. Determine the reaction R_m that will occur next
3. Update the time t = t + τ and the system’s state vector x = x + ν_m
4. If t < t_final, go to step 1
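This loop translates almost directly into code. The following is a minimal sketch of the direct method in Python, assuming NumPy; the function name gillespie_ssa and the small birth-death example at the end (including its rate values) are purely illustrative and not taken from the text:

```python
import numpy as np

rng = np.random.default_rng()

def gillespie_ssa(x0, nu, propensity_fn, t_final):
    """Direct-method SSA for one trajectory up to time t_final.

    x0            -- initial molecule numbers (length-S array)
    nu            -- stoichiometric matrix (M reactions x S species); row nu[m]
                     is added to the state when reaction R_m fires
    propensity_fn -- callable x -> array of the M propensities c_m * h_m(x)
    """
    t = 0.0
    x = np.array(x0, dtype=np.int64)
    times, states = [t], [x.copy()]
    while t < t_final:
        props = propensity_fn(x)
        a_total = props.sum()
        if a_total <= 0.0:
            break                                   # no reaction can fire any more
        # 1. time to the next reaction event, cf. Eq. (22)
        tau = np.log(1.0 / (1.0 - rng.random())) / a_total
        # 2. index of the reaction that fires, cf. Eq. (23) (zero-based here)
        m = np.searchsorted(np.cumsum(props), rng.random() * a_total, side="right")
        # 3. advance time and update the state vector
        t += tau
        x = x + nu[m]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)


# Illustrative use: a simple birth-death process, 0 -> X (rate c1) and
# X -> 0 (rate c2 per molecule); the rate values are made up for the example.
nu = np.array([[+1],
               [-1]])
c1, c2 = 5.0, 0.1
propensities = lambda x: np.array([c1, c2 * x[0]])
times, states = gillespie_ssa(x0=[0], nu=nu, propensity_fn=propensities, t_final=100.0)
```

Storing every reaction event keeps the full trajectory; for long simulations one would typically record the state only at fixed output times.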

A typical example of a trajectory of an SSA model for gene expression can be found in Fig. 9. The result of one SSA run is a possible trajectory out of the solution of the master-equation of the process, discrete in the system state x (molecule numbers) and continuous in time. By simulating sufficiently many trajectories, we can then sample and estimate the underlying probability distribution of molecule numbers at times t, i.e. of the species population vector X(t).
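Building on the gillespie_ssa sketch above, such an estimate could look as follows; the number of runs and the query time are arbitrary choices for the example:

```python
def state_at(times, states, t_query):
    """State of one SSA trajectory at time t_query (the state is piecewise
    constant between reaction events)."""
    idx = np.searchsorted(times, t_query, side="right") - 1
    return states[idx]

# Estimate the distribution of X(t = 50) from many independent runs
# (nu and propensities as defined in the sketch above).
samples = []
for _ in range(10_000):
    times, states = gillespie_ssa([0], nu, propensities, t_final=50.0)
    samples.append(state_at(times, states, 50.0)[0])
values, counts = np.unique(samples, return_counts=True)
estimated_pmf = counts / counts.sum()                  # empirical P(X(50) = n)
```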

²³ Of course, by definition reactions of first order are unaffected by the volume since no collisions have to take place.

Fig. 9: Trajectories in one run of an SSA simulation: mRNA and protein in a model for stochastic gene expression (see Shahrezaei and Swain (2008)).

The SSA is computationally expensive if the rates and/or the numbers of molecules are high, corresponding to the huge number of reaction events occurring in that case. There are techniques to speed up the simulation by making simplifications and assumptions that lower the complexity, and thus the accuracy, of the simulation. One such technique is the τ-Leaping version of the SSA (Gillespie, 2001; Cao et al., 2006, 2007), in which the species vector is regarded as constant over a time window larger than the “time to next reaction”. The firing of several successive reactions is then computed by drawing random numbers (from appropriate distributions derived from the equations in this section) that determine which reactions occur, and how many times, within this time window. The approximation of constant species numbers introduces a controllable error.
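A minimal sketch of such a leap is given below, assuming the same nu and propensity_fn conventions as in the SSA code above and a user-chosen, fixed leap size; in this variant every reaction fires a Poisson-distributed number of times per leap, while adaptive leap-size selection and careful handling of negative populations (cf. Cao et al., 2006, 2007) are omitted:

```python
def tau_leap(x0, nu, propensity_fn, t_final, leap):
    """Crude fixed-step tau-leaping sketch: the state is held constant during
    each leap and reaction R_m fires a Poisson(a_m(x) * leap) number of times."""
    t, x = 0.0, np.array(x0, dtype=np.int64)
    while t < t_final:
        props = propensity_fn(x)
        firings = rng.poisson(props * leap)    # number of firings per reaction
        x = np.maximum(x + firings @ nu, 0)    # naive guard against negative counts
        t += leap
    return x
```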

Another technique would be to employ continuous formulations such as the chemical Langevin-equation (Gillespie, 2000), among others.

The original detailed description of the algorithm can be found in Gillespie (1977). The Stochastic Simulation Algorithm has since been the subject of, or a tool in, numerous tutorials and publications. As a review, we recommend Gillespie (2007), written by D. T. Gillespie himself.