

2.2.2 The Metropolis algorithm

The aim of this section is to set up a Markov-Chain which is able, through a very simple sampling procedure, to determine the probability distribution $\pi(C)$. The probabilities are formally determined by the expression
\[
\pi(C) = \frac{w_C}{\int dC'\, w_{C'}} , \qquad (2.32)
\]
where we have access to $w_C$ for any configuration $C \in \Omega$ but, because of its complexity, cannot compute the integral in the denominator.

A Markov-Chain is determined by its starting distribution $\mu^{(0)}$, which was unimportant for the convergence theorem, and by its transition matrix $p$. To successfully obtain the probability distribution $\pi$ from a Markov-Chain sampling, we have to ensure that $\pi$ is the stationary distribution with respect to $p$, and that the Markov-Chain (and therefore $p$) is irreducible and aperiodic. Aperiodicity can usually be fulfilled easily by setting up a Markov-Chain which has the property $p_{CC} > 0$, i.e. a non-zero probability to stay in a certain configuration. This already ensures aperiodicity of $p$, as one realizes immediately from its definition (2.7).

To ensure that the Markov-Chain has the correct stationary distribution, we use the Metropolis procedure, invented by N. Metropolis et al. in their famous 1953 paper [132]. This algorithm uses the fact that a Markov-Chain which is reversible (see Def. (2.6)) with respect to the probability distribution $\pi$ has $\pi$ as its unique stationary distribution. Since the probabilities are formally known to us, we insert them into (2.29) to get
\[
\frac{w_C}{\int dC'\, w_{C'}}\, p_{C\tilde{C}} = \frac{w_{\tilde{C}}}{\int dC'\, w_{C'}}\, p_{\tilde{C}C} \qquad (2.33)
\]

or, by bringing both transition probabilities to the same side of the equation, we end up with
\[
\frac{p_{\tilde{C}C}}{p_{C\tilde{C}}} = \frac{w_C}{w_{\tilde{C}}} \qquad \text{(Detailed Balance)}. \qquad (2.34)
\]

This last equation is the celebrated detailed balance condition, which is the simplest but also most powerful way to obtain correct transition probabilities $p_{\tilde{C}C}$ with the stationary distribution $\pi_C = w_C / \int dC'\, w_{C'}$. Although the space $\Omega$ may be very large, the detailed balance condition can be fulfilled very simply because it connects only the two states $C$ and $\tilde{C}$, making no additional restriction on the remaining transition probabilities or configurations.
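For completeness, here is the standard one-step check (not spelled out above, but implicit in the argument) that detailed balance implies stationarity of $\pi$: writing (2.34) as $\pi_{\tilde{C}}\, p_{\tilde{C}C} = \pi_C\, p_{C\tilde{C}}$ and using the normalization of the transition probabilities,
\[
\int d\tilde{C}\; \pi_{\tilde{C}}\, p_{\tilde{C}C} = \pi_C \int d\tilde{C}\; p_{C\tilde{C}} = \pi_C ,
\]
so any distribution satisfying (2.34) is left invariant by one step of the Markov-Chain.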

To set up the Metropolis procedure (remember that we have to implement this as a numerical procedure), we use the idea of acceptance-rejection sampling. The transition probabilities are decomposed into a product of two probabilities, namely the proposal and the acceptance probability, i.e. $p_{\tilde{C}C} = p^{acc}_{\tilde{C}C}\, p^{prop}_{\tilde{C}C}$. The proposal probabilities can be assigned arbitrarily, with the only restriction that they must behave as probabilities, i.e.
\[
0 \le p^{prop}_{\tilde{C}C} \le 1 \quad \text{and} \quad \sum_{C\in\Omega} p^{prop}_{\tilde{C}C} = 1 , \qquad \forall\, C, \tilde{C} \in \Omega . \qquad (2.35)
\]

To illustrate the concept of proposal and acceptance probabilities, suppose we are at a certain step of the Markov-Chain in the configuration $\tilde{C}$ and that we now propose a new configuration $C$ for the next step; the probability of this proposal is given by the proposal probability $p^{prop}_{\tilde{C}C}$. As an example, let us consider an infinitely extended grid, with every grid-point having 4 neighboring grid-points, where the grid-points represent possible configurations. In a particular step of the Markov-Chain, we sit on a certain grid-point, which we call $\tilde{C}$. For the next step of the Markov-Chain, we have to propose a possible configuration $C$, which we do by allowing only the neighboring grid-points to be reached in a single step of the Markov-Chain, all with the same probability $1/4$. I.e. we simply pick any neighboring point of $\tilde{C}$ and call this the configuration $C$. This proposal has nothing to do with the actual probabilities of the configurations; it may even be that some of these neighboring grid-points $C$ have the actual probability $\pi(C) = 0$, but this is irrelevant in the proposal of these configurations.
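As a minimal sketch of this grid proposal in code (the representation of grid-points as integer pairs and the function name are our own illustrative choices):

```python
import random

def propose_neighbor(c_old):
    """Propose one of the 4 neighboring grid-points of the current
    configuration c_old = (x, y), each with probability 1/4.  The
    proposal is blind to the actual weights w_C of the configurations."""
    x, y = c_old
    dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    return (x + dx, y + dy)
```

Note that this particular proposal is symmetric, i.e. $p^{prop}_{\tilde{C}C} = p^{prop}_{C\tilde{C}} = 1/4$ for neighboring grid-points, so the proposal probabilities will cancel in the acceptance probability (2.37) below.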

After the proposal of a new configuration for the next step, this configuration can be accepted, with probability $p^{acc}_{\tilde{C}C}$, or rejected, with probability $1 - p^{acc}_{\tilde{C}C}$. If it is accepted, $C$ becomes the configuration for the next step in the Markov-Chain. On the other hand, if it is rejected, the old configuration $\tilde{C}$ stays the configuration for the next step in the Markov-Chain.

To bring these proposal and acceptance probabilities into the detailed balance equation, we just rewrite (2.34) with these newly introduced probabilities, which then reads
\[
\frac{p^{acc}_{\tilde{C}C}\, p^{prop}_{\tilde{C}C}}{p^{acc}_{C\tilde{C}}\, p^{prop}_{C\tilde{C}}} = \frac{w_C}{w_{\tilde{C}}}
\quad \Leftrightarrow \quad
\frac{p^{acc}_{\tilde{C}C}}{p^{acc}_{C\tilde{C}}} = \frac{w_C\, p^{prop}_{C\tilde{C}}}{w_{\tilde{C}}\, p^{prop}_{\tilde{C}C}} . \qquad (2.36)
\]

The right-hand side of the right equation is fixed from "outside": both by the weights $w_C, w_{\tilde{C}}$, which are determined by the problem itself, and by the proposal probabilities $p^{prop}_{\tilde{C}C}, p^{prop}_{C\tilde{C}}$, which we have introduced (and consider, for the moment, to be arbitrary).

The Metropolis procedure (or Metropolis algorithm) now proposes the following:

• Suppose the Markov-Chain is in the $n$th step in configuration $\tilde{C}$; now pick a new configuration $C$ from the set of allowed configurations (this set of course depends on $\tilde{C}$) with probability $p^{prop}_{\tilde{C}C}$.

• This configuration is accepted with the Metropolis probability
\[
p^{acc}_{\tilde{C}C} = \min\left\{ 1 ,\; \frac{w_C\, p^{prop}_{C\tilde{C}}}{w_{\tilde{C}}\, p^{prop}_{\tilde{C}C}} \right\} . \qquad (2.37)
\]

• The configuration $C$ is either accepted, in which case it is the configuration of the $(n+1)$th step, or rejected, in which case the old configuration $\tilde{C}$ stays and becomes the configuration of the $(n+1)$th step as well (a sketch of this step in code follows below).
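A minimal sketch of one step of this procedure (the function names and signatures are our own; `weight`, `propose` and `p_prop` stand for the problem-specific $w_C$, the proposal move and $p^{prop}$):

```python
import random

def metropolis_step(c_old, weight, propose, p_prop):
    """Perform one Metropolis step starting from the current
    configuration c_old (the C~ above) and return the configuration
    of the (n+1)th step."""
    c_new = propose(c_old)  # draw C with probability p_prop(C~, C)
    # Metropolis acceptance probability, Eq. (2.37)
    ratio = (weight(c_new) * p_prop(c_new, c_old)) / (
        weight(c_old) * p_prop(c_old, c_new)
    )
    if random.random() < min(1.0, ratio):
        return c_new        # accepted: C is the configuration of step n+1
    return c_old            # rejected: the old configuration C~ stays
```

For the symmetric grid proposal sketched earlier, `p_prop` returns $1/4$ for any pair of neighboring grid-points, and the acceptance probability reduces to $\min\{1, w_C/w_{\tilde{C}}\}$.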

Using this algorithm, we have already ensured that $\pi_C = w_C / \int dC'\, w_{C'}$ is the stationary distribution of this Chain. At this point, it is important to realize that the stationary distribution of the Markov-Chain does in no way depend on how complicated (or simple) a choice we make for the proposal probabilities $p^{prop}$. As soon as the acceptance probabilities are chosen according to equation (2.37), the stationary distribution of the Markov-Chain stays invariant under a change of the proposal probabilities. This important fact gives us the freedom to choose the proposal probabilities such that the last condition of the Markov-Chain convergence theorem is fulfilled, which is the irreducibility condition, as defined in Def. (2.3). The irreducibility condition can, on a formal level, be fulfilled by choosing a Markov-Chain that makes $\pi$ ergodic, which is defined as

Definition 2.8 (Ergodicity)

A stationary distribution $\pi$ of a Markov-Chain is called ergodic if its support lies in an irreducible component of the Markov-Chain. The support of a probability distribution $\pi$ is $\mathrm{supp}(\pi) := \{ C \in \Omega \mid \pi(C) > 0 \}$.

It is clear that if we choose a Markov-Chain such that $\pi$ is ergodic, then the above choice of the acceptance probabilities makes the Markov-Chain irreducible. In physics, irreducibility is often taken to be equivalent to ergodicity, and since we understand that in the context of the Metropolis algorithm both are equivalent, we will from now on always talk about ergodicity. As mentioned before, there is no general recipe to ensure ergodicity in an arbitrary Markov-Chain, and therefore this has to be done ad hoc, depending on the distribution $\pi$ one is interested in. In general, ensuring ergodicity is the most difficult part of setting up a Markov-Chain sampling method because, on the one hand, there is no recipe for how to actually do that and, on the other hand, ergodicity in a mathematical sense is often not sufficient.

To illustrate that, we use the following example. Suppose the configuration space $\Omega$ consists of two distinct irreducible subspaces, i.e. $\Omega = \Omega_1 \cup \Omega_2$ with $\Omega_1 \cap \Omega_2 = \emptyset$, and there exists only a single configuration $C \in \Omega_1$ from which it is possible to reach $\Omega_2$ during a Markov process. This setup will be ergodic in a mathematical sense, since it is possible to reach any configuration from any other one in a finite number of steps. In a numerical simulation, however, if $\Omega_1$ is very large, a Markov-Chain starting in $\Omega_1$ will practically never reach a configuration in $\Omega_2$, and therefore the numerical simulation will, with high probability, give incredibly wrong results. Of course, this example is an extreme case and no one familiar with MCMC methods will set up such a Markov-Chain. It simply illustrates that although a Markov-Chain might look ergodic, it might turn out to be not ergodic "enough" for a numerical simulation. One understands the concept even more clearly when realizing that the configuration space $\Omega$ might be so complex that it is impossible to overview it, and the only predictions one can make are local ones, i.e. on small subsets of $\Omega$ which may not even be connected in some manner. Mostly, the only way to verify that a Metropolis algorithm is working is to perform numerical simulations and test whether it works correctly by benchmarking the results against other analytical and computational methods, which however might only be possible for some limiting cases where other methods are applicable$^5$.

Suppose now we have implemented the Metropolis algorithm correctly and have ensured ergodicity. Then the Markov-Chain convergence theorem applies and, in a very long Metropolis sampling procedure, we will be able to determine the probability distribution $\pi$. But how does this help us to compute the integral
\[
I(f, w) = \frac{\int dC\, f_C}{\int dC\, w_C} = \int dC\, \pi(C)\, g_C \qquad (2.38)
\]
required in this simulation? Of course, it would be possible to apply a direct sampling procedure now, since $\pi$ is known. But this would be inefficient, since we would have to perform a sampling procedure twice and we would also have to store the values of $\pi$ for every configuration $C$ in $\Omega$. It would also be necessary to determine $\pi$ for every single configuration $C$ in the Markov-Chain sampling, which would take a tremendous amount of computation time.

The idea is to use the Markov-Chain convergence theorem, as we will see. We start by setting up a Markov-Chain as described by the Metropolis algorithm. Then we let this Markov-Chain evolve during a numerical sampling procedure but, instead of counting the number of times $m_C(N)$ every configuration $C \in \Omega$ has been reached during the sampling, we simply add up the values $g_C$. For every step $n$ in the Markov-Chain, we know the configuration $C$ of this step, therefore we can compute $g_C = f_C/w_C$ and just add this to a variable which we call $J^{(n)}(w, f)$. This means that in every step only two variables have to be determined: first $J^{(n)}(w, f) = J^{(n-1)}(w, f) + g_C$ and second the step number $n$ (which is just a running integer). After $n$ steps of the Markov-Chain, $J^{(n)}(w, f)$ can be expressed as
\[
J^{(n)}(w, f) = \sum_{C\in\Omega} m_C(n)\, g_C . \qquad (2.39)
\]

$^5$As for instance exact diagonalization is only applicable to finite systems but not in the thermodynamic limit, whereas a huge class of Monte-Carlo methods is applicable to both finite and infinite systems.

In the limit $n \to \infty$, we can therefore apply the Markov-Chain convergence theorem, which causes the fraction
\[
\lim_{n\to\infty} \frac{J^{(n)}(w, f)}{n} = \lim_{n\to\infty} \sum_{C\in\Omega} \frac{m_C(n)}{n}\, g_C = \int dC\, \pi(C)\, g_C = I(w, f) \qquad (2.40)
\]
to converge to the desired integral $I(w, f)$. For a finite number of steps $n < \infty$, we have found an approximate solution for the integral $I(w, f)$ by applying Markov-Chain sampling. The error, however, is again purely statistical in nature and we can again apply error analysis tools known from stochastics. The main difference is that, in contrast to direct sampling, the individual steps of the sampling procedure are not independent from each other, as they are in direct sampling processes. The configuration $C$ of a certain step $n$ in the Markov-Chain strongly depends on the configuration $\tilde{C}$ of the previous step $n-1$ and for this reason cannot be independent. This means that the error estimation formulas obtained for direct sampling have to be modified to cover Markov-Chains.
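As a sketch of the accumulation just described, reusing the hypothetical `metropolis_step` from above (all remaining names are ours as well; `f` and `weight` stand for the problem-specific $f_C$ and $w_C$):

```python
def markov_chain_estimate(c0, weight, f, propose, p_prop, n_steps):
    """Estimate I(w, f) by Metropolis sampling: evolve the chain and
    accumulate g_C = f_C / w_C as in Eq. (2.39); by Eq. (2.40),
    J / n_steps converges to I(w, f) as n_steps grows."""
    c = c0
    J = 0.0
    for _ in range(n_steps):
        c = metropolis_step(c, weight, propose, p_prop)
        J += f(c) / weight(c)  # add g_C to the running sum, Eq. (2.39)
    return J / n_steps
```

As a conclusion of this section, we present some remarks.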

Remarks:

• In the Markov-Chain sampling, all that has to be calculated explicitly are the weights $w_C$ and the functions $f_C$ for those configurations $C$ which are passed by the Markov-Chain (the configuration space $\Omega$ contains many more points than will actually be passed during a simulation). The computation and subsequent accumulation of $g_C$ shall in the following be called a Monte-Carlo measurement.

• The most difficult part in a Markov-Chain sampling is, as mentioned above, the determination of the proposal probabilities and of the possible configurations for the current step to take. This has to be done very carefully and it has to be verified afterwards via simulation whether the resulting Markov-Chain is irreducible.

• Often it is computationally demanding to determine the functions $f_C$ (much more so than $w_C$), depending on the physical problem that has to be solved. Since subsequent steps in the Markov-Chain sampling depend very strongly on each other, not much information is gained by adding $g_C$ in every step, and it is often useful to have certain intervals during the sampling process where nothing is summed up and the Markov-Chain just evolves (see the sketch at the end of these remarks).

• Although the starting distribution of the Markov-Chain $\mu^{(0)}$ was not important for the Markov-Chain convergence theorem, it is clear that it will have an impact on the convergence speed of a Markov-Chain and also on the values of $g_C$ at the very beginning of the sampling process. It is therefore often useful to wait a significant number of steps before starting to accumulate $g_C$. This is often called equilibration of the Markov-Chain (also illustrated in the sketch below).

• The formalism that we used here does not distinguish between classical and quantum mechanical configurations $C$. In fact, the only difference between classical and quantum Monte-Carlo (QMC) is the fundamental difference in the configuration space; nothing changes in the sampling procedure, as we will see in the next chapter.

• Metropolis sampling is one realization of Markov-Chain sampling; however, it is the most well-known and powerful realization. Metropolis sampling is the basis for all kinds of Monte-Carlo processes, which are mostly either direct implementations of the Metropolis algorithm or extensions built on the same basic idea. All the Monte-Carlo processes used in this thesis are based on Metropolis sampling; the specific choice of configuration space and the corresponding proposal probabilities is what makes them unique and why they are not simply called Metropolis or Markov-Chain Monte-Carlo.
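Picking up the third and fourth remarks (measurement intervals and equilibration), a hedged sketch of how they would modify the estimator given earlier (the parameter names `n_equil` and `stride` are our own):

```python
def markov_chain_estimate_practical(c0, weight, f, propose, p_prop,
                                    n_measurements, n_equil=10000, stride=10):
    """Like markov_chain_estimate, but first discard n_equil steps
    (equilibration) and then let the chain evolve `stride` steps between
    successive Monte-Carlo measurements, so that the accumulated g_C
    values are less strongly correlated."""
    c = c0
    for _ in range(n_equil):     # equilibration: nothing is summed up
        c = metropolis_step(c, weight, propose, p_prop)
    J = 0.0
    for _ in range(n_measurements):
        for _ in range(stride):  # evolve without measuring
            c = metropolis_step(c, weight, propose, p_prop)
        J += f(c) / weight(c)    # Monte-Carlo measurement of g_C
    return J / n_measurements
```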