

2.2.3 Statistical Data Analysis for Markov-Chain Sampling Procedures

In the limit of n → ∞, we can therefore apply the Markov-Chain convergence theorem, which guarantees that the fraction

\[
\lim_{n\to\infty} \frac{J^{(n)}(w,f)}{n} = \lim_{n\to\infty} \sum_{C\in\Omega} \frac{m_C(n)}{n}\, g_C = \int \mathrm{d}C\, \pi(C)\, g_C = I(w,f) \qquad (2.40)
\]

converges to the desired integral I(w, f). For a finite number of steps n < ∞, we thus obtain an approximate solution for the integral I(w, f) by applying Markov-Chain sampling. The error, however, is again purely statistical in nature, and we can again apply error-analysis tools known from stochastics. The main difference is that, in contrast to direct sampling, the individual steps of the sampling procedure are not independent of each other. The configuration C of a certain step n in the Markov-Chain strongly depends on the configuration C̃ of the previous step n − 1 and for this reason cannot be independent of it. This means that the error-estimation formulas obtained for direct sampling have to be modified to cover Markov-Chains.
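To make equation (2.40) concrete, the following minimal Python sketch (all parameter values are illustrative choices, not taken from the text) runs a two-state Markov-Chain whose stationary distribution is known exactly and checks that the time average of g_C approaches the π-weighted average:

```python
import random

# Two-state configuration space Omega = {0, 1}.  With transition
# probabilities P(0 -> 1) = a and P(1 -> 0) = b, the stationary
# distribution is pi = (b/(a+b), a/(a+b)).
a, b = 0.1, 0.3                        # -> pi = (0.75, 0.25)
g = {0: 1.0, 1: 5.0}                   # arbitrary observable values g_C

random.seed(1)
state, total, n_steps = 0, 0.0, 200_000
for _ in range(n_steps):
    # one Markov-Chain step: leave the current state with its exit probability
    if random.random() < (a if state == 0 else b):
        state = 1 - state
    total += g[state]                  # accumulate g_C along the chain

# time average (m_C(n)/n summed over C)  vs.  exact sum_C pi(C) g_C
print(total / n_steps)                 # ~ 0.75 * 1.0 + 0.25 * 5.0 = 2.0
```

To conclude this section, we present some remarks.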

Remarks:

• In Markov-Chain sampling, all that has to be calculated explicitly are the weights w_C and the functions f_C for those configurations C which are actually visited by the Markov-Chain (the configuration space Ω contains many more points than will be visited during a simulation). The computation and subsequent accumulation of g_C shall in the following be called a Monte-Carlo measurement.

• The most difficult part of a Markov-Chain sampling is, as mentioned above, the determination of the proposal probabilities and of the possible configurations for the current step to move to. This has to be done very carefully, and it has to be verified afterwards via simulation whether the resulting Markov-Chain is irreducible.

• Often it is computationally demanding to determine the functions f_C (much more so than the w_C), depending on the physical problem that has to be solved. Since subsequent steps in the Markov-Chain sampling depend very strongly on each other, not much information is gained by adding g_C in every step, and it is often useful to insert intervals during the sampling process in which nothing is accumulated and the Markov-Chain just evolves.

• Although the starting configuration µ^(0) of the Markov-Chain was not important for the Markov-Chain convergence theorem, it clearly has an impact on the convergence speed of a Markov-Chain and also on the values of g_C at the very beginning of the sampling process. It is therefore often useful to wait a significant number of steps before starting to accumulate g_C. This is often called equilibration of the Markov-Chain.

• The formalism we used here does not distinguish between classical and quantum-mechanical configurations C. In fact, the only difference between classical and quantum Monte-Carlo (QMC) lies in the configuration space; nothing changes in the sampling procedure itself, as we will see in the next chapter.

• Metropolis sampling is one realization of Markov-Chain sampling, but it is the most well-known and powerful one. Metropolis sampling is the basis for all kinds of Monte-Carlo processes, which are mostly either direct implementations of the Metropolis algorithm or extensions built on the same basic idea. All Monte-Carlo processes used in this thesis are based on Metropolis sampling; the specific choice of configuration space and the corresponding proposal probabilities is what makes them unique and why they are not simply called Metropolis or Markov-Chain Monte-Carlo. A minimal sketch of such a sampler, including equilibration and measurement intervals, follows this list.
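As an illustration of the last two remarks, here is a hedged sketch of a Metropolis sampler for a one-dimensional toy weight w(x) = exp(−x²/2), measuring f(x) = x² with known average ⟨x²⟩ = 1; the step size, equilibration length, and measurement interval m are arbitrary demonstration values, not prescriptions:

```python
import random, math

def weight(x):
    # toy weight w(x) = exp(-x^2 / 2), known average <x^2> = 1
    return math.exp(-0.5 * x * x)

random.seed(2)
x = 0.0                           # starting configuration mu^(0)
step = 1.0                        # proposal: uniform shift in [-step, step]
n_equil, n_meas, m = 1_000, 50_000, 10

def metropolis_step(x):
    x_new = x + random.uniform(-step, step)        # propose new configuration
    if random.random() < min(1.0, weight(x_new) / weight(x)):
        return x_new                               # accept proposal
    return x                                       # reject: keep old config

for _ in range(n_equil):          # equilibration: nothing is accumulated
    x = metropolis_step(x)

total = 0.0
for _ in range(n_meas):
    for _ in range(m):            # m intermediate, unmeasured steps
        x = metropolis_step(x)
    total += x * x                # Monte-Carlo measurement of f(x) = x^2

print(total / n_meas)             # ~ 1.0
```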


For direct sampling, the statistical independence of the individual measurements was used in equation (2.18) to express the variance of the sum of the variables in terms of the variance of a single variable. For Markov-Chain sampling this is impossible, since the single measurements are not independent of each other. This can again be best understood by the example of a Markov-Chain on a grid. Suppose that for every grid-point C, a different value g_C is accumulated during the sampling.

In a single step of the Markov-Chain, we can move from a certain grid-point to one of its neighbors. This means the measurement of the (n + 1)-th step depends on the n-th step, because it can only be made on a neighbor of the n-th configuration. Therefore, many steps m are necessary in the Markov-Chain until a configuration C at step n + m is independent of the configuration C̃ at step n. This number of steps is usually unknown and, in a complicated sampling, may also strongly depend on the configurations and vary throughout the sampling procedure, which usually makes it impossible to estimate the number of intermediate, unmeasured steps m after which the next measurement will be independent of the previous one. For Markov-Chain sampling one therefore has to use slightly more advanced error-analysis tools than for direct sampling, which we will discuss now.

Again, the statistical error of the accumulated variable F_N is expressed through its variance Var(F_N), where we use the same notation as in the previous data-analysis section. F_N is the value accumulated through a Markov-Chain sampling with N measurements, i.e.

\[
F_N = \frac{1}{N} \sum_{i=1}^{N} f_i, \qquad (2.41)
\]

where the f_i are the individual measurements. Of course, a single sampling procedure to obtain one value F_N takes a lot of time and computing power, and therefore repeating this procedure to get a precise variance of F_N is not an option. One has to find error-analysis tools which can be applied to a single run and still give the precise variance of F_N. To do this, we again express the variance of F_N through the single measurements

\[
\mathrm{Var}(F_N) = \langle F_N^2\rangle - \langle F_N\rangle^2
= \frac{1}{N^2}\left(\Big\langle\Big(\sum_{i=1}^{N} f_i\Big)^2\Big\rangle - \Big\langle\sum_{i=1}^{N} f_i\Big\rangle^2\right). \qquad (2.42)
\]

The expectation value ⟨...⟩ is linear in its arguments, and the f_i are all identically distributed, i.e. ⟨f_i⟩ = ⟨f_j⟩ ≡ ⟨f⟩ and Var(f_i) = Var(f_j) ≡ Var(f) for all 1 ≤ i, j ≤ N. The difference to the previous data-analysis section is that ⟨f_i f_j⟩ ≠ ⟨f_i⟩⟨f_j⟩, since the measurements are not independent. This leads to

\[
\begin{aligned}
\mathrm{Var}(F_N) &= \frac{1}{N^2}\sum_{i,j=1}^{N}\big(\langle f_i f_j\rangle - \langle f_i\rangle\langle f_j\rangle\big)\\
&= \frac{1}{N^2}\Big(N\,\mathrm{Var}(f) + \sum_{i\neq j}\big(\langle f_i f_j\rangle - \langle f_i\rangle\langle f_j\rangle\big)\Big)\\
&= \frac{\mathrm{Var}(f)}{N}\left(1 + \frac{\sum_{i\neq j}\big(\langle f_i f_j\rangle - \langle f\rangle^2\big)}{N\,\mathrm{Var}(f)}\right)\\
&= \frac{\mathrm{Var}(f)}{N}\,(1 + 2\tau_A), \qquad (2.43)
\end{aligned}
\]

where we have defined the auto-correlation time τ_A as

\[
\tau_A = \frac{\sum_{i\neq j}\big(\langle f_i f_j\rangle - \langle f\rangle^2\big)}{2N\,\mathrm{Var}(f)}. \qquad (2.44)
\]
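As a numerical sanity check of (2.43), one can use a simple auto-regressive series f_i = φ f_{i−1} + noise as a stand-in for correlated Markov-Chain measurements, since its auto-correlation time is known analytically (τ_A → φ/(1 − φ) for large N). The sketch below deliberately repeats many runs to measure Var(F_N) directly, something that is not affordable in a real simulation; all parameter values are illustrative:

```python
import random

# AR(1) stand-in for correlated measurements: f_i = phi*f_{i-1} + noise.
# Its autocorrelations are phi**k, so tau_A -> phi/(1-phi) for large N.
phi, N, runs = 0.8, 10_000, 400
tau_A = phi / (1.0 - phi)                  # exact limit: 4.0

random.seed(3)
means = []
for _ in range(runs):
    f, total = 0.0, 0.0
    for _ in range(N):
        f = phi * f + random.gauss(0.0, 1.0)
        total += f
    means.append(total / N)               # one value of F_N per run

var_FN = sum(m * m for m in means) / runs - (sum(means) / runs) ** 2
var_f = 1.0 / (1.0 - phi * phi)           # stationary Var(f) of the AR(1)
print(var_FN)                             # measured Var(F_N) over the runs
print(var_f * (1.0 + 2.0 * tau_A) / N)    # prediction from eq. (2.43)
```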

The auto-correlation time contains N(N − 1) summands, about whose behavior during a given sampling procedure no general statement can be made. The best case for the auto-correlation time is that of independent variables, for which τ_A = 0 vanishes; on the other hand, for variables which are not independent, there is no a-priori argument that prevents τ_A from scaling as τ_A = O(N). The latter would be the worst-case scenario, because then the total variance Var(F_N) would be constant and the Markov-Chain sampling would not converge to a fixed value⁶. From the Markov-Chain convergence theorem, we know that a Markov-Chain that is set up properly (i.e. fulfilling the conditions of section 2.2.2) will converge to a stationary distribution in the limit of infinite steps. From that we directly conclude that a proper Markov-Chain will have an auto-correlation time τ_A which does not scale with the number of steps N (at least less than linearly). In a "good" sampling process, for sufficiently large N, τ_A no longer depends on N and the total standard deviation again scales as σ(F_N) = O(N^(−1/2)).

⁶ It is obvious that a Markov-Chain which converges to a stationary distribution cannot have a fixed non-zero variance in the limit of infinite steps.

We can also formulate the previous argument from another perspective. As we have seen, a finite auto-correlation time results from non-independent variables f_i, f_j. If we sufficiently increase the number of intermediate sampling steps m between two subsequent measurements, these variables become independent again and the auto-correlation time vanishes, while the number of measurements still scales linearly with N, the total number of steps in the sampling. If m is lowered, the variables are no longer independent and τ_A becomes finite, but the number of measurements still scales linearly with N; the total variance may therefore be larger, but it is not allowed to depend on N. The only case in which τ_A scales with N is when it is impossible to bring τ_A to zero by increasing m. In that case the measurements never become independent, no matter how many intermediate steps lie between two measurements, which can only happen when the Markov-Chain does not converge.
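This thinning argument can be demonstrated directly. The sketch below (reusing the two-state toy chain from above; all parameter values are again illustrative) records only every m-th state as a measurement and estimates the lag-1 autocorrelation of the measured series, which decays like (1 − a − b)^m as m grows:

```python
import random

a, b = 0.05, 0.05                  # sticky chain: strong step-to-step memory
random.seed(4)

def thinned_autocorrelation(m, n_meas=20_000):
    state, vals = 0, []
    for _ in range(n_meas):
        for _ in range(m):         # m intermediate, unmeasured steps
            if random.random() < (a if state == 0 else b):
                state = 1 - state
        vals.append(float(state))  # one measurement every m steps
    mean = sum(vals) / n_meas
    var = sum((v - mean) ** 2 for v in vals) / n_meas
    cov1 = sum((vals[i] - mean) * (vals[i + 1] - mean)
               for i in range(n_meas - 1)) / (n_meas - 1)
    return cov1 / var              # lag-1 autocorrelation of the measurements

for m in (1, 8, 64):
    print(m, thinned_autocorrelation(m))   # decays like (1 - a - b)**m
```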

With the introduction of the auto-correlation time, we have found a powerful tool to analyze Markov-Chain sampling processes. Suppose we could determine the auto-correlation time τ_A in a certain sampling process. Then we would have a direct measure for the quality of the Markov-Chain we have constructed: either τ_A remains finite in the limit of infinite steps (i.e. it is independent of N for N sufficiently large), such that the Markov-Chain converges to a stationary distribution, or τ_A becomes infinite in this limit, and the Markov-Chain clearly does not converge. The latter means we have made a mistake in the implementation and did not properly ensure ergodicity. If the auto-correlation time is finite, we obtain the error of the sampling process in terms of the standard deviation

\[
\sigma(F_N) = \sigma(f)\,\sqrt{\frac{1 + 2\tau_A}{N}}. \qquad (2.45)
\]

Although the auto-correlation time is a very powerful quantity, we only benefit from its introduction if it is possible to determine τ_A during a certain sampling procedure. Finding ways to determine τ_A during a sampling process is therefore the aim of the remainder of this chapter. One straightforward, if statistically noisy, approach is sketched below; the binning analysis of the next subsection is usually more robust.
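A common single-run estimator sums the normalized autocorrelation function ρ_k up to a self-consistent window. The function below is a hedged sketch of this standard windowing heuristic, not the method of this chapter; the cutoff parameter c and the function name are illustrative choices:

```python
def integrated_autocorrelation_time(vals, c=6.0):
    """Estimate tau_A from a single series of measurements by summing the
    normalized autocorrelation function rho_k; the sum is cut off at the
    self-consistent window k > c * tau_A to avoid accumulating noise."""
    n = len(vals)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    tau = 0.0
    for k in range(1, n // 2):
        rho = sum((vals[i] - mean) * (vals[i + k] - mean)
                  for i in range(n - k)) / ((n - k) * var)
        tau += rho
        if k > c * tau:            # window condition: stop summing noise
            break
    return tau
```

Applied to the AR(1) series above, this should return a value close to φ/(1 − φ). The estimator becomes noisy as the window grows, which is one reason the binning analysis below is often preferred.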

Binning Analysis

The most common procedure to obtain the auto-correlation time is to perform a binning analysis. Starting with the original set of measurements f_i^(0), with i = 1, ..., N, we iteratively obtain a "binned" set of measurements by averaging over two consecutive entries:

\[
f_i^{(l)} = \frac{1}{2}\left(f_{2i-1}^{(l-1)} + f_{2i}^{(l-1)}\right), \qquad \text{with } i = 1,\ldots,N_l \equiv \frac{N}{2^l}. \qquad (2.46)
\]

These bin averages f_i^(l) are less correlated than the original measurements, since they correspond to two distinct (imaginary) measurements with an increased number of intermediate steps m^(l) = 2^l · m^(0), where m^(0) is the number of intermediate steps in the original sampling process. On the other hand, the mean value of the binned averages is always equal to the mean (1/N) Σ_i f_i^(0) of the original measurements. We can estimate the error of the binned variables using the variance formula for independent variables, which we know to be incorrect, but which converges to the correct error in the limit where the bins become independent of each other. Using (2.45) with τ_A = 0, we obtain

\[
\sigma\big(F_N^{(l)}\big) \approx \sqrt{\frac{\mathrm{Var}\big(f^{(l)}\big)}{N_l}} = \frac{2^l}{N}\sqrt{\sum_{i=1}^{N_l}\Big(f_i^{(l)} - \big\langle f^{(l)}\big\rangle\Big)^2}. \qquad (2.47)
\]
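A compact implementation of the binning step (2.46) together with the per-level error estimate (2.47) might look as follows; this is a sketch under the stated equations, and the function name is an illustrative choice:

```python
def binning_analysis(vals):
    """Binning analysis of eqs. (2.46)-(2.47): repeatedly average pairs of
    consecutive measurements and record the naive (tau_A = 0) error
    estimate of the mean at each binning level l."""
    errors = []
    level = list(vals)
    while len(level) >= 2:
        n = len(level)
        mean = sum(level) / n
        var = sum((v - mean) ** 2 for v in level) / n
        errors.append((var / n) ** 0.5)   # eq. (2.47) with tau_A = 0
        # eq. (2.46): average consecutive pairs (an odd last entry is dropped)
        level = [0.5 * (level[2 * i] + level[2 * i + 1])
                 for i in range(n // 2)]
    return errors
```

Printing the returned errors against the level l, one looks for a plateau: the saturated value is the true standard deviation σ(F_N), and the ratio of the plateau to the l = 0 value gives √(1 + 2τ_A). The reasoning behind this plateau is the following.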

Suppose that after l steps of the binning, the f_i^(l) were independent of each other. Then in the next binning step the variables f_i^(l+1) would also be independent of each other, and the two factors of 2 appearing in σ(F_N^(l+1)) would cancel, resulting in σ(F_N^(l+1)) = σ(F_N^(l)). If, on the other hand, the f_i^(l) were not independent of each other, σ(F_N^(l)) from Eq. (2.47) would be the wrong expression for the standard deviation, i.e. it would underestimate the exact value. Since in the next binning step the variables become "more" independent, (2.47) will be a more realistic expression,