
VI. Numerical methods

VI.5. Numerical time propagation

Various numerical time-propagation schemes have been established to cover a broad spectrum of applications, as documented by the abundant literature on this subject [346, and references therein]; a complete overview would therefore go beyond the scope of the present thesis. Nevertheless, the aim here is to outline the principles behind the techniques that proved useful, while providing enough context to briefly demonstrate their advantages over alternative methods. For the sake of brevity, emphasis is placed on pointing out selected non-obvious issues that give the reader an intuition, rather than on developing a consistent mathematical formalism.

Subsequently, reference is made to the potential solutions without explaining them in detail.

Being confronted with memory limitations, Kaiser (2014) [4] suggested implementing the TDRDM approach on a distributed-memory machine, which is generally accompanied by severe additional programming effort because all algorithms – if sufficiently suited for parallelization at all – have to be adapted. As runtime is also an issue, one strives for so-called scalable algorithms, i.e., algorithms that are characterized by the ideal speedup S(p) := T_sim(1)/T_sim(p) = p, where T_sim(p) is the runtime of the program on p processors [347, 348]. In simple words, p times as many processors are ideally p times as fast. On a side note, the cases of superlinear speedup
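The speedup definition above can be stated as a minimal sketch; the runtime values below are hypothetical placeholders, not measurements from this thesis:

```python
def speedup(t_1, t_p):
    """S(p) = T_sim(1) / T_sim(p); the ideal (scalable) case is S(p) = p."""
    return t_1 / t_p

def efficiency(t_1, t_p, p):
    """Parallel efficiency E(p) = S(p) / p; E = 1 means ideal speedup."""
    return speedup(t_1, t_p) / p

# Hypothetical runtimes (seconds) of the same simulation on p processors.
runtimes = {1: 800.0, 2: 410.0, 4: 215.0, 8: 120.0}
speedups = {p: speedup(runtimes[1], t) for p, t in runtimes.items()}
```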

6 In addition, the DVR potentially produces sparse matrices in the multidimensional cases of Eq. (II.1) and the TDRDM (as stated above).



S(p) > p, which occur rarely in practice due to cache effects, are, though frequently reported in the literature [347, 349], not discussed here. For clarification of the technical aspects, it should be mentioned that all parallelized algorithms were implemented based on MPI [350, 351] with parallel I/O achieved by means of the HDF5 library [352]. MPI can be considered a quasi-standard [353] and, similarly, HDF5 is widespread7 and has proven to be very efficient [355].

It is instructive to begin with an illustrative example that is reduced to its essentials. Consider the 1d heat equation as a simple parabolic PDE to be solved with an FDM. For demonstration purposes, the FDM is deliberately preferred over the FEDVR because the latter is more complicated, and it shall be emphasized that this problem is not exclusive to the FEDVR. Now assume that a corresponding initial value problem is numerically solved via the θ-method [346, p. 35 et seqq.], which includes the (forward/explicit) Euler (θ = 0), the backward/implicit Euler (θ = 1), and the Crank-Nicolson (CN) (θ = 1/2) method as special cases. It is very important to note that the Euler and backward Euler methods are both consistent of order 1 but exhibit completely different stability properties, which clearly manifests itself in practical applications. Although it is not intended to repeat findings already well established in textbooks, it shall be made clear that, and why, one should never use the Euler method – not even for testing purposes. Even with very small time steps, one will often not obtain convergent results. The underlying issue is demonstrated by means of Fig. VI.5, which focuses on one of the many problems one may experience when using the Euler method.
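The stability gap within the θ-method can be reproduced with a minimal sketch for the 1d heat equation u_t = u_xx (Dirichlet boundaries; the grid size and time step below are chosen purely for illustration): forward Euler blows up for a time step well above its stability limit, while backward Euler and CN remain bounded.

```python
import numpy as np

def theta_step(u, D, dt, theta):
    """One theta-method step for u' = D u:
    (I - theta*dt*D) u_new = (I + (1 - theta)*dt*D) u_old."""
    I = np.eye(len(u))
    rhs = (I + (1.0 - theta) * dt * D) @ u
    return np.linalg.solve(I - theta * dt * D, rhs)

# Discrete 1d Laplacian (second-order central difference, Dirichlet BCs).
n = 50
h = 1.0 / (n + 1)
D = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2

x = np.linspace(h, 1.0 - h, n)
u0 = np.exp(-100.0 * (x - 0.5)**2)   # narrow bump with broad spectral content

dt = 1e-3                            # far above the explicit limit h^2/2 ~ 2e-4
results = {}
for theta in (0.0, 0.5, 1.0):        # forward Euler, CN, backward Euler
    u = u0.copy()
    for _ in range(100):
        u = theta_step(u, D, dt, theta)
    results[theta] = np.max(np.abs(u))
```

For θ = 0 the highest grid modes are amplified by roughly |1 − δt·4/h²| per step, so the solution diverges; the implicit variants damp them instead.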

Let g(x) = exp(−x²/2) be the initial state at time t = 0 stored on a finite grid (see caption for details), and let ∆_d be the standard second-order central difference operator that is very common in the FDM for parabolic problems [314, 356]. The time-dependent solution g(x, t) at time t after n steps obtained from the Euler method is then written in a somewhat sloppy notation as

g(x, t) = (1 + (t/n) ∆_d)^n exp(−x²/2). (VI.28)

Hence, many consecutive numerical differentiations, as seen in Fig. VI.5, are evaluated, which are subject to a rapidly growing error due to the finite discretization and the finite floating-point precision. Interestingly, ∆_d²g perfectly estimates the analytical solution on the scale of Fig. VI.5 but is in fact subject to numerical noise, i.e., small fluctuations between adjacent grid points. If a subsequent numerical differentiation is applied to obtain ∆_d³g, the procedure yields an approximation to the derivative of the noise and not of the smooth curve ∆²g. Crucially depending on the discretization in space, the emergence of large errors very likely leads to divergent results. This is in strong contrast to ODEs of the form y′ = f(t, y), where the numerical evaluation of the "right-hand side" function f(t, y) is virtually exact. In the case of Fig. VI.5, the finite floating-point precision is of paramount concern, whereas increasing the consistency order does not mitigate the situation [not shown]. Regarding the region of absolute stability, the Euler method is anticipated to be even worse for the TDSE [338, p. 284 et seqq.].
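The noise amplification under repeated discrete differentiation can be reproduced in a few lines, using the same grid parameters as Fig. VI.5 (∆_d denotes the second-order central difference; the analytic derivatives follow from the probabilists' Hermite polynomials):

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 6000)   # 6000 points on [-10, 10], as in Fig. VI.5
h = x[1] - x[0]
g = np.exp(-0.5 * x**2)

def delta_d(f):
    """Standard second-order central difference (interior points only)."""
    return (f[:-2] - 2.0 * f[1:-1] + f[2:]) / h**2

d1 = delta_d(g)              # approximates g''
d3 = delta_d(delta_d(d1))    # approximates g^(6)

# Analytic derivatives of exp(-x^2/2):
# d^2/dx^2 -> (x^2 - 1) g,  d^6/dx^6 -> (x^6 - 15 x^4 + 45 x^2 - 15) g.
g2 = (x**2 - 1.0) * g
g6 = (x**6 - 15.0 * x**4 + 45.0 * x**2 - 15.0) * g

err1 = np.max(np.abs(d1 - g2[1:-1]))
err3 = np.max(np.abs(d3 - g6[3:-3]))
# err3 >> err1: after three applications the floating-point noise,
# amplified by roughly 1/h^2 per application, dominates the result.
```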

In sharp contrast, the backward Euler method is L-stable [346, p. 146], i.e., it introduces numerical damping. Accordingly, there is, due to the limited floating-point precision, a decisive difference between the forward and backward Euler methods in practice even though both methods are consistent of the same order. There is an important lesson to learn from this: consistency is not everything. Nevertheless, the CN scheme is often more attractive because it is only A-stable,

7 also for XFEL data in SFX experiments [354]


i.e., it induces no artificial damping, and is consistent of order 2. Note, however, that L-stability can prove advantageous in certain cases. For a formal definition of A- and L-stability see, e.g., Ref. [357, p. 40 et seqq.]. The situation is somewhat intuitive in the sense that the inverse of the discretized derivation operator can be associated with integration, which, in turn, exhibits a smoothing property [358, 359].

[Fig. VI.5 caption (fragment): "… numbers on a uniform grid of 6000 points on the interval [−10, 10] using the standard second-order central difference. With the discrete method failing dramatically for ∆_d³g, the analytic solution ∆³g is shown as well."]
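The smoothing property of the inverse operator can be made tangible: applying (I − δt ∆_d)⁻¹, i.e., a single backward Euler step, to a vector of pure grid-scale noise strongly damps the high-frequency fluctuations. The following is a sketch with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
h = 1.0 / (n + 1)
# Discrete 1d Laplacian (second-order central difference, Dirichlet BCs).
D = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2

v = rng.standard_normal(n)                    # pure grid-scale noise
dt = 1e-2
w = np.linalg.solve(np.eye(n) - dt * D, v)    # one backward Euler step

def total_variation(u):
    return np.sum(np.abs(np.diff(u)))

# High modes are damped by 1/(1 + dt*lambda_k), so the inverse acts like a
# low-pass filter and the total variation drops sharply.
```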

Likewise, explicit propagation methods within the FEDVR face the same intrinsic problem. One might conjecture that this can be avoided within the FEDVR by increasing n_g and lowering n_e at the expense of the locality of the derivation operator as outlined above in Eq. (VI.26). However, this idea, analogous to the attempted solution in the FDM based on increasing the consistency order of the derivation operator, does not resolve the present issue.

For practical calculations the computational toolkits PETSc [316] and SLEPc [360] were utilized. Here, one follows the philosophy behind PETSc that allows for choosing various numerical time-propagation schemes via runtime options because it is often too time-consuming to investigate beforehand which algorithm best matches the respective problem. This philosophy applies in particular to the deeper level of the implicit methods concerned with the solution of the underlying sparse linear systems. Accordingly, many different numerical approaches have been tested and proven useful, including purely explicit methods such as those described in the following.

The utility of well-established embedded Runge-Kutta (RK) formulae has been investigated, each of which is associated with three integers. One of them is the number of stages, which is related to the number of evaluations of the "right-hand side" per time step (but is not necessarily identical due to standard optimizations [361, p. 167]). A precise definition of the term stages of an explicit RK method can be found in Ref. [361, p. 134]. Then, using the notation from Ref. [361, p. 166], one characterizes an embedded RK method of order n(m) by the property that it consists of a scheme of order n and an error estimator of order m, which specifies the remaining two integers. The following embedded RK schemes have been tested: the Bogacki-Shampine (RK3BS) method of order 3(2) with four stages, the Fehlberg (RK5F) method of order 4(5) with six stages, and the Dormand-Prince (RK5DP) method of order 5(4) with seven stages. It is interesting to note that for the calculations performed in the course of Chapter V, RK3BS significantly outperformed both the RK5F and RK5DP methods with respect to the required computational runtime and number of iterations. This is related to the numerical phenomenon seen above in Fig. VI.5 because the performance deteriorates upon increasing the order. However,



this conclusion is of course not universal as it depends on the grid spacing, so the situation can change considerably for a different photon energy, which may produce faster photoelectrons and thus require a finer numerical grid.

Moreover, strong stability-preserving (SSP) or total variation diminishing (TVD) time discretization methods [362–364] have been tested and offer a practicality comparable to the embedded RK methods. SSP/TVD methods fulfill the TVD property8, i.e., they avoid artificial oscillatory behavior even for problems with discontinuous solutions. Interestingly, the explicit Euler method maintains the TVD property for sufficiently small time steps, as proven in Ref. [346, p. 226 et seqq.], followed by an example why one should still not use this method but rather its extensions.
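A standard representative is the three-stage, third-order SSP scheme of Shu and Osher, sketched here for linear advection with a first-order upwind discretization and a discontinuous initial profile (all parameters are illustrative; the scheme is a convex combination of forward Euler steps and hence TVD under the same CFL limit):

```python
import numpy as np

def L(u, c, h):
    """First-order upwind discretization of -c * u_x (c > 0, periodic BCs)."""
    return -c * (u - np.roll(u, 1)) / h

def ssp_rk3_step(u, dt, c, h):
    """Shu-Osher SSP-RK3: convex combination of forward Euler steps."""
    u1 = u + dt * L(u, c, h)
    u2 = 0.75 * u + 0.25 * (u1 + dt * L(u1, c, h))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * L(u2, c, h))

n, c = 100, 1.0
h = 1.0 / n
x = (np.arange(n) + 0.5) * h
u = np.where((x > 0.3) & (x < 0.6), 1.0, 0.0)    # step-function initial data

def total_variation(u):
    return np.sum(np.abs(np.diff(u))) + abs(u[-1] - u[0])   # periodic TV

tv0 = total_variation(u)          # = 2 for the step profile
dt = 0.5 * h / c                  # CFL number 0.5, within the TVD limit
for _ in range(200):
    u = ssp_rk3_step(u, dt, c, h)
# the total variation never increases: no spurious oscillations appear
```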

The split-operator (SO) technique reliably produced convergent results in 1d TDSE calculations at high runtime performance without a trial-and-error search for adequate parameters, even for the inexperienced user. The basic concept of the SO method is the decomposition of matrix exponentials [342, 346, 365–367] according to

ψ(x, t + δt) = exp(−iV(t + δt) δt/2) exp(−iT δt) exp(−iV(t) δt/2) ψ(x, t) + O(δt³), (VI.29)

where ψ(x, t) is the time-dependent solution to the TDSE described by the Hamiltonian H(t) = T + V(t) with the kinetic operator T, whose FEDVR representation is a matrix with the structure from Eq. (VI.26), and a time-dependent potential term V(t), which is represented by a diagonal matrix. On the one hand, numerically evaluating the matrix exponential of the diagonal matrices and the subsequent matrix-vector multiplication is computationally inexpensive, and there is thus no need for further discussion of these terms. On the other hand, the straightforward calculation of the term exp(−iT δt) yields a dense matrix, which strongly impairs the efficiency of parallelization as noted on page 75. Different strategies arose to circumvent this drawback.
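One widely used way to sidestep the dense exp(−iT δt) is the Fourier (pseudospectral) variant, in which T is diagonal in momentum space. The following 1d sketch applies the splitting of Eq. (VI.29) to a harmonic potential (an illustration of the splitting only, not the FEDVR implementation used in this thesis; V is time-independent here, so the two half-steps coincide):

```python
import numpy as np

n, L_box = 512, 40.0
x = (np.arange(n) - n // 2) * (L_box / n)
k = 2.0 * np.pi * np.fft.fftfreq(n, d=L_box / n)
V = 0.5 * x**2                       # harmonic potential, T = k^2/2 (atomic units)
dt = 0.01

def so_step(psi):
    """One Strang step exp(-iV dt/2) exp(-iT dt) exp(-iV dt/2); each factor is
    a pure phase in its own (position or momentum) representation."""
    psi = np.exp(-0.5j * V * dt) * psi
    psi = np.fft.ifft(np.exp(-0.5j * k**2 * dt) * np.fft.fft(psi))
    return np.exp(-0.5j * V * dt) * psi

# Harmonic-oscillator ground state: should only acquire a global phase.
psi = np.exp(-0.5 * x**2).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * (L_box / n))
psi0 = psi.copy()
for _ in range(100):
    psi = so_step(psi)
norm = np.sum(np.abs(psi)**2) * (L_box / n)   # unitary: stays 1 to roundoff
```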

For example, one can proceed by splitting T into a sum of two block matrices and applying the same decomposition without affecting the consistency order [342]. However, the implementation of this approach showed that the stability was considerably reduced. Alternatively, one may use one of many algorithms to directly evaluate exp(M)v for a sparse matrix M and a vector v [368].
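As an illustration of the exp(M)v idea, SciPy's `expm_multiply` can serve as a stand-in (the Krylov-type algorithm of Ref. [369] implemented in SLEPc is not reproduced here); the sparse 1d Laplacian below is a hypothetical toy matrix:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import expm_multiply

# Sparse 1d Laplacian (Dirichlet BCs) as a stand-in for a sparse operator.
n = 200
h = 1.0 / (n + 1)
M = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2

x = np.linspace(h, 1.0 - h, n)
v = np.sin(np.pi * x)        # exact eigenvector of the discrete Laplacian

dt = 1e-3
w = expm_multiply(dt * M, v)  # evaluates exp(dt*M) @ v without forming exp(dt*M)

# Analytic check: exp(dt*M) v = exp(dt*lambda) v with the discrete eigenvalue
lam = -(2.0 / h**2) * (1.0 - np.cos(np.pi * h))
```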

Here, the algorithm from Ref. [369, p. 136 et seqq.], which is implemented in SLEPc, proved to be reliable and efficient for the TDSE, but the rapid convergence slowed down markedly for the mathematically equivalent calculations based on the 1RDM.

In TDHF simulations the concept of implicit-explicit (IMEX) methods [346, 370–372] is favorable because it allows one to distinguish terms of the dynamical equations according to their stiffness, i.e., whether they are amenable to a numerical treatment with explicit methods. Accordingly, the nonlinear Fock term, which is represented by a relatively dense matrix9 in the FEDVR (see below in Sec. VI.6), is treated as a non-stiff term.
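The IMEX principle can be sketched with a first-order implicit-explicit Euler step for a toy reaction-diffusion problem u_t = u_xx + u(1 − u), treating the stiff diffusion implicitly and the nonlinearity explicitly (the actual TDHF equations and the higher-order IMEX schemes of the thesis are more involved):

```python
import numpy as np

n = 100
h = 1.0 / (n + 1)
# Stiff linear part: discrete 1d Laplacian (Dirichlet BCs).
D = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2

def nonlinear(u):
    """Non-stiff (here: logistic) term, treated explicitly."""
    return u * (1.0 - u)

dt = 1e-3                            # far above the explicit limit h^2/2 ~ 5e-5
A = np.eye(n) - dt * D               # implicit part; factorized once in practice

x = np.linspace(h, 1.0 - h, n)
u = 0.5 * np.exp(-50.0 * (x - 0.5)**2)
for _ in range(500):
    # IMEX Euler: (I - dt*D) u_new = u_old + dt * N(u_old)
    u = np.linalg.solve(A, u + dt * nonlinear(u))
# remains bounded although dt violates the explicit stability limit
```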

So far, it has not been addressed how the sparse linear systems appearing in, e.g., the IMEX and the CN methods are solved. This subject area shall be merely mentioned in passing due to the availability of standard textbooks [373, 374] and implementations of various algorithms (for example, in PETSc) to solve such systems. It should be noted that algorithms developed for dense linear systems, such as Gaussian elimination, albeit suitable for a parallel implementation

8 A formal definition of the TVD property can be found in Refs. [346, 362].

9 The corresponding matrix is, of course, not necessarily explicitly stored in memory.


[338, p. 150 et seq.], can commonly not be applied efficiently without further ado because, for instance, Gaussian elimination tends to destroy the sparsity of matrices [373]. Alternatively, iterative algorithms such as the generalized minimal residual (GMRES) method are frequently used despite the fact that fast convergence of GMRES is debatable in the general case [375].
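For illustration, a CN-type sparse system for the 1d heat equation can be solved iteratively with SciPy's GMRES (in the actual calculations, PETSc's KSP solvers play this role; the parameters below are purely illustrative):

```python
import numpy as np
from scipy.sparse import identity, diags
from scipy.sparse.linalg import gmres

# CN step for the 1d heat equation: (I - (dt/2) D) u_new = (I + (dt/2) D) u_old.
n = 200
h = 1.0 / (n + 1)
D = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n)) / h**2
dt = 1e-3
A = (identity(n) - 0.5 * dt * D).tocsr()

x = np.linspace(h, 1.0 - h, n)
u_old = np.exp(-100.0 * (x - 0.5)**2)
b = (identity(n) + 0.5 * dt * D) @ u_old

u_new, info = gmres(A, b)            # info == 0 signals convergence
residual = np.linalg.norm(A @ u_new - b) / np.linalg.norm(b)
```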

Finally, it shall be briefly explained why the numerical eigenvalue problem is barely touched.

First, the standard algorithms provided by the similar framework of SLEPc [360] were generally unproblematic and, second, the size of the numerical grid can be markedly reduced in ground-state calculations due to the absence of photoelectrons (see Sec. VI.1). The result can then be extrapolated to obtain the initial state on a grid appropriate for the time-dependent problem.