Applying Fox’s algorithm to Lanczos iterations

3.3 Numerical realization

3.3.3 Applying Fox’s algorithm to Lanczos iterations

As pointed out above, the Jacobi basis allows one to perform calculations for one specific physical state at a time, which can tremendously reduce the dimensionality of the huge sparse symmetric Hamiltonian matrix that needs to be diagonalized. Still, the basis sizes are significant (especially for $A \geq 6$ hypernuclei, see also Table 3.1) compared to the memory capacity of current supercomputers. Therefore, an efficient scheme to obtain the lowest eigenvalues and the corresponding eigenstates is crucially important. In this respect, the powerful Lanczos eigenvalue iterations [107] are the most suitable diagonalization tool for our problems. Here, we employ the parallel Lanczos eigensolver available as part of the PARPACK library, a parallel version of the popular ARPACK software [108]. The basic idea of the method is to iteratively construct an orthonormal Lanczos basis $\{v, v^{1}, \cdots, v^{m-1}\}$ of a Krylov subspace [109],

$$K_m(H, v) = \mathrm{span}\{v, Hv, H^{2}v, \cdots, H^{m-1}v\}, \tag{3.36}$$

where $H$ is a Hermitian matrix of size $n \times n$, $v$ is an arbitrary starting vector of dimension $n$, and $m$ is an integer with $m \ll n$ (typically $m$ is of the order of 100, or several hundreds at most) that specifies the dimensionality of the Krylov space. The Lanczos vectors $v^{k}$, $k = 1, \dots, m-1$, are constructed (usually in combination with an implicit restart process) such that in this basis the Hamiltonian $H$ becomes a


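| system | $(N_{\max}\ J^{\pi}\ T)$ | $\alpha^{*(Y)}$ | $\alpha^{*(YN)}$ | $(\alpha^{*(2)})^{*(Y)}$ |
|---|---|---|---|---|
| $^4_\Lambda$He / $^4_\Lambda$H | (22 $0^+$ $\tfrac{1}{2}$) | 118,149 | 355,008 | 319,221 |
| | (22 $1^+$ $\tfrac{1}{2}$) | 343,490 | 1,031,424 | 1,923,957 |
| $^5_\Lambda$He | (14 $\tfrac{1}{2}^+$ 0) | 186,155 | 748,480 | 1,119,873 |
| $^6_\Lambda$He / $^6_\Lambda$Li | (13 $1^-$ $\tfrac{1}{2}$) | 1,452,047 | 7,513,728 | 15,098,199 |
| $^7_\Lambda$Li | (12 $\tfrac{1}{2}^+$ 0) | 871,102 | 5,782,144 | 13,843,348 |
| | (12 $\tfrac{3}{2}^+$ 0) | 1,004,129 | 9,987,776 | 17,782,800 |
| | (10 $\tfrac{5}{2}^+$ 0) | 408,084 | 2,589,910 | 6,693,764 |
| | (10 $\tfrac{7}{2}^+$ 0) | 407,770 | 2,594,832 | 6,716,857 |
| | (10 $\tfrac{1}{2}^+$ 1) | 363,963 | 2,332,047 | 6,057,652 |

Table 3.1: Total dimensions of the basis and the intermediate states for the $S = 0$ and $S = -1$ Hamiltonians. The second column shows the largest model space sizes for each system.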

Chapter 3 Jacobi NCSM for S = −1 systems

tridiagonal matrix whose lowest eigenvalue provides the best approximation to the ground-state binding energy of $H$ in the full Hilbert space. The main input to the parallel Lanczos eigensolver is a function that can calculate the matrix-vector product,

$$H_{ij}\, v^{k-1}_{j} \rightarrow v^{k}_{i}, \tag{3.37}$$

at each $k$-th iteration, with the two vectors, $v^{k-1}$ and $v^{k}$, being either col- or row-distributed. It turns out that computing Eq. (3.37) is the most time-consuming part of every Lanczos iteration; hence, it should be performed with very high efficiency. Furthermore, in order to reduce the memory usage, it is necessary that the Hamiltonian matrix $H$ is completely distributed over the process grid. In that sense, the standard algorithm for matrix-vector multiplication is no longer the optimal one, since it unavoidably involves global communication among all processes. The desired efficiency can, however, be attained by exploiting the idea of Fox's algorithm. In order to apply Fox's idea, we distribute the matrix $H_{ij}$ on the np_row × np_col process grid and the vector $v^{k-1}_{j}$ over np_row processes. Each process now stores a local rowcol matrix $H^{\mathrm{loc}}_{\mathrm{rowcol}}$ and a local row vector $v^{k-1,\mathrm{loc}}_{\mathrm{row}}$. At every Fox iteration, each process first performs the matrix-vector multiplication on its local data, $H^{\mathrm{loc}}_{\mathrm{rowcol}}$ and $v^{k-1,\mathrm{loc}}_{\mathrm{row}}$, resulting in a temporary row-distributed vector $v^{\mathrm{temp,loc}}_{\mathrm{row}}$,

$$H^{\mathrm{loc}}_{\mathrm{rowcol}}\, v^{k-1,\mathrm{loc}}_{\mathrm{row}} \rightarrow v^{\mathrm{temp,loc}}_{\mathrm{row}}, \tag{3.38}$$

and then shifts its row vector $v^{k-1,\mathrm{loc}}_{\mathrm{row}}$ to its neighbour process in the same commrow communicator in order to prepare for the next Fox iteration. At the end, a localized MPI collective operation, mpi_allreduce on $v^{\mathrm{temp,loc}}_{\mathrm{row}}$, must be carried out in every commcol communicator, yielding the final row-distributed product vector $v^{k,\mathrm{loc}}_{\mathrm{row}}$. The pseudocode for Fox's matrix-vector multiplication with the input $H^{\mathrm{loc}}_{\mathrm{rowcol}}$, $v^{k-1,\mathrm{loc}}_{\mathrm{row}}$ and the output $v^{k,\mathrm{loc}}_{\mathrm{row}}$ is shown below.

Algorithm 2 Fox's algorithm for matrix-vector multiplication

procedure Fox_matrixvector_multiplication(H^loc_rowcol, v^{k-1,loc}_row, v^{k,loc}_row)
    v^temp_row ← 0
    source ← mod(myrowid + 1, npe_row)
    dest ← mod(myrowid − 1 + npe_row, npe_row)
    for iter = 0, npe_row − 1 do
        root ← mod(myrowid + iter, npe_row)
        if root = mycolid then
            v^temp_row ← v^temp_row + H^loc_rowcol × v^{k-1,loc}_row
        end if
        MPI_Sendrecv_Replace(v^{k-1,loc}_row, dest, source, commrow)
    end for
    MPI_Allreduce(v^temp_row, v^{k,loc}_row, MPI_SUM, commcol)
end procedure
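The data movement of Algorithm 2 can be mimicked without MPI. The following Python sketch is a serial simulation under stated assumptions: a square q × q process grid, an input vector whose slice r initially sits on every process of grid row r, and real matrix elements. The helper names (`split_blocks`, `block_matvec`, `fox_matvec`) are illustrative and not part of the J-NCSM code; the ring shift and the final sum merely stand in for MPI_Sendrecv_Replace and MPI_Allreduce.

```python
def split_blocks(matrix, q):
    """Cut a square matrix into a q x q grid of equally sized blocks."""
    n = len(matrix) // q
    return [[[row[c * n:(c + 1) * n] for row in matrix[r * n:(r + 1) * n]]
             for c in range(q)] for r in range(q)]


def block_matvec(block, vec):
    """Dense product of one local block with one vector slice."""
    return [sum(a * x for a, x in zip(row, vec)) for row in block]


def fox_matvec(blocks, pieces):
    """Simulate the shift-and-accumulate scheme on a q x q grid.

    blocks[r][c] is the local block on process (r, c);
    pieces[r] is slice r of the input vector.
    Returns the slices of the product vector, one per grid row.
    """
    q = len(blocks)
    # Every process (r, c) starts with slice r of the input vector.
    held = [[list(pieces[r]) for _ in range(q)] for r in range(q)]
    temp = [[[0.0] * len(blocks[r][c]) for c in range(q)] for r in range(q)]
    for it in range(q):
        for r in range(q):
            for c in range(q):
                root = (r + it) % q  # index of the slice currently held
                if root == c:
                    part = block_matvec(blocks[r][c], held[r][c])
                    temp[r][c] = [t + p for t, p in zip(temp[r][c], part)]
        # Ring shift: each process receives the slice held by row r + 1.
        held = [[held[(r + 1) % q][c] for c in range(q)] for r in range(q)]
    # Sum over the processes of each grid row (stand-in for the all-reduce).
    return [[sum(temp[r][c][i] for c in range(q))
             for i in range(len(temp[r][0]))] for r in range(q)]


H = [[2, 1, 0, 0],
     [1, 3, 1, 0],
     [0, 1, 4, 1],
     [0, 0, 1, 5]]
x = [1, 2, 3, 4]
result = fox_matvec(split_blocks(H, 2), [x[0:2], x[2:4]])
```

In this toy setup each process multiplies exactly once, at the iteration in which the circulating slice matches its column index, so only nearest-neighbour shifts and one reduction per grid row are needed.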

We are now ready to apply the just described Fox’s matrix-vector multiplication to the hypernuclear eigenvalue problems. As an example, we will explicitly show the Lanczos procedure that involves


only the strange part ($S = -1$) of the Hamiltonian, $H^{S=-1}$,

$$H^{S=-1}\, |\Psi^{k-1}\rangle = |\Psi^{k}\rangle. \tag{3.39}$$

Here, $|\Psi^{k-1}\rangle$ and $|\Psi^{k}\rangle$ are the wavefunctions at two successive Lanczos iterations, the $(k-1)$-th and the $k$-th, which can be expanded in the col-distributed basis states $|\alpha^{*(Y)}\rangle$ as follows

$$|\Psi^{k-1}\rangle = \sum_{\alpha^{*(Y)}} C^{k-1}_{\alpha}\, |\alpha^{*(Y)}\rangle; \qquad |\Psi^{k}\rangle = \sum_{\alpha^{*(Y)}} C^{k}_{\alpha}\, |\alpha^{*(Y)}\rangle. \tag{3.40}$$

Automatically, the expansion coefficients $C^{k-1}_{\alpha}$ and $C^{k}_{\alpha}$ are also distributed over np_col processes in the same manner as the states $|\alpha^{*(Y)}\rangle$. After projecting Eq. (3.39) onto the basis states $\langle\alpha^{*(Y)}|$ and then making use of the completeness of the intermediate states $|\alpha^{*(YN)}\rangle$ in Eq. (3.26), we obtain

$$\sum_{\alpha'^{*(Y)}}\ \sum_{\alpha^{*(YN)},\,\alpha'^{*(YN)}} \langle\alpha^{*(Y)}|\alpha^{*(YN)}\rangle\, \langle\alpha^{*(YN)}|H^{S=-1}|\alpha'^{*(YN)}\rangle\, \langle\alpha'^{*(YN)}|\alpha'^{*(Y)}\rangle\, \langle\alpha'^{*(Y)}|\Psi^{k-1}\rangle = \langle\alpha^{*(Y)}|\Psi^{k}\rangle. \tag{3.41}$$

Now, by inserting the expansion in Eq. (3.40) into Eq. (3.41), one arrives at a set of linear equations for the Lanczos iterations,

$$\sum_{\alpha'^{*(Y)}}\ \sum_{\alpha^{*(YN)},\,\alpha'^{*(YN)}} \langle\alpha^{*(Y)}|\alpha^{*(YN)}\rangle\, \langle\alpha^{*(YN)}|H^{S=-1}|\alpha'^{*(YN)}\rangle\, \langle\alpha'^{*(YN)}|\alpha'^{*(Y)}\rangle\, C^{k-1}_{\alpha'} = C^{k}_{\alpha}. \tag{3.42}$$

It is obvious that, during the Lanczos iterations, only the expansion coefficients $C^{k-1}_{\alpha}$ and $C^{k}_{\alpha}$ are updated, while the other terms in Eq. (3.42) remain unchanged. It is therefore advisable to prepare the matrix elements $\langle\alpha^{*(YN)}|H_{Y}|\alpha'^{*(YN)}\rangle$ as well as the overlaps $\langle\alpha'^{*(YN)}|\alpha'^{*(Y)}\rangle$ and to store them locally in the desired row- and col-distribution before entering the iterations. Since our basis states $|\alpha^{*(Y)}\rangle$ are distributed over np_col processes, the overlap matrix $\langle\alpha'^{*(YN)}|\alpha'^{*(Y)}\rangle$ should also be distributed in the $\alpha'^{*(YN)}$-row and $\alpha'^{*(Y)}$-col manner. Then the first summation over the $|\alpha'^{*(Y)}\rangle$ states can be straightforwardly performed with the help of the standard matrix-vector multiplication, which yields an intermediate row-distributed vector

$$v^{\mathrm{inter}}_{\mathrm{row}}(\alpha'^{*(YN)}) = \sum_{\alpha'^{*(Y)}} \langle\alpha'^{*(YN)}|\alpha'^{*(Y)}\rangle_{\mathrm{rowcol}}\; C^{k-1}_{\alpha',\mathrm{col}}. \tag{3.43}$$

We employ Fox's matrix-vector multiplication algorithm for the second summation, which involves $\langle\alpha^{*(YN)}|H^{S=-1}|\alpha'^{*(YN)}\rangle$ and $v^{\mathrm{inter}}_{\mathrm{row}}(\alpha'^{*(YN)})$. Since the latter is row-distributed, it is required that the matrix $\langle\alpha^{*(YN)}|H^{S=-1}|\alpha'^{*(YN)}\rangle$ is also $\alpha^{*(YN)}$-row and $\alpha'^{*(YN)}$-col distributed. Applying Fox's algorithm (Algorithm 2) to the summation over $|\alpha'^{*(YN)}\rangle$ then results in another row-distributed intermediate vector

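$$v^{\mathrm{inter2}}_{\mathrm{row}}(\alpha^{*(YN)}) = \sum_{\alpha'^{*(YN)}} \langle\alpha^{*(YN)}|H^{S=-1}|\alpha'^{*(YN)}\rangle_{\mathrm{rowcol}}\; v^{\mathrm{inter}}_{\mathrm{row}}(\alpha'^{*(YN)}). \tag{3.44}$$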

[Figure 3.5: two bar-chart panels comparing "Fox" and "no Fox" for model space sizes N = 8, 10, 12, 14; left panel: memory per node [Gb]; right panel: run time [s].]

Figure 3.5: Memory usage (left figure) and total runtime (right figure) when calculating the ground-state binding energy of $^5_\Lambda$He with Fox's algorithm (red bars) and without Fox's algorithm (green bars) for different model space sizes $N$. The calculations are performed on the JURECA-Booster supercomputer with 64 nodes.

Finally, the third summation over the $|\alpha^{*(YN)}\rangle$ states,

$$C^{k}_{\alpha,\mathrm{col}} = \sum_{\alpha^{*(YN)}} \langle\alpha^{*(Y)}|\alpha^{*(YN)}\rangle_{\mathrm{colrow}}\; v^{\mathrm{inter2}}_{\mathrm{row}}(\alpha^{*(YN)}), \tag{3.45}$$

is nothing but a normal matrix-vector multiplication and can hence be performed in the same way as the first summation in Eq. (3.43). The Lanczos procedure involving the non-strange Hamiltonian $H^{S=0}$ can be performed in a similar manner as for $H^{S=-1}$. In order to illustrate the benefits of using Fox's matrix-vector multiplication in the Lanczos iterations, in Fig. 3.5 we compare the memory usage per node (left figure) and the total runtime (right figure) when calculating the binding energy of $^5_\Lambda$He with Fox's algorithm (red bars) and without Fox's algorithm (green bars) for different model space sizes³. One clearly sees that the implementation of Fox's multiplication leads to a slight reduction in memory usage and tremendously speeds up the calculations, in particular for large model space sizes.
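The three summations of Eqs. (3.43) to (3.45) amount to applying three matrices in sequence instead of ever forming the full Hamiltonian in the $|\alpha^{*(Y)}\rangle$ basis. A minimal serial Python sketch of this ordering is given below. It assumes real overlaps, so that $\langle\alpha^{*(Y)}|\alpha^{*(YN)}\rangle$ is simply the transpose of $\langle\alpha^{*(YN)}|\alpha^{*(Y)}\rangle$, and it uses small made-up matrices; the function and variable names are illustrative only.

```python
def matvec(matrix, vec):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def apply_hamiltonian(overlap, h_inter, coeffs):
    """One Lanczos matrix-vector product done as three successive products.

    overlap[i][j] plays the role of <alpha*(YN)_i | alpha*(Y)_j> (assumed real),
    h_inter[i][j] the role of <alpha*(YN)_i | H | alpha*(YN)_j>.
    """
    v_inter = matvec(overlap, coeffs)            # first sum, Eq. (3.43)
    v_inter2 = matvec(h_inter, v_inter)          # second sum, Eq. (3.44)
    return matvec(transpose(overlap), v_inter2)  # third sum, Eq. (3.45)


# Made-up toy data: 2 basis states, 3 intermediate states.
overlap = [[1, 0],
           [0, 1],
           [1, 1]]
h_inter = [[2, 1, 0],
           [1, 2, 1],
           [0, 1, 2]]
new_coeffs = apply_hamiltonian(overlap, h_inter, [1, 2])
```

In the distributed setting described above, only the middle product would go through Fox's scheme, while the first and last products are ordinary distributed matrix-vector multiplications.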

³ In binding energy calculations, a model space $N$ denotes the set of all basis states with the same $J^{\pi}$ and $T$ and with all allowed HO energy quantum numbers up to $N$.
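For orientation, the Lanczos recurrence described at the beginning of this section can be sketched in a few lines of Python. This is a plain textbook variant without reorthogonalization or implicit restarts, so it is not what PARPACK does internally; it only shows how a matrix-vector callback alone suffices to build the tridiagonal matrix. The lowest eigenvalue of that matrix is then located by a Sturm-sequence bisection. All names and the test matrix (a small 1D Laplacian) are assumptions made for illustration.

```python
import math


def lanczos(matvec, start, m):
    """Return diagonal (alphas) and off-diagonal (betas) of the tridiagonal matrix."""
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]
    v_prev = [0.0] * len(v)
    alphas, betas, beta = [], [], 0.0
    for _ in range(m):
        w = matvec(v)
        alpha = sum(a * b for a, b in zip(w, v))
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        alphas.append(alpha)
        beta = math.sqrt(sum(x * x for x in w))
        if beta < 1e-12:  # invariant subspace reached, stop early
            break
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]
    return alphas, betas[:len(alphas) - 1]


def count_below(alphas, betas, x):
    """Number of eigenvalues of the tridiagonal matrix below x (Sturm count)."""
    count, d = 0, 1.0
    for i, a in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = a - x - off / d
        if d == 0.0:
            d = 1e-300
        if d < 0.0:
            count += 1
    return count


def lowest_eigenvalue(alphas, betas, tol=1e-12):
    """Bisection between Gershgorin bounds for the smallest eigenvalue."""
    radius = [(betas[i - 1] if i > 0 else 0.0) +
              (betas[i] if i < len(betas) else 0.0) for i in range(len(alphas))]
    lo = min(a - r for a, r in zip(alphas, radius))
    hi = max(a + r for a, r in zip(alphas, radius))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def laplacian_matvec(v):
    """Toy Hamiltonian: 1D Laplacian with stencil (-1, 2, -1)."""
    n = len(v)
    return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]


alphas, betas = lanczos(laplacian_matvec, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 6)
e0 = lowest_eigenvalue(alphas, betas)
```

For this 6 × 6 toy matrix the exact lowest eigenvalue is $2 - 2\cos(\pi/7)$, which gives a simple check of the sketch; in a production run the callback would be the distributed product discussed above.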

Chapter 4 Results for A = 4−7 Hypernuclei

In this chapter we explore light hypernuclear systems ranging from $^4_\Lambda$He ($A = 4$) to $^7_\Lambda$Li ($A = 7$) using our Fortran-based J-NCSM code. We first explain the extrapolation procedure employed to extract the infinite-model-space binding (and $\Lambda$-separation) energies together with the theoretical uncertainties. In Section 4.2 we carefully study the separation energies $B_\Lambda$ of these light hypernuclei, focusing on the effects of different NN chiral interactions as well as the SRG evolutions. The energy spectrum of $^7_\Lambda$Li is presented in Section 4.3. Intriguing correlations between the $B_\Lambda$ of different systems are discussed in Section 4.4. The impact of various YN (chiral) interaction models on hypernuclear observables is comprehensively investigated in Sections 4.5 and 4.6. The next section, Section 4.7, is devoted to studying possible CSB in the $A = 7$ isotriplet hypernuclei, $^7_\Lambda$Li ($T = 1$), $^7_\Lambda$He and $^7_\Lambda$Be. Finally, we report our J-NCSM results for other interesting quantities like nucleon and hyperon radii, together with NN and YN correlation functions, in Section 4.8. As mentioned earlier, for all calculations presented here the NN and YN potentials in partial waves higher than 6 ($J > 6$) are left out. For simplicity, the electromagnetic NN interactions [110] as well as the point-like Coulomb YN interactions are not included in the SRG evolutions, but only added afterwards. We observed that evolving these interactions changes hypernuclear binding energies by only a few keV.

4.1 Extrapolation of the binding energies

Due to the finite truncation in the single-particle Hilbert space, results from the NCSM calculations depend on the HO frequency $\omega$ as well as on the model space size $N$. In order to obtain converged binding energies and, at the same time, to be able to systematically estimate the numerical uncertainties, we follow a two-step procedure as employed in [83]. The first step is to minimize (eliminate) the HO-$\omega$ dependence. For each model space size $N$, we first calculate the binding energies, $E(\omega, N)$, for a wide range of HO frequencies $\omega$ and then utilize the following ansatz,

$$E(\omega, N) = E_N + \kappa\, (\log(\omega) - \log(\omega_{\mathrm{opt}}))^{2}, \tag{4.1}$$

to extract the lowest binding energy $E_N$ for the considered model space $N$ and the corresponding optimal HO frequency $\omega_{\mathrm{opt}}$. Here, $\kappa$ is a constant to be determined from the parabolic fit for each $E(\omega, N)$. As an example, we show in Fig. 4.1 the HO-$\omega$ dependence of $E(^4_\Lambda\mathrm{He}, 0^+)$ for

[Figure 4.1: two panels, $\lambda_{YN} = 2.00$ fm$^{-1}$ (left) and $\lambda_{YN} = 3.00$ fm$^{-1}$ (right); x-axis: $\omega$ [MeV]; y-axis: $E$ [MeV]; curves for $N$ = 10, 12, 14, 16, 18, 20, 22.]

Figure 4.1: $E(^4_\Lambda\mathrm{He}, 0^+)$ as a function of the HO frequency $\omega$. Solid lines with different colors and markers are the numerical results for different model space sizes $N$. Dashed lines are obtained using the ansatz Eq. (4.1). The calculations are based on the NN Idaho-N³LO(500) potential evolved to $\lambda_{NN} = 1.6$ fm$^{-1}$ and the NLO19 YN potential with a regulator of 600 MeV evolved to two SRG flow values, $\lambda_{YN} = 3.00$ fm$^{-1}$ (right figure) and $\lambda_{YN} = 2.0$ fm$^{-1}$ (left figure).

model space $N$ varying from 10 to 22 with a step of 2, computed at two values of the SRG-YN flow parameter: $\lambda_{YN} = 3.0$ fm$^{-1}$ (right figure) and $\lambda_{YN} = 2.0$ fm$^{-1}$ (left figure). Generally, the optimal frequency $\omega_{\mathrm{opt}}$ corresponding to each model space $N$ becomes smaller when $\lambda_{YN}$ decreases. We further notice that $\omega_{\mathrm{opt}}$ also shifts to smaller values as $N$ increases, and that the $\omega$-dependence of the energy curves for sufficiently large model spaces is practically flat. This basically reflects the intrinsic properties of the HO basis. With increasing $N$, the basis functions contain many more higher-order polynomials that can efficiently describe the high-momentum (short-distance) part of the wavefunction. The HO basis can then afford smaller HO frequencies, so that the resolution at low momentum (large distance) can be improved. We note that a similar trend is observed for all investigated hypernuclei, hinting at good convergence patterns in all these systems. In the second step, the binding energies with minimal $\omega$-dependence, $E_N$, are used for extrapolating to a converged result in infinite model space, assuming an exponential ansatz

$$E_N = E_\infty + A\, e^{-BN}. \tag{4.2}$$

The confidence interval for each $E_N$ in Eq. (4.2) can be determined either from the spread of the energy in the vicinity of $\omega_{\mathrm{opt}}$ or from the slope between two successive energies, $E_N$ and $E_{N+2}$. The latter is widely employed in our calculations. We stress, however, that the two ways of assigning confidence intervals are equivalent and lead to the same results within the numerical uncertainties.
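The first step of the procedure, the parabolic fit of Eq. (4.1) in $\log\omega$, can be sketched as an ordinary least-squares fit of a quadratic polynomial. The Python sketch below is only an illustration: the function names are hypothetical, the fit is unweighted, and the input energies are synthetic values generated from the ansatz itself rather than results from the thesis.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_omega_parabola(omegas, energies):
    """Fit E = E_N + kappa * (log w - log w_opt)^2; return (E_N, w_opt, kappa)."""
    logs = [math.log(w) for w in omegas]
    mean = sum(logs) / len(logs)
    xs = [x - mean for x in logs]  # centring improves the conditioning
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(e * x ** p for x, e in zip(xs, energies)) for p in range(3)]
    c0, c1, c2 = solve3([[s[i + j] for j in range(3)] for i in range(3)], t)
    x_opt = -c1 / (2.0 * c2)
    return c0 - c1 * c1 / (4.0 * c2), math.exp(x_opt + mean), c2


# Synthetic scan generated from Eq. (4.1); the parameter values are made up.
omegas = [12.0, 14.0, 16.0, 18.0, 20.0]
energies = [-10.6 + 0.8 * (math.log(w) - math.log(16.0)) ** 2 for w in omegas]
e_n, omega_opt, kappa = fit_omega_parabola(omegas, energies)
```

Because the test data follow the ansatz exactly, the fit returns the generating parameters; with real scans one would restrict the fit to frequencies close to the minimum.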

The intervals determined in this way serve as weights for each $E_N$ in the model-space fit with the ansatz Eq. (4.2). In Fig. 4.2 we illustrate the model-space extrapolation for $E(^4_\Lambda\mathrm{He}, 0^+)$ for the two chosen SRG cutoffs $\lambda_{YN}$. Here, the red lines are the extrapolated binding energies $E_\infty$, while the shaded areas are the estimated uncertainties, which are taken as the difference between $E_\infty$ and $E_{N_{\max}}$. One clearly sees that, in both cases, the ground-state binding energies $E(^4_\Lambda\mathrm{He})$ calculated using model spaces up to $N_{\max} = 22$ converge very nicely, with the lower SRG cutoff leading to a faster convergence rate (note the different energy scales on the y-axes of the two plots).
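The second step, the weighted fit of Eq. (4.2), can be sketched as follows. For a fixed decay constant $B$ the ansatz is linear in $E_\infty$ and $A$, so the sketch solves that linear problem in closed form and searches over $B$ with a coarse grid followed by a golden-section refinement. This particular search strategy, the weights $1/\sigma^2$, the reuse of the last slope for the largest $N$, and all names are assumptions made for illustration; the data are synthetic.

```python
import math


def fit_for_fixed_b(ns, es, ws, b):
    """Weighted linear least squares for E_inf and A at fixed B."""
    fs = [math.exp(-b * n) for n in ns]
    sw = sum(ws)
    sf = sum(w * f for w, f in zip(ws, fs))
    sff = sum(w * f * f for w, f in zip(ws, fs))
    se = sum(w * e for w, e in zip(ws, es))
    sfe = sum(w * f * e for w, f, e in zip(ws, fs, es))
    a = (sw * sfe - sf * se) / (sw * sff - sf * sf)
    e_inf = (se - a * sf) / sw
    chi2 = sum(w * (e - e_inf - a * f) ** 2 for w, f, e in zip(ws, fs, es))
    return chi2, e_inf, a


def extrapolate(ns, es, sigmas, b_min=0.01, b_max=2.0, steps=200):
    """Return (E_inf, A, B, uncertainty); ns must be sorted in ascending order."""
    ws = [1.0 / (s * s) for s in sigmas]

    def chi2(b):
        return fit_for_fixed_b(ns, es, ws, b)[0]

    h = (b_max - b_min) / steps
    best = min((b_min + i * h for i in range(steps + 1)), key=chi2)
    lo, hi = max(best - h, b_min), min(best + h, b_max)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):  # golden-section refinement around the grid minimum
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if chi2(c) < chi2(d):
            hi = d
        else:
            lo = c
    b = 0.5 * (lo + hi)
    _, e_inf, a = fit_for_fixed_b(ns, es, ws, b)
    # Uncertainty as in the text: difference between E_inf and E at N_max.
    return e_inf, a, b, abs(e_inf - es[-1])


# Synthetic energies generated from Eq. (4.2); the parameter values are made up.
ns = [10, 12, 14, 16, 18, 20, 22]
es = [-10.7 + 2.0 * math.exp(-0.25 * n) for n in ns]
sigmas = [abs(es[i + 1] - es[i]) for i in range(len(es) - 1)]
sigmas.append(sigmas[-1])  # assumption: reuse the last slope for N_max
e_inf, amp, b_fit, unc = extrapolate(ns, es, sigmas)
```

Since the slopes shrink with increasing $N$, this weighting gives the largest model spaces the strongest influence on $E_\infty$.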

[Figure 4.2: two panels, $\lambda_{YN} = 2.00$ fm$^{-1}$ (left) and $\lambda_{YN} = 3.00$ fm$^{-1}$ (right); x-axis: model space size $N$ from 10 to 22; y-axis: $E(^4_\Lambda\mathrm{He})$ [MeV].]

Figure 4.2: $E(^4_\Lambda\mathrm{He}, 0^+)$ as a function of the model space size $N$. The solid line is the $N$-extrapolated result. The red line with shaded area indicates the converged result and its uncertainty. The calculations are based on the Idaho-N³LO(500) interaction evolved to $\lambda_{NN} = 1.6$ fm$^{-1}$ and the NLO19 YN potential with a regulator of 600 MeV evolved to two SRG flow values, $\lambda_{YN} = 3.00$ fm$^{-1}$ (right figure) and $\lambda_{YN} = 2.0$ fm$^{-1}$ (left figure).

In hypernuclear physics, a more interesting quantity is, however, the so-called $\Lambda$-separation energy, $B_\Lambda$, which is defined as the difference between the binding energies of a hypernucleus and of the corresponding parent nucleus. Thus, $B_\Lambda(^4_\Lambda\mathrm{He})$ can be calculated as

$$B_\Lambda(^4_\Lambda\mathrm{He}) = E(^3\mathrm{He}) - E(^4_\Lambda\mathrm{He}). \tag{4.3}$$

Following the definition Eq. (4.3), one can in principle compute the separation energy for each $\omega$ and $N$,

$$B_\Lambda(^4_\Lambda\mathrm{He}, \omega, N) = E(^3\mathrm{He}, \omega, N) - E(^4_\Lambda\mathrm{He}, \omega, N), \tag{4.4}$$

and then employ the two-step procedure described above to extrapolate the converged $B_\Lambda$. We have, however, observed that, for each model space size $N$, the useful ranges of $\omega$, and hence the optimal frequencies $\omega_{\mathrm{opt}}$, for the nuclear core $^3$He and the hypernucleus $^4_\Lambda$He are not the same. It is therefore advisable to eliminate the $\omega$-dependence of the binding energies of $^3$He and $^4_\Lambda$He separately. After that, one computes $B_\Lambda(N)$ for every model space $N$,

$$B_\Lambda(^4_\Lambda\mathrm{He}, N) = E(^3\mathrm{He}, N) - E(^4_\Lambda\mathrm{He}, N), \tag{4.5}$$

and employs the ansatz Eq. (4.2) to extract the converged result $B_\Lambda(^4_\Lambda\mathrm{He})$ in infinite model space together with its uncertainty. For demonstration, we also show in Fig. 4.3 the model-space extrapolation of the separation energy in $^4_\Lambda$He. As expected, evolving the YN potential to low SRG cutoffs indeed speeds up the convergence of the calculations significantly. When comparing Figs. 4.2 and 4.3, we also notice a faster convergence rate of $B_\Lambda(^4_\Lambda\mathrm{He})$ than that of the binding energy $E(^4_\Lambda\mathrm{He})$.
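The order of operations behind Eq. (4.5), first removing the $\omega$-dependence of core and hypernucleus separately and only then subtracting, can be illustrated with a short Python sketch. Taking the minimum over a discrete $\omega$ scan is used here as a crude stand-in for the parabolic fit of Eq. (4.1); the function names are hypothetical and the numbers are synthetic, chosen only so that the two optimal frequencies differ.

```python
def optimized_energy(scan):
    """Lowest energy of an omega scan {omega: E} and the omega where it occurs."""
    omega_opt = min(scan, key=scan.get)
    return scan[omega_opt], omega_opt


def separation_energy(core_scan, hyper_scan):
    """B_Lambda(N) from separately omega-optimized energies, as in Eq. (4.5)."""
    e_core, w_core = optimized_energy(core_scan)
    e_hyper, w_hyper = optimized_energy(hyper_scan)
    return e_core - e_hyper, w_core, w_hyper


# Synthetic energies in MeV for one model space N (not thesis results).
core_scan = {14: -7.60, 16: -7.66, 18: -7.64}
hyper_scan = {14: -10.58, 16: -10.62, 18: -10.65}
b_lambda, w_core, w_hyper = separation_energy(core_scan, hyper_scan)

# For comparison: subtraction at a common omega, as in Eq. (4.4).
b_same_omega = {w: core_scan[w] - hyper_scan[w] for w in core_scan}
```

In this toy example the core is optimal at $\omega = 16$ MeV and the hypernucleus at $\omega = 18$ MeV, so none of the common-$\omega$ differences coincides with the separately optimized value.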

It should be stressed that, while the binding energies converge strictly monotonically (variational principle), this is not necessarily true for $B_\Lambda$, especially in larger systems like $^7_\Lambda$Li. Nevertheless, one will see later that the separation energies always converge faster than the individual binding energies of the hypernucleus and of the corresponding nuclear core. In many cases, one can even use a straight line instead of

[Figure 4.3: two panels, $\lambda_{YN} = 2.00$ fm$^{-1}$ (left) and $\lambda_{YN} = 3.00$ fm$^{-1}$ (right); x-axis: model space size $N$ from 10 to 22; y-axis: $B_\Lambda$ [MeV].]

Figure 4.3: $B_\Lambda(^4_\Lambda\mathrm{He}, 0^+)$ as a function of the model space size $N$. Same descriptions of lines and symbols as in Fig. 4.2.

the exponential decay function in Eq. (4.2) for extrapolating $B_\Lambda$. Let us further emphasize that, although the described procedure is rather expensive, it allows for a systematic and very reliable extraction of the final results of the NCSM calculations presented in this thesis. Importantly, within the Jacobi-basis formalism such a robust extrapolation scheme is feasible and yields plausible results for light p-shell hypernuclei, as one will see in the coming sections.
