• Keine Ergebnisse gefunden

3. The BTDFT program set 13

3.3. Grid setup, parallelization, and algorithms

3.3.1. The real-space grid and grid parallelization

Grid setup The BTDFT real-space grid (see figure 3.2a) is centered around the origin of the coordinate system and framed by a boundary ellipsoid with the half-axes ax, ay, and az in x, y, and z-direction. The grid points are distributed equidistantly with a grid spacing of∆xin all three coordinate directions with one grid point at the origin. Their coordinates can therefore be described by a 3D indexi= (ix, iy, iz)∈N3 through

ri=i∆x with X

γ=x,y,z

i2γ

a2γ(∆x)2 <1. (3.1) The total number of grid points isNgrid.

The maximum z-index (iz) is calledmz. Therefore,izranges within−mz ≤iz≤mz. The maximum y-index (iy) in each xy-plane with z-index iz is called my(|iz|). iy

runs within −my(|iz|) ≤ iy ≤ my(|iz|) in this xy-plane (see figure 3.2b). Finally, the maximum x-index (ix) in each x-row with z-index iz and y-index iy is called mx(|iy|,|iz|). In this x-row, ix is defined within −mx(|iy|,|iz|) ≤ ix ≤ mx(|iy|,|iz|).

The whole grid shape can therefore be described by mz, my(|iz|) and mx(|iy|,|iz|) with maximally possible indices ofmz,my(0), andmx(0,0)in x-, y-, and z-direction, respectively.

A function of space f(r) is represented by its function values at the grid points f(i∆x). Due to the ellipsoidal shape of the grid the function values are stored in a one-dimensional (1D) value array with an 1D index n = 1, . . . , Ngrid ∈ N, i.e., fn =f(i∆x). In view of the later performance engineering (section 3.4) the one-to-one mapping between the 1D indexnand the 3D indexiis kept as simple as possible

(see figure 3.2): n always runs from the most negative index to the most positive one with the z-index being the most slowly varying one and the x-index being the fastest varying one. That means thatnruns, one after another, through all xy-planes starting with the one at the smallest z-index. In each xy-plane, n runs, one after another, through all x-rows starting with the one at the smallest y-index. In each x-row, nagain runs from the smallest to the largest x-index.6

Grid parallelization and halo communication TheNgrid grid points are distributed evenly among a number of MPI processes such that each process gets a subgrid that is contiguous in the 1D index. This is exemplified in figure 3.3. Both subfigures show the decomposition of the grid into three subgrids, once from the perspective of the 1D value arrays and once for the real-space grid (as two-dimensional (2D) sketch). The white and blue highlighted grid points or array elements belong to a process’s own subgrid.

The MPI processes frequently need information from each other and therefore send and receive messages. One of the most common situations where this happens is a spatial integral of some function that is defined on the real-space grid. In this case, the processes perform the integration locally in their own subgrids and subsequently sum up the local values using a collective MPI reduction.

BTDFT spends most of the computation time on the application of the Hamiltonian or the Laplacian, which are represented by finite differences of up to6thorder (Norder = 6). In order to apply the finite-differences Laplacian to a function at a specific grid point, the function’s values at the Norder/2 neighboring grid points are required in each of the six directions. This is shown in figure 3.3b. At the boundary of a process’s subgrid there are boundary layers (blue highlighted in figure 3.3) that must be sent to the processes that contain the neighboring subgrids. Consequently, each process needs additional layers of grid points that store the function values received from processes with neighboring subgrids. These are called halo or ghost layers [HW10, §5.2.1] (red highlighted in figure 3.3) and must be updated each time before a finite differences operator is applied. The latter is called halo communication and indicated with green arrows in figure 3.3.

The grid structure chosen above ensures that the halo layers are almost7 contiguous in the 1D index. Since BTDFT spends much computation time in the application of the Laplacian or the Hamiltonian, the halo layers are directly attached to all arrays in BTDFT that require halo communication. When performing local operations on those arrays, such as the multiplication with a number, the halo layers should, however, be excluded8 since this would cause a lot of overhead.

In addition, figure 3.3a shows one additional array element at the lower end of the 1D arrays. This element contains the value of the respective function outside of the

6In BTDFT the maps are stored in the index arraysidx (3D→1D) andkx,ky, andkz (1D→3D) in the t_grid structure.

7If the grid layers at the surface between two subgrids do not match perfectly, there appear single grid points that are not required by the respective neighboring process and do not need to be sent. Sending the pure halo layer without those values as a single message is still possible through derived MPI types and non-contiguous buffers [MPI12, §4]. Since this usually affects only a view single, isolated grid points in the boundary layers, this has no practical relevance.

8The process-specific 1D indices that define a process’ own range and the halo layers are explained in appendix A.3.2 in more detail.

1 2 3

0

0

0

Ngrid

... ... ... ... ... ... ... ... ... ...

Proc 0

}

Halo n

}

Halo

}

Halo

}

Halo

Proc 1

Proc 2 outside

(a) 1D value arrays on the three processes.

outside

Proc 1

Proc 0 Proc 2

Halo

Halo

(b) Decomposition of a 2D elliptical grid. The contiguous 1D index is indicated by the red line in the topmost subgrid.

Figure 3.3: BTDFT grid parallelization and halo communication among three pro-cesses. White and blue highlighted regions belong to a process’s own sub-grid, red highlighted regions are the halo layers that contain copies of the blue highlighted data from the respective neighboring process. The green arrows indicate the corresponding halo communication.

x y z

Figure 3.4:6th order finite differences stencil in three dimensions. The Laplacian is applied at the centered grid point. The function values at three neighboring grid points in each direction, connected by the greed line, are required.

grid and should contain 0 for the zero boundary conditions9. If the index array that converts a 3D index into its corresponding 1D index is applied to a 3D index outside of the grid, the resulting 1D index points to this element.