• Keine Ergebnisse gefunden

7.3 Filtering with Domain Decomposition

7.3.4 SEIK

As in the case of mode-decomposed ensemble and mode matrices, the analysis algo-rithm of the SEIK filter for domain-decomposition is very similar to that of the SEEK filter. The parallel SEIK analysis algorithm for domain-decomposition is shown as algorithm 7.9. Again we discuss the differences to the SEEK algorithm.

For domain-decomposition, a process knows the full state ensemble for its local do-main. Thus, the computation of ensemble means does not require any MPI operations.

Accordingly, the product of matrixT1p with matrixT in line 7 involves no communi-cations of data. The same is true for the computation of the ensemble mean in line 13 and the application of T to t5 in line 20. Due to this, the amount of communicated data is equal for the analysis algorithms of SEEK and SEIK in the case of domain-decomposition. The algorithm contains several operations which are executed without parallelization. These are the initializations of G and Uinv, the solver step for t5, and the computation of t6. Most costly will be the solver step for t5in line 19, since it involves the inversion ofUinv Rr×r. These operations, together with the required communication operations, will limit the parallel efficiency of the domain-decomposed analysis. The parallel efficiency will be, however, better than in the case of mode-decomposition, since there the amount of communicated data is much higher than for domain-decomposition.

For domain-decomposed states, the resampling algorithm of SEIK, shown as algo-rithm 7.10, has the benefit that no communication operations are required at all. The operations on the small r×r and r ×(r+ 1) matrices are performed equally by all processes. They can be expected to require negligible time compared with the com-putation of the new ensemble states. The operations on the ensemble matrix are fully parallelized. Hence, the domain-decomposed resampling algorithm of SEIK can be ex-pected to show a nearly ideal speedup. To reduce the required memory, we implement the ensemble transformation in line 11 using a block formulation. It is analogous to the block structure described for the SEEK resampling algorithm.

7.3.5 Comparison of Communication and Memory Require-ments

Table 7.3 summarizes the size of the communicated arrays in the domain-decomposed filter algorithms. The numbers assume that no communication is performed in the implementation of the measurement operator and in the subroutine RinvA.

Since we have usually n À m > N, r for realistic large scale models, it is obvi-ous from table 7.3, that with domain decomposition significantly less data has to be communicated between processes. The smallest amount is in the SEIK algorithm. Its analysis algorithm communicates only two arrays of sizesr×r and r. The resampling algorithm of SEIK is even executed without any communication of data. The largest amount will be in the EnKF algorithm, since here arrays involving the dimension m are communicated.

Comparing the mode-decomposed algorithms (7.1 to 7.5) with the algorithms using domain decomposition (7.6 to 7.10), the smaller memory requirements of the domain-decomposed filter algorithms become visible. Using domain-decomposition, all arrays

Subroutine SEIK Analysis Domain(step,np,N,xp,Uinv,Xp) int step {time step counter,input}

int np {state dimension on local domain, input} int N {ensemble size, input}

real xp(np) {local state estimate, output}

real Uinv(r, r) {inverse eigenvalue matrix, output} real Xp(np, N) {local ensemble matrix, input/output} real t4,t5,t6,G,∆Uinv,yp,dp, {fields to be allocated} real T1p,T2p,T3p,t4p,∆Uinvp {fields to be allocated} int mp {dimension of local observation vector}

int i {ensemble loop counter}

int r {rank of covariance matrix, r=N 1}

1: call Get Dim Obs(step, mp) {get observation dimension, user supplied}

2: Allocate fields: t4(r),t5(r),t6(N),G(r, r),∆Uinv(r, r),yp(mp),dp(mp),

3: T1p(mp, N),T2p(mp, r),T3p(mp, r),t4p(r),∆Uinvp(r, r)

4: for i=1,N do

5: call Measurement Operator(step, np, mp,Xp(:, i),T1p(:, i)) {local domain}

6: end for

7: T2p T1p T {implemented with T as operator}

8: call RinvA(step, m, r,T2p,T3p) {operate only on local domain}

9: G(N−1(TT T)−1) {by each process; implemented as direct initialization}

10: ∆Uinvp T2pTT3p {matrix-matrix product of type 3}

11: allreduce summation of∆Uinv from∆Uinvp {global MPI operation}

12: UinvG+∆Uinv {by each process}

13: xp ←N−1PN

i=1Xp(:, i) {get ensemble mean state for local domain}

14: call Measurement Operator(step, np, mp,xp,dp) {user supplied}

15: call Measurement(step, mp,yp) {user supplied}

16: dp ypdp

17: t4p T3pTdp {matrix-matrix product of type 3}

18: allreduce summation oft4 from t4p {global MPI operation}

19: solve Uinv t5=t4for t5 {by each process}

20: t6T t5 {implemented with T as operator}

21: xp xp+Xp t6 {matrix-vector product of type 2}

22: De-allocate local analysis fields

Algorithm 7.9: Structure of the parallel filter analysis routine for the SEIK algorithm for domain decomposed states. The arrays T2p and G are not allocated but stored respectively in T1p and Uinv. Analogously, the contents of t5 are stored in t4.

Subroutine SEIK Resample Domain(np,N,xp,Uinv,Xp) int np {state dimension on local domain, input} int N {ensemble size, input}

real xp(np) {state analysis vector, input}

real Uinv(r, r) {inverse eigenvalue matrix, input} real Xp(np, N) {ensemble matrix, input/output} real T1,T2,ΩT,C,T3p {fields to be allocated} int r {rank of covariance matrix, r =N 1}

1: Allocate fields: T1(r, N),T2(N, N),T(r, N),C(r, r),T3p(np, N)

2: Cholesky decomposition: Uinv =C CT {by each process}

3: initialize T {by each process}

4: solve CTT1=T for T1 {by each process}

5: T2T T1 {implemented withT as operator}

6: for i=1,N do

7: T3p(:, i)Xp(:, i)

8: Xp(:, i)xp

9: end for

10: Xp Xp+N1/2 T3p T2 {matrix-matrix product of type 2}

11: De-allocate local analysis fields

Algorithm 7.10: Structure of the parallel resampling routine for the SEIK algorithm for domain decomposed states. The matrix T1 is never allocated in the program. Its contents are stored inT. Lines 6 to 10 can be implemented with a block formulation.

Then only some rows of T3p are allocated.

Table 7.3: Sizes of arrays involved in global MPI operations in the analysis and re-sampling phases of the SEEK and SEEK algorithms and in the analysis phase of the EnKF algorithm for domain-decomposed states. Next to the matrix size, the name of the matrix is given as well as the information whether the MPI operation is an allgather (g) or allreduce (r) operation.

EnKF SEEK SEIK

analysis mN (T1, g) r2 (∆Uinv, r) r2 (∆Uinv, r) mN (D, g) r (t3, r) r (t4, r) 1 (m, r)

resampling r2 (T1, r)