Summary and Conclusions - Parallel Filter Algorithms for Data Assimilation in Oceanography

A framework for parallel data assimilation based on Kalman filter methods was introduced. The framework is based on a clear separation between the model, the filter, and the observational parts. As a result, only minimal changes to existing model source code are required when a data assimilation system is implemented using the filter framework. With the framework, an application program interface was introduced which defines the calling structure of the framework routines that are called by the model. The interfaces to user-supplied routines are also defined. These are, e.g., routines related to the observations, or routines that transfer the state vectors used in the filter algorithms to model fields and vice versa. The interface makes it easy to switch between different filter algorithms. In addition, the model and filter source codes can be changed independently.
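The separation described above can be illustrated with a minimal sketch. All names here (`UserRoutines`, `analysis_step`, `nudge_to_obs`, `no_update`) are hypothetical and are not the framework's actual interface. The "filters" are toy stand-ins, not SEEK, EnKF, or SEIK, and the nudging filter assumes that the observed quantities are the leading elements of the state vector.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = List[float]
Fields = Dict[str, float]


@dataclass
class UserRoutines:
    """User-supplied callbacks (hypothetical names)."""
    fields_to_state: Callable[[Fields], State]         # model fields -> state vector
    state_to_fields: Callable[[State, Fields], None]   # state vector -> model fields
    observe: Callable[[State], List[float]]            # measurement operator


def no_update(ensemble, obs, user):
    """Toy filter: returns a copy of the ensemble unchanged."""
    return [list(x) for x in ensemble]


def nudge_to_obs(ensemble, obs, user, gain=0.5):
    """Toy filter: moves each member towards the observations.

    Assumes observation i corresponds to state element i.
    """
    updated = []
    for x in ensemble:
        innovation = [y - hx for y, hx in zip(obs, user.observe(x))]
        new_x = list(x)
        for i, d in enumerate(innovation):
            new_x[i] += gain * d
        updated.append(new_x)
    return updated


def analysis_step(model_tasks, obs, analysis, user):
    """Framework side: collect states, run the chosen filter, write back."""
    ensemble = [user.fields_to_state(f) for f in model_tasks]
    for fields, x in zip(model_tasks, analysis(ensemble, obs, user)):
        user.state_to_fields(x, fields)


# Model side: two ensemble members with sea surface height and temperature.
user = UserRoutines(
    fields_to_state=lambda f: [f["ssh"], f["temp"]],
    state_to_fields=lambda x, f: f.update(ssh=x[0], temp=x[1]),
    observe=lambda x: [x[0]],  # only sea surface height is observed
)
tasks = [{"ssh": 0.0, "temp": 10.0}, {"ssh": 2.0, "temp": 12.0}]
analysis_step(tasks, [1.0], nudge_to_obs, user)
```

Switching the filter only changes the `analysis` argument passed to `analysis_step`. Neither the model fields nor the user routines need to be touched, which is the point of the separation.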

Table 8.1: Advantages (+) and drawbacks (−) of the frameworks for the two different process configurations.

| One process set for filter and model | Disjoint process sets |
| --- | --- |
| − allocation of sub-ensemble on one process of each model task | + allocation of a single state vector on one process of each model task |
| − allocation of filter fields on those model processes which are also filter processes | + allocation of filter fields on processes separate from the model processes |
| + no additional processes required for the filter part | − processes additional to the model processes are necessary for the filter part |
| + reduced amount of communication if the number of model tasks equals the number of filter processes | − high amount of communication, since each model state vector has to be communicated between filter and model processes |
| + model grid information allocated also on filter processes | − model grid information not allocated on filter processes |
| − load balancing of the forecast by a priori specification of sub-ensemble sizes | + flexible load balancing due to communication of single model state vectors |
| − inflexible possibilities of process configurations to achieve good load balance | + flexible choice of process configurations; model and filter can even be executed on different computers |

The framework was introduced for two different process configurations. Either the filter is executed by some of the model processes (denoted below as joint process sets), or the filter and model parts are executed by disjoint process sets. Both variants can handle domain-decomposed state vectors as well as a parallelization which decomposes the ensemble or mode matrices over the modes. The advantages and drawbacks of the two process configurations are summarized in Table 8.1.

A major drawback of the configuration using joint process sets is that at least a part of the ensemble or mode matrix has to be allocated on one process of each model task. This can considerably increase the memory requirements of these processes, which also hold the fields needed by the model. In addition, fields which are required for the analysis and resampling phases of the filters are allocated on those processes which are also filter processes. These memory requirements can be critical if the computer used for the data assimilation computations imposes strong memory limitations. The issue of memory requirements is minor for the case of disjoint process sets. Here only a single state vector is allocated on a single process of each model task. The fields which are required for the filter operations are allocated on the filter processes, which are separate from the model processes.
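The difference in memory load can be made concrete with a rough estimate. The sketch below counts only the state vectors held by the single process of a model task that takes part in the exchange. It assumes double precision, an even distribution of the ensemble over the model tasks, and it ignores the model fields and the additional filter fields. The function names and the numbers in the example are hypothetical.

```python
import math

BYTES_PER_VALUE = 8  # assumption: double precision


def joint_task_bytes(state_dim, ensemble_size, n_model_tasks):
    """Joint process sets: one process per model task holds a sub-ensemble."""
    sub_ensemble = math.ceil(ensemble_size / n_model_tasks)
    return sub_ensemble * state_dim * BYTES_PER_VALUE


def disjoint_task_bytes(state_dim):
    """Disjoint process sets: one process per model task holds one state."""
    return state_dim * BYTES_PER_VALUE


# Hypothetical example: 10^6 state elements, 32 members, 4 model tasks.
joint = joint_task_bytes(1_000_000, 32, 4)
disjoint = disjoint_task_bytes(1_000_000)
ratio = joint // disjoint  # size of the sub-ensemble, here 8
```

Under these assumptions the extra memory on the affected model process grows linearly with the sub-ensemble size, while it stays fixed at one state vector for disjoint process sets.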

An advantage of the configuration using joint process sets is that the execution of the filter does not require additional processes. All processes of the program are used for model evaluations. In contrast, the configuration using disjoint process sets requires additional processes for the filter part of the program, besides the processes performing the model evaluations. During the forecast phase, these processes only send control information for the forecast and communicate state vectors. For large-scale ocean models, the forecast of a state vector takes significantly longer than the communication between the filter and model processes, so the filter processes are idle most of the time.

Besides the requirement of additional processes for the filter, the configuration with disjoint process sets communicates more data than the variant using joint process sets. This is because all ensemble state vectors which have to be evolved need to be sent from the filter processes to the model processes and vice versa. For a parallelization using mode-decomposed matrices, the least amount of communication is required in the case of joint process sets if the number of filter processes equals the number of model tasks. In this situation, a sub-ensemble is allocated on each filter process. The communication reduces to the amount which is necessary to distribute the state information to all processes in a model task. For domain-decomposed states, the amount of communication between filter and model can be reduced to zero if the configuration of joint process sets and a single model task is used.
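For disjoint process sets, the data volume exchanged between filter and model follows directly from the statement that every ensemble state is sent to a model task and returned after the forecast. The sketch below is a back-of-the-envelope count under the assumptions of double precision and negligible control information. The function name and the example numbers are hypothetical.

```python
BYTES_PER_VALUE = 8  # assumption: double precision


def disjoint_transfer_bytes(state_dim, ensemble_size, n_forecast_phases=1):
    """Bytes exchanged between filter and model processes.

    Each ensemble member is sent filter -> model before its forecast and
    model -> filter afterwards, hence the factor 2.
    """
    values = 2 * ensemble_size * state_dim * n_forecast_phases
    return values * BYTES_PER_VALUE


# Hypothetical example: 10^6 state elements, 32 members, one forecast phase.
volume = disjoint_transfer_bytes(1_000_000, 32)
```

With joint process sets, domain-decomposed states, and a single model task, the corresponding exchange between filter and model is zero, as stated above.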

A further potential advantage of the configuration using joint process sets lies in the fact that the information on the model grid is also allocated on the filter processes. This can be beneficial, e.g., for the implementation of the measurement operator if it requires information on the spatial positions of observations and the elements of the state vector. In the case of disjoint process sets, this information has to be initialized separately from the model.

In addition to reduced memory requirements, the configuration using disjoint process sets is significantly more flexible in the configuration of the MPI communicators. Since only single model states are communicated between filter and model tasks, possible differences in the speed of different model tasks are easily balanced by evolving more states with the faster model tasks than with the slower ones. This flexibility cannot be achieved with joint process sets. Due to the strong separation of filter and model, the configuration using disjoint process sets even permits executing the filter part of the program on a different computer than the model tasks. It is also possible to execute model tasks on different computers or to compute forecasts concurrently using different models.
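The load-balancing argument can be illustrated with a small simulation. The sketch below compares a fixed, equal split of the ensemble over the model tasks with handing out single states to whichever task becomes free first. It assumes a constant forecast time per state for each task and neglects communication time. The equal split stands for sub-ensemble sizes chosen without knowledge of the task speeds. All function names are hypothetical.

```python
import heapq


def static_makespan(n_states, task_times):
    """Forecast time when sub-ensemble sizes are fixed in advance (equal split)."""
    n_tasks = len(task_times)
    base, extra = divmod(n_states, n_tasks)
    sizes = [base + (1 if i < extra else 0) for i in range(n_tasks)]
    return max(n * t for n, t in zip(sizes, task_times))


def dynamic_schedule(n_states, task_times):
    """Forecast time when each state goes to the task that is free first."""
    free_at = [(0.0, i) for i in range(len(task_times))]
    heapq.heapify(free_at)
    counts = [0] * len(task_times)
    for _ in range(n_states):
        t, i = heapq.heappop(free_at)       # earliest available task
        counts[i] += 1
        heapq.heappush(free_at, (t + task_times[i], i))
    makespan = max(t for t, _ in free_at)   # time at which the last task finishes
    return makespan, counts


# Hypothetical example: 12 states, task 0 needs 1 time unit per state, task 1 needs 2.
fixed = static_makespan(12, [1.0, 2.0])
balanced, per_task = dynamic_schedule(12, [1.0, 2.0])
```

In this example the faster task evolves twice as many states as the slower one under dynamic assignment, and both tasks finish at the same time.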

In conclusion, this comparison showed that neither the configuration with joint process sets nor the configuration using disjoint process sets for the filter and model parts of the program is clearly preferable. The variant with joint process sets should be preferred if the computer memory is large enough to store sub-ensembles as well as the fields required for the filter analysis and resampling algorithms on the same processes as the model fields. Joint process sets allow all available processes to be used for the model evaluations and reduce the amount of communicated data. If it is not possible to store the filter fields on the same processes as the model fields, the variant using disjoint process sets for filter and model is preferred. This variant should also be chosen if multiple computers are to be used to solve the data assimilation problem.

Filtering Performance and Parallel Efficiency

9.1 Introduction

The parallel filtering framework developed in chapter 8 has been implemented with the Finite Element Ocean Model (FEOM) [12]. The implementation also includes the parallelized filter algorithms developed in chapter 7. FEOM is parallelized using MPI. Mainly the solver step, required for the implicit time stepping scheme of FEOM, is performed in parallel. The model state fields have to be fully allocated and initialized by all model processes.

The data assimilation system, which is obtained by combining FEOM and the filtering framework, is used to study the parallel efficiency of the framework and of the filter algorithms. In addition, the filtering performance of the three error subspace Kalman filters is analyzed on the basis of twin experiments. These experiments extend the twin experiments performed in chapter 4 to a 3-dimensional test case. The data assimilation experiments are performed with an idealized configuration of FEOM using a rectangular grid. Synthetic observations of the sea surface height are assimilated.

The major properties of the finite element model FEOM are described in section 9.2. Subsequently, in section 9.3, the configuration of the twin experiments is described in detail. The filtering performance of the three error subspace Kalman filters SEEK, EnKF and SEIK is examined in section 9.4. Here the ability of the filter algorithms to accurately estimate the 3-dimensional model fields is studied. The parallel efficiency of the framework and the filter algorithms is finally assessed in section 9.5.
