    A regulated localization method for ensemble-based Kalman filters
    Abstract:
    Data assimilation applications with large-scale numerical models exhibit extreme requirements on computational resources. Good scalability of the assimilation system is necessary to make these applications feasible. Sequential data assimilation methods based on ensemble forecasts, like ensemble-based Kalman filters, provide such good scalability, because the forecast of each ensemble member can be performed independently. However, this parallelism has to be combined with the parallelization of both the numerical model and the data assimilation algorithm. In order to simplify the implementation of scalable data assimilation systems based on existing numerical models, the Parallel Data Assimilation Framework PDAF (http://pdaf.awi.de) has been developed. PDAF provides support for implementing a data assimilation system with parallel ensemble forecasts and parallel numerical models. Further, it includes several optimized parallel filter algorithms, like the Ensemble Transform Kalman Filter. We will discuss the philosophy behind PDAF as well as features and scalability of data assimilation systems based on PDAF on the example of data assimilation with the finite element ocean model FEOM.
    Keywords:
    Ensemble forecasting
    Ensemble Learning
    Assimilation (phonology)
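The abstract above attributes the good scalability of ensemble-based filters to the fact that each ensemble member can be forecast independently. The following is a minimal sketch of that idea, not PDAF code: it assumes the Lorenz-96 toy model, a fourth-order Runge-Kutta integrator, and a thread pool from the Python standard library. The function names `lorenz96_tendency`, `forecast_member`, and `forecast_ensemble` are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def lorenz96_tendency(x, forcing=8.0):
    """Lorenz-96 tendency: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F."""
    n = len(x)
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
            for i in range(n)]


def forecast_member(state, n_steps=100, dt=0.01):
    """Integrate one ensemble member with RK4; needs no data from other members."""
    x = list(state)
    for _ in range(n_steps):
        k1 = lorenz96_tendency(x)
        k2 = lorenz96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
        k3 = lorenz96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
        k4 = lorenz96_tendency([xi + dt * k for xi, k in zip(x, k3)])
        x = [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x


def forecast_ensemble(ensemble, max_workers=4):
    """Forecast all members concurrently; the order of members is preserved."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(forecast_member, ensemble))


# Illustrative ensemble: 4 members, 8 variables, perturbed in the first variable.
ensemble = []
for k in range(4):
    member = [8.0] * 8
    member[0] += 0.01 * (k + 1)
    ensemble.append(member)

forecasts = forecast_ensemble(ensemble)
```

Threads are used here only to show that the member forecasts share no state. In CPython they will not speed up this pure-Python loop, and a production system would distribute members across processes or compute nodes and combine this with the parallelization of the model itself, as the abstract describes.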
    Data assimilation applications with high-dimensional numerical models exhibit extreme requirements on computational resources. Good scalability of the assimilation system is necessary to make these applications feasible. Sequential data assimilation methods based on ensemble forecasts, like ensemble-based Kalman filters, provide such good scalability, because the forecast of each ensemble member can be performed independently. However, this parallelism has to be combined with the parallelization of both the numerical model and the data assimilation algorithm. In order to simplify the implementation of scalable data assimilation systems based on existing numerical models, the Parallel Data Assimilation Framework PDAF [http://pdaf.awi.de] has been developed. PDAF is suitable for educational use with toy models but also for high-dimensional applications and operational use. PDAF is distributed as open source software. PDAF provides a framework for implementing a data assimilation system with parallel ensemble forecasts and parallel numerical models. For maximum efficiency, a single assimilation program can be built that includes both the model and the analysis step. A well-defined interface connects PDAF to the model as well as to the observations. To compute the analysis, PDAF provides several optimized parallel filter algorithms and smoothers. Included are ensemble filters like the Local Ensemble Transform Kalman Filter (LETKF) and the Error Subspace Transform Kalman Filter (ESTKF). We discuss the philosophy behind PDAF as well as features and scalability of data assimilation systems based on PDAF on the example of data assimilation with the finite element ocean model FEOM.
    Ensemble Learning
    Citations (0)
    Ensemble filter algorithms can be implemented in a generic way such that they can be applied with various models with only a minimum amount of recoding. This is possible because ensemble filters can operate on abstract state vectors and require only limited information about the numerical model and the observational data used for a data assimilation application. To build an assimilation system, the analysis step of a filter algorithm needs to be connected to the numerical model. Furthermore, ensemble integrations have to be enabled. The Parallel Data Assimilation Framework PDAF has been developed to provide these features: It is a generic framework that allows one to extend a numerical model with a filter to build an ensemble data assimilation system with minimal changes to the model code. PDAF also provides a selection of common ensemble Kalman filter algorithms. As the computational cost of ensemble data assimilation is a multiple of that of a pure forward model, the framework and the filter algorithms are parallelized and support parallelized models. Thus, data assimilation with high-dimensional numerical models is feasible. PDAF is coded in Fortran and available as free software (http://pdaf.awi.de). We discuss the features of PDAF and the parallel computing performance of data assimilation systems based on PDAF using the example of data assimilation with the finite element ocean model FEOM.
    Ensemble Learning
    Ensemble forecasting
    Fortran
    Citations (1)
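The abstract above states that ensemble filters can operate on abstract state vectors and need only limited information about the model and the observations. The sketch below illustrates that separation under stated assumptions. It is not PDAF's interface or one of its filters: it uses the serial ensemble square root filter update for a single scalar observation (Whitaker and Hamill, 2002) as a stand-in, and the function `ensrf_update` and its `observe` callback are hypothetical names. The filter sees the state only as a list of numbers and learns about the observation only through the callback.

```python
import math


def ensrf_update(ensemble, observe, y_obs, obs_var):
    """Assimilate one scalar observation into an ensemble of state vectors.

    ensemble: list of state vectors (lists of floats), one per member
    observe:  callback mapping a state vector to the observed scalar
    y_obs:    observed value
    obs_var:  observation error variance
    """
    n_mem = len(ensemble)
    n_state = len(ensemble[0])

    # Ensemble mean and perturbations of the state.
    mean = [sum(m[i] for m in ensemble) / n_mem for i in range(n_state)]
    perts = [[m[i] - mean[i] for i in range(n_state)] for m in ensemble]

    # Observed ensemble: the only place where observation details enter.
    hx = [observe(m) for m in ensemble]
    hx_mean = sum(hx) / n_mem
    hx_pert = [h - hx_mean for h in hx]

    # Sample variance of the observed quantity and its covariance with the state.
    hx_var = sum(p * p for p in hx_pert) / (n_mem - 1)
    cov = [sum(perts[k][i] * hx_pert[k] for k in range(n_mem)) / (n_mem - 1)
           for i in range(n_state)]

    # Kalman gain for the mean, reduced gain (factor alpha) for the perturbations.
    gain = [c / (hx_var + obs_var) for c in cov]
    alpha = 1.0 / (1.0 + math.sqrt(obs_var / (hx_var + obs_var)))

    new_mean = [mean[i] + gain[i] * (y_obs - hx_mean) for i in range(n_state)]
    return [[new_mean[i] + perts[k][i] - alpha * gain[i] * hx_pert[k]
             for i in range(n_state)]
            for k in range(n_mem)]


# Illustrative use: three members, two state variables, the first one observed.
prior = [[1.0, 2.0], [2.0, 2.5], [3.0, 4.5]]
posterior = ensrf_update(prior, observe=lambda x: x[0], y_obs=2.5, obs_var=1.0)
```

Swapping in a different model or observation type changes only the contents of the state vectors and the `observe` callback, which is the kind of decoupling the abstract describes.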
    Abstract. Data assimilation integrates information from observational measurements with numerical models. When used with coupled models of Earth system compartments, e.g. the atmosphere and the ocean, consistent joint states can be estimated. A common approach to data assimilation is to use ensemble-based methods, which use an ensemble of state realizations to estimate the state and its uncertainty. These methods are far more costly to compute than a single coupled model because of the required integration of the ensemble. However, with uncoupled models, the methods have also been shown to exhibit particularly good scaling behavior. This study discusses an approach to augment a coupled model with data assimilation functionality provided by the Parallel Data Assimilation Framework (PDAF). Using only minimal changes in the codes of the different compartment models, a particularly efficient data assimilation system is generated that utilizes parallelization and in-memory data transfers between the models and the data assimilation functions and hence avoids most of the file reading and writing and also model restarts during the data assimilation process. The study explains the required modifications of the programs using the example of the coupled atmosphere-sea ice-ocean model AWI-CM. The case of the assimilation of oceanic observations shows that the data assimilation leads to only small overheads in computing time of about 15 % compared to the model without data assimilation and to very good parallel scalability. The model-agnostic structure of the assimilation software ensures a separation of concerns in that the development of data assimilation methods can be separated from the model application.
    Assimilation (phonology)
    Citations (4)
    Ensemble data assimilation (EnDA) is used to combine numerical models and observations in a quantitative way. EnDA allows us to join the information from model and observations, e.g. for a better estimate of the system state in all variables represented by the model, including those which are not observed. Further, one can improve the model formulation through the estimation of model parameters. An ensemble of model state realizations is used to estimate the uncertainty of the model state and correlations between different variables. To simplify the implementation of EnDA with numerical models, the open-source Parallel Data Assimilation Framework (PDAF, http://pdaf.awi.de) has been developed. PDAF provides support for the ensemble simulations and optimized filter algorithms so that one can implement EnDA with very small changes to a model code. This tutorial will first provide an overview of possibilities and components of EnDA. Subsequently, the example of combining the ocean general circulation model MITgcm with PDAF will be used to discuss the required implementation steps for adding EnDA to a model. The tutorial should be useful for scientists to get an overview of the EnDA methodology and to learn how ensemble data assimilation can be added to a numerical model.
    Ensemble forecasting
    Code (set theory)
    Assimilation (phonology)
    Citations (0)
    Discussed is the construction of programs for efficient ensemble data assimilation systems based on a direct connection between a coupled simulation model and ensemble data assimilation software. The strategy allows us to set up a data assimilation program with high flexibility and parallel scalability with only small changes to the model. The direct connection is obtained by first extending the source code of the coupled model so that it is able to run an ensemble of model states. In addition, a filtering step is added using a combination of in-memory access and parallel communication to create an online-coupled ensemble assimilation program. The direct connection avoids the common need to stop and restart a whole coupled model system to perform the assimilation of observations in the analysis step of ensemble-based filter methods like ensemble Kalman or particle filters. Instead, the analysis step is performed in between time steps and is independent of the actual model coupler. This strategy allows us to perform both in-compartment (for weakly coupled assimilation) and cross-compartment (for strongly coupled assimilation) assimilation. The assimilation frequency can be kept flexible, so that assimilation of observations from different compartments can be performed at different time intervals. Using the parallel data assimilation framework (PDAF, http://pdaf.awi.de), the direct connection strategy will be exemplified for the ocean-atmosphere model ECHAM6-FESOM.
    Assimilation (phonology)
    Ensemble Learning
    Ensemble forecasting
    Citations (0)
    The Parallel Data Assimilation Framework (PDAF) is a unified framework for ensemble data assimilation. PDAF has been developed to simplify the implementation of scalable ensemble data assimilation systems with existing high-dimensional numerical models. It provides support for the parallelization of the ensemble integration and fully implemented and parallelized ensemble Kalman and nonlinear filters. PDAF encapsulates the filter algorithms so that model and data assimilation developments can be conducted separately. I will review the structure and features of PDAF and discuss its use in different applications of ocean-biogeochemical and coupled atmosphere-ocean models.
    Ensemble Learning
    Assimilation (phonology)
    Citations (0)
    Neural networks can learn dynamic characteristics from model output or assimilation results, and this approach to training models has progressed greatly in recent years. A data-driven data assimilation method is proposed that combines a fully connected neural network with the ensemble Kalman filter to emulate dynamic models from sparse and noisy observations. First, the hybrid model couples the original dynamic model with the surrogate model. The surrogate model is learned from model forecast values and assimilation results, and its performance is verified using the training accuracy/loss and the validation accuracy/loss at different training times. Second, the assimilation process includes a “two-stage” procedure. Stage 0 generates the training sets and trains the surrogate model. Then, the hybrid model is used for the next assimilation period in Stage 1. Finally, several numerical experiments are conducted using the Lorenz-63 and Lorenz-96 models to demonstrate that the proposed approach is better than the ensemble Kalman filter under different model error covariances, observation error covariances, and observation time steps. The proposed approach has also been applied to sparse observations to improve assimilation performance. This hybrid model is restricted to the form of the ensemble Kalman filter. However, the basic strategy is not restricted to any particular version of the Kalman filter.
    Surrogate model
    Hybrid systems have become the state of the art among data assimilation methods. These systems combine the benefits of two other systems that are traditionally used in operational weather forecasting: an ensemble-based system and a variational system. One of the most recently proposed hybrid approaches is called hybrid gain (HG). It obtains the final analysis as a linear combination of two analyses, assuming that the innovations (i.e. the forecast and the set of observations used) between the two data assimilation methods are identical. A perfect model experiment was performed using the HG in the SPEEDY model to show a new methodology to assign different weights to the two analyses, LETKF and 3D-Var in the generation of the final analysis. Our new approach uses, in the assignment of the weights, the ensemble spread, considered to be a measure of uncertainty in the LETKF. Thus, it is possible to use the estimation of the uncertainty of the analysis that the LETKF provides, to determine where the system should give more weight to the LETKF or the 3D-Var analysis. For this purpose, we define a geographically varying weighting factor alpha, which multiplies the 3D-Var analysis, as the normalised spread for each variable at each level. Then, (1-alpha), which decreases with increasing spread, becomes the factor that multiplies the LETKF analysis. The underlying mechanism of the spread–error relationship is explained using a toy model experiment. The results are very encouraging: the original HG and the new weighted HG analyses have similar high quality and are better than both 3D-Var and LETKF. However, the dynamically weighted HG analyses are significantly more balanced than the original HG analyses are, which has probably contributed to the consistently improved performance observed in the weighted HG, which increases with time throughout the 5-day forecasts.
    Ensemble Learning
    Assimilation (phonology)
    Ensemble forecasting
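The hybrid gain abstract above defines a spatially varying weight alpha, equal to the normalised ensemble spread, that multiplies the 3D-Var analysis, while (1 - alpha) multiplies the LETKF analysis. The sketch below illustrates this combination for a single one-dimensional field (one variable at one level). The abstract does not say how the spread is normalised, so dividing by the maximum spread of the field is an assumption made here, and the function names are hypothetical.

```python
def normalised_spread(spread):
    """Scale the spread to [0, 1] by its maximum (assumed normalisation)."""
    max_spread = max(spread)
    if max_spread == 0.0:
        # No spread anywhere: give all weight to the LETKF analysis.
        return [0.0 for _ in spread]
    return [s / max_spread for s in spread]


def weighted_hybrid_analysis(letkf, var3d, spread):
    """Combine two analyses point by point: alpha * 3D-Var + (1 - alpha) * LETKF."""
    alpha = normalised_spread(spread)
    return [a * v + (1.0 - a) * l for a, v, l in zip(alpha, var3d, letkf)]


# Illustrative values only: three grid points with increasing ensemble spread.
letkf_analysis = [1.0, 2.0, 3.0]
var3d_analysis = [2.0, 2.0, 2.0]
ensemble_spread = [0.0, 0.5, 1.0]

hybrid = weighted_hybrid_analysis(letkf_analysis, var3d_analysis, ensemble_spread)
# hybrid == [1.0, 2.0, 2.0]
```

Where the spread is small, the result follows the LETKF analysis. Where the spread is largest, it follows the 3D-Var analysis, matching the abstract's statement that the LETKF weight decreases with increasing spread.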