skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Computationally efficient parameter estimation for high-dimensional ocean biogeochemical models
Abstract. Biogeochemical (BGC) models are widely used in ocean simulations for a range of applications but typically include parameters that are determined based on a combination of empiricism and convention. Here, we describe and demonstrate an optimization-based parameter estimation method for high-dimensional (in parameter space) BGC ocean models. Our computationally efficient method combines the respective benefits of global and local optimization techniques and enables simultaneous parameter estimation at multiple ocean locations using multiple state variables. We demonstrate the method for a 17-state-variable BGC model with 51 uncertain parameters, where a one-dimensional (in space) physical model is used to represent vertical mixing. We perform a twin-simulation experiment to test the accuracy of the method in recovering known parameters. We then use the method to simultaneously match multi-variable observational data collected at sites in the subtropical North Atlantic and Pacific. We examine the effects of different objective functions, sometimes referred to as cost functions, which quantify the disagreement between model and observational data. We further examine increasing levels of data sparsity and the choice of state variables used during the optimization. We end with a discussion of how the method can be applied to other BGC models, ocean locations, and mixing representations.  more » « less
Award ID(s):
1924636 1924658
PAR ID:
10575165
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
Copernicus Publications
Date Published:
Journal Name:
Geoscientific Model Development
Volume:
17
Issue:
2
ISSN:
1991-9603
Page Range / eLocation ID:
621 to 649
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract. We present a newly developed upper-thermocline, open-ocean biogeochemical flux model that is complex and flexible enough to capture open-ocean ecosystem dynamics but reduced enough to incorporate into highly resolved numerical simulations and parameter optimization studies with limited additional computational cost. The model, which is derived from the full 56-state-variable Biogeochemical Flux Model (BFM56; Vichi et al., 2007), follows a biological and chemical functional group approach and allows for the development of critical non-Redfield nutrient ratios. Matter is expressed in units of carbon, nitrogen, and phosphate, following techniques used in more complex models. To reduce the overall computational cost and to focus on upper-thermocline, open-ocean, and non-iron-limited or non-silicate-limited conditions, the reduced model eliminates certain processes, such as benthic, silicate, and iron influences, and parameterizes others, such as the bacterial loop. The model explicitly tracks 17 state variables, divided into phytoplankton, zooplankton, dissolved organic matter, particulate organic matter, and nutrient groups. It is correspondingly called the Biogeochemical Flux Model 17 (BFM17). After describing BFM17, we couple it with the one-dimensional Princeton Ocean Model for validation using observational data from the Sargasso Sea. The results agree closely with observational data, giving correlations above 0.85, except for chlorophyll (0.63) and oxygen (0.37), as well as with corresponding results from BFM56, with correlations above 0.85, except for oxygen (0.56), including the ability to capture the subsurface chlorophyll maximum and bloom intensity. In comparison to previous models of similar size, BFM17 provides improved correlations between several model output fields and observational data, indicating that reproduction of in situ data can be achieved with a low number of variables, while maintaining the functional group approach. Notable additions to BFM17 over similar complexity models are the explicit tracking of dissolved oxygen, allowance for non-Redfield nutrient ratios, and both dissolved and particulate organic matter, all within the functional group framework. 
    more » « less
  2. Nonlinear state-space models are ubiquitous in modeling real-world dynamical systems. Sequential Monte Carlo (SMC) techniques, also known as particle methods, are a well-known class of parameter estimation methods for this general class of state-space models. Existing SMC-based techniques rely on excessive sampling of the parameter space, which makes their computation intractable for large systems or tall data sets. Bayesian optimization techniques have been used for fast inference in state-space models with intractable likelihoods. These techniques aim to find the maximum of the likelihood function by sequential sampling of the parameter space through a single SMC approximator. Various SMC approximators with different fidelities and computational costs are often available for sample- based likelihood approximation. In this paper, we propose a multi-fidelity Bayesian optimization algorithm for the inference of general nonlinear state-space models (MFBO-SSM), which enables simultaneous sequential selection of parameters and approximators. The accuracy and speed of the algorithm are demonstrated by numerical experiments using synthetic gene expression data from a gene regulatory network model and real data from the VIX stock price index. 
    more » « less
  3. Abstract Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models’ predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design. 
    more » « less
  4. Abstract Hierarchical probability models are being used more often than non-hierarchical deterministic process models in environmental prediction and forecasting, and Bayesian approaches to fitting such models are becoming increasingly popular. In particular, models describing ecosystem dynamics with multiple states that are autoregressive at each step in time can be treated as statistical state space models (SSMs). In this paper, we examine this subset of ecosystem models, embed a process-based ecosystem model into an SSM, and give closed form Gibbs sampling updates for latent states and process precision parameters when process and observation errors are normally distributed. Here, we use simulated data from an example model (DALECev) and study the effects changing the temporal resolution of observations on the states (observation data gaps), the temporal resolution of the state process (model time step), and the level of aggregation of observations on fluxes (measurements of transfer rates on the state process). We show that parameter estimates become unreliable as temporal gaps between observed state data increase. To improve parameter estimates, we introduce a method of tuning the time resolution of the latent states while still using higher-frequency driver information and show that this helps to improve estimates. Further, we show that data cloning is a suitable method for assessing parameter identifiability in this class of models. Overall, our study helps inform the application of state space models to ecological forecasting applications where (1) data are not available for all states and transfers at the operational time step for the ecosystem model and (2) process uncertainty estimation is desired. 
    more » « less
  5. Summary Conditional density estimation seeks to model the distribution of a response variable conditional on covariates. We propose a Bayesian partition model using logistic Gaussian processes to perform conditional density estimation. The partition takes the form of a Voronoi tessellation and is learned from the data using a reversible jump Markov chain Monte Carlo algorithm. The methodology models data in which the density changes sharply throughout the covariate space, and can be used to determine where important changes in the density occur. The Markov chain Monte Carlo algorithm involves a Laplace approximation on the latent variables of the logistic Gaussian process model which marginalizes the parameters in each partition element, allowing an efficient search of the approximate posterior distribution of the tessellation. The method is consistent when the density is piecewise constant in the covariate space or when the density is Lipschitz continuous with respect to the covariates. In simulation and application to wind turbine data, the model successfully estimates the partition structure and conditional distribution. 
    more » « less