Title: Assessing Ecosystem State Space Models: Identifiability and Estimation
Abstract

Hierarchical probability models are being used more often than non-hierarchical deterministic process models in environmental prediction and forecasting, and Bayesian approaches to fitting such models are becoming increasingly popular. In particular, models describing ecosystem dynamics with multiple states that are autoregressive at each step in time can be treated as statistical state space models (SSMs). In this paper, we examine this subset of ecosystem models, embed a process-based ecosystem model into an SSM, and give closed form Gibbs sampling updates for latent states and process precision parameters when process and observation errors are normally distributed. Here, we use simulated data from an example model (DALECev) and study the effects of changing the temporal resolution of observations on the states (observation data gaps), the temporal resolution of the state process (model time step), and the level of aggregation of observations on fluxes (measurements of transfer rates on the state process). We show that parameter estimates become unreliable as temporal gaps between observed state data increase. To improve parameter estimates, we introduce a method of tuning the time resolution of the latent states while still using higher-frequency driver information and show that this helps to improve estimates. Further, we show that data cloning is a suitable method for assessing parameter identifiability in this class of models. Overall, our study helps inform the application of state space models to ecological forecasting applications where (1) data are not available for all states and transfers at the operational time step for the ecosystem model and (2) process uncertainty estimation is desired.

 
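As a rough illustration of what a closed form Gibbs update for a process precision parameter can look like under normally distributed errors, the sketch below assumes a conjugate Gamma(a0, b0) prior on the precision. The prior choice, the hyperparameter values, and the function names `precision_posterior` and `draw_precision` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def precision_posterior(residuals, a0=1.0, b0=1.0):
    """Return (shape, rate) of the Gamma full conditional for a precision.

    Assumes residuals e_t ~ Normal(0, 1/tau) and a hypothetical
    Gamma(a0, b0) prior on tau (shape a0, rate b0).
    """
    residuals = np.asarray(residuals, dtype=float)
    shape = a0 + residuals.size / 2.0
    rate = b0 + 0.5 * np.sum(residuals ** 2)
    return shape, rate


def draw_precision(residuals, rng, a0=1.0, b0=1.0):
    """Draw one Gibbs sample of the precision from its full conditional."""
    shape, rate = precision_posterior(residuals, a0, b0)
    # NumPy parameterizes the Gamma by scale, which is 1 / rate.
    return rng.gamma(shape, 1.0 / rate)


# Illustrative process residuals (latent state minus model prediction).
example_residuals = [1.0, -1.0, 2.0, 0.0]
example_shape, example_rate = precision_posterior(example_residuals)
example_draw = draw_precision(example_residuals, np.random.default_rng(0))
```

In a full sampler, a step like this would alternate with updates of the latent states, with the residuals recomputed from the current state draw at each iteration.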
Award ID(s):
2016264 1750113 1926388
NSF-PAR ID:
10400994
Author(s) / Creator(s):
; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Journal of Agricultural, Biological and Environmental Statistics
Volume:
28
Issue:
3
ISSN:
1085-7117
Page Range / eLocation ID:
p. 442-465
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    In ecology, it is common for processes to be bounded based on physical constraints of the system. One common example is the positivity constraint, which applies to phenomena such as duration times, population sizes, and total stock of a system’s commodity. In this paper, we propose a novel method for parameterizing Lognormal state space models using an approach based on moment matching. Our method enforces the positivity constraint, allows for arbitrary mean evolution and variance structure, and has a closed-form Markov transition density which allows for more flexibility in fitting techniques. We discuss two existing Lognormal state space models and examine how they differ from the method presented here. We use 180 synthetic datasets to compare the forecasting performance under model misspecification and assess the estimation of precision parameters between our method and existing methods. We find that our models perform well under misspecification, and that fixing the observation variance both helps to improve estimation of the process variance and improves forecast performance. To test our method on a difficult problem, we compare the predictive performance of two Lognormal state space models in predicting the Leaf Area Index over a 151 day horizon by using a process-based ecosystem model to describe the temporal dynamics. We find that our moment matching model performs better than its competitor, and is better suited for intermediate predictive horizons. Overall, our study helps to inform practitioners about the importance of incorporating sensible dynamics when using models of complex systems to predict out-of-sample.

     
  2. Abstract

    Lakes are biogeochemical hotspots on the landscape, contributing significantly to the global carbon cycle despite their small areal coverage. Observations and models of lake carbon pools and fluxes are rarely explicitly combined through data assimilation despite successful use of this technique in other fields. Data assimilation adds value to both observations and models by constraining models with observations of the system and by leveraging knowledge of the system formalized by the model to objectively fill observation gaps. In this article, we highlight the utility of data assimilation in lake carbon cycling research by using the ensemble Kalman filter to combine simple lake carbon models with observations of lake carbon pools and fluxes. We demonstrate that data assimilation helps reduce uncertainty in estimates of lake carbon pools and fluxes and more accurately estimate the true carbon pool size compared to estimates derived from observations alone. Data assimilation techniques should be embraced as valuable tools for lake biogeochemists interested in learning about ecosystem dynamics and forecasting ecosystem states and processes.

     
  3. Abstract

    Near‐term ecological forecasts provide resource managers advance notice of changes in ecosystem services, such as fisheries stocks, timber yields, or water quality. Importantly, ecological forecasts can identify where there is uncertainty in the forecasting system, which is necessary to improve forecast skill and guide interpretation of forecast results. Uncertainty partitioning identifies the relative contributions to total forecast variance introduced by different sources, including specification of the model structure, errors in driver data, and estimation of current states (initial conditions). Uncertainty partitioning could be particularly useful in improving forecasts of highly variable cyanobacterial densities, which are difficult to predict and present a persistent challenge for lake managers. As cyanobacteria can produce toxic and unsightly surface scums, advance warning when cyanobacterial densities are increasing could help managers mitigate water quality issues. Here, we fit 13 Bayesian state‐space models to evaluate different hypotheses about cyanobacterial densities in a low nutrient lake that experiences sporadic surface scums of the toxin‐producing cyanobacterium Gloeotrichia echinulata. We used data from several summers of weekly cyanobacteria samples to identify dominant sources of uncertainty for near‐term (1‐ to 4‐week) forecasts of G. echinulata densities. Water temperature was an important predictor of cyanobacterial densities during model fitting and at the 4‐week forecast horizon. However, no physical covariates improved model performance over a simple model including the previous week's densities in 1‐week‐ahead forecasts. Even the best fit models exhibited large variance in forecasted cyanobacterial densities and did not capture rare peak occurrences, indicating that significant explanatory variables when fitting models to historical data are not always effective for forecasting. Uncertainty partitioning revealed that model process specification and initial conditions dominated forecast uncertainty. These findings indicate that long‐term studies of different cyanobacterial life stages and movement in the water column as well as measurements of drivers relevant to different life stages could improve model process representation of cyanobacteria abundance. In addition, improved observation protocols could better define initial conditions and reduce spatial misalignment of environmental data and cyanobacteria observations. Our results emphasize the importance of ecological forecasting principles and uncertainty partitioning to refine and understand predictive capacity across ecosystems.

     
  4. Abstract

    Robust carbon monitoring systems are needed for land managers to assess and mitigate the changing effects of ecosystem stress on western United States forests, where most aboveground carbon is stored in mountainous areas. Atmospheric carbon uptake via gross primary productivity (GPP) is an important indicator of ecosystem function and is particularly relevant to carbon monitoring systems. However, limited ground-based observations in remote areas with complex topography represent a significant challenge for tracking regional-scale GPP. Satellite observations can help bridge these monitoring gaps, but the accuracy of remote sensing methods for inferring GPP is still limited in montane evergreen needleleaf biomes, where (a) photosynthetic activity is largely decoupled from canopy structure and chlorophyll content, and (b) strong heterogeneity in phenology and atmospheric conditions is difficult to resolve in space and time. Using monthly solar-induced chlorophyll fluorescence (SIF) sampled at ∼4 km from the TROPOspheric Monitoring Instrument (TROPOMI), we show that high-resolution satellite-observed SIF followed ecological expectations of seasonal and elevational patterns of GPP across a 3000 m elevation gradient in the Sierra Nevada mountains of California. After accounting for the effects of high reflected radiance in TROPOMI SIF due to snow cover, the seasonal and elevational patterns of SIF were well correlated with GPP estimates from a machine-learning model (FLUXCOM) and a land surface model (CLM5.0-SP), outperforming other spectral vegetation indices. Differences in the seasonality of TROPOMI SIF and GPP estimates were likely attributable to misrepresentation of moisture limitation and winter photosynthetic activity in FLUXCOM and CLM5.0, respectively, as indicated by discrepancies with GPP derived from eddy covariance observations in the southern Sierra Nevada. These results suggest that satellite-observed SIF can serve as a useful diagnostic and constraint to improve upon estimates of GPP toward multiscale carbon monitoring systems in montane, evergreen conifer biomes at regional scales.

     
  5. We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la a Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix estimation based method, coupled with a novel, non-standard matrix estimation error metric, to solve the error-in-variable regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.
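The distinction drawn in the last abstract between a Page matrix (non-overlapping entries) and a Hankel matrix (overlapping windows) can be made concrete with a small sketch. The function names `page_matrix` and `hankel_matrix`, the window length `L`, and the toy series are assumptions chosen for illustration; they are not taken from the cited work.

```python
import numpy as np


def page_matrix(x, L):
    """Stack consecutive, non-overlapping length-L segments as columns.

    Trailing values that do not fill a complete segment are dropped.
    """
    x = np.asarray(x)
    k = x.size // L
    return x[: k * L].reshape(k, L).T


def hankel_matrix(x, L):
    """Stack every length-L sliding window as a column (entries overlap)."""
    x = np.asarray(x)
    n_cols = x.size - L + 1
    return np.array([x[j : j + L] for j in range(n_cols)]).T


series = np.arange(6)            # toy series 0, 1, ..., 5
page = page_matrix(series, 2)    # each observation appears exactly once
hankel = hankel_matrix(series, 2)  # interior observations appear repeatedly
```

With this toy series, the Page matrix has columns (0, 1), (2, 3), (4, 5), while the Hankel matrix has columns (0, 1), (1, 2), and so on, so each interior observation appears in more than one column.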