Title: Soil organic carbon models need independent time-series validation for reliable prediction
Abstract Numerical models are crucial to understand and/or predict past and future soil organic carbon dynamics. For those models aiming at prediction, validation is a critical step to gain confidence in projections. With a comprehensive review of ~250 models, we assess how models are validated depending on their objectives and features, and discuss how validation of predictive models can be improved. We find a critical lack of independent validation using observed time series. Conducting such validations should be a priority to improve model reliability. Approximately 60% of the models we analysed are not designed for predictions, but rather for conceptual understanding of soil processes. These models provide important insights by identifying key processes and alternative formalisms that can be relevant for predictive models. We argue that combining independent validation based on observed time series and improved information flow between predictive and conceptual models will increase reliability in predictions.
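As a rough illustration of what independent validation against an observed time series can look like, the sketch below calibrates a model on the early part of a series and scores it only on later, held-out observations. Everything in it is an assumption made for illustration and is not taken from the reviewed models: the single first-order pool (`soc_one_pool`), the grid search over the decay rate, the split year, and the synthetic noise-free series.

```python
import math


def soc_one_pool(t, c0, inputs, k):
    """Stock at time t for one first-order pool, dC/dt = inputs - k * C (hypothetical model)."""
    steady = inputs / k
    return steady + (c0 - steady) * math.exp(-k * t)


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted stocks."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def calibrate_k(times, stocks, c0, inputs, candidates):
    """Pick the candidate decay rate with the lowest RMSE on the calibration data only."""
    return min(
        candidates,
        key=lambda k: rmse(stocks, [soc_one_pool(t, c0, inputs, k) for t in times]),
    )


def temporal_holdout(times, stocks, split_year, c0, inputs, candidates):
    """Calibrate on observations up to split_year, then validate on later ones."""
    calib = [(t, s) for t, s in zip(times, stocks) if t <= split_year]
    valid = [(t, s) for t, s in zip(times, stocks) if t > split_year]
    k = calibrate_k([t for t, _ in calib], [s for _, s in calib], c0, inputs, candidates)
    pred = [soc_one_pool(t, c0, inputs, k) for t, _ in valid]
    return k, rmse([s for _, s in valid], pred)


# Synthetic, noise-free series generated from the same model (illustration only, not real data).
times = list(range(0, 41, 5))
stocks = [soc_one_pool(t, 50.0, 1.5, 0.05) for t in times]
candidates = [i / 1000 for i in range(10, 101)]  # decay rates 0.010 to 0.100 per year

k_hat, valid_rmse = temporal_holdout(times, stocks, 20, 50.0, 1.5, candidates)
print(k_hat, valid_rmse)
```

The point of the split is that the validation error is computed on years the calibration step never saw; with real observations the held-out error would normally be larger than the calibration error.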
Award ID(s):
1926413
PAR ID:
10451279
Date Published:
Journal Name:
Communications Earth & Environment
Volume:
4
Issue:
1
ISSN:
2662-4435
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. The terrestrial carbon cycle plays a critical role in modulating the interactions of climate with the Earth system, but different models often make vastly different predictions of its behavior. Efforts to reduce model uncertainty have commonly focused on model structure, namely by introducing additional processes and increasing structural complexity. However, the extent to which increased structural complexity can directly improve predictive skill is unclear. While adding processes may improve realism, the resulting models are often encumbered by a greater number of poorly determined or over-generalized parameters. To guide efficient model development, here we map the theoretical relationship between model complexity and predictive skill. To do so, we developed 16 structurally distinct carbon cycle models spanning an axis of complexity and incorporated them into a model–data fusion system. We calibrated each model at six globally distributed eddy covariance sites with long observation time series and under 42 data scenarios that resulted in different degrees of parameter uncertainty. For each combination of site, data scenario, and model, we then predicted net ecosystem exchange (NEE) and leaf area index (LAI) for validation against independent local site data. Though the maximum model complexity we evaluated is lower than most traditional terrestrial biosphere models, the complexity range we explored provides universal insight into the inter-relationship between structural uncertainty, parametric uncertainty, and model forecast skill. Specifically, increased complexity only improves forecast skill if parameters are adequately informed (e.g., when NEE observations are used for calibration). Otherwise, increased complexity can degrade skill and an intermediate-complexity model is optimal. This finding remains consistent regardless of whether NEE or LAI is predicted. 
Our COMPLexity EXperiment (COMPLEX) highlights the importance of robust observation-based parameterization for land surface modeling and suggests that data characterizing net carbon fluxes will be key to improving decadal predictions of high-dimensional terrestrial biosphere models. 
  2. Phenotypic evaluation and efficient utilization of germplasm collections can be time-intensive, laborious, and expensive. However, with the plummeting costs of next-generation sequencing and the addition of genomic selection to the plant breeder’s toolbox, we now can more efficiently tap the genetic diversity within large germplasm collections. In this study, we evaluated the potential of genomic prediction on a set of 482 pea (Pisum sativum L.) accessions—genotyped with 30,600 single nucleotide polymorphism (SNP) markers and phenotyped for seed yield and yield-related components—for enhancing selection of accessions from the USDA Pea Germplasm Collection. Genomic prediction models and several factors affecting predictive ability were evaluated in a series of cross-validation schemes across complex traits. Different genomic prediction models gave similar results, with predictive ability across traits ranging from 0.23 to 0.60, with no model working best across all traits. Increasing the training population size improved the predictive ability of most traits, including seed yield. Predictive abilities increased and reached a plateau with increasing number of markers, presumably due to extensive linkage disequilibrium in the pea genome. Accounting for population structure effects did not significantly boost predictive ability, but we observed a slight improvement in seed yield. By applying the best genomic prediction model (e.g., RR-BLUP), we then examined the distribution of genotyped but nonphenotyped accessions and the reliability of genomic estimated breeding values (GEBV). The distribution of GEBV suggested that none of the nonphenotyped accessions were expected to perform outside the range of the phenotyped accessions. Desirable breeding values with higher reliability can be used to identify and screen favorable germplasm accessions. 
Expanding the training set and incorporating additional orthogonal information (e.g., transcriptomics, metabolomics, physiological traits, etc.) into the genomic prediction framework can enhance prediction accuracy. 
  3. Cook, S; Infante, N (Ed.)
    Even though algebraic conceptual understanding is recognized as a critical skill, existing larger scale validated algebra assessments consist mostly of computational tasks, or only assess a very narrow range of conceptions in a smaller focused domain. Further, few instruments have been validated for use with college students. In this paper, we describe the creation and validation of an algebra concept inventory for college students. We describe how items were administered, revised, and tested for validity and reliability. Results suggest that algebraic conceptual understanding is a measurable construct, and that the instrument has reasonable validity and reliability. Revision and validation are ongoing; however, lessons learned thus far provide information about what conceptual understanding in algebra might look like and how it might be assessed. 
  4. Abstract An estimated 1700 Pg of carbon is frozen in the Arctic permafrost and the fate of this carbon is unclear because of the complex interaction of biophysical, ecological and biogeochemical processes that govern the Arctic carbon budget. Two key processes determining the region’s long-term carbon budget are: (a) carbon uptake through increased plant growth, and (b) carbon release through increased heterotrophic respiration (HR) due to warmer soils. Previous predictions for how these two opposing carbon fluxes may change in the future have varied greatly, indicating that improved understanding of these processes and their feedbacks is critical for advancing our predictive ability for the fate of Arctic peatlands. In this study, we implement and analyze a vertically-resolved model of peatland soil carbon into a cohort-based terrestrial biosphere model to improve our understanding of how on-going changes in climate are altering the Arctic carbon budget. A key feature of the formulation is that accumulation of peat within the soil column modifies its texture, hydraulic conductivity, and thermal conductivity, which in turn influences resulting rates of HR within the soil column. Analysis of the model at three eddy covariance tower sites in the Alaskan tundra shows that the vertically-resolved soil column formulation accurately captures the zero-curtain phenomenon, in which the temperature of soil layers remains at or near 0 °C during fall freezeback due to the release of latent heat, which is critical to capturing observed patterns of wintertime respiration. We find that significant declines in net ecosystem productivity (NEP) occur starting in 2013 and that these declines are driven by increased HR arising from increased precipitation and warming. Sensitivity analyses indicate that the cumulative NEP over the decade responds strongly to the estimated soil carbon stock and more weakly to vegetation abundance at the beginning of the simulation. 
  5. Machine learning algorithms are often used to model and predict animal habitat selection—the relationships between animal occurrences and habitat characteristics. For broadly distributed species, habitat selection often varies among populations and regions; thus, it would seem preferable to fit region- or population-specific models of habitat selection for more accurate inference and prediction, rather than fitting large-scale models using pooled data. However, where the aim is to make range-wide predictions, including areas for which there are no existing data or models of habitat selection, how can regional models best be combined? We propose that ensemble approaches commonly used to combine different algorithms for a single region can be reframed, treating regional habitat selection models as the candidate models. By doing so, we can incorporate regional variation when fitting predictive models of animal habitat selection across large ranges. We test this approach using satellite telemetry data from 168 humpback whales across five geographic regions in the Southern Ocean. Using random forests, we fitted a large-scale model relating humpback whale locations, versus background locations, to 10 environmental covariates, and made a circumpolar prediction of humpback whale habitat selection. We also fitted five regional models, the predictions of which we used as input features for four ensemble approaches: an unweighted ensemble, an ensemble weighted by environmental similarity in each cell, stacked generalization, and a hybrid approach wherein the environmental covariates and regional predictions were used as input features in a new model. We tested the predictive performance of these approaches on an independent validation dataset of humpback whale sightings and whaling catches. These multiregional ensemble approaches resulted in models with higher predictive performance than the circumpolar naive model. 
These approaches can be used to incorporate regional variation in animal habitat selection when fitting range-wide predictive models using machine learning algorithms. This can yield more accurate predictions across regions or populations of animals that may show variation in habitat selection. 
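Two of the ensemble ideas named in the abstract above, the unweighted ensemble and the ensemble weighted by environmental similarity, can be sketched for a single grid cell as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the example predictions, and the similarity scores are hypothetical, and the abstract does not say how similarity was computed. Stacked generalization and the hybrid approach require fitting a second model and are not shown.

```python
def unweighted_ensemble(regional_preds):
    """Average the predictions of all regional models for one grid cell."""
    return sum(regional_preds) / len(regional_preds)


def weighted_ensemble(regional_preds, similarities):
    """Weight each regional prediction by a non-negative similarity score for that region.

    The scores stand in for how closely the cell's environment resembles each
    region's training data (a hypothetical measure for this sketch).
    """
    total = sum(similarities)
    if total == 0:
        # No region resembles this cell: fall back to the plain average.
        return unweighted_ensemble(regional_preds)
    return sum(p * w for p, w in zip(regional_preds, similarities)) / total


# Hypothetical habitat-selection probabilities from three regional models for one cell.
preds = [0.2, 0.8, 0.5]
# Hypothetical similarity of the cell to each region's training environment.
sims = [0.0, 3.0, 1.0]

plain = unweighted_ensemble(preds)
weighted = weighted_ensemble(preds, sims)
print(plain, weighted)
```

In this example the first regional model gets zero weight because its training environment does not resemble the cell, so the weighted estimate leans toward the second model.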