This content will become publicly available on December 1, 2025

Title: Knowledge-guided machine learning can improve carbon cycle quantification in agroecosystems
Abstract: Accurate and cost-effective quantification of the carbon cycle for agroecosystems at decision-relevant scales is critical to mitigating climate change and ensuring sustainable food production. However, conventional process-based or data-driven modeling approaches alone have large prediction uncertainties because the biogeochemical processes involved are complex and observations are lacking to constrain many key state and flux variables. Here we propose a Knowledge-Guided Machine Learning (KGML) framework that addresses these challenges by integrating knowledge embedded in a process-based model, high-resolution remote sensing observations, and machine learning (ML) techniques. Using the U.S. Corn Belt as a testbed, we demonstrate that KGML can outperform conventional process-based and black-box ML models in quantifying carbon cycle dynamics. Our high-resolution approach quantitatively reveals 86% more spatial detail of soil organic carbon changes than conventional coarse-resolution approaches. Moreover, we outline a protocol for improving KGML via various paths, which can be generalized to develop hybrid models to better predict complex earth system dynamics.
Award ID(s):
2147195 2239175 1847334 2034385
PAR ID:
10503419
Publisher / Repository:
Nature
Journal Name:
Nature Communications
Volume:
15
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Laser beam powder bed fusion (LB-PBF) is a widely used metal additive manufacturing process due to its high potential for fabrication flexibility and quality. Optimizing its process and performance is key to improving product quality and promoting further adoption of LB-PBF. In this article, the state-of-the-art machine learning (ML) applications for process and performance optimization in LB-PBF are reviewed. In these applications, ML is used to model the process–structure–property relationships in a data-driven way and to optimize process parameters for high-quality fabrication. We review these applications in terms of the relationships modeled by ML (e.g., process–structure, process–property, or structure–property) and categorize the ML algorithms into interpretable ML, conventional ML, and deep ML according to interpretability and accuracy. This categorization may be particularly useful to practitioners as a comprehensive reference for selecting ML algorithms according to their particular needs. Of the three types of ML above, conventional ML has been applied the most in process and performance optimization because it balances model accuracy and interpretability. To explore the power of ML in discovering new knowledge and insights, additional interpretation steps, such as model-agnostic methods or sensitivity analysis, are often needed for the complex models arising from conventional ML and deep ML. In the future, enhancing the interpretability of ML, standardizing a systematic procedure for ML, and developing a collaborative platform to share data and findings will be critical to promoting the integration of ML in LB-PBF applications on a large scale.
  2. Abstract: Sea surface height observations provided by satellite altimetry since 1993 show that global mean sea level is rising at a rate of 3.4 mm yr⁻¹. While sea level has risen 10 cm on average over the last 30 years, there is considerable regional variation in sea level change. In this work, we predict sea level trends 30 years into the future at a 2° spatial resolution and investigate the future patterns of sea level change. We show the potential of machine learning (ML) in this challenging application of long-term sea level forecasting over the global ocean. Our approach incorporates sea level data from both altimeter observations and climate model simulations. We develop a supervised learning framework using fully connected neural networks (FCNNs) that can predict the sea level trend based on climate model projections. Alongside this, our method provides uncertainty estimates associated with the ML prediction. We also show the effectiveness of partitioning our spatial dataset and learning a dedicated ML model for each segmented region. We compare two partitioning strategies: one achieved using domain knowledge and the other employing spectral clustering. Our results demonstrate that segmenting the spatial dataset with spectral clustering improves the ML predictions. Significance Statement: Long-term projections are needed to help coastal communities adapt to sea level rise. Forecasting multidecadal sea level change is a complex problem. In this paper, we show the promise of machine learning in producing such forecasts 30 years in advance and over the global ocean. Continued improvements in prediction skill that build on this work will be vital in sea level rise adaptation efforts.
  3. Physics-guided machine learning (PGML) has become a prevalent approach in studying scientific systems due to its ability to integrate scientific theories for enhancing machine learning (ML) models. However, most PGML approaches are tailored to isolated and relatively simple tasks, which limits their applicability to complex systems involving multiple interacting processes and numerous influencing features. In this paper, we propose a Physics-Guided Foundation Model (PGFM) that combines pre-trained ML models and physics-based models and leverages their complementary strengths to improve the modeling of multiple coupled processes. To effectively conduct pre-training, we construct a simulated environmental system that encompasses a wide range of influencing features and various simulated variables generated by physics-based models. The model is pre-trained in this system to adaptively select important feature interactions guided by multi-task objectives. We then fine-tune the model for each specific task using true observations, while maintaining consistency with established physical theories, such as the principles of mass and energy conservation. We demonstrate the effectiveness of this methodology in modeling water temperature and dissolved oxygen dynamics in real-world lakes. The proposed PGFM is also broadly applicable to a range of scientific fields where physics-based models are being used. 
  4. Abstract: Machine learning (ML) provides a powerful framework for the analysis of high‐dimensional datasets by modelling complex relationships, often encountered in modern data with many variables, cases and potentially non‐linear effects. The impact of ML methods on research and practical applications in the educational sciences is still limited but is growing continuously as larger and more complex datasets become available through massive open online courses (MOOCs) and large‐scale investigations. The educational sciences are at a crucial pivot point because of the anticipated impact ML methods hold for the field. To provide educational researchers with an elaborate introduction to the topic, we provide an instructional summary of the opportunities and challenges of ML for the educational sciences, show how looking at related disciplines can help the field learn from their experiences, and argue for a philosophical shift in model evaluation. We demonstrate how the overall quality of data analysis in educational research can benefit from these methods and show how ML can play a decisive role in the validation of empirical models. Specifically, we (1) provide an overview of the types of data suitable for ML and (2) give practical advice for the application of ML methods. In each section, we provide analytical examples and reproducible R code. Also, we provide an extensive Appendix on ML‐based applications for education. This instructional summary will help educational scientists and practitioners to prepare for the promises and threats that come with the shift towards digitisation and large‐scale assessment in education.
     Context and implications
     Rationale for this study: In 2020, the worldwide SARS‐CoV‐2 pandemic forced the educational sciences to perform a rapid paradigm shift with classrooms going online around the world—a hardly novel but now strongly catalysed development. In the context of data‐driven education, this paper demonstrates that the widespread adoption of machine learning techniques is central for the educational sciences and shows how these methods will become crucial tools in the collection and analysis of data and in concrete educational applications. Helping to leverage the opportunities and to avoid the common pitfalls of machine learning, this paper provides educators with the theoretical, conceptual and practical essentials.
     Why the new findings matter: The process of teaching and learning is complex, multifaceted and dynamic. This paper contributes a seminal resource to highlight the digitisation of the educational sciences by demonstrating how new machine learning methods can be effectively and reliably used in research, education and practical application.
     Implications for educational researchers and policy makers: The progressing digitisation of societies around the globe and the impact of the SARS‐CoV‐2 pandemic have highlighted the vulnerabilities and shortcomings of educational systems. These developments have shown the necessity to provide effective educational processes that can support sometimes overwhelmed teachers to digitally impart knowledge on the plan of many governments and policy makers. Educational scientists, corporate partners and stakeholders can make use of machine learning techniques to develop advanced, scalable educational processes that account for individual needs of learners and that can complement and support existing learning infrastructure. The proper use of machine learning methods can contribute essential applications to the educational sciences, such as (semi‐)automated assessments, algorithmic‐grading, personalised feedback and adaptive learning approaches. However, these promises are strongly tied to an at least basic understanding of the concepts of machine learning and a degree of data literacy, which has to become the standard in education and the educational sciences. Demonstrating both the promises and the challenges that are inherent to the collection and the analysis of large educational data with machine learning, this paper covers the essential topics that their application requires and provides easy‐to‐follow resources and code to facilitate the process of adoption.
  5. Coarse-grained molecular dynamics (CGMD) simulations address length scales and timescales that are critical to many chemical and material applications. Nevertheless, contemporary CGMD modeling is relatively bespoke, and no black-box CGMD methodology is available that could play a role in discovery applications comparable to the one that density functional theory plays for electronic structure. This gap might be filled by machine learning (ML)-based CGMD potentials that simplify model development, but these methods are still in their early stages and have yet to demonstrate a significant advantage over existing physics-based CGMD methods. Here, we explore the potential of Δ-learning models to leverage the advantages of these two approaches. This is implemented by using ML-based potentials to learn the difference between the target CGMD variable and the predictions of physics-based potentials. The Δ-models are benchmarked against the baseline models in reproducing on-target and off-target atomistic properties as a function of CG resolution, mapping operator, and system topology. The Δ-models outperform the reference ML-only CGMD models in nearly all scenarios. In several cases, the ML-only models manage to minimize training errors while still producing qualitatively incorrect dynamics, which the Δ-models correct. Given their negligible added cost, Δ-models provide essentially free gains over their ML-only counterparts. Nevertheless, an unexpected finding is that neither the Δ-learning models nor the ML-only models significantly outperform the elementary pairwise models in reproducing atomistic properties. This fundamental failure is attributed to the relatively large irreducible force errors associated with coarse-graining, which leave little benefit to be gained from more complex potentials.
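
The Δ-learning idea described in item 5 (a learned model fits only the difference between the target quantity and a physics-based prediction) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the cited work: the harmonic `physics_baseline`, the synthetic one-dimensional "reference" forces, and the cubic polynomial that stands in for an ML-based potential.

```python
import numpy as np


def physics_baseline(x, k=1.0):
    """Hypothetical physics-based prediction: a harmonic restoring force."""
    return -k * x


def fit_delta_model(x, y_target, degree=3):
    """Fit a polynomial (a stand-in for an ML model) to the residual
    between the target values and the physics-based baseline."""
    residual = y_target - physics_baseline(x)
    return np.polyfit(x, residual, degree)


def predict_delta(x, coeffs):
    """Delta-model prediction: baseline plus learned correction."""
    return physics_baseline(x) + np.polyval(coeffs, x)


# Synthetic "reference" forces: harmonic term plus an anharmonic
# correction that the baseline does not capture (illustrative only).
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=200)
y_train = -x_train - 0.3 * x_train**3

coeffs = fit_delta_model(x_train, y_train)

# Evaluate on held-out points drawn from the same synthetic function.
x_test = np.linspace(-2.0, 2.0, 50)
y_test = -x_test - 0.3 * x_test**3
rmse = np.sqrt(np.mean((predict_delta(x_test, coeffs) - y_test) ** 2))
print(f"Delta-model RMSE on the synthetic test set: {rmse:.2e}")
```

In this toy setting the correction only has to represent what the baseline misses, which is the motivation the abstract gives for combining the two approaches; the sketch says nothing about how such models behave on real coarse-grained systems.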