


Search for: All records

Creators/Authors contains: "Gentine, Pierre"


  1. Free, publicly-accessible full text available October 16, 2024
  2. Abstract

    We provide a global, long-term carbon flux dataset of gross primary production and ecosystem respiration generated using meta-learning, called MetaFlux. The idea behind meta-learning stems from the need to learn efficiently from sparse data by learning how to learn broad features across tasks, in order to better infer other poorly sampled ones. Using a meta-trained ensemble of deep models, we generate global carbon products on daily and monthly timescales at a 0.25-degree spatial resolution from 2001 to 2021, through a combination of reanalysis and remote-sensing products. Site-level validation finds that MetaFlux ensembles have 5–7% lower validation error than their non-meta-trained counterparts. In addition, they are more robust to extreme observations, with 4–24% lower errors. We also checked the seasonality, interannual variability, and correlation to solar-induced fluorescence of the upscaled product and found that MetaFlux outperformed other machine-learning-based carbon products, especially in the tropics and semi-arid regions, by 10–40%. Overall, MetaFlux can be used to study a wide range of biogeochemical processes.

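A minimal sketch of the meta-learning idea this abstract describes: learning a shared initialization across tasks so that a new, sparsely sampled task can be fit in a few gradient steps. This toy uses a Reptile-style inner/outer loop on one-parameter linear tasks; the task distribution, model, and hyperparameters are illustrative assumptions, not the MetaFlux setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    """A toy regression task: y = a * x with a task-specific slope a."""
    a = rng.uniform(0.5, 2.0)
    x = rng.uniform(-1.0, 1.0, size=32)
    return x, a * x

def sgd_steps(w, x, y, lr=0.1, steps=5):
    """A few gradient steps on squared error for the linear model w * x."""
    for _ in range(steps):
        grad = np.mean(2.0 * (w * x - y) * x)
        w -= lr * grad
    return w

w_meta = 0.0                           # shared meta-initialization
for _ in range(200):                   # outer loop over sampled tasks
    x, y = make_task()
    w_task = sgd_steps(w_meta, x, y)   # inner loop: adapt to one task
    w_meta += 0.1 * (w_task - w_meta)  # Reptile-style meta-update
```

After meta-training, `w_meta` settles near the centre of the task distribution, so a handful of gradient steps adapts it to any new task, which is the "learning how to learn" behaviour the abstract exploits for poorly sampled flux sites.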
  3. Geostationary satellite reveals the asymmetrical impact of heatwaves on plant diurnal photosynthesis at the continental scale. 
    Free, publicly-accessible full text available August 4, 2024
  4. Accurate prediction of precipitation intensity is crucial for both human and natural systems, especially in a warming climate more prone to extreme precipitation. Yet, climate models fail to accurately predict precipitation intensity, particularly extremes. One missing piece of information in traditional climate model parameterizations is subgrid-scale cloud structure and organization, which affects precipitation intensity and stochasticity at coarse resolution. Here, using global storm-resolving simulations and machine learning, we show that, by implicitly learning subgrid organization, we can accurately predict precipitation variability and stochasticity with a low-dimensional set of latent variables. Using a neural network to parameterize coarse-grained precipitation, we find that the overall behavior of precipitation is reasonably predictable using large-scale quantities only; however, the neural network cannot predict the variability of precipitation (R² ∼ 0.45) and underestimates precipitation extremes. The performance is significantly improved when the network is informed by our organization metric, correctly predicting precipitation extremes and spatial variability (R² ∼ 0.9). The organization metric is implicitly learned by training the algorithm on a high-resolution precipitable water field, encoding the degree of subgrid organization. The organization metric shows large hysteresis, emphasizing the role of memory created by subgrid-scale structures. We demonstrate that this organization metric can be predicted as a simple memory process from information available at the previous time steps. These findings stress the role of organization and memory in the accurate prediction of precipitation intensity and extremes, and the necessity of parameterizing subgrid-scale convective organization in climate models to better project future changes in the water cycle and extremes.
    Free, publicly-accessible full text available May 16, 2024
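The value of a latent variable with memory can be illustrated with a toy regression: a predictand driven partly by a large-scale input and partly by a slowly varying "organization" variable following a simple memory (AR(1)) process. All names and coefficients below are illustrative assumptions, not the paper's parameterization:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000

large_scale = rng.normal(size=T)        # stand-in for coarse-grained predictors
org = np.zeros(T)                       # latent subgrid organization metric
for t in range(1, T):
    org[t] = 0.95 * org[t - 1] + 0.3 * rng.normal()   # memory: AR(1) process

precip = large_scale + 2.0 * org + 0.3 * rng.normal(size=T)

def r2(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1.0 - (y - X @ beta).var() / y.var()

ones = np.ones(T)
r2_large = r2(np.column_stack([large_scale, ones]), precip)       # large-scale only
r2_both = r2(np.column_stack([large_scale, org, ones]), precip)   # plus organization
```

The gap between the two R² values mirrors, qualitatively, the improvement the abstract reports when the network is informed by the organization metric rather than large-scale quantities alone.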
  5. Abstract

    The Consistent Artificial Intelligence (AI)-based Soil Moisture (CASM) dataset is a global, consistent, long-term remote-sensing soil moisture (SM) dataset created using machine learning. It is based on the NASA Soil Moisture Active Passive (SMAP) satellite mission SM data and is aimed at extrapolating SMAP-like-quality SM back in time using previous satellite microwave platforms. CASM represents SM in the top soil layer, and it is defined on a global 25 km EASE-2 grid for 2002–2020 with a 3-day temporal resolution. The seasonal cycle is removed for the neural network training to ensure its skill is targeted at predicting SM extremes. Comparison of CASM to 367 global in situ SM monitoring sites shows a SMAP-like median correlation of 0.66. Additionally, the SM product uncertainty was assessed, and both aleatoric and epistemic uncertainties were estimated and included in the dataset. The CASM dataset can be used to study a wide range of hydrological, carbon cycle, and energy processes, since only a consistent long-term dataset allows assessing changes in water availability and water stress.

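The abstract notes that the seasonal cycle is removed before neural-network training so that skill targets SM extremes. A minimal sketch of that preprocessing step on a synthetic soil-moisture series (the series, amplitudes, and climatology method are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
days = np.arange(6 * 365)                          # six years of daily data
doy = days % 365
seasonal = 0.15 * np.sin(2.0 * np.pi * doy / 365)  # mean seasonal SM cycle
anomaly = rng.normal(scale=0.03, size=days.size)   # the extremes signal to learn
sm = 0.25 + seasonal + anomaly                     # "observed" soil moisture

# Day-of-year climatology estimated from the training record itself
clim = np.array([sm[doy == d].mean() for d in range(365)])
sm_anom = sm - clim[doy]                           # deseasonalized training target
```

Training on `sm_anom` rather than `sm` keeps the network from spending its capacity on the predictable seasonal swing, so its skill concentrates on the anomalies, i.e. the extremes.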
  6. Abstract The process of evapotranspiration transfers liquid water from vegetation and soil surfaces to the atmosphere, the so-called latent heat flux (Q_LE), and modulates the Earth’s energy, water, and carbon cycles. Vegetation controls Q_LE by regulating leaf stomata opening (surface resistance r_s in the Big Leaf approach) and by altering surface roughness (aerodynamic resistance r_a). Estimating r_s and r_a across different vegetation types is a key challenge in predicting Q_LE. We propose a hybrid approach that combines mechanistic modeling and machine learning for modeling Q_LE. The hybrid model combines a feed-forward neural network, which estimates the resistances from observations as intermediate variables, and a mechanistic model in an end-to-end setting. In the hybrid modeling setup, we make use of the Penman–Monteith equation in conjunction with multi-year flux measurements across different forest and grassland sites from the FLUXNET database. This hybrid model setup is successful in predicting Q_LE; however, the approach leads to equifinal solutions in terms of estimated physical parameters. We follow two different strategies to constrain the hybrid model and thereby control for the equifinality that arises when the two resistances are estimated simultaneously. One strategy imposes an a priori constraint on r_a based on mechanistic assumptions (theory-driven strategy), while the other makes use of more observational data and adds a constraint on predicting r_a through multi-task learning of both latent and sensible heat flux (Q_H; data-driven strategy). Our results show that all hybrid models predict the target variables with a high degree of success, with R² = 0.82–0.89 for grasslands and R² = 0.70–0.80 for forest sites at the mean diurnal scale. The predicted r_s and r_a show strong physical consistency across the two regularized hybrid models, but are physically implausible in the under-constrained hybrid model.
The hybrid models are robust in reproducing consistent results for energy fluxes and resistances across different scales (diurnal, seasonal, and interannual), reflecting their ability to learn the physical dependence of the target variables on the meteorological inputs. As a next step, we propose to test these heavily observation-informed parameterizations derived through hybrid modeling as a substitute for ad hoc formulations in Earth system models.
    Free, publicly-accessible full text available March 1, 2024
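The hybrid structure described above, a learned component emitting the two resistances that feed a mechanistic Penman–Monteith equation, can be sketched as follows. A fixed toy function stands in for the neural network, and all forcing values, resistance formulas, and constants are illustrative assumptions, not FLUXNET data or the paper's trained model:

```python
RHO_AIR = 1.2   # air density, kg m^-3
CP = 1005.0     # specific heat of air, J kg^-1 K^-1
GAMMA = 66.0    # psychrometric constant, Pa K^-1

def penman_monteith(rn_minus_g, delta, vpd, r_s, r_a):
    """Latent heat flux Q_LE (W m^-2) from the Penman-Monteith equation."""
    numerator = delta * rn_minus_g + RHO_AIR * CP * vpd / r_a
    return numerator / (delta + GAMMA * (1.0 + r_s / r_a))

def toy_network(vpd):
    """Stand-in for the neural network predicting (r_s, r_a) in s m^-1."""
    r_s = 50.0 + 0.05 * vpd   # e.g. stomata close as atmospheric demand rises
    r_a = 30.0
    return r_s, r_a

# Illustrative midday forcing: available energy (W m^-2), slope of the
# saturation vapour-pressure curve (Pa K^-1), vapour pressure deficit (Pa)
rn_minus_g, delta, vpd = 400.0, 145.0, 1500.0
r_s, r_a = toy_network(vpd)
q_le = penman_monteith(rn_minus_g, delta, vpd, r_s, r_a)
```

In the end-to-end setup the abstract describes, the loss on `q_le` (and, in the multi-task variant, on the sensible heat flux) would be backpropagated through the Penman–Monteith equation into the network that produces the resistances.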
  7. Free, publicly-accessible full text available March 1, 2024
  8. Abstract. The annual area burned due to wildfires in the western United States (WUS) increased by more than 300% between 1984 and 2020. However, accounting for the nonlinear, spatially heterogeneous interactions between climate, vegetation, and human predictors driving the trends in fire frequency and sizes at different spatial scales remains a challenging problem for statistical fire models. Here we introduce a novel stochastic machine learning (SML) framework, SMLFire1.0, to model observed fire frequencies and sizes in 12 km × 12 km grid cells across the WUS. This framework is implemented using mixture density networks trained on a wide suite of input predictors. The modeled WUS fire frequency matches observations at both monthly (r=0.94) and annual (r=0.85) timescales, as does the monthly (r=0.90) and annual (r=0.88) area burned. Moreover, the modeled annual time series of both fire variables exhibit strong correlations (r≥0.6) with observations in 16 out of 18 ecoregions. Our ML model captures the interannual variability and the distinct multidecade increases in annual area burned for both forested and non-forested ecoregions. Evaluating predictor importance with Shapley additive explanations, we find that fire-month vapor pressure deficit (VPD) is the dominant driver of fire frequencies and sizes across the WUS, followed by 1000 h dead fuel moisture (FM1000), total monthly precipitation (Prec), mean daily maximum temperature (Tmax), and the fraction of grassland cover in a grid cell. Our findings serve as a promising use case of ML techniques for wildfire prediction in particular and extreme event modeling more broadly. They also highlight the power of ML-driven parameterizations for potential implementation in fire modules of dynamic global vegetation models (DGVMs) and Earth system models (ESMs).
    Free, publicly-accessible full text available January 1, 2024
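The mixture-density-network output the abstract mentions can be sketched in miniature: instead of a single number, the model emits mixture weights, means, and spreads that define a full predictive distribution, capturing stochasticity and heavy tails. The fixed two-component mixture below stands in for what a trained network would emit for one grid cell; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Pretend network outputs for one grid cell: a two-component Gaussian mixture
# over log10 burned area (weights, means, spreads are illustrative numbers)
weights = np.array([0.8, 0.2])   # most fires small, one heavy right-hand mode
means = np.array([1.0, 4.0])
sigmas = np.array([0.5, 1.0])

def mixture_pdf(x):
    """Probability density of the mixture at x."""
    comps = np.exp(-0.5 * ((x - means) / sigmas) ** 2) \
        / (sigmas * np.sqrt(2.0 * np.pi))
    return float(weights @ comps)

def sample(n):
    """Draw sizes: pick a component per draw, then sample its Gaussian."""
    k = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[k], sigmas[k])

draws = sample(10_000)
```

In the full framework, the predictors (VPD, FM1000, Prec, Tmax, land cover) would map to these mixture parameters, and training would minimize the negative log-likelihood of observed fires under `mixture_pdf`.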
  9. Free, publicly-accessible full text available January 1, 2024
  10. Process-based modelling offers interpretability and physical consistency in many domains of geosciences but struggles to leverage large datasets efficiently. Machine-learning methods, especially deep networks, have strong predictive skills yet are unable to answer specific scientific questions. In this Perspective, we explore differentiable modelling as a pathway to dissolve the perceived barrier between process-based modelling and machine learning in the geosciences and demonstrate its potential with examples from hydrological modelling. ‘Differentiable’ refers to accurately and efficiently calculating gradients with respect to model variables or parameters, enabling the discovery of high-dimensional unknown relationships. Differentiable modelling involves connecting (flexible amounts of) prior physical knowledge to neural networks, pushing the boundary of physics-informed machine learning. It offers better interpretability, generalizability, and extrapolation capabilities than purely data-driven machine learning, achieving a similar level of accuracy while requiring less training data. Additionally, the performance and efficiency of differentiable models scale well with increasing data volumes. Under data-scarce scenarios, differentiable models have outperformed machine-learning models in producing short-term dynamics and decadal-scale trends owing to the imposed physical constraints. Differentiable modelling approaches are primed to enable geoscientists to ask questions, test hypotheses, and discover unrecognized physical relationships. Future work should address computational challenges, reduce uncertainty, and verify the physical significance of outputs. 
    Free, publicly-accessible full text available July 11, 2024
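The "differentiable" idea in this Perspective, gradients of model output with respect to parameters flowing through a process-based model so that unknowns can be learned from data, can be sketched with a one-parameter streamflow recession model calibrated by gradient descent. The model, synthetic observations, and learning rate are illustrative assumptions, not an example from the paper:

```python
import numpy as np

t = np.linspace(0.0, 10.0, 50)         # time since the flood peak, days
q0, k_true = 5.0, 0.4
q_obs = q0 * np.exp(-k_true * t)       # synthetic "observed" recession

k = 0.1                                # initial parameter guess
for _ in range(3000):
    q = q0 * np.exp(-k * t)            # forward process-based model
    dq_dk = -t * q                     # analytic gradient of the model output
    grad = np.mean(2.0 * (q - q_obs) * dq_dk)  # chain rule through MSE loss
    k -= 0.005 * grad                  # gradient-descent calibration step
```

Here the gradient is written by hand; in practice, automatic differentiation provides the same derivatives through arbitrarily complex process chains, including ones where neural networks replace unknown relationships, which is what lets differentiable models combine physical constraints with learning at scale.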