
Title: Using Machine Learning With Partial Dependence Analysis to Investigate Coupling Between Soil Moisture and Near‐Surface Temperature

Soil moisture (SM) influences near‐surface air temperature by partitioning downwelling radiation into latent and sensible heat fluxes; because drier soils shift this partitioning toward sensible heat, they generally lead to higher temperatures. The strength of this coupled soil moisture‐temperature (SM‐T) relationship is not spatially uniform, and numerous methods have been developed to assess SM‐T coupling strength across the globe. These methods tend to involve either idealized climate‐model experiments or linear statistical methods, which cannot fully capture nonlinear SM‐T coupling. In this study, we propose a nonlinear machine‐learning (ML)‐based approach for analyzing SM‐T coupling and apply this method to various mid‐latitude regions using historical reanalysis datasets. We first train convolutional neural networks (CNNs) to predict daily maximum near‐surface air temperature (TMAX) given daily SM and geopotential height fields. We then use partial dependence analysis to isolate the average sensitivity of each CNN's TMAX prediction to the SM input under daily atmospheric conditions. The resulting SM‐T relationships broadly agree with previous assessments of SM‐T coupling strength. Over many regions, we find nonlinear relationships between the CNN's TMAX prediction and the SM input map. These nonlinearities suggest that the coupled interactions governing SM‐T relationships vary under different SM conditions, but these variations are regionally dependent. We also apply this method to test the influence of SM memory on SM‐T coupling and find that our results are consistent with previous studies. Although our study focuses specifically on local SM‐T coupling, our ML‐based method can be extended to investigate other coupled interactions within the climate system using observed or model‐derived datasets.
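
The partial dependence step can be illustrated with a minimal NumPy sketch. It assumes a trained model is available as a callable taking batches of SM and geopotential-height maps; `toy_model` below is a hypothetical stand-in, not the authors' CNN, and setting the whole SM map to one uniform value is our own simplifying assumption about how the SM input is varied. For each candidate SM value, every day's SM input is overwritten while that day's atmospheric field is kept, and the predictions are averaged over days.

```python
import numpy as np


def partial_dependence(predict, sm, z, sm_values):
    """Partial dependence of a model's TMAX prediction on its SM input.

    For each candidate value v, every day's SM map is replaced by a uniform
    map equal to v while that day's geopotential-height map is kept, and the
    predictions are averaged over days: PD(v) = mean_i predict(v, z_i).
    """
    pd_curve = np.empty(len(sm_values))
    for k, v in enumerate(sm_values):
        sm_fixed = np.full_like(sm, v)  # uniform SM map, same shape as the input
        pd_curve[k] = predict(sm_fixed, z).mean()
    return pd_curve


def toy_model(sm, z):
    """Hypothetical stand-in for a trained CNN (not the paper's model)."""
    sm_mean = sm.mean(axis=(1, 2))
    z_mean = z.mean(axis=(1, 2))
    # Nonlinear in SM: predicted TMAX drops faster as soils get wetter.
    return 30.0 + 0.01 * z_mean - 20.0 * sm_mean**2


rng = np.random.default_rng(0)
n_days, ny, nx = 200, 8, 8
sm = rng.uniform(0.1, 0.4, size=(n_days, ny, nx))   # synthetic SM maps
z = rng.normal(0.0, 50.0, size=(n_days, ny, nx))    # synthetic height anomalies

sm_values = np.linspace(0.1, 0.4, 7)
pd_curve = partial_dependence(toy_model, sm, z, sm_values)
print(np.round(pd_curve, 2))
```

Because the atmospheric state is averaged out rather than held at a single value, the curve describes the model's mean response to SM across the sampled daily conditions; its curvature is what would reveal a nonlinear SM‐T relationship.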

DOI PREFIX: 10.1029
Journal Name: Journal of Geophysical Research: Atmospheres
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Human heat stress depends jointly on atmospheric temperature and humidity. Wetter soils reduce temperature but also raise humidity, making the collective impact on heat stress unclear. To better understand these interactions, we use ERA5 to examine the coupling between daily average soil moisture and wet-bulb temperature (Tw), including its seasonal and diurnal cycles, at global scale. We identify a global soil moisture–Tw coupling pattern with widespread negative and positive correlations, in contrast to the well-established cooling effect of wet soil on dry-bulb temperature. Regions showing positive correlations closely resemble previously identified land–atmosphere coupling hotspots where soil moisture effectively controls surface energy partitioning. Soil moisture–Tw coupling varies seasonally and is closely tied to monsoon development, and the positive coupling is slightly stronger and more widespread during nighttime. Local-scale analysis demonstrates a nonlinear structure of soil moisture–Tw coupling, with stronger coupling under relatively dry soils. Hot days with high Tw values show wetter-than-normal soil, anomalously high latent and low sensible heat flux from a cooler surface, and a shallower boundary layer. This supports the hypothesis that wetter soil increases Tw by concentrating the surface moist enthalpy flux within a shallower boundary layer and reducing entrainment of free-tropospheric air. We identify areas of particular interest for future studies on the physical mechanisms of soil moisture–heat stress coupling. Our findings suggest that increasing soil moisture might amplify heat stress over large portions of the world, including several densely populated areas. These results also raise questions about the effectiveness of evaporative cooling strategies in ameliorating urban heat stress.

    Significance Statement

    The purpose of this study is to provide a global picture of the relationship between soil moisture anomalies and a heat stress metric that includes the joint effects of temperature and humidity. This is important because a better understanding of this relationship will help improve the prediction of extreme heat stress events and inform strategies for ameliorating heat stress. We find a widespread positive correlation between soil moisture and heat stress, in contrast to studies relying on temperature alone. This raises the possibility that, over much of the world, and in the most populous regions, strategies like irrigation or “greening” that can reduce temperature might be ineffective or even harmful in reducing heat stress with humidity incorporated.
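
    The correlation-based coupling map described in this abstract can be sketched as a grid-point Pearson correlation over time. This is a minimal illustration on synthetic arrays, not the authors' ERA5 workflow: the anomaly definition (deviation from the time mean) is our simplifying assumption, and the synthetic data are built so that Tw rises with soil moisture in one half of the grid and falls in the other, mimicking the mixed-sign pattern the abstract reports.

```python
import numpy as np


def correlation_map(sm, tw):
    """Pearson correlation between SM and Tw over time at every grid point.

    sm, tw: arrays of shape (n_days, ny, nx). Anomalies are taken about the
    time mean at each grid point; for real data, removing a daily climatology
    (seasonal cycle) first would be more appropriate.
    """
    sm_anom = sm - sm.mean(axis=0)
    tw_anom = tw - tw.mean(axis=0)
    cov = (sm_anom * tw_anom).mean(axis=0)
    return cov / (sm_anom.std(axis=0) * tw_anom.std(axis=0))


# Synthetic example: Tw rises with SM in the western half of the grid
# and falls with SM in the eastern half.
rng = np.random.default_rng(42)
n_days, ny, nx = 365, 3, 4
sm = rng.uniform(0.1, 0.4, size=(n_days, ny, nx))
sign = np.where(np.arange(nx) < nx // 2, 1.0, -1.0)  # shape (nx,), broadcasts
tw = 20.0 + sign * 5.0 * sm + rng.normal(0.0, 0.1, size=(n_days, ny, nx))

r = correlation_map(sm, tw)
print(np.round(r, 2))
```

    Vectorizing over the time axis computes every grid point at once; the same function applied to day-only or night-only subsets would give the kind of diurnal comparison the abstract mentions.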

  2. Abstract. The annual area burned due to wildfires in the western United States (WUS) increased by more than 300 % between 1984 and 2020. However, accounting for the nonlinear, spatially heterogeneous interactions between climate, vegetation, and human predictors driving the trends in fire frequency and sizes at different spatial scales remains a challenging problem for statistical fire models. Here we introduce a novel stochastic machine learning (SML) framework, SMLFire1.0, to model observed fire frequencies and sizes in 12 km × 12 km grid cells across the WUS. This framework is implemented using mixture density networks trained on a wide suite of input predictors. The modeled WUS fire frequency matches observations at both monthly (r=0.94) and annual (r=0.85) timescales, as do the monthly (r=0.90) and annual (r=0.88) area burned. Moreover, the modeled annual time series of both fire variables exhibit strong correlations (r≥0.6) with observations in 16 out of 18 ecoregions. Our ML model captures the interannual variability and the distinct multidecade increases in annual area burned for both forested and non-forested ecoregions. Evaluating predictor importance with Shapley additive explanations, we find that fire-month vapor pressure deficit (VPD) is the dominant driver of fire frequencies and sizes across the WUS, followed by 1000 h dead fuel moisture (FM1000), total monthly precipitation (Prec), mean daily maximum temperature (Tmax), and fraction of grassland cover in a grid cell. Our findings serve as a promising use case of ML techniques for wildfire prediction in particular and extreme event modeling more broadly. They also highlight the power of ML-driven parameterizations for potential implementation in fire modules of dynamic global vegetation models (DGVMs) and earth system models (ESMs).
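
    A mixture density network outputs the parameters of a probability distribution rather than a single value, and it is trained by minimizing the negative log-likelihood of the observations under that distribution. The sketch below shows only that loss for a Gaussian mixture, computed stably with log-sum-exp; the abstract does not state which component distributions SMLFire1.0 uses, so the Gaussian choice, the array layout, and the example numbers are assumptions for illustration.

```python
import numpy as np


def mixture_nll(y, logits, mu, log_sigma):
    """Mean negative log-likelihood of targets y under a Gaussian mixture.

    y: shape (n,); logits, mu, log_sigma: shape (n, k), i.e. the k mixture
    weights (as unnormalized logits), means and log standard deviations that
    a mixture density network would output for each of n samples.
    """
    # Log of softmax mixture weights, computed stably with log-sum-exp.
    log_w = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    z = (y[:, None] - mu) / np.exp(log_sigma)
    # Log density of each Gaussian component evaluated at y.
    log_comp = -0.5 * z**2 - log_sigma - 0.5 * np.log(2.0 * np.pi)
    # log p(y) = logsumexp over components of (log w_k + log N(y | mu_k, sigma_k)).
    log_p = np.logaddexp.reduce(log_w + log_comp, axis=1)
    return -log_p.mean()


y = np.array([0.0, 1.5])                      # hypothetical targets
logits = np.array([[0.0, 0.0], [2.0, -1.0]])  # hypothetical network outputs
mu = np.array([[0.0, 2.0], [1.0, 3.0]])
log_sigma = np.zeros((2, 2))
print(mixture_nll(y, logits, mu, log_sigma))
```

    Predicting log standard deviations and logits keeps the scales positive and the weights normalized without constraining the raw network outputs.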
  3. Abstract

    Heatwaves are extreme near-surface temperature events that can have substantial impacts on ecosystems and society. Early warning systems help to reduce these impacts by enabling communities to prepare for hazardous climate-related events. However, state-of-the-art prediction systems often cannot make accurate heatwave forecasts more than two weeks in advance, which is the lead time required for such warnings. We therefore investigate the potential of statistical and machine learning methods to understand and predict central European summer heatwaves on time scales of several weeks. As a first step, we identify the most important regional atmospheric and surface predictors based on previous studies and supported by a correlation analysis: 2-m air temperature, 500-hPa geopotential, precipitation, and soil moisture in central Europe, as well as Mediterranean and North Atlantic sea surface temperatures and the North Atlantic jet stream. Based on these predictors, we apply machine learning methods to forecast two targets, summer temperature anomalies and the probability of heatwaves, at lead times of 1–6 weeks and weekly resolution. For each of these two target variables, we use both a linear and a random forest model. As expected, the performance of these statistical models decays with lead time, but they outperform persistence and climatology at all lead times. For lead times longer than two weeks, our machine learning models compete with the ensemble mean of the European Centre for Medium-Range Weather Forecasts’ hindcast system. We thus show that machine learning can help improve subseasonal forecasts of summer temperature anomalies and heatwaves.

    Significance Statement

    Heatwaves (prolonged extremely warm temperatures) cause thousands of fatalities worldwide each year. These damaging events are becoming even more severe with climate change. This study aims to improve advance predictions of summer heatwaves in central Europe by using statistical and machine learning methods. Machine learning models are shown to compete with conventional physics-based models for forecasting heatwaves more than two weeks in advance. These early warnings can be used to activate effective and timely response plans targeting vulnerable communities and regions, thereby reducing the damage caused by heatwaves.
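
    The comparison against persistence and climatology in this abstract is usually expressed as a skill score relative to a reference forecast. The sketch below is a generic illustration, not the study's verification code: it assumes an MSE skill score for weekly temperature anomalies and a Brier skill score for heatwave probabilities, with the climatological base rate taken from the same sample; the anomaly and probability values are hypothetical.

```python
import numpy as np


def mse(forecast, observed):
    return float(np.mean((np.asarray(forecast) - np.asarray(observed)) ** 2))


def skill_score(forecast, reference, observed):
    """MSE skill score: 1 is perfect, 0 equals the reference, < 0 is worse."""
    return 1.0 - mse(forecast, observed) / mse(reference, observed)


def reference_forecasts(anomalies, lead):
    """Persistence and climatology forecasts of weekly anomalies.

    Persistence predicts the anomaly observed `lead` weeks earlier;
    climatology predicts a zero anomaly. Requires lead >= 1.
    Returns (targets, persistence, climatology).
    """
    if lead < 1:
        raise ValueError("lead must be at least one week")
    anomalies = np.asarray(anomalies, dtype=float)
    targets = anomalies[lead:]
    return targets, anomalies[:-lead], np.zeros_like(targets)


def brier_skill_score(prob, event):
    """Brier skill score of heatwave probabilities against the sample base rate."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    bs = np.mean((prob - event) ** 2)
    bs_ref = np.mean((event.mean() - event) ** 2)
    return 1.0 - bs / bs_ref


weekly_anom = [1.0, -1.0, 2.0, 0.0, -2.0, 1.0]  # hypothetical anomalies (K)
targets, persistence, climatology = reference_forecasts(weekly_anom, lead=1)
print(skill_score(persistence, climatology, targets))  # -2.0
print(brier_skill_score([0.1, 0.8, 0.2, 0.1], [0, 1, 0, 0]))
```

    A model "outperforms persistence and climatology" in this framing when its skill score is positive against each reference at every lead time.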

  4. Abstract. A key challenge for biological oceanography is relating the physiological mechanisms controlling phytoplankton growth to the spatial distribution of those phytoplankton. Physiological mechanisms are often isolated by varying one driver of growth, such as nutrient or light, in a controlled laboratory setting, producing what we call “intrinsic relationships”. We contrast these with the “apparent relationships” which emerge in the environment in climatological data. Although previous studies have found that machine learning (ML) can find apparent relationships, there has yet to be a systematic study examining when and why these apparent relationships diverge from the underlying intrinsic relationships found in the lab, and how and why this may depend on the method applied. Here we conduct a proof-of-concept study with three scenarios in which biomass is by construction a function of time-averaged phytoplankton growth rate. In the first scenario, the inputs and outputs of the intrinsic and apparent relationships vary over the same monthly timescales. In the second, the intrinsic relationships relate averages of drivers that vary on hourly timescales to biomass, but the apparent relationships are sought between monthly averages of these inputs and monthly-averaged output. In the third scenario, we apply ML to the output of an actual Earth system model (ESM). Our results demonstrated that when intrinsic and apparent relationships operate on the same spatial and temporal timescale, neural network ensembles (NNEs) were able to extract the intrinsic relationships when only provided information about the apparent relationships, while colimitation and the inability of random forests (RFs) to extrapolate caused RFs to diverge from the true response. When intrinsic and apparent relationships operated on different timescales (as little separation as hourly versus daily), NNEs fed with apparent relationships in time-averaged data produced responses with the right shape but underestimated the biomass. This was because, when the intrinsic relationship was nonlinear, the response to a time-averaged input differed systematically from the time-averaged response. Although NNEs overestimated the limitations, they produced more realistic shapes of the actual relationships than multiple linear regression. Additionally, NNEs were able to model the interactions between predictors and their effects on biomass, allowing for a qualitative assessment of the colimitation patterns and of the nutrient causing the most limitation. Future research may be able to use this type of analysis on observational datasets and other ESMs to identify apparent relationships between biogeochemical variables (rather than spatiotemporal distributions only) and to identify interactions and colimitations without having to perform (or at least performing fewer) growth experiments in a lab. From our study, it appears that ML can extract useful information from ESM output and could likely do so for observational datasets as well.
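
    The averaging effect behind this underestimate is an instance of Jensen's inequality: for a nonlinear response f, the mean of f(x) is generally not f of the mean of x. The sketch below assumes a hypothetical Monod-type saturating growth curve and a sinusoidal hourly nutrient signal; neither is taken from the study's scenarios.

```python
import numpy as np


def monod(n, k_half=1.0):
    """Hypothetical saturating (Monod-type) growth response to a nutrient n."""
    return n / (k_half + n)


hours = np.arange(24 * 30)  # 30 days of hourly values
nutrient = 1.0 + 0.9 * np.sin(2.0 * np.pi * hours / 24.0)  # hypothetical forcing

response_of_mean = monod(nutrient.mean())   # f applied to the time-averaged input
mean_of_response = monod(nutrient).mean()   # time average of the hourly f(input)
print(response_of_mean, mean_of_response)
```

    Because this curve is concave, the time-averaged response falls below the curve evaluated at the averaged input, so a relationship learned from (averaged input, averaged output) pairs would sit below the intrinsic curve, in the same direction as the underestimate reported above.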
  5. Abstract

    Regional automated meteorological networks, such as the Oklahoma Mesonet, can potentially provide high-quality forcing data for generating gridded surfaces, but proven methods of interpolating weather variables between station locations are needed. We compared two interpolation methods, ordinary kriging (OK) and empirical Bayesian kriging (EBK), with and without long‐term climate imprints (CI), for creating spatially continuous daily weather datasets. Daily meteorological variables (maximum and minimum temperature, solar radiation, and precipitation) from the Oklahoma Mesonet for the period 1997–2014 were interpolated using geoprocessing tools in ArcGIS. Cross‐validation was used to evaluate the interpolation methods, with 90% of sites chosen randomly for the training set and the remaining 10% held out for validation. For all interpolation approaches, cross‐validation showed coefficient of determination (R²) values of .99 and .98 for daily maximum and minimum air temperatures, with mean absolute error (MAE) ranging from 0.45 to 0.50 °C for maximum temperature and from 0.77 to 0.80 °C for minimum temperature. Likewise, for daily solar radiation, R² values of .94 and .93 showed good overall prediction accuracy, with MAE values of 1.00 and 1.01 MJ m⁻² d⁻¹ for EBK and OK, respectively. However, for rainfall, all methods yielded R² values ≤ .67, suggesting a need for a more effective interpolation method. Based on its lower computational time and lower input data requirements, OK appears preferable to the other approaches tested here for providing daily weather data for gridded models in Oklahoma and other regions with similar monitoring networks.
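
    Ordinary kriging estimates a value at an unsampled location as a weighted sum of station values, with weights obtained from a semivariogram and constrained to sum to one. The sketch below is an independent NumPy illustration of textbook ordinary kriging, not the ArcGIS implementation used in the study; the exponential variogram, its sill and range, the absence of a nugget, and the synthetic station data are all assumptions. It also mimics the random 90/10 hold-out split described above and reports the hold-out MAE.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=50.0):
    """Hypothetical exponential semivariogram (no nugget); parameters are illustrative."""
    return sill * (1.0 - np.exp(-h / range_))


def ordinary_kriging(xy, values, target, sill=1.0, range_=50.0):
    """Ordinary kriging estimate at one target location.

    xy: (n, 2) station coordinates (km); values: (n,) observations;
    target: (2,) location. The last row/column of the system is the
    Lagrange-multiplier constraint forcing the weights to sum to 1.
    """
    n = len(values)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = exponential_variogram(d, sill, range_)
    a[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exponential_variogram(np.linalg.norm(xy - target, axis=1), sill, range_)
    weights = np.linalg.solve(a, b)[:n]
    return weights @ values, weights


rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 300.0, size=(120, 2))  # synthetic station locations (km)
tmax = 30.0 + 0.02 * xy[:, 0] - 0.01 * xy[:, 1] + rng.normal(0.0, 0.3, 120)

# Random 90/10 split of stations into training and validation sets.
idx = rng.permutation(len(tmax))
n_train = int(0.9 * len(tmax))
train, test = idx[:n_train], idx[n_train:]
preds = np.array([ordinary_kriging(xy[train], tmax[train], p)[0] for p in xy[test]])
mae = np.mean(np.abs(preds - tmax[test]))
print(f"hold-out MAE: {mae:.2f} °C")
```

    Without a nugget, the estimator reproduces station values exactly at station locations; in practice the variogram would be fitted to the data for each day and variable rather than fixed in advance.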
