Title: Upscaling Soil Organic Carbon Measurements at the Continental Scale Using Multivariate Clustering Analysis and Machine Learning
Abstract: Estimates of soil organic carbon (SOC) stocks are essential for many environmental applications. However, significant inconsistencies exist in SOC stock estimates for the U.S. across current SOC maps. We propose a framework that combines unsupervised multivariate geographic clustering (MGC) and supervised Random Forests regression, improving SOC maps by capturing heterogeneous relationships with SOC drivers. We first used MGC to divide the U.S. into 20 SOC regions based on the similarity of covariates (soil biogeochemical, bioclimatic, biological, and physiographic variables). Subsequently, separate Random Forests models were trained for each SOC region, utilizing environmental covariates and SOC observations. Our estimated SOC stocks for the U.S. (52.6 ± 3.2 Pg for 0–30 cm and 108.3 ± 8.2 Pg for 0–100 cm depth) were within the range estimated by existing products such as the Harmonized World Soil Database, HWSD (46.7 Pg for 0–30 cm and 90.7 Pg for 0–100 cm depth), and SoilGrids 2.0 (45.7 Pg for 0–30 cm and 133.0 Pg for 0–100 cm depth). However, independent validation with soil profile data from the National Ecological Observatory Network showed that our approach (R² = 0.51) outperformed the estimates obtained from the Harmonized World Soil Database (R² = 0.23) and SoilGrids 2.0 (R² = 0.39) for the topsoil (0–30 cm). Uncertainty analysis (e.g., low representativeness and high coefficients of variation) identified regions requiring more measurements, such as Alaska and the deserts of the U.S. Southwest. Our approach effectively captures the heterogeneous relationships between widely available predictors and the current SOC baseline across regions, offering reliable SOC estimates at 1 km resolution for benchmarking Earth system models.
Award ID(s):
2217817 2106137 2106138
PAR ID:
10493478
Author(s) / Creator(s):
Publisher / Repository:
DOI PREFIX: 10.1029
Date Published:
Journal Name:
Journal of Geophysical Research: Biogeosciences
Volume:
129
Issue:
2
ISSN:
2169-8953
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Data Description:

     To improve SOC estimation in the United States, we upscaled site-based SOC measurements to the continental scale using a multivariate geographic clustering (MGC) approach coupled with machine learning models. First, we used the MGC approach to segment the United States at 30 arc second resolution into 20 SOC regions, based on principal component information from environmental covariates (gNATSGO soil properties, WorldClim bioclimatic variables, MODIS biological variables, and physiographic variables). We then trained separate random forest model ensembles for each of the SOC regions identified, using environmental covariates and soil profile measurements from the International Soil Carbon Network (ISCN) and an Alaska soil profile dataset. Our estimated United States SOC stocks for the 0–30 cm and 0–100 cm depths were 52.6 ± 3.2 and 108.3 ± 8.2 Pg C, respectively.

     Files in collection (32):

     Collection contains 22 soil properties geospatial rasters, 4 soil SOC geospatial rasters, 2 ISCN site SOC observations csv files, and 4 R scripts

     gNATSGO TIF files:

     ├── available_water_storage_30arc_30cm_us.tif     [30 cm depth soil available water storage]
     ├── available_water_storage_30arc_100cm_us.tif    [100 cm depth soil available water storage]
     ├── caco3_30arc_30cm_us.tif                       [30 cm depth soil CaCO3 content]
     ├── caco3_30arc_100cm_us.tif                      [100 cm depth soil CaCO3 content]
     ├── cec_30arc_30cm_us.tif                         [30 cm depth soil cation exchange capacity]
     ├── cec_30arc_100cm_us.tif                        [100 cm depth soil cation exchange capacity]
     ├── clay_30arc_30cm_us.tif                        [30 cm depth soil clay content]
     ├── clay_30arc_100cm_us.tif                       [100 cm depth soil clay content]
     ├── depthWT_30arc_us.tif                          [depth to water table]
     ├── kfactor_30arc_30cm_us.tif                     [30 cm depth soil erosion factor]
     ├── kfactor_30arc_100cm_us.tif                    [100 cm depth soil erosion factor]
     ├── ph_30arc_100cm_us.tif                         [100 cm depth soil pH]
     ├── ph_30arc_100cm_us.tif                         [30 cm depth soil pH]
     ├── pondingFre_30arc_us.tif                       [ponding frequency]
     ├── sand_30arc_30cm_us.tif                        [30 cm depth soil sand content]
     ├── sand_30arc_100cm_us.tif                       [100 cm depth soil sand content]
     ├── silt_30arc_30cm_us.tif                        [30 cm depth soil silt content]
     ├── silt_30arc_100cm_us.tif                       [100 cm depth soil silt content]
     ├── water_content_30arc_30cm_us.tif               [30 cm depth soil water content]
     └── water_content_30arc_100cm_us.tif              [100 cm depth soil water content]

     SOC TIF files:

     ├── 30cm SOC mean.tif     [30 cm depth soil SOC]
     ├── 100cm SOC mean.tif    [100 cm depth soil SOC]
     ├── 30cm SOC CV.tif       [30 cm depth soil SOC coefficient of variation]
     └── 100cm SOC CV.tif      [100 cm depth soil SOC coefficient of variation]

     Site observations csv files:

     ISCN_rmNRCS_addNCSS_30cm.csv     [30 cm ISCN sites SOC, replaced NRCS sites with NCSS centroid removed data]
     ISCN_rmNRCS_addNCSS_100cm.csv    [100 cm ISCN sites SOC, replaced NRCS sites with NCSS centroid removed data]

     Data format:

     Geospatial files are provided in GeoTIFF format in a Lat/Lon WGS84 (EPSG:4326) projection at 30 arc second resolution.

     Geospatial projection:

     GEOGCS["GCS_WGS_1984",
      DATUM["D_WGS_1984",
      SPHEROID["WGS_1984",6378137,298.257223563]],
      PRIMEM["Greenwich",0],
      UNIT["Degree",0.017453292519943295]]
     (base) [jbk@theseus ltar_regionalization]$ g.proj -w
     GEOGCS["wgs84",
      DATUM["WGS_1984",
      SPHEROID["WGS_1984",6378137,298.257223563]],
      PRIMEM["Greenwich",0],
      UNIT["degree",0.0174532925199433]]
  2. ABSTRACT The current soil carbon paradigm treats particulate organic carbon (POC) as one of the major components of soil organic carbon worldwide, highlighting its pivotal role in carbon mitigation. In this study, we compiled a global dataset of 3418 data points of POC concentration in soils and applied empirical modeling and machine learning algorithms to investigate the spatial variation in POC concentration and its controls. The global POC concentration in topsoil (0–30 cm) is estimated as 3.02 g C/kg dry soil, exhibiting a declining trend from polar regions to the equator. Boreal forests contain the highest POC concentration, averaging 4.58 g C/kg dry soil, whereas savannas exhibit the lowest at 1.41 g C/kg dry soil. We developed a global map of soil POC density in soil profiles of 0–30 cm and 0–100 cm with an empirical model. The global stock of POC is 158.15 Pg C for 0–30 cm and 222.75 Pg C for 0–100 cm soil profiles, with substantial spatial variation. Analysis with a machine learning algorithm indicated that edaphic factors (i.e., bulk density and soil C content) are the predominant controls on POC concentration across biomes. However, the secondary controls vary among biomes, with strong climate controls in grassland, pasture, and shrubland and strong vegetation controls in forests. The biome‐level estimates and maps of POC density provide a benchmark for modeling C fractions in soils; the various controls on POC suggest incorporating biological and physicochemical mechanisms in soil C models to assess and forecast the soil POC dynamics in response to global change.
  3. Abstract. Soil represents the largest phosphorus (P) stock in terrestrial ecosystems. Determining the amount of soil P is a critical first step in identifying sites where ecosystem functioning is potentially limited by soil P availability. However, global patterns and predictors of soil total P concentration remain poorly understood. To address this knowledge gap, we constructed a database of total P concentration of 5275 globally distributed (semi-)natural soils from 761 published studies. We quantified the relative importance of 13 soil-forming variables in predicting soil total P concentration and then made further predictions at the global scale using a random forest approach. Soil total P concentration varied significantly among parent material types, soil orders, biomes, and continents and ranged widely from 1.4 to 9630.0 (median 430.0 and mean 570.0) mg kg⁻¹ across the globe. About two-thirds (65 %) of the global variation was accounted for by the 13 variables that we selected, among which soil organic carbon concentration, parent material, mean annual temperature, and soil sand content were the most important ones. While predicted soil total P concentrations increased significantly with latitude, they varied largely among regions with similar latitudes due to regional differences in parent material, topography, and/or climate conditions. Soil P stocks (excluding Antarctica) were estimated to be 26.8 ± 3.1 (mean ± standard deviation) Pg and 62.2 ± 8.9 Pg (1 Pg = 1 × 10¹⁵ g) in the topsoil (0–30 cm) and subsoil (30–100 cm), respectively. Our global map of soil total P concentration as well as the underlying drivers of soil total P concentration can be used to constrain Earth system models that represent the P cycle and to inform quantification of global soil P availability. Raw datasets and global maps generated in this study are available at https://doi.org/10.6084/m9.figshare.14583375 (He et al., 2021).
  4. Abstract Emerging evidence points out that the responses of soil organic carbon (SOC) to nitrogen (N) addition differ along the soil profile, highlighting the importance of synthesizing results from different soil layers. Here, using a global meta‐analysis, we found that N addition significantly enhanced topsoil (0–30 cm) SOC by 3.7% (±1.4%) in forests and grasslands. In contrast, SOC in the subsoil (30–100 cm) initially increased with N addition but decreased over time. The model selection analysis revealed that experimental duration and vegetation type are among the most important predictors across a wide range of climatic, environmental, and edaphic variables. The contrasting responses of SOC to N addition indicate the importance of considering deep soil layers, particularly for long‐term continuous N deposition. Finally, the lack of depth‐dependent SOC responses to N addition in experimental and modeling frameworks has likely resulted in the overestimation of changes in SOC storage under enhanced N deposition. 
  5. Abstract The storage and cycling of soil organic carbon (SOC) are governed by multiple co-varying factors, including climate, plant productivity, edaphic properties, and disturbance history. Yet, it remains unclear which of these factors are the dominant predictors of observed SOC stocks, globally and within biomes, and how the role of these predictors varies between observations and process-based models. Here we use global observations and an ensemble of soil biogeochemical models to quantify the emergent importance of key state factors – namely, mean annual temperature, net primary productivity, and soil mineralogy – in explaining biome- to global-scale variation in SOC stocks. We use a machine-learning approach to disentangle the role of covariates and elucidate individual relationships with SOC, without imposing expected relationships a priori. While we observe qualitatively similar relationships between SOC and covariates in observations and models, the magnitude and degree of non-linearity vary substantially among the models and observations. Models appear to overemphasize the importance of temperature and primary productivity (especially in forests and herbaceous biomes, respectively), while observations suggest a greater relative importance of soil minerals. This mismatch is also evident globally. However, we observe agreement between observations and model outputs in select individual biomes – namely, temperate deciduous forests and grasslands, which both show stronger relationships of SOC stocks with temperature and productivity, respectively. This approach highlights biomes with the largest uncertainty and mismatch with observations for targeted model improvements. Understanding the role of dominant SOC controls, and the discrepancies between models and observations, globally and across biomes, is essential for improving and validating process representations in soil and ecosystem models for projections under novel future conditions.