Abstract Estimates of soil organic carbon (SOC) stocks are essential for many environmental applications. However, significant inconsistencies exist in SOC stock estimates for the U.S. across current SOC maps. We propose a framework that combines unsupervised multivariate geographic clustering (MGC) and supervised Random Forests regression, improving SOC maps by capturing heterogeneous relationships with SOC drivers. We first used MGC to divide the U.S. into 20 SOC regions based on the similarity of covariates (soil biogeochemical, bioclimatic, biological, and physiographic variables). Subsequently, separate Random Forests models were trained for each SOC region, utilizing environmental covariates and SOC observations. Our estimated SOC stocks for the U.S. (52.6 ± 3.2 Pg for 0–30 cm and 108.3 ± 8.2 Pg for 0–100 cm depth) were within the range estimated by existing products such as the Harmonized World Soil Database, HWSD (46.7 Pg for 0–30 cm and 90.7 Pg for 0–100 cm depth) and SoilGrids 2.0 (45.7 Pg for 0–30 cm and 133.0 Pg for 0–100 cm depth). However, independent validation with soil profile data from the National Ecological Observatory Network showed that our approach (R² = 0.51) outperformed the estimates obtained from the Harmonized World Soil Database (R² = 0.23) and SoilGrids 2.0 (R² = 0.39) for the topsoil (0–30 cm). Uncertainty analysis (e.g., low representativeness and high coefficients of variation) identified regions requiring more measurements, such as Alaska and the deserts of the U.S. Southwest. Our approach effectively captures the heterogeneous relationships between widely available predictors and the current SOC baseline across regions, offering reliable SOC estimates at 1 km resolution for benchmarking Earth system models.
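The validation scores quoted in the abstract are coefficients of determination. The abstract does not say which definition was used, so the sketch below assumes the common form 1 − SS_res / SS_tot; the function name `r_squared` and the sample values are hypothetical and are not NEON data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, assuming the 1 - SS_res / SS_tot form."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: mismatch between observation and prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observations are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up SOC values, used only to show the call.
    obs = [4.0, 6.5, 9.0, 12.5]
    pred = [5.0, 6.0, 10.0, 11.0]
    print(round(r_squared(obs, pred), 3))
```

With this definition a model that always predicts the observed mean scores 0, and a perfect model scores 1. A squared Pearson correlation would be a different, also common, reading of R².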
Upscaling soil organic carbon measurements at the continental scale using multivariate clustering analysis and machine learning
Data Description:

To improve SOC estimation in the United States, we upscaled site-based SOC measurements to the continental scale using a multivariate geographic clustering (MGC) approach coupled with machine learning models. First, we used the MGC approach to segment the United States at 30 arc second resolution into 20 SOC regions, based on principal component information from environmental covariates (gNATSGO soil properties, WorldClim bioclimatic variables, MODIS biological variables, and physiographic variables). We then trained separate random forest model ensembles for each of the SOC regions identified, using environmental covariates and soil profile measurements from the International Soil Carbon Network (ISCN) and an Alaska soil profile dataset. The estimated United States SOC stocks for the 0-30 cm and 0-100 cm depths were 52.6 ± 3.2 and 108.3 ± 8.2 Pg C, respectively.

Files in collection (32):

Collection contains 22 soil properties geospatial rasters, 4 soil SOC geospatial rasters, 2 ISCN site SOC observations csv files, and 4 R scripts

gNATSGO TIF files:

├── available_water_storage_30arc_30cm_us.tif [30 cm depth soil available water storage]
├── available_water_storage_30arc_100cm_us.tif [100 cm depth soil available water storage]
├── caco3_30arc_30cm_us.tif [30 cm depth soil CaCO3 content]
├── caco3_30arc_100cm_us.tif [100 cm depth soil CaCO3 content]
├── cec_30arc_30cm_us.tif [30 cm depth soil cation exchange capacity]
├── cec_30arc_100cm_us.tif [100 cm depth soil cation exchange capacity]
├── clay_30arc_30cm_us.tif [30 cm depth soil clay content]
├── clay_30arc_100cm_us.tif [100 cm depth soil clay content]
├── depthWT_30arc_us.tif [depth to water table]
├── kfactor_30arc_30cm_us.tif [30 cm depth soil erosion factor]
├── kfactor_30arc_100cm_us.tif [100 cm depth soil erosion factor]
├── ph_30arc_100cm_us.tif [100 cm depth soil pH]
├── ph_30arc_100cm_us.tif [30 cm depth soil pH]
├── pondingFre_30arc_us.tif [ponding frequency]
├── sand_30arc_30cm_us.tif [30 cm depth soil sand content]
├── sand_30arc_100cm_us.tif [100 cm depth soil sand content]
├── silt_30arc_30cm_us.tif [30 cm depth soil silt content]
├── silt_30arc_100cm_us.tif [100 cm depth soil silt content]
├── water_content_30arc_30cm_us.tif [30 cm depth soil water content]
└── water_content_30arc_100cm_us.tif [100 cm depth soil water content]

SOC TIF files:

├── 30cm SOC mean.tif [30 cm depth soil SOC]
├── 100cm SOC mean.tif [100 cm depth soil SOC]
├── 30cm SOC CV.tif [30 cm depth soil SOC coefficient of variation]
└── 100cm SOC CV.tif [100 cm depth soil SOC coefficient of variation]

Site observations csv files:

ISCN_rmNRCS_addNCSS_30cm.csv [30cm ISCN sites SOC replaced NRCS sites with NCSS centroid removed data]
ISCN_rmNRCS_addNCSS_100cm.csv [100cm ISCN sites SOC replaced NRCS sites with NCSS centroid removed data]

Data format:

Geospatial files are provided in GeoTIFF format in Lat/Lon WGS84 (EPSG:4326) projection at 30 arc second resolution.

Geospatial projection:

GEOGCS["GCS_WGS_1984",
 DATUM["D_WGS_1984",
 SPHEROID["WGS_1984",6378137,298.257223563]],
 PRIMEM["Greenwich",0],
 UNIT["Degree",0.017453292519943295]]

$ g.proj -w
GEOGCS["wgs84",
 DATUM["WGS_1984",
 SPHEROID["WGS_1984",6378137,298.257223563]],
 PRIMEM["Greenwich",0],
 UNIT["degree",0.0174532925199433]]
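Because the rasters are stored on a latitude/longitude grid, a 30 arc second cell covers less ground area at higher latitudes, which matters when per-cell values are summed into a national stock in Pg. The sketch below illustrates that geometry under stated assumptions: a spherical Earth with the mean radius 6,371,008.8 m (not the WGS84 ellipsoid), and SOC values expressed as kg C per square metre. The units of the SOC rasters are not stated in this record, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius; spherical approximation (assumption)
CELL_DEG = 30.0 / 3600.0       # 30 arc seconds expressed in degrees


def cell_area_km2(lat_center_deg, cell_deg=CELL_DEG):
    """Ground area in km^2 of one lat/lon grid cell centred at the given latitude."""
    half = cell_deg / 2.0
    lat_south = math.radians(lat_center_deg - half)
    lat_north = math.radians(lat_center_deg + half)
    dlon = math.radians(cell_deg)
    # Area of a spherical quadrangle: R^2 * dlon * (sin(lat_north) - sin(lat_south)).
    area_m2 = EARTH_RADIUS_M ** 2 * dlon * (math.sin(lat_north) - math.sin(lat_south))
    return area_m2 / 1e6


def stock_pg(density_kg_m2, lat_centers_deg):
    """Sum per-cell carbon density (assumed kg C m^-2) into a total stock in Pg C."""
    total_kg = 0.0
    for density, lat in zip(density_kg_m2, lat_centers_deg):
        total_kg += density * cell_area_km2(lat) * 1e6  # km^2 back to m^2
    return total_kg / 1e12  # 1 Pg = 1e15 g = 1e12 kg


if __name__ == "__main__":
    for lat in (0.0, 30.0, 45.0, 64.0):
        print(f"lat {lat:5.1f}: {cell_area_km2(lat):.3f} km^2")
```

Under these assumptions a cell is a little under 1 km on a side at the equator and noticeably narrower at Alaskan latitudes, so an area-weighted sum is needed rather than a plain sum of cell values.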
- PAR ID: 10430670
- Publisher / Repository: Zenodo
- Date Published:
- Subject(s) / Keyword(s): the United States SOC US soil properties Gridded National Soil Survey Geographic Database gNATSGO International Soil Carbon Network (ISCN)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
-
Abstract Emerging evidence points out that the responses of soil organic carbon (SOC) to nitrogen (N) addition differ along the soil profile, highlighting the importance of synthesizing results from different soil layers. Here, using a global meta‐analysis, we found that N addition significantly enhanced topsoil (0–30 cm) SOC by 3.7% (±1.4%) in forests and grasslands. In contrast, SOC in the subsoil (30–100 cm) initially increased with N addition but decreased over time. The model selection analysis revealed that experimental duration and vegetation type are among the most important predictors across a wide range of climatic, environmental, and edaphic variables. The contrasting responses of SOC to N addition indicate the importance of considering deep soil layers, particularly for long‐term continuous N deposition. Finally, the lack of depth‐dependent SOC responses to N addition in experimental and modeling frameworks has likely resulted in the overestimation of changes in SOC storage under enhanced N deposition.
-
Soil nitrogen (N) is an important driver of plant productivity and ecosystem functioning; consequently, it is critical to understand its spatial variability from local-to-global scales. Here we provide a quantitative assessment of the three-dimensional spatial distribution of soil N across the conterminous United States (CONUS) using a digital soil mapping (DSM) approach. We used a random forest-regression kriging algorithm to predict soil N concentrations and associated uncertainty across six soil depths (0-5, 5-15, 15-30, 30-60, 60-100, 100-200 cm) at 5 km spatial grids. Across CONUS, there is a strong spatial dependence of soil N, where soil N concentrations decrease but uncertainty increases with soil depth. Soil N was higher in Pacific Northwest, Northeast, and Great Lakes National Ecological Observatory Network (NEON) ecoclimatic domains. Model uncertainty was higher in Atlantic Neotropical, Southern Rockies/Colorado Plateau and Southeast NEON domains. We also compared our soil N predictions with satellite-derived gross primary production (GPP) and forest biomass from the National Biomass and Carbon Dataset. Finally, we used uncertainty information to propose optimized locations for designing future soil surveys and found that the Atlantic Neotropical, Pacific Northwest, Pacific Southwest, and Appalachian/Cumberland Plateau NEON domains may require larger survey efforts. We highlight the need to increase knowledge of biophysical factors regulating soil processes at deeper depths to better characterize the three-dimensional space of soils. Our results provide a national benchmark regarding the spatial variability and uncertainty of soil N and reveal areas in need of a better representation.
This dataset includes all covariates used for modeling soil nitrogen, the training data, and the modeling output. The output represents raster files at 5 km resolution of soil N at different depths and associated model uncertainty.
Main reference: Smith EM, Guevara M, Tarin T, Pouyat R, Vargas R. Spatial variability and uncertainty of soil nitrogen across the conterminous United States (in review). Ecosphere.
-
Objectives: Fine roots significantly influence ecosystem-scale cycling of nutrients, carbon (C), and water, yet there is limited understanding of how fine root traits vary across and within tropical forests, some of Earth's most C-rich ecosystems. The biomass of fine roots can impact soil carbon storage, as root mortality is a primary source of new carbon to soils. A positive relationship has been observed between fine root biomass and soil carbon stocks in Panama (Cusack et al 2018). Beyond biomass, root characteristics like specific root length (SRL) could also influence soil carbon, as roots with higher SRL are less dense and thinner, potentially decomposing more easily or promoting soil aggregation. Understanding how root morphology and tissue quality relate to soil carbon storage, and to soil properties in general, can improve predictions of landscape-scale carbon patterns. We aggregated new data on root biomass, morphology and nutrient content at 0-10 cm, 10-20 cm, 20-50 cm and 50-100 cm depth increments across four distinct lowland Panamanian forests and paired them with previously published datasets (Cusack et al 2018; Cusack and Turner 2020) of soil chemistry from the same sites and soil depths to explore the relationship between soil carbon stocks and root characteristics.
Datasets included: The datasets provided include .csv and .xlsx files for fine root characteristics and soil chemistry from four different forests across 0-10 cm, 10-20 cm, 20-50 cm, and 50-100 cm depth increments. Root characteristics include live fine root biomass, dead fine root biomass, coarse root biomass, specific root length, root diameter, root tissue density, specific root area, root %N, root %C, and root C/N ratio. Soil chemistry data include total carbon (TC), dissolved organic carbon (DOC), bulk density, total phosphorus (TP), available phosphorus (AEM Pi), and various Mehlich-extractable elements such as aluminum, calcium, iron, potassium, manganese, phosphorus, and zinc. Nitrogen content measures include ammonium, nitrate, total dissolved nitrogen (TDN), dissolved inorganic nitrogen (DIN), and dissolved organic nitrogen (DON). The dataset also includes total exchangeable bases (TEB) and effective cation exchange capacity (ECEC) in both centimoles of charge per kilogram and micromoles of charge per gram. The soil chemistry data were obtained from Cusack et al (2018) and Cusack and Turner (2020) and paired with root characteristics data for the same depth increments and sites. Additionally, a .kml file is provided with coordinates for all 32 plots included in the study across four forests (n = 8 plots per site). Root data were averaged across these 8 plots per site, and soil data were collected in one pit at each site. This dataset serves as baseline data before a throughfall exclusion experiment, Panama Rainforest Changes with Experimental Drying (PARCHED), was implemented. No special software is needed to open these files.
-
A biodiversity dataset graph: UCSB-IZC

The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at Cheadle Center of Biodiversity and Ecological Restoration, University of California Santa Barbara.

This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].

This archive contains 14349 images related to 32533 occurrence/specimen records. See the included sample-image.jpg and its associated meta-data sample-image.json [4].

The images were counted using:

$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c \
 | grep -o -P ".*depict" \
 | sort \
 | uniq \
 | wc -l

And the occurrences were counted using:

$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c \
 | grep -o -P "occurrence/([0-9])+" \
 | sort \
 | uniq \
 | wc -l

The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included and have been individually included in this dataset publication. Index files provide a way to link provenance files in time to establish a versioning mechanism.

To retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:

$ java -jar preston.jar clone --remote https://archive.org/download/preston-ucsb-izc/data.zip/,https://zenodo.org/record/5557670/files,https://zenodo.org/record/5557670/files/5660088

After that, verify the index of the archive by reproducing the following provenance log history:

$ java -jar preston.jar history
<urn:uuid:0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .
<hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c> <http://purl.org/pav/previousVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .

To check the integrity of the extracted archive, confirm that the command "preston verify" produces lines like those shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

$ java -jar preston.jar verify
hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c OK CONTENT_PRESENT_VALID_HASH 66438 hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c
hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 OK CONTENT_PRESENT_VALID_HASH 4093 hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844
hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef OK CONTENT_PRESENT_VALID_HASH 5746 hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef
hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b OK CONTENT_PRESENT_VALID_HASH 6147 hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b

Note that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

Files in this data publication:

--- start of file descriptions ---

-- description of archive and its contents (this file) --
README

-- executable java jar containing preston [2,3] v0.3.1. --
preston.jar

-- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --
preston-[00-ff].tar.gz

-- individual provenance index files --
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a

-- example image and meta-data --
sample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)
sample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)

--- end of file descriptions ---

References

[1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-11-04 as indexed by the Global Biodiversity Informatics Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36 hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
[3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2020.101132
[4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08. https://www.gbif.org/occurrence/3323647301 . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c

Other: This work is funded in part by grant NSF OAC 1839201 and NSF DBI 2102006 from the National Science Foundation.
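The verify output in the UCSB-IZC description shows each object stored under a path built from its own SHA-256 hash (data/ce/1d/ce1dc...). The sketch below illustrates that content-addressing idea with the Python standard library only. It is not the Preston implementation and does not replace "preston verify"; the directory layout is inferred from the sample paths above, and the names `expected_path` and `verify_object` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_path(data_dir, hex_hash):
    """Location of an object, assuming the data/<aa>/<bb>/<hash> layout."""
    return Path(data_dir) / hex_hash[:2] / hex_hash[2:4] / hex_hash


def verify_object(data_dir, hex_hash):
    """True if the object exists and its content hashes to its own name."""
    path = expected_path(data_dir, hex_hash)
    return path.is_file() and sha256_of(path) == hex_hash


if __name__ == "__main__":
    # Hash copied from the sample verify output; "data" is the extraction folder.
    sample = "ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c"
    print(expected_path("data", sample), verify_object("data", sample))
```

Because the file name is the digest of the content, any corruption or substitution of a file shows up as a mismatch without consulting a separate checksum list.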