

Title: The National Eutrophication Survey: lake characteristics and historical nutrient concentrations
Abstract. Historical ecological surveys serve as a baseline and provide context for contemporary research, yet many of these records are not preserved in a way that ensures their long-term usability. The National Eutrophication Survey (NES) database is currently available only as scans of the original reports (PDF files) with no embedded character information. This limits its searchability, its machine readability, and the ability of current and future scientists to systematically evaluate its contents. The NES data were collected by the US Environmental Protection Agency between 1972 and 1975 as part of an effort to investigate eutrophication in freshwater lakes and reservoirs. Although several studies have manually transcribed small portions of the database in support of specific research questions, there have been no systematic attempts to transcribe and preserve the database in its entirety. Here we use a combination of automated optical character recognition and manual quality assurance procedures to make these data available for analysis. The performance of the optical character recognition protocol was linked to variation in the quality (clarity) of the original documents. For each of the four archival scanned reports, our quality assurance protocol found an error rate between 5.9 and 17%. Our approach aimed to balance efficiency and data quality by combining manual data entry with digital transcription technologies. The finished database contains information on the physical characteristics, hydrology, and water quality of about 800 lakes in the contiguous US (Stachelek et al. (2017), https://doi.org/10.5063/F1639MVD). Ultimately, this database could be combined with more recent studies to generate meta-analyses of water quality trends and spatial variation across the continental US.
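The abstract reports a per-report error rate from the quality assurance step but does not define it here. A minimal sketch, assuming the rate is the fraction of checked table cells whose OCR value disagrees with a hand-verified reference value; the function name `cell_error_rate` and the sample values are hypothetical illustrations, not the authors' protocol.

```python
from typing import Sequence


def cell_error_rate(ocr_values: Sequence[str], checked_values: Sequence[str]) -> float:
    """Fraction of OCR-transcribed cells that disagree with a hand-checked reference.

    Assumption (not from the source): both sequences list the same table
    cells in the same order, and any textual mismatch counts as one error.
    """
    if len(ocr_values) != len(checked_values):
        raise ValueError("OCR and reference must cover the same cells")
    if not ocr_values:
        raise ValueError("no cells to compare")
    # Strip surrounding whitespace so layout noise is not counted as an error.
    mismatches = sum(
        1
        for ocr, ref in zip(ocr_values, checked_values)
        if ocr.strip() != ref.strip()
    )
    return mismatches / len(ocr_values)


# Hypothetical sample: 20 cells, 2 of them misread by OCR.
reference = ["0.05", "12.3", "7.1", "0.42", "103"] * 4
ocr = list(reference)
ocr[0] = "0.06"  # digit confusion
ocr[7] = "7.l"   # "1" read as lowercase "l"
rate = cell_error_rate(ocr, reference)
print(f"{rate:.1%}")  # 10.0%
```

Comparing against a hand-checked subset rather than re-keying every report is one way to trade transcription effort against measured data quality, in the spirit of the balance the abstract describes.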
Award ID(s): 1637653, 1027253
PAR ID: 10073385
Journal Name: Earth System Science Data
Volume: 10
Issue: 1
ISSN: 1866-3516
Page Range / eLocation ID: 81–86
Sponsoring Org: National Science Foundation
More like this
  1. The TIERRAS project is an open-access platform that compiles a database of more than 400 tracer injection experiments in rivers and streams, sourced from previously published studies and reports. It also includes interactive features that allow users to explore, download, and contribute new data. The goal is to provide a centralized, accessible repository for researchers, environmental managers, and anyone interested in water quality, hydrological modeling, and stream solute dynamics. The experiments were collected from various sources, including published studies, unpublished data, and technical reports by different authors. Because the original data came in diverse formats and units, all data were curated and standardized to a consistent format and to Imperial (U.S. customary) units. Visit TIERRAS at https://www.tierras.org/. Cite: Rodríguez, L., Tunby, P., Abusang, A., Tartakovsky, A., Carroll, K., Ginn, T., & González-Pinzón, R. (2025). TIERRAS Tracer Injection Experiments in RiveRs And Streams (2.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15794259
  2. Abstract. Water quality in lakes is an emergent property of complex biotic and abiotic processes that differ across spatial and temporal scales. Water quality is also a determinant of ecosystem services that lakes provide and is thus of great interest to ecologists. Machine learning and other computer science techniques are increasingly being used to predict water quality dynamics as well as to gain a greater understanding of water quality patterns and controls. To benefit the sciences of both ecology and computer science, we have created a benchmark dataset of lake water quality time series and vertical profiles. LakeBeD-US contains over 500 million unique observations of lake water quality collected by multiple long-term monitoring programs across 17 water quality variables from 21 lakes in the United States. There are two published versions of LakeBeD-US: the “Ecology Edition” published in the Environmental Data Initiative repository (https://doi.org/10.6073/pasta/c56a204a65483790f6277de4896d7140, McAfee et al., 2024) and the “Computer Science Edition” published in the Hugging Face repository (https://doi.org/10.57967/hf/3771, Pradhan et al., 2024). Each edition is formatted in a manner conducive to inquiries and analyses specific to each domain. For ecologists, LakeBeD-US: Ecology Edition provides an opportunity to study the spatial and temporal dynamics of several lakes with varying water quality, ecosystem, and landscape characteristics. For computer scientists, LakeBeD-US: Computer Science Edition acts as a benchmark dataset that enables the advancement of machine learning for water quality prediction. 
  3. This dataset provides segregation energy spectra for cobalt solute in 7272 aluminum grain boundaries (GBs) that span the 5D space of crystallographic character. The dataset and some of its characteristics are described in detail in https://doi.org/10.1016/j.actamat.2024.120448. The segregation energy spectra are provided in a CSV file, in which each GB is identified by a computeID. The crystallographic character and selected properties of each GB, as well as its structure, are available in a separate dataset at https://doi.org/10.17632/4ykjz4ngwt, which is described in an article at https://doi.org/10.1016/j.actamat.2022.118006. A README file describes the columns of the CSV file.
  4. Abstract. Recent observations of near-surface soil temperatures over the circumpolar Arctic show accelerated warming of permafrost-affected soils. The availability of a comprehensive near-surface permafrost and active layer dataset is critical to better understanding climate impacts and to constraining permafrost thermal conditions and their spatial distribution in land system models. We compiled a soil temperature dataset from 72 monitoring stations in Alaska using data collected by the U.S. Geological Survey, the National Park Service, and the University of Alaska Fairbanks permafrost monitoring networks. The array of monitoring stations spans a large range of latitudes, from 60.9 to 71.3° N, and elevations from near sea level to ∼1300 m, covering tundra and boreal forest regions. This dataset consists of monthly ground temperatures at depths of up to 1 m, volumetric soil water content, snow depth, and air temperature during 1997–2016. These data were quality controlled during collection and processing, and we also performed a data harmonization evaluation on the processed dataset. The final product (PF-AK, v0.1) is available at the Arctic Data Center (https://doi.org/10.18739/A2KG55).
  5. The LAGOS-US LIMNO data package is one of the core data modules of LAGOS-US, an extensible research-ready platform designed to study the 479,950 lakes and reservoirs larger than or equal to 1 ha in the conterminous US (48 states plus the District of Columbia). The LIMNO module contains in situ observations of 47 parameters of lake physics, chemistry, and biology (hereafter referred to as chemistry) from lake surface samples (defined as observations taken from the epilimnion of a lake) obtained from the Water Quality Portal, the National Lakes Assessment (2007, 2012, 2017), and NEON programs. LIMNO provides 3,511,020 observations across all parameters collected between 1975 and 2021 from 20,329 lakes; the number of observations per lake ranged from 1 to 20,605 with a median of 32. The database design that supports the LAGOS-US research platform was created based on several important design features: lakes are the fundamental unit of consideration, all lakes in the spatial extent above the minimum size must be represented, and most information is connected to individual lakes. The design is modular, interoperable (the modules can be used with each other, as well as other comprehensive lake data products such as the USGS NHD), and extensible (future database modules can be developed and used in the LAGOS-US research platform by others). Users are encouraged to use the other two core data modules that are part of the LAGOS-US platform: LOCUS (location, identifiers, and physical characteristics of lakes and their watersheds) and GEO (characteristics defining geospatial and temporal ecological setting quantified at multiple spatial divisions) that are each found in their own data packages. 
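Item 5 summarizes LIMNO coverage as a per-lake observation count (range 1 to 20,605, median 32). A minimal sketch of how such a summary can be computed, assuming a hypothetical long-format table with one lake identifier per observation row; the function `per_lake_summary` and the toy lake IDs are illustrative and not part of the LAGOS-US data package.

```python
from collections import Counter
from statistics import median


def per_lake_summary(lake_ids):
    """Return (min, max, median) of the number of observations per lake.

    Assumption (hypothetical layout): `lake_ids` holds one lake identifier
    per observation row, e.g. one row per lake, sample date, and parameter.
    """
    counts = Counter(lake_ids)  # observations per lake
    if not counts:
        raise ValueError("no observations")
    values = list(counts.values())
    return min(values), max(values), median(values)


# Hypothetical toy data: lake "A" sampled 3 times, "B" once, "C" 5 times.
rows = ["A", "A", "A", "B", "C", "C", "C", "C", "C"]
print(per_lake_summary(rows))  # (1, 5, 3)
```

The median is a natural choice here because, as the reported range suggests, per-lake sampling effort is highly skewed: a few intensively monitored lakes would dominate a mean.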