Title: LakeBeD-US: Ecology Edition - a benchmark dataset of lake water quality time series and vertical profiles
LakeBeD-US: Ecology Edition is a harmonized lake water quality dataset containing time series and vertical profiles of 21 lakes in the United States monitored by long-term monitoring institutions. These institutions include the North Temperate Lakes Long-Term Ecological Research program (NTL-LTER), Niwot Ridge Long-Term Ecological Research program (NWT-LTER), National Ecological Observatory Network (NEON), and the Carey Lab at Virginia Tech as part of the Virginia Reservoirs Long-Term Research in Environmental Biology (LTREB) site in collaboration with the Western Virginia Water Authority. The data include depth-discrete observations of 17 water quality variables including temperature, dissolved oxygen, chemical properties, Secchi depth, and more. Observations are divided into data collected by automated sensors at a relatively high temporal frequency and manually sampled data at a relatively low temporal frequency. All data were collected in situ. The data are available as Apache Parquet files, and the included R scripts give guidance on how to utilize and query the dataset in R. LakeBeD-US: Ecology Edition is an ecological science-oriented companion to LakeBeD-US: Computer Science Edition. The Computer Science Edition is available on the Hugging Face Hub.
Award ID(s):
2025982 2217817 2224439
PAR ID:
10571866
Publisher / Repository:
Environmental Data Initiative
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Water quality in lakes is an emergent property of complex biotic and abiotic processes that differ across spatial and temporal scales. Water quality is also a determinant of ecosystem services that lakes provide and is thus of great interest to ecologists. Machine learning and other computer science techniques are increasingly being used to predict water quality dynamics as well as to gain a greater understanding of water quality patterns and controls. To benefit the sciences of both ecology and computer science, we have created a benchmark dataset of lake water quality time series and vertical profiles. LakeBeD-US contains over 500 million unique observations of lake water quality collected by multiple long-term monitoring programs across 17 water quality variables from 21 lakes in the United States. There are two published versions of LakeBeD-US: the “Ecology Edition” published in the Environmental Data Initiative repository (https://doi.org/10.6073/pasta/c56a204a65483790f6277de4896d7140, McAfee et al., 2024) and the “Computer Science Edition” published in the Hugging Face repository (https://doi.org/10.57967/hf/3771, Pradhan et al., 2024). Each edition is formatted in a manner conducive to inquiries and analyses specific to each domain. For ecologists, LakeBeD-US: Ecology Edition provides an opportunity to study the spatial and temporal dynamics of several lakes with varying water quality, ecosystem, and landscape characteristics. For computer scientists, LakeBeD-US: Computer Science Edition acts as a benchmark dataset that enables the advancement of machine learning for water quality prediction. 
  2. This dataset contains estimates of nitrogen and phosphorus excretion rates, as well as dry biomass, for individual vertebrate and invertebrate animals in marine and estuarine environments. This dataset is a product of an LTER Synthesis Working Group aimed at evaluating the spatiotemporal variability in consumer nutrient dynamics in the wake of global change across eight long-term ecological research projects. These projects include seven Long-Term Ecological Research (LTER) programs funded by the National Science Foundation: (1) California Current Ecosystem, (2) Florida Coastal Everglades, (3) Moorea Coral Reef, (4) Northern Gulf of Alaska, (5) Plum Island Ecosystems, (6) Santa Barbara Coastal, and (7) Virginia Coast Reserve. Additionally, the dataset includes data from (8) the Partnership for Interdisciplinary Science of Coastal Oceans (PISCO) research program. The temporal coverage of the time series varies among projects, with the earliest record in 1997 and the most recent in 2023. This data package also includes two folders of R scripts used for data harmonization, identical to those in the LTER Synthesis Working Group: Consumer-Mediated Nutrient Dynamics Project, v2.0.0. The release is available on GitHub: https://github.com/lter/lterwg-marine-cnd/releases/tag/v2.0.0
  3. Urban lakes are heavily impacted by human activities and climate variability, and they provide many ecosystem services to residents. The MSP LTER program is studying long-term changes in urban lake water quality, ecology, and management as part of its long-term studies of urban environments. The goal of this dataset is to understand how land-use change, management, and climate have impacted urban lake biogeochemistry over time. This dataset includes parameters characterizing the long-term (> 5 years) surface water quality and chemistry of 294 lakes and ponds in the Minneapolis-Saint Paul Seven County Metropolitan Area, Minnesota, USA. The dataset draws from data publicly available through the Minnesota Pollution Control Agency and data provided by individual agencies, park districts, and cities. The dataset is distinct from other lake datasets because it is curated to report only a single value per lake x date x parameter, minimizing the amount of data manipulation needed before use in statistical analyses. All data come from the top two meters of the water column. In the case of multiple spatial measurements on a single lake or multiple agencies sampling the same lake on the same day, chemistry data were averaged to generate a single value. For Secchi data, the deepest reported observation on a given lake x date was used. Parameters: total phosphorus, total nitrogen, total Kjeldahl nitrogen, nitrate, nitrite, nitrate + nitrite (NOx), ammonium, chlorophyll a (corrected and not corrected for pheophytin), specific conductivity, chloride, and Secchi depth. These waterbodies are identified by their DNR Division of Water (DOW) number with minor alterations for subbasin identification. This dataset does not comprehensively represent all lentic waterbodies that have substantial water quality data in the metro area, and some included waterbodies may be considered wetlands according to state classifications.
The data brought together in this database have undergone QA/QC by the organizations that originally collected them, as well as a screening process during data harmonization. While we believe that the resulting dataset is robust, we cannot guarantee that it is free of errors or inaccuracies.
  4. Two programs that provide high-quality long-term ecological data, the Environmental Data Initiative (EDI) and the National Ecological Observatory Network (NEON), have recently teamed up with data users interested in synthesizing biodiversity data, such as ecological synthesis working groups supported by the US Long Term Ecological Research (LTER) Network Office, to make their data more Findable, Accessible, Interoperable, and Reusable (FAIR). To this end, we have developed a flexible intermediate data design pattern for ecological community data (L1 formatted data in Fig. 1, see Fig. 2 for design details) called "ecocomDP" (O'Brien et al. 2021), and we provide tools to work with data packages in which this design pattern has been implemented. The ecocomDP format provides a data pattern commonly used for reporting community-level data, such as repeated observations of species-level measures of biomass, abundance, percent cover, or density across multiple locations. The ecocomDP library for R includes tools to search for data packages, to download or import data packages into an R session in a standard format, and to visualize data in the exploration steps recommended for data users prior to any cross-study synthesis work. To date, EDI has created 70 ecocomDP data packages derived from its holdings, which include data from the US Long Term Ecological Research (US LTER) program, Long Term Research in Environmental Biology (LTREB) program, and other projects, and which are now discoverable and accessible using the ecocomDP library. Similarly, NEON data products for 12 taxonomic groups are discoverable using the ecocomDP search tool.
Input from data users provided guidance for the ecocomDP developers in mapping the NEON data products to the ecocomDP format to facilitate interoperability with the ecocomDP data packages available from the EDI repository. The standardized data design pattern allows common data visualizations across data packages, and has the potential to facilitate the development of new tools and workflows for biodiversity synthesis. The broader impacts of this collaboration are intended to lower the barriers for researchers in ecology and the environmental sciences to access and work with long-term biodiversity data and provide a hub around which data providers and data users can develop best practices that will build a diverse and inclusive community of practice. 
  5. Abstract. The scale of ecological research is getting larger and larger. At such scales, collaboration is indispensable, but there is little consensus on what factors enable collaboration. In the present article, we investigated the temporal and spatial patterns of institutional collaboration within the US Long Term Ecological Research (LTER) Network on the basis of a bibliographic database. Social network analysis and the Monte Carlo method were applied to identify the characteristics of papers published by LTER researchers within a baseline of papers from 158 leading ecological journals. Long-term and long-distance collaboration were more frequent in the LTER Network, and we investigate and discuss the underlying mechanisms. We suggest that the maturing infrastructure and environment for collaboration within the LTER Network could encourage scientists to make large-scale hypotheses and to ask big questions in ecology.
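The curation rule described in item 3 (one value per lake x date x parameter, with chemistry averaged and Secchi taken as the deepest reading) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the dataset authors' harmonization code: the tuple layout, the `harmonize` function, the `"secchi_depth"` label, and the sample records are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parameter label; the real dataset may name this differently.
SECCHI = "secchi_depth"


def harmonize(records):
    """Collapse records to one value per (lake, date, parameter).

    records: iterable of (lake_id, date, parameter, value) tuples
    (an assumed layout). Secchi depth keeps the deepest (largest)
    reading; every other parameter is averaged.
    """
    grouped = defaultdict(list)
    for lake_id, date, parameter, value in records:
        grouped[(lake_id, date, parameter)].append(value)
    return {
        key: max(values) if key[2] == SECCHI else mean(values)
        for key, values in grouped.items()
    }


# Made-up example: two agencies sample the same lake on the same day.
sample = [
    ("lake_A", "2020-07-01", "total_phosphorus", 40.0),
    ("lake_A", "2020-07-01", "total_phosphorus", 60.0),
    ("lake_A", "2020-07-01", SECCHI, 1.5),
    ("lake_A", "2020-07-01", SECCHI, 2.0),
]
result = harmonize(sample)
print(result[("lake_A", "2020-07-01", "total_phosphorus")])  # 50.0
print(result[("lake_A", "2020-07-01", SECCHI)])              # 2.0
```

Grouping on the full (lake, date, parameter) key first and choosing the reducer afterwards keeps the Secchi exception in one place, which is one simple way to express the rule.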