Abstract. Water quality in lakes is an emergent property of complex biotic and abiotic processes that differ across spatial and temporal scales. Water quality is also a determinant of ecosystem services that lakes provide and is thus of great interest to ecologists. Machine learning and other computer science techniques are increasingly being used to predict water quality dynamics as well as to gain a greater understanding of water quality patterns and controls. To benefit the sciences of both ecology and computer science, we have created a benchmark dataset of lake water quality time series and vertical profiles. LakeBeD-US contains over 500 million unique observations of lake water quality collected by multiple long-term monitoring programs across 17 water quality variables from 21 lakes in the United States. There are two published versions of LakeBeD-US: the “Ecology Edition” published in the Environmental Data Initiative repository (https://doi.org/10.6073/pasta/c56a204a65483790f6277de4896d7140, McAfee et al., 2024) and the “Computer Science Edition” published in the Hugging Face repository (https://doi.org/10.57967/hf/3771, Pradhan et al., 2024). Each edition is formatted in a manner conducive to inquiries and analyses specific to its domain. For ecologists, LakeBeD-US: Ecology Edition provides an opportunity to study the spatial and temporal dynamics of several lakes with varying water quality, ecosystem, and landscape characteristics. For computer scientists, LakeBeD-US: Computer Science Edition acts as a benchmark dataset that enables the advancement of machine learning for water quality prediction.
LakeBeD-US: Ecology Edition - a benchmark dataset of lake water quality time series and vertical profiles
LakeBeD-US: Ecology Edition is a harmonized lake water quality dataset containing time series and vertical profiles of 21 lakes in the United States monitored by long-term monitoring institutions. These institutions include the North Temperate Lakes Long-Term Ecological Research program (NTL-LTER), Niwot Ridge Long-Term Ecological Research program (NWT-LTER), National Ecological Observatory Network (NEON), and the Carey Lab at Virginia Tech as part of the Virginia Reservoirs Long-Term Research in Environmental Biology (LTREB) site in collaboration with the Western Virginia Water Authority. The data include depth-discrete observations of 17 water quality variables including temperature, dissolved oxygen, chemical properties, Secchi depth, and more. Observations are divided into data collected by automated sensors at a relatively high temporal frequency and manually sampled data at a relatively low temporal frequency. All data were collected in situ. The data are available as Apache Parquet files, and the included R scripts give guidance on how to utilize and query the dataset in R. LakeBeD-US: Ecology Edition is an ecological science-oriented companion to LakeBeD-US: Computer Science Edition. The Computer Science Edition is available on the Hugging Face Hub.
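One common query on depth-discrete, high-frequency sensor data of this kind is collapsing observations at a single depth into daily means. The sketch below is written under stated assumptions: the record fields (`lake_id`, `variable`, `depth`, `datetime`, `observation`) are hypothetical and are not taken from the documented LakeBeD-US schema, so check them against the Parquet files and the included R scripts. Records could, for example, be loaded with `pyarrow.parquet.read_table(path).to_pylist()`, where `path` is a placeholder.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(records, lake_id, variable, depth):
    """Return {date: mean observation} for one lake, variable, and depth.

    Assumption: each record is a dict with the hypothetical keys 'lake_id',
    'variable', 'depth' (m), 'datetime' (datetime.datetime), and 'observation'.
    """
    by_day = defaultdict(list)
    for r in records:
        if (
            r["lake_id"] == lake_id
            and r["variable"] == variable
            and math.isclose(r["depth"], depth)  # tolerate float depth values
            and r["observation"] is not None      # skip missing values
        ):
            by_day[r["datetime"].date()].append(r["observation"])
    # Sort by day so the result reads as a daily time series.
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    sample = [
        {"lake_id": "A", "variable": "temp", "depth": 1.0,
         "datetime": datetime(2020, 6, 1, 0), "observation": 20.0},
        {"lake_id": "A", "variable": "temp", "depth": 1.0,
         "datetime": datetime(2020, 6, 1, 12), "observation": 22.0},
        {"lake_id": "A", "variable": "do", "depth": 1.0,
         "datetime": datetime(2020, 6, 1, 0), "observation": 8.0},
    ]
    print(daily_means(sample, "A", "temp", 1.0))
```

Keeping the function on plain records, rather than tying it to a specific dataframe library, makes the filtering and aggregation logic easy to check regardless of how the Parquet files are read.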
- PAR ID: 10571866
- Publisher / Repository: Environmental Data Initiative
- Sponsoring Org: National Science Foundation
More Like this
Two programs that provide high-quality long-term ecological data, the Environmental Data Initiative (EDI) and the National Ecological Observatory Network (NEON), have recently teamed up with data users interested in synthesizing biodiversity data, such as ecological synthesis working groups supported by the US Long Term Ecological Research (LTER) Network Office, to make their data more Findable, Accessible, Interoperable, and Reusable (FAIR). To this end, we have developed a flexible intermediate data design pattern for ecological community data (L1 formatted data in Fig. 1; see Fig. 2 for design details) called "ecocomDP" (O'Brien et al. 2021), and we provide tools to work with data packages in which this design pattern has been implemented. The ecocomDP format provides a data pattern commonly used for reporting community-level data, such as repeated observations of species-level measures of biomass, abundance, percent cover, or density across multiple locations. The ecocomDP library for R includes tools to search for data packages, to download or import data packages into an R session in a standard format, and to visualize data during the exploration steps that are recommended before any cross-study synthesis work. To date, EDI has created 70 ecocomDP data packages derived from its holdings, which include data from the US LTER program, the Long Term Research in Environmental Biology (LTREB) program, and other projects; these packages are now discoverable and accessible using the ecocomDP library. Similarly, NEON data products for 12 taxonomic groups are discoverable using the ecocomDP search tool.
Input from data users guided the ecocomDP developers in mapping the NEON data products to the ecocomDP format, facilitating interoperability with the ecocomDP data packages available from the EDI repository. The standardized data design pattern allows common data visualizations across data packages and has the potential to facilitate the development of new tools and workflows for biodiversity synthesis. The broader impacts of this collaboration are intended to lower the barriers for researchers in ecology and the environmental sciences to access and work with long-term biodiversity data, and to provide a hub around which data providers and data users can develop best practices that will build a diverse and inclusive community of practice.
Abstract. The scale of ecological research is steadily increasing. At such scales, collaboration is indispensable, but there is little consensus on what factors enable it. In this article, we investigated temporal and spatial patterns of institutional collaboration within the US Long Term Ecological Research (LTER) Network using a bibliographic database. Social network analysis and the Monte Carlo method were applied to identify the characteristics of papers published by LTER researchers against a baseline of papers from 158 leading ecological journals. Long-term and long-distance collaborations were more frequent in the LTER Network than in this baseline, and we discuss the underlying mechanisms. We suggest that the maturing infrastructure and environment for collaboration within the LTER Network could encourage scientists to formulate large-scale hypotheses and to ask big questions in ecology.
Abstract. The North Temperate Lakes Long-Term Ecological Research (NTL-LTER) program has been used extensively to improve understanding of how aquatic ecosystems respond to environmental stressors, climate fluctuations, and human activities. Here, we report on the metagenomes of samples collected between 2000 and 2019 from Lake Mendota, a eutrophic freshwater lake within the NTL-LTER site. We used the distributed metagenome assembler MetaHipMer to coassemble over 10 terabases (Tbp) of data from 471 individual Illumina-sequenced metagenomes. A total of 95,523,664 contigs were assembled and binned to generate 1,894 non-redundant metagenome-assembled genomes (MAGs) with ≥50% completeness and ≤10% contamination. Phylogenomic analysis revealed that the MAGs were nearly exclusively bacterial, dominated by Pseudomonadota (Proteobacteria, N = 623) and Bacteroidota (N = 321). Nine eukaryotic MAGs were identified by eukCC, with six assigned to the phylum Chlorophyta. Additionally, 6,350 high-quality viral sequences were identified by geNomad, with the majority classified in the phylum Uroviricota. This expansive coassembled metagenomic dataset provides an unprecedented foundation to advance understanding of microbial communities in freshwater ecosystems and to explore temporal ecosystem dynamics.
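The MAG inclusion rule stated above (≥50% completeness and ≤10% contamination) amounts to a simple filter with inclusive bounds. This is a sketch only: it assumes per-bin completeness and contamination estimates (in percent) have already been computed by some quality-assessment tool, which the abstract does not name, and it omits the dereplication step implied by "non-redundant". The `Bin` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """A hypothetical genome bin with quality estimates in percent."""
    name: str
    completeness: float
    contamination: float


def passes_quality(b: Bin, min_completeness: float = 50.0,
                   max_contamination: float = 10.0) -> bool:
    # Both bounds are inclusive, matching ">=50%" and "<=10%".
    return b.completeness >= min_completeness and b.contamination <= max_contamination


def filter_bins(bins):
    """Keep only bins that meet both thresholds."""
    return [b for b in bins if passes_quality(b)]


if __name__ == "__main__":
    bins = [Bin("bin_1", 50.0, 10.0), Bin("bin_2", 49.9, 1.0), Bin("bin_3", 92.5, 10.1)]
    print([b.name for b in filter_bins(bins)])  # ['bin_1']
```

Exposing the thresholds as parameters makes it easy to apply stricter cutoffs (for example, a high-quality subset) without changing the filter itself.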
Scientific outreach to K12 education typically centers on directly disseminating scientific findings or engaging students in citizen-science data collection. Rather than viewing science outreach purely through the lens of knowledge transmission or of specific data collection practices, we present a view of science outreach as a bridge that brings K12 students into ecologists’ communities of practice. We illustrate this outreach model with the Luquillo Long-Term Ecological Research (LTER) Schoolyard program. The schoolyard program brings middle-school and high-school students into the Luquillo LTER community of practice through authentic scientific inquiry with long-term ecological data. Long-term data provide an essential means for students to investigate large-scale, long-term phenomena and to develop essential data science skills.