Title: LAGOS-US RESERVOIR: Data module classifying conterminous U.S. lakes 4 hectares and larger as natural lakes or reservoirs
The LAGOS-US RESERVOIR data module (hereafter, RESERVOIR) classifies all 137,465 lakes ≥ 4 hectares in the conterminous U.S. into one of three categories, using a machine-learning predictive model based on visual interpretation of lake outlines together with a classification rule based on lake shape. Natural Lakes (NLs) are lakes that are likely to be entirely or mostly naturally formed and that do not have large, flow-altering structures on or near them. Reservoir Class As (RSVR_As) are lakes that are likely to be either human-made or highly human-altered by the presence of a relatively large water control structure that appears to significantly change the flow of water. Reservoir Class Bs (RSVR_Bs) are lakes that are likely to be entirely human-made, based on isolation from rivers and a highly angular shape that is rarely, if ever, seen in natural lakes. We trained the machine-learning models on 12,162 manually classified lakes to assign each lake a probability of belonging to one of two categories (NL or RSVR), then further divided the RSVR category into A or B based on NHD FCodes, isolation, and angularity. The data module includes a detailed User Guide, metadata tables, and a data table with information such as location, lake geometry, surface water connectivity class, and official name. Under our definition, the classification indicates that over 46% of lakes ≥ 4 ha in the conterminous U.S. are reservoir lakes. These data can be combined with other LAGOS-US data modules and U.S. national databases using unique lake identifiers to study both reservoir lakes and natural lakes at broad scales.
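The two-stage logic described above (a model probability of NL versus RSVR, followed by a rule that splits RSVR into A or B) might be sketched as follows. This is only an illustration under stated assumptions: the field names `p_reservoir`, `is_isolated`, and `is_angular`, the 0.5 threshold, and the omission of the NHD FCode criterion are all hypothetical simplifications, and the module's User Guide should be consulted for the actual variables and cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Lake:
    # All field names here are hypothetical, chosen for illustration only.
    lagoslakeid: int
    p_reservoir: float  # assumed model output: probability the lake is a reservoir
    is_isolated: bool   # True if the lake is not connected to streams or rivers
    is_angular: bool    # True if the lake outline is highly angular


def classify(lake: Lake, threshold: float = 0.5) -> str:
    """Return 'NL', 'RSVR_A', or 'RSVR_B' for one lake.

    The threshold is an assumption; the NHD FCode criterion mentioned in
    the abstract is left out of this simplified sketch.
    """
    # Stage 1: model-based split into natural lake vs. reservoir.
    if lake.p_reservoir < threshold:
        return "NL"
    # Stage 2: rule-based split of reservoirs by isolation and shape.
    if lake.is_isolated and lake.is_angular:
        return "RSVR_B"
    return "RSVR_A"


examples = [
    Lake(1, 0.10, False, False),
    Lake(2, 0.90, False, False),
    Lake(3, 0.95, True, True),
]
labels = {lake.lagoslakeid: classify(lake) for lake in examples}
```

Keeping the probability step separate from the shape rule mirrors the order given in the abstract and makes it easy to swap in different cutoffs.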
Award ID(s):
1638554
NSF-PAR ID:
10466415
Publisher / Repository:
Environmental Data Initiative
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The LAGOS‐US RESERVOIR data module classifies all 137,465 lakes ≥ 4 ha in the conterminous U.S. into three categories using a machine learning predictive model based on visual interpretation of lake outlines and a lake shape classification rule. Natural Lakes (NLs) are defined as naturally formed, lacking large, flow‐altering structures; Reservoir Class A's (RSVR_A) are defined as lakes likely human‐made or human‐altered by a large water control structure; and Reservoir Class B's (RSVR_Bs) are lakes likely human‐made but are not connected to streams and have a shape rare in NLs. We trained machine learning models on 12,162 manually classified lakes to predict assignment as an NL or RSVR, then further classified RSVRs based on NHD Fcodes, isolation, and angularity. Our classification indicates that > 46% of lakes ≥ 4 ha in the conterminous U.S. are reservoir lakes. These data can be easily combined with other LAGOS‐US modules and U.S. national databases for the broad‐scale study of reservoir lakes and NLs.

     
  2. The LAGOS-US LAKE DEPTH v1.0 module (hereafter, DEPTH) contains in situ measurements of lake depth for a subset (n = 17,675; 3.7% of 479,950) of the lakes > 1 ha in the conterminous U.S. that are in the LAGOS-US LOCUS v1.0 data module (Smith et al. 2021). All 17,675 lakes in DEPTH have a maximum depth value, and 6,137 lakes also have a mean depth. DEPTH draws on approximately 65 data sources obtained from community, government, and university monitoring programs, as well as academic reports and commercial websites. It includes lake identifiers, lake location, lake area, lake depth (both maximum and mean depth when available), source information, and data flags. The unique lake identifier (lagoslakeid) for all lakes is the same one used in LAGOS-US LOCUS v1.0.
  3. We conducted a macroscale study of 2,210 shallow lakes (mean depth ≤ 3 m or maximum depth ≤ 5 m) in the Upper Midwestern and Northeastern U.S. We asked: What are the patterns and drivers of shallow lake total phosphorus (TP), chlorophyll a (CHLa), and TP–CHLa relationships at the macroscale, how do these differ from those for 4,360 non-shallow lakes, and do results differ by hydrologic connectivity class? To answer these questions, we assembled the LAGOS-NE Shallow Lakes dataset described herein, which is derived from the existing LAGOS-NE, LAGOS-DEPTH, and LAGOS-CLIMATE datasets. Response variables were the medians of available summer (e.g., 15 June to 15 September) values of TP and CHLa. Predictor variables were assembled at two spatial scales for incorporation into hierarchical models. At the local or lake-specific scale (including the individual lake and its inter-lake watershed [iws] or corresponding HU12 watershed), variables represented land use/cover, hydrology, climate, morphometry, and acid deposition. At the regional scale (e.g., HU4 watershed), variables included a smaller set of predictors for hydrology and land use/cover. The dataset also includes the unique identifier assigned by LAGOS-NE (lagoslakeid); the latitude and longitude of the study lakes; their maximum and mean depths, along with a depth classification of Shallow or non-Shallow; connectivity class (i.e., whether a lake was classified as connected, with inlets and outlets, or unconnected, lacking inlets); and the zone id for the HU4 to which each lake belongs. Along with the database, we provide the R scripts for the hierarchical models predicting TP or CHLa (TPorCHL_predictive_model.R) and the TP–CHLa relationship (TP_CHL_CSI_Model.R) for depth and connectivity subsets of the study lakes.
  4. This data package, LAGOS-US LOCUS v1.0, is one of the core data modules of LAGOS-US, an extensible research-ready platform for studying the 479,950 lakes and reservoirs larger than or equal to 1 ha in the conterminous US (48 states plus the District of Columbia). This data module contains information on the location, identifiers, and physical characteristics of lakes and their watersheds. The characteristics in this module include: variables that can be obtained from GIS data, such as location and geometry; variables that can be derived using GIS processing, such as lake watersheds and their geometry, lake glaciation history, and lake connectivity; and commonly used identifiers from GIS and other data products useful for linking with LAGOS-US. LOCUS is based on a snapshot of the high-resolution National Hydrography Dataset product available at the initiation of the project, which provided the basis for locating, identifying, and characterizing the geometry of all lakes in LAGOS-US. The database design that supports the LAGOS-US research platform was created based on several important design features: lakes are the fundamental unit of consideration, all lakes in the spatial extent above a minimum size must be represented, and most information is connected to individual lakes. The design is modular, interoperable (the modules can be used with each other), and extensible (future database modules can be developed and used in the LAGOS-US research platform by others). Users are encouraged to use the other two core data modules that are part of the LAGOS-US platform: GEO (which includes geospatial ecological context at multiple spatial and temporal scales for lakes and their watersheds) and LIMNO (in situ lake surface-water physical, chemical, and biological measurements through time), each of which is found in its own data package.
    more » « less
  5. The LAGOS-US LIMNO data package is one of the core data modules of LAGOS-US, an extensible research-ready platform designed to study the 479,950 lakes and reservoirs larger than or equal to 1 ha in the conterminous US (48 states plus the District of Columbia). The LIMNO module contains in situ observations of 47 parameters of lake physics, chemistry, and biology (hereafter referred to as chemistry) from lake surface samples (defined as observations taken from the epilimnion of a lake) obtained from the Water Quality Portal, the National Lakes Assessment (2007, 2012, 2017), and NEON programs. LIMNO provides 3,511,020 observations across all parameters collected between 1975 and 2021 from 20,329 lakes; the number of observations per lake ranged from 1 to 20,605 with a median of 32. The database design that supports the LAGOS-US research platform was created based on several important design features: lakes are the fundamental unit of consideration, all lakes in the spatial extent above the minimum size must be represented, and most information is connected to individual lakes. The design is modular, interoperable (the modules can be used with each other, as well as other comprehensive lake data products such as the USGS NHD), and extensible (future database modules can be developed and used in the LAGOS-US research platform by others). Users are encouraged to use the other two core data modules that are part of the LAGOS-US platform: LOCUS (location, identifiers, and physical characteristics of lakes and their watersheds) and GEO (characteristics defining geospatial and temporal ecological setting quantified at multiple spatial divisions) that are each found in their own data packages. 
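Because the modules listed above share the unique lake identifier lagoslakeid, tables from different modules can be linked on that key. The following is a minimal sketch of such a join using plain Python dictionaries; the rows and the column names `lake_class`, `max_depth_m`, and `mean_depth_m` are hypothetical and do not reflect the actual table schemas. The last line recomputes the depth coverage from the counts quoted in the DEPTH description.

```python
# Hypothetical rows keyed by lagoslakeid; real tables have many more columns.
reservoir = {
    101: "NL",
    102: "RSVR_A",
    103: "RSVR_B",
}
depth = {
    101: {"max_depth_m": 12.0, "mean_depth_m": 4.5},
    103: {"max_depth_m": 3.0, "mean_depth_m": None},  # mean depth not always available
}


def join_on_lake_id(classes, depths):
    """Inner join of two lagoslakeid-keyed tables (illustrative helper)."""
    joined = {}
    for lake_id, lake_class in classes.items():
        if lake_id in depths:
            # Merge the class label with the depth columns for this lake.
            joined[lake_id] = {"lake_class": lake_class, **depths[lake_id]}
    return joined


joined = join_on_lake_id(reservoir, depth)

# Share of LOCUS lakes with a depth record, from the counts quoted above.
depth_coverage_pct = round(100 * 17_675 / 479_950, 1)
```

An inner join is used here because only a small subset of lakes has depth data; a left join keyed on the classification table would retain lakes without depth records.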