Title: A process approach to quality management doubles NEON sensor data quality
Award ID(s):
1724433
PAR ID:
10376681
Author(s) / Creator(s):
Date Published:
Journal Name:
Methods in Ecology and Evolution
Volume:
13
Issue:
9
ISSN:
2041-210X
Page Range / eLocation ID:
1849 to 1865
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Sensor‐based, semicontinuous observations of water quality parameters have become critical to understanding how changes in land use, management, and rainfall‐runoff processes impact water quality at diurnal to multidecadal scales. While some commercially available water quality sensors function adequately under a range of turbidity conditions, other instruments, including those used to measure nutrient concentrations, cease to function in high turbidity waters (> 100 nephelometric turbidity units [NTU]) commonly found in large rivers, arid‐land rivers, and coastal areas. This is particularly true during storm events, when increases in turbidity are often concurrent with increases in nutrient transport. Here, we present the development and validation of a system that can affordably provide Self‐Cleaning FiLtrAtion for Water quaLity SenSors (SC‐FLAWLeSS) and enable long‐term, semicontinuous data collection in highly turbid waters. The SC‐FLAWLeSS system features a three‐step filtration process in which: (1) a coarse screen at the inlet removes particles with diameters > 397 μm, (2) a settling tank precipitates and then removes particles with diameters between 10 and 397 μm, and (3) a self‐cleaning, low‐cost, hollow fiber membrane technology removes particles ≥ 0.2 μm. We tested the SC‐FLAWLeSS system by measuring nitrate sensor data loss during controlled, serial sediment additions in the laboratory and validated it by monitoring soluble phosphate concentrations in the arid Rio Grande river (New Mexico, U.S.A.) at hourly sampling resolution. Our data demonstrate that the system can resolve turbidity‐related interference issues faced by in situ optical and wet chemistry sensors, even at turbidity levels > 10,000 NTU.
  2. Abstract Aim: Species occurrence data are valuable information that enables one to estimate geographical distributions, characterize niches and their evolution, and guide spatial conservation planning. Rapid increases in species occurrence data stem from increasing digitization and aggregation efforts, and citizen science initiatives. However, persistent quality issues in occurrence data can impact the accuracy of scientific findings, underscoring the importance of filtering erroneous occurrence records in biodiversity analyses. Innovation: We introduce an R package, occTest, that synthesizes a growing open‐source ecosystem of biodiversity cleaning workflows to prepare occurrence data for different modelling applications. It offers a structured set of algorithms to identify potential problems with species occurrence records by employing a hierarchical organization of multiple tests. The workflow has a hierarchical structure organized in testPhases (i.e. cleaning vs. testing) that encompass different testBlocks grouping different testTypes (e.g. environmental outlier detection), which may use different testMethods (e.g. Rosner test, jackknife, etc.). Four different testBlocks characterize potential problems in geographic, environmental, human influence and temporal dimensions. Filtering and plotting functions are incorporated to facilitate the interpretation of tests. We provide examples with different data sources, with default and user‐defined parameters. Compared to other available tools and workflows, occTest offers a comprehensive suite of integrated tests, and allows multiple methods associated with each test to explore consensus among data cleaning methods. It uniquely incorporates both coordinate accuracy analysis and environmental analysis of occurrence records. Furthermore, it provides a hierarchical structure to incorporate future tests yet to be developed.
Main conclusions: occTest will help users understand the quality and quantity of data available before the start of data analysis, while also enabling users to filter data using either predefined rules or custom‐built rules. As a result, occTest can better assess each record's appropriateness for its intended application.
  3. Abstract We report the results of the “UM‐TBM” and “Zheng” groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D‐I‐TASSER and DMFold‐Multimer algorithms, respectively. For monomer structure prediction, D‐I‐TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi‐source MSA searching and a structural modeling‐based MSA ranker; (ii) attention‐network based spatial restraints; (iii) a multi‐domain module containing domain partition and arrangement for domain‐level templates and spatial restraints; (iv) an optimized I‐TASSER‐based folding simulation system for full‐length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge‐based potentials. For 47 free modeling targets in CASP15, the final models predicted by D‐I‐TASSER showed an average TM‐score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo‐based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end‐to‐end deep learning methods alone. For protein complex structure prediction, DMFold‐Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end‐to‐end modeling module from AlphaFold2‐Multimer. For the 38 complex targets, DMFold‐Multimer generated models with an average TM‐score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generation, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
  4. Abstract Midwestern cities require forecasts of surface nitrate loads to bring additional treatment processes online or activate alternative water supplies. Concurrently, networks of nitrate monitoring stations are being deployed in river basins, co‐locating water quality observations with established stream gauges. However, evaluating the future value of expanded networks for improving water quality forecasts remains challenging. Here, we construct a synthetic data set of stream discharge and nitrate for the Wabash River Basin—one of the United States’ most nutrient‐polluted basins—using the established Agro‐IBIS and THMB models. Synthetic data enable rapid, unbiased and low‐cost assessment of potential sensor placements to support management objectives, such as near‐term forecasting. Using the synthetic data, we established baseline 1‐day forecasts for surface water nitrate at 12 cities in the basin using support vector machine regression (SVMR; RMSE 0.48–3.3 ppm). Next, we used the SVMRs to evaluate the improvement in forecast performance associated with deployment of additional nitrate sensors. We identified the optimal sensor placement to improve forecasts at each city, and the relative value of sensors at each candidate location. Finally, we assessed the co‐benefit realized by other cities when a sensor is deployed to optimize a forecast at one city, finding significant positive externalities in all cases. Ultimately, our study explores the potential for machine learning to make near‐term predictions and critically evaluate the improvement realized by expanding a monitoring network. While we use nitrate pollution in the Wabash River Basin as a case study, this approach could be readily applied to any problem where the future value of sensors and network design is being evaluated.