This content will become publicly available on August 28, 2026

Title: ContaminOSO: Ontological Foundations and Design Choices for an Ontology for Environmental Contamination Data
Contamination by heavy metals, per- and polyfluoroalkyl substances (PFAS), and other emerging pollutants poses serious risks to environmental and human health. Effective monitoring and tracing require integrating data from diverse sources. A knowledge graph approach enables semantic integration, but relies on an ontology that supports intuitive and robust querying and reasoning. To address this, we present the Contaminant Observations and Samples Ontology (ContaminOSO), a framework for semantically enriching environmental contaminant data. Built on SOSA and QUDT ontologies, ContaminOSO introduces key extensions to meet contamination-specific needs and real-world data challenges. This paper highlights four of its core design solutions: (1) extending SOSA to model multiple features of interest; (2) using QUDT to standardize the representation of contaminants and observed properties; (3) developing a detailed and nuanced pattern for measurement result representation using QUDT and STAD; and (4) adopting a pragmatic approach for connecting to existing taxonomies from the OBO Foundry, such as the NCBI organismal classification and relevant subsets of the Food Ontology (FoodOn), for classifying samples.
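As a rough illustration of the first two design points, the sketch below builds plain subject-predicate-object triples for one observation that has two features of interest and a QUDT-style quantity value as its result. It is not taken from the paper: the abstract does not say how ContaminOSO extends SOSA, so repeating `sosa:hasFeatureOfInterest` is only an assumed stand-in for the authors' pattern. The `http://example.org/` namespace, the identifiers `obs1`, `sample42`, `wellA`, and `PFOS_concentration`, the numeric value, and the helper function names are all hypothetical, and the unit name `NanoGM-PER-L` is assumed to follow the QUDT unit vocabulary's naming.

```python
# Namespaces of the published vocabularies named in the abstract.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SOSA = "http://www.w3.org/ns/sosa/"
QUDT = "http://qudt.org/schema/qudt/"
UNIT = "http://qudt.org/vocab/unit/"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


def build_observation(obs_id, features, observed_property, value, unit):
    """Return (subject, predicate, object) triples for one observation.

    Assumption: several features of interest are modeled simply by
    repeating sosa:hasFeatureOfInterest; the paper's actual extension
    of SOSA may differ.
    """
    obs = EX + obs_id
    result = obs + "/result"
    triples = [
        (obs, RDF + "type", SOSA + "Observation"),
        (obs, SOSA + "observedProperty", EX + observed_property),
        (obs, SOSA + "hasResult", result),
        # The result is a QUDT quantity value: a number plus a unit.
        (result, RDF + "type", QUDT + "QuantityValue"),
        (result, QUDT + "numericValue", value),
        (result, QUDT + "unit", UNIT + unit),
    ]
    # One triple per feature of interest, e.g. the sample and the site.
    for feature in features:
        triples.append((obs, SOSA + "hasFeatureOfInterest", EX + feature))
    return triples


def features_of(triples, obs_id):
    """List the features of interest attached to one observation, sorted."""
    obs = EX + obs_id
    return sorted(
        o for s, p, o in triples
        if s == obs and p == SOSA + "hasFeatureOfInterest"
    )


triples = build_observation(
    "obs1", ["sample42", "wellA"], "PFOS_concentration", 12.5, "NanoGM-PER-L"
)
print(features_of(triples, "obs1"))
```

In a real knowledge graph these triples would normally be created with an RDF library and serialized as Turtle or JSON-LD; plain tuples are used here only to keep the sketch dependency-free.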
Award ID(s):
2333782
PAR ID:
10640765
Editor(s):
Prince_Sales, Tiago; Masolo, Claudio; Keet, Maria
Publisher / Repository:
IOS Press
Date Published:
Edition / Version:
1
Page Range / eLocation ID:
284-298
Subject(s) / Keyword(s):
contamination ontology, measurements, SOSA, QUDT, knowledge graph, environmental contaminants, PFAS
Format(s):
Medium: X Size: 8MB Other: application/pdf
Size(s):
8MB
Sponsoring Org:
National Science Foundation
More Like this
  1. Catherine Murphy, University of (Ed.)
    Heavy metal contamination due to industrial and agricultural waste represents a growing threat to water supplies. Frequent and widespread monitoring for toxic metals in drinking and agricultural water sources is necessary to prevent their accumulation in humans, plants, and animals, which results in disease and environmental damage. Here, the metabolic stress response of bacteria is used to report the presence of heavy metal ions in water by transducing ions into chemical signals that can be fingerprinted using machine learning analysis of vibrational spectra. Surface-enhanced Raman scattering surfaces amplify chemical signals from bacterial lysate and rapidly generate large, reproducible datasets needed for machine learning algorithms to decode the complex spectral data. Classification and regression algorithms achieve limits of detection of 0.5 pM for As³⁺ and 6.8 pM for Cr⁶⁺, 100,000 times lower than the World Health Organization recommended limits, and accurately quantify concentrations of analytes across six orders of magnitude, enabling early warning of rising contaminant levels. Trained algorithms are generalizable across water samples with different impurities; water quality of tap water and wastewater was evaluated with 92% accuracy.
  3. Environmental contamination is one of the major drivers of ecosystem change in the Anthropocene. Toxic chemicals are not constrained to their source of origin as they cross ecosystem boundaries via biotic (e.g., animal migration) and abiotic (e.g., water flow) vectors. Meta‐ecology has led to important insights on how spatial flows or subsidies of matter across ecosystem boundaries can have broad impacts on local and regional ecosystem dynamics but has not yet addressed the dynamics of pollutants in recipient ecosystems. Incorporating meta‐ecosystem processes (i.e., flux of materials across ecosystem boundaries) into contaminant dynamics can elucidate how contaminants may reverberate among local food chains. Here, we derive a modeling framework to predict how spatial ecosystem fluxes can influence contaminant dynamics and how this influence is dependent on the type of ecosystem flux (e.g., herbivore movement vs. abiotic chemical flows). We mix an analytical and numerical approach to analyze our integrative model which couples two subcomponents that have previously been studied independently of each other—an ecosystem model and a contaminant model. We observe an array of dynamics for how chemical concentrations change with increasing nutrient input and loss rate across trophic levels. When we tailor our range of chemical parameter values (e.g., environmental uptake of contaminant and assimilation efficiency of the contaminant) to specific organic chemicals, our results demonstrate that increasing nutrient input rates can lead to trophic dilution in pollutants such as polychlorinated biphenyls across trophic levels. However, increasing nutrient loss rate causes an increase in the concentrations of chemicals across all trophic levels. A sensitivity analysis demonstrates that nutrient recycling is an important ecosystem process impacting contaminant concentrations, generating predictions to be addressed by future empirical studies. Importantly, our model demonstrates the utility of our framework for identifying drivers of contaminant dynamics in connected ecosystems including the importance that (1) ecosystem processes and (2) movement, especially movement of lower trophic levels, have on contaminant concentrations.
  4. Introduction: Detecting water contamination in community housing is crucial for protecting public health. Early detection enables timely action to prevent waterborne diseases and ensures equitable access to safe drinking water. Traditional methods recommended by the Environmental Protection Agency (EPA) rely on collecting water samples and conducting lab tests, which can be both time-consuming and costly. Methods: To address these limitations, this study introduces a Graph Attention Network (GAT) to predict lead contamination in drinking water. The GAT model leverages publicly available municipal records and housing information to model interactions between homes and identify contamination patterns. Each house is represented as a node, and relationships between nodes are analyzed to provide a clearer understanding of contamination risks within the community. Results: Using data from Flint, Michigan, the model demonstrated higher performance compared to traditional methods. Specifically, the GAT achieved an accuracy of 0.80, precision of 0.71, and recall of 0.93, outperforming XGBoost, a classical machine learning algorithm, which had an accuracy of 0.70, precision of 0.66, and recall of 0.67. Discussion: In addition to its predictive capabilities, the GAT model identifies key factors contributing to lead contamination, enabling more precise targeting of at-risk areas. This approach offers a practical tool for policymakers and public health officials to assess and mitigate contamination risks, ultimately improving community health and safety.
  5. The Upper Clark Fork River (UCFR) Long Term Research in Environmental Biology (LTREB) umbrella monitoring project generating these data is conducted separately and complementarily to the 200-million-dollar (USD) superfund project for ecological restoration of the UCFR, associated tributaries, and head water streams including Silver Bow and Warm Springs Creeks. Restoration along the UCFR in western Montana includes removal of metal-laden floodplain soils, lowering of the floodplain to its original elevation, and re-vegetation of over 70 km of the river’s floodplain closest to contaminant sources. The UCFR LTREB project includes bi-weekly water quality monitoring across the first 200 km of the river and its major tributaries along a gradient of heavy metal contamination associated with historic mining. Monitoring includes inorganic phosphorus and nitrogen concentrations, biotic standing stocks, and dissolved and whole-water heavy metal concentrations. The monitoring program began in 2017 with funding extended through 2028. The original analytical intent for these data was to assess the response of river dissolved organic carbon to the floodplain restoration. Data are primarily Aurora Total Organic Carbon combustion analyses of the concentration of organic carbon dissolved in filtered samples of well-mixed river thalweg water. A few samples from the final campaign in the dataset were analyzed with a Shimadzu instrument using a similar method. Data are from the 2022 water year (1 Oct 2021 to 30 Sep 2022) from samples collected on the Upper Clark Fork River (USGS HUC 17010201) at project sites distributed along the river from the vicinity of Anaconda to Missoula, Montana, USA. 