Title: Using a units ontology to annotate pre-existing metadata
Abstract: Automated processing of environmental data is hindered by the wide array of unit representations provided in the metadata of digital datasets. For example, gm/m2, g/m2, gm-2, g/m^2, g.m-2 and gramPerMeterSquared are all representations of a single complex unit that might be human-readable but are not machine-interpretable. Connecting ad hoc units to a single unit concept in an ontology permits the identification of datasets sharing units and provides additional information regarding labels, definitions, dimensions and transformations provided in the ontology. Here we use successive string transformations to link ad hoc unit representations to units in the QUDT ontology (e.g., unit:GM-PER-M2). Although only 896 of 7,110 distinct units in a corpus of ecological metadata from DataONE, the Environmental Data Initiative and the U.S. National Ecological Observatory Network were matched, 324,811 of the 355,057 total unit uses (instances) were successfully mapped to QUDT units (91%). The resulting lookup table was used to enable a web service and R functions for adding annotation elements to Ecological Metadata Language documents.
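The idea of successive string transformations can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is in R), and the substitution rules, the one-entry lookup table and the function name `match_unit` are all hypothetical, chosen only to cover the example strings in the abstract. Each rule is applied on top of the previous ones, and the lookup is retried after every step.

```python
import re

# Hypothetical one-entry lookup: canonical string -> QUDT unit identifier.
QUDT_LOOKUP = {"g/m2": "unit:GM-PER-M2"}

# Hypothetical ordered substitutions (regex pattern, replacement).
# They are toy rules for the abstract's examples, not the published tables.
SUBSTITUTIONS = [
    (r"\s+", ""),              # drop whitespace: "g m-2" -> "gm-2"
    (r"Per", "/"),             # camelCase separator: "gramPerMeter" -> "gram/Meter"
    (r"Squared", "2"),         # spelled-out exponent
    (r"gram", "g"),            # spelled-out unit names
    (r"[Mm]eter", "m"),
    (r"\^", ""),               # caret exponent: "m^2" -> "m2"
    (r"^gm/", "g/"),           # "gm" used as an abbreviation for gram
    (r"^g\.?m-2$", "g/m2"),    # negative exponent: "g.m-2", "gm-2"
]


def match_unit(raw):
    """Return a QUDT identifier for an ad hoc unit string, or None."""
    candidate = raw.strip()
    if candidate in QUDT_LOOKUP:
        return QUDT_LOOKUP[candidate]
    for pattern, replacement in SUBSTITUTIONS:
        candidate = re.sub(pattern, replacement, candidate)
        # Retry the lookup after each successive transformation.
        if candidate in QUDT_LOOKUP:
            return QUDT_LOOKUP[candidate]
    return None


if __name__ == "__main__":
    for unit in ["gm/m2", "g/m2", "gm-2", "g/m^2", "g.m-2", "gramPerMeterSquared"]:
        print(unit, "->", match_unit(unit))
```

Strings that no rule resolves return `None`. In the reported corpus such strings are the majority of distinct units, yet because the matched units are the frequently used ones, 324,811 / 355,057 ≈ 91% of unit instances are still covered.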
Award ID(s):
2217817 2224545
PAR ID:
10572628
Author(s) / Creator(s):
Porter, J. H.; O’Brien, M.; Frants, M.; Earl, S.; Martin, M.; Laney, C.
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Data
Volume:
12
Issue:
1
ISSN:
2052-4463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the metadata of digital environmental datasets, automated processing is hindered by the wide variety of unit representations that may be human-readable but are not unambiguous or machine-interpretable (e.g., grams per square meter, gm/m2, g/m2, gm-2, g/m^2, g.m-2, g m-2 and gramPerMeterSquared). Matching disparate representations of the same unit to a single unit concept from an ontology assists with interpretation and reuse by providing a link to a complete unit definition with label, description and dimensions. Datasets with shared units can be identified during searches and are more suitable for automated analysis and potential transformation. This dataset contains data and code associated with a project to map units in ecological metadata collected between 2013 and 2022 by DataONE, the Environmental Data Initiative and the U.S. National Ecological Observatory Network to the QUDT ontology using successive string transformations. Data entities include a) raw metadata as received (355,057 unit instances); b) integrated raw data; c) substitution tables for string transformations; d) the resulting lookup table for 896 distinct units matched to QUDT units; e) associated R code used for QUDT matching, plus a web service and R functions for adding annotation elements to Ecological Metadata Language metadata documents. Using these substitutions and code, 91% of unit instances in the raw metadata could be matched to QUDT. Data and results are discussed in “Porter JH, M O’Brien, M Frants, S Earl, M Martin, C Laney. (in review) Using a Units Ontology to Annotate Pre-Existing Metadata. Submitted to Scientific Data.”
  2. Aboveground plant biomass and belowground stem and root biomass were measured in moist acidic tussock tundra experimental sites established in 2006 by the Arctic Long-term Ecological Research site (ARC-LTER). Control plots and plots amended with three different levels of nitrogen (N) and phosphorus (P) were sampled: F10 (10 g/m2 N and 5 g/m2 P), F5 (5 g/m2 N and 2.5 g/m2 P) and F2 (2 g/m2 N and 1 g/m2 P).
  3. Among the sustainable initiatives for renewable energy technologies, anaerobic digestion (AD) is a potential contender to replace fossil fuels. The anaerobic co-digestion of goat manure (GM) with sorghum (SG), cotton gin trash (CGT), and food waste (FW) at different mixing ratios, volumes, temperatures, and additives was optimized in single and two-stage bioreactors. The biochemical methane potential (BMP) assays (with different mixing ratios of double and triple substrates) were run in 250 mL serum bottles in triplicate. The best-yielding ratio was up-scaled to fabricated 2 L bioreactors. The biodegradability, biomethane recovery, and process efficacy are discussed. The co-digestion of GM with SG in a 70:30 ratio yielded the highest biomethane of 239.3 ± 15.6 mL/gvs, and it was further up-scaled to a two-stage temperature-phased process supplemented with an anaerobic medium and fly ash (FA) in fabricated 2 L bioreactors. This system yielded the highest biomethane of 266.0 mL/gvs, having an anaerobic biodegradability of 67.3% in 70:30 GM:SG co-digestion supplemented with an anaerobic medium. The BMP of the FA-amended treatment may be lower because of its high Ca concentration of 205.74 ± 3.6. The liquid fraction of the effluents can be applied as N and P fertigation. The Ca concentration was found to be 24.3, 25.1, and 6.3 g/kg in GM and GM:SG (TS) and SG solid fractions, respectively, whereas K was found to be 26.6, 10.8, and 7.4 g/kg. The carbon to nitrogen ratio of the solid fraction varied between 2.0 and 24.8 for return to the soils to enhance their quality. This study involving feedstock acquisition, characterization, and anaerobic digestion optimization provides comprehensive information and may assist small farmers operating on-farm anaerobic digesters.
  4. Background: Biomarkers for Alzheimer’s disease (AD) are crucial for early diagnosis and treatment monitoring once disease modifying therapies become available. Objective: This study aims to quantify the forward magnetization transfer rate (kfor) map from brain tissue water to macromolecular protons and use it to identify the brain regions with abnormal kfor in AD and AD progression. Methods: From the Cardiovascular Health Study (CHS) cognition study, magnetization transfer imaging (MTI) was acquired at baseline from 63 participants, including 20 normal controls (NC), 18 with mild cognitive impairment (MCI), and 25 AD subjects. Of those, 53 participants completed a follow-up MRI scan and were divided into four groups: 15 stable NC, 12 NC-to-MCI, 12 stable MCI, and 14 MCI/AD-to-AD subjects. kfor maps were compared across NC, MCI, and AD groups at baseline for the cross-sectional study and across four longitudinal groups for the longitudinal study. Results: We found a lower kfor in the frontal gray matter (GM), parietal GM, frontal corona radiata (CR) white matter (WM) tracts, frontal and parietal superior longitudinal fasciculus (SLF) WM tracts in AD relative to both NC and MCI. Further, we observed progressive decreases of kfor in the frontal GM, parietal GM, frontal and parietal CR WM tracts, and parietal SLF WM tracts in stable MCI. In the parietal GM, parietal CR WM tracts, and parietal SLF WM tracts, we found trend differences between MCI/AD-to-AD and stable NC. Conclusion: Forward magnetization transfer rate is a promising biomarker for AD diagnosis and progression. 
  5. Seascape genomics provides a powerful framework to evaluate the presence and strength of environmental pressures on marine organisms, as well as to forecast long term species stability under various perturbations. In the highly productive North Pacific, forage fishes, key trophic links across ecosystems, are also contending with a rapidly warming climate and a litany of associated oceanographic changes (e.g., changes in salinity, dissolved oxygen, pH, primary production, etc.). These changes can place substantial selective pressures on populations over space and time. While several population genomics studies have targeted forage fishes in the North Pacific, none have formally analyzed the interactions between genotype and environment. However, when population genomics studies provide collection location information and other critical data, it is possible to supplement a published genomic dataset with environmental data from existing public databases and perform “post hoc seascape genomics” analyses. In reviewing the literature, we find pertinent metadata (dates and locations of sample collection) are rarely provided. We identify specific factors that may impede the application of seascape genomics methods in the North Pacific. Finally, we present an approach for supplementing data in a reproducible way to allow for post hoc seascape genomics analysis, in instances when metadata are reported. Overall, our goal is to demonstrate, via literature review, the utility and importance of seascape genomics to understanding the long term health of forage fish species in the North Pacific.