

Title: Modifier Ontologies for frequency, certainty, degree, and coverage phenotype modifier
Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called “modifiers”. With efforts underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that already have been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifier, for example, usually is more frequent than rarely, in order to allow data annotated with ontology terms to be classified accordingly.
Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins, but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies. Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies.
Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using “broader synonym” or “not recommended” annotation properties. These annotations explicitly make the user aware of the semantic ambiguity associated with the terms and of whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreement, suggesting that differentiating the modifiers into 5 levels/bins balances the need to differentiate modifiers and the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies.
Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate taxon concept alignments. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.
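The two list designs described in the Method can be pictured with a small model. The Python below is only an illustrative sketch, not the published OWL ontologies: the class names, method names, and the five frequency labels are assumptions made for this example (only "rarely" and "usually" appear in the abstract), and "seldom" and "often" are hypothetical additions.

```python
# Hypothetical frequency labels, ordered from least to most frequent.
FREQUENCY = ["never", "rarely", "sometimes", "usually", "always"]


class FiveBinScale:
    """Closed ordered list: five fixed bins; new terms join only as synonyms."""

    def __init__(self, labels):
        if len(labels) != 5:
            raise ValueError("the closed design has exactly five bins")
        self.bin_of = {label: i for i, label in enumerate(labels)}

    def add_synonym(self, term, existing):
        # A new term never creates a bin; it shares the bin of an existing class.
        self.bin_of[term] = self.bin_of[existing]

    def levels_more_than(self, a, b):
        # Interval reading: number of bins a sits above b (negative if below).
        return self.bin_of[a] - self.bin_of[b]


class OpenOrderedScale:
    """Open ordered list: starts with five classes; new classes can be inserted."""

    def __init__(self, labels):
        self.order = list(labels)

    def insert_between(self, term, lower, upper):
        lo, hi = self.order.index(lower), self.order.index(upper)
        if hi != lo + 1:
            raise ValueError("lower and upper must be adjacent in the list")
        self.order.insert(hi, term)

    def more_than(self, a, b):
        # Ordinal reading: only the relative order is meaningful, not the distance.
        return self.order.index(a) > self.order.index(b)


closed = FiveBinScale(FREQUENCY)
closed.add_synonym("seldom", "rarely")
steps = closed.levels_more_than("usually", "seldom")

open_scale = OpenOrderedScale(FREQUENCY)
open_scale.insert_between("often", "sometimes", "usually")
```

In the closed model the distance between two terms stays meaningful because the number of bins never changes; in the open model an insertion shifts positions, so only more/less comparisons remain safe.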
Award ID(s):
1661485
NSF-PAR ID:
10104346
Journal Name:
Biodiversity Data Journal
Volume:
6
ISSN:
1314-2836
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). This is because these approaches do not scale to all biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both manual and machine learning approaches face great challenges: the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other) in manual curation, and keywords to ontology concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardized vocabularies (ontologies). We argue that the authors describing characters are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of the descriptions from the moment of publication. 
In this presentation, we will introduce the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists, which consists of three components: a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collect terms added by authors, detect potential conflicts between terms, dispatch conflicts to the third component, and update the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. Fig. 1 shows the system diagram of the platform. 
The presentation will consist of: a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken, and how a custom color palette of RGB values obtained from real specimens or high-quality specimen images can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). 
The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. The software modules currently incorporated in Character Recorder and Conflict Resolver have undergone formal usability studies. We are actively recruiting Carex experts to participate in a 3-day usability study of the entire system of the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists. Participants will use the platform to record 100 characters about one Carex species. In addition to usability data, we will collect the terms that participants submit to the underlying ontology and the data related to conflict resolution. Such data allow us to examine the types and the quantities of logical conflicts that may result from the terms added by the users and to use Discrete Event Simulation models to understand if and how term additions and conflict resolutions converge. We look forward to a discussion on how the tools described in our presentation (Character Recorder is online at http://shark.sbs.arizona.edu/chrecorder/public) can contribute to producing and publishing FAIR data in taxonomic studies. 
  2. Taxonomic treatments start with the creation of taxon-by-character matrices. Systematics authors recognized data ambiguity issues in published phenotypic characters and are willing to adopt an ontology-aware authoring tool (Cui et al. 2022). To promote interoperable and reusable taxonomic treatments, we have developed two research prototypes: a web-based application, Character Recorder (http://chrecorder.lusites.xyz/login), to facilitate the use and addition of ontology terms by Carex systematist authors while building their matrices, and a mobile application, Conflict Resolver (Android, https://tinyurl.com/5cfatrz8), to identify potential conflicts among the terms added by the authors and facilitate the resolution of the conflicts. We have completed two usability studies on Character Recorder. In the one-hour Student Usability Study, 16 third-year biology students with a general introduction to Carex used Character Recorder and Excel to record a set of 11 given characters for two samples (shape of sheath summits = U-shaped/U shaped). In the three-day Expert Usability Study, 7 established Carex systematists and 1 graduate student with expert-level knowledge used Character Recorder to record characters for 1 sample each of Carex canescens and Carex rostrata as they would in their professional life, using real mounted specimens, microscope, reticles, and rulers. Experts' activities were not timed, but they spent roughly 1.5 days recording the characters and the rest of the time discussing features and improvements. 
Features of Character Recorder were reported at the 2021 TDWG meeting, and we include here only a few figures to highlight its interoperability and reusability features at the time of the usability studies (Fig. 1, Fig. 2, and Fig. 3). The Carex Ontology accompanying Character Recorder was created by extracting terms from Carex treatments of Flora of China and Flora of North America using Explorer of Taxon Concepts (Cui et al. 2016) with subsequent manual edits. The design principle of Character Recorder is to encourage standardization and also leave the authors the freedom to do their work. While it took students an average of 6 minutes to recover all the given characters using Microsoft® Excel®, as opposed to 11 minutes using Character Recorder, the total number of unique meaning-bearing words used in their characters was 116 with Excel versus 30 with Character Recorder, showing the power of the latter in reducing synonyms and spelling variations. All students reported that they learned to use Character Recorder quickly and some even thought their use was as fast as or faster than using Excel. All preferred Character Recorder to Excel for teaching students to record character data. Nearly all of the students found Character Recorder was more useful for recording clear and consistent data and all students agreed that participating in this study raised their awareness of data variation issues. The expert group consisted of 3, 2, 1, 3 experts in age ranges 20-49, 50-59, 60-69, and >69, respectively. They each recorded over 100 characters for two or more samples. Detailed analysis of their characters is pending, but we have noticed color characters have more variations than other characters (Fig. 4). All experts reported that they learned to use Character Recorder quickly, and 6 out of 8 believed they would not need a tutorial the next time they used it. 
One out of 8 experts somewhat disliked the feature of reusing others' values ("Use This" in Fig. 2) as it may undermine the objectivity and independence of an author. All experts used the Recommended Set of Characters and they liked the term suggestion and illustration features shown in Figs. 2 and 3. All experts would recommend that their colleagues try Character Recorder and recommended that it be further developed and integrated into every taxonomist's toolbox. Student and expert responses to the National Aeronautics and Space Administration Task Load Index (NASA-TLX, Hart and Staveland 1988) are summarized in Fig. 5, which suggests that, while Character Recorder may incur a slightly higher cost, the performance it supports outweighs its cost, especially for students. Every piece of the software prototypes and associated resources is open for anyone to access or further develop. We thank all student and expert participants and the US National Science Foundation for their support in this research. We thank Harris & Harris and Presses de l'Université Laval for the permissions to use their phenotype illustrations in Character Recorder. 
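The comparison of unique meaning-bearing words (116 with Excel versus 30 with Character Recorder) can be illustrated with a small count. The sketch below is an assumption about how such a count might be made, not the study's actual procedure: the tokenization rule, the stop-word list, and the sample values (including the misspelling "u-shapped") are all invented for the example.

```python
import re

# Hypothetical stop words treated as not meaning-bearing.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "or", "with"}


def unique_words(values):
    """Return the set of distinct lower-cased words across recorded values.

    Hyphenated forms are kept as single tokens, so "U-shaped" and "U shaped"
    contribute different words, as spelling variants do in free-text entry.
    """
    words = set()
    for value in values:
        for token in re.findall(r"[a-z][a-z-]*", value.lower()):
            if token not in STOPWORDS:
                words.add(token)
    return words


# Invented sample values for one character recorded by three people.
excel_values = ["U-shaped", "U shaped", "u-shapped"]
recorder_values = ["U-shaped", "U-shaped", "U-shaped"]

excel_vocabulary = unique_words(excel_values)
recorder_vocabulary = unique_words(recorder_values)
```

With free-text entry, hyphenation and spelling variants each add new words to the vocabulary; when values are picked from a shared term list, the same three records collapse to a single word.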
  3. Abstract

    Wild large herbivores are declining worldwide. Despite extensive use of exclosure experiments to investigate herbivore impacts, there is little consensus on the effects of wild large herbivores on ecosystem function.

    Of the ecosystem functions likely impacted, we reviewed the five most‐studied in exclosure experiments: ecosystem resilience/resistance to disturbance, nutrient cycling, carbon cycling, plant regeneration, and primary productivity.

    Experimental data on large wild herbivores' effects on ecosystem functions were predominately derived from temperate grasslands (50% grasslands, 75% temperate zones). Additionally, data were from experiments that may not be of adequate size (median size 400 m² despite excluding all experiments below 25 m²) or duration (median duration 6 years) to capture ecosystem‐scale responses to these low‐density and wide‐ranging taxa.

    Wild herbivore removal frequently impacted ecosystem functions; for example, net carbon uptake increased by three times in some instances. However, the magnitude and direction of effects, even within a single function, were highly variable.

    A focus on carbon cycling highlighted challenges in interpreting effects on a single function. While the effect of large herbivore exclusion on carbon cycling was slightly positive when its components (e.g. pools vs. fluxes of carbon) were aggregated, effects on individual components were variable and sometimes opposed.

    Given modern declines in large wild herbivores, it is critical to understand their effects on ecosystem function. However, this synthesis highlights strong variability in direction, magnitude, and modifiers of these effects. Some variation is likely due to disparity in what components are used to describe a given function. For example, for the carbon cycle we identified eight distinctly meaningful components, which are not easily combined yet are potentially misrepresentative of the larger cycle when considered alone. However, much of the observed difference in responses likely reflects real ecological variability across complex systems.

    To move towards a general predictive framework we must identify where variation in effect is due to methodological differences and where due to ecosystem context. Two critical steps forward are (a) additional quantitative synthetic analyses of large herbivores' effects on individual functions, and (b) improved, increased systematic exclosure research focusing on effects of large herbivores' exclusion on functions.

    A free Plain Language Summary can be found within the Supporting Information of this article.

     
  4. Abstract The spectacular radiation of insects has produced a stunning diversity of phenotypes. During the past 250 years, research on insect systematics has generated hundreds of terms for naming and comparing them. In its current form, this terminological diversity is presented in natural language and lacks formalization, which prohibits computer-assisted comparison using semantic web technologies. Here we propose a Model for Describing Cuticular Anatomical Structures (MoDCAS) which incorporates structural properties and positional relationships for standardized, consistent, and reproducible descriptions of arthropod phenotypes. We applied the MoDCAS framework in creating the ontology for the Anatomy of the Insect Skeleto-Muscular system (AISM). The AISM is the first general insect ontology that aims to cover all taxa by providing generalized, fully logical, and queryable definitions for each term. It was built using the Ontology Development Kit (ODK), which maximizes interoperability with Uberon (Uberon multi-species anatomy ontology) and other basic ontologies, enhancing the integration of insect anatomy into the broader biological sciences. A template system for adding new terms, extending, and linking the AISM to additional anatomical, phenotypic, genetic, and chemical ontologies is also introduced. 
The AISM is proposed as the backbone for taxon-specific insect ontologies and has potential applications spanning systematic biology and biodiversity informatics, allowing users to (1) use controlled vocabularies and create semi-automated computer-parsable insect morphological descriptions; (2) integrate insect morphology into broader fields of research, including ontology-informed phylogenetic methods, logical homology hypothesis testing, evo-devo studies, and genotype to phenotype mapping; and (3) automate the extraction of morphological data from the literature, enabling the generation of large-scale phenomic data, by facilitating the production and testing of informatic tools able to extract, link, annotate, and process morphological data. This descriptive model and its ontological applications will allow for clear and semantically interoperable integration of arthropod phenotypes in biodiversity studies. 
  5. Abstract

    Ontologies are becoming a fundamental technology for analysing phenotypic data. The commonly used Entity–Quality (EQ) provides rich semantics for annotating phenotypes and characters using ontologies. However, EQ syntax might be time inefficient if this granularity is unnecessary for downstream analysis.

    We present an R package ontoFAST that aids fast annotations of characters with biological ontologies. ontoFAST takes a biomedical ontology in OBO format and a list of characters as input, and produces a list of mappings from characters to ontology terms as output.

    The annotations produced by ontoFAST can be exported in CSV format for downstream analysis. Additionally, ontoFAST provides (a) functions for constructing simple queries of characters against ontologies and (b) helper functions for exporting and visualizing complex ontological hierarchies and their relationships.

    ontoFAST enhances integration of ontological and phylogenetic methods and supports data interoperability between R applications. Ontology tools are underrepresented in the R ecosystem and we hope that ontoFAST will stimulate their further development.
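    The input/output shape described above (an OBO-format ontology and a list of characters in, character-to-term mappings out, exportable as CSV) can be sketched generically. The Python below is not ontoFAST and does not use its API: the function names, the toy ontology with `EX:` identifiers, the sample characters, and the naive single-word label matching are all assumptions made for illustration.

```python
import csv
import io
import re

# Toy ontology in OBO flat-file syntax; the terms and IDs are hypothetical.
OBO_TEXT = """\
[Term]
id: EX:0000001
name: head

[Term]
id: EX:0000002
name: antenna

[Term]
id: EX:0000003
name: mandible
"""


def parse_obo_terms(text):
    """Read only the id and name tags of each [Term] stanza."""
    terms = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if line == "[Term]":
            current_id = None
        elif line.startswith("id: "):
            current_id = line[4:]
        elif line.startswith("name: ") and current_id is not None:
            terms[current_id] = line[6:]
    return terms


def annotate(characters, terms):
    """Map each character to the IDs of terms whose one-word name occurs in it."""
    mappings = {}
    for character in characters:
        words = set(re.findall(r"[a-z]+", character.lower()))
        mappings[character] = sorted(
            term_id for term_id, name in terms.items() if name.lower() in words
        )
    return mappings


def to_csv(mappings):
    """Serialize the mappings as CSV text, one row per (character, term) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["character", "term_id"])
    for character, term_ids in mappings.items():
        for term_id in term_ids:
            writer.writerow([character, term_id])
    return buffer.getvalue()


characters = ["Shape of head", "Antenna length relative to head", "Wing color"]
mappings = annotate(characters, parse_obo_terms(OBO_TEXT))
csv_text = to_csv(mappings)
```

    Exact word matching keeps the sketch short but misses multi-word names and synonyms, which is one reason a real annotation workflow would still need a curator to review the candidate mappings.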

     