Title: Model-Finding for Externally Verifying FOL Ontologies: A Study of Spatial Ontologies
Use and reuse of an ontology requires prior ontology verification which encompasses, at least, proving that the ontology is internally consistent and consistent with representative datasets. First-order logic (FOL) model finders are among the only available tools to aid us in this undertaking, but proving consistency of FOL ontologies is theoretically intractable while also rarely succeeding in practice, with FOL model finders scaling even worse than FOL theorem provers. This issue is further exacerbated when verifying FOL ontologies against datasets, which requires constructing models with larger domain sizes. This paper presents a first systematic study of the general feasibility of SAT-based model finding with FOL ontologies. We use select spatial ontologies and carefully controlled synthetic datasets to identify key measures that determine the size and difficulty of the resulting SAT problems. We experimentally show that these measures are closely correlated with the runtimes of Vampire and Paradox, two state-of-the-art model finders. We propose a definition elimination technique and demonstrate that it can be a highly effective measure for reducing the problem size and improving the runtime and scalability of model finding. more »« less
Hahmann, Torsten; Powell, Robert
(, International Semantic Web Conference (ISWC 2021))
null
(Ed.)
While OWL and RDF are by far the most popular logic-based languages for Semantic Web Ontologies, some well-designed ontologies are only available in languages with a much richer expressivity, such as first-order logic (FOL) or the ISO standard Common Logic. This inhibits reuse of these ontologies by the wider Semantic Web Community. While converting OWL ontologies to FOL is straightforward, the reverse problem of finding the closest OWL approximation of an FOL ontology is undecidable. However, for most practical purposes, a ``good enough'' OWL approximation need not be perfect to enable wider reuse by the Semantic Web Community. This paper outlines such a conversion approach by first normalizing FOL sentences into a function-free prenex conjunctive normal (FF-PCNF) that strips away minor syntactic differences and then applying a pattern-based approach to identify common OWL axioms. It is tested on the over 2,000 FOL ontologies from the Common Logic Ontology Repository.
Van Nguyen, Tran Cao
(, Lecture notes in computer science)
This paper addresses the problem of automatic generation of natural language descriptions for ontology-described artifacts. The original motivation for the work is the challenge of providing textual narratives of automatically generated scientific workflows (e.g., paragraphs that scientists can include in their publications). The paper presents two systems which generate descriptions of sets of atoms derived from a collection of ontologies. The first system, called nlgPhylogeny, demonstrates the feasibility of the task in the Phylotastic project, providing evolutionary biologists with narrative for automatically generated analysis workflows. nlgPhylogeny utilizes the fact that the Grammatical Framework (GF) is suitable for the natural language generation (NLG) task; the paper shows how elements of the ontologies in Phylotastic, such as web services and information artifacts, can be encoded in GF for the NLG task. The second system, called ๐๐๐๐พ๐๐๐๐๐๐๐ข๐ด, is a generalization of nlgPhylogeny. It eliminates the requirement that a GF needs to be defined and proposes the use of annotated ontologies for NLG. Given a set of annotated ontologies, ๐๐๐๐พ๐๐๐๐๐๐๐ข๐ด generates a GF suitable for the creation of natural language descriptions of sets of atoms derived from these ontologies. The paper describes the algorithms used in the development of nlgPhylogeny and ๐๐๐๐พ๐๐๐๐๐๐๐ข๐ด and discusses potential applications of these systems.
Endara, Lorena; Thessen, Anne; Cole, Heather; Walls, Ramona; Gkoutos, Georgios; Cao, Yujie; Chong, Steven; Cui, Hong
(, Biodiversity Data Journal)
Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression, these terms are called โmodifiersโ. With effort underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that already have been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifiers, for example, usually is more frequent than rarely, in order to allow data annotated with ontology terms to be classified accordingly. Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can only be added as synonyms to existing classes. The open list approach starts with 5 bins, but supports the extensibility of the list via ordinal properties, for example, more/less_frequently_than, allowing new terms to be inserted as a new class anywhere in the list. The consequences of the different design decisions are discussed in the paper. CharaParser was used to extract modifiers from plant, ant, and other taxonomic descriptions. After a manual screening, 130 modifier words were selected as the candidate terms for the modifier ontologies. Four curators/experts (three biologists and one information scientist specialized in biosemantics) reviewed and categorized the terms into 20 bins using the Ontology Term Organizer (OTO) (http://biosemantics.arizona.edu/OTO). Inter-curator variations were reviewed and expressed in the final ontologies. Results: Frequency, certainty, degree, and coverage terms with complete agreement among all curators were used as class labels or exact synonyms. Terms with different interpretations were either excluded or included using โbroader synonymโ or โnot recommendedโ annotation properties. These annotations explicitly allow for the user to be aware of the semantic ambiguity associated with the terms and whether they should be used with caution or avoided. Expert categorization results showed that 16 out of 20 bins contained terms with full agreements, suggesting differentiating the modifiers into 5 levels/bins balances the need to differentiate modifiers and the need for the ontology to reflect user consensus. Two ontologies, developed using the Protege ontology editor, are made available as OWL files and can be downloaded from https://github.com/biosemantics/ontologies. Contribution: We built the first two modifier ontologies following a consensus-based approach with terms commonly used in taxonomic literature. The five-bin ontology has been used in the Explorer of Taxon Concepts web toolkit to compute the similarity between characters extracted from literature to facilitate taxon concepts alignments. The two ontologies will also be used in an ontology-informed authoring tool for taxonomists to facilitate consistency in modifier term usage.
Clarke, Jennifer L; Cooper, Laurel D; Poelchau, Monica F; Berardini, Tanya Z; Elser, Justin; Farmer, Andrew D; Ficklin, Stephen; Kumari, Sunita; Laporte, Marie-Angรฉlique; Nelson, Rex T; et al
(, Database)
Abstract Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as โdatabasesโ throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that, ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in a specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL https://www.agbiodata.org/databases
Girรณn, Jennifer C; Tarasov, Sergei; Gonzรกlez Montaรฑa, Luis Antonio; Matentzoglu, Nicolas; Smith, Aaron D; Koch, Markus; Boudinot, Brendon E; Bouchard, Patrice; Burks, Roger; Vogt, Lars; et al
(, Systematic Biology)
Abstract The spectacular radiation of insects has produced a stunning diversity of phenotypes. During the past 250 years, research on insect systematics has generated hundreds of terms for naming and comparing them. In its current form, this terminological diversity is presented in natural language and lacks formalization, which prohibits computer-assisted comparison using semantic web technologies. Here we propose a Model for Describing Cuticular Anatomical Structures (MoDCAS) which incorporates structural properties and positional relationships for standardized, consistent, and reproducible descriptions of arthropod phenotypes. We applied the MoDCAS framework in creating the ontology for the Anatomy of the Insect Skeleto-Muscular system (AISM). The AISM is the first general insect ontology that aims to cover all taxa by providing generalized, fully logical, and queryable, definitions for each term. It was built using the Ontology Development Kit (ODK), which maximizes interoperability with Uberon (Uberon multi-species anatomy ontology) and other basic ontologies, enhancing the integration of insect anatomy into the broader biological sciences. A template system for adding new terms, extending, and linking the AISM to additional anatomical, phenotypic, genetic, and chemical ontologies is also introduced. The AISM is proposed as the backbone for taxon-specific insect ontologies and has potential applications spanning systematic biology and biodiversity informatics, allowing users to (1) use controlled vocabularies and create semi-automated computer-parsable insect morphological descriptions; (2) integrate insect morphology into broader fields of research, including ontology-informed phylogenetic methods, logical homology hypothesis testing, evo-devo studies, and genotype to phenotype mapping; and (3) automate the extraction of morphological data from the literature, enabling the generation of large-scale phenomic data, by facilitating the production and testing of informatic tools able to extract, link, annotate, and process morphological data. This descriptive model and its ontological applications will allow for clear and semantically interoperable integration of arthropod phenotypes in biodiversity studies.
Stephen, S., and Hahmann, T. Model-Finding for Externally Verifying FOL Ontologies: A Study of Spatial Ontologies. Retrieved from https://par.nsf.gov/biblio/10190461. Proc. of the International Conference on Formal Ontology in Information System (FOIS-2020) . Web. doi:10.3233/FAIA200675.
Stephen, S., & Hahmann, T. Model-Finding for Externally Verifying FOL Ontologies: A Study of Spatial Ontologies. Proc. of the International Conference on Formal Ontology in Information System (FOIS-2020), (). Retrieved from https://par.nsf.gov/biblio/10190461. https://doi.org/10.3233/FAIA200675
Stephen, S., and Hahmann, T.
"Model-Finding for Externally Verifying FOL Ontologies: A Study of Spatial Ontologies". Proc. of the International Conference on Formal Ontology in Information System (FOIS-2020) (). Country unknown/Code not available. https://doi.org/10.3233/FAIA200675.https://par.nsf.gov/biblio/10190461.
@article{osti_10190461,
place = {Country unknown/Code not available},
title = {Model-Finding for Externally Verifying FOL Ontologies: A Study of Spatial Ontologies},
url = {https://par.nsf.gov/biblio/10190461},
DOI = {10.3233/FAIA200675},
abstractNote = {Use and reuse of an ontology requires prior ontology verification which encompasses, at least, proving that the ontology is internally consistent and consistent with representative datasets. First-order logic (FOL) model finders are among the only available tools to aid us in this undertaking, but proving consistency of FOL ontologies is theoretically intractable while also rarely succeeding in practice, with FOL model finders scaling even worse than FOL theorem provers. This issue is further exacerbated when verifying FOL ontologies against datasets, which requires constructing models with larger domain sizes. This paper presents a first systematic study of the general feasibility of SAT-based model finding with FOL ontologies. We use select spatial ontologies and carefully controlled synthetic datasets to identify key measures that determine the size and difficulty of the resulting SAT problems. We experimentally show that these measures are closely correlated with the runtimes of Vampire and Paradox, two state-of-the-art model finders. We propose a definition elimination technique and demonstrate that it can be a highly effective measure for reducing the problem size and improving the runtime and scalability of model finding.},
journal = {Proc. of the International Conference on Formal Ontology in Information System (FOIS-2020)},
author = {Stephen, S. and Hahmann, T.},
}
Warning: Leaving National Science Foundation Website
You are now leaving the National Science Foundation website to go to a non-government website.
Website:
NSF takes no responsibility for and exercises no control over the views expressed or the accuracy of
the information contained on this site. Also be aware that NSF's privacy policy does not apply to this site.