- Publication Date:
- NSF-PAR ID: 10292287
- Journal Name: International Semantic Web Conference (ISWC 2021)
- Sponsoring Org: National Science Foundation
More Like this
-
Use and reuse of an ontology requires prior ontology verification, which encompasses, at least, proving that the ontology is internally consistent and consistent with representative datasets. First-order logic (FOL) model finders are among the few available tools to aid us in this undertaking, but proving consistency of FOL ontologies is theoretically intractable and rarely succeeds in practice, with FOL model finders scaling even worse than FOL theorem provers. This issue is further exacerbated when verifying FOL ontologies against datasets, which requires constructing models with larger domain sizes. This paper presents a first systematic study of the general feasibility of SAT-based model finding with FOL ontologies. We use select spatial ontologies and carefully controlled synthetic datasets to identify key measures that determine the size and difficulty of the resulting SAT problems. We experimentally show that these measures are closely correlated with the runtimes of Vampire and Paradox, two state-of-the-art model finders. We propose a definition elimination technique and demonstrate that it can be a highly effective measure for reducing the problem size and improving the runtime and scalability of model finding.
-
Background: When phenotypic characters are described in the literature, they may be constrained or clarified with additional information such as the location or degree of expression; these terms are called “modifiers”. With efforts underway to convert narrative character descriptions to computable data, ontologies for such modifiers are needed. Such ontologies can also be used to guide term usage in future publications. Spatial and method modifiers are the subjects of ontologies that already have been developed or are under development. In this work, frequency (e.g., rarely, usually), certainty (e.g., probably, definitely), degree (e.g., slightly, extremely), and coverage modifiers (e.g., sparsely, entirely) are collected, reviewed, and used to create two modifier ontologies with different design considerations. The basic goal is to express the sequential relationships within a type of modifier, for example, usually is more frequent than rarely, in order to allow data annotated with ontology terms to be classified accordingly. Method: Two designs are proposed for the ontology, both using the list pattern: a closed ordered list (i.e., five-bin design) and an open ordered list design. The five-bin design puts the modifier terms into a set of 5 fixed bins with interval object properties, for example, one_level_more/less_frequently_than, where new terms can …
-
It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). This is because these approaches do not scale to all biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both manual and machine learning approaches face great challenges: the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other) in manual curation, and keyword-to-ontology-concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation have been identified as semantic vagueness …
-
Making the most of biodiversity data requires linking observations of biological species from multiple sources both efficiently and accurately (Bisby 2000, Franz et al. 2016). Aggregating occurrence records using taxonomic names and synonyms is computationally efficient but known to experience significant limitations on accuracy when the assumption of one-to-one relationships between names and biological entities breaks down (Remsen 2016, Franz and Sterner 2018). Taxonomic treatments and checklists provide authoritative information about the correct usage of names for species, including operational representations of the meanings of those names in the form of range maps, reference genetic sequences, or diagnostic traits. They increasingly provide taxonomic intelligence in the form of precise description of the semantic relationships between different published names in the literature. Making this authoritative information Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) would be a transformative advance for biodiversity data sharing and help drive adoption and novel extensions of existing standards such as the Taxonomic Concept Schema and the OpenBiodiv Ontology (Kennedy et al. 2006, Senderov et al. 2018). We call for the greater, global Biodiversity Information Standards (TDWG) and taxonomy community to commit to extending and expanding on how FAIR applies to biodiversity data and include …
-
We revise the genus Conostigmus Dahlbom 1858 occurring in Madagascar, based on data from more specimens than were examined for the latest world revision of the genus. Our results yield new information about intraspecific variability and the nature of the atypical latitudinal diversity gradient (LDG) observed in Ceraphronoidea. We also investigate cellular processes that underlie body size polyphenism, by utilizing the correspondence between epidermal cells and scutes, polygonal units of leather-like microsculpture. Our results reveal that body size polyphenism in Megaspilidae is most likely related to cell number and not cell size variation, and that cell size differs between epithelial fields of the head and that of the mesosoma. Three species, Conostigmus ballescoracas Dessart, 1997, C. babaiax Dessart, 1996 and C. longulus Dessart, 1997, are redescribed. Females of C. longulus are described for the first time, as are nine new species: C. bucephalus Mikó and Trietsch sp. nov., C. clavatus Mikó and Trietsch sp. nov., C. fianarantsoaensis Mikó and Trietsch sp. nov., C. lucidus Mikó and Trietsch sp. nov., C. macrocupula Mikó and Trietsch sp. nov., C. madagascariensis Mikó and Trietsch sp. nov., C. missyhazenae Mikó and Trietsch sp. nov., C. pseudobabaiax Mikó and Trietsch sp. nov., and C. toliaraensis Mikó and Trietsch sp. nov. A fully illustrated identification key for Malagasy Conostigmus species and a Web Ontology Language (OWL) representation of the taxonomic treatment, including specimen data, nomenclature, …