“What is crucial for your ability to communicate with me… pivots on the recipient’s capacity to interpret—to make good inferential sense of the meanings that the declarer is able to send” (Rescher 2000, p. 148). Conventional approaches to reconciling taxonomic information in biodiversity databases have been based on string matching for unique taxonomic name combinations (Kindt 2020, Norman et al. 2020). However, in their original context, these names pertain to specific usages or taxonomic concepts, which can subsequently vary for the same name as applied by different authors. Name-based synonym matching is a helpful first step (Guala 2016, Correia et al. 2018), but may still leave considerable ambiguity regarding proper usage (Fig. 1). Therefore, developing "taxonomic intelligence" is the bioinformatic challenge of adequately representing, and subsequently propagating, this complex name/usage interaction across trusted biodiversity data networks. How do we ensure that senders and recipients of biodiversity data not only can share messages but do so with “good inferential sense” of their respective meanings? Key obstacles have involved dealing with the complexity of taxonomic name/usage modifications through time, both in terms of accounting for and digitally representing the long histories of taxonomic change in most lineages. An important critique of proposals to …
Integrating Taxonomic Names and Concepts from Paper and Digital Sources for a New Flora of Alaska
The taxonomic foundation of a new regional flora or monograph is the reconciliation of pre-existing names and taxonomic concepts (i.e., variation in usage of those names). This reconciliation is traditionally done manually, but the availability of taxonomic resources online and of text manipulation software means that some of the work can now be automated, speeding up the development of new taxonomic products. As a contribution to developing a new Flora of Alaska (floraofalaska.org), we have digitized the main pre-existing flora (Hultén 1968) and combined it with key online taxonomic name sources (Panarctic Flora, Flora of North America, International Plant Names Index - IPNI, Tropicos, Kew’s World Checklist of Selected Plant Families) to build a canonical list of names anchored to external Globally Unique Identifiers (GUIDs) (e.g., IPNI URLs). We developed taxonomically aware fuzzy-matching software (matchnames, Webb 2020) to identify cognates in different lists. The taxa for which accepted names and synonyms vary between sources are then flagged for review by taxonomic experts. However, even though names may be consistent across previous monographs and floras, the taxonomic concept (or circumscription) of a name may differ among authors, meaning that the way an accepted name in the …
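The idea of taxonomically aware fuzzy matching described above can be illustrated with a minimal Python sketch. This is not the matchnames algorithm (Webb 2020), whose internals are not described here. It assumes a simplified approach: compare genus and epithets separately, ignore author strings, and strip a few Latin gender endings before scoring with the standard library's `difflib`. All function names, the rank-marker list, and the 0.85 threshold are hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical rank markers; real name parsers recognise many more.
RANK_MARKERS = {"subsp.", "ssp.", "var.", "f."}
# Assumed simplification: treat -us/-um/-a as interchangeable gender endings.
GENDER_ENDINGS = ("us", "um", "a")


def parse_name(name):
    """Return (genus, [epithets]) in lower case, ignoring author strings.

    The species epithet is taken to be the second token; an infraspecific
    epithet is the token that follows a rank marker.
    """
    tokens = name.split()
    genus = tokens[0].lower()
    epithets = [tokens[1].lower()] if len(tokens) > 1 else []
    for i, tok in enumerate(tokens[:-1]):
        if tok in RANK_MARKERS:
            epithets.append(tokens[i + 1].lower())
    return genus, epithets


def stem(epithet):
    """Strip a gender ending so that 'glauca' and 'glaucus' compare equal."""
    for ending in GENDER_ENDINGS:
        if epithet.endswith(ending) and len(epithet) - len(ending) >= 3:
            return epithet[: -len(ending)]
    return epithet


def match_score(name_a, name_b):
    """Score two names by their weakest-matching part (0.0 to 1.0)."""
    genus_a, eps_a = parse_name(name_a)
    genus_b, eps_b = parse_name(name_b)
    if len(eps_a) != len(eps_b):
        return 0.0  # e.g. a species versus a subspecies: not cognates
    scores = [SequenceMatcher(None, genus_a, genus_b).ratio()]
    for x, y in zip(eps_a, eps_b):
        scores.append(SequenceMatcher(None, stem(x), stem(y)).ratio())
    return min(scores)


def find_cognates(list_a, list_b, threshold=0.85):
    """Map each name in list_a to its best match in list_b, or None."""
    result = {}
    for a in list_a:
        best = max(list_b, key=lambda b: match_score(a, b), default=None)
        if best is not None and match_score(a, best) >= threshold:
            result[a] = best
        else:
            result[a] = None
    return result
```

Calling `find_cognates(names_from_flora, names_from_ipni)` on two lists of name strings returns a dictionary in which unmatched names map to `None`; those, together with near-threshold matches, are the natural candidates for expert review. Taking the minimum over the parts, rather than scoring the whole string, prevents a long shared genus from masking a different epithet.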
- Award ID(s):
- 1759964
- Publication Date:
- NSF-PAR ID:
- 10353593
- Journal Name:
- Biodiversity Information Science and Standards
- Volume:
- 5
- ISSN:
- 2535-0897
- Sponsoring Org:
- National Science Foundation
More Like this
-
Making the most of biodiversity data requires linking observations of biological species from multiple sources both efficiently and accurately (Bisby 2000, Franz et al. 2016). Aggregating occurrence records using taxonomic names and synonyms is computationally efficient but is known to lose accuracy when the assumption of one-to-one relationships between names and biological entities breaks down (Remsen 2016, Franz and Sterner 2018). Taxonomic treatments and checklists provide authoritative information about the correct usage of names for species, including operational representations of the meanings of those names in the form of range maps, reference genetic sequences, or diagnostic traits. They increasingly provide taxonomic intelligence in the form of precise descriptions of the semantic relationships between different published names in the literature. Making this authoritative information Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) would be a transformative advance for biodiversity data sharing and help drive adoption and novel extensions of existing standards such as the Taxonomic Concept Schema and the OpenBiodiv Ontology (Kennedy et al. 2006, Senderov et al. 2018). We call for the greater, global Biodiversity Information Standards (TDWG) and taxonomy community to commit to extending and expanding on how FAIR applies to biodiversity data and include …
-
We are now over four decades into digitally managing the names of Earth's species. As the number of federating efforts (i.e., software that brings together previously disparate projects under a common infrastructure, for example TaxonWorks) and aggregating efforts (e.g., International Plant Names Index, Catalogue of Life (CoL)) increases, there remains an unmet need both for the forward migration of old data and for the production of new, precise, and comprehensive nomenclatural catalogs. Given this context, we provide an overview of how TaxonWorks seeks to contribute to this effort, and where it might evolve in the future. In TaxonWorks, when we talk about governed names and relationships, we mean it in the sense of existing international codes of nomenclature (e.g., the International Code of Zoological Nomenclature (ICZN)). More technically, nomenclature is defined as a set of objective assertions that describe the relationships between the names given to biological taxa and the rules that determine how those names are governed. It is critical to note that this is not the same thing as the relationship between a name and a biological entity; rather, nomenclature in TaxonWorks represents the details of the (governed) relationships between names. Rather than thinking of nomenclature as changing …
-
It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). This is because these approaches do not scale to all biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both manual and machine learning approaches face great challenges: inter-curator variation (curators interpret and convert the same phenotype differently from each other) in manual curation, and keyword-to-ontology-concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation have been identified as semantic vagueness …
-
Taxonomic treatments start with the creation of taxon-by-character matrices. Systematics authors have recognized data ambiguity issues in published phenotypic characters and are willing to adopt an ontology-aware authoring tool (Cui et al. 2022). To promote interoperable and reusable taxonomic treatments, we have developed two research prototypes: a web-based application, Character Recorder (http://chrecorder.lusites.xyz/login), to facilitate the use and addition of ontology terms by Carex systematist authors while building their matrices, and a mobile application, Conflict Resolver (Android, https://tinyurl.com/5cfatrz8), to identify potential conflicts among the terms added by the authors and facilitate the resolution of the conflicts. We have completed two usability studies on Character Recorder. In the one-hour Student Usability Study, 16 third-year biology students with a general introduction to Carex used Character Recorder and Excel to record a set of 11 given characters for two samples (shape of sheath summits = U-shaped/U shaped). …