Title: TaxonWorks: a Use Case in Documenting of Etymology of Generic Names in Auchenorrhyncha (Hemiptera)
The 3i World Auchenorrhyncha database (http://dmitriev.speciesfile.org) is being migrated into TaxonWorks (http://taxonworks.org) and comprises nomenclatural data for all known Auchenorrhyncha taxa (leafhoppers, planthoppers, treehoppers, cicadas, spittle bugs). Of all those scientific names, 8,700 are unique genus-group names (which include valid genera and subgenera as well as their synonyms). According to the Rules of Zoological Nomenclature, a properly formed species-group name, when combined with a genus-group name, must agree with the latter in gender if the species-group name is or ends with a Latin or Latinized adjective or participle. This poses a double challenge for researchers describing new taxa or citing existing ones. For each species, the part of speech is essential information (nouns do not change their form when associated with different generic names). For the genus, the gender is essential information. Every time a species is transferred from one genus to another, its ending may need to be changed to form a proper new scientific name (a binominal name). In modern-day practice, it is important, when establishing a new name, to provide information about the etymology of the name and how it should be used in future publications: the grammatical gender for a genus, and the part of speech for a species. Older names often do not come with enough information about their etymology to allow scientific names to be constructed properly. That is why the literature contains numerous cases where a scientific name is not formed in conformity with the Rules of Nomenclature. An attempt was made to resolve the etymology of the generic names in Auchenorrhyncha to unify and clarify nomenclatural issues in this group of insects. In TaxonWorks, the rules of nomenclature are defined using the NOMEN ontology (https://github.com/SpeciesFileGroup/nomen).
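The agreement rule described in the abstract can be illustrated with a minimal sketch. It assumes only two common Latin adjective patterns (-us/-a/-um and -is/-is/-e) and leaves everything else unchanged; the function name `agree` and the ending table are hypothetical illustrations and are not taken from TaxonWorks or NOMEN. Real epithets have many more patterns and exceptions, which is why the recorded part of speech and gender matter.

```python
# Illustrative sketch only: covers two common adjective patterns and
# ignores the many other Latin declension patterns and exceptions.

# Each tuple lists the (masculine, feminine, neuter) nominative endings.
ADJECTIVE_ENDINGS = [
    ("us", "a", "um"),
    ("is", "is", "e"),
]

GENDER_INDEX = {"masculine": 0, "feminine": 1, "neuter": 2}


def agree(epithet, part_of_speech, genus_gender):
    """Return the epithet in the form that agrees with the genus gender.

    Nouns are returned unchanged, because they keep their form
    regardless of the genus they are combined with.
    """
    if part_of_speech != "adjective":
        return epithet
    target = GENDER_INDEX[genus_gender]
    for endings in ADJECTIVE_ENDINGS:
        for ending in endings:
            if epithet.endswith(ending):
                stem = epithet[: -len(ending)]
                return stem + endings[target]
    # Unknown pattern: leave the epithet as it is rather than guess.
    return epithet


# An adjective changes its ending when moved to a genus of another gender.
print(agree("albus", "adjective", "feminine"))
print(agree("viridis", "adjective", "neuter"))
# A noun keeps its form.
print(agree("rex", "noun", "feminine"))
```

The sketch shows why both pieces of information are needed: without the part of speech the function cannot tell whether to touch the ending at all, and without the genus gender it cannot pick the target ending.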
Award ID(s):
1639601
NSF-PAR ID:
10079956
Author(s) / Creator(s):
Date Published:
Journal Name:
Biodiversity Information Science and Standards
Volume:
2
ISSN:
2535-0897
Page Range / eLocation ID:
e25724
Sponsoring Org:
National Science Foundation
More Like this
  1. We are now over four decades into digitally managing the names of Earth's species. As the number of federating (i.e., software that brings together previously disparate projects under a common infrastructure, for example TaxonWorks) and aggregating (e.g., International Plant Names Index, Catalogue of Life (CoL)) efforts increases, there remains an unmet need for both the migration forward of old data, and for the production of new, precise and comprehensive nomenclatural catalogs. Given this context, we provide an overview of how TaxonWorks seeks to contribute to this effort, and where it might evolve in the future. In TaxonWorks, when we talk about governed names and relationships, we mean it in the sense of existing international codes of nomenclature (e.g., the International Code of Zoological Nomenclature (ICZN)). More technically, nomenclature is defined as a set of objective assertions that describe the relationships between the names given to biological taxa and the rules that determine how those names are governed. It is critical to note that this is not the same thing as the relationship between a name and a biological entity; rather, nomenclature in TaxonWorks represents the details of the (governed) relationships between names. Rather than thinking of nomenclature as changing (a verb commonly used to express frustration with biological nomenclature), it is useful to think of nomenclature as a set of data points, which grows over time. For example, when synonymy happens, we do not erase the past, but rather record a new context for the name(s) in question. The biological concept changes, but the nomenclature (names) simply keeps adding up. Behind the scenes, nomenclature in TaxonWorks is represented by a set of nodes and edges, i.e., a mathematical graph, or network (e.g., Fig. 1).
Most names (i.e., nodes in the network) are what TaxonWorks calls "protonyms," monomial epithets that are used to construct, for example, binomial names (not to be confused with "protonym" sensu the ICZN). Protonyms are linked to other protonyms via relationships defined in NOMEN, an ontology that encodes governed rules of nomenclature. Within the system, all data, nodes and edges, can be cited, i.e., linked to a source and therefore anchored in time and tied to authorship, and annotated with a variety of annotation types (e.g., notes, confidence levels, tags). The actual building of the graphs is greatly simplified by multiple user interfaces that allow scientists to review (e.g., Fig. 2), create, filter, and add to (again, not "change") the nomenclatural history. As in any complex knowledge-representation model, outlying scenarios, or edge cases, emerge that make certain human tasks more complex than others. TaxonWorks is no exception: it has limitations in terms of what can be represented and how. While many complex representations are hidden by simplified user interfaces, some (for example, the handling of the ICZN's family-group names, batch-loading of invalid relationships, and comparative syncing against external resources) need more work to simplify the processes presently required to meet catalogers' needs. The depth at which TaxonWorks can capture nomenclature is only really valuable if it can be used by others. This is facilitated by the application programming interface (API) serving its data (https://api.taxonworks.org), by serving text files, and by exports to standards like the emerging Catalogue of Life Data Package. With reference to real-world problems, we illustrate different ways in which the API can be used, for example, integrated into spreadsheets, called from command-line scripts, and used to generate public-facing websites.
Behind all this effort are an increasing number of people recording help videos, developing documentation, and troubleshooting software and technical issues. Major contributions have come from developers at many skill levels, from high school students to senior software engineers, illustrating that TaxonWorks leads in enabling both technical and domain-based contributions. The health and growth of this community is a key factor in TaxonWorks' potential long-term impact in the effort to unify the names of Earth's species.
  2. Large systematic revisionary projects incorporating data for hundreds or thousands of taxa require an integrative approach, with a strong biodiversity-informatics core for efficient data management to facilitate research on the group. Our original biodiversity informatics platform, 3i (Internet-accessible Interactive Identification), combined a customized MS Access database backend with ASP-based web interfaces to support revisionary syntheses of several large genera of leafhoppers (Hemiptera: Auchenorrhyncha: Cicadellidae). More recently, for our National Science Foundation-sponsored project, “GoLife: Collaborative Research: Integrative genealogy, ecology and phenomics of deltocephaline leafhoppers (Hemiptera: Cicadellidae), and their microbial associates”, we selected the new open-source platform TaxonWorks as the cyberinfrastructure. In the scope of the project, the original “3i World Auchenorrhyncha Database” was imported into TaxonWorks. At present, TaxonWorks has many tools to automatically import nomenclature, citations, and specimen-based collection data. At the time of the initial migration of the 3i database, many of those tools were still under development, and the complexity of the data in the database required a custom migration script, which is probably still the most efficient solution for importing datasets with a long development history. At the moment, the World Auchenorrhyncha Database comprehensively covers the nomenclature of the group and includes data on 70 valid families, 6,816 valid genera, and 47,064 valid species, as well as synonymy and subsequent combinations (Fig. 1). In addition, many taxon records include the original citation, bibliography, type information, etymology, etc. The bibliography of the group includes 37,579 sources, about one third of which are associated with PDF files.
Species have distribution records, either derived from individual specimens or asserted at the country and state level, as well as biological associations indicating host plants, predators, and parasitoids. Observation matrices in TaxonWorks are designed to handle morphological data associated with taxa or specimens. The matrices may be used to automatically generate interactive identification keys and taxon descriptions. They can also be downloaded to be imported, for example, into Lucid builder, or to perform phylogenetic analysis using an external application. At the moment there are 36 matrices associated with the project. The observation matrix from the GoLife project covers 798 taxa by 210 descriptors (most of which are qualitative multi-state morphological descriptors) (Fig. 2). Illustrations are provided for 9,886 taxa; they are organized in a specialized image matrix and can be used as a pictorial key for determination of species and taxa of a higher rank. For the phylogenetic analysis, a dataset was constructed for 730 terminal taxa and >160,000 nucleotide positions obtained using anchored hybrid enrichment of genomic DNA for a sample of leafhoppers from the subfamily Deltocephalinae and outgroups. The probe kit targets leafhopper genes, as well as some bacterial genes (endosymbionts and plant pathogens transmitted by leafhoppers). The maximum likelihood analyses of concatenated nucleotide and amino acid sequences as well as coalescent gene tree analysis yielded well-resolved phylogenetic trees (Cao et al. 2022). Raw sequence data have been uploaded to the Sequence Read Archive on GenBank. Occurrence and morphological data, as well as diagnostic images, for voucher specimens have been incorporated into TaxonWorks. Data in TaxonWorks can be exported in raw format, accessed via an application programming interface (API), or shared with external data aggregators like Catalogue of Life, GBIF, and iDigBio.
  3. The World Auchenorrhyncha Database comprises nomenclatural information for all known taxa in this suborder of hemipteran insects (leafhoppers, planthoppers, treehoppers, cicadas, and spittle bugs). Of more than 110,000 included scientific names, 8,921 represent unique genus-group names (valid genera and subgenera as well as their synonyms). An attempt is being made to resolve the etymology of those names to clarify nomenclatural issues in this group of insects.
  4. This catalogue includes all valid family-group (six subtribes), genus-group (55 genera, 33 subgenera), and species-group names (1009 species and subspecies) of Sepidiini darkling beetles (Coleoptera: Tenebrionidae: Pimeliinae), and their available synonyms. For each name, the author, year, and page number of the description are provided, with additional information (e.g., type species for genus-group names, author of synonymies for invalid taxa, notes) depending on the taxon rank. Verified distributional records (loci typici and data acquired from revisionary publications) for all the species are gathered. Distribution of the subtribes is illustrated and discussed. Several new nomenclatural acts are included. The generic names Phanerotomea Koch, 1958 [= Ocnodes Fåhraeus, 1870] and Parmularia Koch, 1955 [= Psammodes Kirby, 1819] are new synonyms (valid names in square brackets). The following new combinations are proposed: Ocnodes acuductus acuductus (Ancey, 1883), O. acuductus ufipanus (Koch, 1952), O. adamantinus (Koch, 1952), O. argenteofasciatus (Koch, 1953), O. arnoldi arnoldi (Koch, 1952), O. arnoldi sabianus (Koch, 1952), O. barbosai (Koch, 1952), O. basilewskyi (Koch, 1952), O. bellmarleyi (Koch, 1952), O. benguelensis (Koch, 1952), O. bertolonii (Guérin-Méneville, 1844), O. blandus (Koch, 1952), O. brevicornis (Haag-Rutenberg, 1875), O. brunnescens brunnescens (Haag-Rutenberg, 1871), O. brunnescens molestus (Haag-Rutenberg, 1875), O. buccinator (Koch, 1952), O. bushmanicus (Koch, 1952), O. carbonarius (Gerstaecker, 1854), O. cardiopterus (Fairmaire, 1888), O. cataractus (Koch, 1952), O. cinerarius (Koch, 1952), O. complanatus (Koch, 1952), O. confertus (Koch, 1952), O. congruens (Péringuey, 1899), O. cordiventris (Haag-Rutenberg, 1871), O. crocodilinus (Koch, 1952), O. dimorphus (Koch, 1952), O. distinctus (Haag-Rutenberg, 1871), O. dolosus (Péringuey, 1899), O. dorsocostatus (Gebien, 1910), O. dubiosus (Péringuey, 1899), O. ejectus (Koch, 1952), O. epronoticus (Koch, 1952), O. erichsoni (Haag-Rutenberg, 1871), O. ferreirae ferreirae (Koch, 1952), O. ferreirae zulu (Koch, 1952), O. fettingi (Haag-Rutenberg, 1875), O. fistucans (Koch, 1952), O. fraternus (Haag-Rutenberg, 1875), O. freyi (Koch, 1952), O. freudei (Koch, 1952), O. fulgidus (Koch, 1952), O. funestus (Haag-Rutenberg, 1871), O. gemmeulus (Koch, 1952), O. gibberosulus (Péringuey, 1908), O. gibbus (Haag-Rutenberg, 1879), O. globosus (Haag-Rutenberg, 1871), O. granisterna (Koch, 1952), O. granulosicollis (Haag-Rutenberg, 1871), O. gridellii (Koch, 1960), O. guerini guerini (Haag-Rutenberg, 1871), O. guerini lawrencii (Koch, 1954), O. guerini mancus (Koch, 1954), O. haemorrhoidalis haemorrhoidalis (Koch, 1952), O. haemorrhoidalis salubris (Koch, 1952), O. heydeni (Haag-Rutenberg, 1871), O. humeralis (Haag-Rutenberg, 1871), O. humerangula (Koch, 1952), O. imbricatus (Koch, 1952), O. imitator imitator (Péringuey, 1899), O. imitator invadens (Koch, 1952), O. inflatus (Koch, 1952), O. janssensi (Koch, 1952), O. javeti (Haag-Rutenberg, 1871), O. junodi (Péringuey, 1899), O. kulzeri (Koch, 1952), O. lacustris (Koch, 1952), O. laevigatus (Olivier, 1795), O. lanceolatus (Koch, 1953), O. licitus (Péringuey, 1899), O. luctuosus (Haag-Rutenberg, 1871), O. luxurosus (Koch, 1952), O. maputoensis (Koch, 1952), O. marginicollis (Koch, 1952), O. martinsi (Koch, 1952), O. melleus (Koch, 1952), O. mendicus estermanni (Koch, 1952), O. mendicus mendicus (Péringuey, 1899), O. miles (Péringuey, 1908), O. mimeticus (Koch, 1952), O. misolampoides (Fairmaire, 1888), O. mixtus (Haag-Rutenberg, 1871), O. monacha (Koch, 1952), O. montanus (Koch, 1952), O. mozambicus (Koch, 1952), O. muliebris curtus (Koch, 1952), O. muliebris muliebris (Koch, 1952), O. muliebris silvestris (Koch, 1952), O. nervosus (Haag-Rutenberg, 1871), O. notatum (Thunberg, 1787), O. notaticollis (Koch, 1952), O. odorans (Koch, 1952), O. opacus (Solier, 1843), O. osbecki (Billberg, 1815), O. overlaeti (Koch, 1952), O. ovulus (Haag-Rutenberg, 1871), O. pachysoma ornata (Koch, 1952), O. pachysoma pachysoma (Péringuey, 1892), O. papillosus (Koch, 1952), O. pedator (Fairmaire, 1888), O. perlucidus (Koch, 1952), O. planus (Koch, 1952), O. pretorianus (Koch, 1952), O. procursus (Péringuey, 1899), O. protectus (Koch, 1952), O. punctatissimus (Koch, 1952), O. puncticollis (Koch, 1952), O. punctipennis planisculptus (Koch, 1952), O. punctipennis punctipennis (Harold, 1878), O. punctipleura (Koch, 1952), O. rhodesianus (Koch, 1952), O. roriferus (Koch, 1952), O. rufipes (Harold, 1878), O. saltuarius (Koch, 1952), O. scabricollis (Gerstaecker, 1854), O. scopulipes (Koch, 1952), O. scrobicollis griqua (Koch, 1952), O. scrobicollis simulans (Koch, 1952), O. semirasus (Koch, 1952), O. semiscabrum (Haag-Rutenberg, 1871), O. sericicollis (Koch, 1952), O. similis (Péringuey, 1899), O. sjoestedti (Gebien, 1910), O. spatulipes (Koch, 1952), O. specularis (Péringuey, 1899), O. spinigerus (Koch, 1952), O. stevensoni (Koch, 1952), O. tarsocnoides (Koch, 1952), O. temulentus (Koch, 1952), O. tenebrosus melanarius (Haag-Rutenberg, 1871), O. tenebrosus tenebrosus (Erichson, 1843), O. tibialis (Haag-Rutenberg, 1871), O. torosus (Koch, 1952), O. transversicollis (Haag-Rutenberg, 1879), O. tumidus (Haag-Rutenberg, 1871), O. umvumanus (Koch, 1952), O. vagus (Péringuey, 1899), O. vaticinus (Péringuey, 1899), O. verecundus (Péringuey, 1899), O. vetustus (Koch, 1952), O. vexator (Péringuey, 1899), O. virago (Koch, 1952), O. warmeloi (Koch, 1953), O. zanzibaricus (Haag-Rutenberg, 1875), Psammophanes antinorii (Gridelli, 1939), and P. mirei (Pierre, 1979). The type species (placed in square brackets) of the following genus-group taxa are designated for the first time: Ocnodes Fåhraeus, 1870 [Ocnodes scrobicollis Fåhraeus, 1870], Psammodophysis Péringuey, 1899 [Psammodophysis probes Péringuey, 1899], and Trachynotidus Péringuey, 1899 [Psammodes thoreyi Haag-Rutenberg, 1871]. A lectotype is designated for Histrionotus omercooperi Koch, 1955 in order to fix its taxonomic status. Ulamus Kamiński is introduced here as a replacement name for Echinotus Marwick, 1935 [type species: Avicula echinata Smith, 1817] (Mollusca: Pteriidae) to avoid homonymy with Echinotus Solier, 1843 (Coleoptera: Tenebrionidae).
  5. TaxonWorks (http://taxonworks.org) is an integrated workbench for taxonomists and biodiversity scientists. It is designed to capture, organize, and enrich data, share and refine it with collaborators, and package it for analysis and publication. It is based on PostgreSQL (database) and on the Ruby programming language with the Ruby on Rails framework for developing web applications (https://github.com/SpeciesFileGroup/taxonworks). The TaxonWorks community is built around an open software ecosystem that facilitates participation at many levels. TaxonWorks is designed to serve both researchers who create and curate the data and technical users, such as programmers and informatics specialists, who act as data consumers. TaxonWorks provides researchers with robust, user-friendly interfaces based on well-thought-out, customized workflows for efficient and validated data entry. It provides technical users with database access through an application programming interface (API) that serves data in JSON format. The data model covers nearly all classes of data recorded in modern taxonomic treatments and primary studies of biodiversity, including nomenclature, bibliography, specimens and collecting events, phylogenetic matrices, and species descriptions. The nomenclatural classes are based on the NOMEN ontology (https://github.com/SpeciesFileGroup/nomen).
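Several of the abstracts above describe nomenclature as a graph of protonyms (nodes) and governed relationships (edges) that only grows over time. The idea can be sketched as an append-only structure. This is a minimal illustration under stated assumptions: the class names, field names, and the relationship label `synonym_of` are hypothetical and are not the TaxonWorks data model or NOMEN identifiers. The example names are the synonymy reported in item 4 of the list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Protonym:
    """A single monomial name (a node); the fields are illustrative."""
    name: str
    rank: str
    authorship: Optional[str] = None


@dataclass(frozen=True)
class Relationship:
    """A directed, citable link between two names (an edge)."""
    subject: Protonym
    kind: str  # hypothetical label, not a NOMEN identifier
    target: Protonym
    citation: Optional[str] = None


@dataclass
class NomenclatureGraph:
    """Append-only store: names and relationships are added, never erased."""
    nodes: List[Protonym] = field(default_factory=list)
    edges: List[Relationship] = field(default_factory=list)

    def add_name(self, protonym: Protonym) -> Protonym:
        self.nodes.append(protonym)
        return protonym

    def relate(self, subject, kind, target, citation=None) -> Relationship:
        edge = Relationship(subject, kind, target, citation)
        self.edges.append(edge)
        return edge

    def history(self, protonym: Protonym) -> List[Relationship]:
        """All recorded relationships in which the name takes part."""
        return [e for e in self.edges if protonym in (e.subject, e.target)]


graph = NomenclatureGraph()
ocnodes = graph.add_name(Protonym("Ocnodes", "genus", "Fåhraeus, 1870"))
phanerotomea = graph.add_name(Protonym("Phanerotomea", "genus", "Koch, 1958"))

# Recording the synonymy adds an edge; both names remain in the graph.
graph.relate(phanerotomea, "synonym_of", ocnodes)

for edge in graph.history(ocnodes):
    print(edge.subject.name, edge.kind, edge.target.name)
```

Keeping the structure append-only mirrors the point that synonymy records a new context for a name rather than erasing its past; a real system would add citations, annotations, and validation against the governing code.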