Title: Could a Focus on the “Why” of Taxonomy Help Taxonomy Better Respond to the Needs of Science and Society?
Genomics has put prokaryotic rank-based taxonomy on a solid phylogenetic foundation. However, most taxonomic ranks were set long before the advent of DNA sequencing and genomics. In this concept paper, we thus ask the following question: should prokaryotic classification schemes besides the current phylum-to-species ranks be explored, developed, and incorporated into scientific discourse? Could such alternative schemes provide better solutions to the basic need of science and society for which taxonomy was developed, namely, precise and meaningful identification? A neutral genome-similarity-based framework is then described that could allow alternative classification schemes to be explored, compared, and translated into each other without having to choose only one as the gold standard. Classification schemes could thus continue to evolve and be selected according to their benefits and based on how well they fulfill the need for prokaryotic identification.
Award ID(s):
2018522
NSF-PAR ID:
10347150
Journal Name:
Frontiers in Microbiology
Volume:
13
ISSN:
1664-302X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The Gila robusta species complex in the lower reaches of the Colorado River includes three nominal and contested species (G. robusta, G. intermedia, and G. nigra) originally defined by morphological and meristic characters. In subsequent investigations, none of these characters proved diagnostic, and species assignments were based on capture location. Two recent studies applied conservation genomics to assess species boundaries and reached contrasting conclusions: an ezRAD phylogenetic study resolved 5 lineages with poor alignment to species categories and proposed a single species with multiple population partitions. In contrast, a dd-RAD coalescent study concluded that the three nominal species are well-supported evolutionary lineages. Here we developed a draft genome (~1.229 Gbp) to apply genome-wide coverage (10,246 SNPs) with nearly range-wide sampling of specimens (G. robusta N = 266, G. intermedia N = 241, and G. nigra N = 117) to resolve this debate. All three nominal species were polyphyletic, whereas 5 of 8 watersheds were monophyletic. AMOVA partitioned 23.1% of genetic variance among nominal species, 30.9% among watersheds, and the Little Colorado River was highly distinct (FST ranged from 0.79 to 0.88 across analyses). Likewise, DAPC identified watersheds as more distinct than species, with the Little Colorado River having 297 fixed nucleotide differences compared to zero fixed differences among the three nominal species. In every analysis, geography explains more of the observed variance than putative taxonomy, and there are no diagnostic molecular or morphological characters to justify species designation. Our analysis reconciles previous work by showing that species identities based on type location are supported by significant divergence, but natural geographic partitions show consistently greater divergence. Thus, our data confirm Gila robusta as a single polytypic species with roughly a dozen highly isolated geographic populations, providing a strong scientific basis for watershed-based future conservation.
  2. Implicit Requirements (IMR) identification is part of the Requirements Engineering (RE) phase in Software Engineering during which data is gathered to create SRS (Software Requirements Specifications) documents. As opposed to explicit requirements, which are clearly stated, IMRs constitute subtle data and need to be inferred. Research has shown that IMRs are crucial to the success of software development. Many software systems can encounter failures due to lack of IMR data management. SRS documents are large, often hundreds of pages, so manual identification of IMRs by human software engineers is not feasible. Moreover, such data is ever-growing due to the expansion of software systems. It is thus important to address the crucial issue of IMR data management. This article presents a survey on IMRs in SRS documents with the definition and overview of IMR data, detailed taxonomy of IMRs with explanation and examples, practices in managing IMR data, and tools for IMR identification. In addition to reviewing classical and state-of-the-art approaches, we highlight trends and challenges and point out open issues for future research. This survey article is of interest with respect to data quality, hidden information retrieval, veracity and salience, and knowledge discovery from large textual documents with complex heterogeneous data.
  3. Fungal taxonomy and ecology have been revolutionized by the application of molecular methods and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable valid publication of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.
  4. Ho, Simon (Ed.)
    Abstract Whole-genome comparisons based on average nucleotide identities (ANI) and the genome-to-genome distance calculator have risen to prominence in rapidly classifying prokaryotic taxa using whole-genome sequences. Some implementations have even been proposed as a new standard in species classification and have become a common technique for papers describing newly sequenced genomes. However, attempts to apply whole-genome divergence data to the delineation of higher taxonomic units and to phylogenetic inference have had difficulty matching the results produced by more complex phylogenetic methods. We present a novel method for generating statistically supported phylogenies of archaeal and bacterial groups using a combined ANI and alignment fraction-based metric. For the test cases to which we applied the developed approach, we obtained results comparable with other methodologies up to at least the family level. The developed method uses nonparametric bootstrapping to gauge support for inferred groups. This method offers the opportunity to make use of whole-genome comparison data that are already being generated to quickly produce phylogenies including support for inferred groups. Additionally, the developed ANI methodology can assist the classification of higher taxonomic groups. [Average nucleotide identity (ANI); genome evolution; prokaryotic species delineation; taxonomy.]
  5. A forcing function is an intervention for constraining human behavior. However, the literature describing forcing functions provides little guidance for when and how to apply forcing functions or their associated trade-offs. In this paper, we address these shortcomings by introducing a novel taxonomy of forcing functions. This taxonomy extends the previous methods in four ways. First, it identifies two levels of forcing function solidity: hard forcing functions, which explicitly enforce constraints through the system, and soft forcing functions, which convey or communicate constraints. Second, each solidity level is decomposed into specific types. Third, the taxonomy hierarchically ranks forcing function solidities and types based on trade-offs of constraint and resilience. Fourth, for hard forcing functions, our taxonomy offers formal guidance for identifying the minimally constraining intervention that will prevent a specific error from occurring. We validated the ability of our method to identify effective error interventions by applying it to systems with known errors from the literature. We then compared the solutions offered by our method to known, effective interventions. We discuss our results and offer suggestions for further developments in future research. 