

Title: Widely used, short 16S rRNA mitochondrial gene fragments yield poor and erratic results in phylogenetic estimation and species delimitation of amphibians
Abstract
Background The 16S mitochondrial rRNA gene is the most widely sequenced molecular marker in amphibian systematic studies, making it comparable to the universal CO1 barcode that is more commonly used in other animal groups. However, studies employ different primer combinations that target different lengths/regions of the 16S gene ranging from complete gene sequences (~1500 bp) to short fragments (~500 bp), the latter of which is the most ubiquitously used. Sequences of different lengths are often concatenated, compared, and/or jointly analyzed to infer phylogenetic relationships, estimate genetic divergence (p-distances), and justify the recognition of new species (species delimitation), making the 16S gene region, by far, the most influential molecular marker in amphibian systematics. Despite their ubiquitous and multifarious use, no studies have ever been conducted to evaluate the congruence and performance among the different fragment lengths.
Results Using empirical data derived from both Sanger-based and genomic approaches, we show that full-length 16S sequences recover the most accurate phylogenetic relationships, highest branch support, lowest variation in genetic distances (pairwise p-distances), and best-scoring species delimitation partitions. In contrast, widely used short fragments produce inaccurate phylogenetic reconstructions, lower and more variable branch support, erratic genetic distances, and low-scoring species delimitation partitions, the numbers of which are vastly overestimated. The relatively poor performance of short 16S fragments is likely due to insufficient phylogenetic information content.
Conclusions Taken together, our results demonstrate that short 16S fragments are unable to match the efficacy achieved by full-length sequences in terms of topological accuracy, heuristic branch support, genetic divergences, and species delimitation partitions, and thus, phylogenetic and taxonomic inferences that are predicated on short 16S fragments should be interpreted with caution. However, short 16S fragments can still be useful for species identification, rapid assessments, or definitively coupling complex life stages in natural history studies and faunal inventories. While the full 16S sequence performs best, it requires the use of several primer pairs, which increases cost, time, and effort. As a compromise, our results demonstrate that practitioners should use medium-length primers instead of the short-fragment primers because they have the potential to markedly improve phylogenetic inference and species delimitation without additional cost.
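The uncorrected p-distance referred to in the abstract is the proportion of aligned sites at which two sequences differ. The following is a minimal Python sketch of that calculation, not the authors' pipeline. The function name `p_distance`, the choice to skip gap and ambiguous sites, and the toy sequences are all assumptions made for illustration.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT character are
    skipped (an assumption of this sketch; other conventions exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # gap or ambiguity code: not a comparable site
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


# Hypothetical toy alignment (20 sites, 2 differences).
full_a = "ACGTACGTACGTACGTACGT"
full_b = "ACGTACGTACGAACGTTCGT"

full = p_distance(full_a, full_b)                 # all 20 sites
window_1 = p_distance(full_a[:8], full_b[:8])     # first 8 sites only
window_2 = p_distance(full_a[8:16], full_b[8:16]) # sites 9 to 16 only
```

In this toy case the two short windows give different values from each other and from the full alignment, which illustrates why distances computed from fragments of different lengths or regions are not directly comparable.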
Award ID(s):
1654388
NSF-PAR ID:
10427870
Journal Name:
BMC Ecology and Evolution
Volume:
22
Issue:
1
ISSN:
2730-7182
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gilbert, Jack A. (Ed.)
    ABSTRACT Small subunit rRNA (SSU rRNA) amplicon sequencing can quantitatively and comprehensively profile natural microbiomes, representing a critically important tool for studying diverse global ecosystems. However, results will only be accurate if PCR primers perfectly match the rRNA of all organisms present. To evaluate how well marine microorganisms across all 3 domains are detected by this method, we compared commonly used primers with >300 million rRNA gene sequences retrieved from globally distributed marine metagenomes. The best-performing primers compared to 16S rRNA of bacteria and archaea were 515Y/926R and 515Y/806RB, which perfectly matched over 96% of all sequences. Considering cyanobacterial and chloroplast 16S rRNA, 515Y/926R had the highest coverage (99%), making this set ideal for quantifying marine primary producers. For eukaryotic 18S rRNA sequences, 515Y/926R also performed best (88%), followed by V4R/V4RB (18S rRNA specific; 82%), demonstrating that the 515Y/926R combination performs best overall for all 3 domains. Using Atlantic and Pacific Ocean samples, we demonstrate high correspondence between 515Y/926R amplicon abundances (generated for this study) and metagenomic 16S rRNA (median R² = 0.98, n = 272), indicating amplicons can produce equally accurate community composition data compared with shotgun metagenomics. Our analysis also revealed that expected performance of all primer sets could be improved with minor modifications, pointing toward a nearly completely universal primer set that could accurately quantify biogeochemically important taxa in ecosystems ranging from the deep sea to the surface. In addition, our reproducible bioinformatic workflow can guide microbiome researchers studying different ecosystems or human health to similarly improve existing primers and generate more accurate quantitative amplicon data.
IMPORTANCE PCR amplification and sequencing of marker genes is a low-cost technique for monitoring prokaryotic and eukaryotic microbial communities across space and time but will work optimally only if environmental organisms match PCR primer sequences exactly. In this study, we evaluated how well primers match globally distributed short-read oceanic metagenomes. Our results demonstrate that primer sets vary widely in performance, and that at least for marine systems, rRNA amplicon data from some primers lack significant biases compared to metagenomes. We also show that it is theoretically possible to create a nearly universal primer set for diverse saline environments by defining a specific mixture of a few dozen oligonucleotides, and present a software pipeline that can guide rational design of primers for any environment with available meta’omic data. 
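The idea of a primer "perfectly matching" a target can be sketched in Python as a base-by-base comparison in which degenerate IUPAC codes in the primer accept any of the bases they stand for. This is an illustration only, not the software pipeline the abstract mentions. The function names, the toy primer `ACGY`, and the assumption that candidate binding sites have already been extracted in the primer's orientation are all hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer, site):
    """True if every site base is allowed by the primer code at that position."""
    if len(primer) != len(site):
        return False
    return all(s in IUPAC[p] for p, s in zip(primer.upper(), site.upper()))


def perfect_match_fraction(primer, sites):
    """Fraction of candidate binding sites that the primer matches perfectly."""
    if not sites:
        raise ValueError("no sites given")
    return sum(primer_matches(primer, s) for s in sites) / len(sites)


# Hypothetical toy primer and binding sites (not real primer sequences).
toy_primer = "ACGY"  # Y accepts C or T
toy_sites = ["ACGC", "ACGT", "ACGA", "ACGG"]
coverage = perfect_match_fraction(toy_primer, toy_sites)
```

Here the toy primer matches the two sites ending in C or T and misses the two ending in A or G, so adding a degenerate position that covers those bases is the kind of minor modification that would raise coverage.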
  2. Abstract

    Using sequences from 2,615 ultraconserved element (UCE) loci and multiple methodologies we inferred phylogenies for the largest genetic data set of New World bats in the genus Myotis to date. The resulting phylogenetic trees were populated with short branch lengths and widespread conflict, hallmarks consistent with rapid adaptive radiations. The degree of conflict observed in Myotis has likely contributed to difficulties disentangling deeper evolutionary relationships. Unlike earlier phylogenies based on 1 to 2 gene sequences, this UCE data set places M. brandtii outside the New World clades. Introgression testing of a small subset of our samples revealed evidence of historical but not contemporary gene flow, suggesting that hybridization occurs less frequently in the Neotropics than the Nearctic. We identified several instances of cryptic lineages within described species as well as several instances of potential taxonomic oversplitting. Evidence from Central and South American localities suggests that diversity in those regions is not fully characterized. In light of the accumulated evidence of the evolutionary complexity in Myotis and our survey of the taxonomic implications from our phylogenies, it is apparent that the definition of species and regime of species delimitation need to be reevaluated for Myotis. This will require substantial collaboration and sample sharing between geneticists and taxonomists to build a system that is both robust and applicable in a genus as diverse as Myotis.

     
  3. Abstract Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups. 
  4. Abstract
    One of the most urgent contemporary tasks for taxonomists and evolutionary biologists is to estimate the number of species on earth. Recording alpha diversity is crucial for protecting biodiversity, especially in areas of elevated species richness, which coincide geographically with increased anthropogenic environmental pressures - the world’s so-called biodiversity hotspots. Although the distribution of Puddle frogs of the genus Occidozyga in South and Southeast Asia includes five biodiversity hotspots, the available data on phylogeny, species diversity, and biogeography are surprisingly patchy. Samples analyzed in this study were collected throughout Southeast Asia, with a primary focus on Sundaland and the Philippines. A mitochondrial gene region comprising ~2000 bp of 12S and 16S rRNA with intervening tRNA Valine and three nuclear loci (BDNF, NTF3, POMC) were analyzed to obtain a robust, time-calibrated phylogenetic hypothesis. We found a surprisingly high level of genetic diversity within Occidozyga, based on uncorrected p-distance values corroborated by species delimitation analyses. This extensive genetic diversity revealed 29 evolutionary lineages, defined by the >5% uncorrected p-distance criterion for the 16S rRNA gene, suggesting that species diversity in this clade of phenotypically homogeneous forms probably has been underestimated. The comparison with results of other anuran groups leads to the assumption that anuran species diversity could still be substantially underestimated in Southeast Asia in general. Many genetically divergent lineages of frogs are phenotypically similar, indicating a tendency towards extensive morphological conservatism. We present a biogeographic reconstruction of the colonization of Sundaland and nearby islands which, together with our temporal framework, suggests that lineage diversification centered on the landmasses of the northern Sunda Shelf.
This remarkably genetically structured group of amphibians could represent an exceptional case for future studies of geographical structure and diversification in a widespread anuran clade spanning some of the most pronounced geographical barriers on the planet (e.g., Wallace’s Line). Studies considering gene flow, morphology, ecological and bioacoustic data are needed to answer these questions and to test whether observed diversity of Puddle frog lineages warrants taxonomic recognition. 
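One way a fixed p-distance criterion could be turned into candidate lineages is single-linkage grouping: samples are joined whenever their pairwise distance is at or below the threshold, so the resulting groups are separated by more than the threshold. The Python sketch below illustrates that idea only. The abstract does not say how the >5% criterion was applied, so the clustering rule, the function name `threshold_clusters`, and the toy distances are all assumptions.

```python
def threshold_clusters(names, dist, threshold=0.05):
    """Single-linkage grouping of samples by pairwise distance.

    `dist` maps frozenset({a, b}) to the uncorrected p-distance between
    samples a and b. Pairs at or below `threshold` end up in one group.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[frozenset((a, b))] <= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy samples and distances (not data from the study).
toy_names = ["s1", "s2", "s3", "s4"]
toy_dist = {
    frozenset(("s1", "s2")): 0.01,
    frozenset(("s1", "s3")): 0.08,
    frozenset(("s1", "s4")): 0.09,
    frozenset(("s2", "s3")): 0.07,
    frozenset(("s2", "s4")): 0.10,
    frozenset(("s3", "s4")): 0.02,
}
lineages = threshold_clusters(toy_names, toy_dist)
```

With these toy values the four samples fall into two groups, s1 with s2 and s3 with s4. Because the grouping depends entirely on the distances, any error in the p-distance estimates carries straight through to the lineage count.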
  5. Abstract

    Contamination of a genetic sample with DNA from one or more nontarget species is a continuing concern of molecular phylogenetic studies, both Sanger sequencing studies and next-generation sequencing studies. We developed an automated pipeline for identifying and excluding likely cross-contaminated loci based on the detection of bimodal distributions of patristic distances across gene trees. When contamination occurs between samples within a data set, a comparison between a contaminated sample and its contaminant taxon will yield bimodal distributions with one peak close to zero patristic distance. This new method does not rely on a priori knowledge of taxon relatedness nor does it determine the cause(s) of the contamination. Exclusion of putatively contaminated loci from a data set generated for the insect family Cicadidae showed that these sequences were affecting some topological patterns and branch supports, although the effects were sometimes subtle, with some contamination-influenced relationships exhibiting strong bootstrap support. Long tip branches and outlier values for one anchored phylogenomic pipeline statistic (AvgNHomologs) were correlated with the presence of contamination. While the anchored hybrid enrichment markers used here, which target hemipteroid taxa, proved effective in resolving deep and shallow level Cicadidae relationships in aggregate, individual markers contained inadequate phylogenetic signal, in part probably due to short length. The cleaned data set, consisting of 429 loci, from 90 genera representing 44 of 56 current Cicadidae tribes, supported three of the four sampled Cicadidae subfamilies in concatenated-matrix maximum likelihood (ML) and multispecies coalescent-based species tree analyses, with the fourth subfamily weakly supported in the ML trees. No well-supported patterns from previous family-level Sanger sequencing studies of Cicadidae phylogeny were contradicted.
One taxon (Aragualna plenalinea) did not fall with its current subfamily in the genetic tree, and this genus and its tribe Aragualnini is reclassified to Tibicininae following morphological re-examination. Only subtle differences were observed in trees after the removal of loci for which divergent base frequencies were detected. Greater success may be achieved by increased taxon sampling and developing a probe set targeting a more recent common ancestor and longer loci. Searches for contamination are an essential step in phylogenomic analyses of all kinds and our pipeline is an effective solution. [Auchenorrhyncha; base-composition bias; Cicadidae; Cicadoidea; Hemiptera; phylogenetic conflict.]

     