- Award ID(s):
- 1654388
- NSF-PAR ID:
- 10427870
- Date Published:
- Journal Name:
- BMC Ecology and Evolution
- Volume:
- 22
- Issue:
- 1
- ISSN:
- 2730-7182
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Gilbert, Jack A. (Ed.) ABSTRACT Small subunit rRNA (SSU rRNA) amplicon sequencing can quantitatively and comprehensively profile natural microbiomes, representing a critically important tool for studying diverse global ecosystems. However, results will only be accurate if PCR primers perfectly match the rRNA of all organisms present. To evaluate how well marine microorganisms across all 3 domains are detected by this method, we compared commonly used primers with >300 million rRNA gene sequences retrieved from globally distributed marine metagenomes. The best-performing primers compared to 16S rRNA of bacteria and archaea were 515Y/926R and 515Y/806RB, which perfectly matched over 96% of all sequences. Considering cyanobacterial and chloroplast 16S rRNA, 515Y/926R had the highest coverage (99%), making this set ideal for quantifying marine primary producers. For eukaryotic 18S rRNA sequences, 515Y/926R also performed best (88%), followed by V4R/V4RB (18S rRNA specific; 82%), demonstrating that the 515Y/926R combination performs best overall for all 3 domains. Using Atlantic and Pacific Ocean samples, we demonstrate high correspondence between 515Y/926R amplicon abundances (generated for this study) and metagenomic 16S rRNA (median R² = 0.98, n = 272), indicating amplicons can produce equally accurate community composition data compared with shotgun metagenomics. Our analysis also revealed that expected performance of all primer sets could be improved with minor modifications, pointing toward a nearly completely universal primer set that could accurately quantify biogeochemically important taxa in ecosystems ranging from the deep sea to the surface. In addition, our reproducible bioinformatic workflow can guide microbiome researchers studying different ecosystems or human health to similarly improve existing primers and generate more accurate quantitative amplicon data.
IMPORTANCE PCR amplification and sequencing of marker genes is a low-cost technique for monitoring prokaryotic and eukaryotic microbial communities across space and time but will work optimally only if environmental organisms match PCR primer sequences exactly. In this study, we evaluated how well primers match globally distributed short-read oceanic metagenomes. Our results demonstrate that primer sets vary widely in performance, and that at least for marine systems, rRNA amplicon data from some primers lack significant biases compared to metagenomes. We also show that it is theoretically possible to create a nearly universal primer set for diverse saline environments by defining a specific mixture of a few dozen oligonucleotides, and present a software pipeline that can guide rational design of primers for any environment with available meta’omic data.
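The idea of a primer "perfectly matching" a set of rRNA sequences can be illustrated with a minimal Python sketch. This is not the workflow described in the abstract; it only assumes that a degenerate primer is written with standard IUPAC codes and that coverage is the fraction of target sequences containing at least one exact site. The primer and target strings below are toy values invented for illustration, and the function names are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each code permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_at(primer, target, start):
    """Return True if the degenerate primer matches target exactly at offset start."""
    window = target[start:start + len(primer)]
    if len(window) < len(primer):
        return False
    return all(base in IUPAC[code] for code, base in zip(primer, window))


def perfect_match_fraction(primer, targets):
    """Fraction of target sequences that contain at least one perfect primer site."""
    if not targets:
        return 0.0
    hits = sum(
        any(matches_at(primer, t, i) for i in range(len(t) - len(primer) + 1))
        for t in targets
    )
    return hits / len(targets)


# Toy data (assumption): a short degenerate primer and three invented targets.
toy_primer = "GTGYCAGC"
toy_targets = ["AAGTGCCAGCTT", "AAGTGTCAGCTT", "AAGTGACAGCTT"]
coverage = perfect_match_fraction(toy_primer, toy_targets)
print(f"perfect-match coverage: {coverage:.2f}")
```

In this toy case the degenerate position Y accepts C or T, so the first two targets match and the third (with A at that position) does not. A real evaluation would also need to handle reverse primers (reverse complements) and ambiguous bases in the targets, which this sketch leaves out.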
-
Abstract Using sequences from 2,615 ultraconserved element (UCE) loci and multiple methodologies we inferred phylogenies for the largest genetic data set of New World bats in the genus Myotis to date. The resulting phylogenetic trees were populated with short branch lengths and widespread conflict, hallmarks consistent with rapid adaptive radiations. The degree of conflict observed in Myotis has likely contributed to difficulties disentangling deeper evolutionary relationships. Unlike earlier phylogenies based on 1 to 2 gene sequences, this UCE data set places M. brandtii outside the New World clades. Introgression testing of a small subset of our samples revealed evidence of historical but not contemporary gene flow, suggesting that hybridization occurs less frequently in the Neotropics than the Nearctic. We identified several instances of cryptic lineages within described species as well as several instances of potential taxonomic oversplitting. Evidence from Central and South American localities suggests that diversity in those regions is not fully characterized. In light of the accumulated evidence of the evolutionary complexity in Myotis and our survey of the taxonomic implications from our phylogenies, it is apparent that the definition of species and regime of species delimitation need to be reevaluated for Myotis. This will require substantial collaboration and sample sharing between geneticists and taxonomists to build a system that is both robust and applicable in a genus as diverse as Myotis.
-
Abstract Marker selection has emerged as an important component of phylogenomic study design due to rising concerns about the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC with comparable methods, discuss important aspects of those methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
-
One of the most urgent contemporary tasks for taxonomists and evolutionary biologists is to estimate the number of species on earth. Recording alpha diversity is crucial for protecting biodiversity, especially in areas of elevated species richness, which coincide geographically with increased anthropogenic environmental pressures: the world’s so-called biodiversity hotspots. Although the distribution of Puddle frogs of the genus Occidozyga in South and Southeast Asia includes five biodiversity hotspots, the available data on phylogeny, species diversity, and biogeography are surprisingly patchy. Samples analyzed in this study were collected throughout Southeast Asia, with a primary focus on Sundaland and the Philippines. A mitochondrial gene region comprising ~2000 bp of 12S and 16S rRNA with intervening tRNA Valine and three nuclear loci (BDNF, NTF3, POMC) were analyzed to obtain a robust, time-calibrated phylogenetic hypothesis. We found a surprisingly high level of genetic diversity within Occidozyga, based on uncorrected p-distance values corroborated by species delimitation analyses. This extensive genetic diversity revealed 29 evolutionary lineages, defined by the >5% uncorrected p-distance criterion for the 16S rRNA gene, suggesting that species diversity in this clade of phenotypically homogeneous forms probably has been underestimated. The comparison with results of other anuran groups leads to the assumption that anuran species diversity could still be substantially underestimated in Southeast Asia in general. Many genetically divergent lineages of frogs are phenotypically similar, indicating a tendency towards extensive morphological conservatism. We present a biogeographic reconstruction of the colonization of Sundaland and nearby islands which, together with our temporal framework, suggests that lineage diversification centered on the landmasses of the northern Sunda Shelf.
This remarkably genetically structured group of amphibians could represent an exceptional case for future studies of geographical structure and diversification in a widespread anuran clade spanning some of the most pronounced geographical barriers on the planet (e.g., Wallace’s Line). Studies considering gene flow, morphology, ecological and bioacoustic data are needed to answer these questions and to test whether observed diversity of Puddle frog lineages warrants taxonomic recognition.
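The uncorrected p-distance criterion mentioned in this abstract can be written out as a short Python sketch. It assumes two pre-aligned sequences of equal length and skips any column containing a gap or ambiguous base; the abstract does not say how such columns were treated, so that choice, the function name, and the toy sequences are all assumptions.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of compared aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: columns with gaps or ambiguous bases are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned fragments (invented), 20 sites each.
ref = "ACGTACGTACGTACGTACGT"
near = "ACGTACGTACGTACGTACGA"  # 1 difference in 20 sites
far = "ACGAACGTACGTACTTACGT"   # 2 differences in 20 sites

THRESHOLD = 0.05  # the >5% criterion quoted in the abstract
for name, seq in [("near", near), ("far", far)]:
    d = p_distance(ref, seq)
    print(name, round(d, 3), "exceeds threshold" if d > THRESHOLD else "within threshold")
```

Because the criterion is a strict inequality, a pair at exactly 5% divergence would not count as a separate lineage under this reading.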
-
Abstract Contamination of a genetic sample with DNA from one or more nontarget species is a continuing concern of molecular phylogenetic studies, both Sanger sequencing studies and next-generation sequencing studies. We developed an automated pipeline for identifying and excluding likely cross-contaminated loci based on the detection of bimodal distributions of patristic distances across gene trees. When contamination occurs between samples within a data set, a comparison between a contaminated sample and its contaminant taxon will yield bimodal distributions with one peak close to zero patristic distance. This new method does not rely on a priori knowledge of taxon relatedness nor does it determine the cause(s) of the contamination. Exclusion of putatively contaminated loci from a data set generated for the insect family Cicadidae showed that these sequences were affecting some topological patterns and branch supports, although the effects were sometimes subtle, with some contamination-influenced relationships exhibiting strong bootstrap support. Long tip branches and outlier values for one anchored phylogenomic pipeline statistic (AvgNHomologs) were correlated with the presence of contamination. While the anchored hybrid enrichment markers used here, which target hemipteroid taxa, proved effective in resolving deep and shallow level Cicadidae relationships in aggregate, individual markers contained inadequate phylogenetic signal, in part probably due to short length. The cleaned data set, consisting of 429 loci, from 90 genera representing 44 of 56 current Cicadidae tribes, supported three of the four sampled Cicadidae subfamilies in concatenated-matrix maximum likelihood (ML) and multispecies coalescent-based species tree analyses, with the fourth subfamily weakly supported in the ML trees. No well-supported patterns from previous family-level Sanger sequencing studies of Cicadidae phylogeny were contradicted.
One taxon (Aragualna plenalinea) did not fall with its current subfamily in the genetic tree, and this genus and its tribe Aragualnini are reclassified to Tibicininae following morphological re-examination. Only subtle differences were observed in trees after the removal of loci for which divergent base frequencies were detected. Greater success may be achieved by increased taxon sampling and developing a probe set targeting a more recent common ancestor and longer loci. Searches for contamination are an essential step in phylogenomic analyses of all kinds and our pipeline is an effective solution. [Auchenorrhyncha; base-composition bias; Cicadidae; Cicadoidea; Hemiptera; phylogenetic conflict.]
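The bimodal signal described in this abstract (most gene trees place two samples far apart, while a few place them at nearly zero patristic distance) can be illustrated with a deliberately simple Python heuristic. This is not the authors' pipeline: the function name, the `near_zero` and `separation` cutoffs, and the toy distances are all assumptions chosen for illustration.

```python
from statistics import median


def flag_suspect_loci(distances, near_zero=0.01, separation=5.0):
    """Flag loci whose patristic distance for one sample pair sits in a near-zero
    cluster while the remaining loci are clearly separated from zero.

    distances: dict mapping locus name -> patristic distance for the sample pair.
    near_zero: hypothetical cutoff for the peak close to zero.
    separation: the median of the remaining loci must be at least this many
        times near_zero before the near-zero loci are treated as a second mode.
    """
    low = sorted(locus for locus, d in distances.items() if d <= near_zero)
    rest = [d for d in distances.values() if d > near_zero]
    if not low or not rest:
        return []  # only one cluster: no bimodality to report
    if median(rest) < separation * near_zero:
        return []  # the two groups are not clearly separated
    return low


# Toy distances (invented) for one pair of samples across six gene trees.
pair_distances = {"L1": 0.21, "L2": 0.19, "L3": 0.002, "L4": 0.23, "L5": 0.0, "L6": 0.20}
suspects = flag_suspect_loci(pair_distances)
print(suspects)
```

A pair of genuinely close relatives would have near-zero distances at every locus, which is why the sketch reports nothing when only one cluster is present. A production method would more plausibly fit or test for bimodality statistically rather than rely on fixed cutoffs.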