Title: Lessons from assembling UCEs: A comparison of common methods and the case of Clavinomia (Halictidae)
Abstract Sequence data assembly is a foundational step in high‐throughput sequencing, with untold consequences for downstream analyses. Despite this, few studies have interrogated the many methods for assembling phylogenomic UCE data for their comparative efficacy, or for how outputs may be impacted. We study this by comparing the most commonly used assembly methods for UCEs in the under‐studied bee lineage Nomiinae and a representative sampling of relatives. Data for 63 UCE‐only and 75 mixed taxa were assembled with five methods, including ABySS, HybPiper, SPAdes, Trinity and Velvet, and then benchmarked for their relative performance in terms of locus capture parameters and phylogenetic reconstruction. Unexpectedly, Trinity and Velvet trailed the other methods in terms of locus capture and DNA matrix density, whereas SPAdes performed favourably in most assessed metrics. In comparison with SPAdes, the guided‐assembly approach HybPiper generally recovered the highest quality loci but in lower numbers. Based on our results, we formally move Clavinomia to Dieunomiini and render Epinomia once more a subgenus of Dieunomia. We strongly advise that future studies more closely examine the influence of assembly approach on their results, or, minimally, use better‐performing assembly methods such as SPAdes or HybPiper. In this way, we can move forward with phylogenomic studies in a more standardized, comparable manner.
Award ID(s):
2127744
PAR ID:
10484908
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Molecular Ecology Resources
Volume:
24
Issue:
3
ISSN:
1755-098X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Marvaldi, Adriana (Ed.)
    Abstract Tailoring ultraconserved element (UCE) probe set design to focal taxa has been demonstrated to improve locus recovery and phylogenomic inference. However, beyond conducting expensive in vitro testing, it remains unclear how best to determine whether an existing UCE probe set is likely to suffice for phylogenomic inference or whether tailored probe design will be desirable. Here we investigate the utility of 8 different UCE probe sets for the in silico phylogenomic inference of scarabaeoid beetles. The probe sets tested differed in terms of (i) the phylogenetic distance from Scarabaeoidea of the taxa used during probe design, (ii) the breadth of phylogenetic inference the probe set was designed for, and (iii) the method of probe design. As part of this study, 2 new UCE probe sets are produced for the beetle family Scarabaeidae and superfamily Hydrophiloidea. We confirm that probe set utility decreases with increasing phylogenetic distance from target taxa. In addition, narrowing the phylogenetic breadth of probe design decreases the phylogenetic capture range. We also confirm previous findings regarding ways to optimize UCE probe design. Finally, we make suggestions regarding assessment of need for de novo probe design.
  2. Ruane, Sara (Ed.)
    Abstract Genome-scale data have the potential to clarify phylogenetic relationships across the tree of life but have also revealed extensive gene tree conflict. This seeming paradox, whereby larger data sets both increase statistical confidence and uncover significant discordance, suggests that understanding sources of conflict is important for accurate reconstruction of evolutionary history. We explore this paradox in squamate reptiles, the vertebrate clade comprising lizards, snakes, and amphisbaenians. We collected an average of 5103 loci for 91 species of squamates that span higher-level diversity within the clade, which we augmented with publicly available sequences for an additional 17 taxa. Using a locus-by-locus approach, we evaluated support for alternative topologies at 17 contentious nodes in the phylogeny. We identified shared properties of conflicting loci, finding that rate and compositional heterogeneity drive discordance between gene trees and the species tree and that conflicting loci rarely overlap across contentious nodes. Finally, by comparing our tests of nodal conflict to previous phylogenomic studies, we confidently resolve 9 of the 17 problematic nodes. We suggest this locus-by-locus and node-by-node approach can build consensus on which topological resolutions remain uncertain in phylogenomic studies of other contentious groups. [Anchored hybrid enrichment (AHE); gene tree conflict; molecular evolution; phylogenomic concordance; target capture; ultraconserved elements (UCE).]
  3. Abstract Several automated molecular methods have emerged for distinguishing eukaryote species based on DNA sequence data. However, there are knowledge gaps around which of these single‐locus methods is more accurate for the identification of microalgal species, such as the highly diverse and ecologically relevant diatoms. We applied genetic divergence, Automatic Barcode Gap Discovery for primary species delimitation (ABGD), Assemble Species by Automatic Partitioning (ASAP), Statistical Parsimony Network Analysis (SPNA), Generalized Mixed Yule Coalescent (GMYC) and Poisson Tree Processes (PTP) using partial cox1, rbcL, 5.8S + ITS2, and ITS1 + 5.8S + ITS2 markers to delineate species and compare to published polyphasic identification data (morphological features, phylogeny and sexual reproductive isolation) to test the resolution of these methods. ASAP, ABGD, SPNA and PTP models resolved species of Eunotia, Seminavis, Nitzschia, Sellaphora and Pseudo‐nitzschia corresponding to previous polyphasic identification, including reproductive isolation studies. In most cases, these models identified diatom species in similar ways, regardless of sequence fragment length. The GMYC model produced the smallest number of results that agreed with previously published identifications. Following the recommendations for proper use of each model presented in the present study, these models can be useful tools to identify cryptic or closely related species of diatoms, even when the datasets have relatively few sequences.
  4. Abstract Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups. 
  5. Blaimer, Bonnie (Ed.)
    Abstract A rapid proliferation in the availability of whole genome sequences (WGS), often with relatively low read depth, offers an unprecedented opportunity for phylogenomic advances using publicly available data, but there are several key challenges in applying these data. Using low‐coverage WGS data for the ant species of Formica, we conducted detailed comparisons on two different analytical pipelines (reference‐based vs. de novo genome assembly), four types of datasets (5‐kbp‐window, ultra‐conserved element [UCE], single‐copy ortholog [BUSCO] and mitogenome), and a series of analytical procedures (e.g. concatenation vs. coalescent analyses) to identify which are robust to typical WGS data. The results show that at a shallow scale of phylogenetic relationships of closely related species 5‐kbp‐windows from the reference‐based pipeline and UCEs from the de novo assemblies are more successful than the BUSCOs in recovering informative markers for phylogenetic inference. Compared with concatenation analyses, coalescent analyses often resulted in disparate deeper relationships in the phylogeny. This study also uncovers evident mito‐nuclear discordance and demonstrates genome‐wide gene conflicts in phylogenetic signals, both pointing to possible incomplete lineage sorting and/or hybridization during the early, rapid radiation of Formica ants. Divergence dating analyses show that different types of data and analytical methods could result in inconsistent time estimates, highlighting the potential need for multiple approaches to better understand species divergence. The strengths and weaknesses of different analytical pipelines and strategies are discussed. Findings from this study provide valuable insights for large‐scale phylogenomic projects using WGS data.