Title: Lessons from assembling UCEs: A comparison of common methods and the case of Clavinomia (Halictidae)
Abstract Sequence data assembly is a foundational step in high‐throughput sequencing, with untold consequences for downstream analyses. Despite this, few studies have interrogated the many methods for assembling phylogenomic UCE data for their comparative efficacy, or for how outputs may be impacted. We study this by comparing the most commonly used assembly methods for UCEs in the under‐studied bee lineage Nomiinae and a representative sampling of relatives. Data for 63 UCE‐only and 75 mixed taxa were assembled with five methods, including ABySS, HybPiper, SPAdes, Trinity and Velvet, and then benchmarked for their relative performance in terms of locus capture parameters and phylogenetic reconstruction. Unexpectedly, Trinity and Velvet trailed the other methods in terms of locus capture and DNA matrix density, whereas SPAdes performed favourably in most assessed metrics. In comparison with SPAdes, the guided‐assembly approach HybPiper generally recovered the highest quality loci but in lower numbers. Based on our results, we formally move Clavinomia to Dieunomiini and render Epinomia once more a subgenus of Dieunomia. We strongly advise that future studies more closely examine the influence of assembly approach on their results, or, minimally, use better‐performing assembly methods such as SPAdes or HybPiper. In this way, we can move forward with phylogenomic studies in a more standardized, comparable manner.
Award ID(s):
2127744
PAR ID:
10484908
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Molecular Ecology Resources
Volume:
24
Issue:
3
ISSN:
1755-098X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Marvaldi, Adriana (Ed.)
    Abstract Tailoring ultraconserved element (UCE) probe set design to focal taxa has been demonstrated to improve locus recovery and phylogenomic inference. However, beyond conducting expensive in vitro testing, it remains unclear how best to determine whether an existing UCE probe set is likely to suffice for phylogenomic inference or whether tailored probe design will be desirable. Here we investigate the utility of 8 different UCE probe sets for the in silico phylogenomic inference of scarabaeoid beetles. Probe sets tested differed in terms of (i) how phylogenetically distant from Scarabaeoidea taxa those used during probe design are, (ii) breadth of phylogenetic inference probe set was designed for, and (iii) method of probe design. As part of this study, 2 new UCE probe sets are produced for the beetle family Scarabaeidae and superfamily Hydrophiloidea. We confirm that probe set utility decreases with increasing phylogenetic distance from target taxa. In addition, narrowing the phylogenetic breadth of probe design decreases the phylogenetic capture range. We also confirm previous findings regarding ways to optimize UCE probe design. Finally, we make suggestions regarding assessment of need for de novo probe design. 
  2. Ruane, Sara (Ed.)
    Abstract Genome-scale data have the potential to clarify phylogenetic relationships across the tree of life but have also revealed extensive gene tree conflict. This seeming paradox, whereby larger data sets both increase statistical confidence and uncover significant discordance, suggests that understanding sources of conflict is important for accurate reconstruction of evolutionary history. We explore this paradox in squamate reptiles, the vertebrate clade comprising lizards, snakes, and amphisbaenians. We collected an average of 5103 loci for 91 species of squamates that span higher-level diversity within the clade, which we augmented with publicly available sequences for an additional 17 taxa. Using a locus-by-locus approach, we evaluated support for alternative topologies at 17 contentious nodes in the phylogeny. We identified shared properties of conflicting loci, finding that rate and compositional heterogeneity drives discordance between gene trees and species tree and that conflicting loci rarely overlap across contentious nodes. Finally, by comparing our tests of nodal conflict to previous phylogenomic studies, we confidently resolve 9 of the 17 problematic nodes. We suggest this locus-by-locus and node-by-node approach can build consensus on which topological resolutions remain uncertain in phylogenomic studies of other contentious groups. [Anchored hybrid enrichment (AHE); gene tree conflict; molecular evolution; phylogenomic concordance; target capture; ultraconserved elements (UCE).] 
  3. Abstract Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups. 
  4. Abstract Several automated molecular methods have emerged for distinguishing eukaryote species based on DNA sequence data. However, there are knowledge gaps around which of these single‐locus methods is more accurate for the identification of microalgal species, such as the highly diverse and ecologically relevant diatoms. We applied genetic divergence, Automatic Barcode Gap Discovery for primary species delimitation (ABGD), Assemble Species by Automatic Partitioning (ASAP), Statistical Parsimony Network Analysis (SPNA), Generalized Mixed Yule Coalescent (GMYC) and Poisson Tree Processes (PTP) using partial cox1, rbcL, 5.8S + ITS2, ITS1 + 5.8S + ITS2 markers to delineate species and compare to published polyphasic identification data (morphological features, phylogeny and sexual reproductive isolation) to test the resolution of these methods. ASAP, ABGD, SPNA and PTP models resolved species of Eunotia, Seminavis, Nitzschia, Sellaphora and Pseudo‐nitzschia corresponding to previous polyphasic identification, including reproductive isolation studies. In most cases, these models identified diatom species in similar ways, regardless of sequence fragment length. GMYC model presented smallest number of results that agreed with previous published identification. Following the recommendations for proper use of each model presented in the present study, these models can be useful tools to identify cryptic or closely related species of diatoms, even when the datasets have relatively few sequences. 
  5. Wiegmann, Brian (Ed.)
    Abstract Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies including: birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets apparently driven by the increase in loci length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated by unmerged UCEs. As loci length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses. [Anchored hybrid enrichment; ants; ASTRAL; bait capture; carangimorph; Coleoptera; conserved nonexonic elements; exon capture; gene tree; Hymenoptera; mammal; phylogenomic markers; songbird; species tree; ultraconserved elements; weevils.] 