skip to main content


Title: Phylogenetic Signal of Indels and the Neoavian Radiation
The early radiation of Neoaves has been hypothesized to be an intractable “hard polytomy”. We explore the fundamental properties of insertion/deletion alleles (indels), an under-utilized form of genomic data with the potential to help solve this. We scored >5 million indels from >7000 pan-genomic intronic and ultraconserved element (UCE) loci in 48 representatives of all neoavian orders. We found that intronic and UCE indels exhibited less homoplasy than nucleotide (nt) data. Gene trees estimated using indel data were less resolved than those estimated using nt data. Nevertheless, Accurate Species TRee Algorithm (ASTRAL) species trees estimated using indels were generally similar to nt-based ASTRAL trees, albeit with lower support. However, the power of indel gene trees became clear when we combined them with nt gene trees, including a striking result for UCEs. The individual UCE indel and nt ASTRAL trees were incongruent with each other and with the intron ASTRAL trees; however, the combined indel+nt ASTRAL tree was much more congruent with the intronic trees. Finally, combining indel and nt data for both introns and UCEs provided sufficient power to reduce the scope of the polytomy that was previously proposed for several supraordinal lineages of Neoaves.  more » « less
Award ID(s):
1655683
PAR ID:
10112970
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Diversity
Volume:
11
Issue:
7
ISSN:
1424-2818
Page Range / eLocation ID:
108
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Wiegmann, Brian (Ed.)
    Abstract Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes does not require prior knowledge of genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here, we characterized UCEs from 11 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated four different sets of UCE markers by genomic category from five different studies including: birds, mammals, fish, Hymenoptera (ants, wasps, and bees), and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by two or more UCEs, corresponding to nonoverlapping segments of a single gene. We considered these UCEs to be nonindependent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging cogenic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all data sets apparently driven by the increase in loci length. Additionally, we conducted simulations and found that gene trees generated from merged UCEs were more accurate than those generated by unmerged UCEs. As loci length improves gene tree accuracy, this modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses. [Anchored hybrid enrichment; ants; ASTRAL; bait capture; carangimorph; Coleoptera; conserved nonexonic elements; exon capture; gene tree; Hymenoptera; mammal; phylogenomic markers; songbird; species tree; ultraconserved elements; weevils.] 
    more » « less
  2. Assessing the applicability of theory to major adaptive radiations in deep time represents an extremely difficult problem in evolutionary biology. Neoaves, which includes 95% of living birds, is believed to have undergone a period of rapid diversification roughly coincident with the Cretaceous–Paleogene (K-Pg) boundary. We investigate whether basal neoavian lineages experienced an ecological release in response to ecological opportunity, as evidenced by density compensation. We estimated effective population sizes (Ne) of basal neoavian lineages by combining coalescent branch lengths (CBLs) and the numbers of generations between successive divergences. We used a modified version of Accurate Species TRee Algorithm (ASTRAL) to estimate CBLs directly from insertion–deletion (indel) data, as well as from gene trees using DNA sequence and/or indel data. We found that some divergences near the K-Pg boundary involved unexpectedly high gene tree discordance relative to the estimated number of generations between speciation events. The simplest explanation for this result is an increase in Ne, despite the caveats discussed herein. It appears that at least some early neoavian lineages, similar to the ancestor of the clade comprising doves, mesites, and sandgrouse, experienced ecological release near the time of the K-Pg mass extinction. 
    more » « less
  3. Abstract Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups. 
    more » « less
  4. Thorne, Jeffrey (Ed.)
    Abstract Phylogenetic inference from genome-wide data (phylogenomics) has revolutionized the study of evolution because it enables accounting for discordance among evolutionary histories across the genome. To this end, summary methods have been developed to allow accurate and scalable inference of species trees from gene trees. However, most of these methods, including the widely used ASTRAL, can only handle single-copy gene trees and do not attempt to model gene duplication and gene loss. As a result, most phylogenomic studies have focused on single-copy genes and have discarded large parts of the data. Here, we first propose a measure of quartet similarity between single-copy and multicopy trees that accounts for orthology and paralogy. We then introduce a method called ASTRAL-Pro (ASTRAL for PaRalogs and Orthologs) to find the species tree that optimizes our quartet similarity measure using dynamic programing. By studying its performance on an extensive collection of simulated data sets and on real data sets, we show that ASTRAL-Pro is more accurate than alternative methods. 
    more » « less
  5. Takahashi, Aya (Ed.)
    Abstract Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines. 
    more » « less