Title: Limitations of Phylogenomic Data Can Drive Inferred Speciation Rate Shifts
Abstract: Biodiversity analyses of phylogenomic timetrees have produced many high-profile examples of shifts in the rate of speciation across the tree of life. Temporally correlated events in ecology, climate, and biogeography are frequently invoked to explain these rate shifts. In a re-examination of 15 genomic timetrees and 25 major published studies of the pattern of speciation through time, we observed an unexpected correlation between the timing of reported rate shifts and the information content of sequence alignments. Here, we show that the paucity of sequence variation and insufficient species sampling in phylogenomic data sets are the likely drivers of many inferred speciation rate shifts, rather than the proposed biological explanations. Therefore, data limitations can produce predictable but spurious signals of rate shifts even when speciation rates may be similar across taxa and time. Our results suggest that the reliable detection of speciation rate shifts requires the acquisition and assembly of long phylogenomic alignments with near-complete species sampling and accurate estimates of species richness for the clades of study.
Award ID(s):
1932765
NSF-PAR ID:
10354842
Author(s) / Creator(s):
; ;
Editor(s):
Tamura, Koichiro
Date Published:
Journal Name:
Molecular Biology and Evolution
Volume:
39
Issue:
3
ISSN:
0737-4038
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. INTRODUCTION Resolving the role that different environmental forces may have played in the apparent explosive diversification of modern placental mammals is crucial to understanding the evolutionary context of their living and extinct morphological and genomic diversity.
    RATIONALE Limited access to whole-genome sequence alignments that sample living mammalian biodiversity has hampered phylogenomic inference, which until now has been limited to relatively small, highly constrained sequence matrices often representing <2% of a typical mammalian genome. To eliminate this sampling bias, we used an alignment of 241 whole genomes to comprehensively identify and rigorously analyze noncoding, neutrally evolving sequence variation in coalescent and concatenation-based phylogenetic frameworks. These analyses were followed by validation with multiple classes of phylogenetically informative structural variation. This approach enabled the generation of a robust time tree for placental mammals that evaluated age variation across hundreds of genomic loci that are not restricted by protein coding annotations.
    RESULTS Coalescent and concatenation phylogenies inferred from multiple treatments of the data were highly congruent, including support for higher-level taxonomic groupings that unite primates+colugos with treeshrews (Euarchonta), bats+cetartiodactyls+perissodactyls+carnivorans+pangolins (Scrotifera), all scrotiferans excluding bats (Fereuungulata), and carnivorans+pangolins with perissodactyls (Zooamata). However, because these approaches infer a single best tree, they mask signatures of phylogenetic conflict that result from incomplete lineage sorting and historical hybridization. Accordingly, we also inferred phylogenies from thousands of noncoding loci distributed across chromosomes with historically contrasting recombination rates. Throughout the radiation of modern orders (such as rodents, primates, bats, and carnivores), we observed notable differences between locus trees inferred from the autosomes and the X chromosome, a pattern typical of speciation with gene flow. We show that in many cases, previously controversial phylogenetic relationships can be reconciled by examining the distribution of conflicting phylogenetic signals along chromosomes with variable historical recombination rates. Lineage divergence time estimates were notably uniform across genomic loci and robust to extensive sensitivity analyses in which the underlying data, fossil constraints, and clock models were varied. The earliest branching events in the placental phylogeny coincide with the breakup of continental landmasses and rising sea levels in the Late Cretaceous. This signature of allopatric speciation is congruent with the low genomic conflict inferred for most superordinal relationships. By contrast, we observed a second pulse of diversification immediately after the Cretaceous-Paleogene (K-Pg) extinction event superimposed on an episode of rapid land emergence. Greater geographic continuity coupled with tumultuous climatic changes and increased ecological landscape at this time provided enhanced opportunities for mammalian diversification, as depicted in the fossil record. These observations dovetail with increased phylogenetic conflict observed within clades that diversified in the Cenozoic.
    CONCLUSION Our genome-wide analysis of multiple classes of sequence variation provides the most comprehensive assessment of placental mammal phylogeny, resolves controversial relationships, and clarifies the timing of mammalian diversification. We propose that the combination of Cretaceous continental fragmentation and lineage isolation, followed by the direct and indirect effects of the K-Pg extinction at a time of rapid land emergence, synergistically contributed to the accelerated diversification rate of placental mammals during the early Cenozoic.
    Figure caption: The timing of placental mammal evolution. Superordinal mammalian diversification took place in the Cretaceous during periods of continental fragmentation and sea level rise with little phylogenomic discordance (pie charts: left, autosomes; right, X chromosome), which is consistent with allopatric speciation. By contrast, the Paleogene hosted intraordinal diversification in the aftermath of the K-Pg mass extinction event, when clades exhibited higher phylogenomic discordance consistent with speciation with gene flow and incomplete lineage sorting.
  2. Battistuzzi, Fabia Ursula (Ed.)
    Abstract The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive tool for building timetrees of species, pathogens, and gene families using rapid relaxed-clock methods. Methods for estimating divergence times and confidence intervals are implemented to use probability densities for calibration constraints for node-dating and sequence sampling dates for tip-dating analyses. They are supported by new options for tagging sequences with spatiotemporal sampling information, an expanded interactive Node Calibrations Editor, and an extended Tree Explorer to display timetrees. Also added is a Bayesian method for estimating neutral evolutionary probabilities of alleles in a species using multispecies sequence alignments and a machine learning method to test for the autocorrelation of evolutionary rates in phylogenies. The computer memory requirements for the maximum likelihood analysis are reduced significantly through reprogramming, and the graphical user interface has been made more responsive and interactive for very big data sets. These enhancements will improve the user experience, quality of results, and the pace of biological discovery. Natively compiled graphical user interface and command-line versions of MEGA11 are available for Microsoft Windows, Linux, and macOS from www.megasoftware.net. 
  3. Abstract

    Despite the obstacles facing marine colonists, most lineages of aquatic organisms have colonized and diversified in freshwaters repeatedly. These transitions can trigger rapid morphological or physiological change and, on longer timescales, lead to increased rates of speciation and extinction. Diatoms are a lineage of ancestrally marine microalgae that have diversified throughout freshwater habitats worldwide. We generated a phylogenomic data set of genomes and transcriptomes for 59 diatom taxa to resolve freshwater transitions in one lineage, the Thalassiosirales. Although most parts of the species tree were consistently resolved with strong support, we had difficulties resolving a Paleocene radiation, which affected the placement of one freshwater lineage. This and other parts of the tree were characterized by high levels of gene tree discordance caused by incomplete lineage sorting and low phylogenetic signal. Despite differences in species trees inferred from concatenation versus summary methods and codons versus amino acids, traditional methods of ancestral state reconstruction supported six transitions into freshwaters, two of which led to subsequent species diversification. Evidence from gene trees, protein alignments, and diatom life history together suggest that habitat transitions were largely the product of homoplasy rather than hemiplasy, a condition where transitions occur on branches in gene trees not shared with the species tree. Nevertheless, we identified a set of putatively hemiplasious genes, many of which have been associated with shifts to low salinity, indicating that hemiplasy played a small but potentially important role in freshwater adaptation. Accounting for differences in evolutionary outcomes, in which some taxa became locked into freshwaters while others were able to return to the ocean or become salinity generalists, might help further distinguish different sources of adaptive mutation in freshwater diatoms.

     
  4. Abstract

    Gametic isolation is thought to play an important role in the evolution of reproductive isolation in broadcast‐spawning marine invertebrates. However, it is unclear whether gametic isolation commonly evolves early in the speciation process or only accumulates after other reproductive barriers are already in place. It is also unknown whether gametic isolation is an effective barrier to introgression following speciation. Here, we used whole‐genome sequencing data and multiple complementary phylogenomic approaches to test whether the well‐documented gametic incompatibilities among the strongylocentrotid sea urchins have limited introgression. We quantified phylogenetic discordance, inferred reticulate phylogenetic networks, and applied the Δ statistic using gene tree topologies reconstructed from multiple sequence alignments of protein‐coding single‐copy orthologs. In addition, we conducted ABBA–BABA tests on genome‐wide single nucleotide variants and reconstructed a phylogeny of mitochondrial genomes. Our results revealed strong mito‐nuclear discordance and considerable nonrandom gene tree discordance that cannot be explained by incomplete lineage sorting alone. Eight of the nine species examined demonstrated a history of introgression with at least one other species or ancestral lineage, indicating that introgression was common during the diversification of the strongylocentrotid urchins. There was strong support for introgression between four extant species pairs (Strongylocentrotus pallidus–S. droebachiensis, S. intermedius–S. pallidus, S. purpuratus–S. fragilis, and Mesocentrotus franciscanus–Pseudocentrotus depressus) and additional evidence for introgression on internal branches of the phylogeny. Our results suggest that the existing gametic incompatibilities among the strongylocentrotid urchin species have not been a complete barrier to hybridization and introgression following speciation. Their continued divergence in the face of widespread introgression indicates that other reproductive isolating barriers likely exist and may have been more critical in establishing reproductive isolation early in speciation.

     
  5. Smith, Stephen (Ed.)
    Abstract Understanding how gene flow affects population divergence and speciation remains challenging. Differentiating one evolutionary process from another can be difficult because multiple processes can produce similar patterns, and more than one process can occur simultaneously. Although simple population models produce predictable results, how these processes balance in taxa with patchy distributions and complicated natural histories is less certain. These types of populations might be highly connected through migration (gene flow), but can experience stronger effects of genetic drift and inbreeding, or localized selection. Although different signals can be difficult to separate, the application of high-throughput sequence data can provide the resolution necessary to distinguish many of these processes. We present whole-genome sequence data for an avian species group with an alpine and arctic tundra distribution to examine the role that different population genetic processes have played in their evolutionary history. Rosy-finches inhabit high-elevation mountaintop sky islands and high-latitude island and continental tundra. They exhibit extensive plumage variation coupled with low levels of genetic variation. Additionally, the number of species within the complex is debated, making them excellent for studying the forces involved in the process of diversification, as well as an important species group in which to investigate species boundaries. Total genomic variation suggests a broadly continuous pattern of allele frequency changes across the mainland taxa of this group in North America. However, phylogenomic analyses recover multiple distinct, well-supported groups that coincide with previously described morphological variation and current species-level taxonomy. Tests of introgression using D-statistics and approximate Bayesian computation reveal significant levels of introgression between multiple North American taxa. These results provide insight into the balance between divergent and homogenizing population genetic processes and highlight remaining challenges in interpreting conflict between different types of analytical approaches with whole-genome sequence data. [ABBA-BABA; approximate Bayesian computation; gene flow; phylogenomics; speciation; whole-genome sequencing.]