This content will become publicly available on March 1, 2023

Title: Limitations of Phylogenomic Data Can Drive Inferred Speciation Rate Shifts
Abstract Biodiversity analyses of phylogenomic timetrees have produced many high-profile examples of shifts in the rate of speciation across the tree of life. Temporally correlated events in ecology, climate, and biogeography are frequently invoked to explain these rate shifts. In a re-examination of 15 genomic timetrees and 25 major published studies of the pattern of speciation through time, we observed an unexpected correlation between the timing of reported rate shifts and the information content of sequence alignments. Here, we show that the paucity of sequence variation and insufficient species sampling in phylogenomic data sets are the likely drivers of many inferred speciation rate shifts, rather than the proposed biological explanations. Therefore, data limitations can produce predictable but spurious signals of rate shifts even when speciation rates may be similar across taxa and time. Our results suggest that the reliable detection of speciation rate shifts requires the acquisition and assembly of long phylogenomic alignments with near-complete species sampling and accurate estimates of species richness for the clades of study.
Editors:
Tamura, Koichiro
Award ID(s):
1932765
NSF-PAR ID:
10354842
Journal Name:
Molecular Biology and Evolution
Volume:
39
Issue:
3
ISSN:
0737-4038
Sponsoring Org:
National Science Foundation
More Like this
  1. Battistuzzi, Fabia Ursula (Ed.)
    Abstract The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive tool for building timetrees of species, pathogens, and gene families using rapid relaxed-clock methods. Methods for estimating divergence times and confidence intervals are implemented to use probability densities for calibration constraints for node-dating and sequence sampling dates for tip-dating analyses. They are supported by new options for tagging sequences with spatiotemporal sampling information, an expanded interactive Node Calibrations Editor, and an extended Tree Explorer to display timetrees. Also added is a Bayesian method for estimating neutral evolutionary probabilities of alleles in a species using multispecies sequence alignments and a machine learning method to test for the autocorrelation of evolutionary rates in phylogenies. The computer memory requirements for the maximum likelihood analysis are reduced significantly through reprogramming, and the graphical user interface has been made more responsive and interactive for very big data sets. These enhancements will improve the user experience, quality of results, and the pace of biological discovery. Natively compiled graphical user interface and command-line versions of MEGA11 are available for Microsoft Windows, Linux, and macOS from www.megasoftware.net.
  2. Smith, Stephen (Ed.)
    Abstract Understanding how gene flow affects population divergence and speciation remains challenging. Differentiating one evolutionary process from another can be difficult because multiple processes can produce similar patterns, and more than one process can occur simultaneously. Although simple population models produce predictable results, how these processes balance in taxa with patchy distributions and complicated natural histories is less certain. These types of populations might be highly connected through migration (gene flow), but can experience stronger effects of genetic drift and inbreeding, or localized selection. Although different signals can be difficult to separate, the application of high-throughput sequence data can provide the resolution necessary to distinguish many of these processes. We present whole-genome sequence data for an avian species group with an alpine and arctic tundra distribution to examine the role that different population genetic processes have played in their evolutionary history. Rosy-finches inhabit high elevation mountaintop sky islands and high-latitude island and continental tundra. They exhibit extensive plumage variation coupled with low levels of genetic variation. Additionally, the number of species within the complex is debated, making them excellent for studying the forces involved in the process of diversification, as well as an important species group in which to investigate species boundaries. Total genomic variation suggests a broadly continuous pattern of allele frequency changes across the mainland taxa of this group in North America. However, phylogenomic analyses recover multiple distinct, well supported, groups that coincide with previously described morphological variation and current species-level taxonomy. Tests of introgression using D-statistics and approximate Bayesian computation reveal significant levels of introgression between multiple North American taxa. These results provide insight into the balance between divergent and homogenizing population genetic processes and highlight remaining challenges in interpreting conflict between different types of analytical approaches with whole-genome sequence data. [ABBA-BABA; approximate Bayesian computation; gene flow; phylogenomics; speciation; whole-genome sequencing.]
  3. Abstract Estimating multiple sequence alignments (MSAs) and inferring phylogenies are essential for many aspects of comparative biology. Yet, many bioinformatics tools for such analyses have focused on specific clades, with greatest attention paid to plants, animals, and fungi. The rapid increase in high-throughput sequencing (HTS) data from diverse lineages now provides opportunities to estimate evolutionary relationships and gene family evolution across the eukaryotic tree of life. At the same time, these types of data are known to be error-prone (e.g., substitutions, contamination). To address these opportunities and challenges, we have refined a phylogenomic pipeline, now named PhyloToL, to allow easy incorporation of data from HTS studies, to automate production of both MSAs and gene trees, and to identify and remove contaminants. PhyloToL is designed for phylogenomic analyses of diverse lineages across the tree of life (i.e., at scales of >100 My). We demonstrate the power of PhyloToL by assessing stop codon usage in Ciliophora, identifying contamination in a taxon- and gene-rich database and exploring the evolutionary history of chromosomes in the kinetoplastid parasite Trypanosoma brucei, the causative agent of African sleeping sickness. Benchmarking PhyloToL’s homology assessment against that of OrthoMCL and a published paper on superfamilies of bacterial and eukaryotic organellar outer membrane pore-forming proteins demonstrates the power of our approach for determining gene family membership and inferring gene trees. PhyloToL is highly flexible and allows users to easily explore HTS data, test hypotheses about phylogeny and gene family evolution and combine outputs with third-party tools (e.g., PhyloChromoMap, iGTP).
  4. The regions of the Andes and Caribbean-Mesoamerica are both hypothesized to be the cradle for many Neotropical lineages, but few studies have fully investigated the dynamics and interactions between Neotropical bioregions. The New World hawkmoth genus Xylophanes is the most taxonomically diverse genus in the Sphingidae, with the highest endemism and richness in the Andes and Caribbean-Mesoamerica. We integrated phylogenomic and DNA barcode data and generated the first time-calibrated tree for this genus, covering 93.8% of the species diversity. We used event-based likelihood ancestral area estimation and biogeographic stochastic mapping to examine the speciation and dispersal dynamics of Xylophanes across bioregions. We also used trait-dependent diversification models to compare speciation and extinction rates of lineages associated with different bioregions. Our results indicate that Xylophanes originated in Caribbean-Mesoamerica in the Late Miocene, and immediately diverged into five major clades. The current species diversity and distribution of Xylophanes can be explained by two consecutive phases. In the first phase, the highest Xylophanes speciation and emigration rates occurred in the Caribbean-Mesoamerica, and the highest immigration rates occurred in the Andes, whereas in the second phase the highest immigration rates were found in Amazonia, and the Andes had the highest speciation and emigration rates.
  5. Abstract Current phylogenomic approaches implicitly assume that the predominant phylogenetic signal within a genome reflects the true evolutionary history of organisms, without assessing the confounding effects of postspeciation gene flow that can produce a mosaic of phylogenetic signals that interact with recombinational variation. Here, we tested the validity of this assumption with a phylogenomic analysis of 27 species of the cat family, assessing local effects of recombination rate on species tree inference and divergence time estimation across their genomes. We found that the prevailing phylogenetic signal within the autosomes is not always representative of the most probable speciation history, due to ancient hybridization throughout felid evolution. Instead, phylogenetic signal was concentrated within regions of low recombination, and notably enriched within large X chromosome recombination cold spots that exhibited recurrent patterns of strong genetic differentiation and selective sweeps across mammalian orders. By contrast, regions of high recombination were enriched for signatures of ancient gene flow, and these sequences inflated crown-lineage divergence times by ∼40%. We conclude that existing phylogenomic approaches to infer the Tree of Life may be highly misleading without considering the genomic architecture of phylogenetic signal relative to recombination rate and its interplay with historical hybridization.