skip to main content

Title: Multispecies Coalescent: Theory and Applications in Phylogenetics
Species tree estimation is a basic part of many biological research projects, ranging from answering basic evolutionary questions (e.g., how did a group of species adapt to their environments?) to addressing questions in functional biology. Yet, species tree estimation is very challenging, due to processes such as incomplete lineage sorting, gene duplication and loss, horizontal gene transfer, and hybridization, which can make gene trees differ from each other and from the overall evolutionary history of the species. Over the last 10–20 years, there has been tremendous growth in methods and mathematical theory for estimating species trees and phylogenetic networks, and some of these methods are now in wide use. In this survey, we provide an overview of the current state of the art, identify the limitations of existing methods and theory, and propose additional research problems and directions.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Annual Review of Ecology, Evolution, and Systematics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Assessing effects of gene tree error in coalescent analyses have widely ignored coalescent branch lengths (CBLs) despite their potential utility in estimating ancestral population demographics and detecting species tree anomaly zones. However, the ability of coalescent methods to obtain accurate estimates remains largely unexplored. Errors in gene trees should lead to underestimates of the true CBL, and for a given set of comparisons, longer CBLs should be more accurate. Here, we furthered our empirical understanding of how error in gene tree quality (i.e., locus informativeness and gene tree resolution) affect CBLs using four datasets comprised of ultraconserved elements (UCE) or exons for clades that exhibit wide ranges of branch lengths. For each dataset, we compared the impact of locus informativeness (assessed using number of parsimony‐informative sites) and gene tree resolution on CBL estimates. Our results, in general, showed that CBLs were drastically shorter when estimates included low informative loci. Gene tree resolution also had an impact on UCE datasets, with polytomous gene trees producing longer branches than randomly resolved gene trees. However, resolution did not appear to affect CBL estimates from the more informative exon datasets. Thus, as expected, gene tree quality affects CBL estimates, though this can generally be minimized by using moderate filtering to select more informative loci and/or by allowing polytomies in gene trees. These approaches, as well as additional contributions to improve CBL estimation, should lead to CBLs that are useful for addressing evolutionary and biological questions.

    more » « less
  2. Ponty, Yann (Ed.)
    Abstract Motivation Species delimitation, the process of deciding how to group a set of organisms into units called species, is one of the most challenging problems in computational evolutionary biology. While many methods exist for species delimitation, most based on the coalescent theory, few are scalable to very large datasets, and methods that scale tend to be not accurate. Species delimitation is closely related to species tree inference from discordant gene trees, a problem that has enjoyed rapid advances in recent years. Results In this article, we build on the accuracy and scalability of recent quartet-based methods for species tree estimation and propose a new method called SODA for species delimitation. SODA relies heavily on a recently developed method for testing zero branch length in species trees. In extensive simulations, we show that SODA can easily scale to very large datasets while maintaining high accuracy. Availability and implementation The code and data presented here are available on Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. Abstract Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups. 
    more » « less
  4. Abstract Motivation

    Branch lengths and topology of a species tree are essential in most downstream analyses, including estimation of diversification dates, characterization of selection, understanding adaptation, and comparative genomics. Modern phylogenomic analyses often use methods that account for the heterogeneity of evolutionary histories across the genome due to processes such as incomplete lineage sorting. However, these methods typically do not generate branch lengths in units that are usable by downstream applications, forcing phylogenomic analyses to resort to alternative shortcuts such as estimating branch lengths by concatenating gene alignments into a supermatrix. Yet, concatenation and other available approaches for estimating branch lengths fail to address heterogeneity across the genome.


    In this article, we derive expected values of gene tree branch lengths in substitution units under an extension of the multispecies coalescent (MSC) model that allows substitutions with varying rates across the species tree. We present CASTLES, a new technique for estimating branch lengths on the species tree from estimated gene trees that uses these expected values, and our study shows that CASTLES improves on the most accurate prior methods with respect to both speed and accuracy.

    Availability and implementation

    CASTLES is available at

    more » « less
  5. Abstract

    Expansion of many tree species lags behind climate change projections. Extreme storms can rapidly overcome this lag, especially for coastal species, but how will storm‐driven expansion shape intraspecific genetic variation? Do storms provide recruits only from the nearest sources, or from more distant sources? Answers to these questions have ecological and evolutionary implications, but empirical evidence is absent from the literature. In 2017, Hurricane Irma provided an opportunity to address this knowledge gap at the northern range limit of the neotropical black mangrove (Avicennia germinans) on the Atlantic coast of Florida, USA. We observed massive post‐hurricane increases in beach‐strandedA. germinanspropagules at, and past, this species’ present day range margin when compared to a previously surveyed nonhurricane year. Yet, propagule dispersal does not guarantee subsequent establishment and reproductive success (i.e., effective dispersal). We also evaluated prior effective dispersal along this coastline with isolatedA. germinanstrees identified beyond the most northern established population. We used 12 nuclear microsatellite loci to genotype 896 hurricane‐driven drift propagules from nine sites and 10 isolated trees from four sites, determined their sources of origin, and estimated dispersal distances. Almost all drift propagules and all isolated trees came from the nearest sources. This research suggests that hurricanes are a prerequisite for poleward range expansion of a coastal tree species and that storms can shape the expanding gene pool by providing almost exclusively range‐margin genotypes. These insights and empirical estimates of hurricane‐driven dispersal distances should improve our ability to forecast distributional shifts of coastal species.

    more » « less