Title: Incongruence in the phylogenomics era
Genome-scale amounts of data and the development of novel statistical phylogenetic approaches have greatly aided the reconstruction of a broad sketch of the tree of life and resolved many of its branches. However, incongruence (the inference of conflicting evolutionary histories) remains pervasive in phylogenomic data. We synthesize the biological and analytical factors that drive incongruence, discuss methodological advances to diagnose and handle incongruence, and identify avenues for future research. The study of incongruence has enabled a deeper understanding of phylogenesis and improved our ability to reconstruct and interpret the tree of life.
Award ID(s):
2110404
PAR ID:
10515137
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Nature Reviews Genetics
Date Published:
Journal Name:
Nature Reviews Genetics
Volume:
24
Issue:
12
ISSN:
1471-0056
Page Range / eLocation ID:
834 to 850
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Faircloth, Brant (Ed.)
    Abstract While some relationships in phylogenomic studies have remained stable since the Sanger sequencing era, many challenging nodes remain, even with genome-scale data. Incongruence or lack of resolution in the phylogenomic era is frequently attributed to inadequate data modeling and analytical issues that lead to systematic biases. However, few studies investigate the potential for random error or establish expectations for the level of resolution achievable with a given empirical data set and integrate uncertainties across methods when faced with conflicting results. Ants are the most species-rich lineage of social insects and one of the most ecologically important terrestrial animals. Consequently, ants have garnered significant research attention, including their systematics. Despite this, there has been no comprehensive genus-level phylogeny of the ants inferred using genomic data that thoroughly evaluates both signal strength and incongruence. In this study, we provide insight into and quantify uncertainty across the ant tree of life by utilizing the most taxonomically comprehensive ultraconserved elements data set of ants to date, including 277 (81%) of recognized ant genera from all 16 extant subfamilies, and representing over 98% of described species. We use simulations to establish expectations for resolution, identify branches with less-than-expected concordance, and dissect the effects of data and model selection on recalcitrant nodes. Simulations show that hundreds of loci are needed to resolve recalcitrant nodes on our genus-level ant phylogeny. This demonstrates the continued role of random error in phylogenomic studies. Our analyses provide a comprehensive picture of support and incongruence across the ant phylogeny, while offering a more nuanced depiction of uncertainty and significantly expanding generic sampling. 
We use a consensus approach to integrate uncertainty across different analyses and find that assumptions about root age exert substantial influence on divergence dating. Our results suggest that advancing the understanding of ant phylogeny will require not only more data but also more refined phylogenetic models. We also provide a workflow for identifying under-supported nodes in concatenation analyses, outline a pragmatic way to reconcile conflicting results in phylogenomics, and introduce a user-friendly locus selection tool for divergence dating. 
  2. Ruane, Sara (Ed.)
    Abstract Some phylogenetic problems remain unresolved even when large amounts of sequence data are analyzed and methods that accommodate processes such as incomplete lineage sorting are employed. In addition to investigating biological sources of phylogenetic incongruence, it is also important to reduce noise in the phylogenomic dataset by using an appropriate filtering approach that addresses gene tree estimation errors. We present the results of a case study in manakins, focusing on the very difficult clade comprising the genera Antilophia and Chiroxiphia. Previous studies suggest that Antilophia is nested within Chiroxiphia, though relationships among Antilophia+Chiroxiphia species have been highly unstable. We extracted more than 11,000 loci (ultra-conserved elements and introns) from whole genomes and conducted analyses using concatenation and multispecies coalescent methods. Topologies resulting from analyses using all loci differed depending on the data type and analytical method, with 2 clades (Antilophia+Chiroxiphia and Manacus+Pipra+Machaeropterus) in the manakin tree showing incongruent results. We hypothesized that gene trees that conflicted with a long coalescent branch (e.g., the branch uniting Antilophia+Chiroxiphia) might be enriched for cases of gene tree estimation error, so we conducted analyses that either constrained those gene trees to include monophyly of Antilophia+Chiroxiphia or excluded these loci. While constraining trees reduced some incongruence, excluding the trees led to completely congruent species trees, regardless of the data type or model of sequence evolution used. We found that a suite of gene metrics (most importantly the number of informative sites and likelihood of intralocus recombination) collectively explained the loci that resulted in non-monophyly of Antilophia+Chiroxiphia. 
We also found evidence for introgression that may have contributed to the discordant topologies we observe in Antilophia+Chiroxiphia and led to deviations from expectations given the multispecies coalescent model. Our study highlights the importance of identifying factors that can obscure phylogenetic signal when dealing with recalcitrant phylogenetic problems, such as gene tree estimation error, incomplete lineage sorting, and reticulation events. [Birds; c-gene; data type; gene estimation error; model fit; multispecies coalescent; phylogenomics; reticulation] 
  3. Takahashi, Aya (Ed.)
    Abstract Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines. 
  4. Townsend, Jeffrey (Ed.)
    Abstract Many evolutionary relationships remain controversial despite whole-genome sequencing data. These controversies arise, in part, due to challenges associated with accurately modeling the complex phylogenetic signal coming from genomic regions experiencing distinct evolutionary forces. Here, we examine how different regions of the genome support or contradict well-established relationships among three mammal groups using millions of orthologous parsimony-informative biallelic sites (PIBS) distributed across primate, rodent, and Pecora genomes. We compared PIBS concordance percentages among locus types (e.g. coding sequences (CDS), introns, intergenic regions), and contrasted PIBS utility over evolutionary timescales. Sites derived from noncoding sequences provided more data and proportionally more concordant sites compared with those from CDS in all clades. CDS PIBS were also predominant drivers of tree incongruence in two cases of topological conflict. PIBS derived from most locus types provided surprisingly consistent support for splitting events spread across the timescales we examined, although we find evidence that CDS and intronic PIBS may, respectively and to a limited degree, inform disproportionately about older and younger splits. In this era of accessible whole-genome sequence data, these results 1) suggest benefits to more intentionally focusing on noncoding loci as robust data for tree inference and 2) reinforce the importance of accurate modeling, especially when using CDS data. 
  5. The almost simultaneous emergence of major animal phyla during the early Cambrian shaped modern animal biodiversity. Reconstructing evolutionary relationships among such closely spaced branches in the animal tree of life has proven to be a major challenge, hindering understanding of early animal evolution and the fossil record. This is particularly true in the species-rich and highly varied Mollusca where dramatic inconsistency among paleontological, morphological, and molecular evidence has led to a long-standing debate about the group’s phylogeny and the nature of dozens of enigmatic fossil taxa. A critical step needed to overcome this issue is to supplement available genomic data, which is plentiful for well-studied lineages, with genomes from rare but key lineages, such as Scaphopoda. Here, by presenting chromosome-level genomes from both extant scaphopod orders and leveraging complete genomes spanning Mollusca, we provide strong support for Scaphopoda as the sister taxon of Bivalvia, revitalizing the morphology-based Diasoma hypothesis originally proposed 50 years ago. Our molecular clock analysis confidently dates the split between Bivalvia and Scaphopoda at ~520 Ma, prompting a reinterpretation of controversial laterally compressed Early Cambrian fossils, including Anabarella, Watsonella, and Mellopegma, as stem diasomes. Moreover, we show that incongruence in the phylogenetic placement of Scaphopoda in previous phylogenomic studies was due to ancient incomplete lineage sorting (ILS) that occurred during the rapid radiation of Conchifera. Our findings highlight the need to consider ILS as a potential source of error in deep phylogeny reconstruction, especially in the context of the unique nature of the Cambrian Explosion. 