

Title: Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes
Abstract

Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods have recently been developed for handling specific types of polyploids, as have parsimony methods that handle polyploidy more generally but exclude processes such as incomplete lineage sorting. In this article, we introduce a new method for inferring most parsimonious phylogenetic networks from data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. Both the inference method and a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package. [Incomplete lineage sorting; minimizing deep coalescences; multilabeled trees; multispecies network coalescent; phylogenetic networks; polyploidy.]
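The deep-coalescence criterion can be made concrete for the simplest case, a species tree rather than a network. The Python sketch below is an illustration under stated assumptions, not PhyloNet code: trees are nested tuples, gene-tree leaves carry species names, and each gene node is placed at the lowest species-tree node containing all of its species (the usual LCA reconciliation). For every non-root species branch it counts the gene lineages leaving the top of that branch and adds that count minus one. It does not handle networks or the multilabeled trees the article uses to account for polyploidy; the function names are hypothetical.

```python
from typing import FrozenSet, Iterator, List, Optional, Tuple, Union

# Assumed representation: a tree is a leaf label (species name) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Species labels at the leaves below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def species_clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Cluster (leaf set) of every species-tree node, root included."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result: List[FrozenSet[str]] = []
    for child in tree:
        result.extend(species_clusters(child))
    result.append(leaf_set(tree))
    return result


def lca_map(cluster: FrozenSet[str], species: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species cluster containing `cluster` (clusters of a tree are nested)."""
    return min((c for c in species if cluster <= c), key=len)


def gene_nodes(tree: Tree, parent: Optional[FrozenSet[str]] = None
               ) -> Iterator[Tuple[FrozenSet[str], Optional[FrozenSet[str]]]]:
    """Yield (node cluster, parent cluster) for every gene-tree node; the root's parent is None."""
    cluster = leaf_set(tree)
    yield cluster, parent
    if not isinstance(tree, str):
        for child in tree:
            yield from gene_nodes(child, cluster)


def deep_coalescences(gene_tree: Tree, species_tree: Tree) -> int:
    """Extra lineages of one gene tree reconciled into a species tree."""
    species = species_clusters(species_tree)
    root = leaf_set(species_tree)
    # place each gene node and its parent on species-tree nodes via the LCA mapping
    placed = [(lca_map(c, species), None if p is None else lca_map(p, species))
              for c, p in gene_nodes(gene_tree)]
    total = 0
    for branch in species:
        if branch == root:
            continue  # all remaining lineages coalesce on the root branch
        # lineages that start at or below `branch` and coalesce above it
        exiting = sum(1 for node, par in placed
                      if node <= branch and (par is None or not par <= branch))
        if exiting:
            total += exiting - 1
    return total


print(deep_coalescences((("A", "B"), "C"), (("A", "B"), "C")))  # 0
print(deep_coalescences((("A", "C"), "B"), (("A", "B"), "C")))  # 1
```

For the species tree ((A,B),C), the matching gene tree costs nothing, while ((A,C),B) costs one extra lineage because the A and B lineages both survive past their common ancestor. A parsimony search of this kind would sum such counts over all input gene trees and prefer the candidate with the smallest total.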

 
Award ID(s):
1800723
NSF-PAR ID:
10305260
Author(s) / Creator(s):
 ;  ;  ;  ;  ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Systematic Biology
ISSN:
1063-5157
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The role of hybridization and subsequent introgression has been demonstrated in an increasing number of species. Recently, Fontaine et al. (Science, 347, 2015, 1258524) conducted a phylogenomic analysis of six members of the Anopheles gambiae species complex. Their analysis revealed a reticulate evolutionary history and pointed to extensive introgression on all four autosomal arms. The study further highlighted the complex evolutionary signals that the co‐occurrence of incomplete lineage sorting (ILS) and introgression can give rise to in phylogenomic analyses. While tree‐based methodologies were used in the study, phylogenetic networks provide a more natural model to capture reticulate evolutionary histories. In this work, we reanalyse the Anopheles data using a recently devised framework that combines the multispecies coalescent with phylogenetic networks. This framework allows us to capture ILS and introgression simultaneously, and forms the basis for statistical methods for inferring reticulate evolutionary histories. The new analysis reveals a phylogenetic network with multiple hybridization events, some of which differ from those reported in the original study. To elucidate the extent and patterns of introgression across the genome, we devise a new method that quantifies the use of reticulation branches in the phylogenetic network by each genomic region. Applying the method to the mosquito data set reveals the evolutionary history of all the chromosomes. This study highlights the utility of ‘network thinking’ and the new insights it can uncover, in particular in phylogenomic analyses of large data sets with extensive gene tree incongruence.

     
  2. Abstract

    Prokaryotic genomes are often considered to be mosaics of genes that do not necessarily share the same evolutionary history due to widespread horizontal gene transfers (HGTs). Consequently, representing evolutionary relationships of prokaryotes as bifurcating trees has long been controversial. However, studies reporting conflicts among gene trees derived from phylogenomic data sets have shown that these conflicts can be the result of artifacts or evolutionary processes other than HGT, such as incomplete lineage sorting, low phylogenetic signal, and systematic errors due to substitution model misspecification. Here, we present the results of an extensive exploration of phylogenetic conflicts in the cyanobacterial order Nostocales, for which previous studies have inferred strongly supported conflicting relationships when using different concatenated phylogenomic data sets. We found that most of these conflicts are concentrated in deep clusters of short internodes of the Nostocales phylogeny, where the great majority of individual genes have low resolving power. We then inferred phylogenetic networks to detect HGT events while also accounting for incomplete lineage sorting. Our results indicate that most conflicts among gene trees are likely due to incomplete lineage sorting linked to an ancient rapid radiation, rather than to HGTs. Moreover, the short internodes of this radiation fit the expectations of the anomaly zone, i.e., a region of the tree parameter space where a species tree is discordant with its most likely gene tree. We demonstrated that concatenation of different sets of loci can recover up to 17 distinct and well-supported relationships within the putative anomaly zone of Nostocales, corresponding to the observed conflicts among well-supported trees based on concatenated data sets from previous studies. 
Our findings highlight the important role of rapid radiations as a potential cause of strongly conflicting phylogenetic relationships when using phylogenomic data sets of bacteria. We propose that polytomies may be the most appropriate phylogenetic representation of these rapid radiations that are part of anomaly zones, especially when all possible genomic markers have been considered to infer these phylogenies. [Anomaly zone; bacteria; horizontal gene transfer; incomplete lineage sorting; Nostocales; phylogenomic conflict; rapid radiation; Rhizonema.]

     
  3. Abstract Motivation

    Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes.

    Results

    In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets (networks on three taxa) to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied the performance of the method, in terms of both running time and accuracy, on simulated as well as biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference.
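    The abstract does not spell out the Hitting Set formulation. Under one plausible reading, assumed here, every set of three taxa (and hence every trinet) must lie inside at least one chosen taxon subset of a fixed size k. A greedy heuristic for that covering problem (greedy set cover, the dual view of Hitting Set) is sketched below as a hypothetical Python function `greedy_triple_cover`; it is not the PhyloNet implementation, and the subset size k is an assumed parameter. It repeatedly picks the k-subset that covers the most still-uncovered triples.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set, Tuple


def triples(taxa: Sequence[str]) -> Set[FrozenSet[str]]:
    """All 3-taxon sets drawn from `taxa`."""
    return {frozenset(t) for t in combinations(taxa, 3)}


def greedy_triple_cover(taxa: Sequence[str], k: int) -> List[Tuple[str, ...]]:
    """Greedily pick size-k taxon subsets until every 3-taxon set lies in one of them.

    Hypothetical helper: a greedy set-cover heuristic, not PhyloNet's implementation.
    """
    if not 3 <= k <= len(taxa):
        raise ValueError("need 3 <= k <= number of taxa")
    uncovered = triples(taxa)
    candidates = list(combinations(taxa, k))
    chosen: List[Tuple[str, ...]] = []
    while uncovered:
        # choose the subset covering the most still-uncovered triples (ties: first found)
        best = max(candidates, key=lambda s: len(triples(s) & uncovered))
        chosen.append(best)
        uncovered -= triples(best)
    return chosen


subsets = greedy_triple_cover(["A", "B", "C", "D", "E"], k=4)
print(len(subsets), subsets)
```

    Enumerating every k-subset grows combinatorially with the number of taxa, so this version only suits small examples; it is meant to show the greedy choice, not to scale.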

    Availability and implementation

    We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet).

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  4. Phylogenetic networks extend phylogenetic trees to allow for modeling reticulate evolutionary processes such as hybridization. They take the shape of a rooted, directed, acyclic graph, and when parameterized with evolutionary parameters, such as divergence times and population sizes, they form a generative process of molecular sequence evolution. Early work on computational methods for phylogenetic network inference focused exclusively on reticulations and sought networks with the fewest reticulations needed to fit the data. As processes such as incomplete lineage sorting (ILS) could be at play concurrently with hybridization, work in the last decade has shifted to computational approaches for phylogenetic network inference in the presence of ILS. In such a short period, significant advances have been made on developing and implementing such computational approaches. In particular, parsimony, likelihood, and Bayesian methods have been devised for estimating phylogenetic networks and associated parameters using estimated gene trees as data. Use of those inference methods has been augmented with statistical tests for specific hypotheses of hybridization, like the D-statistic. Most recently, Bayesian approaches for inferring phylogenetic networks directly from sequence data were developed and implemented. In this chapter, we survey such advances and discuss model assumptions as well as methods’ strengths and limitations. We also discuss parallel efforts in the population genetics community aimed at inferring similar structures. Finally, we highlight major directions for future research in this area.
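    The D-statistic mentioned above has a standard definition: for four taxa related as (((P1,P2),P3),O), with O an outgroup, count biallelic sites with the ABBA pattern (P2 and P3 share the derived state) and the BABA pattern (P1 and P3 share it), then compute D = (ABBA - BABA) / (ABBA + BABA). Under incomplete lineage sorting alone the two counts are expected to be equal, so a D far from zero suggests gene flow involving P3. The sketch below assumes four aligned DNA strings with the outgroup state taken as ancestral; the function name `d_statistic` is hypothetical, and the block-jackknife significance test used in practice is omitted.

```python
def d_statistic(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Patterson's D from four aligned DNA sequences, assuming the tree (((P1,P2),P3),O)."""
    if not len(p1) == len(p2) == len(p3) == len(outgroup):
        raise ValueError("sequences must be aligned to equal length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        site = {a, b, c, o}
        # keep only unambiguous, biallelic sites; the outgroup state is taken as ancestral
        if not site <= set("ACGT") or len(site) != 2:
            continue
        if a == o and b == c != o:      # ABBA: P2 and P3 share the derived state
            abba += 1
        elif b == o and a == c != o:    # BABA: P1 and P3 share the derived state
            baba += 1
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites in the alignment")
    return (abba - baba) / (abba + baba)


print(d_statistic("AAGA", "GGAA", "GGGA", "AAAA"))
```

    In this toy alignment two sites are ABBA, one is BABA, and the invariant last site is skipped, giving D = 1/3.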
  5. SUMMARY

    Hybridization has long been recognized as a fundamental evolutionary process in plants but, until recently, our understanding of its phylogenetic distribution and biological significance across deep evolutionary scales has been largely obscure. Over the past decade, genomic and phylogenomic datasets have revealed, perhaps not surprisingly, that hybridization, often associated with polyploidy, has been common throughout the evolutionary history of plants, particularly in various lineages of flowering plants. However, phylogenomic studies have also highlighted the challenges of disentangling signals of ancient hybridization from other sources of genomic conflict (in particular, incomplete lineage sorting). Here, we provide a critical review of ancient hybridization in vascular plants, outlining well‐documented cases of ancient hybridization across plant phylogeny, as well as the challenges unique to documenting ancient versus recent hybridization. We provide a definition for ancient hybridization, which, to our knowledge, has not been explicitly attempted before. Further documenting the extent of deep reticulation in plants should remain an important research focus, especially because published examples likely represent the tip of the iceberg in terms of the total extent of ancient hybridization. However, future research should increasingly explore the macroevolutionary significance of this process, in terms of its impact on evolutionary trajectories (e.g. how does hybridization influence trait evolution or the generation of biodiversity over long time scales?), as well as how life history and ecological factors shape, or have shaped, the frequency of hybridization across geologic time and plant phylogeny. Finally, we consider the implications of ubiquitous ancient hybridization for how we conceptualize, analyze, and classify plant phylogeny. 
Networks, as opposed to bifurcating trees, provide more accurate representations of evolutionary history in many cases, although our ability to infer, visualize, and use networks for comparative analyses is highly limited. Developing improved methods for the generation, visualization, and use of networks represents a critical future direction for plant biology. Current classification systems also do not generally allow for the recognition of reticulate lineages, and our classifications themselves are largely based on evidence from the chloroplast genome. Updating plant classification to better reflect nuclear phylogenies, as well as considering whether and how to recognize hybridization in classification systems, will represent an important challenge for the plant systematics community.

     