
Title: Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes
Abstract

Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work to date on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying them. Statistical methods have recently been developed for handling specific types of polyploids, as have parsimony methods that handle polyploidy more generally but exclude processes such as incomplete lineage sorting. In this article, we introduce a new method for inferring most parsimonious phylogenetic networks from data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method, as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks, is implemented and publicly available in the PhyloNet software package. [Incomplete lineage sorting; minimizing deep coalescences; multilabeled trees; multispecies network coalescent; phylogenetic networks; polyploidy.]
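The deep-coalescence criterion described in the abstract can be made concrete with a small Python sketch. It scores a gene tree against a multilabeled tree (MUL-tree), in which a polyploid species appears once per subgenome, using the standard extra-lineages count: each gene-tree node is placed at the least common ancestor (LCA) of the species-tree leaves it maps to, and every species-tree branch contributes one less than the number of gene lineages leaving its top. Because the gene copies of a polyploid can be assigned to different MUL-tree leaves, the sketch minimizes over all such allele mappings by exhaustive enumeration. This is an illustration under stated assumptions, not the PhyloNet implementation: trees are nested tuples, the `C.1`/`C.2` leaf-naming convention and all function names are hypothetical, any assignment of a copy to a leaf of its own species is allowed (the paper's exact constraints may differ), enumeration is exponential in the number of copies, and the search over networks is not shown.

```python
from itertools import product

# Trees are nested tuples whose leaves are strings. In a MUL-tree, the copies
# of a polyploid are written "C.1", "C.2" (hypothetical convention: the text
# before "." is the species name).


def leaf_ids(tree):
    """All leaf labels of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaf_ids(child)]


def clusters(tree):
    """Return (leaf set of this node, leaf sets of every node in its subtree)."""
    if isinstance(tree, str):
        here = frozenset([tree])
        return here, [here]
    here, found = frozenset(), []
    for child in tree:
        child_set, child_found = clusters(child)
        here |= child_set
        found.extend(child_found)
    found.append(here)
    return here, found


def extra_lineages(species_tree, gene_tree, mapping):
    """Deep coalescences of gene_tree in species_tree for one allele mapping.

    `mapping` sends each gene-tree leaf to a species-tree leaf label.
    """
    _, nodes = clusters(species_tree)

    def lca(leaf_set):
        # Species nodes containing leaf_set form a chain; the smallest is the LCA.
        return min((c for c in nodes if leaf_set <= c), key=len)

    edges = []  # (LCA of gene node, LCA of its parent); parent is None at the root

    def walk(node):
        if isinstance(node, str):
            return lca(frozenset([mapping[node]]))
        child_lcas = [walk(child) for child in node]
        here = lca(frozenset().union(*child_lcas))
        edges.extend((c, here) for c in child_lcas)
        return here

    edges.append((walk(gene_tree), None))

    total = 0
    for v in nodes:
        # A gene lineage leaves the branch above v if it starts inside v's
        # subtree and its parent is placed strictly above v.
        exiting = sum(1 for g, p in edges if g <= v and (p is None or not p <= v))
        total += max(exiting - 1, 0)
    return total


def min_extra_lineages(mul_tree, gene_tree, species_of_copy):
    """Minimize deep coalescences over all allele mappings (exhaustive search)."""
    copies = leaf_ids(gene_tree)
    targets = leaf_ids(mul_tree)
    options = [[t for t in targets if t.split(".")[0] == species_of_copy[g]]
               for g in copies]
    return min(extra_lineages(mul_tree, gene_tree, dict(zip(copies, choice)))
               for choice in product(*options))


if __name__ == "__main__":
    species = {"a": "A", "b": "B", "c1": "C", "c2": "C"}
    gene = (("a", "c1"), ("b", "c2"))
    # Allotetraploid C: one subgenome copy beside A, the other beside B.
    print(min_extra_lineages((("A", "C.1"), ("B", "C.2")), gene, species))
    # Both copies forced onto a single C leaf in the tree ((A,C),B).
    print(min_extra_lineages((("A", "C"), "B"), gene, species))
```

In this toy case the MUL-tree explains the gene tree with no deep coalescences, while forcing both copies of `C` onto one leaf of ((A,C),B) costs two extra lineages, which is the kind of gap that lets a parsimony criterion favor a polyploid history.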

Award ID(s): 1800723
NSF-PAR ID: 10305260
Journal Name: Systematic Biology
ISSN: 1063-5157
Publisher: Oxford University Press
Sponsoring Org: National Science Foundation
More Like this
  1. Phylogenetic networks extend phylogenetic trees to allow for modeling reticulate evolutionary processes such as hybridization. They take the shape of a rooted, directed, acyclic graph, and when parameterized with evolutionary parameters, such as divergence times and population sizes, they form a generative process of molecular sequence evolution. Early work on computational methods for phylogenetic network inference focused exclusively on reticulations and sought networks that fit the data with the fewest reticulations. As processes such as incomplete lineage sorting (ILS) could be at play concurrently with hybridization, work in the last decade has shifted to computational approaches for phylogenetic network inference in the presence of ILS. In this short period, significant advances have been made in developing and implementing such approaches. In particular, parsimony, likelihood, and Bayesian methods have been devised for estimating phylogenetic networks and associated parameters using estimated gene trees as data. These inference methods have been complemented by statistical tests for specific hypotheses of hybridization, such as the D-statistic. Most recently, Bayesian approaches for inferring phylogenetic networks directly from sequence data were developed and implemented. In this chapter, we survey these advances and discuss model assumptions as well as the methods’ strengths and limitations. We also discuss parallel efforts in the population genetics community aimed at inferring similar structures. Finally, we highlight major directions for future research in this area.
  2. Background

    The páramo ecosystem, located above the timberline in the tropical Andes, has been the setting for some of the most dramatic plant radiations, and it is one of the world’s fastest-evolving and most diverse high-altitude ecosystems. Today, 144+ species of frailejones (subtribe Espeletiinae Cuatrec., Asteraceae) dominate the páramo. Frailejones have intrigued naturalists and botanists, not just for their appealing beauty and impressive morphological diversity, but also for their remarkable adaptations to the extremely harsh environmental conditions of the páramo. Previous attempts to reconstruct the evolutionary history of this group failed to resolve relationships among genera and species, and there is no agreement regarding the classification of the group. Thus, our goal was to reconstruct the phylogeny of the frailejones and to test the influence of geography on it as a first step toward understanding the patterns of radiation of these plants.

    Methods

    Field expeditions to 70 páramos of Colombia and Venezuela resulted in 555 collected samples from 110 species. Additional material was obtained from herbarium specimens. Sequence data included nrDNA (ITS and ETS) and cpDNA (rpl16), for an aligned total of 2,954 bp. AFLP fragment analysis with 28 primer combinations yielded 1,665 fragments. Phylogenies based on sequence data were reconstructed under maximum parsimony, maximum likelihood and Bayesian inference. The AFLP dataset was analyzed with minimum evolution. A Monte Carlo permutation test was used to assess the influence of geography on the phylogeny.

    Results

    The reconstructed phylogenies suggest that most genera are paraphyletic, but the phylogenetic signal may be misled by hybridization and incomplete lineage sorting. A tree with all the available molecular data shows two large clades: one of primarily Venezuelan species that includes a few neighboring Colombian species, and a second of only Colombian species. Results from the Monte Carlo permutation test suggest a very strong influence of geography on the phylogenetic relationships. Venezuelan páramos tend to hold taxa that are more distantly related to each other than those in Colombian páramos, where taxa are more closely related to each other.

    Conclusions

    Our data suggest the presence of two independent radiations: one in Venezuela and the other in Colombia. In addition, the current generic classification will need substantial revision. Analyses show a strong geographic structure in the phylogeny, with large clades grouped in hotspots of diversity at a regional scale, and in páramo localities at a local scale. The differences in relatedness between sympatric species of Venezuelan and Colombian páramos may be explained by the younger age of the Colombian páramos and the shorter time available there for speciation of Espeletiinae.

  3. Motivation

    Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting. However, these methods can only handle a small number of loci from a handful of genomes.

    Results

    In this article, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets (networks on three taxa) to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets and implement a simple heuristic to solve it. We studied the performance of these algorithms, in terms of both running time and accuracy, on simulated as well as biological datasets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step toward accurate, large-scale phylogenetic network inference.

    Availability and implementation

    We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet).

    Supplementary information

    Supplementary data are available at Bioinformatics online.
  4. Genome-scale data have greatly facilitated the resolution of recalcitrant nodes that Sanger-based datasets have been unable to resolve. However, phylogenomic studies continue to use traditional methods such as bootstrapping to estimate branch support, and high bootstrap values are still interpreted as providing strong support for the correct topology. Furthermore, relatively little attention has been given to assessing discordance between gene and species trees and the underlying processes that produce phylogenetic conflict. We generated novel genomic datasets to characterize and determine the causes of discordance in Old World treefrogs (Family: Rhacophoridae), a group fraught with conflicting and poorly supported topologies among major clades. Additionally, a suite of data filtering strategies and analytical methods was applied to assess their impact on phylogenetic inference. We detected incomplete lineage sorting at all nodes that exhibited high levels of discordance. Those nodes were also associated with extremely short internal branches. We also clearly demonstrated that bootstrap values do not reflect uncertainty or confidence for the correct topology and, hence, should not be used as a measure of branch support in phylogenomic datasets. Overall, we showed that phylogenetic discordance in Old World treefrogs resulted from incomplete lineage sorting and that species tree inference can be improved using a multi-faceted, total-evidence approach, which uses the greatest amount of data and considers results from different analytical methods and datasets.
  5. Phylogenomic data from a rapidly increasing number of studies provide new evidence for resolving relationships in recently radiated clades, but they also pose new challenges for inferring evolutionary histories. Most existing methods for reconstructing phylogenetic hypotheses rely on algorithms that consider only incomplete lineage sorting (ILS) as a cause of intra- or intergenomic discordance. Here, we use a variety of methods, including those to infer phylogenetic networks, to account for both ILS and introgression as causes of nuclear and cytoplasmic-nuclear discordance, using phylogenomic data from the recently radiated flowering plant genus Polemonium (Polemoniaceae), an ecologically diverse genus in Western North America with known and suspected gene flow between species. We find evidence for widespread discordance among nuclear loci that can be explained by both ILS and reticulate evolution in the evolutionary history of Polemonium. Furthermore, the histories of organellar genomes show strong discordance with the species tree inferred from the nuclear genome. Discordance between the nuclear and plastid genomes is not completely explained by ILS, and only one case of discordance is explained by detected introgression events. Our results suggest that multiple processes have been involved in the evolutionary history of Polemonium and that the plastid genome does not accurately reflect species relationships. We discuss several potential causes for this cytoplasmic-nuclear discordance, which emerging evidence suggests is more widespread across the Tree of Life than previously thought. [Cyto-nuclear discordance, genomic discordance, phylogenetic networks, plastid capture, Polemoniaceae, Polemonium, reticulations.]
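The D-statistic, named in the first summary above as a test for specific hybridization hypotheses, is computed from two discordant site patterns across four taxa (P1, P2, P3, outgroup): ABBA sites, where P2 and P3 share a derived allele, and BABA sites, where P1 and P3 do, giving D = (ABBA - BABA) / (ABBA + BABA). The Python sketch below is a minimal illustration, assuming one aligned sequence per taxon, treating the outgroup base as ancestral, and skipping any site with gaps or ambiguity codes. The function name `d_statistic` is hypothetical, and real analyses pool many loci and use a block jackknife to assess significance, which is not shown.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D from four aligned DNA sequences (one per taxon).

    The outgroup base is taken as the ancestral state. Only sites where all
    four bases are unambiguous (A, C, G, T) are considered.
    Returns NaN when no ABBA or BABA sites are found.
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        if not {a, b, c, o} <= set("ACGT"):
            continue  # skip gaps and ambiguity codes
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        return float("nan")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Two ABBA sites and no BABA sites in this toy alignment.
    print(d_statistic("AAAAT", "GGAAT", "GGGAT", "AAAAT"))
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers, so D near zero is consistent with no gene flow, while a significantly positive D points to excess allele sharing between P2 and P3, such as from introgression.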