Title: Polyphest: fast polyploid phylogeny estimation
Abstract:
Motivation: Despite the widespread occurrence of polyploids across the Tree of Life, especially in the plant kingdom, very few computational methods have been developed to handle the specific complexities introduced by polyploids in phylogeny estimation. Furthermore, methods that are designed to account for polyploidy often disregard incomplete lineage sorting (ILS), a major source of heterogeneous gene histories, or are computationally very demanding. Therefore, there is a great need for efficient and robust methods to accurately reconstruct polyploid phylogenies.
Results: We introduce Polyphest (POLYploid PHylogeny ESTimation), a new method for efficiently and accurately inferring species phylogenies in the presence of both polyploidy and ILS. Polyphest bypasses the need for extensive network space searches by first generating a multilabeled tree based on gene trees, which is then converted into a (uniquely labeled) species phylogeny. We compare the performance of Polyphest to that of two polyploid phylogeny estimation methods, one of which does not account for ILS, namely PADRE, and another that accounts for ILS, namely MPAllopp. Polyphest is more accurate than PADRE and achieves comparable accuracy to MPAllopp, while being significantly faster. We also demonstrate the application of Polyphest to empirical data from the hexaploid bread wheat and confirm the allopolyploid origin of bread wheat along with the closest relatives for each of its subgenomes.
Availability and implementation: Polyphest is available at https://github.com/NakhlehLab/Polyphest.
Award ID(s):
1800723
PAR ID:
10543929
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
40
Issue:
Supplement_2
ISSN:
1367-4803
Page Range / eLocation ID:
ii20 to ii28
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Holder, Mark (Ed.)
    Abstract Phylogenetic networks provide a powerful framework for modeling and analyzing reticulate evolutionary histories. While polyploidy has been shown to be prevalent not only in plants but also in other groups of eukaryotic species, most work done thus far on phylogenetic network inference assumes diploid hybridization. These inference methods have been applied, with varying degrees of success, to data sets with polyploid species, even though polyploidy violates the mathematical assumptions underlying these methods. Statistical methods were developed recently for handling specific types of polyploids, as were parsimony methods that handle polyploidy more generally but exclude processes such as incomplete lineage sorting. In this article, we introduce a new method for inferring most parsimonious phylogenetic networks on data that include polyploid species. Taking gene tree topologies as input, the method seeks a phylogenetic network that minimizes deep coalescences while accounting for polyploidy. We demonstrate the performance of the method on both simulated and biological data. The inference method as well as a method for evaluating evolutionary hypotheses in the form of phylogenetic networks are implemented and publicly available in the PhyloNet software package. [Incomplete lineage sorting; minimizing deep coalescences; multilabeled trees; multispecies network coalescent; phylogenetic networks; polyploidy.]
  2. Summary Recently formed allopolyploid species offer unprecedented insights into the early stages of polyploid evolution. This review examines seven well‐studied neopolyploids (we use ‘neopolyploid’ to refer to very recently formed polyploids, i.e. during the past 300 years), spanning different angiosperm families, exploring commonalities and differences in their evolutionary trajectories. Each neopolyploid provides a unique case study, demonstrating both shared patterns, such as rapid genomic and phenotypic changes, and unique responses to hybridization and genome doubling. While previous studies of these neopolyploids have improved our understanding of polyploidy, significant knowledge gaps remain, highlighting the need for further research into the varied impacts of whole‐genome duplication on gene expression, epigenetic modifications, and ecological interactions. Notably, all of these neopolyploids have spontaneously arisen due to human activity in natural environments, underscoring the profound consequences of polyploidization in a rapidly changing world. Understanding the immediate effects of polyploidy is crucial not only for evolutionary biology but also for applied practices, as polyploidy can lead to novel traits, as well as stress tolerance and increased crop yields. Future research directions include investigating the genetic and epigenetic mechanisms underlying polyploid evolution, as well as exploring the potential of neopolyploids for crop improvement and environmental adaptation. 
  3. Polyploidy is widely acknowledged to have played an important role in the evolution and diversification of vascular plants. However, the influence of genome duplication on population-level dynamics and its cascading effects at the community level remain unclear. In part, this is due to persistent uncertainties over the extent of polyploid phenotypic variation, and the interactions between polyploids and co-occurring species, and highlights the need to integrate polyploid research at the population and community level. Here, we investigate how community-level patterns of phylogenetic relatedness might influence escape from minority cytotype exclusion, a classic population genetics hypothesis about polyploid establishment, and population-level species interactions. Focusing on two plant families in which polyploidy has evolved multiple times, Brassicaceae and Rosaceae, we build upon the hypothesis that the greater allelic and phenotypic diversity of polyploids allow them to successfully inhabit a different geographic range compared to their diploid progenitor and close relatives. Using a phylogenetic framework, we specifically test (1) whether polyploid species are more distantly related to diploids within the same community than co-occurring diploids are to one another, and (2) if polyploid species tend to exhibit greater ecological success than diploids, using species abundance in communities as an indicator of successful establishment. Overall, our results suggest that the effects of genome duplication on community structure are not clear-cut. We find that polyploid species tend to be more distantly related to co-occurring diploids than diploids are to each other. However, we do not find a consistent pattern of polyploid species being more abundant than diploid species, suggesting polyploids are not uniformly more ecologically successful than diploids. 
    While polyploidy appears to have some important influences on species co-occurrence in Brassicaceae and Rosaceae communities, our study highlights the paucity of available geographically explicit data on intraspecific ploidal variation. The increased use of high-throughput methods to identify ploidal variation, such as flow cytometry and whole genome sequencing, will greatly aid our understanding of how such a widespread, radical genomic mutation influences the evolution of species and those around them.
  4. With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’. 
  5. Polyploidy, or whole-genome duplication, is expected to confound the inference of species trees with phylogenetic methods for two reasons. First, the presence of retained duplicated genes requires the reconciliation of the inferred gene trees to a proposed species tree. Second, even if the analyses are restricted to shared single copy genes, the occurrence of reciprocal gene loss, where the surviving genes in different species are paralogs from the polyploidy rather than orthologs, will mean that such genes will not have evolved under the corresponding species tree and may not produce gene trees that allow inference of that species tree. Here we analyze three different ancient polyploidy events, using synteny-based inferences of orthology and paralogy to infer gene trees from nearly 17,000 sets of homologous genes. We find that the simple use of single copy genes from polyploid organisms provides reasonably robust phylogenetic signals, despite the presence of reciprocal gene losses. Such gene trees are also most often in accord with the species relationships inferred from maximum likelihood models of gene loss after polyploidy: a completely distinct phylogenetic signal present in these genomes. As seen in other studies, however, we find that methods for inferring phylogenetic confidence yield high support values even in cases where the underlying data suggest meaningful conflict in the phylogenetic signals.