Title: Median quartet tree search algorithms using optimal subtree prune and regraft
Abstract: Gene trees can be different from the species tree due to biological processes and inference errors. One way to obtain a species tree is to find one that maximizes some measure of similarity to a set of gene trees. The number of shared quartets between a potential species tree and gene trees provides a statistically justifiable score; if maximized properly, it could result in a statistically consistent estimator of the species tree under several statistical models of discordance. However, finding the median quartet score tree, one that maximizes this score, is NP-hard, motivating several existing heuristic algorithms. These heuristics do not follow the hill-climbing paradigm used extensively in phylogenetics. In this paper, we make theoretical contributions that enable an efficient hill-climbing approach. Specifically, we show that a subtree of size m can be placed optimally on a tree of size n in quasi-linear time with respect to n and (almost) independently of m. This result enables us to perform subtree prune and regraft (SPR) rearrangements as part of a hill-climbing search. We show that this approach can slightly improve upon the results of widely used methods such as ASTRAL in terms of the optimization score but not necessarily accuracy.
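The quartet score being maximized can be made concrete with a brute-force sketch in Python. For every set of four taxa, it reads off the quartet topology the species tree induces and counts the gene trees that induce the same topology. This is an illustrative baseline under stated assumptions, not the paper's quasi-linear placement algorithm: trees are assumed to be nested tuples of leaf labels (an arbitrary rooting of each unrooted tree), gene trees may omit taxa (quartets involving a missing taxon are skipped), and unresolved quartets earn no credit. The names `clusters`, `quartet_topology`, and `quartet_score` are hypothetical. Rescoring every quartet this way costs at least on the order of n^4 operations per gene tree, so rescoring a whole tree after each candidate SPR move would be slow.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Union

# Hypothetical helpers: a brute-force quartet score, not the paper's algorithm.
# Assumed representation: a leaf is a str, an internal node is a tuple of children.
# The outermost tuple is an arbitrary root of the (unrooted) tree.
Tree = Union[str, tuple]
Split = FrozenSet[FrozenSet[str]]


def clusters(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets below every node, in post-order (the last entry holds all taxa)."""
    out: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def quartet_topology(cl: List[FrozenSet[str]], quartet: Sequence[str]) -> Optional[Split]:
    """Return the split xy|zw induced on `quartet`, or None if it is unresolved."""
    q = frozenset(quartet)
    for c in cl:
        side = c & q
        # An edge whose cluster holds exactly two of the four taxa resolves the quartet.
        if len(side) == 2:
            return frozenset([side, q - side])
    return None


def quartet_score(species: Tree, genes: List[Tree]) -> int:
    """Count (quartet, gene tree) pairs on which the gene tree agrees with the species tree."""
    sp_cl = clusters(species)
    taxa = sorted(sp_cl[-1])
    gene_cls = [clusters(g) for g in genes]
    score = 0
    for quartet in combinations(taxa, 4):
        sp_topo = quartet_topology(sp_cl, quartet)
        if sp_topo is None:
            continue
        for gcl in gene_cls:
            # Skip gene trees that are missing any of the four taxa.
            if set(quartet) <= gcl[-1] and quartet_topology(gcl, quartet) == sp_topo:
                score += 1
    return score
```

For example, with the species tree ((a,b),c,(d,e)), an identical gene tree shares all five of its quartets, while the gene tree ((a,c),b,(d,e)) shares three of them (ab|de, ac|de, and bc|de).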
Award ID(s):
1845967
PAR ID:
10510814
Author(s) / Creator(s):
;
Publisher / Repository:
Springer Nature
Date Published:
Journal Name:
Algorithms for Molecular Biology
Volume:
19
Issue:
1
ISSN:
1748-7188
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Motivation: Phylogenomics faces a dilemma: on the one hand, the most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. The summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise and can create biases such as long-branch attraction. Results: We introduce a scalable likelihood-based approach to co-estimation under the multi-species coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division into quartets enables fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees. Availability and implementation: QuCo is available at https://github.com/maryamrabiee/quco. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Tang, H. (Ed.)
    Rooted species trees are used in several downstream applications of phylogenetics. Most species tree estimation methods produce unrooted trees and additional methods are then used to root these unrooted trees. Recently, Quintet Rooting (QR) (Tabatabaee et al., ISMB and Bioinformatics 2022), a polynomial-time method for rooting an unrooted species tree given unrooted gene trees under the multispecies coalescent, was introduced. QR, which is based on a proof of identifiability of rooted 5-taxon trees in the presence of incomplete lineage sorting, was shown to have good accuracy, improving over other methods for rooting species trees when incomplete lineage sorting was the only cause of gene tree discordance, except when gene tree estimation error was very high. However, the statistical consistency of QR was left as an open question. Here, we present QR-STAR, a polynomial-time variant of QR that has an additional step for determining the rooted shape of each quintet tree. We prove that QR-STAR is statistically consistent under the multispecies coalescent model, and our simulation study shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open source form at https://github.com/ytabatabaee/Quintet-Rooting. 
  3. Remnant trees and forest fragments in agricultural landscapes can be important sources of propagules to facilitate forest recovery. However, many studies simply quantify forest cover in the surrounding landscape as a percentage, with little attention given to species composition, and subsequently fail to detect an effect on recruitment patterns. We assessed the relative importance of the spatial distribution and life-history traits of 77 tree species on recruitment patterns at a landscape scale in a well-replicated long-term restoration study in southern Costa Rica. We censused and mapped potential mother trees in a 100-m buffer surrounding eight replicate restoration plots and quantified respective tree recruits within each plot. We assessed how mother tree abundance, species life-history characteristics (seed size, dispersal mode), tree size (DBH, height) and distance to restoration plot affected recruitment at coarse (plot: 50 × 50 m) and fine (quadrat: 3 × 3 m) spatial scales. The presence of a mother tree within 100 m of a restoration plot resulted in a 10-fold increase in potential mean recruitment. Mother tree abundance was also an important driver of recruit density, and particularly so for large-seeded (≥5 mm) zoochorous species with a fivefold increase in recruit density across the observed mother tree abundance range. An interaction between mother tree abundance and proximity demonstrated that the effect of mother tree abundance on recruit density was important but waned with increasing distance from restoration plots. At the fine spatial scale, proximity was uniformly important; height and DBH of the closest potential mother tree also affected recruit abundance but responses differed by seed size. Results highlight the importance of remnant vegetation composition to the recovery of adjacent degraded habitats, underscoring the outsized role nearby remnant forest and isolated trees can play for the persistence of localized biodiversity. 
  4. Background: Analyses of microbial evolution often use reconciliation methods. However, the standard duplication-transfer-loss (DTL) model does not account for the fact that species trees are often not fully sampled and thus, from the perspective of reconciliation, a gene family may enter the species tree from the outside. Moreover, within the genome, genes are often rearranged, causing them to move to new syntenic regions. Results: We extend the DTL model to account for two events that commonly arise in the evolution of microbes: origin of a gene from outside the sampled species tree and rearrangement of gene syntenic regions. We describe an efficient algorithm for maximum parsimony reconciliation in this new DTLOR model and then show how it can be extended to account for non-binary gene trees to handle uncertainty in gene tree topologies. Finally, we describe preliminary experimental results from the integration of our algorithm into the existing xenoGI tool for reconstructing the histories of genomic islands in closely related bacteria. Conclusions: Reconciliation in the DTLOR model can offer new insights into the evolution of microbes that are not currently possible under the DTL model.
  5. Abstract: The application of high‐throughput sequencing to phylogenetic analyses is allowing authors to reconstruct the true evolutionary history of species. This work can illuminate specific mechanisms underlying divergence when combined with analyses of gene flow, recombination and selection. We conducted a phylogenomic analysis of Catharus, a songbird genus with considerable potential for gene flow, variation in migratory behaviour and genomic resources. We documented discordance among trees constructed for mitochondrial, autosomal and sex (Z) chromosome partitions. Two trees were recovered on the Z. Both trees differed from the autosomes, one matched the mitochondria, and the other was unique to the Z. Gene flow with one species likely generated much of this discordance; substantial admixture between ustulatus and the remaining species was documented and linked to at least two historic events. The tree unique to the Z likely reflects the true history of Catharus; local genomic analyses recovered the same tree in autosomal regions with reduced admixture and recombination. Genes previously connected to migration were enriched in these regions suggesting transitions between migratory and non‐migratory states helped generate divergence. Migratory (vs. nonmigratory) Catharus formed a monophyletic clade in a subset of genomic regions. Gene flow was elevated in some of these regions suggesting adaptive introgression may have occurred, but the dominant pattern was of balancing selection maintaining ancestral polymorphisms important for olfaction and perhaps, by extension, adaptation to temperate climates. This work illuminates the evolutionary history of an important model in speciation and demonstrates how differential resistance to gene flow can affect local genomic patterns.
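Item 1 in this list describes computing, for each quartet, the most likely species tree topology and internal branch length. As a point of reference, and explicitly not QuCo's procedure (which marginalizes over distributions of gene trees), the classical multispecies coalescent result for a four-taxon species tree with internal branch length t in coalescent units gives the matching gene tree topology probability 1 - (2/3)e^(-t) and each of the two alternative topologies probability (1/3)e^(-t). The Python sketch below, with hypothetical function names, evaluates these probabilities and computes the maximum-likelihood topology and t from counts of fixed gene tree quartet topologies. Maximizing (1 - 2x/3)^(n1) (x/3)^(N - n1) over x = e^(-t), where n1 counts the most frequent topology out of N gene trees, gives x = 3(N - n1) / (2N).

```python
import math
from typing import Dict, Tuple

# Hypothetical helpers illustrating the classical MSC quartet model; not QuCo's method.


def msc_quartet_probs(t: float) -> Tuple[float, float, float]:
    """(match, alt1, alt2) gene tree topology probabilities for internal branch length t."""
    alt = math.exp(-t) / 3.0
    return 1.0 - 2.0 * alt, alt, alt


def ml_quartet(counts: Dict[str, int]) -> Tuple[str, float]:
    """ML species quartet topology and internal branch length from topology counts."""
    n = sum(counts.values())
    # The most frequent topology maximizes the likelihood.
    best = max(counts, key=counts.get)
    discordant = n - counts[best]
    if discordant == 0:
        # No discordant gene trees: the branch length estimate diverges.
        return best, math.inf
    # ML estimate of exp(-t), capped at 1 so that t is never negative.
    x_hat = min(1.0, 3.0 * discordant / (2.0 * n))
    return best, max(0.0, -math.log(x_hat))
```

For instance, counts of 60, 20 and 20 for the three topologies give x = 0.6 and t = -ln(0.6), roughly 0.51 coalescent units.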