NSF PAR Search | NSF Public Access Repository

Inferring Phylogenetic Trees of Cancer Evolution from Longitudinal Single-Cell Copy Number Profiles

https://doi.org/10.1007/978-3-031-94928-9_2

Liu, Yushu; Nakhleh, Luay (September 2025, Springer Nature Switzerland)

Free, publicly-accessible full text available September 1, 2026
tMHG-Finder: Tree-Guided Maximal Homologous Group Finder for Bacterial Genomes

https://doi.org/10.1007/978-3-031-94928-9_6

Yin, Yongze; Kille, Bryce; Ogilvie, Huw A; Treangen, Todd J; Nakhleh, Luay (September 2025, Springer Nature Switzerland)

Free, publicly-accessible full text available September 1, 2026
A survey of computational approaches for characterizing microbial interactions in microbial mats

https://doi.org/10.1186/s13059-025-03634-2

Perillo, Vanesa L; Nute, Michael; Sapoval, Nicolae; Curry, Kristen D; Golia, Logan; Yin, Yongze; Ogilvie, Huw A; Nakhleh, Luay; Segarra, Santiago; Bhaya, Devaki; et al. (June 2025, Genome Biology)
NestedBD: Bayesian inference of phylogenetic trees from single-cell copy number profiles under a birth-death model

https://doi.org/10.1186/s13015-024-00264-4

Liu, Yushu; Edrisi, Mohammadamin; Yan, Zhi; Ogilvie, Huw; Nakhleh, Luay (December 2024, Algorithms for Molecular Biology)

Abstract Copy number aberrations (CNAs) are ubiquitous in many types of cancer. Inferring CNAs from cancer genomic data could help shed light on the initiation, progression, and potential treatment of cancer. While such data have traditionally been available via “bulk sequencing,” the more recently introduced techniques for single-cell DNA sequencing (scDNAseq) provide the type of data that makes CNA inference possible at the single-cell resolution. We introduce a new birth-death evolutionary model of CNAs and a Bayesian method, NestedBD, for the inference of evolutionary trees (topologies and branch lengths with relative mutation rates) from single-cell data. We evaluated NestedBD’s performance using simulated data sets, benchmarking its accuracy against traditional phylogenetic tools as well as state-of-the-art methods. The results show that NestedBD infers more accurate topologies and branch lengths, and that the birth-death model can improve the accuracy of copy number estimation. And when applied to biological data sets, NestedBD infers plausible evolutionary histories of two colorectal cancer samples. NestedBD is available at https://github.com/Androstane/NestedBD.
Full Text Available
Polyphest: fast polyploid phylogeny estimation

https://doi.org/10.1093/bioinformatics/btae390

Yan, Zhi; Cao, Zhen; Nakhleh, Luay (September 2024, Bioinformatics)

Abstract Motivation: Despite the widespread occurrence of polyploids across the Tree of Life, especially in the plant kingdom, very few computational methods have been developed to handle the specific complexities introduced by polyploids in phylogeny estimation. Furthermore, methods that are designed to account for polyploidy often disregard incomplete lineage sorting (ILS), a major source of heterogeneous gene histories, or are computationally very demanding. Therefore, there is a great need for efficient and robust methods to accurately reconstruct polyploid phylogenies. Results: We introduce Polyphest (POLYploid PHylogeny ESTimation), a new method for efficiently and accurately inferring species phylogenies in the presence of both polyploidy and ILS. Polyphest bypasses the need for extensive network space searches by first generating a multilabeled tree based on gene trees, which is then converted into a (uniquely labeled) species phylogeny. We compare the performance of Polyphest to that of two polyploid phylogeny estimation methods, one of which does not account for ILS, namely PADRE, and another that accounts for ILS, namely MPAllopp. Polyphest is more accurate than PADRE and achieves comparable accuracy to MPAllopp, while being significantly faster. We also demonstrate the application of Polyphest to empirical data from the hexaploid bread wheat and confirm the allopolyploid origin of bread wheat along with the closest relatives for each of its subgenomes. Availability and implementation: Polyphest is available at https://github.com/NakhlehLab/Polyphest.
The Impact of Model Misspecification on Phylogenetic Network Inference

https://doi.org/10.18061/bssb.v3i1.9553

Cao, Zhen; Li, Meng; Ogilvie, Huw; Nakhleh, Luay (July 2024, Bulletin of the Society of Systematic Biologists)

The development of statistical methods to infer species phylogenies with reticulations (species networks) has led to many discoveries of gene flow between distinct species. These methods typically assume only incomplete lineage sorting and introgression. Given that phylogenetic networks can be arbitrarily complex, these methods might compensate for model misspecification by increasing the number of dimensions beyond the true value. Herein, we explore the effect of potential model misspecification, including the negligence of gene tree estimation error (GTEE) and assumption of a single substitution rate for all genomic loci, on the accuracy of phylogenetic network inference using both simulated and biological data. In particular, we assess the accuracy of estimated phylogenetic networks as well as test statistics for determining whether a network is the correct evolutionary history, as opposed to the simpler model that is a tree. We found that while GTEE negatively impacts the performance of test statistics to determine the “treeness” of the evolutionary history of a data set, running those tests on triplets of taxa and correcting for multiple-testing significantly ameliorates the problem. We also found that accounting for substitution rate heterogeneity improves the reliability of full Bayesian inference methods of phylogenetic networks, whereas summary statistic methods are robust to GTEE and rate heterogeneity, though currently require manual inspection to determine the network complexity.
Full Text Available
Accurate integration of single-cell DNA and RNA for analyzing intratumor heterogeneity using MaCroDNA

https://doi.org/10.1038/s41467-023-44014-3

Edrisi, Mohammadamin; Huang, Xiru; Ogilvie, Huw A.; Nakhleh, Luay (December 2023, Nature Communications)

Abstract Cancers develop and progress as mutations accumulate, and with the advent of single-cell DNA and RNA sequencing, researchers can observe these mutations and their transcriptomic effects and predict proteomic changes with remarkable temporal and spatial precision. However, to connect genomic mutations with their transcriptomic and proteomic consequences, cells with either only DNA data or only RNA data must be mapped to a common domain. For this purpose, we present MaCroDNA, a method that uses maximum weighted bipartite matching of per-gene read counts from single-cell DNA and RNA-seq data. Using ground truth information from colorectal cancer data, we demonstrate the advantage of MaCroDNA over existing methods in accuracy and speed. Exemplifying the utility of single-cell data integration in cancer research, we suggest, based on results derived using MaCroDNA, that genomic mutations of large effect size increasingly contribute to differential expression between cells as Barrett’s esophagus progresses to esophageal cancer, reaffirming the findings of the previous studies.
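The matching step named in the abstract above can be illustrated with a small sketch. It is not MaCroDNA's implementation: the abstract does not say how edge weights are defined, so Pearson correlation between per-gene read counts is used here as an assumed weight, and the function names `pearson` and `match_cells` and the toy counts are hypothetical. The maximum weight bipartite matching is found by exhaustive search, which is only practical for a handful of cells; a polynomial-time assignment solver would be needed at realistic scale.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def match_cells(dna_counts, rna_counts):
    """Pair each RNA cell with a distinct DNA cell, maximizing total correlation.

    Exhaustive search over all assignments (illustration only).
    Returns a list of (rna_index, dna_index) pairs.
    """
    if len(rna_counts) > len(dna_counts):
        raise ValueError("this sketch assumes at least as many DNA cells as RNA cells")
    # Assumed edge weight: correlation between per-gene read counts.
    weights = [[pearson(r, d) for d in dna_counts] for r in rna_counts]
    best_score, best_assignment = float("-inf"), ()
    for assignment in permutations(range(len(dna_counts)), len(rna_counts)):
        score = sum(weights[r][d] for r, d in enumerate(assignment))
        if score > best_score:
            best_score, best_assignment = score, assignment
    return list(enumerate(best_assignment))


# Toy per-gene read counts (made up): rows are cells, columns are genes.
dna = [[2, 2, 4, 4], [2, 4, 2, 4], [4, 2, 2, 2]]
rna = [[10, 30, 12, 28], [40, 9, 11, 10], [5, 6, 20, 22]]
pairs = match_cells(dna, rna)
print(pairs)
```

Each RNA cell is paired with the DNA cell whose copy number pattern its expression tracks most closely, subject to no DNA cell being used twice; that one-to-one constraint is what distinguishes a matching from a per-cell nearest-neighbour lookup.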
“Correcting” Gene Trees to be More Like Species Trees Frequently Increases Topological Error

https://doi.org/10.1093/gbe/evad094

Yan, Zhi; Ogilvie, Huw A; Nakhleh, Luay (June 2023, Genome Biology and Evolution)
Holland, Barbara (Ed.)
Abstract The evolutionary histories of individual loci in a genome can be estimated independently, but this approach is error-prone due to the limited amount of sequence data available for each gene, which has led to the development of a diverse array of gene tree error correction methods which reduce the distance to the species tree. We investigate the performance of two representatives of these methods: TRACTION and TreeFix. We found that gene tree error correction frequently increases the level of error in gene tree topologies by “correcting” them to be closer to the species tree, even when the true gene and species trees are discordant. We confirm that full Bayesian inference of the gene trees under the multispecies coalescent model is more accurate than independent inference. Future gene tree correction approaches and methods should incorporate an adequately realistic model of evolution instead of relying on oversimplified heuristics.
Full Text Available
Comparing inference under the multispecies coalescent with and without recombination

https://doi.org/10.1016/j.ympev.2023.107724

Yan, Zhi; Ogilvie, Huw A.; Nakhleh, Luay (April 2023, Molecular Phylogenetics and Evolution)

Full Text Available
Annotation-free delineation of prokaryotic homology groups

https://doi.org/10.1371/journal.pcbi.1010216

Yin, Yongze; Ogilvie, Huw A.; Nakhleh, Luay (June 2022, PLOS Computational Biology)
Kolodny, Rachel (Ed.)
Phylogenomic studies of prokaryotic taxa often assume conserved marker genes are homologous across their length. However, processes such as horizontal gene transfer or gene duplication and loss may disrupt this homology by recombining only parts of genes, causing gene fission or fusion. We show using simulation that it is necessary to delineate homology groups in a set of bacterial genomes without relying on gene annotations to define the boundaries of homologous regions. To solve this problem, we have developed a graph-based algorithm to partition a set of bacterial genomes into Maximal Homologous Groups of sequences (MHGs) where each MHG is a maximal set of maximum-length sequences which are homologous across the entire sequence alignment. We applied our algorithm to a dataset of 19 Enterobacteriaceae species and found that MHGs cover much greater proportions of genomes than markers and, relatedly, are less biased in terms of the functions of the genes they cover. We zoomed in on the correlation between each individual marker and their overlapping MHGs, and show that few phylogenetic splits supported by the markers are supported by the MHGs while many marker-supported splits are contradicted by the MHGs. A comparison of the species tree inferred from marker genes with the species tree inferred from MHGs suggests that the increased bias and lack of genome coverage by markers causes incorrect inferences as to the overall relationship between bacterial taxa.
Full Text Available
