NSF PAR Search | NSF Public Access Repository


CRP-Tree: a phylogenetic association test for binary traits

https://doi.org/10.1093/jrsssc/qlad098

Zhang, Julie; Preising, Gabriel A; Schumer, Molly; Palacios, Julia A (November 2023, Journal of the Royal Statistical Society Series C: Applied Statistics)

Abstract: An important problem in evolutionary genomics is to investigate whether a certain trait measured on each sample is associated with the sample phylogenetic tree. The phylogenetic tree represents the shared evolutionary history of the samples and is usually estimated from molecular sequence data at a locus or from other types of genetic data. We propose a model for trait evolution inspired by the Chinese Restaurant Process that includes a parameter that controls the degree of preferential attachment, that is, the tendency of nodes in the tree to subtend from nodes of the same type. This model with no preferential attachment is equivalent to a structured coalescent model with simultaneous migration and coalescence events and serves as a null model. We derive a test for phylogenetic binary trait association with linear computational complexity and empirically demonstrate that it is more powerful than some other methods. We apply our test to study the phylogenetic association of some traits in swordtail fish, breast cancer, yellow fever virus, and influenza A H1N1 virus. An R package implementation of our methods is available at https://github.com/jyzhang27/CRPTree.
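The abstract above describes the trait model only as inspired by the Chinese Restaurant Process. For reference, here is a minimal Python sketch of the standard process. It is not the authors' trait-evolution model and not the CRPTree package. The function name and the `alpha` and `seed` parameters are illustrative assumptions.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate table sizes under the standard Chinese Restaurant Process.

    Each new customer joins an existing table with probability proportional
    to that table's current size, or opens a new table with probability
    proportional to alpha (alpha > 0). This is the textbook process, used
    here only to illustrate preferential attachment.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    table_sizes = []
    for _ in range(n_customers):
        # Existing tables are weighted by size; the last slot is "new table".
        weights = table_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1
    return table_sizes


if __name__ == "__main__":
    print(chinese_restaurant_process(50, alpha=1.0))
```

Larger tables attract new customers more often, which is the "rich get richer" behaviour that preferential attachment refers to.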
Statistical summaries of unlabelled evolutionary trees

https://doi.org/10.1093/biomet/asad025

Samyak, Rajanala; Palacios, Julia A (April 2023, Biometrika)

Summary: Rooted and ranked phylogenetic trees are mathematical objects that are useful in modelling hierarchical data and evolutionary relationships with applications to many fields such as evolutionary biology and genetic epidemiology. Bayesian phylogenetic inference usually explores the posterior distribution of trees via Markov chain Monte Carlo methods. However, assessing uncertainty and summarizing distributions remains challenging for these types of structures. While labelled phylogenetic trees have been extensively studied, relatively little literature exists for unlabelled trees, which are increasingly useful, for example when one seeks to summarize samples of trees obtained with different methods, or from different samples and environments, and wishes to assess the stability and generalizability of these summaries. In our paper, we exploit recently proposed distance metrics of unlabelled ranked binary trees and unlabelled ranked genealogies, or trees equipped with branch lengths, to define the Fréchet mean, variance and interquartile sets as summaries of these tree distributions. We provide an efficient combinatorial optimization algorithm for computing the Fréchet mean of a sample or of distributions on unlabelled ranked tree shapes and unlabelled ranked genealogies. We show the applicability of our summary statistics for studying popular tree distributions and for comparing the SARS-CoV-2 evolutionary trees across different locations during the COVID-19 epidemic in 2020. Our current implementations are publicly available at https://github.com/RSamyak/fmatrix.
Full Text Available
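The Fréchet mean and variance named in the summary above have a simple general definition: the mean is the point that minimizes the average squared distance to the sample, and the variance is that minimum value. Below is a minimal Python sketch of this definition by brute-force search over a finite candidate set. It does not reproduce the paper's combinatorial optimization algorithm or its tree metrics. The function name, the candidate set, and the toy distance are assumptions for illustration.

```python
def frechet_mean(sample, candidates, dist):
    """Return (mean, variance) under the general Fréchet definition.

    The search is restricted to `candidates`, so the result is the Fréchet
    mean only within that set. `dist` is any metric on the objects.
    """
    best, best_cost = None, float("inf")
    for c in candidates:
        # Average squared distance from the candidate to every sample point.
        cost = sum(dist(c, x) ** 2 for x in sample) / len(sample)
        if cost < best_cost:
            best, best_cost = c, cost
    return best, best_cost


if __name__ == "__main__":
    # Toy example on integers with absolute difference as the metric.
    mean, variance = frechet_mean([0, 2, 4], range(5), lambda a, b: abs(a - b))
    print(mean, variance)
```

With real-valued data and the absolute-difference metric this reduces to the ordinary mean and variance, which is why the same definition is a natural summary for tree-valued samples once a distance between trees is available.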
Accounting for reporting delays in real-time phylodynamic analyses with preferential sampling

https://doi.org/10.1371/journal.pcbi.1012970

Medina, Catalina M; Palacios, Julia A; Minin, Volodymyr M (May 2025, PLOS Computational Biology)
Barido-Sottani, Joëlle (Ed.)
The COVID-19 pandemic demonstrated that fast and accurate analysis of continually collected infectious disease surveillance data is crucial for situational awareness and policy making. Coalescent-based phylodynamic analysis can use genetic sequences of a pathogen to estimate changes in its effective population size, a measure of genetic diversity. These changes in effective population size can be connected to the changes in the number of infections in the population of interest under certain conditions. Phylodynamics is an important set of tools because its methods are often resilient to the ascertainment biases present in traditional surveillance data (e.g., preferentially testing symptomatic individuals). Unfortunately, it takes weeks or months to sequence the sampled pathogen and deposit its genetic sequences into a database, where they become available for such analyses. These reporting delays severely decrease the precision of phylodynamic methods closer to the present time, and for some models can lead to extreme biases. Here we present a method that affords reliable estimation of the effective population size trajectory closer to the time of data collection, allowing for policy decisions to be based on more recent data. Our work uses readily available historic times between sampling and reporting of sequenced samples for a population of interest, and incorporates this information into the sampling model to mitigate the effects of reporting delay in real-time analyses. We illustrate our methodology on simulated data and on SARS-CoV-2 sequences collected in the state of Washington in 2021.
Free, publicly-accessible full text available May 6, 2026
Multiple merger coalescent inference of effective population size

https://doi.org/10.1098/rstb.2023.0306

Zhang, Julie; Palacios, Julia A (February 2025, Philosophical Transactions of the Royal Society B: Biological Sciences)
Variation in a sample of molecular sequence data informs about the past evolutionary history of the sample’s population. Traditionally, Bayesian modelling coupled with the standard coalescent is used to infer the sample’s bifurcating genealogy and demographic and evolutionary parameters such as effective population size and mutation rates. However, there are many situations where binary coalescent models do not accurately reflect the true underlying ancestral processes. Here, we propose a Bayesian non-parametric method for inferring effective population size trajectories from a multifurcating genealogy under the Lambda-coalescent. In particular, we jointly estimate the effective population size and the model parameter for the Beta-coalescent model, a special type of Lambda-coalescent. Finally, we test our methods on simulations and apply them to study various viral dynamics as well as Japanese sardine population size changes over time. The code and vignettes can be found in the phylodyn package. This article is part of the theme issue ‘“A mathematical theory of evolution”: phylogenetic models dating back 100 years’.
Free, publicly-accessible full text available February 13, 2026
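For context on the standard coalescent that the abstract above contrasts with multiple-merger models, here is a minimal Python sketch of coalescence times under the textbook Kingman coalescent with a constant effective population size. Lineages merge strictly in pairs, which is the binary assumption the paper relaxes. This is not the authors' method or the phylodyn package. The function name, the `ne` and `seed` parameters, and the time scaling (rate equal to the number of lineage pairs divided by `ne`) are assumptions for illustration.

```python
import math
import random


def kingman_coalescent_times(n_samples, ne, seed=0):
    """Simulate cumulative coalescence times for n_samples lineages.

    While k lineages remain, the waiting time to the next pairwise merger
    is exponential with rate C(k, 2) / ne, so a larger effective population
    size gives longer waits between mergers.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    elapsed, times = 0.0, []
    for k in range(n_samples, 1, -1):
        rate = math.comb(k, 2) / ne
        elapsed += rng.expovariate(rate)
        times.append(elapsed)
    return times


if __name__ == "__main__":
    print(kingman_coalescent_times(5, ne=1000.0))
```

A sample of n lineages always yields n - 1 merger times here, whereas a multiple-merger process can reach the common ancestor in fewer events.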
