NSF PAR Search | NSF Public Access Repository

Ultrafast learning of four-node hybridization cycles in phylogenetic networks using algebraic invariants

https://doi.org/10.1093/bioadv/vbae014

Wu, Zhaoxing; Solís-Lemus, Claudia; Ouangraoua, Aida (ed.) (February 2024, Bioinformatics Advances)

Motivation: The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented with a fully bifurcating process, which cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet not scalable for big data, because they need to perform a heuristic search in the space of networks as well as numerical optimization that can be NP-hard. Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of four-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. Results: Our novel hybrid detection methodology is optimization-free as it only requires the evaluation of polynomial equations, and as such, it bypasses the traversal of network space, yielding a computational speed at least 10 times faster than the fastest-to-date network methods. We illustrate our method’s performance on simulated and real data from the genus Canis. Availability and implementation: We present an open-source, publicly available Julia package, PhyloDiamond.jl, available at https://github.com/solislemuslab/PhyloDiamond.jl, with broad applicability within the evolutionary community.
A latent linear model for nonlinear coupled oscillators on graphs

Goyal, Agam; Wu, Zhaoxing; Yim, Richard P; Chen, Binhao; Xu, Zihong; Lyu, Hanbaek (November 2023, arXiv)

A system of coupled oscillators on an arbitrary graph is locally driven by the tendency to mutual synchronization between nearby oscillators, but can, and often does, exhibit nonlinear behavior on the whole graph. Understanding such nonlinear behavior has been a key challenge in predicting whether all oscillators in such a system will eventually synchronize. In this paper, we demonstrate that, surprisingly, such nonlinear behavior of coupled oscillators can be effectively linearized in certain latent dynamic spaces. The key insight is that there is a small number of ‘latent dynamics filters’, each with a specific association with synchronizing and non-synchronizing dynamics on subgraphs, so that any observed dynamics on subgraphs can be approximated by a suitable linear combination of such elementary dynamic patterns. Taking an ensemble of subgraph-level predictions provides an interpretable predictor for whether the system on the whole graph reaches global synchronization. We propose algorithms based on supervised matrix factorization to learn such latent dynamics filters. We demonstrate that our method performs competitively in synchronization prediction tasks against baselines and black-box classification algorithms, despite its simple and interpretable architecture.
Full Text Available
Assessment of Projection Pursuit Index for Classifying High Dimension Low Sample Size Data in R

https://doi.org/10.6339/23-JDS1096

Wu, Zhaoxing; Zhang, Chunming (March 2023, Journal of Data Science)

Analyzing “large p small n” data is becoming increasingly paramount in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis (PDA) index, built upon the Linear Discriminant Analysis (LDA) index, was devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet, there is little information available about its performance compared with the popular Support Vector Machine (SVM). This paper conducts extensive numerical studies to compare the performance of the PDA index with the LDA index and SVM, demonstrating that the PDA index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the PDA index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the PDA index functions in the R package classPP, help statisticians and data scientists make effective use of both sets of classification tools.
Full Text Available
