Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract MotivationThe abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented with a fully bifurcating process which cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet not scalable for big data, because they need to perform a heuristic search in the space of networks as well as numerical optimization that can be NP-hard. Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of four-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. ResultsOur novel hybrid detection methodology is optimization-free as it only requires the evaluation of polynomial equations, and as such, it bypasses the traversal of network space, yielding a computational speed at least 10 times faster than the fastest-to-date network methods. We illustrate our method’s performance on simulated and real data from the genus Canis. Availability and implementationWe present an open-source publicly available Julia package PhyloDiamond.jl available at https://github.com/solislemuslab/PhyloDiamond.jl with broad applicability within the evolutionary community.more » « less
-
Ouangraoua, Aida (Ed.)Abstract Scientists world-wide are putting together massive efforts to understand how the biodiversity that we see on Earth evolved from single-cell organisms at the origin of life and this diversification process is represented through the Tree of Life. Low sampling rates and high heterogeneity in the rate of evolution across sites and lineages produce a phenomenon denoted “long branch attraction” (LBA) in which long non-sister lineages are estimated to be sisters regardless of their true evolutionary relationship. LBA has been a pervasive problem in phylogenetic inference affecting different types of methodologies from distance-based to likelihood-based. Here, we present a novel neural network model that outperforms standard phylogenetic methods and other neural network implementations under LBA settings. Furthermore, unlike existing neural network models in phylogenetics, our model naturally accounts for the tree isomorphisms via permutation invariant functions which ultimately result in lower memory and allows the seamless extension to larger trees.more » « less
-
Abstract Gene flow is increasingly recognized as an important macroevolutionary process. The many mechanisms that contribute to gene flow (e.g. introgression, hybridization, lateral gene transfer) uniquely affect the diversification of dynamics of species, making it important to be able to account for these idiosyncrasies when constructing phylogenetic models. Existing phylogenetic‐network simulators for macroevolution are limited in the ways they model gene flow.We presentSiPhyNetwork, an R package for simulating phylogenetic networks under a birth–death‐hybridization process.Our package unifies the existing birth–death‐hybridization models while also extending the toolkit for modelling gene flow. This tool can create patterns of reticulation such as hybridization, lateral gene transfer, and introgression.Specifically, we model different reticulate events by allowing events to either add, remove or keep constant the number of lineages. Additionally, we allow reticulation events to be trait dependent, creating the ability to model the expanse of isolating mechanisms that prevent gene flow. This tool makes it possible for researchers to model many of the complex biological factors associated with gene flow in a phylogenetic context.more » « less
-
Birtwistle, Marc R (Ed.)High-dimensional mixed-effects models are an increasingly important form of regression in which the number of covariates rivals or exceeds the number of samples, which are collected in groups or clusters. The penalized likelihood approach to fitting these models relies on a coordinate descent algorithm that lacks guarantees of convergence to a global optimum. Here, we empirically study the behavior of this algorithm on simulated and real examples of three types of data that are common in modern biology: transcriptome, genome-wide association, and microbiome data. Our simulations provide new insights into the algorithm’s behavior in these settings, and, comparing the performance of two popular penalties, we demonstrate that the smoothly clipped absolute deviation (SCAD) penalty consistently outperforms the least absolute shrinkage and selection operator (LASSO) penalty in terms of both variable selection and estimation accuracy across omics data. To empower researchers in biology and other fields to fit models with the SCAD penalty, we implement the algorithm in a Julia package,HighDimMixedModels.jl.more » « lessFree, publicly-accessible full text available January 13, 2026
-
Hybridization events complicate the accurate reconstruction of phylogenies, as they lead to patterns of genetic heritability that are unexpected under traditional, bifurcating models of species trees. This phenomenon has led to the development of methods to infer these varied hybridization events, both methods that reconstruct networks directly, as well as summary methods that predict individual hybridization events from a subset of taxa. However, a lack of empirical comparisons between methods – especially those pertaining to large networks with varied hybridization scenarios – hinders their practical use. Here, we provide a comprehensive review of popular summary methods: TICR, MSCquartets, HyDe, Patterson’s D-Statistic (ABBA-BABA), D3, and Dp. TICR and MSCquartets are based on quartet concordance factors gathered from gene tree topologies and HyDe, Patterson’s D-Statistic, D3, and Dp use site pattern frequencies to identify hybridization events between sets of three taxa. We then use simulated data to address questions of method accuracy and ideal use scenarios by testing methods against complex networks which depict gene flow events that differ in depth (timing), quantity (single vs. multiple, overlapping hybridizations), and rate of gene flow (γ). We find that deeper or multiple hybridization events may introduce noise and weaken the signal of hybridization, leading to higher relative false negative rates across all methods. Despite some forms of hybridization eluding quartet-based detection methods, MSCquartets displays high precision in most scenarios. While HyDe results in high false negative rates when tested on hybridizations involving extinct or unsampled ghost lineages, HyDe is the only method able to identify the direction of hybridization, distinguishing the source parental lineages from recipient hybrid lineages. Lastly, we test the methods on a dataset of ultraconserved elements from the bee subfamily Nomiinae, finding possible hybridization events between clades which correspond to regions of poor support in the species tree estimated in a previous study.more » « less
An official website of the United States government
