Abstract Motivation: The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented with a fully bifurcating process, which cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet they do not scale to big data, because they need to perform a heuristic search in the space of networks as well as numerical optimization that can be NP-hard. Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of four-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. Results: Our novel hybrid detection methodology is optimization-free, as it only requires the evaluation of polynomial equations. As such, it bypasses the traversal of network space, yielding a computational speed at least 10 times faster than the fastest-to-date network methods. We illustrate our method's performance on simulated data and on real data from the genus Canis. Availability and implementation: We present PhyloDiamond.jl, an open-source Julia package available at https://github.com/solislemuslab/PhyloDiamond.jl, with broad applicability within the evolutionary community.
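The abstract above defines concordance factors as the frequencies of four-taxon splits in the input gene trees. The following is a minimal sketch of that tabulation only, assuming each gene tree has already been reduced to the split it displays on one fixed set of four taxa. The function names and the input layout are hypothetical; this is not the PhyloDiamond.jl API, and it does not evaluate any invariants.

```python
from collections import Counter


def normalize(split):
    """Return a canonical form of a split such as (("C", "A"), ("B", "D"))."""
    left, right = (tuple(sorted(side)) for side in split)
    return tuple(sorted([left, right]))


def quartet_splits(taxa):
    """List the three possible splits on four taxa, in canonical form."""
    a, b, c, d = sorted(taxa)
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def concordance_factors(observed):
    """Map each of the three splits to its frequency among the observed splits.

    `observed` holds one split per gene tree (hypothetical input layout),
    and every split must involve the same four taxa.
    """
    if not observed:
        raise ValueError("need at least one observed split")
    taxa = {taxon for split in observed for side in split for taxon in side}
    if len(taxa) != 4:
        raise ValueError("all splits must be on the same four taxa")
    counts = Counter(normalize(split) for split in observed)
    total = len(observed)
    return {split: counts[split] / total for split in quartet_splits(taxa)}


# Toy input: ten gene trees on taxa A, B, C, D (made-up counts for illustration).
observed = (
    [(("A", "B"), ("C", "D"))] * 6
    + [(("C", "A"), ("B", "D"))] * 3
    + [(("B", "C"), ("D", "A"))] * 1
)
cf = concordance_factors(observed)
```

The three frequencies for a given four-taxon set always sum to one, so a table of concordance factors carries two free values per quartet.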
Dimensions of Level-1 Group-Based Phylogenetic Networks
Abstract Phylogenetic networks represent evolutionary histories of sets of taxa where horizontal evolution or hybridization has occurred. Placing a Markov model of evolution on a phylogenetic network gives a model that is particularly amenable to algebraic study by representing it as an algebraic variety. In this paper, we give a formula for the dimension of the variety corresponding to a triangle-free level-1 phylogenetic network under a group-based evolutionary model. On our way to this, we give a dimension formula for codimension zero toric fiber products. We conclude by illustrating applications to identifiability.
- Award ID(s):
- 1945584
- PAR ID:
- 10523297
- Publisher / Repository:
- Springer
- Date Published:
- Journal Name:
- Bulletin of Mathematical Biology
- Volume:
- 86
- Issue:
- 8
- ISSN:
- 0092-8240
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Phylogenetic networks can represent evolutionary events that cannot be described by phylogenetic trees. These networks are able to incorporate reticulate evolutionary events such as hybridization, introgression, and lateral gene transfer. Recently, network-based Markov models of DNA sequence evolution have been introduced along with model-based methods for reconstructing phylogenetic networks. For these methods to be consistent, the network parameter needs to be identifiable from data generated under the model. Here, we show that the semi-directed network parameter of a triangle-free, level-1 network model with any fixed number of reticulation vertices is generically identifiable under the Jukes–Cantor, Kimura 2-parameter, or Kimura 3-parameter constraints.
-
We study deep neural networks with polynomial activations, particularly their expressive power. For a fixed architecture and activation degree, a polynomial neural network defines an algebraic map from weights to polynomials. The image of this map is the functional space associated to the network, and it is an irreducible algebraic variety upon taking closure. This paper proposes the dimension of this variety as a precise measure of the expressive power of polynomial neural networks. We obtain several theoretical results regarding this dimension as a function of architecture, including an exact formula for high activation degrees, as well as upper and lower bounds on layer widths in order for deep polynomial networks to fill the ambient functional space. We also present computational evidence that it is profitable in terms of expressiveness for layer widths to increase monotonically and then decrease monotonically. Finally, we link our study to favorable optimization properties when training weights, and we draw intriguing connections with tensor and polynomial decompositions.
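To make the weights-to-polynomials map in the abstract above concrete, here is a small sketch for one toy architecture chosen for illustration: two inputs, h hidden units with squaring activation, and one output, so the network computes sum_i c_i (a_i x + b_i y)^2. The rank of the Jacobian of the coefficient map at a sample weight vector is a lower bound on the dimension of the functional variety. The architecture, the sample weights, and the function names are assumptions of this sketch and are not taken from the paper.

```python
from fractions import Fraction


def jacobian(a, b, c):
    """Jacobian of the map (a, b, c) -> coefficients of sum_i c_i (a_i x + b_i y)^2.

    Rows correspond to the coefficients of x^2, xy, y^2.
    Columns correspond to d/da_i, d/db_i, d/dc_i for each hidden unit i.
    """
    cols = []
    for ai, bi, ci in zip(a, b, c):
        cols.append([2 * ci * ai, 2 * ci * bi, 0])      # derivative in a_i
        cols.append([0, 2 * ci * ai, 2 * ci * bi])      # derivative in b_i
        cols.append([ai * ai, 2 * ai * bi, bi * bi])    # derivative in c_i
    return [[Fraction(col[r]) for col in cols] for r in range(3)]


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Width 1: the image is the set of scaled squares of linear forms.
rank_width_1 = rank(jacobian([1], [3], [1]))
# Width 2: the image fills the 3-dimensional space of binary quadratic forms.
rank_width_2 = rank(jacobian([1, 2], [3, 5], [1, 1]))
```

In this toy case the width-1 network reaches only a 2-dimensional family inside the 3-dimensional space of binary quadratics, while width 2 already fills it, which is the kind of filling behavior the abstract's width bounds concern.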
-
Abstract We consider $$G$$, a linear algebraic group defined over $$\Bbbk$$, an algebraically closed field (ACF). By considering $$\Bbbk$$ as an embedded residue field of an algebraically closed valued field $$K$$, we can associate to it a compact $$G$$-space $$S^\mu_G(\Bbbk)$$ consisting of $$\mu$$-types on $$G$$. We show that for each $$p_\mu \in S^\mu_G(\Bbbk)$$, $$\mathrm{Stab}^\mu(p)=\mathrm{Stab}\left(p_\mu\right)$$ is a solvable infinite algebraic group when $$p_\mu$$ is centered at infinity and residually algebraic. Moreover, we give a description of the dimension of $$\mathrm{Stab}\left(p_\mu\right)$$ in terms of the dimension of $$p$$.
-
Let $$\R$$ be a real closed field and $$\C$$ the algebraic closure of $$\R$$. We give an algorithm for computing a semi-algebraic basis for the first homology group, $$\HH_1(S,\FF)$$, with coefficients in a field $$\FF$$, of any given semi-algebraic set $$S \subset \R^k$$ defined by a closed formula. The complexity of the algorithm is bounded singly exponentially. More precisely, if the given quantifier-free formula involves $$s$$ polynomials whose degrees are bounded by $$d$$, the complexity of the algorithm is bounded by $$(s d)^{k^{O(1)}}$$. This algorithm generalizes well-known algorithms having singly exponential complexity for computing a semi-algebraic basis of the zeroth homology group of semi-algebraic sets, which is equivalent to the problem of computing a set of points meeting every semi-algebraically connected component of the given semi-algebraic set at a unique point. It is not known how to compute such a basis for the higher homology groups with singly exponential complexity. As an intermediate step in our algorithm we construct a semi-algebraic subset $$\Gamma$$ of the given semi-algebraic set $$S$$, such that $$\HH_q(S,\Gamma) = 0$$ for $$q=0,1$$. We relate this construction to a basic theorem in complex algebraic geometry stating that for any affine variety $$X$$ of dimension $$n$$, there exist Zariski closed subsets \[ Z^{(n-1)} \supset \cdots \supset Z^{(1)} \supset Z^{(0)} \] with $$\dim_\C Z^{(i)} \leq i$$, and $$\HH_q(X,Z^{(i)}) = 0$$ for $$0 \leq q \leq i$$. We conjecture a quantitative version of this result in the semi-algebraic category, with $$X$$ and $$Z^{(i)}$$ replaced by closed semi-algebraic sets. We make initial progress on this conjecture by proving the existence of $$Z^{(0)}$$ and $$Z^{(1)}$$ with complexity bounded singly exponentially (previously, such an algorithm was known only for constructing $$Z^{(0)}$$).

