skip to main content


Title: Tropical Geometric Variation of Tree Shapes
Abstract

We study the behavior of phylogenetic tree shapes in the tropical geometric interpretation of tree space. Tree shapes are formally referred to as tree topologies; a tree topology can also be thought of as a tree combinatorial type, which is given by the tree’s branching configuration and leaf labeling. We use the tropical line segment as a framework to define notions of variance as well as invariance of tree topologies: we provide a combinatorial search theorem that describes all tree topologies occurring along a tropical line segment, as well as a setting under which tree topologies do not change along a tropical line segment. Our study is motivated by comparison to the moduli space endowed with a geodesic metric proposed by Billera, Holmes, and Vogtmann (referred to as BHV space); we consider the tropical geometric setting as an alternative framework to BHV space for sets of phylogenetic trees. We give an algorithm to compute tropical line segments which is lower in computational complexity than the fastest method currently available for BHV geodesics and show that its trajectory behaves more subtly: while the BHV geodesic traverses the origin for vastly different tree topologies, the tropical line segment bypasses it.

 
more » « less
NSF-PAR ID:
10371454
Author(s) / Creator(s):
; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Discrete & Computational Geometry
Volume:
68
Issue:
3
ISSN:
0179-5376
Page Range / eLocation ID:
p. 817-849
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study how to optimize the latent space of neural shape generators that map latent codes to 3D deformable shapes. The key focus is to look at a deformable shape generator from a differential geometry perspective. We define a Riemannian metric based on as-rigid-as-possible and as-conformal-as-possible deformation energies. Under this metric, we study two desired properties of the latent space: 1) straight-line interpolations in latent codes follow geodesic curves; 2) latent codes disentangle pose and shape variations at different scales. Strictly enforcing the geometric interpolation property, however, only applies if the metric matrix is a constant. We show how to achieve this property approximately by enforcing that geodesic interpolations are axis-aligned, i.e., interpolations along coordinate axis follow geodesic curves. In addition, we introduce a novel approach that decouples pose and shape variations via generalized eigendecomposition. We also study efficient regularization terms for learning deformable shape generators, e.g., that promote smooth interpolations. Experimental results on benchmark datasets show that our approach leads to interpretable latent codes, improves the generalizability of synthetic shapes, and enhances performance in geodesic interpolation and geodesic shooting.

     
    more » « less
  2. Abstract

    The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators.

     
    more » « less
  3. Abstract

    Bayesian Markov chain Monte Carlo explores tree space slowly, in part because it frequently returns to the same tree topology. An alternative strategy would be to explore tree space systematically, and never return to the same topology. In this article, we present an efficient parallelized method to map out the high likelihood set of phylogenetic tree topologies via systematic search, which we show to be a good approximation of the high posterior set of tree topologies on the data sets analyzed. Here, “likelihood” of a topology refers to the tree likelihood for the corresponding tree with optimized branch lengths. We call this method “phylogenetic topographer” (PT). The PT strategy is very simple: starting in a number of local topology maxima (obtained by hill-climbing from random starting points), explore out using local topology rearrangements, only continuing through topologies that are better than some likelihood threshold below the best observed topology. We show that the normalized topology likelihoods are a useful proxy for the Bayesian posterior probability of those topologies. By using a nonblocking hash table keyed on unique representations of tree topologies, we avoid visiting topologies more than once across all concurrent threads exploring tree space. We demonstrate that PT can be used directly to approximate a Bayesian consensus tree topology. When combined with an accurate means of evaluating per-topology marginal likelihoods, PT gives an alternative procedure for obtaining Bayesian posterior distributions on phylogenetic tree topologies.

     
    more » « less
  4. Abstract

    The genusLiriomyzaMik (Diptera: Agromyzidae) is a diverse and globally distributed group of acalyptrate flies. Phylogenetic relationships amongLiriomyzaspecies have remained incompletely investigated and have never been fully addressed using molecular data. Here, we reconstruct the phylogeny of the genusLiriomyzausing various phylogenetic methods (maximum likelihood, Bayesian inference, and gene tree coalescence) on target‐capture‐based phylogenomic datasets (nucleotides and amino acids) obtained from anchored hybrid enrichment (AHE). We have recovered tree topologies that are nearly congruent across all data types and methods, and individual clade support is strong across all phylogenetic analyses. Moreover, defined morphological species groups and clades are well‐supported in our best estimates of the molecular phylogeny.Liriomyza violivora(Spencer) is a sister group to all remaining sampledLiriomyzaspecies, and the well‐known polyphagous vegetable pests [L. huidobrensis(Blanchard),L. langeiFrick,L. bryoniae.(Kaltenbach),L. trifolii(Burgess),L. sativaeBlanchard, andL. brassicae(Riley)]. belong to multiple clades that are not particularly closely related on the trees. Often, closely relatedLiriomyzaspecies feed on distantly related host plants. We reject the hypothesis that cophylogenetic processes betweenLiriomyzaspecies and their host plants drive diversification in this genus. Instead,Liriomyzaexhibits a widespread pattern of major host shifts across plant taxa. Our new phylogenetic estimate forLiriomyzaspecies provides considerable new information on the evolution of host‐use patterns in this genus. In addition, it provides a framework for further study of the morphology, ecology, and diversification of these important flies.

     
    more » « less
  5. Abstract

    The hemipteran suborder Auchenorrhyncha is a highly diverse, ecologically and agriculturally important group of primarily phytophagous insects which has been a source of phylogenetic contention for many years. Here, we have used transcriptome sequencing to assemble 2139 orthologues from 84 auchenorrhynchan species representing 27 families; this is the largest and most taxonomically comprehensive phylogenetic dataset for this group to date. We used both maximum likelihood and multispecies coalescent analyses to reconstruct the evolutionary history in this group using amino acid, nucleotide, and degeneracy‐coded nucleotide orthologue data. Although many relationships at the superfamily level were consistent between analyses, several differing, highly supported topologies were recovered using different datasets and reconstruction methods, most notably the differential placement of Cercopoidea as sister to either Cicadoidea or Membracoidea. To further interrogate the recovered topologies, we explored the contribution of genes as partitioned by third‐codon‐position guanine‐cytosine (GC) content and heterogeneity. We found consistent support for several relationships, including Cercopoidea + Cicadoidea, most often in genes that would be expected to be enriched for the true species tree if recombination‐based dynamics in GC content have contributed to the observed GC heterogeneity. Our results provide a generally well‐supported framework for future studies of auchenorrhynchan phylogeny and suggest that transcriptome sequencing is likely to be a fruitful source of phylogenetic data for resolving its clades. However, we caution that future work should account for the potential effects of GC content heterogeneity on relationships recovered in this group.

     
    more » « less