skip to main content

Title: Comparing embedded graphs using average branching distance
Graphs drawn in the plane are ubiquitous, arising from data sets through a variety of methods ranging from GIS analysis to image classification to shape analysis. A fundamental problem in this type of data is comparison: given a set of such graphs, can we rank how similar they are in such a way that we capture their geometric “shape” in the plane? We explore a method to compare two such embedded graphs, via a simplified combinatorial representation called a tail-less merge tree which encodes the structure based on a fixed direction. First, we examine the properties of a distance designed to compare merge trees called the branching distance, and show that the distance as defined in previous work fails to satisfy some of the requirements of a metric. We incorporate this into a new distance function called average branching distance to compare graphs by looking at the branching distance for merge trees defined over many directions. Despite the theoretical issues, we show that the definition is still quite useful in practice by using our open-source code to cluster data sets of embedded graphs.  more » « less
Award ID(s):
1907591 1907612
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Involve: A Journal of Mathematics
Date Published:
Journal Name:
Involve, a Journal of Mathematics
Page Range / eLocation ID:
365 to 388
Subject(s) / Keyword(s):
["topological data analysis, merge tree, embedded graph"]
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Applications in data science, shape analysis, and object classification frequently require comparison of probability distributions defined on different ambient spaces. To accomplish this, one requires a notion of distance on a given class of metric measure spaces—that is, compact metric spaces endowed with probability measures. Such distances are typically defined as comparisons between metric measure space invariants, such as distance distributions (also referred to as shape distributions, distance histograms, or shape contexts in the literature). Generally, distances defined in terms of distance distributions are actually pseudometrics, in that they may vanish when comparing nonisomorphic spaces. The goal of this paper is to set up a formal framework for assessing the discrimininative power of distance distributions, that is, the extent to which these pseudometrics fail to define proper metrics. We formulate several precise inverse problems in terms of these invariants and answer them in several categories of metric measure spaces, including the category of plane curves, where we give a counterexample to the curve histogram conjecture of Brinkman and Olver, the categories of embedded and Riemannian manifolds, where we obtain sphere rigidity results, and the category of metric graphs, where we obtain a local injectivity result along the lines of classical work of Boutin and Kemper on point cloud configurations. The inverse problems are further contextualized by the introduction of a variant of the Gromov–Wasserstein distance on the space of metric measure spaces, which is inspired by the original Monge formulation of optimal transport.

    more » « less
  2. For the past several centuries, millipede taxonomists have used the morphology of male copulatory structures (modified legs called gonopods), which are strongly variable and suggestive of species-level differences, as a source to understand taxon relationships. Millipedes in the family Xystodesmidae are blind, dispersal-limited and have narrow habitat requirements. Therefore, geographical proximity may instead be a better predictor of evolutionary relationship than morphology, especially since gonopodal anatomy is extremely divergent and similarities may be masked by evolutionary convergence. Here we provide a phylogenetics-based test of the power of morphological versus geographical character sets for resolving phylogenetic relationships in xystodesmid millipedes. Molecular data from 90 species-group taxa in the family were included in a six-gene phylogenetic analysis to provide the basis for comparing trees generated from these alternative character sets. The molecular phylogeny was compared to topologies representing three hypotheses: (1) a prior classification formulated using morphological and geographical data, (2) hierarchical groupings derived from Euclidean geographical distance, and (3) one based solely on morphological data. Euclidean geographical distance was not found to be a better predictor of evolutionary relationship than the prior classification, the latter of which was the most similar to the molecular topology. However, all three of the alternative topologies were highly divergent (Bayes factor >10) from the molecular topology, with the tree inferred exclusively from morphology being the most divergent. The results of this analysis show that a high degree of morphological convergence from substantial gonopod shape divergence generated spurious phylogenetic relationships. These results indicate the impact that a high degree of morphological homoplasy may have had on prior treatments of the family. Using the results of our phylogenetic analysis, we make several changes to the classification of the family, including transferring the rare state-threatened species Sigmoria whiteheadi Shelley, 1986 to the genus Apheloria Chamberlin, 1921—a relationship not readily apparent based on morphology alone. We show that while gonopod differences are a premier source of taxonomic characters to diagnose species pairwise, the traits should be viewed critically as taxonomic features uniting higher levels. 
    more » « less
  3. Abstract

    Graphs in metric spaces appear in a wide range of data sets, and there is a large body of work focused on comparing, matching, or analyzing collections of graphs in different ambient spaces. In this survey, we provide an overview of a diverse collection of distance measures that can be defined on the set of finite graphs immersed (and in some cases, embedded) in a metric space. For each of the distance measures, we recall their definitions and investigate which of the properties of a metric they satisfy. Furthermore we compare the distance measures based on these properties and discuss their computational complexity.

    more » « less
  4. null (Ed.)
    Ribonucleic acid (RNA) secondary structures and branching properties are important for determining functional ramifications in biology. While energy minimization of the Nearest Neighbor Thermodynamic Model (NNTM) is commonly used to identify such properties (number of hairpins, maximum ladder distance, etc.), it is difficult to know whether the resultant values fall within expected dispersion thresholds for a given energy function. The goal of this study was to construct a Markov chain capable of examining the dispersion of RNA secondary structures and branching properties obtained from NNTM energy function minimization independent of a specific nucleotide sequence. Plane trees are studied as a model for RNA secondary structure, with energy assigned to each tree based on the NNTM, and a corresponding Gibbs distribution is defined on the trees. Through a bijection between plane trees and 2-Motzkin paths, a Markov chain converging to the Gibbs distribution is constructed, and fast mixing time is established by estimating the spectral gap of the chain. The spectral gap estimate is obtained through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. The resulting algorithm can be used as a tool for exploring the branching structure of RNA, especially for long sequences, and to examine branching structure dependence on energy model parameters. Full exposition is provided for the mathematical techniques used with the expectation that these techniques will prove useful in bioinformatics, computational biology, and additional extended applications. 
    more » « less
  5. Classical statistical mechanics has long relied on assumptions such as the equipartition theorem to understand the behavior of the complicated systems of many particles. The successes of this approach are well known, but there are also many well-known issues with classical theories. For some of these, the introduction of quantum mechanics is necessary, e.g., the ultraviolet catastrophe. However, more recently, the validity of assumptions such as the equipartition of energy in classical systems was called into question. For instance, a detailed analysis of a simplified model for blackbody radiation was apparently able to deduce the Stefan–Boltzmann law using purely classical statistical mechanics. This novel approach involved a careful analysis of a “metastable” state which greatly delays the approach to equilibrium. In this paper, we perform a broad analysis of such a metastable state in the classical Fermi–Pasta–Ulam–Tsingou (FPUT) models. We treat both the α-FPUT and β-FPUT models, exploring both quantitative and qualitative behavior. After introducing the models, we validate our methodology by reproducing the well-known FPUT recurrences in both models and confirming earlier results on how the strength of the recurrences depends on a single system parameter. We establish that the metastable state in the FPUT models can be defined by using a single degree-of-freedom measure—the spectral entropy (η)—and show that this measure has the power to quantify the distance from equipartition. For the α-FPUT model, a comparison to the integrable Toda lattice allows us to define rather clearly the lifetime of the metastable state for the standard initial conditions. We next devise a method to measure the lifetime of the metastable state tm in the α-FPUT model that reduces the sensitivity to the exact initial conditions. Our procedure involves averaging over random initial phases in the plane of initial conditions, the P1-Q1 plane. Applying this procedure gives us a power-law scaling for tm, with the important result that the power laws for different system sizes collapse down to the same exponent as Eα2→0. We examine the energy spectrum E(k) over time in the α-FPUT model and again compare the results to those of the Toda model. This analysis tentatively supports a method for an irreversible energy dissipation process suggested by Onorato et al.: four-wave and six-wave resonances as described by the “wave turbulence” theory. We next apply a similar approach to the β-FPUT model. Here, we explore in particular the different behavior for the two different signs of β. Finally, we describe a procedure for calculating tm in the β-FPUT model, a very different task than for the α-FPUT model, because the β-FPUT model is not a truncation of an integrable nonlinear model. 
    more » « less