Graphs in metric spaces appear in a wide range of data sets, and there is a large body of work focused on comparing, matching, or analyzing collections of graphs in different ambient spaces. In this survey, we provide an overview of a diverse collection of distance measures that can be defined on the set of finite graphs immersed (and in some cases, embedded) in a metric space. For each distance measure, we recall its definition and investigate which properties of a metric it satisfies. Furthermore, we compare the distance measures based on these properties and discuss their computational complexity.
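To make the question of metric properties concrete, here is a minimal sketch (not taken from the survey) of one simple distance one might define on straight-line graphs drawn in the plane: the Hausdorff distance between their images, approximated by sampling points densely along the edges. The helper names `sample_graph` and `hausdorff` and the sampling `step` are illustrative assumptions.

```python
import numpy as np

def sample_graph(vertices, edges, step=0.01):
    """Densely sample points along the straight-line edges of a drawn graph.

    vertices: (n, d) coordinates; edges: list of (i, j) vertex-index pairs.
    Vertices are included so that isolated vertices are not lost.
    """
    vertices = np.asarray(vertices, dtype=float)
    samples = [vertices]
    for i, j in edges:
        p, q = vertices[i], vertices[j]
        n = max(2, int(np.ceil(np.linalg.norm(q - p) / step)) + 1)
        t = np.linspace(0.0, 1.0, n)[:, None]
        samples.append((1 - t) * p + t * q)  # points on the segment p -> q
    return np.vstack(samples)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(D.min(axis=1).max(), D.min(axis=0).max())
```

Because this distance only sees the drawn point set, a single edge and the same edge subdivided at its midpoint are at distance 0 even though they are different graphs, so on graphs it is at best a pseudometric. Properties of exactly this kind are what the survey examines for each measure.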

Abstract 
Societal Impact Statement
Citrus are intrinsically connected to human health and culture, preventing human diseases like scurvy and inspiring sacred rituals. Citrus fruits come in a stunning number of different sizes and shapes, ranging from small clementines to oversized pummelos, and fruits display a vast diversity of flavors and aromas. These qualities are key in both traditional and modern medicine and in the production of cleaning and perfume products. By quantifying and modeling overall fruit shape and oil gland distribution, we can gain further insight into citrus development and the impacts of domestication and improvement on multiple characteristics of the fruit.
Summary
Citrus come in diverse sizes and shapes and play a key role in world culture and economy. Citrus oil glands in particular contain essential oils, which include plant secondary metabolites associated with flavor and aroma. Capturing and analyzing the nuanced information in citrus fruit shape and oil gland distribution provides a morphology‐driven path to further our insight into phenotype–genotype interactions.
We investigated the shape of citrus fruit of 51 accessions based on 3D X‐ray computed tomography (CT) scan reconstructions. Accessions include members of the three ancestral citrus species as well as related genera, and several interspecific hybrids. We digitally separate and compare the size of fruit endocarp, mesocarp, exocarp, and oil gland tissue. Based on the centers of the oil glands, overall fruit shape is approximated with an ellipsoid. Possible oil gland distributions on this ellipsoid surface are explored using directional statistics.
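The ellipsoid-fitting procedure is not spelled out here. As a simplified, hypothetical sketch, the following assumes the ellipsoid is centered at the centroid of the gland centers and has its axes aligned with the coordinate axes, which turns the fit into a linear least-squares problem in $(1/a^2, 1/b^2, 1/c^2)$. This is not necessarily the method used in the study; a general fit would also estimate the orientation and would not tie the center to the centroid.

```python
import numpy as np

def fit_axis_aligned_ellipsoid(points):
    """Least-squares fit of x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 after centering.

    Simplifying assumptions (hypothetical, for illustration): the center is the
    centroid of the points and the axes are the coordinate axes.
    Returns (center, semi_axes).
    """
    P = np.asarray(points, dtype=float)
    center = P.mean(axis=0)
    Q = P - center
    # Each point gives one linear equation in the unknowns (1/a^2, 1/b^2, 1/c^2).
    coeffs, *_ = np.linalg.lstsq(Q**2, np.ones(len(Q)), rcond=None)
    semi_axes = 1.0 / np.sqrt(coeffs)
    return center, semi_axes
```

Ratios of the fitted semi-axes give a compact summary of elongation; the gland centers can then be studied relative to the fitted surface.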
There is a strong allometry along fruit tissues; that is, we observe a strong linear relationship between the logarithmic volume of any pair of major tissues. This suggests that the relative growth of fruit tissues with respect to each other follows a power law. We also observe that on average, glands distance themselves from their nearest neighbor following a square root relationship, which suggests normal diffusion dynamics at play.
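The step from "linear in log-log" to "power law" is a one-line identity: if $\log V_2 = \log k + \alpha \log V_1$, then $V_2 = k V_1^{\alpha}$. Below is a minimal sketch of estimating $\alpha$ and $k$ by ordinary least squares on the logarithms; the function name is hypothetical, and any numbers fed to it in an example are synthetic, not data from the study. Under the same fit, a square-root relationship corresponds to $\alpha = 1/2$.

```python
import numpy as np

def power_law_fit(x, y):
    """Fit y = k * x**alpha via least squares on log y = log k + alpha * log x.

    x and y must be positive. Returns (alpha, k).
    """
    alpha, log_k = np.polyfit(np.log(x), np.log(y), 1)  # slope, intercept
    return alpha, np.exp(log_k)
```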
The observed allometry and square root models point to the existence of biophysical developmental constraints that govern novel relationships between fruit dimensions from both evolutionary and breeding perspectives. Understanding these biophysical interactions prompts an exciting research path on fruit development and breeding.

Abstract
Shape is data and data is shape. Biologists are accustomed to thinking about how the shape of biomolecules, cells, tissues, and organisms arise from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features—connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub‐disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data are shape and shape is data, will be relevant as biology transitions into a data‐driven era where the meaningful interpretation of large data sets is a limiting factor.
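For readers new to TDA, here is a minimal sketch of tracking one kind of topological feature, connected components, as a similarity scale grows. It assumes Euclidean distance between points and uses the pairwise distance itself as the scale (conventions differ; some use half the distance). Every point is born at scale 0, and each merge of two components ends one bar at the merge scale. Loops and voids need higher-dimensional simplices and are omitted; dedicated TDA libraries handle those cases. The function name is hypothetical.

```python
import itertools
import math

def h0_persistence(points):
    """0-dimensional persistence (connected components) of a point cloud.

    Edges are added in order of length; whenever an edge joins two different
    components, one bar (0, r) ends at that length r. One bar never dies.
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
    )
    deaths = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # two components merge at scale r
            parent[ri] = rj
            deaths.append(r)
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]
```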

Graphs drawn in the plane are ubiquitous, arising from data sets through a variety of methods ranging from GIS analysis to image classification to shape analysis. A fundamental problem in this type of data is comparison: given a set of such graphs, can we rank how similar they are in such a way that we capture their geometric “shape” in the plane? We explore a method to compare two such embedded graphs via a simplified combinatorial representation called a tailless merge tree, which encodes the structure based on a fixed direction. First, we examine the properties of a distance designed to compare merge trees, called the branching distance, and show that the distance as defined in previous work fails to satisfy some of the requirements of a metric. We then build on it to define a new distance function, called the average branching distance, which compares graphs by looking at the branching distance for merge trees defined over many directions. Despite the theoretical issues, we show that the definition is still quite useful in practice by using our open-source code to cluster data sets of embedded graphs.
Free, publicly accessible full text available January 1, 2024.
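The tailless merge tree and the branching distance are defined in the paper and are not reproduced here. As a hedged illustration of reading an embedded graph "based on a fixed direction", the sketch below computes, for a direction $u$, the height $h(v) = \langle v, u/\lVert u \rVert \rangle$ of each vertex and the birth/death pairs at which components of the sublevel sets merge (0-dimensional persistence of the lower-star filtration). These pairs record the merge events a merge tree encodes, but not its full structure. The function name is hypothetical; repeating the computation over many directions loosely mirrors the idea of looking across directions.

```python
import numpy as np

def directional_merge_pairs(vertices, edges, direction):
    """Birth/death pairs of sublevel-set components of a directional height.

    Components are born at local minima; when two merge, the younger one
    (later birth) dies at the current height (elder rule).
    """
    V = np.asarray(vertices, dtype=float)
    u = np.asarray(direction, dtype=float)
    h = V @ (u / np.linalg.norm(u))
    nbrs = {i: [] for i in range(len(V))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    parent, birth = {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for v in sorted(range(len(V)), key=lambda i: h[i]):
        parent[v], birth[v] = v, h[v]
        for w in nbrs[v]:
            if w not in parent:  # neighbor is not yet in the sublevel set
                continue
            rv, rw = find(v), find(w)
            if rv == rw:
                continue
            young, old = (rv, rw) if birth[rv] > birth[rw] else (rw, rv)
            if birth[young] < h[v]:  # skip zero-length pairs
                pairs.append((birth[young], h[v]))
            parent[young] = old
    pairs += [(birth[r], np.inf) for r in {find(i) for i in parent}]
    return sorted(pairs)
```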

Buchin, Kevin; Colin de Verdière (Eds.)
In this paper, we introduce an extension of smoothing on Reeb graphs, which we call truncated smoothing; this in turn allows us to define a new family of metrics that generalize the interleaving distance for Reeb graphs. Intuitively, we "chop off" parts near local minima and maxima during the course of smoothing, where the amount cut is controlled by a parameter τ. After formalizing truncation as a functor, we show that applying it after the smoothing functor prevents extensive expansion of the range of the function and yields particularly nice properties (such as maintaining connectivity) when combined with smoothing for 0 ≤ τ ≤ 2ε, where ε is the smoothing parameter. Then, for the restriction τ ∈ [0,ε], we have additional structure that we can take advantage of to construct a categorical flow for any choice of slope m ∈ [0,1]. The infrastructure built for a category with a flow then gives an interleaving distance for every m ∈ [0,1], generalizing the original interleaving distance, which is the case m = 0. While the resulting metrics are not stable, we show that any pair of these for m, m' ∈ [0,1) are strongly equivalent metrics, which in turn gives stability of each metric up to a multiplicative constant. We conclude by discussing implications of these metrics within the broader family of metrics for Reeb graphs.

Abstract
We study the probabilistic convergence between the mapper graph and the Reeb graph of a topological space $\mathbb{X}$ equipped with a continuous function $f: \mathbb{X} \rightarrow \mathbb{R}$. We first give a categorification of the mapper graph and the Reeb graph by interpreting them in terms of cosheaves and stratified covers of the real line $\mathbb{R}$. We then introduce a variant of the classic mapper graph of Singh et al. (in: Eurographics symposium on point-based graphics, 2007), referred to as the enhanced mapper graph, and demonstrate that such a construction approximates the Reeb graph of $(\mathbb{X}, f)$ when it is applied to points randomly sampled from a probability density function concentrated on $(\mathbb{X}, f)$. Our techniques are based on the interleaving distance of constructible cosheaves and topological estimation via kernel density estimates. Following Munch and Wang (in: 32nd international symposium on computational geometry, volume 51 of Leibniz international proceedings in informatics (LIPIcs), Dagstuhl, Germany, pp 53:1–53:16, 2016), we first show that the mapper graph of $(\mathbb{X}, f)$, a constructible $\mathbb{R}$-space (with a fixed open cover), approximates the Reeb graph of the same space. We then construct an isomorphism between the mapper of $(\mathbb{X}, f)$ and the mapper of a superlevel set of a probability density function concentrated on $(\mathbb{X}, f)$. Finally, building on the approach of Bobrowski et al. (Bernoulli 23(1):288–328, 2017b), we show that, with high probability, we can recover the mapper of the superlevel set given a sufficiently large sample. Our work is the first to consider the mapper construction using the theory of cosheaves in a probabilistic setting.
It is part of an ongoing effort to combine sheaf theory, probability, and statistics to support topological data analysis with random data.
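For orientation, here is a minimal sketch of the classic mapper graph with a real-valued lens; it is not the enhanced mapper graph or the cosheaf formulation of the paper. It covers the range of the lens by overlapping closed intervals, clusters each preimage by single linkage at a threshold, and joins clusters that share points. The function name and the parameters `n_intervals`, `overlap`, and `eps` are illustrative assumptions.

```python
import itertools
import numpy as np

def mapper_graph(points, lens, n_intervals=6, overlap=0.3, eps=0.3):
    """Classic 1D mapper: cover the lens range, cluster preimages, take the nerve.

    Each preimage is clustered into connected components of its
    eps-neighborhood graph (single linkage). Returns (nodes, edges), where
    nodes are frozensets of point indices and edges join clusters sharing points.
    """
    X = np.asarray(points, dtype=float)
    f = np.asarray(lens, dtype=float)
    lo, hi = f.min(), f.max()
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1.0 - overlap)
    nodes = []
    for k in range(n_intervals):
        start = lo + k * step
        in_bin = np.flatnonzero((f >= start - 1e-12) & (f <= start + length + 1e-12))
        unvisited = set(in_bin.tolist())
        while unvisited:  # breadth-first growth of one cluster at a time
            seed = unvisited.pop()
            cluster, frontier = {seed}, [seed]
            while frontier:
                i = frontier.pop()
                near = [j for j in unvisited if np.linalg.norm(X[i] - X[j]) <= eps]
                for j in near:
                    unvisited.remove(j)
                    cluster.add(j)
                    frontier.append(j)
            nodes.append(frozenset(cluster))
    edges = [(a, b) for a, b in itertools.combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges
```

On 100 evenly spaced points of the unit circle with the height function as lens, these defaults give ten clusters joined in a single cycle, matching the one loop of the Reeb graph of the circle with its height function.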

Bifurcations in dynamical systems characterize qualitative changes in the system behavior. Their detection is therefore important, because they can signal the transition from normal system operation to imminent failure. In an experimental setting, this transition could lead to incorrect data or damage to the entire experiment. While standard persistent homology has been used in this setting, it usually requires analyzing a collection of persistence diagrams, which in turn drives up the computational cost considerably. Using zigzag persistence, we can capture topological changes in the state space of the dynamical system in only one persistence diagram. Here, we present Bifurcations using ZigZag (BuZZ), a one-step method to study and detect bifurcations using zigzag persistence. The BuZZ method successfully detects bifurcations in two synthetic examples as well as an example dynamical system.

As the field of Topological Data Analysis continues to show success in theory and in applications, there has been increasing interest in combining tools from this field with machine learning methods. Using persistent homology, specifically persistence diagrams, as inputs to machine learning techniques requires some mathematical creativity. The space of persistence diagrams lacks properties desirable for machine learning, so methods such as kernel methods and vectorization methods have been developed. One such featurization of persistence diagrams, by Perea, Munch and Khasawneh, uses continuous, compactly supported functions, referred to as "template functions," which result in a stable vector representation of the persistence diagram. In this paper, we provide a method of adaptively partitioning persistence diagrams to improve these featurizations based on localized information in the diagrams. Additionally, we provide a framework to adaptively select the parameters required for the template functions in order to best utilize the partitioning method. We present results on example data sets, comparing classification results of template function featurizations with and without partitioning, as well as other methods from the literature.
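As a reference point for the unpartitioned baseline, here is a minimal sketch of tent template functions as we understand them from Perea, Munch and Khasawneh: each diagram point is moved to (birth, lifetime) coordinates $(x, y)$, and each grid center $(a, b)$ with width $\delta$ contributes the feature $\sum \max\bigl(0,\, 1 - \max(|x-a|, |y-b|)/\delta\bigr)$ summed over diagram points. The adaptive partitioning and parameter selection of this paper are not implemented; the function name and the choice of grid are assumptions.

```python
import numpy as np

def tent_features(diagram, centers_birth, centers_lifetime, delta):
    """Tent template-function features of a persistence diagram on a fixed grid.

    diagram: iterable of (birth, death) pairs with finite deaths.
    Returns one feature per (birth, lifetime) grid center, flattened row-major
    with the lifetime index varying fastest.
    """
    D = np.asarray(diagram, dtype=float).reshape(-1, 2)
    x, y = D[:, 0], D[:, 1] - D[:, 0]  # birth, lifetime
    A, B = np.meshgrid(centers_birth, centers_lifetime, indexing="ij")
    # Chebyshev (max-norm) distance from every grid center to every point.
    dist = np.maximum(np.abs(x[None, None, :] - A[..., None]),
                      np.abs(y[None, None, :] - B[..., None]))
    return np.clip(1.0 - dist / delta, 0.0, None).sum(axis=-1).ravel()
```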

Chatter detection has become a prominent subject of interest due to its effect on cutting tool life, surface finish, and the machine tool spindle. Most existing methods in the chatter detection literature are based on signal processing and signal decomposition. In this study, we use topological features of data simulating cutting tool vibrations, combined with four supervised machine learning algorithms, to diagnose chatter in the milling process. Persistence diagrams, a method of representing topological features, are not easily used in the context of machine learning, so they must be transformed into a more amenable form. Specifically, we focus on two different methods for featurizing persistence diagrams: Carlsson coordinates and template functions. In this paper, we provide classification results for simulated data from various cutting configurations, including up-milling and down-milling, in addition to the same data with added noise. Our results show that Carlsson coordinates and template functions yield accuracies as high as 96% and 95%, respectively. We also provide evidence that these topological methods are noise-robust descriptors for chatter detection.
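Below is a minimal sketch of four polynomial coordinates commonly attributed to Adcock, Carlsson and Carlsson, computed from births $b_i$, deaths $d_i$, lifetimes $d_i - b_i$, and $d_{\max} = \max_i d_i$. Whether the paper uses exactly these four (some implementations append further features) is an assumption, and points with infinite death should be removed before calling it. The function name is hypothetical.

```python
import numpy as np

def carlsson_coordinates(diagram):
    """Four polynomial features of a persistence diagram with finite deaths.

    f1 = sum b*l, f2 = sum (d_max - d)*l,
    f3 = sum b**2 * l**4, f4 = sum (d_max - d)**2 * l**4, where l = d - b.
    """
    D = np.asarray(diagram, dtype=float).reshape(-1, 2)
    if len(D) == 0:
        return np.zeros(4)
    b, d = D[:, 0], D[:, 1]
    life = d - b
    dmax = d.max()
    return np.array([
        np.sum(b * life),
        np.sum((dmax - d) * life),
        np.sum(b**2 * life**4),
        np.sum((dmax - d)**2 * life**4),
    ])
```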