Graphs in metric spaces appear in a wide range of data sets, and there is a large body of work focused on comparing, matching, or analyzing collections of graphs in different ambient spaces. In this survey, we provide an overview of a diverse collection of distance measures that can be defined on the set of finite graphs immersed (and in some cases, embedded) in a metric space. For each of the distance measures, we recall their definitions and investigate which of the properties of a metric they satisfy. Furthermore we compare the distance measures based on these properties and discuss their computational complexity.
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract -
Societal Impact Statement Citrus are intrinsically connected to human health and culture, preventing human diseases like scurvy and inspiring sacred rituals. Citrus fruits come in a stunning number of different sizes and shapes, ranging from small clementines to oversized pummelos, and fruits display a vast diversity of flavors and aromas. These qualities are key in both traditional and modern medicine and in the production of cleaning and perfume products. By quantifying and modeling overall fruit shape and oil gland distribution, we can gain further insight into citrus development and the impacts of domestication and improvement on multiple characteristics of the fruit.
Summary Citrus come in diverse sizes and shapes, and play a key role in world culture and economy. Citrus oil glands in particular contain essential oils which include plant secondary metabolites associated with flavor and aroma. Capturing and analyzing nuanced information behind the citrus fruit shape and its oil gland distribution provide a morphology‐driven path to further our insight into phenotype–genotype interactions.
We investigated the shape of citrus fruit of 51 accessions based on 3D X‐ray computed tomography (CT) scan reconstructions. Accessions include members of the three ancestral citrus species as well as related genera, and several interspecific hybrids. We digitally separate and compare the size of fruit endocarp, mesocarp, exocarp, and oil gland tissue. Based on the centers of the oil glands, overall fruit shape is approximated with an ellipsoid. Possible oil gland distributions on this ellipsoid surface are explored using directional statistics.
There is a strong allometry along fruit tissues; that is, we observe a strong linear relationship between the logarithmic volume of any pair of major tissues. This suggests that the relative growth of fruit tissues with respect to each other follows a power law. We also observe that on average, glands distance themselves from their nearest neighbor following a square root relationship, which suggests normal diffusion dynamics at play.
The observed allometry and square root models point to the existence of biophysical developmental constraints that govern novel relationships between fruit dimensions from both evolutionary and breeding perspectives. Understanding these biophysical interactions prompts an exciting research path on fruit development and breeding.
-
Abstract Shape is data and data is shape. Biologists are accustomed to thinking about how the shape of biomolecules, cells, tissues, and organisms arise from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features—connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub‐disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data are shape and shape is data, will be relevant as biology transitions into a data‐driven era where the meaningful interpretation of large data sets is a limiting factor.
-
Graphs drawn in the plane are ubiquitous, arising from data sets through a variety of methods ranging from GIS analysis to image classification to shape analysis. A fundamental problem in this type of data is comparison: given a set of such graphs, can we rank how similar they are in such a way that we capture their geometric “shape” in the plane? We explore a method to compare two such embedded graphs, via a simplified combinatorial representation called a tail-less merge tree which encodes the structure based on a fixed direction. First, we examine the properties of a distance designed to compare merge trees called the branching distance, and show that the distance as defined in previous work fails to satisfy some of the requirements of a metric. We incorporate this into a new distance function called average branching distance to compare graphs by looking at the branching distance for merge trees defined over many directions. Despite the theoretical issues, we show that the definition is still quite useful in practice by using our open-source code to cluster data sets of embedded graphs.more » « less
-
Chen, Tsu-Wei ; Long, Stephen P (Ed.)Abstract Shape plays a fundamental role in biology. Traditional phenotypic analysis methods measure some features but fail to measure the information embedded in shape comprehensively. To extract, compare and analyse this information embedded in a robust and concise way, we turn to topological data analysis (TDA), specifically the Euler characteristic transform. TDA measures shape comprehensively using mathematical representations based on algebraic topology features. To study its use, we compute both traditional and topological shape descriptors to quantify the morphology of 3121 barley seeds scanned with X-ray computed tomography (CT) technology at 127 μm resolution. The Euler characteristic transform measures shape by analysing topological features of an object at thresholds across a number of directional axes. A Kruskal–Wallis analysis of the information encoded by the topological signature reveals that the Euler characteristic transform picks up successfully the shape of the crease and bottom of the seeds. Moreover, while traditional shape descriptors can cluster the seeds based on their accession, topological shape descriptors can cluster them further based on their panicle. We then successfully train a support vector machine to classify 28 different accessions of barley based exclusively on the shape of their grains. We observe that combining both traditional and topological descriptors classifies barley seeds better than using just traditional descriptors alone. This improvement suggests that TDA is thus a powerful complement to traditional morphometrics to comprehensively describe a multitude of ‘hidden’ shape nuances which are otherwise not detected.more » « less
-
Buchin, Kevin ; Colin de Verdi\` (Ed.)In this paper, we introduce an extension of smoothing on Reeb graphs, which we call truncated smoothing; this in turn allows us to define a new family of metrics which generalize the interleaving distance for Reeb graphs. Intuitively, we "chop off" parts near local minima and maxima during the course of smoothing, where the amount cut is controlled by a parameter τ. After formalizing truncation as a functor, we show that when applied after the smoothing functor, this prevents extensive expansion of the range of the function, and yields particularly nice properties (such as maintaining connectivity) when combined with smoothing for 0 ≤ τ ≤ 2ε, where ε is the smoothing parameter. Then, for the restriction of τ ∈ [0,ε], we have additional structure which we can take advantage of to construct a categorical flow for any choice of slope m ∈ [0,1]. Using the infrastructure built for a category with a flow, this then gives an interleaving distance for every m ∈ [0,1], which is a generalization of the original interleaving distance, which is the case m = 0. While the resulting metrics are not stable, we show that any pair of these for m, m' ∈ [0,1) are strongly equivalent metrics, which in turn gives stability of each metric up to a multiplicative constant. We conclude by discussing implications of this metric within the broader family of metrics for Reeb graphs.more » « less
-
null (Ed.)Abstract We study the probabilistic convergence between the mapper graph and the Reeb graph of a topological space $${\mathbb {X}}$$ X equipped with a continuous function $$f: {\mathbb {X}}\rightarrow \mathbb {R}$$ f : X → R . We first give a categorification of the mapper graph and the Reeb graph by interpreting them in terms of cosheaves and stratified covers of the real line $$\mathbb {R}$$ R . We then introduce a variant of the classic mapper graph of Singh et al. (in: Eurographics symposium on point-based graphics, 2007), referred to as the enhanced mapper graph, and demonstrate that such a construction approximates the Reeb graph of $$({\mathbb {X}}, f)$$ ( X , f ) when it is applied to points randomly sampled from a probability density function concentrated on $$({\mathbb {X}}, f)$$ ( X , f ) . Our techniques are based on the interleaving distance of constructible cosheaves and topological estimation via kernel density estimates. Following Munch and Wang (In: 32nd international symposium on computational geometry, volume 51 of Leibniz international proceedings in informatics (LIPIcs), Dagstuhl, Germany, pp 53:1–53:16, 2016), we first show that the mapper graph of $$({\mathbb {X}}, f)$$ ( X , f ) , a constructible $$\mathbb {R}$$ R -space (with a fixed open cover), approximates the Reeb graph of the same space. We then construct an isomorphism between the mapper of $$({\mathbb {X}},f)$$ ( X , f ) to the mapper of a super-level set of a probability density function concentrated on $$({\mathbb {X}}, f)$$ ( X , f ) . Finally, building on the approach of Bobrowski et al. (Bernoulli 23 (1):288–328, 2017b), we show that, with high probability, we can recover the mapper of the super-level set given a sufficiently large sample. Our work is the first to consider the mapper construction using the theory of cosheaves in a probabilistic setting. It is part of an ongoing effort to combine sheaf theory, probability, and statistics, to support topological data analysis with random data.more » « less
-
null (Ed.)Bifurcations in dynamical systems characterize qualitative changes in the system behavior. Therefore, their detection is important because they can signal the transition from normal system operation to imminent failure. In an experimental setting, this transition could lead to incorrect data or damage to the entire experiment. While standard persistent homology has been used in this setting, it usually requires analyzing a collection of persistence diagrams, which in turn drives up the computational cost considerably. Using zigzag persistence, we can capture topological changes in the state space of the dynamical system in only one persistence diagram. Here, we present Bifurcations using ZigZag (BuZZ), a one-step method to study and detect bifurcations using zigzag persistence. The BuZZ method is successfully able to detect this type of behavior in two synthetic examples as well as an example dynamical system.more » « less
-
As the field of Topological Data Analysis continues to show success in theory and in applications, there has been increasing interest in using tools from this field with methods for machine learning. Using persistent homology, specifically persistence diagrams, as inputs to machine learning techniques requires some mathematical creativity. The space of persistence diagrams does not have the desirable properties for machine learning, thus methods such as kernel methods and vectorization methods have been developed. One such featurization of persistence diagrams by Perea, Munch and Khasawneh uses continuous, compactly supported functions, referred to as "template functions," which results in a stable vector representation of the persistence diagram. In this paper, we provide a method of adaptively partitioning persistence diagrams to improve these featurizations based on localized information in the diagrams. Additionally, we provide a framework to adaptively select parameters required for the template functions in order to best utilize the partitioning method. We present results for application to example data sets comparing classification results between template function featurizations with and without partitioning, in addition to other methods from the literature.more » « less