Title: Computation of the Sample Fréchet Mean for Sets of Large Graphs with Applications to Regression
To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that is adapted to metric spaces, since sets of graphs are not Euclidean spaces. A standard approach is to consider the Fréchet mean. In this work, we equip a set of graphs with the pseudometric defined by the l2 norm between the eigenvalues of their respective adjacency matrices. Unlike the edit distance, this pseudometric reveals structural changes at multiple scales, and is well adapted to studying various statistical problems for graph-valued data. We describe an algorithm to compute an approximation to the sample Fréchet mean of a set of undirected unweighted graphs of a fixed size using this pseudometric.
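The pseudometric itself is simple to state in code. Below is a minimal sketch, assuming graphs arrive as dense symmetric 0/1 adjacency matrices of equal size; the paper's approximation algorithm for the Fréchet mean is not reproduced here, only the distance and the objective that the mean minimizes.

```python
import numpy as np

def spectral_distance(A, B):
    """l2 norm between the adjacency spectra of two graphs of equal size.

    This is a pseudometric, not a metric: cospectral but non-isomorphic
    graphs are at distance zero.
    """
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return np.linalg.norm(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B))

def frechet_objective(G, sample):
    """The sample Frechet mean minimizes this sum of squared distances."""
    return sum(spectral_distance(G, H) ** 2 for H in sample)
```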
Award ID(s):
1815971
PAR ID:
10318701
Author(s) / Creator(s):
Editor(s):
Banerjee, Arindam; Zhou, Zhi-Hua
Date Published:
Journal Name:
SIAM International Conference on Data Mining (SDM 2022)
ISBN:
978-1-611977-17-2
Page Range / eLocation ID:
379-387
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Benito, Rosa Maria; Cherifi, Chantal; Cherifi, Hocine; Moro, Esteban; Rocha, Luis M. (Ed.)
    To characterize the “average” of a set of graphs, one can compute the sample Fréchet mean. We prove the following result: if we use the Hamming distance to compute distances between graphs, then the Fréchet mean of an ensemble of inhomogeneous random graphs is obtained by thresholding the expected adjacency matrix: an edge exists between the vertices i and j in the Fréchet mean graph if and only if the corresponding entry of the expected adjacency matrix is greater than 1/2. We prove that the result also holds for the sample Fréchet mean when the expected adjacency matrix is replaced with the sample mean adjacency matrix. This novel theoretical result has some significant practical consequences; for instance, the Fréchet mean of an ensemble of sparse inhomogeneous random graphs is the empty graph.
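    The stated result translates directly into code. A minimal sketch, assuming the sample is a list of equal-size symmetric 0/1 adjacency matrices: the Fréchet mean edge set is a majority vote over the sample.

```python
import numpy as np

def frechet_mean_hamming(adjs):
    """Sample Frechet mean under the Hamming distance, per the result above:
    threshold the entrywise sample mean adjacency matrix at 1/2, i.e. keep
    an edge iff it appears in more than half of the sample graphs."""
    A_bar = np.mean(np.stack(adjs), axis=0)
    return (A_bar > 0.5).astype(int)
```

    For a sparse ensemble, every entry of the mean adjacency matrix falls below 1/2, so this function returns the empty graph, matching the consequence noted above.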
  2. We examine topological properties of spaces of paths and graphs mapped to $$\mathbb{R}^d$$ under the Fréchet distance. We show that these spaces are path-connected if the map is either continuous or an immersion. If the map is an embedding, we show that the space of paths is path-connected, while the space of graphs only maintains this property in dimensions four or higher.
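    For concreteness, the Fréchet distance between two polygonal paths can be approximated by its discrete variant, computable with the classic Eiter-Mannila dynamic program. This sketch only illustrates the metric the paper works with, not its topological arguments.

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polygonal paths P and Q,
    given as arrays of shape (m, d) and (n, d) of vertices in R^d.
    Refining the paths brings this closer to the continuous distance."""
    m, n = len(P), len(Q)
    D = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)  # pairwise dists
    F = np.full((m, n), np.inf)
    F[0, 0] = D[0, 0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            # best way to reach (i, j): advance on P, on Q, or on both
            prev = min(F[i - 1, j] if i > 0 else np.inf,
                       F[i, j - 1] if j > 0 else np.inf,
                       F[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            F[i, j] = max(prev, D[i, j])
    return F[-1, -1]
```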
  3. To characterize the location (mean, median) of a set of graphs, one needs a notion of centrality that is adapted to metric spaces. A standard approach is to consider the Fréchet mean. In practice, computing the Fréchet mean for sets of large graphs raises many computational issues. In this work, we propose a metric-independent method for computing the Fréchet mean of a set of graphs. We show that the proposed technique can be used to determine the Fréchet mean under the Hamming distance, or under a distance defined by the difference between the spectra of the adjacency matrices of the graphs.
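    As a hypothetical illustration of what "metric independent" can mean in practice (this is a generic sketch, not the authors' method), a local search can treat the distance as a black box, for example the Hamming distance or the spectral pseudometric from the first sketch, and greedily toggle edges to reduce the Fréchet objective.

```python
import numpy as np

def frechet_function(G, sample, dist):
    """Sum of squared distances from candidate G to the sample."""
    return sum(dist(G, H) ** 2 for H in sample)

def frechet_mean_local_search(sample, dist, n_iter=200, seed=0):
    """Metric-independent local search for the sample Frechet mean.

    Starts from the sample member with the smallest objective and toggles
    single edges, keeping a move only if the objective decreases. Each
    step re-evaluates the full objective, so this is a slow but distance-
    agnostic baseline.
    """
    rng = np.random.default_rng(seed)
    G = min(sample, key=lambda H: frechet_function(H, sample, dist)).copy()
    best = frechet_function(G, sample, dist)
    n = G.shape[0]
    for _ in range(n_iter):
        i, j = rng.integers(0, n, size=2)
        if i == j:
            continue
        G[i, j] = G[j, i] = 1 - G[i, j]          # toggle one edge
        val = frechet_function(G, sample, dist)
        if val < best:
            best = val                           # keep the improvement
        else:
            G[i, j] = G[j, i] = 1 - G[i, j]      # revert
    return G
```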
  4. The application of graph Laplacian eigenvectors has been quite popular in the graph signal processing field: one can use them as ingredients to design smooth multiscale bases. Our long-term goal is to study and understand the dual geometry of graph Laplacian eigenvectors. To do so, it is necessary to define a metric that measures the behavioral differences between each pair of eigenvectors. Saito (2018) considered the ramified optimal transportation (ROT) cost between the squares of the eigenvectors as such a metric. Cloninger and Steinerberger (2018) proposed a way to measure the affinity (or "similarity") between the eigenvectors based on their Hadamard (HAD) product. In this article, we propose a simplified ROT metric that is more computationally efficient, and introduce two more ways to define the distance between eigenvectors: the time-stepping diffusion (TSD) metric and the difference of absolute gradient (DAG) pseudometric. The TSD metric measures the cost of "flattening" the initial graph signal via a diffusion process up to a certain time, hence it can be viewed as a time-dependent version of the ROT metric. The DAG pseudometric is the l2-distance between feature vectors derived from the eigenvectors, namely the absolute gradients of the eigenvectors. We then compare the performance of ROT, HAD, and the two new metrics on different kinds of graphs. Finally, we investigate their relationships as well as their pros and cons. Keywords: graph Laplacian eigenvectors, metrics between orthonormal vectors, dual geometry of graph Laplacian eigenvectors, multiscale basis dictionaries on graphs, heat diffusion on graphs, Wasserstein distance, optimal transport
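    Of the four distances, the DAG pseudometric is the easiest to sketch. The reading below, that the "absolute gradient" of an eigenvector phi collects |phi(i) - phi(j)| over the edges of the graph, is an assumption on our part; the feature extraction in the paper may differ in detail.

```python
import numpy as np

def laplacian_eigvecs(A):
    """Eigenvectors of the unnormalized graph Laplacian L = D - A,
    returned as columns in ascending eigenvalue order."""
    L = np.diag(A.sum(axis=1)) - A
    _, V = np.linalg.eigh(L)
    return V

def dag_pseudometric(A, k, l):
    """Assumed DAG pseudometric: l2 distance between the absolute-gradient
    feature vectors of eigenvectors phi_k and phi_l over the edge set.
    A is a symmetric 0/1 adjacency matrix with zero diagonal."""
    V = laplacian_eigvecs(A)
    edges = np.transpose(np.nonzero(np.triu(A)))   # each edge (i, j) once
    g_k = np.abs(V[edges[:, 0], k] - V[edges[:, 1], k])
    g_l = np.abs(V[edges[:, 0], l] - V[edges[:, 1], l])
    return np.linalg.norm(g_k - g_l)
```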
  5. Wasserstein distances form a family of metrics on spaces of probability measures that have recently seen many applications. However, statistical analysis in these spaces is complicated by the nonlinearity of Wasserstein spaces. One potential solution to this problem is Linear Optimal Transport (LOT). This method allows one to find a Euclidean embedding, called the LOT embedding, of measures in some Wasserstein spaces, but some information is lost in this embedding. So, to understand whether statistical analysis relying on LOT embeddings can make valid inferences about the original data, it is helpful to quantify how well these embeddings describe that data. To answer this question, we present a decomposition of the Fréchet variance of a set of measures in the 2-Wasserstein space, which allows one to compute the percentage of variance explained by LOT embeddings of those measures. We then extend this decomposition to the Fused Gromov-Wasserstein setting. We also present several experiments that explore the relationship between the dimension of the LOT embedding, the percentage of variance explained by the embedding, and the classification accuracy of machine learning classifiers built on the embedded data. We use the MNIST handwritten digits dataset, the IMDB-50000 dataset, and Diffusion Tensor MRI images for these experiments. Our results illustrate the effectiveness of low-dimensional LOT embeddings in terms of the percentage of variance explained and the classification accuracy of models built on the embedded data.
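    A toy version of the variance-explained computation can be written down for 1-D measures, where the LOT embedding reduces to the quantile function sampled at reference quantile levels and embeds 2-Wasserstein geometry isometrically. The paper's decomposition is more general (measures in R^d and the Fused Gromov-Wasserstein setting); this sketch, including the function names, is only an assumed simplification.

```python
import numpy as np

def lot_embed_1d(samples, levels):
    """LOT embedding of empirical 1-D measures: each measure's quantile
    function evaluated at the reference quantile levels. In 1-D this is
    an isometry from the 2-Wasserstein space into Euclidean space."""
    return np.stack([np.quantile(s, levels) for s in samples])

def pct_variance_explained(E, dim):
    """Percentage of the embedded Frechet variance captured by the top
    `dim` principal components of the LOT coordinates."""
    X = E - E.mean(axis=0)                        # center at the Euclidean mean
    s = np.linalg.svd(X, compute_uv=False) ** 2   # PCA spectrum
    return 100.0 * s[:dim].sum() / s.sum()

# toy usage: 100 Gaussian samples with varying means and scales
rng = np.random.default_rng(0)
samples = [rng.normal(rng.uniform(-1, 1), rng.uniform(0.5, 2), 500)
           for _ in range(100)]
E = lot_embed_1d(samples, np.linspace(0.01, 0.99, 50))
print(pct_variance_explained(E, dim=2))
```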