- Award ID(s):
- 1633631
- PAR ID:
- 10310846
- Editor(s):
- Oliva, Gabriele
- Date Published:
- Journal Name:
- PLOS ONE
- Volume:
- 16
- Issue:
- 4
- ISSN:
- 1932-6203
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The application of graph Laplacian eigenvectors has been quite popular in the graph signal processing field: one can use them as ingredients to design smooth multiscale bases. Our long-term goal is to study and understand the dual geometry of graph Laplacian eigenvectors. To do so, it is necessary to define a metric that measures the behavioral differences between each pair of eigenvectors. Saito (2018) considered the ramified optimal transportation (ROT) cost between the squares of the eigenvectors as such a metric. Cloninger and Steinerberger (2018) proposed a way to measure the affinity (or 'similarity') between the eigenvectors based on their Hadamard (HAD) product. In this article, we propose a simplified ROT metric that is more computationally efficient and introduce two more ways to define the distance between the eigenvectors: the time-stepping diffusion (TSD) metric and the difference of absolute gradient (DAG) pseudometric. The TSD metric measures the cost of "flattening" the initial graph signal via a diffusion process up to a certain time; hence it can be viewed as a time-dependent version of the ROT metric. The DAG pseudometric is the ℓ2-distance between feature vectors derived from the eigenvectors, namely the absolute gradients of the eigenvectors. We then compare the performance of ROT, HAD, and the two new "metrics" on different kinds of graphs. Finally, we investigate their relationships as well as their pros and cons.
Keywords: Graph Laplacian eigenvectors, metrics between orthonormal vectors, dual geometry of graph Laplacian eigenvectors, multiscale basis dictionaries on graphs, heat diffusion on graphs, Wasserstein distance, optimal transport
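The DAG pseudometric described above can be made concrete on a graph whose Laplacian eigenvectors are known in closed form. The following is a minimal sketch, not the authors' code: it assumes the unweighted path graph P_n (whose orthonormal Laplacian eigenvectors are the DCT-II basis vectors) and assumes that the gradient of a vertex function is its vector of differences across the edges (l, l+1). The function names are hypothetical.

```python
import math


def path_eigenvector(k: int, n: int) -> list[float]:
    """k-th orthonormal Laplacian eigenvector of the unweighted path graph P_n.

    These are the DCT-II basis vectors; the eigenvalue is 2 - 2*cos(pi*k/n).
    """
    a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [a * math.cos(math.pi * k * (l + 0.5) / n) for l in range(n)]


def abs_gradient(phi: list[float]) -> list[float]:
    """Absolute differences of phi across the path edges (l, l+1).

    Assumption: the graph gradient is the edge-difference (incidence) operator.
    """
    return [abs(phi[l + 1] - phi[l]) for l in range(len(phi) - 1)]


def dag(phi_i: list[float], phi_j: list[float]) -> float:
    """l2-distance between the absolute-gradient feature vectors."""
    gi, gj = abs_gradient(phi_i), abs_gradient(phi_j)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(gi, gj)))


# Pairwise DAG matrix for the eigenvectors of P_8.
n = 8
phis = [path_eigenvector(k, n) for k in range(n)]
D = [[dag(p, q) for q in phis] for p in phis]
```

Because DAG compares derived features rather than the eigenvectors themselves, it is only a pseudometric: for example, φ and −φ have identical absolute gradients. As a sanity check, the constant eigenvector φ_0 has zero gradient, so DAG(φ_0, φ_k) equals ‖∇φ_k‖ = √λ_k.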
-
Abstract: Let G be a connected graph and k ∈ ℕ. The k-distance domination number of G is the smallest cardinality of a set D of vertices such that every vertex of G is within distance k of some vertex of D. While for k = 1, that is, for the ordinary domination number, the problem of finding asymptotically sharp upper bounds in terms of the order and minimum degree of the graph has been solved, corresponding bounds for k ≥ 2 have remained elusive. In this paper, we solve this problem and present an asymptotically sharp upper bound on the k-distance domination number of a graph in terms of its order and minimum degree, which significantly improves on bounds in the literature. We also obtain an asymptotically sharp upper bound on the p-radius of graphs in terms of order and minimum degree. For p ∈ ℕ, the p-radius of G is defined as the smallest integer r such that there exists a set A of p vertices of G having the property that every vertex of G is within distance r of some vertex in A. We also present improved bounds for graphs of given order, minimum degree, and maximum degree, for triangle-free graphs, and for graphs not containing a ‐cycle as a subgraph.
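The definitions of the k-distance domination number and the p-radius above can be checked directly on small graphs. The sketch below is a brute-force illustration under stated assumptions (unweighted, connected graphs given as adjacency lists; exponential running time, so only tiny graphs are feasible). It is not a method from the paper, which is concerned with upper bounds, and all names are hypothetical.

```python
from collections import deque
from itertools import combinations

Graph = dict[int, list[int]]  # adjacency lists of an unweighted graph


def dist_to_set(g: Graph, sources) -> dict[int, int]:
    """Multi-source BFS: distance from each reachable vertex to the nearest source."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def covers(g: Graph, S, r: int) -> bool:
    """True if every vertex of g is within distance r of some vertex in S."""
    d = dist_to_set(g, S)
    return all(v in d and d[v] <= r for v in g)


def distance_domination_number(g: Graph, k: int) -> int:
    """Smallest |D| such that every vertex is within distance k of D (brute force)."""
    vertices = list(g)
    for size in range(1, len(vertices) + 1):
        if any(covers(g, D, k) for D in combinations(vertices, size)):
            return size
    return 0  # only reached for the empty graph


def p_radius(g: Graph, p: int) -> int:
    """Smallest r such that some set of p vertices covers g within distance r."""
    vertices = list(g)
    for r in range(len(vertices)):
        if any(covers(g, A, r) for A in combinations(vertices, p)):
            return r
    raise ValueError("p must be between 1 and the number of vertices")


# Path graph on 7 vertices: 0 - 1 - 2 - 3 - 4 - 5 - 6.
path7: Graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
```

For the path on 7 vertices, `distance_domination_number(path7, 1)` returns 3 and `distance_domination_number(path7, 2)` returns 2, consistent with the known value ⌈n/(2k+1)⌉ for paths; `p_radius(path7, 1)` is the ordinary radius, 3.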
-
Quantifying the differences between networks is a challenging and ever-present problem in network science. In recent years, a multitude of diverse, ad hoc solutions to this problem have been introduced. Here, we propose that simple and well-understood ensembles of random networks—such as Erdős–Rényi graphs, random geometric graphs, Watts–Strogatz graphs, the configuration model and preferential attachment networks—are natural benchmarks for network comparison methods. Moreover, we show that the expected distance between two networks independently sampled from a generative model is a useful property that encapsulates many key features of that model. To illustrate our results, we calculate this within-ensemble graph distance and related quantities for classic network models (and several parameterizations thereof) using 20 distance measures commonly used to compare graphs. The within-ensemble graph distance provides a new framework for developers of graph distances to better understand their creations and for practitioners to better choose an appropriate tool for their particular task.
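The within-ensemble graph distance, the expected distance between two independently sampled networks, can be estimated by Monte Carlo sampling. The sketch below assumes, for illustration only, the Erdős–Rényi ensemble G(n, p) and a normalized Hamming distance between edge sets; neither the choice of distance nor the function names are taken from the paper. For this combination the expectation has the closed form 2p(1 − p), since each vertex pair is an edge in exactly one of the two graphs with that probability, which gives a check on the estimate.

```python
import random
from itertools import combinations


def sample_gnp(n: int, p: float, rng: random.Random) -> set[tuple[int, int]]:
    """Edge set of an Erdős–Rényi G(n, p) graph on vertices 0..n-1."""
    return {e for e in combinations(range(n), 2) if rng.random() < p}


def hamming_distance(e1: set, e2: set, n: int) -> float:
    """Fraction of vertex pairs whose adjacency differs between the two graphs."""
    return len(e1 ^ e2) / (n * (n - 1) / 2)


def within_ensemble_distance(n: int, p: float, samples: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of E[d(G, G')] for independent G, G' ~ G(n, p)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        g1, g2 = sample_gnp(n, p, rng), sample_gnp(n, p, rng)
        total += hamming_distance(g1, g2, n)
    return total / samples


estimate = within_ensemble_distance(60, 0.1)  # closed form: 2 * 0.1 * 0.9 = 0.18
```

Swapping in another distance measure or generative model only changes `hamming_distance` or `sample_gnp`; the averaging loop is the part that defines the within-ensemble quantity.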
-
Graph-based non-linear reference structures, such as variation graphs and colored de Bruijn graphs, enable the incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing technology in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence-to-graph mappers rely on heuristics. We introduce a mathematical formulation of the problem and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs and use sparse matrix-matrix multiplications (SpGEMM) to build an index that can be queried efficiently by a mapping algorithm for validating the distance constraints. The effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph and a pan-genome de Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing time can vary from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.
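The idea of answering distance queries from an index precomputed with sparse matrix products can be illustrated with Boolean sparse matrices. This is a simplified sketch, not the authors' algorithm: it assumes every edge has unit length (for example, one base per vertex), represents a sparse Boolean matrix as a dict mapping each row to the set of its nonzero columns, and treats the constraint as "v is reachable from u by a walk whose length lies in [lo, hi]". All names are hypothetical.

```python
SparseBool = dict[int, set[int]]  # row -> set of columns holding a 1


def spgemm_bool(a: SparseBool, b: SparseBool) -> SparseBool:
    """Boolean sparse product: row i of C is the union of rows b[k] for k in a[i]."""
    c: SparseBool = {}
    for i, row in a.items():
        acc: set[int] = set()
        for k in row:
            acc |= b.get(k, set())
        if acc:  # keep the result sparse: store only nonempty rows
            c[i] = acc
    return c


def build_distance_index(adj: SparseBool, lo: int, hi: int) -> SparseBool:
    """index[u] = vertices reachable from u by a walk of length in [lo, hi].

    Computes A, A^2, ..., A^hi by repeated SpGEMM and ORs together powers lo..hi.
    """
    if not 1 <= lo <= hi:
        raise ValueError("require 1 <= lo <= hi")
    index: SparseBool = {}
    power = adj  # holds A^length
    for length in range(1, hi + 1):
        if length >= lo:
            for u, cols in power.items():
                index.setdefault(u, set()).update(cols)
        if length < hi:
            power = spgemm_bool(power, adj)
    return index


def satisfies_distance(index: SparseBool, u: int, v: int) -> bool:
    """Constant-time query once the index is built."""
    return v in index.get(u, set())


# Tiny directed graph with a "bubble": 0 -> 1 -> 2 -> 3 and a shortcut 0 -> 2.
adj: SparseBool = {0: {1, 2}, 1: {2}, 2: {3}, 3: set()}
exact2 = build_distance_index(adj, 2, 2)
```

Building the index costs hi − 1 sparse products, after which each query is a single set lookup; this illustrates the kind of trade-off between a one-time indexing cost and fast queries that the abstract reports.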
-
Learning representations of sets of nodes in a graph is crucial for applications ranging from node-role discovery to link prediction and molecule classification. Graph Neural Networks (GNNs) have achieved great success in graph representation learning. However, the expressive power of GNNs is limited by the 1-Weisfeiler–Lehman (WL) test, and thus GNNs generate identical representations for graph substructures that may in fact be very different. More powerful GNNs, proposed recently by mimicking higher-order WL tests, only focus on representing entire graphs, and they are computationally inefficient because they cannot utilize the sparsity of the underlying graph. Here we propose and mathematically analyze a general class of structure-related features, termed Distance Encoding (DE). DE assists GNNs in representing any set of nodes while providing strictly more expressive power than the 1-WL test. DE captures the distance between the node set whose representation is to be learned and each node in the graph. To capture the distance, DE can apply various graph-distance measures such as shortest-path distance or generalized PageRank scores. We propose two ways for GNNs to use DEs: (1) as extra node features, and (2) as controllers of message aggregation in GNNs. Both approaches can utilize the sparse structure of the underlying graph, which leads to computational efficiency and scalability. We also prove that DE can distinguish node sets embedded in almost all regular graphs where traditional GNNs always fail. We evaluate DE on three tasks over six real networks: structural role prediction, link prediction, and triangle prediction. Results show that our models outperform GNNs without DE by up to 15% in accuracy and AUROC. Furthermore, our models also significantly outperform other state-of-the-art methods specifically designed for the above tasks.
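Using DE as extra node features, the first of the two approaches above, can be sketched with shortest-path distances. The sketch assumes an unweighted graph given as adjacency lists, truncates distances at a hypothetical cap `max_dist`, and aggregates over the target node set by counting how many target nodes fall at each truncated distance (a sum of one-hot encodings). These choices and names are illustrative assumptions, not necessarily the paper's exact construction.

```python
from collections import deque


def shortest_path_lengths(g: dict[int, list[int]], src: int) -> dict[int, int]:
    """BFS distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_encoding(g: dict[int, list[int]], target_set, max_dist: int = 3) -> dict[int, list[int]]:
    """Per-node feature: counts of target nodes at truncated SPD 0, 1, ..., max_dist + 1.

    The last bucket collects target nodes farther than max_dist or unreachable.
    """
    cap = max_dist + 1
    per_target = [shortest_path_lengths(g, s) for s in target_set]
    features = {}
    for v in g:
        counts = [0] * (cap + 1)
        for dist in per_target:
            counts[min(dist.get(v, cap), cap)] += 1
        features[v] = counts
    return features


# The 6-cycle is 2-regular, so nodes are indistinguishable without extra features.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
de_cycle = distance_encoding(cycle6, [0], max_dist=2)
```

On the 6-cycle, a message-passing GNN started from identical node features gives every node the same representation, whereas the encoding with target set {0} separates the nodes by their distance to node 0; the resulting vectors would be concatenated to the input node features.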