skip to main content


Title: Rapid Analysis of Network Connectivity
This research focuses on accelerating the computational time of two base network algorithms (k-simple shortest paths and minimum spanning tree for a subset of nodes)---cornerstones behind a variety of network connectivity mining tasks---with the goal of rapidly finding networkpathways andtrees using a set of user-specific query nodes. To facilitate this process we utilize: (1) multi-threaded algorithm variations, (2) network re-use for subsequent queries and (3) a novel algorithm, Key Neighboring Vertices (KNV), to reduce the network search space. The proposed KNV algorithm serves a dual purpose: (a) to reduce the computation time for algorithmic analysis and (b) to identify key vertices in the network (\textit ). Empirical results indicate this combination of techniques significantly improves the baseline performance of both algorithms. We have also developed a web platform utilizing the proposed network algorithms to enable researchers and practitioners to both visualize and interact with their datasets (PathFinder: http://www.path-finder.io.)  more » « less
Award ID(s):
1651203 1743040 1715385 1947135 2003924
NSF-PAR ID:
10062416
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM CIKM
Page Range / eLocation ID:
2463 to 2466
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Nowadays, large-scale graph data is being generated in a variety of real-world applications, from social networks to co-authorship networks, from protein-protein interaction networks to road traffic networks. Many existing works on graph mining focus on the vertices and edges, with the first-order Markov chain as the underlying model. They fail to explore the high-order network structures, which are of key importance in many high impact domains. For example, in bank customer personally identifiable information (PII) networks, the star structures often correspond to a set of synthetic identities; in financial transaction networks, the loop structures may indicate the existence of money laundering. In this paper, we focus on mining user-specified high-order network structures and aim to find a structure-rich subgraph which does not break many such structures by separating the subgraph from the rest. A key challenge associated with finding a structure-rich subgraph is the prohibitive computational cost. To address this problem, inspired by the family of local graph clustering algorithms for efficiently identifying a low-conductance cut without exploring the entire graph, we propose to generalize the key idea to model high-order network structures. In particular, we start with a generic definition of high-order conductance, and define the high-order diffusion core, which is based on a high-order random walk induced by user-specified high-order network structure. Then we propose a novel High-Order Structure-Preserving LOcal Cut (HOSPLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until a subgraph with a small high-order conductance is found. Furthermore, we analyze its performance in terms of both effectiveness and efficiency. The experimental results on both synthetic graphs and real graphs demonstrate the effectiveness and efficiency of our proposed HOSPLOC algorithm. 
    more » « less
  2. We consider the maximum vertex-weighted matching problem (MVM), in which non-negative weights are assigned to the vertices of a graph, and the weight of a matching is the sum of the weights of the matched vertices. Although exact algorithms for MVM are faster than exact algorithms for the maximum edge-weighted matching problem, there are graphs on which these exact algorithms could take hundreds of hours. For a natural number k, we design a k/(k + 1)approximation algorithm for MVM on non-bipartite graphs that updates the matching along certain short paths in the graph: either augmenting paths of length at most 2k + 1 or weight-increasing paths of length at most 2k. The choice of k = 2 leads to a 2/3-approximation algorithm that computes nearly optimal weights fast. This algorithm could be initialized with a 2/3-approximate maximum cardinality matching to reduce its runtime in practice. A 1/2-approximation algorithm may be obtained using k = 1, which is faster than the 2/3-approximation algorithm but it computes lower weights. The 2/3-approximation algorithm has time complexity O(Δ2m) while the time complexity of the 1/2-approximation algorithm is O(Δm), where m is the number of edges and Δ is the maximum degree of a vertex. Results from our serial implementations show that on average the 1/2-approximation algorithm runs faster than the Suitor algorithm (currently the fastest 1/2-approximation algorithm) while the 2/3-approximation algorithm runs as fast as the Suitor algorithm but obtains higher weights for the matching. One advantage of the proposed algorithms is that they are well-suited for parallel implementation since they can process vertices to match in any order. The 1/2- and 2/3-approximation algorithms have been implemented on a shared memory parallel computer, and both approximation algorithms obtain good speedups, while the former algorithm runs faster on average than the parallel Suitor algorithm. Care is needed to design the parallel algorithm to avoid cyclic waits, and we prove that it is live-lock free. 
    more » « less
  3. The Sylvester equation offers a powerful and unifying primitive for a variety of important graph mining tasks, including network alignment, graph kernel, node similarity, subgraph matching, etc. A major bottleneck of Sylvester equation lies in its high computational complexity. Despite tremendous effort, state-of-the-art methods still require a complexity that is at least \em quadratic in the number of nodes of graphs, even with approximations. In this paper, we propose a family of Krylov subspace based algorithms (\fasten) to speed up and scale up the computation of Sylvester equation for graph mining. The key idea of the proposed methods is to project the original equivalent linear system onto a Kronecker Krylov subspace. We further exploit (1) the implicit representation of the solution matrix as well as the associated computation, and (2) the decomposition of the original Sylvester equation into a set of inter-correlated Sylvester equations of smaller size. The proposed algorithms bear two distinctive features. First, they provide the \em exact solutions without any approximation error. Second, they significantly reduce the time and space complexity for solving Sylvester equation, with two of the proposed algorithms having a \em linear complexity in both time and space. Experimental evaluations on a diverse set of real networks, demonstrate that our methods (1) are up to $10,000\times$ faster against Conjugate Gradient method, the best known competitor that outputs the exact solution, and (2) scale up to million-node graphs. 
    more » « less
  4. There is a rich theory and plethora of algorithms in the literature aiming at the efficient scheduling of stochastic networks. These solutions are predominantly designed under the assumption of traffic demands that are independently generated at network nodes, without any requirement for synchronization among their received services. In this work, we note that many applications, including cloud computing, virtual reality, gaming, autonomous vehicular networks and collaborative design, generate traffic simultaneously at multiple nodes when they arrive, with possibly non-uniform file sizes, whose performance relies on the synchronous completion of the traffic across the network. This calls for the design of new scheduling algorithms that aims to coordinate the service of packets of the same traffic across the network. Towards this end, we propose a novel scheduling algorithm that not only accounts for the heterogeneity of the file size distributions, but also works towards synchronizing the completion time of the same traffic stream across the network. This is achieved by employing two insights that emanate from key motivating examples we develop: (1) the normalization of traffic load with respect to the non-uniform file sizes; and (2) the incorporation of deviation of normalized loads across network nodes that serve synchronized traffic. After establishing the throughput-optimality of our algorithm in general stochastic networks, we perform extensive simulations under various (spanning both wired and wireless) settings to reveal the potential completion time gains that it yields over other throughput-optimal strategies designed under the assumption of independent traffic generation. 
    more » « less
  5. There is a rich theory and plethora of algorithms in the literature aiming at the efficient scheduling of stochastic networks. These solutions are predominantly designed under the assumption of traffic demands that are independently generated at network nodes, without any requirement for synchronization among their received services. In this work, we note that many applications, including cloud computing, virtual reality, gaming, autonomous vehicular networks and collaborative design, generate traffic simultaneously at multiple nodes when they arrive, with possibly non-uniform file sizes, whose performance relies on the synchronous completion of the traffic across the network. This calls for the design of new scheduling algorithms that aims to coordinate the service of packets of the same traffic across the network. Towards this end, we propose a novel scheduling algorithm that not only accounts for the heterogeneity of the file size distributions, but also works towards synchronizing the completion time of the same traffic stream across the network. This is achieved by employing two insights that emanate from key motivating examples we develop: (1) the normalization of traffic load with respect to the non-uniform file sizes; and (2) the incorporation of deviation of normalized loads across network nodes that serve synchronized traffic. After establishing the throughput-optimality of our algorithm in general stochastic networks, we perform extensive simulations under various (spanning both wired and wireless) settings to reveal the potential completion time gains that it yields over other throughput-optimal strategies designed under the assumption of independent traffic generation. 
    more » « less