skip to main content

Title: Fast and Accurate Network Embeddings via Very Sparse Random Projection
We present FastRP, a scalable and performant algorithm for learning distributed node representations in a graph. FastRP is over 4,000 times faster than state-of-the-art methods such as DeepWalk and node2vec, while achieving comparable or even better performance as evaluated on several real-world networks on various downstream tasks. We observe that most network embedding methods consist of two components: construct a node similarity matrix and then apply dimension reduction techniques to this matrix. We show that the success of these methods should be attributed to the proper construction of this similarity matrix, rather than the dimension reduction method employed. FastRP is proposed as a scalable algorithm for network embeddings. Two key features of FastRP are: 1) it explicitly constructs a node similarity matrix that captures transitive relationships in a graph and normalizes matrix entries based on node degrees; 2) it utilizes very sparse random projection, which is a scalable optimization-free method for dimension reduction. An extra benefit from combining these two design choices is that it allows the iterative computation of node embeddings so that the similarity matrix need not be explicitly constructed, which further speeds up FastRP. FastRP is also advantageous for its ease of implementation, parallelization and hyperparameter tuning. more » The source code is available at https://github.com/GTmac/FastRP. « less
Authors:
; ; ; ;
Award ID(s):
1927227
Publication Date:
NSF-PAR ID:
10283034
Journal Name:
CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
Page Range or eLocation-ID:
399 to 408
Sponsoring Org:
National Science Foundation
More Like this
  1. Network embedding has become the cornerstone of a variety of mining tasks, such as classification, link prediction, clustering, anomaly detection and many more, thanks to its superior ability to encode the intrinsic network characteristics in a compact low-dimensional space. Most of the existing methods focus on a single network and/or a single resolution, which generate embeddings of different network objects (node/subgraph/network) from different networks separately. A fundamental limitation with such methods is that the intrinsic relationship across different networks (e.g., two networks share same or similar subgraphs) and that across different resolutions (e.g., the node-subgraph membership) are ignored, resulting in disparate embeddings. Consequentially, it leads to sub-optimal performance or even becomes inapplicable for some downstream mining tasks (e.g., role classification, network alignment. etc.). In this paper, we propose a unified framework MrMine to learn the representations of objects from multiple networks at three complementary resolutions (i.e., network, subgraph and node) simultaneously. The key idea is to construct the cross-resolution cross-network context for each object. The proposed method bears two distinctive features. First, it enables and/or boosts various multi-network downstream mining tasks by having embeddings at different resolutions from different networks in the same embedding space. Second, Our method is efficientmore »and scalable, with a O(nlog(n)) time complexity for the base algorithm and a linear time complexity w.r.t. the number of nodes and edges of input networks for the accelerated version. Extensive experiments on real-world data show that our methods (1) are able to enable and enhance a variety of multi-network mining tasks, and (2) scale up to million-node networks.« less
  2. Network alignment is a fundamental task in many high-impact applications. Most of the existing approaches either explicitly or implicitly consider the alignment matrix as a linear transformation to map one network to another, and might overlook the complicated alignment relationship across networks. On the other hand, node representation learning based alignment methods are hampered by the incomparability among the node representations of different networks. In this paper, we propose a unified semi-supervised deep model (ORIGIN) that simultaneously finds the non-rigid network alignment and learns node representations in multiple networks in a mutually beneficial way. The key idea is to learn node representations by the effective graph convolutional networks, which subsequently enable us to formulate network alignment as a point set alignment problem. The proposed method offers two distinctive advantages. First (node representations), unlike the existing graph convolutional networks that aggregate the node information within a single network, we can effectively aggregate the auxiliary information from multiple sources, achieving far-reaching node representations. Second (network alignment), guided by the highquality node representations, our proposed non-rigid point set alignment approach overcomes the bottleneck of the linear transformation assumption. We conduct extensive experiments that demonstrate the proposed non-rigid alignment method is (1) effective, outperformingmore »both the state-of-the-art linear transformation-based methods and node representation based methods, and (2) efficient, with a comparable computational time between the proposed multi-network representation learning component and its single-network counterpart.« less
  3. Attributed subgraph matching is a powerful tool for explorative mining of large attributed networks. In many applications (e.g., network science of teams, intelligence analysis, finance informatics), the user might not know what exactly s/he is looking for, and thus require the user to constantly revise the initial query graph based on what s/he finds from the current matching results. A major bottleneck in such an interactive matching scenario is the efficiency, as simply rerunning the matching algorithm on the revised query graph is computationally prohibitive. In this paper, we propose a family of effective and efficient algorithms (FIRST) to support interactive attributed subgraph matching. There are two key ideas behind the proposed methods. The first is to recast the attributed subgraph matching problem as a cross-network node similarity problem, whose major computation lies in solving a Sylvester equation for the query graph and the underlying data graph. The second key idea is to explore the smoothness between the initial and revised queries, which allows us to solve the new/updated Sylvester equation incrementally, without re-solving it from scratch. Experimental results show that our method can achieve (1) up to 16x speed-up when applying on networks with 6M$+$ nodes; (2) preserving moremore »than 90% accuracy compared with existing methods; and (3) scales linearly with respect to the size of the data graph.« less
  4. Learning discriminative node representations benefits various downstream tasks in graph analysis such as community detection and node classification. Existing graph representation learning methods (e.g., based on random walk and contrastive learning) are limited to maximizing the local similarity of connected nodes. Such pair-wise learning schemes could fail to capture the global distribution of representations, since it has no explicit constraints on the global geometric properties of representation space. To this end, we propose Geometric Graph Representation Learning (G2R) to learn node representations in an unsupervised manner via maximizing rate reduction. In this way, G2R maps nodes in distinct groups (implicitly stored in the adjacency matrix) into different subspaces, while each subspace is compact and different subspaces are dispersedly distributed. G2R adopts a graph neural network as the encoder and maximizes the rate reduction with the adjacency matrix. Furthermore, we theoretically and empirically demonstrate that rate reduction maximization is equivalent to maximizing the principal angles between different subspaces. Experiments on real-world datasets show that G2R outperforms various baselines on node classification and community detection tasks.
  5. The Sylvester equation offers a powerful and unifying primitive for a variety of important graph mining tasks, including network alignment, graph kernel, node similarity, subgraph matching, etc. A major bottleneck of Sylvester equation lies in its high computational complexity. Despite tremendous effort, state-of-the-art methods still require a complexity that is at least \em quadratic in the number of nodes of graphs, even with approximations. In this paper, we propose a family of Krylov subspace based algorithms (\fasten) to speed up and scale up the computation of Sylvester equation for graph mining. The key idea of the proposed methods is to project the original equivalent linear system onto a Kronecker Krylov subspace. We further exploit (1) the implicit representation of the solution matrix as well as the associated computation, and (2) the decomposition of the original Sylvester equation into a set of inter-correlated Sylvester equations of smaller size. The proposed algorithms bear two distinctive features. First, they provide the \em exact solutions without any approximation error. Second, they significantly reduce the time and space complexity for solving Sylvester equation, with two of the proposed algorithms having a \em linear complexity in both time and space. Experimental evaluations on a diverse setmore »of real networks, demonstrate that our methods (1) are up to $10,000\times$ faster against Conjugate Gradient method, the best known competitor that outputs the exact solution, and (2) scale up to million-node graphs.« less