Pattern counting in graphs is fundamental to several network sci ence tasks, and there is an abundance of scalable methods for estimating counts of small patterns, often called motifs, in large graphs. However, modern graph datasets now contain richer structure, and incorporating temporal information in particular has become a key part of network analysis. Consequently, temporal motifs, which are generalizations of small subgraph patterns that incorporate temporal ordering on edges, are an emerging part of the network analysis toolbox. However, there are no algorithms for fast estimation of temporal motifs counts; moreover, we show that even counting simple temporal star motifs is NPcomplete. Thus, there is a need for fast and approximate algorithms. Here, we present the first frequency estimation algorithms for counting temporal motifs. More specifically, we develop a sampling framework that sits as a layer on top of existing exact counting algorithms and enables fast and accurate memoryefficient estimates of temporal motif counts. Our results show that we can achieve one to two orders of magnitude speedups over existing algorithms with minimal and controllable loss in accuracy on a number of datasets.
Retrieving Top Weighted Triangles in Graphs
Pattern counting in graphs is a fundamental primitive for many network analysis tasks, and there are several methods for scaling subgraph counting to large graphs. Many realworld networks have a notion of strength of connection between nodes, which is often modeled by a weighted graph, but existing scalable algorithms for pattern mining are designed for unweighted graphs. Here, we develop deterministic and random sampling algorithms that enable the fast discovery of the 3cliques (triangles) of largest weight, as measured by the generalized mean of the triangle’s edge weights. For example, one of our proposed algorithms can find the top1000 weighted triangles of a weighted graph with billions of edges in thirty seconds on a commodity server, which is orders of magnitude faster than existing “fast” enumeration schemes. Our methods open the door towards scalable pattern mining in weighted graphs.
 Award ID(s):
 1830274
 Publication Date:
 NSFPAR ID:
 10173811
 Journal Name:
 WSDM '20: Proceedings of the 13th International Conference on Web Search and Data Mining
 Page Range or eLocationID:
 295 to 303
 Sponsoring Org:
 National Science Foundation
More Like this


Triangle counting is a fundamental technique in network analysis, that has received much attention in various input models. The vast majority of triangle counting algorithms are targeted to static graphs. Yet, many realworld graphs are directed and temporal, where edges come with timestamps. Temporal triangles yield much more information, since they account for both the graph topology and the timestamps. Temporal triangle counting has seen a few recent results, but there are varying definitions of temporal triangles. In all cases, temporal triangle patterns enforce constraints on the time interval between edges (in the triangle). We define a general notion (δ1,3,δ1,2,δ2,3)temporal triangles that allows for separate time constraints for all pairs of edges. Our main result is a new algorithm, DOTTT (Degeneracy Oriented Temporal Triangle Totaler), that exactly counts all directed variants of (δ1,3,δ1,2,δ2,3)temporal triangles. Using the classic idea of degeneracy ordering with careful combinatorial arguments, we can prove that DOTTT runs in O(mκlogm) time, where m is the number of (temporal) edges and κ is the graph degeneracy (max core number). Up to log factors, this matches the running time of the best static triangle counters. Moreover, this running time is better than existing. DOTTT has excellent practical behavior andmore »

Triangle counting is a fundamental technique in network analysis, that has received much attention in various input models. The vast majority of triangle counting algorithms are targeted to static graphs. Yet, many realworld graphs are directed and temporal, where edges come with timestamps. Temporal triangles yield much more information, since they account for both the graph topology and the timestamps. Temporal triangle counting has seen a few recent results, but there are varying definitions of temporal triangles. In all cases, temporal triangle patterns enforce constraints on the time interval between edges (in the triangle). We define a general notion (δ1,3,δ1,2,δ2,3)temporal triangles that allows for separate time constraints for all pairs of edges. Our main result is a new algorithm, DOTTT (Degeneracy Oriented Temporal Triangle Totaler), that exactly counts all directed variants of (δ1,3,δ1,2,δ2,3)temporal triangles. Using the classic idea of degeneracy ordering with careful combinatorial arguments, we can prove that DOTTT runs in O(mκlogm) time, where m is the number of (temporal) edges and κ is the graph degeneracy (max core number). Up to log factors, this matches the running time of the best static triangle counters. Moreover, this running time is better than existing. DOTTT has excellent practical behavior andmore »

A massive amount of data generated today on platforms such as social networks, telecommunication networks, and the internet in general can be represented as graph streams. Activity in a network’s underlying graph generates a sequence of edges in the form of a stream; for example, a social network may generate a graph stream based on the interactions (edges) between different users (nodes) over time. While many graph mining algorithms have already been developed for analyzing relatively small graphs, graphs that begin to approach the size of realworld networks stress the limitations of such methods due to their dynamic nature and the substantial number of nodes and connections involved. In this paper we present GraphZip, a scalable method for mining interesting patterns in graph streams. GraphZip is inspired by the LempelZiv (LZ) class of compression algorithms, and uses a novel dictionarybased compression approach to discover maximally compressing patterns in a graph stream. We experimentally show that GraphZip is able to retrieve complex and insightful patterns from large realworld graphs and artificiallygenerated graphs with ground truth patterns. Additionally, our results demonstrate that GraphZip is both highly efficient and highly effective compared to existing stateoftheart methods for mining graph streams.

We study graph computations in an enhanced data streaming setting, where a spacebounded client reading the edge stream of a massive graph may delegate some of its work to a cloud service. We seek algorithms that allow the client to verify a purported proof sent by the cloud service that the work done in the cloud is correct. A line of work starting with Chakrabarti et al. (ICALP 2009) has provided such algorithms, which we call schemes, for several statistical and graphtheoretic problems, many of which exhibit a tradeoff between the length of the proof and the space used by the streaming verifier. This work designs new schemes for a number of basic graph problems  including triangle counting, maximum matching, topological sorting, and singlesource shortest paths  where past work had either failed to obtain smooth tradeoffs between these two key complexity measures or only obtained suboptimal tradeoffs. Our key innovation is having the verifier compute certain nonlinear sketches of the input stream, leading to either new or improved tradeoffs. In many cases, our schemes, in fact, provide optimal tradeoffs up to logarithmic factors. Specifically, for most graph problems that we study, it is known that the product ofmore »