Butterflies are an important motif found in bipartite graphs that provide a structural way for finding dense regions within the graph. Beyond counting butterflies and enumerating them, other metrics and peeling for bipartite graphs are designed around counting butterfly motifs. The importance of counting butterflies has led to many works on efficient implementations for butterfly counting, given certain situational or hardware constraints. However, most algorithms are based on first counting the building block of the butterfly motif, and from that calculating the total possible number of butterflies in the graph. In this paper, using a linear algebra approach, we show that many provably correct algorithms for counting butterflies can be systematically derived. Moreover, we show how this formulation facilitates butterfly peeling algorithms that find the k-tip and k-wing subgraphs within a bipartite graph.
more »
« less
Butterfly Counting in Bipartite Networks
We consider the problem of counting motifs in bipartite affiliation networks, such as author-paper, user-product, and actor-movie relations. We focus on counting the number of occurrences of a "butterfly", a complete 2x2 biclique, the simplest cohesive higher-order structure in a bipartite graph. Our main contribution is a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy. An experimental evaluation on large real-world networks shows that our algorithms return accurate estimates within a few seconds, even for networks with trillions of butterflies and hundreds of millions of edges.
more »
« less
- Award ID(s):
- 1725702
- PAR ID:
- 10073154
- Date Published:
- Journal Name:
- Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
- Page Range / eLocation ID:
- 2150 to 2159
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Butterflies are the smallest non-trivial subgraph in bipartite graphs, and therefore having efficient computations for analyzing them is crucial to improving the quality of certain applications on bipartite graphs. In this paper, we design a framework called ParButterfly that contains new parallel algorithms for the following problems on processing butterflies: global counting, per-vertex counting, per-edge counting, tip decomposition (vertex peeling), and wing decomposition (edge peeling). The main component of these algorithms is aggregating wedges incident on subsets of vertices, and our framework supports different methods for wedge aggregation, including sorting, hashing, histogramming, and batching. In addition, ParButterfly supports different ways of ranking the vertices to speed up counting, including side ordering, approximate and exact degree ordering, and approximate and exact complement coreness ordering. For counting, ParButterfly also supports both exact computation as well as approximate computation via graph sparsification. We prove strong theoretical guarantees on the work and span of the algorithms in ParButterfly. We perform a comprehensive evaluation of all of the algorithms in ParButterfly on a collection of real-world bipartite graphs using a 48-core machine. Our counting algorithms obtain significant parallel speedup, outperforming the fastest sequential algorithms by up to 13.6× with a self-relative speedup of up to 38.5×. Compared to general subgraph counting solutions, we are orders of magnitude faster. Our peeling algorithms achieve self-relative speedups of up to 10.7× and outperform the fastest sequential baseline by up to several orders of magnitude.more » « less
-
Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola (Ed.)We give a randomized algorithm that approximates the number of independent sets in a dense, regular bipartite graph - in the language of approximate counting, we give an FPRAS for #BIS on the class of dense, regular bipartite graphs. Efficient counting algorithms typically apply to "high-temperature" problems on bounded-degree graphs, and our contribution is a notable exception as it applies to dense graphs in a low-temperature setting. Our methods give a counting-focused complement to the long line of work in combinatorial optimization showing that CSPs such as Max-Cut and Unique Games are easy on dense graphs via spectral arguments. Our contributions include a novel extension of the method of graph containers that differs considerably from other recent low-temperature algorithms. The additional key insights come from spectral graph theory and have previously been successful in approximation algorithms. As a result, we can overcome some limitations that seem inherent to the aforementioned class of algorithms. In particular, we exploit the fact that dense, regular graphs exhibit a kind of small-set expansion (i.e., bounded threshold rank), which, via subspace enumeration, lets us enumerate small cuts efficiently.more » « less
-
Bipartite graphs are a powerful tool for modeling the interactions between two distinct groups. These bipartite relationships often feature small, recurring structural patterns called motifs which are building blocks for community structure. One promising structure is the induced 6-cycle which consists of three nodes on each node set forming a cycle where each node has exactly two edges. In this paper, we study the problem of counting and utilizing induced 6-cycles in large bipartite networks. We first consider two adaptations inspired by previous works for cycle counting in bipartite networks. Then, we introduce a new approach for node triplets which offer a systematic way to count the induced 6-cycles, used in BATCHTRIPLETJOIN. Our experimental evaluation shows that BATCHTRIPLETJOIN is significantly faster than the other algorithms while being scalable to large graph sizes and number of cores. On a network with 112M edges, BATCHTRIPLETJOIN is able to finish the computation in 78 mins by using 52 threads. In addition, we provide a new way to identify anomalous node triplets by comparing and contrasting the butterfly and induced 6-cycle counts of the nodes. We showcase several case studies on real-world networks from Amazon Kindle ratings, Steam game reviews, and Yelp ratings.more » « less
-
Censor-Hillel, Keren; Grandoni, Fabrizio; Ouaknine, Joel; Puppis, Gabriele (Ed.)We study the classic problem of subgraph counting, where we wish to determine the number of occurrences of a fixed pattern graph H in an input graph G of n vertices. Our focus is on bounded degeneracy inputs, a rich family of graph classes that also characterizes real-world massive networks. Building on the seminal techniques introduced by Chiba-Nishizeki (SICOMP 1985), a recent line of work has built subgraph counting algorithms for bounded degeneracy graphs. Assuming fine-grained complexity conjectures, there is a complete characterization of patterns H for which linear time subgraph counting is possible. For every r ≥ 6, there exists an H with r vertices that cannot be counted in linear time. In this paper, we initiate a study of subquadratic algorithms for subgraph counting on bounded degeneracy graphs. We prove that when H has at most 9 vertices, subgraph counting can be done in Õ(n^{5/3}) time. As a secondary result, we give improved algorithms for counting cycles of length at most 10. Previously, no subquadratic algorithms were known for the above problems on bounded degeneracy graphs. Our main conceptual contribution is a framework that reduces subgraph counting in bounded degeneracy graphs to counting smaller hypergraphs in arbitrary graphs. We believe that our results will help build a general theory of subgraph counting for bounded degeneracy graphs.more » « less
An official website of the United States government

