We consider the problem of sampling and approximately counting an arbitrary given motif H in a graph G, where access to G is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of H into a collection of odd cycles and stars, denoted D^*(H) = {O_{k₁},...,O_{k_q}, S_{p₁},...,S_{p_𝓁}}. These algorithms were shown to be optimal for the case where H is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling arbitrary motifs which, up to poly(log n) factors, is always at least as good, and for most graphs G is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows to sample vertices according to the p-th moment of the degree distribution. Finally, we prove that this algorithm is decomposition-optimal for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs H with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.
more »
« less
Towards a Decomposition-Optimal Algorithm for Counting and Sampling Arbitrary Motifs in Sublinear Time
Counting and uniformly sampling motifs in a graph are fundamental algorithmic tasks with numerous applications across multiple fields. Since these problems are computationally expensive, recent efforts have focused on devising sublinear-time algorithms for these problems. We consider the model where the algorithm gets a constant size motif H and query access to a graph G, where the allowed queries are degree, neighbor, and pair queries, as well as uniform edge sample queries. In the sampling task, the algorithm is required to output a uniformly distributed copy of H in G (if one exists), and in the counting task it is required to output a good estimate to the number of copies of H in G. Previous algorithms for the uniform sampling task were based on a decomposition of H into a collection of odd cycles and stars, denoted D∗(H) = {Ok1 , ...,Okq , Sp1 , ..., Spℓ19 }. These algorithms were shown to be optimal for the case where H is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling arbitrary motifs which, up to poly(log n) factors, for any motif H whose decomposition contains at least two components or at least one star, is always preferable. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows to sample vertices according to the p-th moment of the degree distribution. We further show how to use our sampling algorithm to get an approximate counting algorithm, with essentially the same complexity. Finally, we prove that this algorithm is decomposition-optimal for decompositions that contain at least one odd cycle. That is, we prove that for any decomposition D that contains at least one odd cycle, there exists a motif HD 30 with decomposition D, and a family of graphs G, so that in order to output a uniform copy of H in a uniformly chosen graph in G, the number of required queries matches our upper bound. These are the first lower bounds for motifs H with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.
more »
« less
- PAR ID:
- 10279839
- Date Published:
- Journal Name:
- Random
- ISSN:
- 1198-8193
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Counting small subgraphs, referred to as motifs, in large graphs is a fundamental task in graph analysis, extensively studied across various contexts and computational models. In the sublinear-time regime, the relaxed problem of approximate counting has been explored within two prominent query frameworks: the standard model, which permits degree, neighbor, and pair queries, and the strictly more powerful augmented model, which additionally allows for uniform edge sampling. Currently, in the standard model, (opti- mal) results have been established only for approximately counting edges, stars, and cliques, all of which have a radius of one. This contrasts sharply with the state of affairs in the augmented model, where algorithmic results (some of which are optimal) are known for any input motif, leading to a disparity which we term the “scope gap" between the two models. In this work, we make significant progress in bridging this gap. Our approach draws inspiration from recent advancements in the augmented model and utilizes a framework centered on counting by uniform sampling, thus allowing us to establish new results in the standard model and simplify on previous results. In particular, our first, and main, contribution is a new algorithm in the standard model for approximately counting any Hamiltonian motif in sublinear time, where the complexity of the algorithm is the sum of two terms. One term equals the complexity of the known algorithms by Assadi, Kapralov, and Khanna (ITCS 2019) and Fichtenberger and Peng (ICALP 2020) in the (strictly stronger) augmented model and the other is an additional, necessary, additive overhead. Our second contribution is a variant of our algorithm that en- ables nearly uniform sampling of these motifs, a capability pre- viously limited in the standard model to edges and cliques. Our third contribution is to introduce even simpler algorithms for stars and cliques by exploiting their radius-one property. As a result, we simplify all previously known algorithms in the standard model for stars (Gonen, Ron, Shavitt (SODA 2010)), triangles (Eden, Levi, Ron Seshadhri (FOCS 2015)) and cliques (Eden, Ron, Seshadri (STOC 2018)).more » « less
-
null (Ed.)Counting homomorphisms of a constant sized pattern graph H in an input graph G is a fundamental computational problem. There is a rich history of studying the complexity of this problem, under various constraints on the input G and the pattern H. Given the significance of this problem and the large sizes of modern inputs, we investigate when near-linear time algorithms are possible. We focus on the case when the input graph has bounded degeneracy, a commonly studied and practically relevant class for homomorphism counting. It is known from previous work that for certain classes of H, H-homomorphisms can be counted exactly in near-linear time in bounded degeneracy graphs. Can we precisely characterize the patterns H for which near-linear time algorithms are possible? We completely resolve this problem, discovering a clean dichotomy using fine-grained complexity. Let m denote the number of edges in G. We prove the following: if the largest induced cycle in H has length at most 5, then there is an O(mlogm) algorithm for counting H-homomorphisms in bounded degeneracy graphs. If the largest induced cycle in H has length at least 6, then (assuming standard fine-grained complexity conjectures) there is a constant γ>0, such that there is no o(m1+γ) time algorithm for counting H-homomorphisms.more » « less
-
null (Ed.)Counting homomorphisms of a constant sized pattern graph H in an input graph G is a fundamental computational problem. There is a rich history of studying the complexity of this problem, under various constraints on the input G and the pattern H. Given the significance of this problem and the large sizes of modern inputs, we investigate when near-linear time algorithms are possible. We focus on the case when the input graph has bounded degeneracy, a commonly studied and practically relevant class for homomorphism counting. It is known from previous work that for certain classes of H, H-homomorphisms can be counted exactly in near-linear time in bounded degeneracy graphs. Can we precisely characterize the patterns H for which near-linear time algorithms are possible? We completely resolve this problem, discovering a clean dichotomy using fine-grained complexity. Let m denote the number of edges in G. We prove the following: if the largest induced cycle in H has length at most 5, then there is an O(m log m) algorithm for counting H-homomorphisms in bounded degeneracy graphs. If the largest induced cycle in H has length at least 6, then (assuming standard fine-grained complexity conjectures) there is a constant γ > 0, such that there is no o(m1+γ) time algorithm for counting H-homomorphisms.more » « less
-
null (Ed.)We introduce Tiered Sampling , a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size M , which can be magnitudes smaller than the number of edges. Our methods address the challenging task of counting sparse motifs—sub-graph patterns—that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the designing and analysis of algorithms for counting 4-cliques, we present a method which allows generalizing Tiered Sampling to obtain high-quality estimates for the number of occurrence of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete analytical analysis and extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4 and 5-cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large scale graphs.more » « less
An official website of the United States government

