Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Free, publiclyaccessible full text available June 2, 2024

Over the last two decades, frameworks for distributedmemory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the defacto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate. Given a graph G = (V, E) with n vertices, m edges and T triangles, our first result is an algorithm that outputs a (1+ε)approximation to T, with asymptotically optimal round and total space complexity provided any S ≥ max{(√ m, n²/m)} space per machine and assuming T = Ω(√{m/n}). Our result gives a quadratic improvement on the bound on T over previous works. We also provide a simple extension of our result to counting any subgraph of k size for constant k ≥ 1. Our second result is an O_δ(log log n)round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the arboricity α of the input graph. We extend this result to exactly counting kcliques for any constant k. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5 can be implemented in the MPC model in Õ_δ(√{log n}) rounds, O(n^δ) space per machine and O(mα³) total space. In addition to our theoretical results, we simulate our triangle counting algorithms in realworld graphs obtained from the Stanford Network Analysis Project (SNAP) database. Our results show that both our approximate and exact counting algorithms exhibit improvements in terms of round complexity and approximation ratio, respectively, compared to two previous widely used algorithms for these problems.more » « less

Recent work shows that the expressive power of Graph Neural Networks (GNNs) in distinguishing nonisomorphic graphs is exactly the same as that of the WeisfeilerLehman (WL) graph test. In particular, they show that the WL test can be simulated by GNNs. However, those simulations involve neural networks for the “combine” function of size polynomial or even exponential in the number of graph nodes n, as well as feature vectors of length linear in n. We present an improved simulation of the WL test on GNNs with exponentially lower complexity. In particular, the neural network implementing the combine function in each node has only polylog(n) parameters, and the feature vectors exchanged by the nodes of GNN consists of only O(log n) bits. We also give logarithmic lower bounds for the feature vector length and the size of the neural networks, showing the (near)optimality of our construction.more » « less

We consider the problem of estimating the number of distinct elements in a large data set (or, equivalently, the support size of the distribution induced by the data set) from a random sample of its elements. The problem occurs in many applications, including biology, genomics, computer systems and linguistics. A line of research spanning the last decade resulted in algorithms that estimate the support up to ±εn from a sample of size O(log2(1/ε)·n/logn), where n is the data set size. Unfortunately, this bound is known to be tight, limiting further improvements to the complexity of this problem. In this paper we consider estimation algorithms augmented with a machinelearningbased predictor that, given any element, returns an estimation of its frequency. We show that if the predictor is correct up to a constant approximation factor, then the sample complexity can be reduced significantly, to log(1/ε)·n1−Θ(1/log(1/ε)).We evaluate the proposed algorithms on a collection of data sets, using the neuralnetwork based estimators from Hsu et al, ICLR’19 as predictors. Our experiments demonstrate substantial (up to 3x) improvements in the estimation accuracy compared to the state of the art algorithm.more » « less

null (Ed.)A probability distribution over the Boolean cube is monotone if flipping the value of a coordinate from zero to one can only increase the probability of an element. Given samples of an unknown monotone distribution over the Boolean cube, we give (to our knowledge) the first algorithm that learns an approximation of the distribution in statistical distance using a number of samples that is sublinear in the domain. To do this, we develop a structural lemma describing monotone probability distributions. The structural lemma has further implications to the sample complexity of basic testing tasks for analyzing monotone probability distributions over the Boolean cube: We use it to give nontrivial upper bounds on the tasks of estimating the distance of a monotone distribution to uniform and of estimating the support size of a monotone distribution. In the setting of monotone probability distributions over the Boolean cube, our algorithms are the first to have sample complexity lower than known lower bounds for the same testing tasks on arbitrary (not necessarily monotone) probability distributions. One further consequence of our learning algorithm is an improved sample complexity for the task of testing whether a distribution on the Boolean cube is monotone.more » « less

A probability distribution over the Boolean cube is monotone if flipping the value of a coordinate from zero to one can only increase the probability of an element. Given samples of an unknown monotone distribution over the Boolean cube, we give (to our knowledge) the first algorithm that learns an approximation of the distribution in statistical distance using a number of samples that is sublinear in the domain. To do this, we develop a structural lemma describing monotone probability distributions. The structural lemma has further implications to the sample complexity of basic testing tasks for analyzing monotone probability distributions over the Boolean cube: We use it to give nontrivial upper bounds on the tasks of estimating the distance of a monotone distribution to uniform and of estimating the support size of a monotone distribution. In the setting of monotone probability distributions over the Boolean cube, our algorithms are the first to have sample complexity lower than known lower bounds for the same testing tasks on arbitrary (not necessarily monotone) probability distributions. One further consequence of our learning algorithm is an improved sample complexity for the task of testing whether a distribution on the Boolean cube is monotone.more » « less

Constructing a spanning tree of a graph is one of the most basic tasks in graph theory. We consider a relaxed version of this problem in the setting of local algorithms. The relaxation is that the constructed subgraph is a sparse spanning subgraph containing at most (1+ϵ)n edges (where n is the number of vertices and ϵ is a given approximation/sparsity parameter). In the local setting, the goal is to quickly determine whether a given edge e belongs to such a subgraph, without constructing the whole subgraph, but rather by inspecting (querying) the local neighborhood of e. The challenge is to maintain consistency. That is, to provide answers concerning different edges according to the same spanning subgraph. We first show that for general boundeddegree graphs, the query complexity of any such algorithm must be Ω(n−−√). This lower bound holds for constantdegree graphs that have high expansion. Next we design an algorithm for (boundeddegree) graphs with high expansion, obtaining a result that roughly matches the lower bound. We then turn to study graphs that exclude a fixed minor (and are hence nonexpanding). We design an algorithm for such graphs, which may have an unbounded maximum degree. The query complexity of this algorithm is poly(1/ϵ,h) (independent of n and the maximum degree), where h is the number of vertices in the excluded minor. Though our two algorithms are designed for very different types of graphs (and have very different complexities), on a highlevel there are several similarities, and we highlight both the similarities and the differences.more » « less