skip to main content

This content will become publicly available on January 1, 2023

Title: Privately Estimating Graph Parameters in Sublinear Time
We initiate a systematic study of algorithms that are both differentially-private and run in sublinear time for several problems in which the goal is to estimate natural graph parameters. Our main result is a differentially-private $(1+\rho)$-approximation algorithm for the problem of computing the average degree of a graph, for every $\rho>0$. The running time of the algorithm is roughly the same (for sparse graphs) as its non-private version proposed by Goldreich and Ron (Sublinear Algorithms, 2005). We also obtain the first differentially-private sublinear-time approximation algorithms for the maximum matching size and the minimum vertex cover size of a graph. An overarching technique we employ is the notion of \emph{coupled global sensitivity} of randomized algorithms. Related variants of this notion of sensitivity have been used in the literature in ad-hoc ways. Here we formalize the notion and develop it as a unifying framework for privacy analysis of randomized approximation algorithms.
Authors:
; ;
Editors:
Mikołaj Boja´nczyk, Emanuela Merelli
Award ID(s):
1931443 1910659 1910411
Publication Date:
NSF-PAR ID:
10322751
Journal Name:
49th International Colloquium on Automata, Languages, and Programming (ICALP 2022).
Sponsoring Org:
National Science Foundation
More Like this
  1. There has been a flurry of recent literature studying streaming algorithms for which the input stream is chosen adaptively by a black-box adversary who observes the output of the streaming algorithm at each time step. However, these algorithms fail when the adversary has access to the internal state of the algorithm, rather than just the output of the algorithm. We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the L1-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our L1-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in amore »stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any two-player deterministic communication lower bound to a lower bound for randomized algorithms robust to a white-box adversary. In particular, our results show that for all p ≥ 0, there exists a constant Cp > 1 such that any Cp-approximation algorithm for Fp moment estimation in insertion-only streams with a white-box adversary requires Ω(n) space for a universe of size n. Similarly, there is a constant C > 1 such that any C-approximation algorithm in an insertion-only stream for matrix rank requires Ω(n) space with a white-box adversary. These results do not contradict our upper bounds since they assume the adversary has unbounded computational power. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. Finally, we prove a lower bound of Ω(log n) bits for the fundamental problem of deterministic approximate counting in a stream of 0’s and 1’s, which holds even if we know how many total stream updates we have seen so far at each point in the stream. Such a lower bound for approximate counting with additional information was previously unknown, and in our context, it shows a separation between multiplayer deterministic maximum communication and the white-box space complexity of a streaming algorithm« less
  2. We consider the problem of designing sublinear time algorithms for estimating the cost of minimum] metric traveling salesman (TSP) tour. Specifically, given access to a n × n distance matrix D that specifies pairwise distances between n points, the goal is to estimate the TSP cost by performing only sublinear (in the size of D) queries. For the closely related problem of estimating the weight of a metric minimum spanning tree (MST), it is known that for any epsilon > 0, there exists an O^~(n/epsilon^O(1))-time algorithm that returns a (1+epsilon)-approximate estimate of the MST cost. This result immediately implies an O^~(n/epsilon^O(1)) time algorithm to estimate the TSP cost to within a (2 + epsilon) factor for any epsilon > 0. However, no o(n^2)-time algorithms are known to approximate metric TSP to a factor that is strictly better than 2. On the other hand, there were also no known barriers that rule out existence of (1 + epsilon)-approximate estimation algorithms for metric TSP with O^~ (n) time for any fixed epsilon > 0. In this paper, we make progress on both algorithms and lower bounds for estimating metric TSP cost. On the algorithmic side, we first consider the graphic TSP problemmore »where the metric D corresponds to shortest path distances in a connected unweighted undirected graph. We show that there exists an O^~(n) time algorithm that estimates the cost of graphic TSP to within a factor of (2 − epsilon_0) for some epsilon_0 > 0. This is the first sublinear cost estimation algorithm for graphic TSP that achieves an approximation factor less than 2. We also consider another well-studied special case of metric TSP, namely, (1, 2)-TSP where all distances are either 1 or 2, and give an O^~(n ^ 1.5) time algorithm to estimate optimal cost to within a factor of 1.625. Our estimation algorithms for graphic TSP as well as for (1, 2)-TSP naturally lend themselves to O^~(n) space streaming algorithms that give an 11/6-approximation for graphic TSP and a 1.625-approximation for (1, 2)-TSP. These results motivate the natural question if analogously to metric MST, for any epsilon > 0, (1 + epsilon)-approximate estimates can be obtained for graphic TSP and (1, 2)-TSP using O^~ (n) queries. We answer this question in the negative – there exists an epsilon_0 > 0, such that any algorithm that estimates the cost of graphic TSP ((1, 2)-TSP) to within a (1 + epsilon_0)-factor, necessarily requires (n^2) queries. This lower bound result highlights a sharp separation between the metric MST and metric TSP problems. Similarly to many classical approximation algorithms for TSP, our sublinear time estimation algorithms utilize subroutines for estimating the size of a maximum matching in the underlying graph. We show that this is not merely an artifact of our approach, and that for any epsilon > 0, any algorithm that estimates the cost of graphic TSP or (1, 2)-TSP to within a (1 + epsilon)-factor, can also be used to estimate the size of a maximum matching in a bipartite graph to within an epsilon n additive error. This connection allows us to translate known lower bounds for matching size estimation in various models to similar lower bounds for metric TSP cost estimation.« less
  3. We present an $\tilde O(m+n^{1.5})$-time randomized algorithm for maximum cardinality bipartite matching and related problems (e.g. transshipment, negative-weight shortest paths, and optimal transport) on $m$-edge, $n$-node graphs. For maximum cardinality bipartite matching on moderately dense graphs, i.e. $m = \Omega(n^{1.5})$, our algorithm runs in time nearly linear in the input size and constitutes the first improvement over the classic $O(m\sqrt{n})$-time [Dinic 1970; Hopcroft-Karp 1971; Karzanov 1973] and $\tilde O(n^\omega)$-time algorithms [Ibarra-Moran 1981] (where currently $\omega\approx 2.373$). On sparser graphs, i.e. when $m = n^{9/8 + \delta}$ for any constant $\delta>0$, our result improves upon the recent advances of [Madry 2013] and [Liu-Sidford 2020b, 2020a] which achieve an $\tilde O(m^{4/3+o(1)})$ runtime. We obtain these results by combining and advancing recent lines of research in interior point methods (IPMs) and dynamic graph algorithms. First, we simplify and improve the IPM of [v.d.Brand-Lee-Sidford-Song 2020], providing a general primal-dual IPM framework and new sampling-based techniques for handling infeasibility induced by approximate linear system solvers. Second, we provide a simple sublinear-time algorithm for detecting and sampling high-energy edges in electric flows on expanders and show that when combined with recent advances in dynamic expander decompositions, this yields efficient data structures for maintaining the iterates ofmore »both [v.d.Brand~et~al.] and our new IPMs. Combining this general machinery yields a simpler $\tilde O(n \sqrt{m})$ time algorithm for matching based on the logarithmic barrier function, and our state-of-the-art $\tilde O(m+n^{1.5})$ time algorithm for matching based on the [Lee-Sidford 2014] barrier (as regularized in [v.d.Brand~et~al.]).« less
  4. Current flow closeness centrality (CFCC) has a better discriminating ability than the ordinary closeness centrality based on shortest paths. In this paper, we extend this notion to a group of vertices in a weighted graph, and then study the problem of finding a subset S of k vertices to maximize its CFCC C(S), both theoretically and experimentally. We show that the problem is NP-hard, but propose two greedy algorithms for minimizing the reciprocal of C(S) with provable guarantees using the monotoncity and supermodularity. The first is a deterministic algorithm with an approximation factor (1−kk−1⋅1e) and cubic running time; while the second is a randomized algorithm with a (1−kk−1⋅1e−ϵ)-approximation and nearly-linear running time for any ϵ>0. Extensive experiments on model and real networks demonstrate that our algorithms are effective and efficient, with the second algorithm being scalable to massive networks with more than a million vertices.
  5. We study the communication cost (or message complexity) of fundamental distributed symmetry breaking problems, namely, coloring and MIS. While significant progress has been made in understanding and improving the running time of such problems, much less is known about the message complexity of these problems. In fact, all known algorithms need at least Ω(m) communication for these problems, where m is the number of edges in the graph. We addressthe following question in this paper: can we solve problems such as coloring and MIS using sublinear, i.e., o(m) communication, and if sounder what conditions? In a classical result, Awerbuch, Goldreich, Peleg, and Vainish [JACM 1990] showed that fundamental global problems such asbroadcast and spanning tree construction require at least o(m) messages in the KT-1 Congest model (i.e., Congest model in which nodes have initial knowledge of the neighbors' ID's) when algorithms are restricted to be comparison-based (i.e., algorithms inwhich node ID's can only be compared). Thirty five years after this result, King, Kutten, and Thorup [PODC 2015] showed that onecan solve the above problems using Õ(n) messages (n is the number of nodes in the graph) in Õ(n) rounds in the KT-1 Congest model if non-comparison-based algorithms are permitted. Anmore »important implication of this result is that one can use the synchronous nature of the KT-1 Congest model, using silence to convey information,and solve any graph problem using non-comparison-based algorithms with Õ(n) messages, but this takes an exponential number of rounds. In the asynchronous model, even this is not possible. In contrast, much less is known about the message complexity of local symmetry breaking problems such as coloring and MIS. Our paper fills this gap by presenting the following results. Lower bounds: In the KT-1 CONGEST model, we show that any comparison-based algorithm, even a randomized Monte Carlo algorithm with constant success probability, requires Ω(n 2) messages in the worst case to solve either (△ + 1)-coloring or MIS, regardless of the number of rounds. We also show that Ω(n) is a lower bound on the number ofmessages for any (△ + 1)-coloring or MIS algorithm, even non-comparison-based, and even with nodes having initial knowledge of up to a constant radius. Upper bounds: In the KT-1 CONGEST model, we present the following randomized non-comparison-based algorithms for coloring that, with high probability, use o(m) messages and run in polynomially many rounds.(a) A (△ + 1)-coloring algorithm that uses Õ(n1.5) messages, while running in Õ(D + √ n) rounds, where D is the graph diameter. Our result also implies an asynchronous algorithm for (△ + 1)-coloring with the same message bound but running in Õ(n) rounds. (b) For any constantε > 0, a (1+ε)△-coloring algorithm that uses Õ(n/ε 2 ) messages, while running in Õ(n) rounds. If we increase our input knowledge slightly to radius 2, i.e.,in the KT-2 CONGEST model, we obtain:(c) A randomized comparison-based MIS algorithm that uses Õ(n 1.5) messages. while running in Õ( √n) rounds. While our lower bound results can be viewed as counterparts to the classical result of Awerbuch, Goldreich, Peleg, and Vainish [JACM 90], but for local problems, our algorithms are the first-known algorithms for coloring and MIS that take o(m) messages and run in polynomially many rounds.« less