skip to main content


Search for: All records

Creators/Authors contains: "Sudan, Madhu"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. We give an $\widetilde{O}(\sqrt{n})$-space single-pass 0.483-approximation streaming algorithm for estimating the maximum directed cut size (Max-DICUT) in a directed graph on n vertices. This improves over an $O(\log n)$-space $4 / 9 < 0.45$ approximation algorithm due to Chou, Golovnev, and Velusamy (FOCS 2020), which was known to be optimal for $o(\sqrt{n})$-space algorithms. Max-DICUT is a special case of a constraint satisfaction problem (CSP). In this broader context, we give the first CSP for which algorithms with $\widetilde{O}(\sqrt{n})$- space can provably outperform $o(\sqrt{n})$- space algorithms. The key technical contribution of our work is development of the notions of a first-order snapshot of a (directed) graph and of estimates of such snapshots. These snapshots can be used to simulate certain (non-streaming) Max-DICUT algorithms, including the “oblivious” algorithms introduced by Feige and Jozeph (Algorithmica, 2015), who showed that one such algorithm Previous work of the authors (SODA 2023) studied the restricted case of bounded-degree graphs, and observed that in this setting, it is straightforward to estimate the snapshot with $\ell_{1}$ errors and this suffices to simulate oblivious algorithms. But for unbounded-degree graphs, even defining an achievable and sufficient notion of estimation is subtle. We describe a new notion of snapshot estimation and prove its sufficiency using careful smoothing techniques, and then develop an algorithm which sketches such an estimate via a delicate process of intertwined vertex- and edge-subsampling. Prior to our work, the only streaming algorithms for any CSP on general instances were based on generalizations of the $O(\log n)$-space algorithm for Max-DICUT, and can roughly be characterized as based on “zeroth” order snapshots. Our work thus opens the possibility of a new class of algorithms for approximating CSPs by demonstrating that more sophisticated snapshots can outperform cruder ones in the case of Max-DICUT. 
    more » « less
    Free, publicly-accessible full text available November 6, 2024
  2. Megow, Nicole ; Smith, Adam (Ed.)
    We study the question of local testability of low (constant) degree functions from a product domain 𝒮_1 × … × 𝒮_n to a field 𝔽, where 𝒮_i ⊆ 𝔽 can be arbitrary constant sized sets. We show that this family is locally testable when the grid is "symmetric". That is, if 𝒮_i = 𝒮 for all i, there is a probabilistic algorithm using constantly many queries that distinguishes whether f has a polynomial representation of degree at most d or is Ω(1)-far from having this property. In contrast, we show that there exist asymmetric grids with |𝒮_1| = ⋯ = |𝒮_n| = 3 for which testing requires ω_n(1) queries, thereby establishing that even in the context of polynomials, local testing depends on the structure of the domain and not just the distance of the underlying code. The low-degree testing problem has been studied extensively over the years and a wide variety of tools have been applied to propose and analyze tests. Our work introduces yet another new connection in this rich field, by building low-degree tests out of tests for "junta-degrees". A function f:𝒮_1 × ⋯ × 𝒮_n → 𝒢, for an abelian group 𝒢 is said to be a junta-degree-d function if it is a sum of d-juntas. We derive our low-degree test by giving a new local test for junta-degree-d functions. For the analysis of our tests, we deduce a small-set expansion theorem for spherical/hamming noise over large grids, which may be of independent interest. 
    more » « less
  3. Nikhil, Bansal ; Nagarajan, Viswanath (Ed.)
    We initiate a study of the streaming complexity of constraint satisfaction problems (CSPs) when the constraints arrive in a random order. We show that there exists a CSP, namely Max-DICUT, for which random ordering makes a provable difference. Whereas a 4/9 ≈ 0.445 approximation of DICUT requires space with adversarial ordering, we show that with random ordering of constraints there exists a 0.483-approximation algorithm that only needs O(log n) space. We also give new algorithms for Max-DICUT in variants of the adversarial ordering setting. Specifically, we give a two-pass O(log n) space 0.483-approximation algorithm for general graphs and a single-pass space 0.483-approximation algorithm for bounded-degree graphs. On the negative side, we prove that CSPs where the satisfying assignments of the constraints support a one-wise independent distribution require -space for any non-trivial approximation, even when the constraints are randomly ordered. This was previously known only for adversarially ordered constraints. Extending the results to randomly ordered constraints requires switching the hard instances from a union of random matchings to simple Erdős-Renyi random (hyper)graphs and extending tools that can perform Fourier analysis on such instances. The only CSP to have been considered previously with random ordering is Max-CUT where the ordering is not known to change the approximability. Specifically it is known to be as hard to approximate with random ordering as with adversarial ordering, for space algorithms. Our results show a richer variety of possibilities and motivate further study of CSPs with randomly ordered constraints. 
    more » « less
  4. Kaiai, Yael Tauman (Ed.)
    Societal accumulation of knowledge is a complex process. The correctness of new units of knowledge depends not only on the correctness of new reasoning, but also on the correctness of old units that the new one builds on. The errors in such accumulation processes are often remedied by error correction and detection heuristics. Motivating examples include the scientific process based on scientific publications, and software development based on libraries of code. Natural processes that aim to keep errors under control, such as peer review in scientific publications, and testing and debugging in software development, would typically check existing pieces of knowledge - both for the reasoning that generated them and the previous facts they rely on. In this work, we present a simple process that models such accumulation of knowledge and study the persistence (or lack thereof) of errors. We consider a simple probabilistic model for the generation of new units of knowledge based on the preferential attachment growth model, which additionally allows for errors. Furthermore, the process includes checks aimed at catching these errors. We investigate when effects of errors persist forever in the system (with positive probability) and when they get rooted out completely by the checking process. The two basic parameters associated with the checking process are the probability of conducting a check and the depth of the check. We show that errors are rooted out if checks are sufficiently frequent and sufficiently deep. In contrast, shallow or infrequent checks are insufficient to root out errors. 
    more » « less
  5. Bojanczyk, Mikolaj ; Merelli, Emanuela ; Woodruff, David P. (Ed.)
    In this survey we describe progress over the last decade or so in understanding the complexity of solving constraint satisfaction problems (CSPs) approximately in the streaming and sketching models of computation. After surveying some of the results we give some sketches of the proofs and in particular try to explain why there is a tight dichotomy result for sketching algorithms working in subpolynomial space regime. 
    more » « less
  6. We study the log-rank conjecture from the perspective of point-hyperplane incidence geometry. We formulate the following conjecture: Given a point set in ℝ d that is covered by constant-sized sets of parallel hyperplanes, there exists an affine subspace that accounts for a large (i.e., 2 –polylog( d ) ) fraction of the incidences, in the sense of containing a large fraction of the points and being contained in a large fraction of the hyperplanes. In other words, the point-hyperplane incidence graph for such configurations has a large complete bipartite subgraph. Alternatively, our conjecture may be interpreted linear-algebraically as follows: Any rank- d matrix containing at most O (1) distinct entries in each column contains a submatrix of fractional size 2 –polylog( d ) , in which each column is constant. We prove that our conjecture is equivalent to the log-rank conjecture; the crucial ingredient of this proof is a reduction from bounds for parallel k -partitions to bounds for parallel ( k -1)-partitions. We also introduce an (apparent) strengthening of the conjecture, which relaxes the requirements that the sets of hyperplanes be parallel. Motivated by the connections above, we revisit well-studied questions in point-hyperplane incidence geometry without structural assumptions (i.e., the existence of partitions). We give an elementary argument for the existence of complete bipartite subgraphs of density Ω (ε 2 d / d ) in any d -dimensional configuration with incidence density ε, qualitatively matching previous results proved using sophisticated geometric techniques. We also improve an upper-bound construction of Apfelbaum and Sharir [ 2 ], yielding a configuration whose complete bipartite subgraphs are exponentially small and whose incidence density is Ω (1/√ d ). Finally, we discuss various constructions (due to others) of products of Boolean matrices which yield configurations with incidence density Ω (1) and complete bipartite subgraph density 2 -Ω (√ d ) , and pose several questions for this special case in the alternative language of extremal set combinatorics. Our framework and results may help shed light on the difficulty of improving Lovett’s Õ(√ rank( f )) bound [ 20 ] for the log-rank conjecture. In particular, any improvement on this bound would imply the first complete bipartite subgraph size bounds for parallel 3-partitioned configurations which beat our generic bounds for unstructured configurations. 
    more » « less
  7. Chakrabarti, Amit ; Swamy, Chaitanya (Ed.)
    We analyze the sketching approximability of constraint satisfaction problems on Boolean domains, where the constraints are balanced linear threshold functions applied to literals. In particular, we explore the approximability of monarchy-like functions where the value of the function is determined by a weighted combination of the vote of the first variable (the president) and the sum of the votes of all remaining variables. The pure version of this function is when the president can only be overruled by when all remaining variables agree. For every k ≥ 5, we show that CSPs where the underlying predicate is a pure monarchy function on k variables have no non-trivial sketching approximation algorithm in o(√n) space. We also show infinitely many weaker monarchy functions for which CSPs using such constraints are non-trivially approximable by O(log(n)) space sketching algorithms. Moreover, we give the first example of sketching approximable asymmetric Boolean CSPs. Our results work within the framework of Chou, Golovnev, Sudan, and Velusamy (FOCS 2021) that characterizes the sketching approximability of all CSPs. Their framework can be applied naturally to get a computer-aided analysis of the approximability of any specific constraint satisfaction problem. The novelty of our work is in using their work to get an analysis that applies to infinitely many problems simultaneously. 
    more » « less
  8. Leonardi, Stefano ; Gupta, Anupam (Ed.)
    We consider the approximability of constraint satisfaction problems in the streaming setting. For every constraint satisfaction problem (CSP) on n variables taking values in {0,…,q−1}, we prove that improving over the trivial approximability by a factor of q requires Ω(n) space even on instances with O(n) constraints. We also identify a broad subclass of problems for which any improvement over the trivial approximability requires Ω(n) space. The key technical core is an optimal, q−(k−1)-inapproximability for the Max k-LIN-mod q problem, which is the Max CSP problem where every constraint is given by a system of k−1 linear equations mod q over k variables. Our work builds on and extends the breakthrough work of Kapralov and Krachun (Proc. STOC 2019) who showed a linear lower bound on any non-trivial approximation of the MaxCut problem in graphs. MaxCut corresponds roughly to the case of Max k-LIN-mod q with k=q=2. For general CSPs in the streaming setting, prior results only yielded Ω(√n) space bounds. In particular no linear space lower bound was known for an approximation factor less than 1/2 for any CSP. Extending the work of Kapralov and Krachun to Max k-LIN-mod q to k>2 and q>2 (while getting optimal hardness results) is the main technical contribution of this work. Each one of these extensions provides non-trivial technical challenges that we overcome in this work. 
    more » « less
  9. Arıkan’s exciting discovery of polar codes has provided an altogether new way to efficiently achieve Shannon capacity. Given a (constant-sized) invertible matrix M , a family of polar codes can be associated with this matrix and its ability to approach capacity follows from the polarization of an associated [0, 1]-bounded martingale, namely its convergence in the limit to either 0 or 1 with probability 1. Arıkan showed appropriate polarization of the martingale associated with the matrix ( G 2 = ( 1 1 0 1) to get capacity achieving codes. His analysis was later extended to all matrices M that satisfy an obvious necessary condition for polarization. While Arıkan’s theorem does not guarantee that the codes achieve capacity at small blocklengths (specifically in length, which is a polynomial in ( 1ε ) where (ε) is the difference between the capacity of a channel and the rate of the code), it turns out that a “strong” analysis of the polarization of the underlying martingale would lead to such constructions. Indeed for the martingale associated with ( G 2 ) such a strong polarization was shown in two independent works (Guruswami and Xia (IEEE IT’15) and Hassani et al. (IEEE IT’14)), thereby resolving a major theoretical challenge associated with the efficient attainment of Shannon capacity. In this work we extend the result above to cover martingales associated with all matrices that satisfy the necessary condition for (weak) polarization. In addition to being vastly more general, our proofs of strong polarization are (in our view) also much simpler and modular. Key to our proof is a notion of local polarization that only depends on the evolution of the martingale in a single time step. We show that local polarization always implies strong polarization. We then apply relatively simple reasoning about conditional entropies to prove local polarization in very general settings. Specifically, our result shows strong polarization over all prime fields and leads to efficient capacity-achieving source codes for compressing arbitrary i.i.d. sources, and capacity-achieving channel codes for arbitrary symmetric memoryless channels. We show how to use our analyses to achieve exponentially small error probabilities at lengths inverse polynomial in the gap to capacity. Indeed we show that we can essentially match any error probability while maintaining lengths that are only inverse polynomial in the gap to capacity. 
    more » « less