Consider a system of m polynomial equations {p_i(x) = b_i}_{i ≤ m} of degree D ≥ 2 in an n-dimensional variable x ∈ ℝ^n, where every coefficient of each p_i and each b_i is chosen independently at random from some continuous distribution. We study the basic question of determining the smallest m (the algorithmic threshold) for which efficient algorithms can find refutations (i.e., certificates of unsatisfiability) for such systems. This setting generalizes problems such as refuting random SAT instances, low-rank matrix sensing, and certifying pseudorandomness of Goldreich's candidate generators and their generalizations.
We show that for every d ∈ ℕ, the (n+m)^{O(d)}-time canonical sum-of-squares (SoS) relaxation refutes such a system with high probability whenever m ≥ O(n) · (n/d)^{D−1}. We prove a lower bound in the restricted low-degree polynomial model of computation, which suggests that this trade-off between SoS degree and the number of equations is nearly tight for all d. We also confirm the predictions of this lower bound in a limited setting by showing a lower bound on the canonical degree-4 sum-of-squares relaxation for refuting random quadratic polynomials. Together, our results provide evidence for an algorithmic threshold for the problem at m ≳ Õ(n) · n^{(1−δ)(D−1)} for 2^{n^δ}-time algorithms, for all δ.
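As a concrete illustration of the random model, instance generation can be sketched as follows. This is a minimal Python sketch: the standard Gaussian coefficient distribution and the dense monomial basis are illustrative assumptions, since the abstract only requires some continuous distribution.

```python
import itertools
import random

def random_polynomial_system(n, m, D, rng=None):
    """Sample m dense degree-D polynomials p_i in n variables, plus targets b_i.

    Each coefficient is drawn i.i.d. from a continuous distribution
    (standard Gaussian here, as an illustrative choice). A polynomial is
    stored as {monomial: coefficient}, where a monomial is a sorted tuple
    of variable indices (with repetition) of length at most D.
    """
    rng = rng or random.Random(0)
    monomials = [mono for deg in range(D + 1)
                 for mono in itertools.combinations_with_replacement(range(n), deg)]
    system = []
    for _ in range(m):
        p = {mono: rng.gauss(0.0, 1.0) for mono in monomials}
        b = rng.gauss(0.0, 1.0)
        system.append((p, b))
    return system

def evaluate(p, x):
    """Evaluate a polynomial in dict form at the point x (list of floats)."""
    total = 0.0
    for mono, coeff in p.items():
        term = coeff
        for var in mono:
            term *= x[var]
        total += term
    return total
```

A refutation task then asks for a certificate that no x ∈ ℝ^n satisfies all m sampled equations simultaneously.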
Algorithms Approaching the Threshold for Semi-random Planted Clique
We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian (2001). The previous best algorithms for this model succeed if the planted clique has size at least n^{2/3} in a graph with n vertices (Mehta, McKenzie, Trevisan 2019; Charikar, Steinhardt, Valiant 2017). Our algorithms work for planted-clique sizes approaching n^{1/2}, the information-theoretic threshold in the semi-random model (Steinhardt 2017) and a conjectured computational threshold even in the easier fully-random model. This result comes close to resolving open questions of Feige (2019) and Steinhardt (2017).
Our algorithms are based on higher-constant-degree sum-of-squares relaxations and rely on a new conceptual connection that translates certificates of upper bounds on biclique numbers in unbalanced bipartite Erdős–Rényi random graphs into algorithms for semi-random planted clique. The use of higher-constant-degree sum-of-squares is essential in our setting: we prove a lower bound on the basic SDP for certifying bicliques, showing that the basic SDP cannot succeed for planted cliques of size k = o(n^{2/3}). We also provide some evidence that the information-computation trade-off of our current algorithms may be inherent, by proving an average-case lower bound for unbalanced bicliques in the low-degree-polynomials model.
 Award ID(s):
 2211971
 NSF-PAR ID:
 10435077
 Journal Name:
 ACM Symposium on Theory of Computing, STOC
 Issue:
 2023
 Page Range / eLocation ID:
 1918 to 1926
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph where each vertex connects to at least a γ fraction of the other vertices. Quasi-cliques are a natural definition for dense structures, so finding large and hence statistically significant quasi-cliques is useful in applications such as community detection in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive, and even a recent algorithm for mining large maximal quasi-cliques is flawed and can lead to many repeated searches. This paper proposes a parallel solution for mining maximal quasi-cliques that is able to fully utilize CPU cores. Our solution uses divide and conquer to decompose the workload into independent tasks for parallel mining, and we address the problems of (i) drastic load imbalance among different tasks and (ii) the difficulty of predicting a task's running time and its growth with task-subgraph size, by (a) using a timeout-based task-decomposition strategy, and (b) utilizing a priority task queue to schedule long-running tasks earlier for mining and decomposition, to avoid stragglers. Unlike our conference version in PVLDB 2020, where the solution was built on a distributed graph mining framework called G-thinker, this paper targets a single-machine multi-core environment, which is more accessible to an average end user. A general framework called T-thinker is developed to facilitate the programming of parallel programs for algorithms that adopt divide and conquer, including but not limited to our quasi-clique mining algorithm. Additionally, we consider the problem of directly mining large quasi-cliques from dense parts of a graph, where we identify the repeated-search issue of a recent method and address it using a carefully designed concurrent trie data structure.
Extensive experiments verify that our parallel solution scales well with the number of CPU cores, achieving a 26.68× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges using 32 mining threads. Additionally, mining large quasi-cliques from dense parts can provide an additional speedup of up to 89.46×.
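The γ-quasi-clique condition itself is easy to state in code. A minimal sketch follows; the adjacency-set representation is an assumption for illustration, not the paper's data structure:

```python
import math

def is_quasi_clique(adj, vertices, gamma):
    """Check whether `vertices` induces a gamma-quasi-clique.

    adj: dict mapping each vertex to its set of neighbors.
    Every vertex in the subgraph must connect to at least
    ceil(gamma * (k - 1)) of the other k - 1 subgraph vertices.
    """
    k = len(vertices)
    if k <= 1:
        return True
    required = math.ceil(gamma * (k - 1))
    vset = set(vertices)
    # Count, for each vertex, its neighbors inside the subgraph only.
    return all(len(adj[v] & (vset - {v})) >= required for v in vertices)
```

For γ = 1 this reduces to an ordinary clique check, which is why maximal quasi-clique mining is at least as hard as maximal clique mining.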

The Gilbert–Varshamov bound (non-constructively) establishes the existence of binary codes of distance 1/2 − ε and rate Ω(ε^2) (where an upper bound of O(ε^2 log(1/ε)) is known). Ta-Shma [STOC 2017] gave an explicit construction of ε-balanced binary codes, where any two distinct codewords are at a distance between 1/2 − ε/2 and 1/2 + ε/2, achieving a near-optimal rate of Ω(ε^{2+β}), where β → 0 as ε → 0. We develop unique and list decoding algorithms for (a slight modification of) the family of codes constructed by Ta-Shma, in the adversarial error model. We prove the following results for ε-balanced codes with block length N and rate Ω(ε^{2+β}) in this family: There are explicit codes which can be uniquely decoded up to an error of half the minimum distance in time N^{O_{ε,β}(1)}. For any fixed constant β independent of ε, there is an explicit construction of codes which can be uniquely decoded up to an error of half the minimum distance in time (log(1/ε))^{O(1)} · N^{O_β(1)}. There are explicit ε-balanced codes with rate Ω(ε^{2+β}) which can be list decoded up to error 1/2 − ε′ in time N^{O_{ε,ε′,β}(1)}, where ε′, β → 0 as ε → 0. The starting point of our algorithms is the framework for list decoding direct-sum codes developed in Alev et al. [SODA 2020], which uses the Sum-of-Squares SDP hierarchy. The rates obtained there were quasi-polynomial in ε. Here, we show how to overcome the far-from-optimal rates of that framework, obtaining unique decoding algorithms for explicit binary codes of near-optimal rate. These codes are based on simple modifications of Ta-Shma's construction.
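The ε-balanced property described above can be checked directly from the definition. The brute-force sketch below is for intuition only (it is not a decoding algorithm, and the binary-string representation is an illustrative assumption):

```python
def is_eps_balanced(codewords, eps):
    """Check the eps-balanced property of a binary code.

    codewords: list of equal-length binary strings. Every pair of
    distinct codewords must have relative Hamming distance in the
    interval [1/2 - eps/2, 1/2 + eps/2].
    """
    n = len(codewords[0])
    lo, hi = 0.5 - eps / 2, 0.5 + eps / 2
    for i in range(len(codewords)):
        for j in range(i + 1, len(codewords)):
            # Relative Hamming distance between the two codewords.
            dist = sum(a != b for a, b in zip(codewords[i], codewords[j])) / n
            if not (lo <= dist <= hi):
                return False
    return True
```

Note that ε-balance is stronger than plain distance 1/2 − ε/2: it also forbids pairs that are too far apart (distance above 1/2 + ε/2).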

Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph g = (V_g, E_g) where each vertex ν ∈ V_g connects to at least a γ fraction of the other vertices (i.e., ⌈γ · (|V_g| − 1)⌉ vertices) in g. Quasi-cliques are one of the most natural definitions for dense structures, useful for finding communities in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive. In this paper, we design parallel algorithms for mining maximal quasi-cliques on G-thinker, a distributed graph mining framework that decomposes mining into compute-intensive tasks to fully utilize CPU cores. We found that directly using G-thinker results in the straggler problem, due to (i) the drastic load imbalance among different tasks and (ii) the difficulty of predicting task running time. We address these challenges by redesigning G-thinker's execution engine to prioritize long-running tasks for execution, and by utilizing a novel timeout strategy to effectively decompose long-running tasks to improve load balancing. While this system redesign applies to many other expensive dense-subgraph mining problems, this paper verifies the idea by adapting the state-of-the-art quasi-clique algorithm, Quick, to our redesigned G-thinker. Extensive experiments verify that our new solution scales well with the number of CPU cores, achieving a 201× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges in a 16-node cluster.
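The interplay of timeout-based decomposition and priority scheduling described above can be illustrated with a toy single-threaded model. Task sizes, the time budget, and the halving split rule below are all illustrative assumptions, not G-thinker's actual engine:

```python
import heapq

def mine_with_timeout(tasks, budget):
    """Toy model of timeout-based task decomposition with a priority queue.

    Each task is an integer amount of remaining work. A task runs for at
    most `budget` units; unfinished work is split in half and re-queued
    as two subtasks. Bigger tasks are scheduled first (max-heap via
    negation), mirroring the straggler-avoidance strategy: long-running
    tasks start early and get decomposed instead of blocking completion.
    Returns the total number of task executions performed.
    """
    heap = [-t for t in tasks]
    heapq.heapify(heap)
    executions = 0
    while heap:
        work = -heapq.heappop(heap)     # largest remaining task first
        executions += 1
        remaining = work - budget       # simulate running until timeout
        if remaining > 0:               # timed out: decompose and re-queue
            half = remaining // 2
            heapq.heappush(heap, -(remaining - half))
            if half > 0:
                heapq.heappush(heap, -half)
    return executions
```

In a real multi-threaded setting the re-queued halves would be picked up by idle workers, which is what evens out the load imbalance.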

Clique-counting is a fundamental problem with applications in many areas, e.g., dense subgraph discovery, community detection, and spam detection. The problem of k-clique counting is difficult because, as k increases, the number of k-cliques grows exponentially. Enumeration algorithms (even parallel ones) fail to count k-cliques beyond a small k. Approximation algorithms, like Turán-Shadow, have been shown to perform well up to k = 10, but are inefficient for larger cliques. The recently proposed Pivoter algorithm significantly improved the state of the art and was able to give exact counts of all k-cliques in a large number of graphs. However, the clique counts of some graphs (for example, com-lj) are still out of reach of these algorithms. We revisit the Turán-Shadow algorithm and propose a generalized framework called YACC that leverages several insights about real-world graphs to achieve faster clique-counting. The bottleneck in Turán-Shadow is a recursive subroutine whose stopping condition is based on a classic result from extremal combinatorics called Turán's theorem. This theorem gives a lower bound on the k-clique density of a subgraph in terms of its edge density. However, this stopping condition is based on a worst-case graph that does not reflect the nature of real-world graphs. Using techniques for quickly discovering dense subgraphs, we relax the stopping condition in a systematic way such that we get a smaller recursion tree while still maintaining the guarantees provided by Turán-Shadow. We deploy our algorithm on several real-world data sets and show that YACC reduces the size of the recursion tree and the running time by over an order of magnitude. Using YACC, we are able to obtain clique counts for several graphs for which clique-counting was infeasible before, including com-lj.
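For intuition, the classic Turán-style stopping condition that YACC relaxes can be sketched as follows. By Turán's theorem, any graph whose edge density exceeds 1 − 1/(k−1) contains a k-clique, so recursion can stop once a subgraph crosses that threshold. The function below is a simplified illustration of this worst-case test, not YACC's relaxed one:

```python
def turan_stop(num_vertices, num_edges, k):
    """Classic stopping test in the style of Turán-Shadow.

    Returns True when the subgraph's edge density reaches the Turán
    threshold 1 - 1/(k - 1), which guarantees (by Turán's theorem)
    a positive k-clique density, so the recursion can stop here.
    """
    if num_vertices < 2:
        return False
    # Edge density: fraction of the n*(n-1)/2 possible edges present.
    density = 2.0 * num_edges / (num_vertices * (num_vertices - 1))
    return density >= 1.0 - 1.0 / (k - 1)
```

Because the threshold is derived from a worst-case extremal graph, it fires late on real-world graphs; YACC's contribution is to stop earlier while preserving the same guarantee.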