skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Algorithms Approaching the Threshold for Semi-random Planted Clique
We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian 2001. The previous best algorithms for this model succeed if the planted clique has size at least n2/3 in a graph with n vertices (Mehta, Mckenzie, Trevisan 2019 and Charikar, Steinhardt, Valiant 2017). Our algorithms work for planted-clique sizes approaching n1/2 -- the information-theoretic threshold in the semi-random model (Steinhardt 2017) and a conjectured computational threshold even in the easier fully-random model. This result comes close to resolving open questions by Feige 2019 and Steinhardt 2017. Our algorithms are based on higher constant degree sum-of-squares relaxation and rely on a new conceptual connection that translates certificates of upper bounds on biclique numbers in unbalanced bipartite Erdős--Rényi random graphs into algorithms for semi-random planted clique. The use of a higher-constant degree sum-of-squares is essential in our setting: we prove a lower bound on the basic SDP for certifying bicliques that shows that the basic SDP cannot succeed for planted cliques of size k=o(n2/3). We also provide some evidence that the information-computation trade-off of our current algorithms may be inherent by proving an average-case lower bound for unbalanced bicliques in the low-degree-polynomials model.  more » « less
Award ID(s):
2211971
PAR ID:
10435077
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ACM Symposium on Theory of Computing, STOC
Issue:
2023
Page Range / eLocation ID:
1918 to 1926
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Consider a system of m polynomial equations {pi(x)=bi}i≤m of degree D≥2 in n-dimensional variable x∈ℝn such that each coefficient of every pi and bis are chosen at random and independently from some continuous distribution. We study the basic question of determining the smallest m -- the algorithmic threshold -- for which efficient algorithms can find refutations (i.e. certificates of unsatisfiability) for such systems. This setting generalizes problems such as refuting random SAT instances, low-rank matrix sensing and certifying pseudo-randomness of Goldreich's candidate generators and generalizations. We show that for every d∈ℕ, the (n+m)O(d)-time canonical sum-of-squares (SoS) relaxation refutes such a system with high probability whenever m≥O(n)⋅(nd)D−1. We prove a lower bound in the restricted low-degree polynomial model of computation which suggests that this trade-off between SoS degree and the number of equations is nearly tight for all d. We also confirm the predictions of this lower bound in a limited setting by showing a lower bound on the canonical degree-4 sum-of-squares relaxation for refuting random quadratic polynomials. Together, our results provide evidence for an algorithmic threshold for the problem at m≳O˜(n)⋅n(1−δ)(D−1) for 2nδ-time algorithms for all δ. 
    more » « less
  2. We present an exact algorithm for graph coloring and maximum clique problems based on SAT technology. It relies on four sub-algorithms that alternatingly compute cliques of larger size and colorings with fewer colors. We show how these techniques can mutually help each other: larger cliques facilitate finding smaller colorings, which in turn can boost finding larger cliques. We evaluate our approach on the DIMACS graph coloring suite. For finding maximum cliques, we show that our algorithm can improve the state-of-the-art MaxSAT-based solver IncMaxCLQ, and for the graph coloring problem, we close two open instances, decrease two upper bounds, and increase one lower bound. 
    more » « less
  3. Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures with applications in bioinformatics and social network analysis. However, enumerating quasi-cliques in a graph is a challenging problem, even harder than the problem of enumerating cliques. We consider the enumeration of top- k degree-based quasi-cliques and make the following contributions: (1) we show that even the problem of detecting whether a given quasi-clique is maximal (i.e., not contained within another quasi-clique) is NP-hard. (2) We present a novel heuristic algorithm K ernel QC to enumerate the k largest quasi-cliques in a graph. Our method is based on identifying kernels of extremely dense subgraphs within a graph, followed by growing subgraphs around these kernels, to arrive at quasi-cliques with the required densities. (3) Experimental results show that our algorithm accurately enumerates quasi-cliques from a graph, is much faster than current state-of-the-art methods for quasi-clique enumeration (often more than three orders of magnitude faster), and can scale to larger graphs than current methods. 
    more » « less
  4. Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph where each vertex connects to at least γ fraction of the other vertices. Quasi-clique is a natural definition for dense structures, so finding large and hence statistically significant quasi-cliques is useful in applications such as community detection in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive, and even a recent algorithm for mining large maximal quasi-cliques is flawed and can lead to a lot of repeated searches. This paper proposes a parallel solution for mining maximal quasi-cliques that is able to fully utilize CPU cores. Our solution utilizes divide and conquer to decompose the workloads into independent tasks for parallel mining, and we addressed the problem of (i) drastic load imbalance among different tasks and (ii) difficulty in predicting the task running time and the time growth with task subgraph size, by (a) using a timeout-based task decomposition strategy, and by (b) utilizing a priority task queue to schedule long-running tasks earlier for mining and decomposition to avoid stragglers. Unlike our conference version in PVLDB 2020 where the solution was built on a distributed graph mining framework called G-thinker, this paper targets a single-machine multi-core environment which is more accessible to an average end user. A general framework called T-thinker is developed to facilitate the programming of parallel programs for algorithms that adopt divide and conquer, including but not limited to our quasi-clique mining algorithm. Additionally, we consider the problem of directly mining large quasi-cliques from dense parts of a graph, where we identify the repeated search issue of a recent method and address it using a carefully designed concurrent trie data structure. Extensive experiments verify that our parallel solution scales well with the number of CPU cores, achieving 26.68× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges with 32 mining threads. Additionally, mining large quasi-cliques from dense parts can provide an additional speedup of up to 89.46×. 
    more » « less
  5. Clique-counting is a fundamental problem that has application in many areas eg. dense subgraph discovery, community detection, spam detection, etc. The problem of k-clique-counting is difficult because as k increases, the number of k-cliques goes up exponentially. Enumeration algorithms (even parallel ones) fail to count k-cliques beyond a small k. Approximation algorithms, like TuránShadow have been shown to perform well upto k = 10, but are inefficient for larger cliques. The recently proposed Pivoter algorithm significantly improved the state-of-the-art and was able to give exact counts of all k-cliques in a large number of graphs. However, the clique counts of some graphs (for example, com-lj) are still out of reach of these algorithms. We revisit the TuránShadow algorithm and propose a generalized framework called YACC that leverages several insights about real-world graphs to achieve faster clique-counting. The bottleneck in TuránShadow is a recursive subroutine whose stopping condition is based on a classic result from extremal combinatorics called Turán's theorem. This theorem gives a lower bound for the k-clique density in a subgraph in terms of its edge density. However, this stopping condition is based on a worst-case graph that does not reflect the nature of real-world graphs. Using techniques for quickly discovering dense subgraphs, we relax the stopping condition in a systematic way such that we get a smaller recursion tree while still maintaining the guarantees provided by TuránShadow. We deploy our algorithm on several real-world data sets and show that YACC reduces the size of the recursion tree and the running time by over an order of magnitude. Using YACC, we are able to obtain clique counts for several graphs for which clique-counting was infeasible before, including com-lj. 
    more » « less