Dense subgraph discovery (DSD) is a key primitive in graph mining that typically deals with extracting cliques and near-cliques. In this paper, we revisit the optimal quasi-clique (OQC) formulation for DSD and establish that it is NP--hard. In addition, we reveal the hitherto unknown property that OQC can be used to explore the entire spectrum of densest subgraphs of all distinct sizes by appropriately varying a single hyperparameter, thereby forging an intimate link with the classic densest-k-subgraph problem (DkS). We corroborate these findings on real-world graphs by applying the simple greedy algorithm for OQC with improved hyperparameter tuning, to quickly generate high-quality approximations of the size-density frontier. Our findings indicate that OQC not only extracts high quality (near)-cliques, but also large and loosely-connected subgraphs that exhibit well defined local community structure. The latter discovery is particularly intriguing, since OQC is not explicitly geared towards community detection.
more »
« less
The Triangle-Densest-K-Subgraph Problem: Hardness, Lovász Extension, and Application to Document Summarization
We introduce the triangle-densest-K-subgraph problem (TDKS) for undirected graphs: given a size parameter K, compute a subset of K vertices that maximizes the number of induced triangles. The problem corresponds to the simplest generalization of the edge based densest-K-subgraph problem (DKS) to the case of higher-order network motifs. We prove that TDKS is NP-hard and is not amenable to efficient approximation, in the worst-case. By judiciously exploiting the structure of the problem, we propose a relaxation algorithm for the purpose of obtaining high-quality, sub-optimal solutions. Our approach utilizes the fact that the cost function of TDKS is submodular to construct a convex relaxation for the problem based on the Lovász extension for submodular functions. We demonstrate that our approaches attain state-of-the-art performance on real-world graphs and can offer substantially improved exploration of the optimal density-size curve compared to sophisticated approximation baselines for DKS. We use document summarization to showcase why TDKS is a useful generalization of DKS.
more »
« less
- Award ID(s):
- 1908070
- PAR ID:
- 10357457
- Date Published:
- Journal Name:
- Proceedings of the AAAI Conference on Artificial Intelligence
- Volume:
- 36
- Issue:
- 4
- ISSN:
- 2159-5399
- Page Range / eLocation ID:
- 4075 to 4082
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Megow, Nicole; Smith, Adam (Ed.)Maximum weight independent set (MWIS) admits a 1/k-approximation in inductively k-independent graphs [Karhan Akcoglu et al., 2002; Ye and Borodin, 2012] and a 1/(2k)-approximation in k-perfectly orientable graphs [Kammer and Tholey, 2014]. These are a parameterized class of graphs that generalize k-degenerate graphs, chordal graphs, and intersection graphs of various geometric shapes such as intervals, pseudo-disks, and several others [Ye and Borodin, 2012; Kammer and Tholey, 2014]. We consider a generalization of MWIS to a submodular objective. Given a graph G = (V,E) and a non-negative submodular function f: 2^V → ℝ_+, the goal is to approximately solve max_{S ∈ ℐ_G} f(S) where ℐ_G is the set of independent sets of G. We obtain an Ω(1/k)-approximation for this problem in the two mentioned graph classes. The first approach is via the multilinear relaxation framework and a simple contention resolution scheme, and this results in a randomized algorithm with approximation ratio at least 1/e(k+1). This approach also yields parallel (or low-adaptivity) approximations. Motivated by the goal of designing efficient and deterministic algorithms, we describe two other algorithms for inductively k-independent graphs that are inspired by work on streaming algorithms: a preemptive greedy algorithm and a primal-dual algorithm. In addition to being simpler and faster, these algorithms, in the monotone submodular case, yield the first deterministic constant factor approximations for various special cases that have been previously considered such as intersection graphs of intervals, disks and pseudo-disks.more » « less
-
Given a graph and an integer k, Densest k-Subgraph is the algorithmic task of finding the subgraph on k vertices with the maximum number of edges. This is a fundamental problem that has been subject to intense study for decades, with applications spanning a wide variety of fields. The state-of-the-art algorithm is an O(n^{1/4+ϵ})-factor approximation (for any ϵ>0) due to Bhaskara et al. [STOC '10]. Moreover, the so-called log-density framework predicts that this is optimal, i.e. it is impossible for an efficient algorithm to achieve an O(n^{1/4−ϵ})-factor approximation. In the average case, Densest k-Subgraph is a prototypical noisy inference task which is conjectured to exhibit a statistical-computational gap. In this work, we provide the strongest evidence yet of hardness for Densest k-Subgraph by showing matching lower bounds against the powerful Sum-of-Squares (SoS) algorithm, a meta-algorithm based on convex programming that achieves state-of-art algorithmic guarantees for many optimization and inference problems. For k ≤ n^1/2, we obtain a degree n^δ SoS lower bound for the hard regime as predicted by the log-density framework. To show this, we utilize the modern framework for proving SoS lower bounds on average-case problems pioneered by Barak et al. [FOCS '16]. A key issue is that small denser-than-average subgraphs in the input will greatly affect the value of the candidate pseudo-expectation operator around the subgraph. To handle this challenge, we devise a novel matrix factorization scheme based on the positive minimum vertex separator. We then prove an intersection tradeoff lemma to show that the error terms when using this separator are indeed small.more » « less
-
The densest subgraph problem in a graph (\dsg), in the simplest form, is the following. Given an undirected graph $G=(V,E)$ find a subset $$S \subseteq V$$ of vertices that maximizes the ratio $|E(S)|/|S|$ where $E(S)$ is the set of edges with both endpoints in $$S$$. \dsg and several of its variants are well-studied in theory and practice and have many applications in data mining and network analysis. In this paper we study fast algorithms and structural aspects of \dsg via the lens of \emph{supermodularity}. For this we consider the densest supermodular subset problem (\dssp): given a non-negative supermodular function $$f: 2^V \rightarrow \mathbb{R}_+$, maximize $f(S)/|S|$$. For \dsg we describe a simple flow-based algorithm that outputs a $$(1-\eps)$-approximation in deterministic $$\tilde{O}(m/\eps)$$ time where $$m$$ is the number of edges. Our algorithm is the first to have a near-linear dependence on $$m$$ and $$1/\eps$$ and improves previous methods based on an LP relaxation. It generalizes to hypergraphs, and also yields a faster algorithm for directed \dsg. Greedy peeling algorithms have been very popular for \dsg and several variants due to their efficiency, empirical performance, and worst-case approximation guarantees. We describe a simple peeling algorithm for \dssp and analyze its approximation guarantee in a fashion that unifies several existing results. Boob et al.\ \cite{bgpstww-20} developed an \emph{iterative} peeling algorithm for \dsg which appears to work very well in practice, and made a conjecture about its convergence to optimality. We affirmatively answer their conjecture, and in fact prove that a natural generalization of their algorithm converges to a $$(1-\eps)$$-approximation for \emph{any} supermodular function $$f$$; the key to our proof is to consider an LP formulation that is derived via the \Lovasz extension of a supermodular function. For \dsg the bound on the number of iterations we prove is $$O(\frac{\Delta \ln |V|}{\lambda^*}\cdot \frac{1}{\eps^2})$ where $$\Delta$$ is the maximum degree and $$\lambda^*$$ is the optimum value. Our work suggests that iterative peeling can be an effective heuristic for several objectives considered in the literature. Finally, we show that the $$2$$-approximation for densest-at-least-$$k$$ subgraph \cite{ks-09} extends to the supermodular setting. We also give a unified analysis of the peeling algorithm for this problem, and via this analysis derive an approximation guarantee for a generalization of \dssp to maximize $$f(S)/g(|S|)$ for a concave function $$g$$.more » « less
-
Kumar, Amit; Ron-Zewi, Noga (Ed.)Dense subgraph discovery is an important problem in graph mining and network analysis with several applications. Two canonical polynomial-time solvable problems here are to find a maxcore (subgraph of maximum min degree) and to find a densest subgraph (subgraph of maximum average degree). Both of these problems can be solved in polynomial time. Veldt, Benson, and Kleinberg [Veldt et al., 2021] introduced the generalized p-mean densest subgraph problem which captures the maxcore problem when p = -∞ and the densest subgraph problem when p = 1. They observed that for p ≥ 1, the objective function is supermodular and hence the problem can be solved in polynomial time. In this work, we focus on the p-mean densest subgraph problem for p ∈ (-∞, 1). We prove that for every p ∈ (-∞,1), the problem is NP-hard, thus resolving an open question from [Veldt et al., 2021]. We also show that for every p ∈ (0,1), the weighted version of the problem is APX-hard. On the algorithmic front, we describe two simple 1/2-approximation algorithms for every p ∈ (-∞, 1). We complement the approximation algorithms by exhibiting non-trivial instances on which the algorithms simultaneously achieve an approximation factor of at most 1/2.more » « less