-
Aichholzer, Oswin; Wang, Haitao (Ed.)
The ℓ₂² min-sum k-clustering problem is to partition an input set into clusters C_1, …, C_k to minimize ∑_{i=1}^k ∑_{p,q ∈ C_i} ‖p−q‖₂². Although ℓ₂² min-sum k-clustering is NP-hard, it is not known whether it is NP-hard to approximate beyond a certain factor. In this paper, we give the first hardness-of-approximation result for the ℓ₂² min-sum k-clustering problem. We show that it is NP-hard to approximate the objective to a factor better than 1.056, and moreover, assuming a balanced variant of the Johnson Coverage Hypothesis, it is NP-hard to approximate the objective to a factor better than 1.327. We then complement our hardness results by giving a fast PTAS for ℓ₂² min-sum k-clustering. Specifically, our algorithm runs in time O(n^{1+o(1)} d · 2^{(k/ε)^{O(1)}}), which is the first nearly linear time algorithm for this problem. We also consider a learning-augmented setting, where the algorithm has access to an oracle that outputs a label i ∈ [k] for each input point, thereby implicitly partitioning the input dataset into k clusters that induce an approximately optimal solution, up to some amount of adversarial error α ∈ [0, 1/2). We give a polynomial-time algorithm that outputs a (1+γα)/(1−α)²-approximation to ℓ₂² min-sum k-clustering, for a fixed constant γ > 0.
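To make the objective concrete, here is a minimal sketch (illustrative only, not code from the paper) that evaluates the ℓ₂² min-sum cost of a fixed partition; the function name and array layout are assumptions of this sketch.

```python
import numpy as np

def min_sum_cost(points, labels):
    """l2^2 min-sum cost of a fixed partition.

    points: (n, d) array; labels: (n,) array of cluster ids.
    Sums ||p - q||_2^2 over all ordered pairs p, q in the same cluster.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    cost = 0.0
    for c in np.unique(labels):
        cluster = points[labels == c]          # points assigned to cluster c
        sq = (cluster ** 2).sum(axis=1)
        # ||p - q||^2 = ||p||^2 + ||q||^2 - 2<p, q>, for all ordered pairs at once
        pair_sq = sq[:, None] + sq[None, :] - 2.0 * cluster @ cluster.T
        cost += pair_sq.sum()
    return cost
```

A standard identity, ∑_{p,q ∈ C} ‖p−q‖₂² = 2|C| ∑_{p ∈ C} ‖p−μ(C)‖₂² with μ(C) the centroid of C, relates this objective to a cluster-size-weighted k-means cost, which is one reason techniques from k-means clustering transfer to this problem.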
-
Agrawal, Shipra; Roth, Aaron (Ed.)
We consider the problem of \emph{identifying}, from statistics, a distribution of discrete random variables $$X_1, \ldots, X_n$$ that is a mixture of $$k$$ product distributions. The best previous sample complexity for $$n \in O(k)$$ was $$(1/\zeta)^{O(k^2 \log k)}$$ (under a mild separation assumption parameterized by $$\zeta$$). The best known lower bound was $$\exp(\Omega(k))$$. It is known that $$n \geq 2k-1$$ is necessary and sufficient for identification. We show, for any $$n \geq 2k-1$$, how to achieve sample complexity and run-time complexity $$(1/\zeta)^{O(k)}$$. We also extend the known lower bound of $$\exp(\Omega(k))$$ to match our upper bound across a broad range of $$\zeta$$. Our results are obtained by combining (a) a classic method for robust tensor decomposition with (b) a novel way of bounding the condition number of key matrices, called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
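For intuition about the model being identified (not the identification algorithm itself), the following sketch generates samples from a mixture of k product distributions over n binary variables; all names and parameter values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(w, M, num_samples):
    """w: (k,) mixture weights; M: (k, n) Bernoulli parameters,
    where M[j, i] = P[X_i = 1 | latent source j]."""
    k, n = M.shape
    sources = rng.choice(k, size=num_samples, p=w)       # latent source per sample
    # Conditioned on the source, the n coordinates are independent bits
    return (rng.random((num_samples, n)) < M[sources]).astype(int)

# Example: k = 2 sources observed through n = 3 = 2k - 1 variables,
# the minimum number of variables for which identification is possible.
w = np.array([0.4, 0.6])
M = np.array([[0.1, 0.8, 0.5],
              [0.9, 0.2, 0.5]])
samples = sample_mixture(w, M, 10_000)
```

Identification asks to recover (w, M) from i.i.d. samples of (X_1, …, X_n) alone, with the latent source never observed.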
-
In the Euclidean k-Means problem we are given a collection of n points D in a Euclidean space and a positive integer k. Our goal is to identify a collection of k points in the same space (centers) so as to minimize the sum of the squared Euclidean distances between each point in D and the closest center. This problem is known to be APX-hard, and the current best approximation ratio is a primal-dual 6.357-approximation based on a standard LP for the problem [Ahmadian et al. FOCS'17, SICOMP'20]. In this note we show how a minor modification of Ahmadian et al.'s analysis leads to a slightly improved 6.12903-approximation. As a related result, we also show that the mentioned LP has an integrality gap of at least (16+√5)/15 > 1.2157.
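As an illustration of the objective only (the approximation algorithm itself is the primal-dual method of Ahmadian et al.), here is a minimal sketch of the k-Means cost for a given set of centers; the function name and array layout are assumptions of this sketch.

```python
import numpy as np

def kmeans_cost(D, centers):
    """D: (n, d) input points; centers: (k, d) candidate centers."""
    D = np.asarray(D, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # (n, k) matrix of squared distances from every point to every center
    sq_dists = ((D[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    # Each point is charged the squared distance to its closest center
    return sq_dists.min(axis=1).sum()
```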