skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Sparse Approximation via Generating Point Sets
For a set P of n points in the unit ball b ⊆ R d , consider the problem of finding a small subset T ⊆ P such that its convex-hull ε-approximates the convex-hull of the original set. Specifically, the Hausdorff distance between the convex hull of T and the convex hull of P should be at most ε. We present an efficient algorithm to compute such an ε ′ -approximation of size kalg, where ε ′ is a function of ε, and kalg is a function of the minimum size kopt of such an ε-approximation. Surprisingly, there is no dependence on the dimension d in either of the bounds. Furthermore, every point of P can be ε- approximated by a convex-combination of points of T that is O(1/ε2 )-sparse. Our result can be viewed as a method for sparse, convex autoencoding: approximately representing the data in a compact way using sparse combinations of a small subset T of the original data. The new algorithm can be kernelized, and it preserves sparsity in the original input.  more » « less
Award ID(s):
1525971
PAR ID:
10057818
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA2016)
Page Range / eLocation ID:
548 to 557
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bojanczyk, Mikolaj; Chekuri, Chandra (Ed.)
    Given a point set P in the plane, we seek a subset Q ⊆ P, whose convex hull gives a smaller and thus simpler representation of the convex hull of P. Specifically, let cost(Q,P) denote the Hausdorff distance between the convex hulls CH(Q) and CH(P). Then given a value ε > 0 we seek the smallest subset Q ⊆ P such that cost(Q,P) ≤ ε. We also consider the dual version, where given an integer k, we seek the subset Q ⊆ P which minimizes cost(Q,P), such that |Q| ≤ k. For these problems, when P is in convex position, we respectively give an O(n log²n) time algorithm and an O(n log³n) time algorithm, where the latter running time holds with high probability. When there is no restriction on P, we show the problem can be reduced to APSP in an unweighted directed graph, yielding an O(n^2.5302) time algorithm when minimizing k and an O(min{n^2.5302, kn^2.376}) time algorithm when minimizing ε, using prior results for APSP. Finally, we show our near linear algorithms for convex position give 2-approximations for the general case. 
    more » « less
  2. Caratheodory’s theorem says that any point in the convex hull of a set $$P$$ in $R^d$ is in the convex hull of a subset $P'$ of $$P$$ such that $$|P'| \le d + 1$$. For some sets P, the upper bound d + 1 can be improved. The best upper bound for P is known as the Caratheodory number [2, 15, 17]. In this paper, we study a computational problem of finding the smallest set $P'$ for a given set $$P$$ and a point $$p$$. We call the size of this set $P'$, the Caratheodory number of a point p or CNP. We show that the problem of deciding the Caratheodory number of a point is NP-hard. Furthermore, we show that the problem is k-LDT-hard. We present two algorithms for computing a smallest set $P'$, if CNP= 2,3. Bárány [1] generalized Caratheodory’s theorem by using d+1 sets (colored sets) such that their convex hulls intersect. We introduce a Colorful Caratheodory number of a point or CCNP which can be smaller than d+1. Then we extend our results for CNP to CCNP. 
    more » « less
  3. We consider the problem of dynamically maintaining the convex hull of a set S of points in the plane under the following special sequence of insertions and deletions (called window-sliding updates): insert a point to the right of all points of S and delete the leftmost point of S. We propose an O(|S|)-space data structure that can handle each update in O(1) amortized time, such that all standard binary-search-based queries on the convex hull of S can be answered in 𝑂(log |S|) time, and the convex hull itself can be output in time linear in its size. 
    more » « less
  4. We study a clustering problem where the goal is to maximize the coverage of the input points by k chosen centers. Specifically, given a set of n points P ⊆ ℝ^d, the goal is to pick k centers C ⊆ ℝ^d that maximize the service ∑_{p∈P}φ(𝖽(p,C)) to the points P, where 𝖽(p,C) is the distance of p to its nearest center in C, and φ is a non-increasing service function φ: ℝ+ → ℝ+. This includes problems of placing k base stations as to maximize the total bandwidth to the clients - indeed, the closer the client is to its nearest base station, the more data it can send/receive, and the target is to place k base stations so that the total bandwidth is maximized. We provide an n^{ε^-O(d)} time algorithm for this problem that achieves a (1-ε)-approximation. Notably, the runtime does not depend on the parameter k and it works for an arbitrary non-increasing service function φ: ℝ+ → ℝ+. 
    more » « less
  5. Bojańczyk, Mikołaj; Merelli, Emanuela; Woodruff, David P (Ed.)
    Given n points in 𝓁_p^d, we consider the problem of partitioning points into k clusters with associated centers. The cost of a clustering is the sum of p-th powers of distances of points to their cluster centers. For p ∈ [1,2], we design sketches of size poly(log(nd),k,1/ε) such that the cost of the optimal clustering can be estimated to within factor 1+ε, despite the fact that the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space poly(log(nd),k,1/ε). We also obtain a distributed memory algorithm, where the n points are arbitrarily partitioned amongst m machines, each of which sends information to a central party who then computes an approximation of the clustering cost. Prior to this work, no such streaming or distributed-memory algorithm was known with sublinear dependence on d for p ∈ [1,2). 
    more » « less