skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Near-optimal fitting of ellipsoids to random points
Given independent standard Gaussian points v1, . . . , vn in dimension d, for what values of (n, d) does there exist with high probability an origin-symmetric ellipsoid that simultaneously passes through all of the points? This basic problem of fitting an ellipsoid to random points has connections to low-rank matrix decompositions, independent component analysis, and principal component analysis. Based on strong numerical evidence, Saunderson, Parrilo, and Willsky (Saunderson, 2011; Saunderson et al., 2013) conjectured that the ellipsoid fitting problem transitions from feasible to infeasible as the number of points n increases, with a sharp threshold at n ∼ d^2/4. We resolve this conjecture up to logarithmic factors by constructing a fitting ellipsoid for some n = d^2/polylog(d). Our proof demonstrates feasibility of the least squares construction of (Saunderson, 2011; Saunderson et al., 2013) using a convenient decomposition of a certain non-standard random matrix and a careful analysis of its Neumann expansion via the theory of graph matrices.  more » « less
Award ID(s):
2008920
PAR ID:
10483229
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Proceedings of Machine Learning Research
Date Published:
Journal Name:
Proceedings of Machine Learning Research
ISSN:
2640-3498
Subject(s) / Keyword(s):
High-dimensional probability semi-definite programming phase transitions convex geometry
Format(s):
Medium: X
Location:
Bangalore, India
Sponsoring Org:
National Science Foundation
More Like this
  1. Etessami, Kousha; Feige, Uriel; Puppis, Gabriele (Ed.)
    In [Saunderson, 2011; Saunderson et al., 2013], Saunderson, Parrilo, and Willsky asked the following elegant geometric question: what is the largest m = m(d) such that there is an ellipsoid in ℝ^d that passes through v_1, v_2, …, v_m with high probability when the v_is are chosen independently from the standard Gaussian distribution N(0,I_d)? The existence of such an ellipsoid is equivalent to the existence of a positive semidefinite matrix X such that v_i^⊤ X v_i = 1 for every 1 ⩽ i ⩽ m - a natural example of a random semidefinite program. SPW conjectured that m = (1-o(1)) d²/4 with high probability. Very recently, Potechin, Turner, Venkat and Wein [Potechin et al., 2022] and Kane and Diakonikolas [Kane and Diakonikolas, 2022] proved that m ≳ d²/polylog(d) via a certain natural, explicit construction. In this work, we give a substantially tighter analysis of their construction to prove that m ≳ d²/C for an absolute constant C > 0. This resolves one direction of the SPW conjecture up to a constant. Our analysis proceeds via the method of Graphical Matrix Decomposition that has recently been used to analyze correlated random matrices arising in various areas [Barak et al., 2019; Bafna et al., 2022]. Our key new technical tool is a refined method to prove singular value upper bounds on certain correlated random matrices that are tight up to absolute dimension-independent constants. In contrast, all previous methods that analyze such matrices lose logarithmic factors in the dimension. 
    more » « less
  2. In [Saunderson, 2011; Saunderson et al., 2013], Saunderson, Parrilo, and Willsky asked the following elegant geometric question: what is the largest m = m(d) such that there is an ellipsoid in ℝ^d that passes through v_1, v_2, …, v_m with high probability when the v_is are chosen independently from the standard Gaussian distribution N(0,I_d)? The existence of such an ellipsoid is equivalent to the existence of a positive semidefinite matrix X such that v_i^⊤ X v_i = 1 for every 1 ⩽ i ⩽ m - a natural example of a random semidefinite program. SPW conjectured that m = (1-o(1)) d²/4 with high probability. Very recently, Potechin, Turner, Venkat and Wein [Potechin et al., 2022] and Kane and Diakonikolas [Kane and Diakonikolas, 2022] proved that m ≳ d²/log^O(1) d via a certain natural, explicit construction. In this work, we give a substantially tighter analysis of their construction to prove that m ≳ d²/C for an absolute constant C > 0. This resolves one direction of the SPW conjecture up to a constant. Our analysis proceeds via the method of Graphical Matrix Decomposition that has recently been used to analyze correlated random matrices arising in various areas [Barak et al., 2019; Bafna et al., 2022]. Our key new technical tool is a refined method to prove singular value upper bounds on certain correlated random matrices that are tight up to absolute dimension-independent constants. In contrast, all previous methods that analyze such matrices lose logarithmic factors in the dimension. 
    more » « less
  3. We study entropy flattening: Given a circuit C_X implicitly describing an n-bit source X (namely, X is the output of C_X on a uniform random input), construct another circuit C_Y describing a source Y such that (1) source Y is nearly flat (uniform on its support), and (2) the Shannon entropy of Y is monotonically related to that of X. The standard solution is to have C_Y evaluate C_X altogether Theta(n^2) times on independent inputs and concatenate the results (correctness follows from the asymptotic equipartition property). In this paper, we show that this is optimal among black-box constructions: Any circuit C_Y for entropy flattening that repeatedly queries C_X as an oracle requires Omega(n^2) queries. Entropy flattening is a component used in the constructions of pseudorandom generators and other cryptographic primitives from one-way functions [Johan Håstad et al., 1999; John Rompel, 1990; Thomas Holenstein, 2006; Iftach Haitner et al., 2006; Iftach Haitner et al., 2009; Iftach Haitner et al., 2013; Iftach Haitner et al., 2010; Salil P. Vadhan and Colin Jia Zheng, 2012]. It is also used in reductions between problems complete for statistical zero-knowledge [Tatsuaki Okamoto, 2000; Amit Sahai and Salil P. Vadhan, 1997; Oded Goldreich et al., 1999; Vadhan, 1999]. The Theta(n^2) query complexity is often the main efficiency bottleneck. Our lower bound can be viewed as a step towards proving that the current best construction of pseudorandom generator from arbitrary one-way functions by Vadhan and Zheng (STOC 2012) has optimal efficiency. 
    more » « less
  4. The Lovász Local Lemma (LLL) is a cornerstone principle in the probabilistic method of combinatorics, and a seminal algorithm of Moser & Tardos (2010) provides an efficient randomized algorithm to implement it. This algorithm can be parallelized to give an algorithm that uses polynomially many processors and runs in O(log3 n) time, stemming from O(log n) adaptive computations of a maximal independent set (MIS). Chung et al. (2014) developed faster local and parallel algorithms, potentially running in time O (log^2 n), but these algorithms work under significantly more stringent conditions than the LLL. We give a new parallel algorithm that works under essentially the same conditions as the original algorithm of Moser & Tardos but uses only a single MIS computation, thus running in O(log^2 n) time. This conceptually new algorithm also gives a clean combinatorial description of a satisfying assignment which might be of independent interest. Our techniques extend to the deterministic LLL algorithm given by Chandrasekaran et al. (2013) leading to an NC-algorithm running in time O(log^2 n) as well. We also provide improved bounds on the runtimes of the sequential and parallel resampling-based algorithms originally developed by Moser & Tardos. Our bounds extend to any problem instance in which the tighter Shearer LLL criterion is satisfied. We also improve on the analysis of Kolipaka & Szegedy (2011) to give tighter concentration results. 
    more » « less
  5. In the Euclidean k-Means problem we are given a collection of n points D in an Euclidean space and a positive integer k. Our goal is to identify a collection of k points in the same space (centers) so as to minimize the sum of the squared Euclidean distances between each point in D and the closest center. This problem is known to be APX-hard and the current best approximation ratio is a primal-dual 6.357 approximation based on a standard LP for the problem [Ahmadian et al. FOCS'17, SICOMP'20]. In this note we show how a minor modification of Ahmadian et al.'s analysis leads to a slightly improved 6.12903 approximation. As a related result, we also show that the mentioned LP has integrality gap at least (16+Sqrt(5))/15 > 1.2157. . 
    more » « less