skip to main content

This content will become publicly available on January 1, 2023

Title: Hardness and Algorithms for Robust and Sparse Optimization 162:17926-17944, 2022.
We explore algorithms and limitations for sparse optimization problems such as sparse linear regression and robust linear regression. The goal of the sparse linear regression problem is to identify a small number of key features, while the goal of the robust linear regression problem is to identify a small number of erroneous measurements. Specifically, the sparse linear regression problem seeks a k-sparse vector x ∈ Rd to minimize ‖Ax − b‖2, given an input matrix A ∈ Rn×d and a target vector b ∈ Rn, while the robust linear regression problem seeks a set S that ignores at most k rows and a vector x to minimize ‖(Ax − b)S ‖2. We first show bicriteria, NP-hardness of approximation for robust regression building on the work of [OWZ15] which implies a similar result for sparse regression. We further show fine-grained hardness of robust regression through a reduction from the minimum-weight k-clique conjecture. On the positive side, we give an algorithm for robust regression that achieves arbitrarily accurate additive error and uses runtime that closely matches the lower bound from the fine-grained hardness result, as well as an algorithm for sparse regression with similar runtime. Both our upper and lower bounds rely more » on a general reduction from robust linear regression to sparse regression that we introduce. Our algorithms, inspired by the 3SUM problem, use approximate nearest neighbor data structures and may be of independent interest for solving sparse optimization problems. For instance, we demonstrate that our techniques can also be used for the well-studied sparse PCA problem. « less
Authors:
; ;
Award ID(s):
2022448
Publication Date:
NSF-PAR ID:
10341762
Journal Name:
Proceedings of the 39th International Conference on Machine Learning (PMLR)
Issue:
162
Page Range or eLocation-ID:
17926-17944,
Sponsoring Org:
National Science Foundation
More Like this
  1. We investigate the approximability of the following optimization problem. The input is an n× n matrix A=(Aij) with real entries and an origin-symmetric convex body K⊂ ℝn that is given by a membership oracle. The task is to compute (or approximate) the maximum of the quadratic form ∑i=1n∑j=1n Aij xixj=⟨ x,Ax⟩ as x ranges over K. This is a rich and expressive family of optimization problems; for different choices of matrices A and convex bodies K it includes a diverse range of optimization problems like max-cut, Grothendieck/non-commutative Grothendieck inequalities, small set expansion and more. While the literature studied these special cases using case-specific reasoning, here we develop a general methodology for treatment of the approximability and inapproximability aspects of these questions. The underlying geometry of K plays a critical role; we show under commonly used complexity assumptions that polytime constant-approximability necessitates that K has type-2 constant that grows slowly with n. However, we show that even when the type-2 constant is bounded, this problem sometimes exhibits strong hardness of approximation. Thus, even within the realm of type-2 bodies, the approximability landscape is nuanced and subtle. However, the link that we establish between optimization and geometry of Banach spaces allows usmore »to devise a generic algorithmic approach to the above problem. We associate to each convex body a new (higher dimensional) auxiliary set that is not convex, but is approximately convex when K has a bounded type-2 constant. If our auxiliary set has an approximate separation oracle, then we design an approximation algorithm for the original quadratic optimization problem, using an approximate version of the ellipsoid method. Even though our hardness result implies that such an oracle does not exist in general, this new question can be solved in specific cases of interest by implementing a range of classical tools from functional analysis, most notably the deep factorization theory of linear operators. Beyond encompassing the scenarios in the literature for which constant-factor approximation algorithms were found, our generic framework implies that that for convex sets with bounded type-2 constant, constant factor approximability is preserved under the following basic operations: (a) Subspaces, (b) Quotients, (c) Minkowski Sums, (d) Complex Interpolation. This yields a rich family of new examples where constant factor approximations are possible, which were beyond the reach of previous methods. We also show (under commonly used complexity assumptions) that for symmetric norms and unitarily invariant matrix norms the type-2 constant nearly characterizes the approximability of quadratic maximization.« less
  2. We study the A-optimal design problem where we are given vectors υ1, …, υn ∊ ℝd, an integer k ≥ d, and the goal is to select a set S of k vectors that minimizes the trace of (∑i∊Svivi⊺)−1. Traditionally, the problem is an instance of optimal design of experiments in statistics [35] where each vector corresponds to a linear measurement of an unknown vector and the goal is to pick k of them that minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks [22], sparse least squares regression [8], feature selection for k-means clustering [9], and matrix approximation [13, 14, 5]. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for A-optimal design. Given a matrix, proportional volume sampling involves picking a set of columns S of size k with probability proportional to µ(S) times det(∑i∊Svivi⊺) for some measure µ. Our main result is to show the approximability of the A-optimal design problem can be reduced to approximate independence properties of the measure µ. We appeal to hardcore distributions as candidate distributions µ that allow usmore »to obtain improved approximation algorithms for the A-optimal design. Our results include a d-approximation when k = d, an (1 + ∊)-approximation when and -approximation when repetitions of vectors are allowed in the solution. We also consider generalization of the problem for k ≤ d and obtain a k-approximation. We also show that the proportional volume sampling algorithm gives approximation algorithms for other optimal design objectives (such as D-optimal design [36] and generalized ratio objective [27]) matching or improving previous best known results. Interestingly, we show that a similar guarantee cannot be obtained for the E-optimal design problem. We also show that the A-optimal design problem is NP-hard to approximate within a fixed constant when k = d.« less
  3. https://arxiv.org/abs/2007.14539 As in standard linear regression, in truncated linear regression, we are given access to observations (Ai,yi)i whose dependent variable equals yi=ATi⋅x∗+ηi, where x∗ is some fixed unknown vector of interest and ηi is independent noise; except we are only given an observation if its dependent variable yi lies in some "truncation set" S⊂ℝ. The goal is to recover x∗ under some favorable conditions on the Ai's and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering k-sparse n-dimensional vectors x∗ from m truncated samples, which attains an optimal ℓ2 reconstruction error of O((klogn)/m‾‾‾‾‾‾‾‾‾‾√). As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaptation of the LASSO optimization problem that accommodates truncation. This generalizes the works of both: (1) [Daskalakis et al. 2018], where no regularization is needed due to the low-dimensionality of the data, and (2) [Wainright 2009], where the objective function is simple due to the absence of truncation. In order to deal with bothmore »truncation and high-dimensionality at the same time, we develop new techniques that not only generalize the existing ones but we believe are of independent interest.« less
  4. We consider partially-specified optimization problems where the goal is to actively, but efficiently, acquire missing information about the problem in order to solve it. An algo- rithm designer wishes to solve a linear pro- gram (LP), maxcT x s.t. Ax ≤ b,x ≥ 0, but does not initially know some of the pa- rameters. The algorithm can iteratively choose an unknown parameter and gather information in the form of a noisy sample centered at the parameter’s (unknown) value. The goal is to find an approximately feasible and optimal so- lution to the underlying LP with high proba- bility while drawing a small number of sam- ples. We focus on two cases. (1) When the parameters b of the constraints are initially un- known, we propose an efficient algorithm com- bining techniques from the ellipsoid method for LP and confidence-bound approaches from bandit algorithms. The algorithm adaptively gathers information about constraints only as needed in order to make progress. We give sample complexity bounds for the algorithm and demonstrate its improvement over a naive approach via simulation. (2) When the param- eters c of the objective are initially unknown, we take an information-theoretic approach and give roughly matching upper and lowermore »sam- ple complexity bounds, with an (inefficient) successive-elimination algorithm.« less
  5. Quantum computational supremacy arguments, which describe a way for a quantum computer to perform a task that cannot also be done by a classical computer, typically require some sort of computational assumption related to the limitations of classical computation. One common assumption is that the polynomial hierarchy ( P H ) does not collapse, a stronger version of the statement that P ≠ N P , which leads to the conclusion that any classical simulation of certain families of quantum circuits requires time scaling worse than any polynomial in the size of the circuits. However, the asymptotic nature of this conclusion prevents us from calculating exactly how many qubits these quantum circuits must have for their classical simulation to be intractable on modern classical supercomputers. We refine these quantum computational supremacy arguments and perform such a calculation by imposing fine-grained versions of the non-collapse conjecture. Our first two conjectures poly3-NSETH( a ) and per-int-NSETH( b ) take specific classical counting problems related to the number of zeros of a degree-3 polynomial in n variables over F 2 or the permanent of an n × n integer-valued matrix, and assert that any non-deterministic algorithm that solves them requires 2 c nmore »time steps, where c ∈ { a , b } . A third conjecture poly3-ave-SBSETH( a ′ ) asserts a similar statement about average-case algorithms living in the exponential-time version of the complexity class S B P . We analyze evidence for these conjectures and argue that they are plausible when a = 1 / 2 , b = 0.999 and a ′ = 1 / 2 .Imposing poly3-NSETH(1/2) and per-int-NSETH(0.999), and assuming that the runtime of a hypothetical quantum circuit simulation algorithm would scale linearly with the number of gates/constraints/optical elements, we conclude that Instantaneous Quantum Polynomial-Time (IQP) circuits with 208 qubits and 500 gates, Quantum Approximate Optimization Algorithm (QAOA) circuits with 420 qubits and 500 constraints and boson sampling circuits (i.e. linear optical networks) with 98 photons and 500 optical elements are large enough for the task of producing samples from their output distributions up to constant multiplicative error to be intractable on current technology. Imposing poly3-ave-SBSETH(1/2), we additionally rule out simulations with constant additive error for IQP and QAOA circuits of the same size. Without the assumption of linearly increasing simulation time, we can make analogous statements for circuits with slightly fewer qubits but requiring 10 4 to 10 7 gates.« less