Population risk is always of primary interest in machine learning; however, learning algorithms only have access to the empirical risk. Even for applications with nonconvex nonsmooth losses (such as modern deep networks), the population risk is generally significantly more well-behaved from an optimization point of view than the empirical risk. In particular, sampling can create many spurious local minima. We consider a general framework which aims to optimize a smooth nonconvex function F (population risk) given only access to an approximation f (empirical risk) that is pointwise close to F (i.e., ||F − f||_\infty ≤ ν). Our objective is to find the \eps-approximate local minima of the underlying function F while avoiding the shallow local minima—arising because of the tolerance ν—which exist only in f. We propose a simple algorithm based on stochastic gradient descent (SGD) on a smoothed version of f that is guaranteed to achieve our goal as long as ν ≤ O(\eps^{1.5}/d). We also provide an almost matching lower bound showing that our algorithm achieves optimal error tolerance ν among all algorithms making a polynomial number of queries of f. As a concrete example, we show that our results can be directly used to give sample complexities for learning a ReLU unit.
more »
« less
On the energy landscape of symmetric quantum signal processing
Symmetric quantum signal processing provides a parameterized representation of a real polynomial, which can be translated into an efficient quantum circuit for performing a wide range of computational tasks on quantum computers. For a given polynomial f , the parameters (called phase factors) can be obtained by solving an optimization problem. However, the cost function is non-convex, and has a very complex energy landscape with numerous global and local minima. It is therefore surprising that the solution can be robustly obtained in practice, starting from a fixed initial guess Φ 0 that contains no information of the input polynomial. To investigate this phenomenon, we first explicitly characterize all the global minima of the cost function. We then prove that one particular global minimum (called the maximal solution) belongs to a neighborhood of Φ 0 , on which the cost function is strongly convex under the condition ‖ f ‖ ∞ = O ( d − 1 ) with d = d e g ( f ) . Our result provides a partial explanation of the aforementioned success of optimization algorithms.
more »
« less
- Award ID(s):
- 2016245
- PAR ID:
- 10424843
- Date Published:
- Journal Name:
- Quantum
- Volume:
- 6
- ISSN:
- 2521-327X
- Page Range / eLocation ID:
- 850
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We give new quantum algorithms for evaluating composed functions whose inputs may be shared between bottom-level gates. Let f be an m -bit Boolean function and consider an n -bit function F obtained by applying f to conjunctions of possibly overlapping subsets of n variables. If f has quantum query complexity Q ( f ) , we give an algorithm for evaluating F using O ~ ( Q ( f ) ⋅ n ) quantum queries. This improves on the bound of O ( Q ( f ) ⋅ n ) that follows by treating each conjunction independently, and our bound is tight for worst-case choices of f . Using completely different techniques, we prove a similar tight composition theorem for the approximate degree of f .By recursively applying our composition theorems, we obtain a nearly optimal O ~ ( n 1 − 2 − d ) upper bound on the quantum query complexity and approximate degree of linear-size depth- d AC 0 circuits. As a consequence, such circuits can be PAC learned in subexponential time, even in the challenging agnostic setting. Prior to our work, a subexponential-time algorithm was not known even for linear-size depth-3 AC 0 circuits.As an additional consequence, we show that AC 0 ∘ ⊕ circuits of depth d + 1 require size Ω ~ ( n 1 / ( 1 − 2 − d ) ) ≥ ω ( n 1 + 2 − d ) to compute the Inner Product function even on average. The previous best size lower bound was Ω ( n 1 + 4 − ( d + 1 ) ) and only held in the worst case (Cheraghchi et al., JCSS 2018).more » « less
-
We introduce and study the problem of dueling optimization with a monotone adversary, a generalization of (noiseless) dueling convex optimization. The goal is to design an online algorithm to find a minimizer x* for a function f:X→R, for X \subseteq R^d. In each round, the algorithm submits a pair of guesses x1 and x2, and the adversary responds with any point in the space that is at least as good as both guesses. The cost of each query is the suboptimality of the worst of the two guesses; i.e., max(f(x1) − f(x*),f(x2) − f(x*)). The goal is to minimize the number of iterations required to find an ε-optimal point and to minimize the total cost (regret) of the guesses over many rounds. Our main result is an efficient randomized algorithm for several natural choices of the function f and set X that incurs cost O(d) and iteration complexity O(d log(1/ε)^2). Moreover, our dependence on d is asymptotically optimal, as we show examples in which any randomized algorithm for this problem must incur Ω(d) cost and iteration complexity.more » « less
-
Constraints solvers play a significant role in the analysis, synthesis, and formal verification of complex cyber-physical systems. In this article, we study the problem of designing a scalable constraints solver for an important class of constraints named polynomial constraint inequalities (also known as nonlinear real arithmetic theory). In this article, we introduce a solver named PolyARBerNN that uses convex polynomials as abstractions for highly nonlinears polynomials. Such abstractions were previously shown to be powerful to prune the search space and restrict the usage of sound and complete solvers to small search spaces. Compared with the previous efforts on using convex abstractions, PolyARBerNN provides three main contributions namely (i) a neural network guided abstraction refinement procedure that helps selecting the right abstraction out of a set of pre-defined abstractions, (ii) a Bernstein polynomial-based search space pruning mechanism that can be used to compute tight estimates of the polynomial maximum and minimum values which can be used as an additional abstraction of the polynomials, and (iii) an optimizer that transforms polynomial objective functions into polynomial constraints (on the gradient of the objective function) whose solutions are guaranteed to be close to the global optima. These enhancements together allowed the PolyARBerNN solver to solve complex instances and scales more favorably compared to the state-of-the-art nonlinear real arithmetic solvers while maintaining the soundness and completeness of the resulting solver. In particular, our test benches show that PolyARBerNN achieved 100X speedup compared with Z3 8.9, Yices 2.6, and PVS (a solver that uses Bernstein expansion to solve multivariate polynomial constraints) on a variety of standard test benches. Finally, we implemented an optimizer called PolyAROpt that uses PolyARBerNN to solve constrained polynomial optimization problems. Numerical results show that PolyAROpt is able to solve high-dimensional and high order polynomial optimization problems with higher speed compared to the built-in optimizer in the Z3 8.9 solver.more » « less
-
While recent work suggests that quantum computers can speed up the solution of semidefinite programs, little is known about the quantum complexity of more general convex optimization. We present a quantum algorithm that can optimize a convex function over an n -dimensional convex body using O ~ ( n ) queries to oracles that evaluate the objective function and determine membership in the convex body. This represents a quadratic improvement over the best-known classical algorithm. We also study limitations on the power of quantum computers for general convex optimization, showing that it requires Ω ~ ( n ) evaluation queries and Ω ( n ) membership queries.more » « less
An official website of the United States government

