

Title: Inequality constrained stochastic nonlinear optimization via active-set sequential quadratic programming
Abstract

We study nonlinear optimization problems with a stochastic objective and deterministic equality and inequality constraints, which emerge in numerous applications including finance, manufacturing, power systems and, recently, deep neural networks. We propose an active-set stochastic sequential quadratic programming (StoSQP) algorithm that utilizes a differentiable exact augmented Lagrangian as the merit function. The algorithm adaptively selects the penalty parameters of the augmented Lagrangian and performs a stochastic line search to decide the stepsize. Global convergence is established: for any initialization, the KKT residuals converge to zero almost surely. Our algorithm and analysis further develop the prior work of Na et al. (Math Program, 2022. https://doi.org/10.1007/s10107-022-01846-z). Specifically, we allow nonlinear inequality constraints without requiring the strict complementary condition; refine some of the designs in Na et al. (2022), such as the feasibility error condition and the monotonically increasing sample size; strengthen the global convergence guarantee; and improve the sample complexity on the objective Hessian. We demonstrate the performance of the designed algorithm on a subset of nonlinear problems collected in the CUTEst test set and on constrained logistic regression problems.
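
As a rough illustration of the quantity whose convergence is claimed above, the following Python sketch evaluates one common form of the KKT residual for a problem min f(x) subject to c(x) = 0 and g(x) <= 0. The residual form (stationarity, equality violation, and max(g, -mu) for the inequality part), the names `kkt_residual` and `toy_residual`, and the toy problem are assumptions made for illustration and are not taken from the paper; the gradient here is exact, whereas the paper works with stochastic estimates of the objective.

```python
import math


def kkt_residual(grad_f, c, g, jac_c, jac_g, lam, mu):
    """Euclidean norm of a stacked KKT residual at a primal-dual point.

    grad_f : objective gradient (length n)
    c, g   : equality / inequality constraint values (g(x) <= 0 convention)
    jac_c  : Jacobian of c, one row of length n per equality constraint
    jac_g  : Jacobian of g, one row of length n per inequality constraint
    lam, mu: multipliers for c and g
    """
    n = len(grad_f)
    # Stationarity: grad f + Jc^T lam + Jg^T mu
    stationarity = [
        grad_f[j]
        + sum(jac_c[i][j] * lam[i] for i in range(len(c)))
        + sum(jac_g[i][j] * mu[i] for i in range(len(g)))
        for j in range(n)
    ]
    # max(g_i, -mu_i) = 0 exactly when g_i <= 0, mu_i >= 0 and mu_i * g_i = 0,
    # so this term covers feasibility, dual sign and complementarity at once.
    inequality = [max(gi, -mi) for gi, mi in zip(g, mu)]
    stacked = stationarity + list(c) + inequality
    return math.sqrt(sum(v * v for v in stacked))


def toy_residual(x, lam, mu):
    """Hypothetical toy problem: min 0.5*||x||^2  s.t.  x0 + x1 = 1,  -x0 <= 0."""
    grad_f = [x[0], x[1]]
    c = [x[0] + x[1] - 1.0]
    g = [-x[0]]
    jac_c = [[1.0, 1.0]]
    jac_g = [[-1.0, 0.0]]
    return kkt_residual(grad_f, c, g, jac_c, jac_g, lam, mu)


if __name__ == "__main__":
    # KKT point of the toy problem: x = (0.5, 0.5), lam = -0.5, mu = 0.
    print(toy_residual([0.5, 0.5], [-0.5], [0.0]))
    # A non-stationary, infeasible point gives a strictly positive residual.
    print(toy_residual([1.0, 1.0], [0.0], [0.0]))
```

In this sketch the residual vanishes only at a KKT point, which is the sense in which an iterate sequence with residuals tending to zero can be read as approaching first-order optimality.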

 
NSF-PAR ID:
10399905
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Mathematical Programming
Volume:
202
Issue:
1-2
ISSN:
0025-5610
Format(s):
Medium: X Size: p. 279-353
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    “Classical shadows” are estimators of an unknown quantum state, constructed from suitably distributed random measurements on copies of that state (Huang et al. in Nat Phys 16:1050, 2020, https://doi.org/10.1038/s41567-020-0932-7). In this paper, we analyze classical shadows obtained using random matchgate circuits, which correspond to fermionic Gaussian unitaries. We prove that the first three moments of the Haar distribution over the continuous group of matchgate circuits are equal to those of the discrete uniform distribution over only the matchgate circuits that are also Clifford unitaries; thus, the latter forms a “matchgate 3-design.” This implies that the classical shadows resulting from the two ensembles are functionally equivalent. We show how one can use these matchgate shadows to efficiently estimate inner products between an arbitrary quantum state and fermionic Gaussian states, as well as the expectation values of local fermionic operators and various other quantities, thus surpassing the capabilities of prior work. As a concrete application, this enables us to apply wavefunction constraints that control the fermion sign problem in the quantum-classical auxiliary-field quantum Monte Carlo algorithm (QC-AFQMC) (Huggins et al. in Nature 603:416, 2022, https://doi.org/10.1038/s41586-021-04351-z), without the exponential post-processing cost incurred by the original approach.

     
  2. Abstract

    In this paper we disprove part of a conjecture of Lieb and Thirring concerning the best constant in their eponymous inequality. We prove that the best Lieb–Thirring constant when the eigenvalues of a Schrödinger operator $-\Delta + V(x)$ are raised to the power $\kappa$ is never given by the one-bound state case when $\kappa > \max(0, 2 - d/2)$ in space dimension $d \ge 1$. When in addition $\kappa \ge 1$ we prove that this best constant is never attained for a potential having finitely many eigenvalues. The method to obtain the first result is to carefully compute the exponentially small interaction between two Gagliardo–Nirenberg optimisers placed far away. For the second result, we study the dual version of the Lieb–Thirring inequality, in the same spirit as in Part I of this work, Gontier et al. (The nonlinear Schrödinger equation for orthonormal functions I. Existence of ground states. Arch. Rat. Mech. Anal, 2021. https://doi.org/10.1007/s00205-021-01634-7). In a different but related direction, we also show that the cubic nonlinear Schrödinger equation admits no orthonormal ground state in 1D, for more than one function.

     
  3. Abstract

    We consider the problem of covering multiple submodular constraints. Given a finite ground set $N$, a weight function $w: N \rightarrow \mathbb{R}_+$, $r$ monotone submodular functions $f_1, f_2, \ldots, f_r$ over $N$ and requirements $k_1, k_2, \ldots, k_r$, the goal is to find a minimum weight subset $S \subseteq N$ such that $f_i(S) \ge k_i$ for $1 \le i \le r$. We refer to this problem as Multi-Submod-Cover; it was recently considered by Har-Peled and Jones (Few cuts meet many point sets. CoRR. arXiv:abs/1808.03260, 2018), who were motivated by an application in geometry. Even with $r = 1$, Multi-Submod-Cover generalizes the well-known Submodular Set Cover problem (Submod-SC), and it can also be easily reduced to Submod-SC. A simple greedy algorithm gives an $O(\log(kr))$ approximation where $k = \sum_i k_i$, and this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm for Multi-Submod-Cover that covers each constraint to within a factor of $(1 - 1/e - \varepsilon)$ while incurring an approximation of $O(\frac{1}{\varepsilon} \log r)$ in the cost. Second, we consider the special case when each $f_i$ is obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover (Partial-SC), covering integer programs (CIPs) and multiple vertex cover constraints (Bera et al., Theoret Comput Sci 555:2–8, 2014). Both these algorithms are based on mathematical programming relaxations that avoid the limitations of the greedy algorithm. We demonstrate the implications of our algorithms and related ideas to several applications ranging from geometric covering problems to clustering with outliers. Our work highlights the utility of the high-level model and the lens of submodularity in addressing this class of covering problems.

     
  4. We consider a class of nonsmooth convex composite optimization problems, where the objective function is given by the sum of a continuously differentiable convex term and a potentially non-differentiable convex regularizer. In [1], the authors introduced the proximal augmented Lagrangian method and derived the resulting continuous-time primal-dual dynamics that converge to the optimal solution. In this paper, we extend these dynamics from continuous to discrete time via the forward Euler discretization. We prove explicit bounds on the exponential convergence rates of our proposed algorithm with a sufficiently small step size. Since a larger step size can improve the convergence speed, we further develop a linear matrix inequality (LMI) condition which can be numerically solved to provide rate certificates with general step size choices. In addition, we prove that a large range of step size values can guarantee exponential convergence. We close the paper by demonstrating the performance of the proposed algorithm via computational experiments. 
  5. Abstract

    Bohnenblust–Hille inequalities for Boolean cubes have been proven with dimension-free constants that grow subexponentially in the degree (Defant et al. in Math Ann 374(1):653–680, 2019). Such inequalities have found great applications in learning low-degree Boolean functions (Eskenazis and Ivanisvili in Proceedings of the 54th annual ACM SIGACT symposium on theory of computing, pp 203–207, 2022). Motivated by learning quantum observables, a qubit analogue of the Bohnenblust–Hille inequality for Boolean cubes was recently conjectured in Rouzé et al. (Quantum Talagrand, KKL and Friedgut’s theorems and the learnability of quantum Boolean functions, 2022. arXiv preprint arXiv:2209.07279). The conjecture was resolved in Huang et al. (Learning to predict arbitrary quantum processes, 2022. arXiv preprint arXiv:2210.14894). In this paper, we give a new proof of these Bohnenblust–Hille inequalities for qubit systems with constants that are dimension-free and of exponential growth in the degree. As a consequence, we obtain a junta theorem for low-degree polynomials. Using similar ideas, we also study learning problems of low-degree quantum observables and Bohr’s radius phenomenon on quantum Boolean cubes.

     