- Award ID(s):
- 1718258
- NSF-PAR ID:
- 10274338
- Date Published:
- Journal Name:
- Operations Research
- Volume:
- 68
- Issue:
- 5
- ISSN:
- 0030-364X
- Page Range / eLocation ID:
- 1517 to 1537
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
null (Ed.)Abstract We study a 1-dimensional discrete signal denoising problem that consists of minimizing a sum of separable convex fidelity terms and convex regularization terms, the latter penalize the differences of adjacent signal values. This problem generalizes the total variation regularization problem. We provide here a unified approach to solve the problem for general convex fidelity and regularization functions that is based on the Karush–Kuhn–Tucker optimality conditions. This approach is shown here to lead to a fast algorithm for the problem with general convex fidelity and regularization functions, and a faster algorithm if, in addition, the fidelity functions are differentiable and the regularization functions are strictly convex. Both algorithms achieve the best theoretical worst case complexity over existing algorithms for the classes of objective functions studied here. Also in practice, our C++ implementation of the method is considerably faster than popular C++ nonlinear optimization solvers for the problem.more » « less
-
Abstract This paper studies several solution paths of sparse quadratic minimization problems as a function of the weighing parameter of the bi-objective of estimation loss versus solution sparsity. Three such paths are considered: the “
-path” where the discontinuous$$\ell _0$$ -function provides the exact sparsity count; the “$$\ell _0$$ -path” where the$$\ell _1$$ -function provides a convex surrogate of sparsity count; and the “capped$$\ell _1$$ -path” where the nonconvex nondifferentiable capped$$\ell _1$$ -function aims to enhance the$$\ell _1$$ -approximation. Serving different purposes, each of these three formulations is different from each other, both analytically and computationally. Our results deepen the understanding of (old and new) properties of the associated paths, highlight the pros, cons, and tradeoffs of these sparse optimization models, and provide numerical evidence to support the practical superiority of the capped$$\ell _1$$ -path. Our study of the capped$$\ell _1$$ -path is interesting in its own right as the path pertains to computable directionally stationary (= strongly locally minimizing in this context, as opposed to globally optimal) solutions of a parametric nonconvex nondifferentiable optimization problem. Motivated by classical parametric quadratic programming theory and reinforced by modern statistical learning studies, both casting an exponential perspective in fully describing such solution paths, we also aim to address the question of whether some of them can be fully traced in strongly polynomial time in the problem dimensions. A major conclusion of this paper is that a path of directional stationary solutions of the capped$$\ell _1$$ -regularized problem offers interesting theoretical properties and practical compromise between the$$\ell _1$$ -path and the$$\ell _0$$ -path. Indeed, while the$$\ell _1$$ -path is computationally prohibitive and greatly handicapped by the repeated solution of mixed-integer nonlinear programs, the quality of$$\ell _0$$ -path, in terms of the two criteria—loss and sparsity—in the estimation objective, is inferior to the capped$$\ell _1$$ -path; the latter can be obtained efficiently by a combination of a parametric pivoting-like scheme supplemented by an algorithm that takes advantage of the Z-matrix structure of the loss function.$$\ell _1$$ -
Statistical relational learning models are powerful tools that combine ideas from first-order logic with probabilistic graphical models to represent complex dependencies. Despite their success in encoding large problems with a compact set of weighted rules, performing inference over these models is often challenging. In this paper, we show how to effectively combine two powerful ideas for scaling inference for large graphical models. The first idea, lifted inference, is a wellstudied approach to speeding up inference in graphical models by exploiting symmetries in the underlying problem. The second idea is to frame Maximum a posteriori (MAP) inference as a convex optimization problem and use alternating direction method of multipliers (ADMM) to solve the problem in parallel. A well-studied relaxation to the combinatorial optimization problem defined for logical Markov random fields gives rise to a hinge-loss Markov random field (HLMRF) for which MAP inference is a convex optimization problem. We show how the formalism introduced for coloring weighted bipartite graphs using a color refinement algorithm can be integrated with the ADMM optimization technique to take advantage of the sparse dependency structures of HLMRFs. Our proposed approach, lifted hinge-loss Markov random fields (LHL-MRFs), preserves the structure of the original problem after lifting and solves lifted inference as distributed convex optimization with ADMM. In our empirical evaluation on real-world problems, we observe up to a three times speed up in inference over HL-MRFs.more » « less
-
Abstract In this paper, we study multistage stochastic mixed-integer nonlinear programs (MS-MINLP). This general class of problems encompasses, as important special cases, multistage stochastic convex optimization with
non-Lipschitzian value functions and multistage stochastic mixed-integer linear optimization. We develop stochastic dual dynamic programming (SDDP) type algorithms with nested decomposition, deterministic sampling, and stochastic sampling. The key ingredient is a new type of cuts based on generalized conjugacy. Several interesting classes of MS-MINLP are identified, where the new algorithms are guaranteed to obtain the global optimum without the assumption of complete recourse. This significantly generalizes the classic SDDP algorithms. We also characterize the iteration complexity of the proposed algorithms. In particular, for a -stage stochastic MINLP satisfying$$(T+1)$$ L -exact Lipschitz regularization withd -dimensional state spaces, to obtain an -optimal root node solution, we prove that the number of iterations of the proposed deterministic sampling algorithm is upper bounded by$$\varepsilon $$ , and is lower bounded by$${\mathcal {O}}((\frac{2LT}{\varepsilon })^d)$$ for the general case or by$${\mathcal {O}}((\frac{LT}{4\varepsilon })^d)$$ for the convex case. This shows that the obtained complexity bounds are rather sharp. It also reveals that the iteration complexity depends$${\mathcal {O}}((\frac{LT}{8\varepsilon })^{d/2-1})$$ polynomially on the number of stages. We further show that the iteration complexity dependslinearly onT , if all the state spaces are finite sets, or if we seek a -optimal solution when the state spaces are infinite sets, i.e. allowing the optimality gap to scale with$$(T\varepsilon )$$ T . To the best of our knowledge, this is the first work that reports global optimization algorithms as well as iteration complexity results for solving such a large class of multistage stochastic programs. The iteration complexity study resolves a conjecture by the late Prof. Shabbir Ahmed in the general setting of multistage stochastic mixed-integer optimization. -
Abstract We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1+\text{(nonlocal)}\operatorname{TV}$, a form frequently studied in image analysis and graph-based learning. A rich geometric structure is revealed by this reformulation which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense) and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.