skip to main content


Title: Enhanced alternating energy minimization methods for stochastic galerkin matrix equations
In uncertainty quantification, it is commonly required to solve a forward model consisting of a partial differential equation (PDE) with a spatially varying uncertain coefficient that is represented as an affine function of a set of random variables, or parameters. Discretizing such models using stochastic Galerkin finite element methods (SGFEMs) leads to very high-dimensional discrete problems that can be cast as linear multi-term matrix equations (LMTMEs). We develop efficient computational methods for approximating solutions of such matrix equations in low rank. To do this, we follow an alternating energy minimization (AEM) framework, wherein the solution is represented as a product of two matrices, and approximations to each component are sought by solving certain minimization problems repeatedly. Inspired by proper generalized decomposition methods, the iterative solution algorithms we present are based on a rank-adaptive variant of AEM methods that successively computes a rank-one solution component at each step. We introduce and evaluate new enhancement procedures to improve the accuracy of the approximations these algorithms deliver. The efficiency and accuracy of the enhanced AEM methods is demonstrated through numerical experiments with LMTMEs associated with SGFEM discretizations of parameterized linear elliptic PDEs.  more » « less
Award ID(s):
1819115
NSF-PAR ID:
10318341
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
BIT numerical mathematics
ISSN:
1572-9125
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Low-rank matrix recovery is a fundamental problem in machine learning with numerous applications. In practice, the problem can be solved by convex optimization namely nuclear norm minimization, or by non-convex optimization as it is well-known that for low-rank matrix problems like matrix sensing and matrix completion, all local optima of the natural non-convex objectives are also globally optimal under certain ideal assumptions. In this paper, we study new approaches for matrix sensing in a semi-random model where an adversary can add any number of arbitrary sensing matrices. More precisely, the problem is to recover a low-rank matrix $X^\star$ from linear measurements $b_i = \langle A_i, X^\star \rangle$, where an unknown subset of the sensing matrices satisfies the Restricted Isometry Property (RIP) and the rest of the $A_i$'s are chosen adversarially. It is known that in the semi-random model, existing non-convex objectives can have bad local optima. To fix this, we present a descent-style algorithm that provably recovers the ground-truth matrix $X^\star$. For the closely-related problem of semi-random matrix completion, prior work [CG18] showed that all bad local optima can be eliminated by reweighting the input data. However, the analogous approach for matrix sensing requires reweighting a set of matrices to satisfy RIP, which is a condition that is NP-hard to check. Instead, we build on the framework proposed in [KLL$^+$23] for semi-random sparse linear regression, where the algorithm in each iteration reweights the input based on the current solution, and then takes a weighted gradient step that is guaranteed to work well locally. Our analysis crucially exploits the connection between sparsity in vector problems and low-rankness in matrix problems, which may have other applications in obtaining robust algorithms for sparse and low-rank problems. 
    more » « less
  2. Determinant maximization problem gives a general framework that models problems arising in as diverse fields as statistics [Puk06], convex geometry [Kha96], fair allocations [AGSS16], combinatorics [AGV18], spectral graph theory [NST19a], network design, and random processes [KT12]. In an instance of a determinant maximization problem, we are given a collection of vectors U = {v1, . . . , vn} ⊂ Rd , and a goal is to pick a subset S ⊆ U of given vectors to maximize the determinant of the matrix ∑i∈S vivi^T. Often, the set S of picked vectors must satisfy additional combinatorial constraints such as cardinality constraint (|S| ≤ k) or matroid constraint (S is a basis of a matroid defined on the vectors). In this paper, we give a polynomial-time deterministic algorithm that returns a r O(r)-approximation for any matroid of rank r ≤ d. This improves previous results that give e O(r^2)-approximation algorithms relying on e^O(r)-approximate estimation algorithms [NS16, AG17,AGV18, MNST20] for any r ≤ d. All previous results use convex relaxations and their relationship to stable polynomials and strongly log-concave polynomials. In contrast, our algorithm builds on combinatorial algorithms for matroid intersection, which iteratively improve any solution by finding an alternating negative cycle in the exchange graph defined by the matroids. While the det(.) function is not linear, we show that taking appropriate linear approximations at each iteration suffice to give the improved approximation algorithm. 
    more » « less
  3. Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency – given n input points, most kernel-based algorithms need to materialize the full n × n kernel matrix before performing any subsequent computation, thus incurring Ω(n^2) runtime. Breaking this quadratic barrier for various problems has therefore, been a subject of extensive research efforts. We break the quadratic barrier and obtain subquadratic time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving lin- ear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recently developed Kernel Density Estimation framework, which (after preprocessing in time subquadratic in n) can return estimates of row/column sums of the kernel matrix. In particular, we de- velop efficient reductions from weighted vertex and weighted edge sampling on kernel graphs, simulating random walks on kernel graphs, and importance sampling on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in sublinear (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsi- fication, where we observe a 9x decrease in the number of kernel evaluations over baselines for LRA and a 41x reduction in the graph size for spectral sparsification. 
    more » « less
  4. Bayesian inference provides a systematic framework for integration of data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, the solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the art algorithms aimed to overcome the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically-generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two complementary libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods available in the integrated software on several high-dimensional Bayesian inverse problems. These include problems characterized by both linear and nonlinear PDEs, various noise models, and different parameter dimensions. The results demonstrate that large (∼ 50×) speedups over conventional black box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log-posterior), underscoring the power of the integrated hIPPYlib-MUQ framework. 
    more » « less
  5. Abstract Obtaining lightweight and accurate approximations of discretized objective functional Hessians in inverse problems governed by partial differential equations (PDEs) is essential to make both deterministic and Bayesian statistical large-scale inverse problems computationally tractable. The cubic computational complexity of dense linear algebraic tasks, such as Cholesky factorization, that provide a means to sample Gaussian distributions and determine solutions of Newton linear systems is a computational bottleneck at large-scale. These tasks can be reduced to log-linear complexity by utilizing hierarchical off-diagonal low-rank (HODLR) matrix approximations. In this work, we show that a class of Hessians that arise from inverse problems governed by PDEs are well approximated by the HODLR matrix format. In particular, we study inverse problems governed by PDEs that model the instantaneous viscous flow of ice sheets. In these problems, we seek a spatially distributed basal sliding parameter field such that the flow predicted by the ice sheet model is consistent with ice sheet surface velocity observations. We demonstrate the use of HODLR Hessian approximation to efficiently sample the Laplace approximation of the posterior distribution with covariance further approximated by HODLR matrix compression. Computational studies are performed which illustrate ice sheet problem regimes for which the Gauss–Newton data-misfit Hessian is more efficiently approximated by the HODLR matrix format than the low-rank (LR) format. We then demonstrate that HODLR approximations can be favorable, when compared to global LR approximations, for large-scale problems by studying the data-misfit Hessian associated with inverse problems governed by the first-order Stokes flow model on the Humboldt glacier and Greenland ice sheet. 
    more » « less