skip to main content

Title: Enhanced alternating energy minimization methods for stochastic galerkin matrix equations
In uncertainty quantification, it is commonly required to solve a forward model consisting of a partial differential equation (PDE) with a spatially varying uncertain coefficient that is represented as an affine function of a set of random variables, or parameters. Discretizing such models using stochastic Galerkin finite element methods (SGFEMs) leads to very high-dimensional discrete problems that can be cast as linear multi-term matrix equations (LMTMEs). We develop efficient computational methods for approximating solutions of such matrix equations in low rank. To do this, we follow an alternating energy minimization (AEM) framework, wherein the solution is represented as a product of two matrices, and approximations to each component are sought by solving certain minimization problems repeatedly. Inspired by proper generalized decomposition methods, the iterative solution algorithms we present are based on a rank-adaptive variant of AEM methods that successively computes a rank-one solution component at each step. We introduce and evaluate new enhancement procedures to improve the accuracy of the approximations these algorithms deliver. The efficiency and accuracy of the enhanced AEM methods is demonstrated through numerical experiments with LMTMEs associated with SGFEM discretizations of parameterized linear elliptic PDEs.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
BIT numerical mathematics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Determinant maximization problem gives a general framework that models problems arising in as diverse fields as statistics [Puk06], convex geometry [Kha96], fair allocations [AGSS16], combinatorics [AGV18], spectral graph theory [NST19a], network design, and random processes [KT12]. In an instance of a determinant maximization problem, we are given a collection of vectors U = {v1, . . . , vn} ⊂ Rd , and a goal is to pick a subset S ⊆ U of given vectors to maximize the determinant of the matrix ∑i∈S vivi^T. Often, the set S of picked vectors must satisfy additional combinatorial constraints such as cardinality constraint (|S| ≤ k) or matroid constraint (S is a basis of a matroid defined on the vectors). In this paper, we give a polynomial-time deterministic algorithm that returns a r O(r)-approximation for any matroid of rank r ≤ d. This improves previous results that give e O(r^2)-approximation algorithms relying on e^O(r)-approximate estimation algorithms [NS16, AG17,AGV18, MNST20] for any r ≤ d. All previous results use convex relaxations and their relationship to stable polynomials and strongly log-concave polynomials. In contrast, our algorithm builds on combinatorial algorithms for matroid intersection, which iteratively improve any solution by finding an alternating negative cycle in the exchange graph defined by the matroids. While the det(.) function is not linear, we show that taking appropriate linear approximations at each iteration suffice to give the improved approximation algorithm. 
    more » « less
  2. Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency – given n input points, most kernel-based algorithms need to materialize the full n × n kernel matrix before performing any subsequent computation, thus incurring Ω(n^2) runtime. Breaking this quadratic barrier for various problems has therefore, been a subject of extensive research efforts. We break the quadratic barrier and obtain subquadratic time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving lin- ear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recently developed Kernel Density Estimation framework, which (after preprocessing in time subquadratic in n) can return estimates of row/column sums of the kernel matrix. In particular, we de- velop efficient reductions from weighted vertex and weighted edge sampling on kernel graphs, simulating random walks on kernel graphs, and importance sampling on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in sublinear (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsi- fication, where we observe a 9x decrease in the number of kernel evaluations over baselines for LRA and a 41x reduction in the graph size for spectral sparsification. 
    more » « less
  3. Bayesian inference provides a systematic framework for integration of data with mathematical models to quantify the uncertainty in the solution of the inverse problem. However, the solution of Bayesian inverse problems governed by complex forward models described by partial differential equations (PDEs) remains prohibitive with black-box Markov chain Monte Carlo (MCMC) methods. We present hIPPYlib-MUQ, an extensible and scalable software framework that contains implementations of state-of-the art algorithms aimed to overcome the challenges of high-dimensional, PDE-constrained Bayesian inverse problems. These algorithms accelerate MCMC sampling by exploiting the geometry and intrinsic low-dimensionality of parameter space via derivative information and low rank approximation. The software integrates two complementary open-source software packages, hIPPYlib and MUQ. hIPPYlib solves PDE-constrained inverse problems using automatically-generated adjoint-based derivatives, but it lacks full Bayesian capabilities. MUQ provides a spectrum of powerful Bayesian inversion models and algorithms, but expects forward models to come equipped with gradients and Hessians to permit large-scale solution. By combining these two complementary libraries, we created a robust, scalable, and efficient software framework that realizes the benefits of each and allows us to tackle complex large-scale Bayesian inverse problems across a broad spectrum of scientific and engineering disciplines. To illustrate the capabilities of hIPPYlib-MUQ, we present a comparison of a number of MCMC methods available in the integrated software on several high-dimensional Bayesian inverse problems. These include problems characterized by both linear and nonlinear PDEs, various noise models, and different parameter dimensions. The results demonstrate that large (∼ 50×) speedups over conventional black box and gradient-based MCMC algorithms can be obtained by exploiting Hessian information (from the log-posterior), underscoring the power of the integrated hIPPYlib-MUQ framework. 
    more » « less
  4. Abstract Obtaining lightweight and accurate approximations of discretized objective functional Hessians in inverse problems governed by partial differential equations (PDEs) is essential to make both deterministic and Bayesian statistical large-scale inverse problems computationally tractable. The cubic computational complexity of dense linear algebraic tasks, such as Cholesky factorization, that provide a means to sample Gaussian distributions and determine solutions of Newton linear systems is a computational bottleneck at large-scale. These tasks can be reduced to log-linear complexity by utilizing hierarchical off-diagonal low-rank (HODLR) matrix approximations. In this work, we show that a class of Hessians that arise from inverse problems governed by PDEs are well approximated by the HODLR matrix format. In particular, we study inverse problems governed by PDEs that model the instantaneous viscous flow of ice sheets. In these problems, we seek a spatially distributed basal sliding parameter field such that the flow predicted by the ice sheet model is consistent with ice sheet surface velocity observations. We demonstrate the use of HODLR Hessian approximation to efficiently sample the Laplace approximation of the posterior distribution with covariance further approximated by HODLR matrix compression. Computational studies are performed which illustrate ice sheet problem regimes for which the Gauss–Newton data-misfit Hessian is more efficiently approximated by the HODLR matrix format than the low-rank (LR) format. We then demonstrate that HODLR approximations can be favorable, when compared to global LR approximations, for large-scale problems by studying the data-misfit Hessian associated with inverse problems governed by the first-order Stokes flow model on the Humboldt glacier and Greenland ice sheet. 
    more » « less
  5. This work considers the minimization of a general convex function f (X) over the cone of positive semi-definite matrices whose optimal solution X* is of low-rank. Standard first-order convex solvers require performing an eigenvalue decomposition in each iteration, severely limiting their scalability. A natural nonconvex reformulation of the problem factors the variable X into the product of a rectangular matrix with fewer columns and its transpose. For a special class of matrix sensing and completion problems with quadratic objective functions, local search algorithms applied to the factored problem have been shown to be much more efficient and, in spite of being nonconvex, to converge to the global optimum. The purpose of this work is to extend this line of study to general convex objective functions f (X) and investigate the geometry of the resulting factored formulations. Specifically, we prove that when f (X) satisfies the restricted well-conditioned assumption, each critical point of the factored problem either corresponds to the optimal solution X* or a strict saddle where the Hessian matrix has a strictly negative eigenvalue. Such a geometric structure of the factored formulation ensures that many local search algorithms can converge to the global optimum with random initializations. 
    more » « less