

Title: Adaptive Low-Rank Approximations for Operator Equations: Accuracy Control and Computational Complexity
The challenge of mastering computational tasks of enormous size frequently overrides questions about the quality of the numerical outcome in terms of accuracy. By this we do not mean the accuracy within the discrete setting, which itself may be far from evident for ill-conditioned problems or when iterative solvers are involved. By accuracy-controlled computation we mean controlling the deviation of the numerical approximation from the exact solution of an underlying continuous problem in a relevant metric, this continuous problem having been the motivating interest in the first place. Can the accuracy of a numerical result be rigorously certified? This question is particularly important in the context of uncertainty quantification, where many possible sources of uncertainty interact. It is the guiding question throughout this article, which reviews recent developments in low-rank approximation methods for problems in high spatial dimensions. In particular, we highlight the role of adaptivity when dealing with such strongly nonlinear methods, which integrate in a natural way issues of discrete and continuous accuracy.
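As a concrete, minimal illustration of accuracy-controlled low-rank approximation (a sketch in Python/NumPy of the generic principle, not of the specific methods analyzed in the article): truncating the SVD of a matrix yields a low-rank approximation whose Frobenius-norm error is exactly the tail of the singular values, so the rank can be chosen adaptively to certify a prescribed tolerance.

    import numpy as np

    def truncate_to_tolerance(A, tol):
        # Rank-adaptive truncation: the smallest rank r with
        # ||A - A_r||_F <= tol, certified (not estimated) by the
        # singular-value tail.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        # err[r] = ||A - A_r||_F = sqrt(sum of s[r:]**2)
        err = np.sqrt(np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]]))
        r = int(np.searchsorted(-err, -tol))  # err is non-increasing
        A_r = (U[:, :r] * s[:r]) @ Vt[:r]
        return A_r, r, err[r]

    # Example: a 200 x 200 matrix with rapidly decaying singular values.
    rng = np.random.default_rng(0)
    n = 200
    Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
    Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = Q1 @ np.diag(2.0 ** -np.arange(n)) @ Q2
    A_r, r, bound = truncate_to_tolerance(A, 1e-8)
    print(r, bound, np.linalg.norm(A - A_r))  # certified bound equals the true error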
Award ID(s):
1720297
NSF-PAR ID:
10229895
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Contemporary Mathematics
Volume:
754
ISSN:
0271-4132
Page Range / eLocation ID:
1 - 44
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Gradient descent-based optimization methods underpin the parameter training of neural networks, and hence constitute a significant component in the impressive test results found in a number of applications. Introducing stochasticity is key to their success in practical problems, and there is some understanding of the role of stochastic gradient descent in this context. Momentum modifications of gradient descent, such as Polyak's Heavy Ball method (HB) and Nesterov's method of accelerated gradients (NAG), are also widely adopted. In this work our focus is on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm. To expose the ideas simply, we work in the deterministic setting. Our approach is to derive continuous-time approximations of the discrete algorithms; these continuous-time approximations provide insights into the mechanisms at play within the discrete algorithms. We prove three such approximations. First, we show that standard implementations of fixed-momentum methods approximate a time-rescaled gradient descent flow, asymptotically as the learning rate shrinks to zero; this result does not distinguish momentum methods from pure gradient descent in the limit of vanishing learning rate. We then proceed to prove two results aimed at understanding the observed practical advantages of fixed-momentum methods over gradient descent, when implemented in the non-asymptotic regime with a fixed small, but non-zero, learning rate. We achieve this by proving approximations to continuous-time limits in which the small but fixed learning rate appears as a parameter; this is known as the method of modified equations in the numerical analysis literature, recently rediscovered as the high-resolution ODE approximation in the machine learning context. In our second result we show that the momentum method is approximated by a continuous-time gradient flow with an additional momentum-dependent second-order time-derivative correction, proportional to the learning rate; this may be used to explain the stabilizing effect of momentum algorithms in their transient phase. Furthermore, in a third result we show that the momentum methods admit an exponentially attractive invariant manifold on which the dynamics reduces, approximately, to a gradient flow with respect to a modified loss function, equal to the original loss function plus a small perturbation proportional to the learning rate; this small correction provides convexification of the loss function and encodes additional robustness present in momentum methods, beyond the transient phase. (A toy numerical check of the first, time-rescaled limit appears below.)
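    As a toy numerical check of the first result above (a hedged sketch, not the authors' code; the quadratic objective and all parameter values are illustrative): with fixed momentum beta and a small learning rate h, heavy-ball iterates at step k track the time-rescaled gradient flow x'(t) = -grad f(x) / (1 - beta) at time t = k*h.

        import numpy as np

        # Heavy ball on f(x) = 0.5 * sum(d * x**2):
        #   x_{k+1} = x_k - h * grad f(x_k) + beta * (x_k - x_{k-1}).
        # For small h its iterates shadow the time-rescaled gradient flow,
        # whose exact solution here is x(t) = x0 * exp(-d * t / (1 - beta)).
        d = np.array([1.0, 2.0])         # curvatures of the quadratic
        x0 = np.array([1.0, 1.0])
        beta, h, K = 0.5, 1e-3, 2000

        x_prev, x = x0.copy(), x0.copy()
        for _ in range(K):
            x_prev, x = x, x - h * d * x + beta * (x - x_prev)

        t = K * h
        flow = x0 * np.exp(-d * t / (1 - beta))
        # Agreement is first order in h: roughly two digits for this h,
        # improving as h shrinks.
        print(x, flow)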
  2. Summary

    While system dynamics are usually derived in continuous time, the corresponding model-based optimal control problems can only be solved numerically, i.e., as discrete-time approximations. Thus, the performance of control methods depends on the choice of numerical integration scheme. In this paper, we present a first-order discretization of linear quadratic optimal control problems for mechanical systems that is structure preserving and hence preferable to standard methods. Our approach is based on symplectic integration schemes and thereby inherits structure from the original continuous-time problem. Starting from a symplectic discretization of the system dynamics, modified discrete-time Riccati equations are derived, which preserve the Hamiltonian structure of optimal control problems in addition to the mechanical structure of the control system. The method is extended to optimal tracking problems for nonlinear mechanical systems and evaluated in several numerical examples. Compared to standard discretization, it improves the approximation quality by orders of magnitude. This enables low-bandwidth control and sensing in real-time autonomous control applications. (A generic comparison of symplectic versus explicit Euler integration appears below.)

     
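    For intuition about why structure preservation matters (a generic textbook comparison, not the paper's Riccati construction): on the harmonic oscillator, symplectic Euler keeps the energy error bounded for all time, while explicit Euler's energy grows without bound.

        import numpy as np

        # Harmonic oscillator H(q, p) = 0.5 * (p**2 + q**2).
        # Symplectic (semi-implicit) Euler updates p first, then q with the
        # new p; it conserves a modified Hamiltonian, so its energy error
        # stays bounded. Explicit Euler multiplies the energy by
        # (1 + h**2) every step.
        h, K = 0.1, 10000
        q_s, p_s = 1.0, 0.0   # symplectic Euler state
        q_e, p_e = 1.0, 0.0   # explicit Euler state
        for _ in range(K):
            p_s = p_s - h * q_s           # symplectic: new momentum first
            q_s = q_s + h * p_s           # ... then position with new momentum
            q_e, p_e = q_e + h * p_e, p_e - h * q_e   # explicit Euler
        energy = lambda q, p: 0.5 * (q * q + p * p)
        print(energy(q_s, p_s))   # stays near 0.5
        print(energy(q_e, p_e))   # blown up by a factor ~(1 + h**2)**K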
    Positron emission tomography (PET) is traditionally modeled as a discrete system. Such models may be viewed as piecewise constant approximations of the underlying continuous model for the physical processes and geometry of PET imaging. Due to the low accuracy of piecewise constant approximations, discrete models introduce an irreducible modeling error which fundamentally limits the quality of reconstructed images. To address this bottleneck, we propose an integral equation model for PET imaging based on physical and geometrical considerations, which accurately describes the true coincidences. We show that the proposed integral equation model is equivalent to the existing idealized model in terms of line integrals, which is accurate but not suitable for numerical approximation. The proposed model allows us to discretize it using higher-accuracy approximation methods. In particular, we discretize the integral equation by using the collocation principle with piecewise linear polynomials. The discretization leads to new ill-conditioned discrete systems for PET reconstruction, which are further regularized by a novel wavelet-based regularizer. The resulting non-smooth optimization problem is then solved by a preconditioned proximity fixed-point algorithm. Convergence of the algorithm is established for a range of parameters involved in the algorithm. The proposed integral equation model combined with the discretization, regularization, and optimization algorithm provides a new PET image reconstruction method. Numerical results reveal that the proposed model substantially outperforms the conventional discrete model in terms of consistency with simulated projection data and reconstructed image quality. This indicates that the proposed integral equation model with appropriate discretization and regularizer can significantly reduce modeling errors and suppress noise, which leads to improved image quality and projection data estimation. (A generic collocation-and-regularization sketch appears below.)
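    A generic sketch of the discretization-plus-regularization idea (hypothetical kernel and data; plain Tikhonov regularization stands in for the paper's wavelet-based regularizer and preconditioned proximity fixed-point solver): piecewise-linear collocation of a 1D Fredholm integral equation produces an ill-conditioned linear system that must be regularized.

        import numpy as np

        # Collocation for \int_0^1 K(s, t) x(t) dt = y(s) with hat functions
        # phi_j at nodes t_j; the trapezoid rule gives the matrix entries
        # A[i, j] ~ \int K(s_i, t) phi_j(t) dt = w_j * K(s_i, t_j).
        n = 100
        t = np.linspace(0.0, 1.0, n)
        w = np.full(n, 1.0 / (n - 1))
        w[0] = w[-1] = 0.5 / (n - 1)
        K = np.exp(-50.0 * (t[:, None] - t[None, :]) ** 2)  # smoothing kernel
        A = K * w

        x_true = np.sin(2 * np.pi * t) + 0.5
        y = A @ x_true + 1e-4 * np.random.default_rng(1).standard_normal(n)

        # The discrete system is severely ill-conditioned; regularize and solve.
        lam = 1e-6
        x_rec = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
        print(np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true))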
  4. Abstract

    This article presents high order accurate discontinuous Galerkin (DG) methods for wave problems on moving curved meshes with general choices of basis and quadrature. The proposed method adopts an arbitrary Lagrangian-Eulerian formulation to map the wave equation from a time-dependent moving physical domain onto a fixed reference domain. For moving curved meshes, weighted mass matrices must be assembled and inverted at each time step when using explicit time-stepping methods. We avoid this step by utilizing an easily invertible weight-adjusted approximation. The resulting semi-discrete weight-adjusted DG scheme is provably energy stable up to a term that (for a fixed time interval) converges to zero at the same rate as the optimal error estimate. Numerical experiments using both polynomial and B-spline bases verify the high order accuracy and energy stability of the proposed methods. (A small numerical illustration of the weight-adjusted mass-matrix inverse appears below.)

     
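    A small single-element illustration of the weight-adjusted idea (my own sketch following the weight-adjusted DG literature; the basis, weight, and sizes are arbitrary): the inverse of the weighted mass matrix M_w is replaced by M^{-1} M_{1/w} M^{-1}, which requires only the fixed reference mass matrix M and the easily assembled matrix M_{1/w}.

        import numpy as np
        from numpy.polynomial import legendre

        N = 8                                  # polynomial degree
        xq, wq = legendre.leggauss(40)         # Gauss quadrature on [-1, 1]
        V = legendre.legvander(xq, N)          # V[q, j] = P_j(xq[q])
        w = 1.0 + 0.5 * xq                     # positive, spatially varying weight

        M    = V.T @ (wq[:, None] * V)         # reference mass matrix
        M_w  = V.T @ ((wq * w)[:, None] * V)   # true weighted mass matrix
        M_iw = V.T @ ((wq / w)[:, None] * V)   # mass matrix with weight 1/w

        Minv = np.linalg.inv(M)
        wa_inv = Minv @ M_iw @ Minv            # weight-adjusted inverse
        exact = np.linalg.inv(M_w)
        # The relative difference should be small for smooth weights, while
        # only the fixed matrix M ever needs to be inverted.
        print(np.linalg.norm(wa_inv - exact) / np.linalg.norm(exact))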
    The development of data-informed predictive models for dynamical systems is of widespread interest in many disciplines. We present a unifying framework for blending mechanistic and machine-learning approaches to identify dynamical systems from noisy, partially observed data. We compare purely data-driven learning with hybrid models which incorporate imperfect domain knowledge, referring to the discrepancy between an assumed truth model and the imperfect mechanistic model as model error. Our formulation is agnostic to the chosen machine learning model, is presented in both continuous- and discrete-time settings, and is compatible both with model errors that exhibit substantial memory and with errors that are memoryless. First, we study memoryless model error that is linear in its parametric dependence from a learning theory perspective, defining excess risk and generalization error. For ergodic continuous-time systems, we prove that both excess risk and generalization error are bounded above by terms that diminish with the square root of T, the time interval over which training data are specified. Second, we study scenarios that benefit from modeling with memory, proving universal approximation theorems for two classes of continuous-time recurrent neural networks (RNNs): both can learn memory-dependent model error, assuming that it is governed by a finite-dimensional hidden variable and that, together, the observed and hidden variables form a continuous-time Markovian system. In addition, we connect one class of RNNs to reservoir computing, thereby relating learning of memory-dependent error to recent work on supervised learning between Banach spaces using random features. Numerical results are presented (Lorenz '63 and Lorenz '96 multiscale systems) to compare purely data-driven and hybrid approaches, finding hybrid methods less data-hungry and more parametrically efficient. We also find that, while a continuous-time framing allows for robustness to irregular sampling and desirable domain interpretability, a discrete-time framing can provide similar or better predictive performance, especially when data are undersampled and the vector field defining the true dynamics cannot be identified. Finally, we demonstrate numerically how data assimilation can be leveraged to learn hidden dynamics from noisy, partially observed data, and illustrate challenges in representing memory by this approach, and in the training of such models. (A toy sketch of the memoryless hybrid setup appears below.)
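    A toy version of the memoryless hybrid setup (illustrative functions and parameters throughout; random Fourier features play the role of the generic machine-learning component): an imperfect mechanistic model f0 is corrected by a model-error term learned from noisy samples of the true vector field.

        import numpy as np

        rng = np.random.default_rng(0)
        f  = lambda x: -x + 0.5 * np.sin(3.0 * x)  # "true" dynamics (toy)
        f0 = lambda x: -x                          # imperfect mechanistic model

        # Noisy samples of the model error delta(x) = f(x) - f0(x).
        x = rng.uniform(-3.0, 3.0, 500)
        resid = f(x) - f0(x) + 0.01 * rng.standard_normal(x.size)

        # Fit delta with random Fourier features + ridge regression.
        D = 200
        om = rng.normal(0.0, 2.0, D)               # random frequencies
        ph = rng.uniform(0.0, 2.0 * np.pi, D)      # random phases
        feats = lambda z: np.cos(np.outer(z, om) + ph)
        Phi = feats(x)
        theta = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(D), Phi.T @ resid)

        # Hybrid model: mechanistic part plus learned correction.
        f_hybrid = lambda z: f0(z) + feats(z) @ theta
        z_test = np.linspace(-3.0, 3.0, 7)
        print(np.max(np.abs(f_hybrid(z_test) - f(z_test))))  # small residual error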