Title: Synthesizing Precise Static Analyzers for Automatic Differentiation

We present Pasado, a technique for synthesizing precise static analyzers for Automatic Differentiation (AD). Our technique allows one to automatically construct a static analyzer specialized for the Chain Rule, Product Rule, and Quotient Rule computations of AD in a way that abstracts all of the nonlinear operations of each respective rule simultaneously. By directly synthesizing an abstract transformer for the composite expressions of these three most common rules of AD, we obtain significant precision improvements over prior work, which composes standard abstract transformers suboptimally. We prove our synthesized static analyzers sound and additionally demonstrate the generality of our approach by instantiating these AD static analyzers with different nonlinear functions, different abstract domains (both intervals and zonotopes), and both forward-mode and reverse-mode AD.

We evaluate Pasado on multiple case studies, namely soundly computing bounds on a neural network’s local Lipschitz constant, soundly bounding the sensitivities of financial models, certifying monotonicity, and lastly, bounding sensitivities of the solutions of differential equations from climate science and chemistry for verified ranges of initial conditions and parameters. The local Lipschitz constants computed by Pasado on our largest CNN are up to 2750× more precise than those of the existing state-of-the-art zonotope analysis. The bounds obtained on the sensitivities of the climate, chemical, and financial differential equation solutions are between 1.31× and 2.81× more precise (on average) than those of a state-of-the-art zonotope analysis.
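To illustrate the precision loss that the abstract attributes to composing standard abstract transformers, the following is a minimal sketch of forward-mode AD over the interval domain. It is not Pasado's synthesized transformer; it shows only the compositional baseline style, in which the Product Rule is evaluated with ordinary interval addition and multiplication. The names `Interval` and `Dual` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with standard interval arithmetic."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # Standard interval product: min/max over the four corner products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


@dataclass(frozen=True)
class Dual:
    """Interval dual number: bounds on a value and on its derivative."""
    val: Interval
    der: Interval

    def __mul__(self, other):
        # Product Rule (f*g)' = f'*g + f*g', evaluated by composing the
        # interval transformers for * and + one operation at a time.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)


# Input x ranges over [-1, 1]; dx/dx = 1.
x = Dual(Interval(-1.0, 1.0), Interval(1.0, 1.0))

# f(x) = x * x * x, so f'(x) = 3x^2, whose exact range on [-1, 1] is [0, 3].
cube = (x * x) * x

print(cube.der)  # Interval(lo=-3.0, hi=3.0): sound, but twice as wide as [0, 3]
```

The computed derivative bound [-3, 3] soundly contains the exact range [0, 3], but it is twice as wide because each multiplication and addition is abstracted independently, which discards the correlation between the operands.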

Award ID(s):
1846354 2217144 1956374 2008883
PAR ID:
10495992
Publisher / Repository:
38th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages and Applications
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
7
Issue:
OOPSLA2
ISSN:
2475-1421
Page Range / eLocation ID:
1964 to 1992
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a novel, general construction to abstractly interpret higher-order automatic differentiation (AD). Our construction allows one to instantiate an abstract interpreter for computing derivatives up to a chosen order. Furthermore, since our construction reduces the problem of abstractly reasoning about derivatives to abstractly reasoning about real-valued straight-line programs, it can be instantiated with almost any numerical abstract domain, both relational and non-relational. We formally establish the soundness of this construction. We implement our technique by instantiating our construction with both the non-relational interval domain and the relational zonotope domain to compute both first and higher-order derivatives. In the latter case, we are the first to apply a relational domain to automatic differentiation for abstracting higher-order derivatives, and hence we are also the first abstract interpretation work to track correlations across not only different variables, but different orders of derivatives. We evaluate these instantiations on multiple case studies, namely robustly explaining a neural network and more precisely computing a neural network’s Lipschitz constant. For robust interpretation, first and second derivatives computed via zonotope AD are up to 4.76× and 6.98× more precise, respectively, compared to interval AD. For Lipschitz certification, we obtain bounds that are up to 11,850× more precise with zonotopes, compared to the state-of-the-art interval-based tool. 
  2. Algorithmic differentiation (AD), also known as automatic differentiation, is a technology for accurate and efficient evaluation of derivatives of a function given as a computer model. The evaluations of such models are essential building blocks in numerous scientific computing and data analysis applications, including optimization, parameter identification, sensitivity analysis, uncertainty quantification, nonlinear equation solving, and integration of differential equations. We provide an introduction to AD and present its basic ideas and techniques, some of its most important results, the implementation paradigms it relies on, the connection it has to other domains including machine learning and parallel computing, and a few of the major open problems in the area. Topics we discuss include: forward mode and reverse mode of AD, higher-order derivatives, operator overloading and source transformation, sparsity exploitation, checkpointing, cross-country mode, and differentiating iterative processes.

    This article is categorized under:

    Algorithmic Development > Scalable Statistical Methods

    Technologies > Data Preprocessing

  3. We present a novel abstraction for bounding the Clarke Jacobian of a Lipschitz continuous, but not necessarily differentiable function over a local input region. To do so, we leverage a novel abstract domain built upon dual numbers, adapted to soundly over-approximate all first derivatives needed to compute the Clarke Jacobian. We formally prove that our novel forward-mode dual interval evaluation produces a sound, interval domain-based over-approximation of the true Clarke Jacobian for a given input region. Due to the generality of our formalism, we can compute and analyze interval Clarke Jacobians for a broader class of functions than previous works supported – specifically, arbitrary compositions of neural networks with Lipschitz, but non-differentiable perturbations. We implement our technique in a tool called DeepJ and evaluate it on multiple deep neural networks and non-differentiable input perturbations to showcase both the generality and scalability of our analysis. Concretely, we can obtain interval Clarke Jacobians to analyze Lipschitz robustness and local optimization landscapes of both fully-connected and convolutional neural networks for rotational, contrast variation, and haze perturbations, as well as their compositions. 
  4. Computing derivatives is key to many algorithms in scientific computing and machine learning such as optimization, uncertainty quantification, and stability analysis. Enzyme is an LLVM compiler plugin that performs reverse-mode automatic differentiation (AD) and thus generates high-performance gradients of programs in languages including C/C++, Fortran, Julia, and Rust. Prior to this work, Enzyme and other AD tools were not capable of generating gradients of GPU kernels. Our paper presents a combination of novel techniques that make Enzyme the first fully automatic reverse-mode AD tool to generate gradients of GPU kernels. Since, unlike other tools, Enzyme performs automatic differentiation within a general-purpose compiler, we are able to introduce several novel GPU- and AD-specific optimizations. To show the generality and efficiency of our approach, we compute gradients of five GPU-based HPC applications, executed on NVIDIA and AMD GPUs. All benchmarks run within an order of magnitude of the original program's execution time. Without GPU- and AD-specific optimizations, gradients of GPU kernels either fail to run from a lack of resources or have infeasible overhead. Finally, we demonstrate that increasing the problem size, by either increasing the number of threads or increasing the work per thread, does not substantially impact the overhead from differentiation. 
  5. We present a novel method for working with the physicist's method of amortized resource analysis, which we call the quantum physicist's method. This method allows for more precise analyses of resources that are not monotonically consumed, such as stack space. It takes its name from its two major features, worldviews and resource tunneling, which behave analogously to quantum superposition and quantum tunneling. We use the quantum physicist's method to extend the Automatic Amortized Resource Analysis (AARA) type system, enabling the derivation of resource bounds based on tree depth. In doing so, we also introduce remainder contexts, which aid bookkeeping in linear type systems. We then evaluate this new type system's performance by bounding stack use of functions in the Set module of OCaml's standard library. Compared to state-of-the-art implementations of AARA, our new system derives tighter bounds with only moderate overhead. 