skip to main content


Title: Leveraging Compiler Intermediate Representation for Multi- and Cross-Language Verification
Developers nowadays regularly use numerous programming languages with different characteristics and trade-offs. Unfortunately, implementing a software verifier for a new language from scratch is a large and tedious undertaking, requiring expert knowledge in multiple domains, such as compilers, verification, and constraint solving. Hence, only a tiny fraction of the used languages has readily available software verifiers to aid in the development of correct programs. In the past decade, there has been a trend of leveraging popular compiler intermediate representations (IRs), such as LLVM IR, when implementing software verifiers. Processing IR promises out-of-the-box multi- and cross-language verification since, at least in theory, a verifier ought to be able to handle a program in any programming language (and their combination) that can be compiled into the IR. In practice though, to the best of our knowledge, nobody has explored the feasibility and ease of such integration of new languages. In this paper, we provide a procedure for adding support for a new language into an IR-based verification toolflow. Using our procedure, we extend the SMACK verifier with prototypical support for 6 additional languages. We assess the quality of our extensions through several case studies, and we describe our experience in detail to guide future efforts in this area.  more » « less
Award ID(s):
1837051
NSF-PAR ID:
10188486
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 21st International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI)
Volume:
11990
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cryptographic tools like proof systems, multi-party computation, and fully homomorphic encryption are usually applied to computations expressed as systems of arithmetic constraints. In practice, this means that these applications rely on compilers from high-level programming languages (like C) to such constraints. This compilation task is challenging, but not entirely new: the software verification community has a rich literature on compiling programs to logical constraints (like SAT or SMT). In this work, we show that building shared compiler infrastructure for compiling to constraint representations is possible, because these representations share a common abstraction: stateless, non-uniform, non-deterministic computations that we call existentially quantified circuits, or EQCs. Moreover, we show that this shared infrastructure is useful, because it allows compilers for proof systems to benefit from decades of work on constraint compilation techniques for software verification. To make our approach concrete we create CirC, an infrastructure for building compilers to EQCs. CirC makes it easy to compile to new EQCs: we build support for three, R1CS (used for proof systems), SMT (used for verification and bug-finding), and ILP (used for optimization), in ≈2000 LOC. It’s also easy to extend CirC to support new source languages: we build a feature-complete compiler for a cryptographic language in one week and ≈900 LOC, whereas the reference compiler for the same language took years to write, comprises ≈24000 LOC, and produces worse-performing output than our compiler. Finally, CirC enables novel applications that combine multiple EQCs. For example, we build the first pipeline that (1) automatically identifies bugs in programs, then (2) automatically constructs cryptographic proofs of the bugs’ existence. 
    more » « less
  2. To guard against machine failures, modern internet services store multiple replicas of the same application data within and across data centers, which introduces the problem of keeping geodistributed replicas consistent with one another in the face of network partitions and unpredictable message latency. To avoid costly and conservative synchronization protocols, many real-world systems provide only weak consistency guarantees (e.g., eventual, causal, or PRAM consistency), which permit certain kinds of disagreement among replicas. There has been much recent interest in language support for specifying and verifying such consistency properties. Although these properties are usually beyond the scope of what traditional type checkers or compiler analyses can guarantee, solver-aided languages are up to the task. Inspired by systems like Liquid Haskell [43] and Rosette [42], we believe that close integration between a language and a solver is the right path to consistent-by-construction distributed applications. Unfortunately, verifying distributed consistency properties requires reasoning about transitive relations (e.g., causality or happens-before), partial orders (e.g., the lattice of replica states under a convergent merge operation), and properties relevant to message processing or API invocation (e.g., commutativity and idempotence) that cannot be easily or efficiently carried out by general-purpose SMT solvers that lack native support for this kind of reasoning. We argue that domain-specific SMT-based tools that exploit the mathematical foundations of distributed consistency would enable both more efficient verification and improved ease of use for domain experts. The principle of exploiting domain knowledge for efficiency and expressivity that has borne fruit elsewhere — such as in the development of high-performance domain-specific languages that trade off generality to gain both performance and productivity — also applies here. Languages augmented with domain-specific, consistency-aware solvers would support the rapid implementation of formally verified programming abstractions that guarantee distributed consistency. In the long run, we aim to democratize the development of such domain-specific solvers by creating a framework for domain-specific solver development that brings new theory solver implementation within the reach of programmers who are not necessarily SMT solver internals experts. 
    more » « less
  3. Automated deductive program synthesis promises to generate executable programs from concise specifications, along with proofs of correctness that can be independently verified using third-party tools. However, an attempt to exercise this promise using existing proof-certification frameworks reveals significant discrepancies in how proof derivations are structured for two different purposes: program synthesis and program verification. These discrepancies make it difficult to use certified verifiers to validate synthesis results, forcing one to write an ad-hoc translation procedure from synthesis proofs to correctness proofs for each verification backend. In this work, we address this challenge in the context of the synthesis and verification of heap-manipulating programs. We present a technique for principled translation of deductive synthesis derivations (a.k.a. source proofs) into deductive target proofs about the synthesised programs in the logics of interactive program verifiers. We showcase our technique by implementing three different certifiers for programs generated via SuSLik, a Separation Logic-based tool for automated synthesis of programs with pointers, in foundational verification frameworks embedded in Coq: Hoare Type Theory (HTT), Iris, and Verified Software Toolchain (VST), producing concise and efficient machine-checkable proofs for characteristic synthesis benchmarks. 
    more » « less
  4. Neural networks can learn complex, non-convex functions, and it is challenging to guarantee their correct behavior in safety-critical contexts. Many approaches exist to find failures in networks (e.g., adversarial examples), but these cannot guarantee the absence of failures. Verification algorithms address this need and provide formal guarantees about a neural network by answering "yes or no" questions. For example, they can answer whether a violation exists within certain bounds. However, individual "yes or no" questions cannot answer qualitative questions such as “what is the largest error within these bounds”; the answers to these lie in the domain of optimization. Therefore, we propose strategies to extend existing verifiers to perform optimization and find: (i) the most extreme failure in a given input region and (ii) the minimum input perturbation required to cause a failure. A naive approach using a bisection search with an off-the-shelf verifier results in many expensive and overlapping calls to the verifier. Instead, we propose an approach that tightly integrates the optimization process into the verification procedure, achieving better runtime performance than the naive approach. We evaluate our approach implemented as an extension of Marabou, a state-of-the-art neural network verifier, and compare its performance with the bisection approach and MIPVerify, an optimization-based verifier. We observe complementary performance between our extension of Marabou and MIPVerify 
    more » « less
  5. Most software domains rely on compilers to translate high-level code to multiple different machine languages, with performance not too much worse than what developers would have the patience to write directly in assembly language. However, cryptography has been an exception, where many performance-critical routines have been written directly in assembly (sometimes through metaprogramming layers). Some past work has shown how to do formal verification of that assembly, and other work has shown how to generate C code automatically along with formal proof, but with consequent performance penalties vs. the best- known assembly. We present CryptOpt, the first compilation pipeline that specializes high-level cryptographic functional programs into assembly code significantly faster than what GCC or Clang produce, with mechanized proof (in Coq) whose final theorem statement mentions little beyond the input functional program and the operational semantics of x86-64 assembly. On the optimization side, we apply randomized search through the space of assembly programs, with repeated automatic benchmarking on target CPUs. On the formal-verification side, we connect to the Fiat Cryptography framework (which translates functional programs into C-like IR code) and extend it with a new formally verified program-equivalence checker, incorporating a modest subset of known features of SMT solvers and symbolic-execution engines. The overall prototype is quite practical, e.g. producing new fastest-known implementations of finite-field arithmetic for both Curve25519 (part of the TLS standard) and the Bitcoin elliptic curve secp256k1 for the Intel 12𝑡ℎ and 13𝑡ℎ generations. 
    more » « less