skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: WARio: efficient code generation for intermittent computing
Intermittently operating embedded computing platforms powered by energy harvesting require software frameworks to protect from errors caused by Write After Read (WAR) dependencies. A powerful method of code protection for systems with non-volatile main memory utilizes compiler analysis to insert a checkpoint inside each WAR violation in the code. However, such software frameworks are oblivious to the code structure---and therefore, inefficient---when many consecutive WAR violations exist. Our insight is that by transforming the input code, i.e., moving individual write operations from unique WARs close to each other, we can significantly reduce the number of checkpoints. This idea is the foundation for WARio: a set of compiler transformations for efficient code generation for intermittent computing. WARio, on average, reduces checkpoint overhead by 58%, and up to 88%, compared to the state of the art across various benchmarks.  more » « less
Award ID(s):
1908488 2145584 1850496
PAR ID:
10344721
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
Page Range / eLocation ID:
777 to 791
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent trends in high-performance computing show an increase in the adoption of performance portable frameworks such as Kokkos and interpreted languages such as Python. PyKokkos follows these trends and enables programmers to write performance-portable kernels in Python which greatly increases productivity. One issue that programmers still face is how to organize parallel code, as splitting code into separate kernels simplifies testing and debugging but may result in suboptimal performance. To enable programmers to organize kernels in any way they prefer while ensuring good performance, we present PyFuser, a program analysis framework for automatic fusion of performance portable PyKokkos kernels. PyFuser dynamically traces kernel calls and lazily fuses them once the result is requested by the application. PyFuser generates fused kernels that execute faster due to better reuse of data, improved compiler optimizations, and reduced kernel launch overhead, while not requiring any changes to existing PyKokkos code. We also introduce automated code transformations that further optimize the fused kernels generated by PyFuser. Our experiments show that on average PyFuser achieves speedups compared to unfused kernels of 3.8x on NVIDIA and AMD GPUs, as well as Intel and AMD CPUs. 
    more » « less
  2. Aldrich, Jonathan; Salvaneschi, Guido (Ed.)
    Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to utilize them, code must be written specifically using the APIs / ISAs of such software frameworks or hardware accelerators. Importantly, given the fast pace of innovation in these domains, code written today quickly becomes legacy as new frameworks and accelerators are developed, and migrating such legacy code manually is a considerable effort. To enable developers in leveraging such DSLs while preserving their current programming paradigm, we present Tenspiler, a verified-lifting-based compiler that uses program synthesis to translate sequential programs written in general-purpose programming languages (e.g., C++ or Python code that does not leverage any specialized framework or accelerator) into tensor operations. Central to Tenspiler is our carefully crafted yet simple intermediate language, named TensIR, that expresses tensor operations. TensIR enables efficient lifting, verification, and code generation. Unlike classical pattern-matching-based compilers, Tenspiler uses program synthesis to translate input code into TensIR, which is then compiled to the target API / ISA. Currently, Tenspiler already supports six DSLs, spanning a broad spectrum of software and hardware environments. Furthermore, we show that new backends can be easily supported by Tenspiler by adding simple pattern-matching rules for TensIR. Using 10 real-world code benchmark suites, our experimental evaluation shows that by translating code to be executed on 6 different software frameworks and hardware devices, Tenspiler offers on average 105× kernel and 9.65× end-to-end execution time improvement over the fully-optimized sequential implementation of the same benchmarks. 
    more » « less
  3. Trends indicate that emerging SmartNICs, either from different vendors or generations from the same vendor, exhibit substantial differences in hardware parallelism and memory interconnects. These variations make porting programs across NICs highly complex and time-consuming, requiring programmers to significantly refactor code for performance based on each target NIC’s hardware characteristics. We argue that an ideal SmartNIC compilation framework should allow developers to write target-independent programs, with the compiler automatically managing cross-NIC porting and performance optimization. We present such a framework, Alkali, that achieves this by (1) proposing a new intermediate representation for building flexible compiler infrastructure for multiple NIC targets and (2) developing a new iterative parallelism optimization algorithm that automatically ports and parallelizes the input programs based on the target NIC’s hardware characteristics. Experiments across a wide range of NIC applications demonstrate that Alkali enables developers to easily write portable, high-performance NIC programs. Our compiler optimization passes can automatically port these programs and make them run efficiently across all targets, achieving performance within 9.8% of hand-tuned expert implementations. 
    more » « less
  4. Intermittent computing systems execute long-running tasks under a transient power supply such as an energy harvesting power source. During a power loss, they save intermediate program state as a checkpoint into write-efficient non-volatile memory. When the power is restored, the system state is reconstructed from the checkpoint, and the long-running computation continues. We analyze the security risks when power interruption is used as an attack vector, and we demonstrate the need to protect the integrity, authenticity, confidentiality, continuity, and freshness of checkpointed data. We propose a secure checkpointing technique called the Se-cure Intermittent Computing Protocol (SICP). The proposed protocol has the following properties. First, it associates every checkpoint with a unique power-on state to checkpoint replay. Second, every checkpoint is cryptographically chained to its predecessor, providing continuity, which enables the programmer to carry run-time security properties such as attested program images across power loss events. Third, SICP is atomic and resistant to power loss. We demonstrate a prototype implementation of SICP on an MSP430 microcontroller, and we investigate the overhead of SICP for several cryptographic kernels. To the best of our knowledge, this is the first work to provide a robust solution to secure intermittent computing. 
    more » « less
  5. Compiler technologies in deep learning and domain-specific hardware acceleration are increasingly adopting extensible compiler frameworks such as Multi-Level Intermediate Representation (MLIR) to facilitate more efficient development. With MLIR, compiler developers can easily define their own custom IRs in the form of MLIR dialects. However, the diversity and rapid evolution of such custom IRs make it impractical to manually write a custom test generator for each dialect. To address this problem, we design a new test generator called SynthFuzz that combines grammar-based fuzzing with custom mutation synthesis. The key essence of SynthFuzz is two fold: (1) It automatically infers parameterized context-dependent custom mutations from existing test cases. (2) It then concretizes the mutation's content depending on the target context and reduces the chance of inserting invalid edits by performing k - ancestor and prefix/postfix matching. It obviates the need to manually define custom mutation operators for each dialect. We compare SynthFuzz to three baselines: Grammarinator-a grammar-based fuzzer without custom mutations, MLIRSmith-a custom test generator for MLIR core dialects, and NeuRI-a custom test generator for ML models with parameterization of tensor shapes. We conduct this comprehensive comparison on four different MLIR projects. Each project defines a new set of MLIR dialects where manually writing a custom test generator would take weeks of effort. Our evaluation shows that SynthFuzz on average improves MLIR dialect pair coverage by 1.75 ×, which increases branch coverage by 1.22 ×. Further, we show that our context dependent custom mutation increases the proportion of valid tests by up to 1.11 ×, indicating that SynthFuzz correctly concretizes its parameterized mutations with respect to the target context. Parameterization of the mutations reduces the fraction of tests violating the base MLIR constraints by 0.57 ×, increasing the time spent fuzzing dialect-specific code. 
    more » « less