skip to main content


Title: Certified concurrent abstraction layers
Concurrent abstraction layers are ubiquitous in modern computer systems because of the pervasiveness of multithreaded programming and multicore hardware. Abstraction layers are used to hide the implementation details (e.g., fine-grained synchronization) and reduce the complex dependencies among components at different levels of abstraction. Despite their obvious importance, concurrent abstraction layers have not been treated formally. This severely limits the applicability of layer-based techniques and makes it difficult to scale verification across multiple concurrent layers. In this paper, we present CCAL---a fully mechanized programming toolkit developed under the CertiKOS project---for specifying, composing, compiling, and linking certified concurrent abstraction layers. CCAL consists of three technical novelties: a new game-theoretical, strategy-based compositional semantic model for concurrency (and its associated program verifiers), a set of formal linking theorems for composing multithreaded and multicore concurrent layers, and a new CompCertX compiler that supports certified thread-safe compilation and linking. The CCAL toolkit is implemented in Coq and supports layered concurrent programming in both C and assembly. It has been successfully applied to build a fully certified concurrent OS kernel with fine-grained locking.  more » « less
Award ID(s):
1715154
NSF-PAR ID:
10066494
Author(s) / Creator(s):
; ; ;  ; ; ; ; ;
Date Published:
Journal Name:
PLDI'18: 2018 ACM SIGPLAN Conference on Programming Language Design and Implementation
Page Range / eLocation ID:
646 to 661
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Large-scale software verification relies critically on the use of compositional languages, semantic models, specifications, and verification techniques. Recent work on certified abstraction layers synthesizes game semantics, the refinement calculus, and algebraic effects to enable the composition of heterogeneous components into larger certified systems. However, in existing models of certified abstraction layers, compositionality is restricted by the lack of encapsulation of state. In this paper, we present a novel game model for certified abstraction layers where the semantics of layer interfaces and implementations are defined solely based on their observable behaviors. Our key idea is to leverage Reddy's pioneer work on modeling the semantics of imperative languages not as functions on global states but as objects with their observable behaviors. We show that a layer interface can be modeled as an object type (i.e., a layer signature) plus an object strategy. A layer implementation is then essentially a regular map, in the sense of Reddy, from an object with the underlay signature to that with the overlay signature. A layer implementation is certified when its composition with the underlay object strategy implements the overlay object strategy. We also describe an extension that allows for non-determinism in layer interfaces. After formulating layer implementations as regular maps between object spaces, we move to concurrency and design a notion of concurrent object space, where sequential traces may be identified modulo permutation of independent operations. We show how to express protected shared object concurrency, and a ticket lock implementation, in a simple model based on regular maps between concurrent object spaces. 
    more » « less
  2. The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of Theta(log n) to spawn or synchronize n tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model and the non-constant synchronization cost makes the task challenging. In this paper, we show that in the binary-forking model we can achieve optimal or near-optimal span with negligible or no asymptotic blowup in work for comparison-based sorting, Strassen's matrix multiplication (MM), and the Fast Fourier Transform (FFT). Our major results are as follows: (1) A randomized comparison-based sorting algorithm with optimal O(log n) span and O(nlog n) work, both w.h.p. in n. (2) An optimal O(log n) span algorithm for Strassen's matrix multiplication (MM) with only a loglog n - factor blow-up in work as well as a near-optimal O(log n loglog log n) span algorithm with no asymptotic blow-up in work. (3) A near-optimal O(log n logloglog n) span Fast Fourier Transform (FFT) algorithm with less than a log n-factor blow-up in work for all practical values of n (i.e., n le 10 ^10,000) 
    more » « less
  3. null (Ed.)
    The binary-forking model is a parallel computation model, formally defined by Blelloch et al., in which a thread can fork a concurrent child thread, recursively and asynchronously. The model incurs a cost of Theta(log n) to spawn or synchronize n tasks or threads. The binary-forking model realistically captures the performance of parallel algorithms implemented using modern multithreaded programming languages on multicore shared-memory machines. In contrast, the widely studied theoretical PRAM model does not consider the cost of spawning and synchronizing threads, and as a result, algorithms achieving optimal performance bounds in the PRAM model may not be optimal in the binary-forking model. Often, algorithms need to be redesigned to achieve optimal performance bounds in the binary-forking model and the non-constant synchronization cost makes the task challenging. In this paper, we show that in the binary-forking model we can achieve optimal or near-optimal span with negligible or no asymptotic blowup in work for comparison-based sorting, Strassen's matrix multiplication (MM), and the Fast Fourier Transform (FFT). Our major results are as follows: (1) A randomized comparison-based sorting algorithm with optimal O(log n) span and O(nlog n) work, both w.h.p. in n. (2) An optimal O(log n) span algorithm for Strassen's matrix multiplication (MM) with only a loglog n - factor blow-up in work as well as a near-optimal O(log n loglog log n) span algorithm with no asymptotic blow-up in work. (3) A near-optimal O(log n logloglog n) span Fast Fourier Transform (FFT) algorithm with less than a log n-factor blow-up in work for all practical values of n (i.e., n le 10 ^10,000 ). 
    more » « less
  4. null (Ed.)
    Because of its many desirable properties, such as its ability to control effects and thus potentially disastrous race conditions, functional programming offers a viable approach to programming modern multicore computers. Over the past decade several parallel functional languages, typically based on dialects of ML and Haskell, have been developed. These languages, however, have traditionally underperformed procedural languages (such as C and Java). The primary reason for this is their hunger for memory, which only grows with parallelism, causing traditional memory management techniques to buckle under increased demand for memory. Recent work opened a new angle of attack on this problem by identifying a memory property of determinacy-race-free parallel programs, called disentanglement, which limits the knowledge of concurrent computations about each other’s memory allocations. The work has showed some promise in delivering good time scalability. In this paper, we present provably space-efficient automatic memory management techniques for determinacy-race-free functional parallel programs, allowing both pure and imperative programs where memory may be destructively updated. We prove that for a program with sequential live memory of R * , any P -processor garbage-collected parallel run requires at most O ( R * · P ) memory. We also prove a work bound of O ( W + R * P ) for P -processor executions, accounting also for the cost of garbage collection. To achieve these results, we integrate thread scheduling with memory management. The idea is to coordinate memory allocation and garbage collection with thread scheduling decisions so that each processor can allocate memory without synchronization and independently collect a portion of memory by consulting a collection policy, which we formulate. The collection policy is fully distributed and does not require communicating with other processors. We show that the approach is practical by implementing it as an extension to the MPL compiler for Parallel ML. Our experimental results confirm our theoretical bounds and show that the techniques perform and scale well. 
    more » « less
  5. Multicore systems should support both speculative and non-speculative parallelism. Speculative parallelism is easy to use and is crucial to scale many challenging applications, while non-speculative parallelism is more efficient and allows parallel irrevocable actions (e.g., parallel I/O). Unfortunately, prior techniques are far from this goal. Hardware transactional memory (HTM) systems support speculative (transactional) and non-speculative (non-transactional) work, but lack coordination mechanisms between the two, and are limited to unordered parallelism. Prior work has extended HTMs to avoid the limitations of speculative execution, e.g., through escape actions and open-nested transactions. But these mechanisms are incompatible with systems that exploit ordered parallelism, which parallelize a broader range of applications and are easier to use. We contribute two techniques that enable seamlessly composing and coordinating speculative and non-speculative work in the context of ordered parallelism: (i) a task-based execution model that efficiently coordinates concurrent speculative and non-speculative ordered tasks, allowing them to create tasks of either kind and to operate on shared data; and (ii) a safe way for speculative tasks to invoke software-managed speculative actions that avoid hardware version management and conflict detection. These contributions improve efficiency and enable new capabilities. Across several benchmarks, they allow the system to dynamically choose whether to execute tasks speculatively or non-speculatively, avoid needless conflicts among speculative tasks, and allow speculative tasks to safely invoke irrevocable actions. 
    more » « less