skip to main content


Title: Low-overhead deadlock prediction
Multithreaded programs can have deadlocks, even after deployment, so users may want to run deadlock tools on deployed programs. However, current deadlock predictors such as MagicLock and UnDead have large overheads that make them impractical for end-user deployment and confine their use to development time. Such overhead stems from running an exponential-time algorithm on a large execution trace. In this paper, we present the first low-overhead deadlock predictor, called AirLock, that is fit for both in-house testing and deployed programs. AirLock maintains a small predictive lock reachability graph, searches the graph for cycles, and runs an exponential-time algorithm only for each cycle. This approach lets AirLock find the same deadlocks as MagicLock and UnDead but with much less overhead because the number of cycles is small in practice. Our experiments with real-world benchmarks show that the average time overhead of AirLock is 3.5%, which is three orders of magnitude less than that of MagicLock and UnDead. AirLock's low overhead makes it suitable for use with fuzz testers like AFL and on-the-fly after deployment.  more » « less
Award ID(s):
1815496
NSF-PAR ID:
10359293
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
42nd International Conference on Software Engineering (ICSE ’20)
Page Range / eLocation ID:
1298 to 1309
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deadlocks are notorious bugs in multithreaded programs, causing serious reliability issues. However, they are difficult to be fully expunged before deployment, as their appearances typically depend on specific inputs and thread schedules, which require the assistance of dynamic tools. However, existing deadlock detection tools mainly focus on locks, but cannot detect deadlocks related to condition variables. This paper presents a novel approach to fill this gap. It extends the classic lock dependency to generalized dependency by abstracting the signal for the condition variable as a special resource so that communication deadlocks can be modeled as hold-and-wait cycles as well. It further designs multiple practical mechanisms to record and analyze generalized dependencies. In the end, this paper presents the implementation of the tool, called UnHang. Experimental results on real applications show that UnHang is able to find all known deadlocks and uncover two new deadlocks. Overall, UnHang only imposes around 3% performance overhead and 8% memory overhead, making it a practical tool for the deployment environment. 
    more » « less
  2. Many algorithms for analyzing parallel programs, for example to detect deadlocks or data races or to calculate the execution cost, are based on a model variously known as a cost graph, computation graph or dependency graph, which captures the parallel structure of threads in a program. In modern parallel programs, computation graphs are highly dynamic and depend greatly on the program inputs and execution details. As such, most analyses that use these graphs are either dynamic analyses or are specialized static analyses that gather a subset of dependency information for a specific purpose. This paper introduces graph types, which compactly represent all of the graphs that could arise from program execution. Graph types are inferred from a parallel program using a graph type system and inference algorithm, which we present drawing on ideas from Hindley-Milner type inference, affine logic and region type systems. We have implemented the inference algorithm over a subset of OCaml, extended with parallelism primitives, and we demonstrate how graph types can be used to accelerate the development of new graph-based static analyses by presenting proof-of-concept analyses for deadlock detection and cost analysis. 
    more » « less
  3. Deadlocks are one of the most notorious concurrency bugs, and significant research has focused on detecting them efficiently. Dynamic predictive analyses work by observing concurrent executions, and reason about alternative interleavings that can witness concurrency bugs. Such techniques offer scalability and sound bug reports, and have emerged as an effective approach for concurrency bug detection, such as data races. Effective dynamic deadlock prediction, however, has proven a challenging task, as no deadlock predictor currently meets the requirements of soundness, high-precision, and efficiency.

    In this paper, we first formally establish that this tradeoff is unavoidable, by showing that (a) sound and complete deadlock prediction is intractable, in general, and (b) even the seemingly simpler task of determining the presence of potential deadlocks, which often serve as unsound witnesses for actual predictable deadlocks, is intractable. The main contribution of this work is a new class of predictable deadlocks, called sync(hronization)-preserving deadlocks. Informally, these are deadlocks that can be predicted by reordering the observed execution while preserving the relative order of conflicting critical sections. We present two algorithms for sound deadlock prediction based on this notion. Our first algorithm SPDOffline detects all sync-preserving deadlocks, with running time that is linear per abstract deadlock pattern, a novel notion also introduced in this work. Our second algorithm SPDOnline predicts all sync-preserving deadlocks that involve two threads in a strictly online fashion, runs in overall linear time, and is better suited for a runtime monitoring setting.

    We implemented both our algorithms and evaluated their ability to perform offline and online deadlock-prediction on a large dataset of standard benchmarks. Our results indicate that our new notion of sync-preserving deadlocks is highly effective, as (i) it can characterize the vast majority of deadlocks and (ii) it can be detected using an online, sound, complete and highly efficient algorithm.

     
    more » « less
  4. IoT devices can be used to complete a wide array of physical tasks, but due to factors such as low computational resources and distributed physical deployment, they are susceptible to a wide array of faulty behaviors. Many devices deployed in homes, vehicles, industrial sites, and hospitals carry a great risk of damage to property, harm to a person, or breach of security if they behave faultily. We propose a general fault handling system named IoTRepair, which shows promising results for effectiveness with limited latency and power overhead in an IoT environment. IoTRepair dynamically organizes and customizes fault-handling techniques to address the unique problems associated with heterogeneous IoT deployments. We evaluate IoTRepair by creating a physical implementation mirroring a typical home environment to motivate the effectiveness of this system. Our evaluation showed that each of our fault-handling functions could be completed within 100 milliseconds after fault identification, which is a fraction of the time that state-of-the-art fault-identification methods take (measured in minutes). The power overhead is equally small, with the computation and device action consuming less than 30 milliwatts. This evaluation shows that IoTRepair not only can be deployed in a physical system, but offers significant benefits at a low overhead. 
    more » « less
  5. Connected Autonomous Vehicles (CAVs) are expected to enable reliable, efficient, and intelligent transportation systems. Most motion planning algorithms for multi-agent systems implicitly assume that all vehicles/agents will execute the expected plan with a small error and evaluate their safety constraints based on this fact. This assumption, however, is hard to keep for CAVs since they may have to change their plan (e.g., to yield to another vehicle) or are forced to stop (e.g., A CAV may break down). While it is desired that a CAV never gets involved in an accident, it may be hit by other vehicles and sometimes, preventing the accident is impossible (e.g., getting hit from behind while waiting behind the red light). Responsibility-Sensitive Safety (RSS) is a set of safety rules that defines the objective of CAV to blame, instead of safety. Thus, instead of developing a CAV algorithm that will avoid any accident, it ensures that the ego vehicle will not be blamed for any accident it is a part of. Original RSS rules, however, are hard to evaluate for merge, intersection, and unstructured road scenarios, plus RSS rules do not prevent deadlock situations among vehicles. In this paper, we propose a new formulation for RSS rules that can be applied to any driving scenario. We integrate the proposed RSS rules with the CAV’s motion planning algorithm to enable cooperative driving of CAVs. We use Control Barrier Functions to enforce safety constraints and compute the energy optimal trajectory for the ego CAV. Finally, to ensure liveness, our approach detects and resolves deadlocks in a decentralized manner. We have conducted different experiments to verify that the ego CAV does not cause an accident no matter when other CAVs slow down or stop. We also showcase our deadlock detection and resolution mechanism using our simulator. Finally, we compare the average velocity and fuel consumption of vehicles when they drive autonomously with the case that they are autonomous and connected. 
    more » « less