This paper considers the callback reachability problem: determining whether a callback can be called by an event-driven framework in an unexpected state. Event-driven programming frameworks are pervasive for creating user-interactive applications (apps) on just about every modern platform. Control flow between callbacks is determined by the framework and is largely opaque to the programmer. This opacity of the callback control flow not only makes programming difficult but also complicates the development of static analyses. Previous static analysis techniques address this opacity either by assuming an arbitrary framework implementation or by attempting to eagerly specify all possible callback control flow; the former is too coarse to prove properties that require callback-ordering constraints, while the latter is too burdensome and tricky to get right. Instead, we present a middle way in which the callback control flow can be gradually refined in a targeted manner to prove assertions of interest. The key insight behind this middle way is to reason about the history of method invocations at the boundary between app and framework code, which decouples the specification of callback control flow from the analysis of app code. We call the sequences of such boundary-method invocations message histories and develop message-history logics to do this reasoning. In particular, we define the notion of an application-only transition system with boundary transitions, a message-history program logic for programs with such transitions, and a temporal specification logic for capturing callback control flow in a targeted and compositional manner. Then, to use these logics in a goal-directed verifier, we define a way to combine, after the fact, an assertion about message histories with a specification of callback control flow.
We implemented a prototype message-history-based verifier called Historia and provide evidence that our approach is uniquely capable of distinguishing between buggy and fixed versions on challenging examples drawn from real-world issues, and that our targeted specification approach enables proving the absence of multi-callback bug patterns in real-world open-source Android apps.
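To make the contrast between a coarse "arbitrary framework" model and a targeted callback-ordering specification concrete, here is a minimal sketch. It is a brute-force bounded enumeration, not Historia's goal-directed verifier or its specification logic. The toy lifecycle spec (`SPEC`), the resource assertion in `violates`, and every helper name are hypothetical; each spec rule has the "no disabler since the last enabler" shape of a simple temporal constraint over the message history.

```python
# Hypothetical callback-ordering spec for a toy screen component (NOT Historia's
# specification language): a callback may fire next only if one of its enabling
# messages occurred and none of its disabling messages has occurred since.
SPEC = {
    "onResume": ({"onCreate", "onPause"}, {"onResume", "onDestroy"}),
    "onClick": ({"onResume"}, {"onPause", "onDestroy"}),
    "onPause": ({"onResume"}, {"onPause"}),
    "onDestroy": ({"onCreate", "onPause"}, {"onResume", "onDestroy"}),
}
CALLBACKS = list(SPEC)


def enabled(history, callback):
    """'No disabler since the most recent enabler' over the message history."""
    enablers, disablers = SPEC[callback]
    for msg in reversed(history):
        if msg in disablers:
            return False
        if msg in enablers:
            return True
    return False


def any_order(history, callback):
    """Coarse model: the framework may invoke any callback at any time."""
    return True


def histories(max_len, allowed):
    """Depth-first enumeration of message histories starting with onCreate."""
    stack = [["onCreate"]]
    while stack:
        history = stack.pop()
        yield history
        if len(history) < max_len:
            for cb in CALLBACKS:
                if allowed(history, cb):
                    stack.append(history + [cb])


def violates(history):
    """Hypothetical app assertion: onClick uses a resource that onResume
    acquires and onPause releases, so onClick must not run while released."""
    acquired = False
    for msg in history:
        if msg == "onResume":
            acquired = True
        elif msg == "onPause":
            acquired = False
        elif msg == "onClick" and not acquired:
            return True
    return False


def find_counterexample(max_len, allowed):
    """Return the first assertion-violating history the model allows, if any."""
    return next((h for h in histories(max_len, allowed) if violates(h)), None)


print(find_counterexample(6, any_order))  # a spurious alarm ending in onClick
print(find_counterexample(6, enabled))    # None: the targeted spec rules it out
```

Under the coarse model the search reports a history in which onClick fires after the resource is gone; adding only the ordering rules relevant to this assertion is enough to rule that history out, which mirrors (in a much simplified form) the targeted refinement described above.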
Detecting Callback Related Deep Vulnerabilities in Linux Device Drivers
Extensibility is an important design goal for software frameworks that are expected to evolve in a variety of dimensions. Large frameworks use callback mechanisms extensively to achieve extensibility. However, callbacks introduce implicit control-flow dependencies that make program comprehension and analysis difficult. This paper presents an automated approach for detecting deep bugs and vulnerabilities that involve callbacks. Our approach consists of several stages that balance scalability and precision. The first stage uses a lightweight static analysis to extract callback-related interactions between the application modules and the framework modules. This information is used to extend the basic call graph of the application modules with the implicit call chains introduced by callbacks. The second stage, summary mode, summarizes bug-relevant data-flow facts for paths that start at callbacks. The third stage, summary-aware mode, uses the extended call graph to incorporate data-flow facts from implicit paths that lead to the callbacks, and detects deep bugs. We have implemented the presented model extraction and bug detection approach in a framework called MOXCAFE and applied it to Linux device drivers. Using our approach, we detected several deep vulnerabilities.
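The first stage, extending an application call graph with implicit edges for callbacks, can be sketched as follows. This is an illustrative assumption, not MOXCAFE's implementation: the driver function names are hypothetical, the registrations are given by hand rather than extracted (in a real driver a registration might be, for example, passing a handler to `request_irq`), and placing the implicit edge from the registering function to the callback is one simple modeling choice among several.

```python
from collections import deque


def extend_call_graph(base, registrations):
    """Add an implicit edge registrar -> callback for each callback that a
    function hands to the framework, without mutating the base graph."""
    extended = {fn: list(callees) for fn, callees in base.items()}
    for registrar, callback in registrations:
        extended.setdefault(registrar, []).append(callback)
        extended.setdefault(callback, [])
    return extended


def call_chain(graph, src, dst):
    """Breadth-first search for a call chain from src to dst (None if absent)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        fn = queue.popleft()
        if fn == dst:
            chain = []
            while fn is not None:
                chain.append(fn)
                fn = parent[fn]
            return chain[::-1]
        for callee in graph.get(fn, []):
            if callee not in parent:
                parent[callee] = fn
                queue.append(callee)
    return None


# Hypothetical driver: direct calls only, as a plain call-graph builder sees them.
base = {
    "drv_probe": ["drv_setup_irq", "drv_alloc_buf"],
    "drv_setup_irq": [],  # hands drv_irq_handler to the framework
    "drv_irq_handler": ["drv_use_buf"],
    "drv_remove": ["drv_free_buf"],
}
# Callback registrations that the extraction stage would recover (hypothetical).
registrations = [("drv_setup_irq", "drv_irq_handler")]

extended = extend_call_graph(base, registrations)
print(call_chain(base, "drv_probe", "drv_use_buf"))      # None
print(call_chain(extended, "drv_probe", "drv_use_buf"))
# ['drv_probe', 'drv_setup_irq', 'drv_irq_handler', 'drv_use_buf']
```

The chain from `drv_probe` to `drv_use_buf` exists only through the implicit edge, which is the kind of path the later summary and summary-aware stages would then attach data-flow facts to.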
- Award ID(s): 1815883
- PAR ID: 10178645
- Date Published:
- Journal Name: IEEE Cybersecurity Development Conference (SecDev)
- Page Range / eLocation ID: 62 to 75
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of such bugs? What impacts do such bugs have? Which stages of the deep learning pipeline are more bug-prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software has the potential to foster the development of better deep learning platforms, debugging mechanisms, and development practices, and to encourage the development of analysis and verification frameworks. Therefore, we study 2716 high-quality posts from Stack Overflow and 500 bug-fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the types of bugs, their root causes and impacts, the bug-prone stages of the deep learning pipeline, and whether common antipatterns occur in this buggy software. The key findings of our study include the following: data bugs and logic bugs are the most severe bug types in deep learning software, appearing more than 48% of the time, and the major root causes of these bugs are Incorrect Model Parameter (IPS) and Structural Inefficiency (SI), showing up more than 43% of the time. We have also found that bugs in the usage of deep learning libraries share some common antipatterns.
The increasingly popular Robot Operating System (ROS) framework allows building robotic systems by integrating newly developed and/or reused modules, where the modules can use different versions of the framework (e.g., ROS1 or ROS2) and different programming languages (e.g., C++ or Python). Most of the work in such robotic systems happens in callbacks. The framework provides various elements for initializing callbacks and for setting up their execution. Developers are responsible for composing callbacks and their execution setup elements, which can lead to inconsistencies in the callback execution setup when developers have incomplete knowledge of the semantics of these elements across framework versions. Some of these inconsistencies do not throw errors at runtime, which makes them difficult for developers to detect. We propose a static approach to detecting such inconsistencies that extracts a static view of the composition of a robotic system's callbacks and their execution setup, and then checks it against composition conventions derived from the elements' semantics. We evaluate our ROSCallBaX prototype on a dataset created from developer-forum posts and publicly available ROS projects. The evaluation results show that our approach can detect real inconsistencies.
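The "extract a static view, then check it against conventions" idea can be sketched as a small rule checker over already-extracted setup records. This is not ROSCallBaX: the `CallbackSetup` record, its field values, the node and callback names, and both rules are hypothetical illustrations. The second rule reflects ROS 2 behavior in which a single-threaded executor runs callbacks one at a time, so a reentrant callback group compiles and runs without error but never gets the concurrency it suggests.

```python
from dataclasses import dataclass


@dataclass
class CallbackSetup:
    """One callback and its execution setup, as an extraction step might record it."""
    node: str
    callback: str
    ros_version: str      # "ros1" or "ros2"
    callback_group: str   # ROS 2 only: "mutually_exclusive" or "reentrant"
    executor: str         # "single_threaded", "multi_threaded", or "none"


def check(setups):
    """Check extracted setups against two illustrative composition conventions."""
    issues = []
    for s in setups:
        where = f"{s.node}.{s.callback}"
        if s.executor == "none":
            # Nothing ever spins the node, so the callback silently never runs.
            issues.append(f"{where}: node is never spun; callback will not execute")
        elif (s.ros_version == "ros2" and s.callback_group == "reentrant"
              and s.executor == "single_threaded"):
            # Legal code, but a single-threaded executor runs callbacks one at a
            # time, so the reentrant group's intended concurrency never happens.
            issues.append(f"{where}: reentrant group has no effect under a single-threaded executor")
    return issues


setups = [
    CallbackSetup("camera", "on_image", "ros2", "reentrant", "single_threaded"),
    CallbackSetup("planner", "on_goal", "ros2", "mutually_exclusive", "multi_threaded"),
    CallbackSetup("logger", "on_msg", "ros1", "", "none"),
]
for issue in check(setups):
    print(issue)
```

Keeping extraction and checking separate means new conventions can be added as further rules in `check` without touching how the static view is built.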
Significant interest in applying Deep Neural Networks (DNNs) has fueled the need to support the engineering of software that uses DNNs. Repairing such software is one clear software engineering (SE) need where automated tools could be very helpful; however, we do not fully understand the challenges of repairing it or the patterns developers use when repairing it manually. What challenges should automated repair tools address? What are the repair patterns whose automation could help developers? Which repair patterns should be assigned a higher priority for automation? This work presents a comprehensive study of bug fix patterns to address these questions. We have studied 415 repairs from Stack Overflow and 555 repairs from GitHub for five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the challenges in repairs and bug repair patterns. Our key findings reveal that DNN bug fix patterns are distinctive compared to traditional bug fix patterns; the most common bug fix patterns are fixing data dimensions and neural network connectivity; DNN bug fixes have the potential to introduce adversarial vulnerabilities; DNN bug fixes frequently introduce new bugs; and DNN bug localization, reuse of trained models, and coping with frequent releases are major challenges that developers face when fixing bugs. We also contribute a benchmark of 667 DNN (bug, repair) instances.
Indirect function calls are widely used in system software such as OS kernels for their flexibility and performance. Statically resolving indirect-call targets is a fundamental requirement for various program analysis and protection tasks, but it is known to be a hard problem. State-of-the-art techniques, which use type analysis, are still imprecise. In this paper, we present a new approach, TFA, that precisely identifies indirect-call targets. The intuition behind TFA is that type-based analysis and data-flow analysis are inherently complementary in resolving indirect-call targets. TFA incorporates a co-analysis system that makes the best use of both type information and data-flow information. The co-analysis iteratively refines the global call graph, allowing us to achieve an optimal indirect-call analysis. We have implemented TFA in LLVM and evaluated it on five well-known large-scale programs. The experimental results show that TFA eliminates an additional 24% to 59% of indirect-call targets compared with state-of-the-art approaches, without introducing new false negatives. Using this precise indirect-call analysis, we further developed a strengthened fine-grained forward-edge control-flow integrity scheme and applied it to the Linux kernel. We have also used the refined indirect-call analysis results in bug detection, finding 8 deep bugs in the Linux kernel. As a generic technique, TFA's precise indirect-call analysis can also benefit other applications such as compiler optimization and software debloating.
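Why type information and data-flow information complement each other can be shown with a deliberately simplified sketch; it is not TFA's co-analysis, which works on LLVM IR and iterates over the whole call graph. Type matching alone keeps every function with a compatible signature, data flow narrows that to functions whose address actually reaches the called pointer, and when a pointer's value may come from code the data-flow pass cannot see (the hypothetical `escaped` set) the sketch falls back to the sound type-based answer. All function, field, and signature names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    signature: tuple  # (return type, parameter types...)


def type_targets(call_sig, functions):
    """Type-based resolution: every function whose signature matches the call."""
    return {f.name for f in functions if f.signature == call_sig}


def flow_targets(field, stores):
    """Data-flow resolution: functions whose address is stored into `field`."""
    return {fn for fld, fn in stores if fld == field}


def resolve(call_sig, field, functions, stores, escaped):
    """Combine both analyses. If the field's value may also come from code the
    data-flow analysis cannot see (`escaped`), fall back to the type result."""
    by_type = type_targets(call_sig, functions)
    if field in escaped:
        return by_type
    return by_type & flow_targets(field, stores)


READ = ("ssize_t", "struct file *", "char *", "size_t")
functions = [
    Function("foo_read", READ),
    Function("bar_read", READ),
    Function("baz_debug_read", READ),
    Function("foo_open", ("int", "struct inode *", "struct file *")),
]
# Address-taken stores into function-pointer fields (hypothetical).
stores = [("foo_ops.read", "foo_read"), ("bar_ops.read", "bar_read")]

print(sorted(type_targets(READ, functions)))                          # 3 targets
print(sorted(resolve(READ, "foo_ops.read", functions, stores, set())))  # ['foo_read']
```

The fallback branch is what keeps the combination from introducing false negatives in this toy model: data flow only ever removes targets when it can see every store into the pointer.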

