Significant interest in applying Deep Neural Networks (DNNs) has fueled the need to support the engineering of software that uses DNNs. Repairing software that uses DNNs is one such unmistakable SE need where automated tools could be very helpful; however, we do not fully understand the challenges of repairing such software or the patterns that developers use when repairing it manually. What challenges should automated repair tools address? What are the repair patterns whose automation could help developers? Which repair patterns should be assigned a higher priority for automation? This work presents a comprehensive study of bug fix patterns to address these questions. We have studied 415 repairs from Stack Overflow and 555 repairs from GitHub for five popular deep learning libraries (Caffe, Keras, Tensorflow, Theano, and Torch) to understand challenges in repairs and bug repair patterns. Our key findings reveal that DNN bug fix patterns are distinctive compared to traditional bug fix patterns; the most common bug fix patterns are fixing data dimension and neural network connectivity; DNN bug fixes have the potential to introduce adversarial vulnerabilities; DNN bug fixes frequently introduce new bugs; and DNN bug localization, reuse of trained models, and coping with frequent releases are major challenges faced by developers when fixing bugs. We also contribute a benchmark of 667 DNN (bug, repair) instances.
Design by Contract for Deep Learning APIs
Deep Learning (DL) techniques are increasingly being incorporated in critical software systems today. DL software is buggy too. Recent work in SE has characterized these bugs, studied fix patterns, and proposed detection and localization strategies. In this work, we introduce a preventative measure. We propose design by contract for DL libraries, DL Contract for short, to document the properties of DL libraries and provide developers with a mechanism to identify bugs during development. While DL Contract builds on traditional design by contract techniques, it must address unique challenges. In particular, we need to document properties of the training process that are not visible at the functional interface of the DL libraries. To solve these problems, we have introduced mechanisms that allow developers to specify properties of the model architecture, data, and training process. We have designed and implemented DL Contract for Python-based DL libraries and used it to document the properties of Keras, a well-known DL library. We evaluate DL Contract in terms of effectiveness, runtime overhead, and usability. To evaluate the utility of DL Contract, we have developed 15 sample contracts specifically for training problems and structural bugs, and we have adopted four well-vetted benchmarks from prior work on DL bug detection and repair. For effectiveness, DL Contract correctly detects 259 bugs in 272 real-world buggy programs from these benchmarks. We found that the DL Contract overhead is fairly minimal for the benchmarks used. Lastly, to evaluate usability, we conducted a survey of twenty participants who have used DL Contract to find and fix bugs. The results reveal that DL Contract can be very helpful to DL application developers when debugging their code.
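To make the design-by-contract idea in the abstract concrete, the following is a minimal sketch of a precondition check on a training call. It is not DL Contract's API: the `requires` decorator, the `ContractViolation` exception, the two predicates, and the stand-in `fit` function are all hypothetical names introduced here for illustration, and the sketch covers only preconditions on data, not the training-process properties the abstract mentions.

```python
import functools


class ContractViolation(Exception):
    """Raised when a precondition on a call is not satisfied."""


def requires(predicate, message):
    """Attach a precondition to a function.

    Hypothetical helper for illustration; not part of DL Contract or Keras.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Check the contract before the wrapped call runs.
            if not predicate(*args, **kwargs):
                raise ContractViolation(f"{func.__name__}: {message}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def is_normalized(features, labels):
    """Data property: every feature value lies in [0, 1]."""
    return all(0.0 <= value <= 1.0 for row in features for value in row)


def same_length(features, labels):
    """Data property: one label per feature row."""
    return len(features) == len(labels)


@requires(same_length, "features and labels must have the same number of rows")
@requires(is_normalized, "feature values should be scaled to [0, 1] before training")
def fit(features, labels):
    """Stand-in for a library training entry point (assumption); returns the sample count."""
    return len(features)
```

Calling `fit([[0.0, 255.0]], [1])` raises `ContractViolation` before any training work starts, so a developer sees the unnormalized input at the call site rather than as poor accuracy later.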
- PAR ID:
- 10540742
- Publisher / Repository:
- Association for Computing Machinery
- Date Published:
- ISBN:
- 9798400703270
- Subject(s) / Keyword(s):
- API contracts, Deep learning, specification language
- Format(s):
- Medium: X Size: 1.4MB Other: .pdf
- Size(s):
- 1.4MB
- Location:
- ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of such bugs? What impacts do such bugs have? Which stages of the deep learning pipeline are more bug prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software has the potential to foster the development of better deep learning platforms, debugging mechanisms, and development practices, and to encourage the development of analysis and verification frameworks. Therefore, we study 2716 high-quality posts from Stack Overflow and 500 bug fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, Tensorflow, Theano, and Torch) to understand the types of bugs, root causes of bugs, impacts of bugs, and bug-prone stages of the deep learning pipeline, as well as whether there are common antipatterns in this buggy software. The key findings of our study include: data bugs and logic bugs are the most severe bug types in deep learning software, appearing more than 48% of the time; the major root causes of these bugs are Incorrect Model Parameter (IPS) and Structural Inefficiency (SI), showing up more than 43% of the time. We have also found that bugs in the usage of deep learning libraries have some common antipatterns.
-
Gonnord, Laure; Titolo, Laura (Ed.) Just-in-Time (JIT) compilers are widely used to improve the performance of interpreter-based language implementations by creating optimized code at runtime. However, bugs in the JIT compiler’s code manipulation and optimization can result in the generation of incorrect code. Such bugs can be difficult to diagnose and fix, and can result in exploitable vulnerabilities. Unfortunately, existing approaches to automatic bug localization do not carry over well to such bugs. This paper discusses a different approach to analyzing JIT compiler optimization behaviors, based on using dynamic analysis to construct abstract models of the JIT compiler’s optimizer and back end. By comparing the models obtained for buggy and non-buggy executions of the JIT compiler, we can pinpoint the components of the JIT compiler’s internal representation that have been affected by the bug; this can then be mapped back to identify the buggy code. Our experiments with two real bugs in the Google V8 JIT compiler, TurboFan, show the utility and practicality of our approach.
-
Enterprise software updates depend on the interaction between user and developer organizations. This interaction becomes especially complex when a single developer organization writes software that services hundreds of different user organizations. Miscommunication during patching and deployment efforts leads to insecure or malfunctioning software installations. While developers oversee the code, the update process starts and ends outside their control. Since developer test suites may fail to capture buggy behavior, finding and fixing these bugs starts with user-generated bug reports and third-party disclosures. The process ends when the fixed code is deployed in production. Any friction between user and developer results in a delay in patching critical bugs. Two common causes of friction are a failure to replicate the user-specific circumstances that cause buggy behavior and incompatible software releases that break critical functionality. Existing test generation techniques are insufficient. They fail to test candidate patches for post-deployment bugs and to test whether the new release adversely affects customer workloads. With existing test generation and deployment techniques, users cannot choose (or validate) compatible portions of new versions and retain their previous version's functionality. We present two new technologies to alleviate this friction. First, Test Generation for Ad Hoc Circumstances transforms buggy executions into test cases. Second, Binary Patch Decomposition allows users to select the compatible pieces of update releases. By sharing specific context around buggy behavior, developers can create specific test cases that demonstrate whether their fixes are appropriate. When fixes are distributed with extra context, users can incorporate only updates that guarantee compatibility between buggy and fixed versions.
We use change analysis in combination with binary rewriting to transform the old executable and buggy execution into a test case that includes the developer's prospective changes, which lets us generate and run targeted tests for the candidate patch. We also provide analogous support to users, so they can selectively validate and patch their production environments with only the desired bug fixes from new version releases. This paper presents a new patching workflow that allows developers to validate prospective patches and users to select which updates they would like to apply, along with two new technologies that make it possible. We demonstrate that our technique constructs test cases more effectively and more efficiently than traditional test generation techniques on a collection of real-world bugs, and provides the ability for flexible updates in real-world scenarios.
-
Static analysis tools have demonstrated effectiveness at finding bugs in real-world code. Such tools are increasingly widely adopted to improve software quality in practice. Automated Program Repair (APR) has the potential to further cut down on the cost of improving software quality. However, there is a disconnect between these effective bug-finding tools and APR. Recent advances in APR rely on test cases, making them inapplicable to newly discovered bugs or bugs that are difficult to test for deterministically (like memory leaks). Additionally, the quality of patches generated to satisfy a test suite is a key challenge. We address these challenges by adapting advances in practical static analysis and verification techniques to enable a new technique that finds and then accurately fixes real bugs without test cases. We present a new automated program repair technique using Separation Logic. At a high level, our technique reasons over semantic effects of existing program fragments to fix faults related to general pointer safety properties: resource leaks, memory leaks, and null dereferences. The procedure automatically translates identified fragments into source-level patches, and verifies patch correctness with respect to reported faults. In this work we conduct the largest study of automatically fixing undiscovered bugs in real-world code to date. We demonstrate our approach by correctly fixing 55 bugs, including 11 previously undiscovered bugs, in 11 real-world projects.