Defect prediction aims to automatically identify potential defective code with minimal human intervention and has been widely studied in the literature. Just-in-Time (JIT) defect prediction focuses on program changes rather than whole programs, and has been widely adopted in continuous testing. CC2Vec, state-of-the-art JIT defect prediction tool, first constructs a hierarchical attention network (HAN) to learn distributed vector representations of both code additions and deletions, and then concatenates them with two other embedding vectors representing commit messages and overall code changes extracted by the existing DeepJIT approach to train a model for predicting whether a given commit is defective. Although CC2Vec has been shown to be the state of the art for JIT defect prediction, it was only evaluated on a limited dataset and not compared with all representative baselines. Therefore, to further investigate the efficacy and limitations of CC2Vec, this paper performs an extensive study of CC2Vec on a large-scale dataset with over 310,370 changes (8.3 X larger than the original CC2Vec dataset). More specifically, we also empirically compare CC2Vec against DeepJIT and representative traditional JIT defect prediction techniques. The experimental results show that CC2Vec cannot consistently outperform DeepJIT, and neither of them can consistently outperform traditional JIT defect prediction. We also investigate the impact of individual traditional defect prediction features and find that the added-line-number feature outperforms other traditional features. Inspired by this finding, we construct a simplistic JIT defect prediction approach which simply adopts the added-line- number feature with the logistic regression classifier. Surprisingly, such a simplistic approach can outperform CC2Vec and DeepJIT in defect prediction, and can be 81k X/120k X faster in training/testing. Furthermore, the paper also provides various practical guidelines for advancing JIT defect prediction in the near future.
more »
« less
ConPredictor: Concurrency Defect Prediction in Real-World Applications
Concurrent programs are difficult to test due to their inherent non-determinism. To address this problem, testing often requires the exploration of thread schedules of a program; this can be time-consuming when applied to real-world programs. Software defect prediction has been used to help developers find faults and prioritize their testing efforts. Prior studies have used machine learning to build such predicting models based on designed features that encode the characteristics of programs. However, research has focused on sequential programs; to date, no work has considered defect prediction for concurrent programs, with program characteristics distinguished from sequential programs. In this paper, we present ConPredictor, an approach to predict defects specific to concurrent programs by combining both static and dynamic program metrics. Specifically, we propose a set of novel static code metrics based on the unique properties of concurrent programs. We also leverage additional guidance from dynamic metrics constructed based on mutation analysis. Our evaluation on four large open source projects shows that ConPredictor improved both within-project defect prediction and cross-project defect prediction compared to traditional features.
more »
« less
- Award ID(s):
- 1511117
- PAR ID:
- 10056328
- Date Published:
- Journal Name:
- IEEE Transactions on Software Engineering
- ISSN:
- 0098-5589
- Page Range / eLocation ID:
- 1 to 1
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Computational modeling of the human sequential design process and successful prediction of future design decisions are fundamental to design knowledge extraction, transfer, and the development of artificial design agents. However, it is often difficult to obtain designer-related attributes (static data) in design practices, and the research based on combining static and dynamic data (design action sequences) in engineering design is still underexplored. This paper presents an approach that combines both static and dynamic data for human design decision prediction using two different methods. The first method directly combines the sequential design actions with static data in a recurrent neural network (RNN) model, while the second method integrates a feed-forward neural network that handles static data separately, yet in parallel with RNN. This study contributes to the field from three aspects: (a) we developed a method of utilizing designers’ cluster information as a surrogate static feature to combine with a design action sequence in order to tackle the challenge of obtaining designer-related attributes; (b) we devised a method that integrates the function–behavior–structure design process model with the one-hot vectorization in RNN to transform design action data to design process stages where the insights into design thinking can be drawn; (c) to the best of our knowledge, it is the first time that two methods of combining static and dynamic data in RNN are compared, which provides new knowledge about the utility of different combination methods in studying sequential design decisions. The approach is demonstrated in two case studies on solar energy system design. The results indicate that with appropriate kernel models, the RNN with both static and dynamic data outperforms traditional models that only rely on design action sequences, thereby better supporting design research where static features, such as human characteristics, often play an important role.more » « less
-
Large-scale software exhibits periods of increased defect discovery when blocks of less thoroughly tested code are introduced into an existing codebase. For example, the mission systems schedule of software intensive government acquisition programs includes multiple overlapping software blocks associated with various capabilities. Software reliability researchers have proposed changepoint models to characterize periods of increased defect discovery. However, these models attempt to identify the location of these changepoints after testing has been performed, which is counter-intuitive because conscious decisions such as testing new functionality drive software changepoints. Existing changepoint models are therefore difficult to employ in a predictive manner. To overcome this limitation, this paper proposes a covariate software defect discovery model capable of explaining changepoints in terms of common software testing activities and metrics such as software size estimation, code coverage, and defect density. The proposed and past changepoint models are compared with respect to their predictive accuracy and computational efficiency. Our results indicate that the proposed approach is more computationally efficient and enables accurate prediction of the time needed to achieve a desired defect discovery intensity or mean time to failure despite the occurrence of changepoints during software testing.more » « less
-
Gradual typing allows programmers to use both static and dynamic typing in a single program. However, a well-known problem with sound gradual typing is that the interactions between static and dynamic code can cause significant performance degradation. These performance pitfalls are hard to predict and resolve, and discourage users from using gradual typing features. For example, when migrating to a more statically typed program, often adding a type annotation will trigger a slowdown that can be resolved by adding more annotations elsewhere, but since it is not clear where the additional annotations must be added, the easier solution is to simply remove the annotation. To address these problems, we develop: (1) a static cost semantics that accurately predicts the overhead of static-dynamic interactions in a gradually typed program, (2) a technique for efficiently inferring such costs for all combinations of inferrable type assignments in a program, and (3) a method for translating the results of this analysis into specific recommendations and explanations that can help programmers understand, debug, and optimize the performance of gradually typed programs. We have implemented our approach in Herder, a tool for statically analyzing the performance of different typing configurations for Reticulated Python programs. An evaluation on 15 Python programs shows that Herder can use this analysis to accurately and efficiently recommend type assignments that optimize the performance of these programs without sacrificing the safety guarantees provided by static typing.more » « less
-
Existing techniques for automating the testing of sequential programming assignments are fundamentally at odds with concurrent programming as they are oblivious to the algorithm used to implement the assignments. We have developed a framework that addresses this limitation for those object-based concurrent assignments whose user-interface (a) is implemented using the observer pattern and (b) makes apparent whether concurrency requirements are met. It has two components. The first component reduces the number of steps a human grader needs to take to interact with and score the user-interfaces of the submitted programs. The second component completely automates assessment by observing the events sent by the student-implemented observable objects. Both components are used to score the final submission and log interaction. The second component is also used to provide feedback during assignment implementation. Our experience shows that the framework is used extensively by students, leads to more partial credit, reduces grading time, and gives statistics about incremental student progressmore » « less
An official website of the United States government

