This content will become publicly available on December 17, 2026

Title: Covariate Software Defect Discovery Models to Explicitly Characterize Changepoints
Large-scale software exhibits periods of increased defect discovery when blocks of less thoroughly tested code are introduced into an existing codebase. For example, the mission systems schedule of software-intensive government acquisition programs includes multiple overlapping software blocks associated with various capabilities. Software reliability researchers have proposed changepoint models to characterize periods of increased defect discovery. However, these models attempt to identify the location of these changepoints after testing has been performed, which is counter-intuitive because conscious decisions such as testing new functionality drive software changepoints. Existing changepoint models are therefore difficult to employ in a predictive manner. To overcome this limitation, this paper proposes a covariate software defect discovery model capable of explaining changepoints in terms of common software testing activities and metrics such as software size estimation, code coverage, and defect density. The proposed and past changepoint models are compared with respect to their predictive accuracy and computational efficiency. Our results indicate that the proposed approach is more computationally efficient and enables accurate prediction of the time needed to achieve a desired defect discovery intensity or mean time to failure despite the occurrence of changepoints during software testing.
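As a rough illustration of the kind of prediction the abstract describes, the sketch below computes the testing time needed to reach a target defect discovery intensity under a plain Goel-Okumoto NHPP model with intensity λ(t) = a·b·exp(−b·t). This is not the covariate changepoint model proposed in the paper; the simpler exponential model is swapped in for illustration, and the function names and the parameter values (a, b, and the target) are hypothetical.

```python
import math


def go_intensity(t, a, b):
    """Goel-Okumoto defect discovery intensity: lambda(t) = a * b * exp(-b * t).

    a: expected total number of defects (assumed value, not from the paper)
    b: per-defect detection rate (assumed value, not from the paper)
    """
    return a * b * math.exp(-b * t)


def time_to_intensity(target, a, b):
    """Testing time at which the intensity falls to `target`.

    Solves a * b * exp(-b * t) = target for t. Returns 0.0 when the
    initial intensity a * b is already at or below the target.
    """
    if target <= 0 or a <= 0 or b <= 0:
        raise ValueError("target, a, and b must all be positive")
    return max(0.0, math.log(a * b / target) / b)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise the functions.
    a, b = 100.0, 0.05
    t_star = time_to_intensity(0.5, a, b)
    print(f"time to reach intensity 0.5: {t_star:.2f}")
    print(f"intensity at that time: {go_intensity(t_star, a, b):.4f}")
```

A target mean time to failure can be handled the same way by passing its reciprocal as the target intensity, which treats 1/λ(t) as an instantaneous mean time to failure; that is an approximation rather than a result taken from the paper.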
Award ID(s):
1749635
PAR ID:
10657269
Publisher / Repository:
World Scientific Publishing
Journal Name:
International Journal of Reliability, Quality and Safety Engineering
ISSN:
0218-5393
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary: Many non‐homogeneous Poisson process software reliability growth models (SRGM) are characterized by a single continuous curve. However, failures are driven by factors such as the testing strategy and environment, integration testing, and resource allocation, which can introduce one or more changepoints into the fault detection process. Some researchers have proposed non‐homogeneous Poisson process SRGM with changepoints, but these consider only a common failure distribution before and after the changepoints. This paper proposes a heterogeneous single changepoint framework for SRGM, which can exhibit different failure distributions before and after the changepoint. Combinations of two simple and distinct curves, an exponential and an S‐shaped curve, are employed to illustrate the concept. Ten data sets are used to compare these heterogeneous models against their homogeneous counterparts. Experimental results indicate that heterogeneous changepoint models achieve better goodness‐of‐fit measures on 60% and 80% of the data sets with respect to the Akaike information criterion and predictive sum of squares measures.
  2. Abstract: Traditional software reliability growth models (SRGM) characterize defect discovery with the Non‐Homogeneous Poisson Process (NHPP) as a function of testing time or effort. More recently, covariate NHPP SRGM have substantially improved tracking and prediction of the defect discovery process by explicitly incorporating discrete multivariate time series on the amount of each underlying testing activity performed in successive intervals. Both classes of NHPP models, with and without covariates, are parametric in nature and impose assumptions on the defect discovery process. While neural networks have been applied to SRGM without covariates, no such studies have been conducted in the context of covariate SRGM. Therefore, this paper assesses the effectiveness of neural networks in predicting the software defect discovery process, incorporating covariates. Three types of neural networks are considered, including (i) recurrent neural networks (RNNs), (ii) long short‐term memory (LSTM), and (iii) gated recurrent unit (GRU), which are then compared with covariate models to validate tracking and predictive accuracy. Our results suggest that GRU achieved better overall goodness‐of‐fit, such as approximately 3.22 and 1.10 times smaller predictive mean square error, and 5.33 and 1.22 times smaller predictive ratio risk in DS1G and DS2G data sets, respectively, compared to covariate models when of the data is used for training. Moreover, to provide an objective comparison, three different proportions for training data splits were employed to illustrate the advancements between the top‐performing covariate NHPP model and the neural network, in which GRU performed better in most of the scenarios. Thus, the neural network model with gated recurrent units may be a suitable alternative to track and predict the number of defects based on covariates associated with the software testing process.
  3. Abstract: We propose the multiple changepoint isolation (MCI) method for detecting multiple changes in the mean and covariance of a functional process. We first introduce a pair of projections to represent the variability “between” and “within” the functional observations. We then present an augmented fused lasso procedure to split the projections into multiple regions robustly. These regions act to isolate each changepoint away from the others so that the powerful univariate CUSUM statistic can be applied region‐wise to identify the changepoints. Simulations show that our method accurately detects the number and locations of changepoints under many different scenarios. These include light and heavy tailed data, data with symmetric and skewed distributions, sparsely and densely sampled changepoints, and mean and covariance changes. We show that our method outperforms a recent multiple functional changepoint detector and several univariate changepoint detectors applied to our proposed projections. We also show that MCI is more robust than existing approaches and scales linearly with sample size. Finally, we demonstrate our method on a large time series of water vapor mixing ratio profiles from atmospheric emitted radiance interferometer measurements.
  4. Abstract: Climate changepoint (homogenization) methods abound today, with a myriad of techniques existing in both the climate and statistics literature. Unfortunately, the appropriate changepoint technique to use remains unclear to many. Further complicating issues, changepoint conclusions are not robust to perturbations in assumptions; for example, allowing for a trend or correlation in the series can drastically change changepoint conclusions. This paper is a review of the topic, with an emphasis on illuminating the models and techniques that allow the scientist to make reliable conclusions. Pitfalls to avoid are demonstrated via actual applications. The discourse begins by narrating the salient statistical features of most climate time series. Thereafter, single- and multiple-changepoint problems are considered. Several pitfalls are discussed en route and good practices are recommended. While most of our applications involve temperatures, a sea ice series is also considered. Significance Statement: This paper reviews the methods used to identify and analyze the changepoints in climate data, with a focus on helping scientists make reliable conclusions. The paper discusses common mistakes and pitfalls to avoid in changepoint analysis and provides recommendations for best practices. The paper also provides examples of how these methods have been applied to temperature and sea ice data. The main goal of the paper is to provide guidance on how to effectively identify the changepoints in climate time series and homogenize the series.
  5. Traditional software reliability growth models only consider defect discovery data, yet the practical concern of software engineers is the removal of these defects. Most attempts to model the relationship between defect discovery and resolution have been restricted to differential equation-based models associated with these two activities. However, defect tracking databases offer a practical source of information on the defect lifecycle suitable for more complete reliability and performance models. This paper explicitly connects software reliability growth models to software defect tracking. Data from a NASA project has been employed to develop differential equation-based models of defect discovery and resolution as well as distributional and Markovian models of defect resolution. The states of the Markov model represent thirteen unique stages of the NASA software defect lifecycle. Both state transition probabilities and transition time distributions are computed from the defect database. Illustrations compare the predictive and computational performance of alternative approaches. The results suggest that the simple distributional approach achieves the best tradeoff between these two performance measures, but that enhanced data collection practices could improve the utility of the more advanced approaches and the inferences they enable.