ABSTRACT Traditional software reliability growth models (SRGMs) characterize defect discovery with the Non-Homogeneous Poisson Process (NHPP) as a function of testing time or effort. More recently, covariate NHPP SRGMs have substantially improved tracking and prediction of the defect discovery process by explicitly incorporating discrete multivariate time series on the amount of each underlying testing activity performed in successive intervals. NHPP models both with and without covariates are parametric in nature and therefore impose assumptions on the defect discovery process. While neural networks have been applied to SRGMs without covariates, no such studies have been conducted in the context of covariate SRGMs. Therefore, this paper assesses the effectiveness of neural networks in predicting the software defect discovery process when covariates are incorporated. Three types of neural networks are considered: (i) recurrent neural networks (RNNs), (ii) long short-term memory (LSTM), and (iii) gated recurrent units (GRU). These are compared with covariate models to validate tracking and predictive accuracy. Our results suggest that GRU achieved better overall goodness-of-fit, with approximately 3.22 and 1.10 times smaller predictive mean square error and 5.33 and 1.22 times smaller predictive ratio risk on the DS1G and DS2G data sets, respectively, compared to covariate models when of the data is used for training. Moreover, to provide an objective comparison, three different training data proportions were employed to compare the top-performing covariate NHPP model with the neural network, and GRU performed better in most scenarios. Thus, the neural network model with gated recurrent units may be a suitable alternative to track and predict the number of defects based on covariates associated with the software testing process.
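The abstract above reports predictive mean square error (PMSE) and predictive ratio risk (PRR) without defining them. The sketch below uses definitions common in the SRGM literature, which are an assumption here and may differ in normalization from the ones the authors used. The function names and the sample numbers are hypothetical.

```python
def predictive_mse(observed, predicted):
    """Mean squared difference between predicted and observed cumulative
    defect counts over the held-out (prediction) intervals.
    Assumption: normalized by the number of held-out intervals."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def predictive_ratio_risk(observed, predicted):
    """Sum of squared errors, each scaled by the model prediction.
    Assumption: every prediction is non-zero."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(((p - o) / p) ** 2 for o, p in zip(observed, predicted))


# Hypothetical held-out data: observed vs. predicted cumulative defects.
observed = [10, 12]
predicted = [8, 16]
print(predictive_mse(observed, predicted))
print(predictive_ratio_risk(observed, predicted))
```

Under these definitions, smaller values of both measures indicate better predictions, which matches how the abstract reports its comparison.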
Connecting Software Reliability Growth Models to Software Defect Tracking
Traditional software reliability growth models only consider defect discovery data, yet the practical concern of software engineers is the removal of these defects. Most attempts to model the relationship between defect discovery and resolution have been restricted to differential equation-based models associated with these two activities. However, defect tracking databases offer a practical source of information on the defect lifecycle suitable for more complete reliability and performance models. This paper explicitly connects software reliability growth models to software defect tracking. Data from a NASA project has been employed to develop differential equation-based models of defect discovery and resolution as well as distributional and Markovian models of defect resolution. The states of the Markov model represent thirteen unique stages of the NASA software defect lifecycle. Both state transition probabilities and transition time distributions are computed from the defect database. Illustrations compare the predictive and computational performance of alternative approaches. The results suggest that the simple distributional approach achieves the best tradeoff between these two performance measures, but that enhanced data collection practices could improve the utility of the more advanced approaches and the inferences they enable.
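As a rough illustration of how state transition probabilities might be computed from a defect database, the sketch below counts observed transitions between lifecycle states and normalizes each row. This is a generic maximum-likelihood estimate, not necessarily the authors' procedure. The state names and histories are hypothetical and do not reflect the thirteen NASA lifecycle stages.

```python
from collections import Counter, defaultdict


def estimate_transition_probabilities(histories):
    """Estimate P(next state | current state) from per-defect state sequences
    by counting observed transitions and normalizing each row."""
    counts = defaultdict(Counter)
    for states in histories:
        # Pair each state with its successor in the same defect history.
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Hypothetical defect histories; state names are illustrative only.
histories = [
    ["New", "Assigned", "Fixed", "Closed"],
    ["New", "Assigned", "Fixed", "Reopened", "Fixed", "Closed"],
    ["New", "Rejected"],
]
probs = estimate_transition_probabilities(histories)
for state, row in probs.items():
    print(state, row)
```

Transition time distributions could be estimated in a similar way by collecting the time spent in each state before each transition, provided the database records timestamps.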
- Award ID(s): 1749635
- PAR ID: 10221046
- Date Published:
- Journal Name: 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)
- Page Range / eLocation ID: 138 to 147
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Recent research applies soft computing techniques to fit software reliability growth models. However, runtime performance and the distribution of the distance from an optimal solution over multiple runs must be explicitly considered to justify the practical utility of these approaches, promote comparison, and support reproducible research. This paper presents a meta-optimization framework to design stable and efficient multi-phase algorithms for fitting software reliability growth models. The approach combines initial parameter estimation techniques from statistical algorithms, the global search properties of soft computing, and the rapid convergence of numerical methods. Designs that exhibit the best balance between runtime performance and accuracy are identified. The approach is illustrated through nonhomogeneous Poisson process and covariate software reliability growth models, including a cross-validation step on data sets not used to identify designs. The results indicate the nonhomogeneous Poisson process model considered is too simple to benefit from soft computing because it incurs additional runtime with no increase in accuracy attained. However, a multi-phase design for the covariate software reliability growth model consisting of the bat algorithm followed by a numerical method achieves better performance and converges consistently, compared to a numerical method only. The proposed approach supports higher dimensional covariate software reliability growth model fitting suitable for implementation in a tool.
- Researchers have proposed several software reliability growth models, many of which possess complex parametric forms. In practice, software reliability growth models should exhibit a balance between predictive accuracy and other statistical measures of goodness of fit, yet past studies have not always performed such balanced assessment. This paper proposes a framework for software reliability growth models possessing a bathtub-shaped fault detection rate and derives stable and efficient expectation conditional maximization algorithms to enable the fitting of these models. The stages of the bathtub are interpreted in the context of the software testing process. The illustrations compare multiple bathtub-shaped and reduced model forms, including classical models with respect to predictive and information theoretic measures. The results indicate that software reliability growth models possessing a bathtub-shaped fault detection rate outperformed classical models on both types of measures. The proposed framework and models may therefore be a practical compromise between model complexity and predictive accuracy.
- Alber, Mark (Ed.) Biological systems exhibit complex dynamics that differential equations can often adeptly represent. Ordinary differential equation models are widespread; until recently their construction has required extensive prior knowledge of the system. Machine learning methods offer alternative means of model construction: differential equation models can be learnt from data via model discovery using sparse identification of nonlinear dynamics (SINDy). However, SINDy struggles with realistic levels of biological noise and is limited in its ability to incorporate prior knowledge of the system. We propose a data-driven framework for model discovery and model selection using hybrid dynamical systems: partial models containing missing terms. Neural networks are used to approximate the unknown dynamics of a system, enabling the denoising of the data while simultaneously learning the latent dynamics. Simulations from the fitted neural network are then used to infer models using sparse regression. We show, via model selection, that model discovery using hybrid dynamical systems outperforms alternative approaches. We find it possible to infer models correctly up to high levels of biological noise of different types. We demonstrate the potential to learn models from sparse, noisy data in application to a canonical cell state transition using data derived from single-cell transcriptomics. Overall, this approach provides a practical framework for model discovery in biology in cases where data are noisy and sparse, of particular utility when the underlying biological mechanisms are partially but incompletely known.
- With the increased interest to incorporate machine learning into software and systems, methods to characterize the impact of the reliability of machine learning are needed to ensure the reliability of the software and systems in which these algorithms reside. Towards this end, we build upon the architecture-based approach to software reliability modeling, which represents application reliability in terms of the component reliabilities and the probabilistic transitions between the components. Traditional architecture-based software reliability models consider all components to be deterministic software. We therefore extend this modeling approach to the case where some components represent learning enabled components. Here, the reliability of a machine learning component is interpreted as the accuracy of its decisions, which is a common measure of classification algorithms. Moreover, we allow these machine learning components to be fault-tolerant in the sense that multiple diverse classifier algorithms are trained to guide decisions and the majority decision taken. We demonstrate the utility of the approach to assess the impact of machine learning on software reliability as well as illustrate the concept of reliability growth in machine learning. Finally, we validate past analytical results for a fault tolerant system composed of correlated components with real machine learning algorithms and data, demonstrating the analytical expression’s ability to accurately estimate the reliability of the fault tolerant machine learning component and subsequently the architecture-based software within which it resides.
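To make the majority-decision idea concrete, the sketch below computes the probability that a majority of classifiers is correct. It assumes a binary decision, an odd number of classifiers, equal accuracy for each, and statistically independent errors. The independence assumption is a simplification for illustration only: the abstract above concerns correlated components, whose analytical expression is not reproduced here. The function name is hypothetical.

```python
from math import comb


def majority_vote_reliability(n, accuracy):
    """Probability that more than half of n classifiers decide correctly,
    assuming independent errors and equal per-classifier accuracy."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    needed = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(
        comb(n, k) * accuracy**k * (1 - accuracy) ** (n - k)
        for k in range(needed, n + 1)
    )


# Three classifiers, each 90% accurate (illustrative numbers).
print(majority_vote_reliability(3, 0.9))
```

Under independence, voting raises reliability above the individual accuracy whenever that accuracy exceeds 0.5. Positive correlation between classifier errors generally reduces this gain, which is why a correlated model is needed in practice.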

