Title: How does combinatorial testing perform in the real world: an empirical study
Studies have shown that combinatorial testing (CT) can be effective for detecting faults in software systems. By focusing on the interactions between different factors of a system, CT shows its potential for detecting faults, especially those that can be revealed only by specific combinations of values of multiple factors (multi-factor faults). However, is CT practical enough to be applied in industry? Can it be more effective than other industry-favored techniques? Are there any challenges when applying CT in practice? These research questions remain open in the context of industrial settings. In this paper, we present an empirical study of CT on five industrial systems with real faults. The details of the input space model (ISM) construction, such as factor identification and value assignment, are included. We compared the faults detected by CT with those detected by the in-house testing teams using other methods, and the results suggest that despite some challenges, CT is an effective technique for detecting real faults, especially multi-factor faults, in software systems in industrial settings. Observations and lessons learned are provided to further improve fault detection effectiveness and overcome various challenges.
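As a rough illustration of the idea behind CT (not taken from the paper's ISMs), the following Python sketch greedily builds a small 2-way (pairwise) test set for a hypothetical three-factor input space model; the factor names, values, and greedy construction are illustrative assumptions.

```python
# Minimal sketch of 2-way (pairwise) combinatorial test generation.
# The factors and values below are hypothetical, not from the studied systems.
from itertools import combinations, product

factors = {
    "os": ["linux", "windows"],
    "browser": ["chrome", "firefox", "safari"],
    "protocol": ["http", "https"],
}
names = list(factors)

def pairs_of(test):
    """All (factor, value) pairs jointly covered by one complete test."""
    return {((f1, test[f1]), (f2, test[f2])) for f1, f2 in combinations(names, 2)}

# Every 2-way value combination that must appear in at least one test.
required = {
    ((f1, v1), (f2, v2))
    for f1, f2 in combinations(names, 2)
    for v1, v2 in product(factors[f1], factors[f2])
}

# Greedy construction: repeatedly keep the candidate test that covers the most
# still-uncovered pairs. Real CT tools use far more sophisticated algorithms.
tests = []
while required:
    candidates = [dict(zip(names, vals)) for vals in product(*factors.values())]
    best = max(candidates, key=lambda t: len(required & pairs_of(t)))
    required -= pairs_of(best)
    tests.append(best)

for t in tests:
    print(t)
```

For this 2×3×2 model the greedy loop covers every pair of values with roughly half a dozen tests rather than the twelve exhaustive combinations, which is the economy CT exploits; industrial ISMs like those in the study are, of course, much larger.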
Award ID(s):
1822137
NSF-PAR ID:
10194693
Journal Name:
Empirical Software Engineering
Volume:
25
ISSN:
1382-3256
Page Range / eLocation ID:
2661-2693
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Pressure swing adsorption (PSA) is a widely used technology to separate a gas product from impurities in a variety of fields. Due to the complexity of PSA operations, process and instrument faults can occur at different parts and/or steps of the process. Thus, effective process monitoring is critical for ensuring efficient and safe operation of PSA systems. However, multi-bed PSA processes present several major challenges to process monitoring. First, a PSA process is operated in a periodic or cyclic fashion and never reaches a steady state. Second, the duration of different operation cycles is dynamically controlled in response to various disturbances, which results in a wide range of normal operation trajectories. Third, there is limited data for process monitoring, and bed pressure is usually the only measured variable available. These key characteristics of PSA operation make process monitoring, especially early fault detection, significantly more challenging than for a continuous process operated at a steady state. To address these challenges, we propose a feature-based statistical process monitoring (SPM) framework for PSA processes, namely feature space monitoring (FSM). Through feature engineering and feature selection, we show that FSM can naturally handle the key challenges in PSA process monitoring and achieve early detection of subtle faults from a wide range of normal operating conditions. The performance of FSM is compared to conventional SPM methods using both simulated and real faults from an industrial PSA process. The results demonstrate FSM's superior performance in fault detection and fault diagnosis compared to the traditional SPM methods. In particular, the robust monitoring performance of FSM is achieved without any of the data preprocessing, trajectory alignment, or synchronization required by the conventional SPM methods.
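As a loose sketch of the feature-based monitoring idea (not the paper's exact FSM implementation), the code below summarizes each pressure cycle with a handful of assumed scalar features and flags cycles whose Hotelling's T² in that feature space exceeds an empirical limit computed from fault-free cycles; the feature choices, the T² statistic, and the synthetic trajectories are all illustrative assumptions.

```python
# Hedged sketch of feature-space monitoring on per-cycle pressure trajectories.
import numpy as np

def cycle_features(pressure):
    """Summarize one cycle's pressure trajectory with a few scalar features."""
    p = np.asarray(pressure, dtype=float)
    return np.array([p.max(), p.min(), p.mean(), p.std(), np.max(np.diff(p))])

def fit_monitor(normal_cycles):
    """Estimate the feature mean and (pseudo-)inverse covariance from fault-free cycles."""
    X = np.array([cycle_features(c) for c in normal_cycles])
    return X.mean(axis=0), np.linalg.pinv(np.cov(X, rowvar=False))

def t2_statistic(cycle, mu, cov_inv):
    """Hotelling's T^2 distance of a new cycle in the feature space."""
    d = cycle_features(cycle) - mu
    return float(d @ cov_inv @ d)

# Usage on synthetic trajectories: flag a cycle when T^2 exceeds an empirical limit.
rng = np.random.default_rng(0)
train = [np.sin(np.linspace(0, np.pi, 200)) + 0.01 * rng.standard_normal(200)
         for _ in range(50)]
mu, cov_inv = fit_monitor(train)
limit = np.percentile([t2_statistic(c, mu, cov_inv) for c in train], 99)
faulty = np.sin(np.linspace(0, np.pi, 200)) + 0.2   # e.g. a biased pressure sensor
print(t2_statistic(faulty, mu, cov_inv) > limit)
```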
  2. Context: Addressing women's under-representation in the software industry, a widely recognized concern, requires attracting as well as retaining more women. Hearing from women practitioners, particularly those positioned in multi-cultural settings, about their challenges, and adopting solutions drawn from their lived experience, can support the design of programs to resolve the under-representation issue. Goal: We investigated the challenges women face in global software development teams, particularly what motivates women to leave their company; how those challenges might break down according to demographics; and strategies to mitigate the identified challenges. Method: To achieve this goal, we conducted an exploratory case study in Ericsson, a global technology company. We surveyed 94 women and employed mixed methods to analyze the data. Results: Our findings reveal that women face socio-cultural challenges, including work-life balance issues, benevolent and hostile sexism, lack of recognition and peer parity, impostor syndrome, glass ceiling bias effects, the prove-it-again phenomenon, and the maternal wall. The participants of our research provided different suggestions to address or mitigate the reported challenges, including sabbatical policies, flexibility of location and time, parenthood support, soft skills training for managers, equality of payment and opportunities between genders, mentoring and role models to support career growth, directives to hire more women, inclusive groups and events, women's empowerment, and recognition of women's success. This framework of challenges and suggestions can inspire further initiatives in both academia and industry to onboard and retain women. Women represent less than 24% of employees in the software development industry and experience various types of prejudice and bias. Even in companies that care about Diversity & Inclusion, “untying the mooring ropes” of socio-cultural problems is hard. Hearing from women, especially those working in a multi-cultural organization, about their challenges, and adopting their suggestions, can be vital to designing programs and resolving the under-representation issue. In this work we worked closely with a large software development organization that invests in and believes in diversity and inclusion. We listened to the women in this company's global software development teams, to the challenges they face, and to what they suggest to reduce the problems and increase retention. Our research showed that women face work-life balance issues and encounter invisible barriers that prevent them from rising to top positions. They also suffer micro-aggression and sexism, need to show competence constantly, are supervised in essential tasks, and receive less work after becoming mothers. Moreover, women miss having more female colleagues, and lack self-confidence and recognition. The women from the company suggested sabbatical policies, flexibility of location and time, parenthood support, soft skills training for managers, equality of opportunities, role models to support career growth, directives to hire more women, support groups and more interaction between women, inclusive groups and events, and women's empowerment by publishing their success stories in the media and recognizing their achievements. Our results have been shared with the company's Human Resources department and management; they considered the diagnosis helpful and will work on actions to mitigate the challenges that women still perceive.
  3. A number of criteria have been proposed to judge test suite adequacy. While search-based test generation has improved greatly at criteria coverage, the produced suites are still often ineffective at detecting faults. Efficacy may be limited by the single-minded application of one criterion at a time when generating suites - a sharp contrast to human testers, who simultaneously explore multiple testing strategies. We hypothesize that automated generation can be improved by selecting and simultaneously exploring multiple criteria. To address this hypothesis, we have generated multi-criteria test suites, measuring efficacy against the Defects4J fault database. We have found that multi-criteria suites can be up to 31.15% more effective at detecting complex, real-world faults than suites generated to satisfy a single criterion and 70.17% more effective than the default combination of all eight criteria. Given a fixed search budget, we recommend pairing a criterion focused on structural exploration - such as Branch Coverage - with targeted supplemental strategies aimed at the type of faults expected from the system under test. Our findings offer lessons to consider when selecting such combinations. 
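One way to picture "simultaneously exploring multiple criteria" is as a combined fitness that the search maximizes; the minimal sketch below uses an assumed CriterionCoverage structure, made-up goal counts, and equal weights, and is not the API of any particular generation tool.

```python
# Hedged sketch: folding multiple adequacy criteria into one search fitness.
from dataclasses import dataclass

@dataclass
class CriterionCoverage:
    covered: int   # goals this suite currently satisfies
    total: int     # goals the criterion defines for the system under test

    @property
    def ratio(self) -> float:
        return self.covered / self.total if self.total else 1.0

def combined_fitness(per_criterion: dict[str, CriterionCoverage],
                     weights: dict[str, float]) -> float:
    """Weighted sum of per-criterion coverage; the search maximizes this value."""
    return sum(weights[name] * cov.ratio for name, cov in per_criterion.items())

# Example: pair structural exploration (branch coverage) with a targeted
# supplemental criterion (here, exception coverage), echoing the recommendation above.
suite = {
    "branch": CriterionCoverage(covered=180, total=240),
    "exception": CriterionCoverage(covered=12, total=30),
}
print(combined_fitness(suite, {"branch": 0.5, "exception": 0.5}))
```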
  4. Abstract Particle filters avoid parametric estimates for Bayesian posterior densities, which alleviates Gaussian assumptions in nonlinear regimes. These methods, however, are more sensitive to sampling errors than Gaussian-based techniques such as ensemble Kalman filters. A recent study by the authors introduced an iterative strategy for particle filters that match posterior moments, where iterations improve the filter's ability to draw samples from non-Gaussian posterior densities. The iterations follow from a factorization of particle weights, providing a natural framework for combining particle filters with alternative filters to mitigate the impact of sampling errors. The current study introduces a novel approach to forming an adaptive hybrid data assimilation methodology, exploiting the theoretical strengths of nonparametric and parametric filters. At each data assimilation cycle, the iterative particle filter performs a sequence of updates while the prior sample distribution is non-Gaussian, then an ensemble Kalman filter provides the final adjustment when Gaussian distributions for marginal quantities are detected. The method employs the Shapiro–Wilk test, which has outstanding power for detecting departures from normality, to determine when to make the transition between filter algorithms. Experiments using low-dimensional models demonstrate that the approach has significant value, especially for nonhomogeneous observation networks and unknown model process errors. Moreover, hybrid factors are extended to consider marginals of more than one collocated variable using a test for multivariate normality. Findings from this study motivate the use of the proposed method for geophysical problems characterized by diverse observation networks and various dynamic instabilities, such as numerical weather prediction models. Significance Statement Data assimilation statistically processes observation errors and model forecast errors to provide optimal initial conditions for the forecast, playing a critical role in numerical weather forecasting. The ensemble Kalman filter, which has been widely adopted and developed in many operational centers, assumes Gaussianity of the prior distribution and solves a linear system of equations, leading to bias in strongly nonlinear regimes. On the other hand, particle filters avoid many of those assumptions but are sensitive to sampling errors and are computationally expensive. We propose an adaptive hybrid strategy that combines their advantages and minimizes the disadvantages of the two methods. The hybrid particle filter–ensemble Kalman filter is achieved with the Shapiro–Wilk test, which detects the Gaussianity of the ensemble members and determines the timing of the transition between these filter updates. Demonstrations in this study show that the proposed method is advantageous when observations are heterogeneous and when the model has an unknown bias. Furthermore, by extending the statistical hypothesis test to a test for multivariate normality, we consider marginals of more than one collocated variable. These results encourage further testing on real geophysical problems characterized by various dynamic instabilities, such as real numerical weather prediction models.
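A bare-bones sketch of the gating idea (with toy one-dimensional updates standing in for the iterative particle filter and EnKF described above, and an assumed 0.05 significance level) might look like this:

```python
# Hedged sketch: use the Shapiro-Wilk test on the prior ensemble to choose
# between a particle-filter-style update and an EnKF-style update.
import numpy as np
from scipy.stats import shapiro

def enkf_update_1d(ensemble, obs, obs_var):
    """Stochastic EnKF update for a single state variable."""
    prior_var = np.var(ensemble, ddof=1)
    gain = prior_var / (prior_var + obs_var)
    perturbed = obs + np.sqrt(obs_var) * np.random.standard_normal(ensemble.size)
    return ensemble + gain * (perturbed - ensemble)

def pf_update_1d(ensemble, obs, obs_var):
    """Importance-weight and resample the particles against the observation."""
    w = np.exp(-0.5 * (obs - ensemble) ** 2 / obs_var)
    w /= w.sum()
    idx = np.random.choice(ensemble.size, size=ensemble.size, p=w)
    return ensemble[idx]

def hybrid_update(ensemble, obs, obs_var, alpha=0.05):
    """Particle filter while the prior looks non-Gaussian, EnKF once normality is accepted."""
    _, p_value = shapiro(ensemble)
    if p_value < alpha:          # normality rejected -> nonparametric update
        return pf_update_1d(ensemble, obs, obs_var)
    return enkf_update_1d(ensemble, obs, obs_var)

# Usage with a deliberately skewed (non-Gaussian) prior ensemble.
prior = np.random.gamma(shape=2.0, scale=1.0, size=100)
posterior = hybrid_update(prior, obs=3.0, obs_var=0.5)
print(posterior.mean(), posterior.std())
```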
  5. In the oil and gas industry, exploration is largely dependent on the study of the subsurface hundreds or thousands of feet below. Most of the data used for this purpose is collected using borehole logging tools. Although sophisticated, these tools are limited as to how precisely they can measure the subsurface in terms of vertical resolution. There is one method of studying the subsurface that provides unlimited vertical resolution: core samples. Although core samples provide scientists the opportunity to generate a full, continuous data set, lab analysis work is normally done at one-foot intervals, as anything more would be prohibitively expensive. This means that, at best, a representative data set is generated. However, if the subsurface is not homogeneous, it is difficult to generate a representative data set with lab analysis done at one-foot intervals. This is a void that artificial intelligence can fill. More specifically, a properly trained neural network can analyze high-resolution core images continuously from top to bottom and generate a continuous analysis. It is also important to note that geologic interpretation tied to core analysis can introduce human error and subjectivity. Here too, a properly trained neural network can generate results with a high degree of accuracy and precision. One core analysis expert believes that core analysis done manually is flawed about 70% of the time. This flawed analysis can result from a lack of experience and/or a lack of knowledge of the geologic formation. We are not the first to attempt to analyze core samples with vision algorithms. A group of Stanford researchers used micro-computed tomography (micro-CT) and scanning electron microscopy (SEM) images of core samples to characterize the porous media. While promising, SEM and micro-CT imaging is expensive, and more importantly it is not a standard practice in the oil and gas industry to collect these types of images, making such images rare. One other work applied convolutional neural networks to a GIS-based regional saturation system, but our work is significantly different. It is well known that training a neural network requires abundant data; thankfully, with the method of core analysis we are proposing, that will not be a problem. Through industrial partnerships we have obtained hundreds to thousands of core images, sufficient to train a neural network, as well as core interpretations tied to those images from a core analysis expert with over 40 years of experience. We are the first to propose automatic hydrocarbon saturation prediction as well as lithology prediction from core slab images. We propose the use of convolutional neural networks to analyze core samples at a single site. We plan to conduct experiments using a variety of neural networks to determine best practices, and to explore how such a service can be offered to the industry via the software-as-a-service paradigm. In the past, automated analysis through core slab images has not been possible simply because images of the required resolution were not common, but that has changed. If implemented successfully, this proposed method could become the new standard for core evaluation.
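As a rough sketch of the kind of model the proposal describes (the architecture, patch size, and the five lithology classes are assumptions for illustration, not a trained or validated design), a small two-headed CNN could jointly predict saturation and lithology from core slab image patches:

```python
# Hedged sketch of a two-headed CNN for core slab image patches:
# one regression head for hydrocarbon saturation, one classification head for lithology.
import torch
import torch.nn as nn

class CoreSlabNet(nn.Module):
    def __init__(self, num_lithologies: int = 5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.saturation_head = nn.Sequential(nn.Linear(32, 1), nn.Sigmoid())  # fraction in [0, 1]
        self.lithology_head = nn.Linear(32, num_lithologies)                  # class logits

    def forward(self, x):
        features = self.backbone(x)
        return self.saturation_head(features), self.lithology_head(features)

# Usage on a dummy batch of 224x224 RGB patches cut from a core slab photograph.
model = CoreSlabNet()
patches = torch.randn(4, 3, 224, 224)
saturation, lithology_logits = model(patches)
print(saturation.shape, lithology_logits.shape)  # (4, 1) and (4, 5)
```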