Title: Harvesting Wasted Clock Cycles for Efficient Online Testing
Mission-critical systems often require some testing to occur while the system is running. In many cases, this involves taking parts of the system off-line temporarily to apply the tests. However, hazards that occur during regular processor execution already require the addition of stall cycles to maintain program correctness, and these stall cycles generally perform no other function. In this paper, we focus on testing the ALU during those stall cycles to identify new errors or defects that arise during program execution, as aging and increased temperature may slow down the circuitry or cause permanent defects. We investigate the time to detection of a fault (both stuck-at and transition) that may have caused silent data corruption. In addition, we identify the relationship between the programs being run and the list of functional faults, and how this impacts the length of the test set. Finally, we discuss the area and performance impacts of the physical implementation of the approach.
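To make the idea concrete, the sketch below is illustrative only: the toy ALU model, test patterns, and function names are invented, not taken from the paper. It applies one precomputed test pattern per stall cycle to the ALU and reports the stall cycle at which an injected stuck-at fault is first detected.

```python
# Hypothetical sketch (not from the paper): applying stored ALU test
# patterns during pipeline stall cycles and checking responses.

def alu(op, a, b, width=8):
    """Toy combinational ALU model."""
    mask = (1 << width) - 1
    if op == "ADD":
        return (a + b) & mask
    if op == "AND":
        return a & b
    if op == "XOR":
        return a ^ b
    raise ValueError(op)

def alu_stuck_at(op, a, b, bit, value, width=8):
    """Same ALU with a stuck-at fault injected on one output bit."""
    out = alu(op, a, b, width)
    return (out | (1 << bit)) if value else (out & ~(1 << bit))

# Precomputed test set: (op, a, b, expected) tuples.
TEST_SET = [(op, a, b, alu(op, a, b)) for op in ("ADD", "AND", "XOR")
            for a, b in ((0x55, 0xAA), (0xFF, 0x01), (0x0F, 0xF0))]

def run_stall_cycle_tests(stall_cycles, faulty_alu):
    """Apply one pattern per stall cycle; return cycle of first detection."""
    for cycle in range(stall_cycles):
        op, a, b, expected = TEST_SET[cycle % len(TEST_SET)]
        if faulty_alu(op, a, b) != expected:
            return cycle  # fault detected in this stall cycle
    return None  # fault escaped: possible silent data corruption

# Example: output bit 3 stuck at 0.
detected = run_stall_cycle_tests(
    100, lambda op, a, b: alu_stuck_at(op, a, b, bit=3, value=0))
print("detected at stall cycle", detected)
```

A `None` return here corresponds to an undetected fault, i.e., the silent-data-corruption window whose duration the paper measures as time to detection.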
Award ID(s):
1814928 1812777
NSF-PAR ID:
10473502
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Journal Name:
2023 IEEE European Test Symposium
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Location:
Venezia, Italy
Sponsoring Org:
National Science Foundation
More Like this
  1. Motivated by a wide range of applications, research on agent-based models of contagion propagation over networks has attracted a lot of attention in the literature. Many of the available software systems for simulating such agent-based models require users to download software, build the executable, and set up execution environments. Further, running the resulting executable may require access to high performance computing clusters. Our work describes an open access software system (NetSimS) that works under the “Modeling and Simulation as a Service” (MSaaS) paradigm. It enables users to run simulations by selecting models and parameter values, initial conditions, and networks through a web interface. The system supports a variety of models and networks with millions of nodes and edges. In addition to the simulator, the system includes components that enable users to choose initial conditions for simulations in a variety of ways, to analyze the data generated through simulations, and to produce plots from the data. We describe the components of NetSimS and carry out a performance evaluation of the system. We also discuss two case studies carried out on large networks using the system. NetSimS is a major component within net.science, a cyberinfrastructure for network science. 
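For flavor, the sketch below shows the kind of agent-based contagion computation such a service runs; it is illustrative only, not the NetSimS API, and the function name and toy network are invented.

```python
# Hypothetical sketch (not the NetSimS API): a minimal discrete-time
# agent-based contagion simulation over a network.
import random

def simulate_si(adjacency, seeds, prob, steps, rng=random.Random(42)):
    """SI contagion: each infected node infects each susceptible
    neighbor with probability `prob` per time step."""
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        new = {v for u in infected for v in adjacency.get(u, ())
               if v not in infected and rng.random() < prob}
        infected |= new
        history.append(len(infected))
    return history

# Toy undirected network: a 5-node ring with one chord (2-4).
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3, 4], 3: [2, 4], 4: [3, 0, 2]}
print(simulate_si(adj, seeds=[0], prob=0.5, steps=10))
```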
  2. Mutation testing is a powerful technique for evaluating the quality of a test suite, which plays a key role in ensuring software quality. The concept of mutation testing has also been widely used in other software engineering studies, e.g., test generation, fault localization, and program repair. During the process of mutation testing, a large number of mutants may be generated and then executed against the test suite to examine whether they can be killed, making the process extremely computationally expensive. Several techniques have been proposed to speed up this process, including selective, weakened, and predictive mutation testing. Among those techniques, Predictive Mutation Testing (PMT) builds a classification model from a corpus of mutant execution records to predict whether new mutants will be killed or survive without executing them, and can achieve significant reductions in mutation cost. In PMT, each mutant is represented as a list of features related to the mutant itself and the test suite, transforming the mutation testing problem into a binary classification problem. In this paper, we perform an extensive study on the effectiveness and efficiency of the promising PMT technique under the cross-project setting, using a total of 654 real-world projects with more than 4 million mutants. Our work also complements the original PMT work by considering more features and powerful deep learning models. The experimental results show an average prediction accuracy of over 0.85 on the 654 projects using cross-validation, demonstrating the effectiveness of PMT. Meanwhile, a clear speedup is also observed, averaging 28.7× compared to traditional mutation testing with 5 threads. In addition, we analyze the importance of different groups of features in the classification model, which provides implications for future research. 
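The core PMT formulation can be sketched as a standard supervised-learning pipeline. The sketch below is illustrative only: the features and data are invented, and scikit-learn's random forest stands in for whatever models the study actually evaluates.

```python
# Hypothetical sketch of the PMT idea: represent each mutant as a
# feature vector and train a classifier to predict killed (1) vs.
# survived (0) without executing the mutant.
from sklearn.ensemble import RandomForestClassifier

# Invented example features per mutant: statement coverage count,
# assertions in covering tests, mutation operator id, nesting depth.
X_train = [
    [12, 5, 1, 2],   # well covered, heavily asserted -> likely killed
    [0,  0, 3, 4],   # never executed by any test -> cannot be killed
    [7,  2, 2, 1],
    [1,  0, 1, 5],
]
y_train = [1, 0, 1, 0]  # killed/survived labels from past executions

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predict outcomes for new mutants with no mutant execution at all.
X_new = [[10, 4, 1, 2], [0, 0, 2, 3]]
print(model.predict(X_new))  # e.g. [1 0]
```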
  3. Multithreaded programs can have deadlocks, even after deployment, so users may want to run deadlock tools on deployed programs. However, current deadlock predictors such as MagicLock and UnDead have large overheads that make them impractical for end-user deployment and confine their use to development time. Such overhead stems from running an exponential-time algorithm on a large execution trace. In this paper, we present the first low-overhead deadlock predictor, called AirLock, that is fit for both in-house testing and deployed programs. AirLock maintains a small predictive lock reachability graph, searches the graph for cycles, and runs an exponential-time algorithm only for each cycle. This approach lets AirLock find the same deadlocks as MagicLock and UnDead but with much less overhead because the number of cycles is small in practice. Our experiments with real-world benchmarks show that the average time overhead of AirLock is 3.5%, which is three orders of magnitude less than that of MagicLock and UnDead. AirLock's low overhead makes it suitable for use with fuzz testers like AFL and on-the-fly after deployment. 
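The cycle-search step the abstract describes can be sketched as follows; this is an illustrative approximation, not AirLock's actual implementation, and the event format and names are invented.

```python
# Hypothetical sketch (not AirLock itself): build a lock-order graph
# from acquisition events and flag cycles as deadlock candidates.
from collections import defaultdict

def lock_order_graph(events):
    """events: list of (thread_id, held_locks, acquired_lock).
    Edge u -> v means some thread acquired v while holding u."""
    graph = defaultdict(set)
    for _, held, acquired in events:
        for h in held:
            graph[h].add(acquired)
    return graph

def find_cycle(graph):
    """DFS for a cycle; a cycle in lock order is a potential deadlock."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def dfs(u, path):
        color[u] = GRAY
        path.append(u)
        for v in graph[u]:
            if color[v] == GRAY:              # back edge: cycle found
                return path[path.index(v):] + [v]
            if color[v] == WHITE:
                cycle = dfs(v, path)
                if cycle:
                    return cycle
        path.pop()
        color[u] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None

# Thread 1 takes B while holding A; thread 2 takes A while holding B.
events = [(1, {"A"}, "B"), (2, {"B"}, "A")]
print(find_cycle(lock_order_graph(events)))  # e.g. ['A', 'B', 'A']
```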
  4. Typical cybersecurity solutions emphasize achieving defense functionalities. However, execution efficiency and scalability are equally important, especially for real-world deployment. Straightforward mappings of cybersecurity applications onto HPC platforms may significantly underutilize the HPC devices' capacities. On the other hand, sophisticated implementations are quite difficult: they require an in-depth understanding of both domain-specific cybersecurity characteristics and the HPC architecture and system model. In our work, we investigate three sub-areas of cybersecurity: mobile software security, network security, and system security. They have the following performance issues, respectively: 1) flow- and context-sensitive static analysis of large and complex Android APKs is incredibly time-consuming, and existing CPU-only frameworks/tools have to set a timeout threshold that ceases the program analysis, trading precision for performance; 2) network intrusion detection systems (NIDS) use automata processing as their search core and require line-speed processing, but achieving high-speed automata processing is exceptionally difficult in both the algorithm and the implementation; 3) it is unclear how cache configurations impact the performance of time-driven cache side-channel attacks, a question that remains open because the comparative measurements needed to study the impacts are difficult to conduct. In this dissertation, we demonstrate how application-specific characteristics can be leveraged to optimize implementations on various types of HPC hardware for faster and more scalable cybersecurity executions. For example, we present a new GPU-assisted framework and a collection of optimization strategies for fast Android static data-flow analysis that achieve up to 128X speedups against the plain GPU implementation. For NIDS, we design and implement an algorithm that eliminates the state explosion in out-of-order packet situations, reducing memory overhead by up to 400X. We also present tools for improving the usability of Micron's Automata Processor. To study how cache configurations impact the performance of time-driven cache side-channel attacks, we design an approach to conducting comparative measurements: we propose a quantifiable success-rate metric to measure the performance of time-driven cache attacks and use the GEM5 platform to emulate configurable caches. 
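As a small illustration of the automata-processing core mentioned for NIDS, the sketch below is a textbook Aho-Corasick scanner, not the dissertation's implementation; the signature strings and names are invented.

```python
# Hypothetical sketch: automata-based multi-pattern matching of the
# kind NIDS search cores perform, one state transition per input byte.
from collections import deque

def build_automaton(patterns):
    """Aho-Corasick: goto trie plus failure links."""
    trie = [{}]          # node -> {char: child node}
    output = [set()]     # node -> patterns that end here
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in trie[node]:
                trie[node][ch] = len(trie)
                trie.append({})
                output.append(set())
            node = trie[node][ch]
        output[node].add(pat)
    fail = [0] * len(trie)
    queue = deque(trie[0].values())
    while queue:
        u = queue.popleft()
        for ch, v in trie[u].items():
            queue.append(v)
            f = fail[u]
            while f and ch not in trie[f]:
                f = fail[f]
            fail[v] = trie[f].get(ch, 0)
            output[v] |= output[fail[v]]
    return trie, fail, output

def scan(stream, automaton):
    """Report (offset, pattern) for every signature hit in the stream."""
    trie, fail, output = automaton
    node, hits = 0, []
    for i, ch in enumerate(stream):
        while node and ch not in trie[node]:
            node = fail[node]
        node = trie[node].get(ch, 0)
        for pat in output[node]:
            hits.append((i - len(pat) + 1, pat))
    return hits

auto = build_automaton(["attack", "evil", "root"])
print(scan("no evil here, just root access", auto))
```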