A key strength of managed runtimes over hardware is the ability to gain detailed insight into the dynamic execution of programs with instrumentation. Analyses such as code coverage, execution frequency, tracing, and debugging, are all made easier in a virtual setting. As a portable, low-level byte-code, WebAssembly offers inexpensive in-process sandboxing with high performance. Yet to date, Wasm engines have not offered much insight into executing programs, supporting at best bytecode-level stepping and basic source maps, but no instrumentation capabilities. In this paper, we show the first non-intrusive dynamic instrumentation system for WebAssembly in the open-source Wizard Research Engine. Our innovative design offers a flexible, complete hierarchy of instrumentation primitives that support building high-level, complex analyses in terms of low-level, programmable probes. In contrast to emulation or machine code instrumentation, injecting probes at the bytecode level increases expressiveness and vastly simplifies the implementation by reusing the engine's JIT compiler, interpreter, and deoptimization mechanism rather than building new ones. Wizard supports both dynamic instrumentation insertion and removal while providing consistency guarantees, which is key to composing multiple analyses without interference. We detail a fully-featured implementation in a high-performance multi-tier Wasm engine, show novel optimizations specifically designed to minimize instrumentation overhead, and evaluate performance characteristics under load from various analyses. This design is well-suited for production engine adoption as probes can be implemented to have no impact on production performance when not in use.
more »
« less
Unveiling Heisenbugs with Diversified Execution
Heisenbugs, notorious for their ability to change behavior and elude reproducibility under observation, are among the toughest challenges in debugging programs. They often evade static detection tools, making them especially prevalent in cyber-physical edge systems characterized by complex dynamics and unpredictable interactions with physical environments. Although dynamic detection tools work much better, most still struggle to meet low enough jitter and overhead performance requirements, impeding their adoption. More importantly however, dynamic tools currently lack metrics to determine an observed bug's difficulty or heisen-ness undermining their ability to make any claims regarding their effectiveness against heisenbugs. This paper proposes a methodology for detecting and identifying heisenbugs with low overheads at scale, actualized through the lens of dynamic data-race detection. In particular, we establish the critical impact of execution diversity across both instrumentation density and hardware platforms for detecting heisenbugs; the benefits of which outweigh any reduction in efficiency from limited instrumentation or weaker devices. We develop an experimental WebAssembly-backed dynamic data-race detection framework, Beanstalk, which exploits this diversity to show superior bug detection capability compared to any homogeneous instrumentation strategy on a fixed compute budget. Beanstalk's approach also gains power with scale, making it suitable for low-overhead deployments across numerous compute nodes. Finally, based on a rigorous statistical treatment of bugs observed by Beanstalk, we propose a novel metric, the heisen factor, that similar detectors can utilize to categorize heisenbugs and measure effectiveness. We reflect on our analysis of Beanstalk to provide insight on effective debugging strategies for both in-house and in deployment settings.
more »
« less
- Award ID(s):
- 2148301
- PAR ID:
- 10642079
- Publisher / Repository:
- Association for Computing Machinery (ACM)
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 9
- Issue:
- OOPSLA1
- ISSN:
- 2475-1421
- Format(s):
- Medium: X Size: p. 393-420
- Size(s):
- p. 393-420
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Happens before-based dynamic analysis is the go-to technique for detecting data races in large scale software projects due to the absence of false positive reports. However, such analyses are expensive since they employ expensive vector clock updates at each event, rendering them usable only for in-house testing. In this paper, we present a sampling-based, randomized race detector that processes onlyconstantly manyevents of the input trace even in the worst case. This is the firstsub-lineartime (i.e., running ino(n) time wherenis the length of the trace) dynamic race detection algorithm; previous sampling based approaches like run in linear time (i.e.,O(n)). Our algorithm is a property tester for -race detection — it is sound in that it never reports any false positive, and on traces that are far, with respect to hamming distance, from any race-free trace, the algorithm detects an -race with high probability. Our experimental evaluation of the algorithm and its comparison with state-of-the-art deterministic and sampling based race detectors shows that the algorithm does indeed have significantly low running time, and detects races quite often.more » « less
-
This paper proposes, EFTSanitizer, a fast shadow execution framework for detecting and debugging numerical errors during late stages of testing especially for long-running applications. Any shadow execution framework needs an oracle to compare against the floating point (FP) execution. This paper makes a case for using error free transformations, which is a sequence of operations to compute the error of a primitive operation with existing hardware supported FP operations, as an oracle for shadow execution. Although the error of a single correctly rounded FP operation is bounded, the accumulation of errors across operations can result in exceptions, slow convergences, and even crashes. To ease the job of debugging such errors, EFTSanitizer provides a directed acyclic graph (DAG) that highlights the propagation of errors, which results in exceptions or crashes. Unlike prior work, DAGs produced by EFTSanitizer include operations that span various function calls while keeping the memory usage bounded. To enable the use of such shadow execution tools with long-running applications, EFTSanitizer also supports starting the shadow execution at an arbitrary point in the dynamic execution, which we call selective shadow execution. EFTSanitizer is an order of magnitude faster than prior state-of-art shadow execution tools such as FPSanitizer and Herbgrind. We have discovered new numerical errors and debugged them using EFTSanitizer.more » « less
-
Popular platforms for teaching physical computing like the LilyPad Arduino and Adafruit Circuit Playground have simplified programming and wiring, enabling students to quickly engineer physical computing projects. But enabling students to rapidly design and build is a double-edged sword: Students can create functioning prototypes without fully understanding the underlying principles. With limited knowledge and experience, students struggle to locate and fix bugs, or errors, in their projects. Absent appropriate debugging tools, students rely on their instructor for locating errors, or worse, turn toward destructive tactics such as tearing apart and rebuilding their project, hoping the bug fixes itself. Students need tools targeted to their ability that scaffold debugging and help them locate bugs in the mixed hardware/software environment of physical computing. I developed Circuit Check to scaffold the debugging process for students. It enables students to observe real-time sensor data and test hardware components through a novel adaptation of the traditional breakpoint for physical computing.more » « less
-
Gresalfi, M.; Horn, I. (Ed.)Much attention has focused on student learning while making physical computational artifacts such as robots or electronic textiles, but little is known about how students engage with the hardware and software debugging issues that often arise. In order to better understand students’ debugging strategies and practices, we conducted and video-recorded eight think- aloud sessions (~45 minutes each) of high school student pairs debugging electronic textiles projects with researcher-designed programming and circuitry/crafting bugs. We analyzed each video to understand pairs’ debugging strategies and practices in navigating the multi- representational problem space. Our findings reveal the importance of employing system-level strategies while debugging physical computing systems, and of coordinating between various components of physical computing systems, for instance between the physical artifact, representations on paper, and the onscreen programming environment. We discuss the implications of our findings for future research and designing instruction and tools for learning with and debugging physical computing systems.more » « less
An official website of the United States government
