Debugging is a critical but challenging task for programmers. This paper proposes ChatDBG, an AI-powered debugging assistant. ChatDBG integrates large language models (LLMs) to significantly enhance the capabilities and user-friendliness of conventional debuggers. ChatDBG lets programmers engage in a collaborative dialogue with the debugger, allowing them to pose complex questions about program state, perform root cause analysis for crashes or assertion failures, and explore open-ended queries like "why is x null?". To handle these queries, ChatDBG grants the LLM autonomy to "take the wheel": it can act as an independent agent capable of querying and controlling the debugger to navigate through stacks and inspect program state. It then reports its findings and yields control back to the programmer. By leveraging the real-world knowledge embedded in LLMs, ChatDBG can diagnose issues identifiable only through domain-specific reasoning. Our ChatDBG prototype integrates with standard debuggers, including LLDB and GDB for native code and Pdb for Python. Our evaluation across a diverse set of code, including C/C++ code with known bugs and a suite of Python code comprising standalone scripts and Jupyter notebooks, demonstrates that ChatDBG can successfully analyze root causes, explain bugs, and generate accurate fixes for a wide range of real-world errors. For the Python programs, a single query led to an actionable bug fix 67% of the time; one additional follow-up query increased the success rate to 85%. ChatDBG has seen rapid uptake; it has already been downloaded more than 75,000 times.
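To make the "take the wheel" workflow above concrete, here is a minimal Python sketch of an LLM-driven debugger loop. It is an illustration only, not ChatDBG's implementation: `ask_llm` and `run_debugger_command` are hypothetical stand-ins for the LLM call and for the Pdb/LLDB/GDB back end.

```python
# Illustrative sketch (not ChatDBG's actual code): an LLM-driven agent loop
# that lets a language model issue debugger commands until it can answer the
# programmer's question. Both the LLM and the debugger back end are stubbed.

def ask_llm(transcript):
    """Stand-in for an LLM call; here it probes once and then answers."""
    if "locals" not in transcript:
        return {"action": "command", "command": "p locals()"}
    return {"action": "answer",
            "text": "x is None because load_config() returned early."}

def run_debugger_command(command):
    """Stand-in for sending a command to Pdb/LLDB/GDB and capturing output."""
    return f"(debugger output for: {command})"

def chat_debug(question, max_steps=10):
    transcript = f"User question: {question}\n"
    for _ in range(max_steps):
        step = ask_llm(transcript)
        if step["action"] == "answer":           # LLM yields control back
            return step["text"]
        output = run_debugger_command(step["command"])
        transcript += f"> {step['command']}\n{output}\n"
    return "No conclusion reached within the step budget."

if __name__ == "__main__":
    print(chat_debug("why is x null?"))
```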
Assume(), Capture(), Verify(), Establish(): A Vocabulary for Static Program Analysis
We propose a set of functions that a user can invoke to analyze a program written in a C-like language. Assume() refers to a label in the source code or to a program part, and enables the user to make an assumption about the state of the program at that label or about the function of that program part. Capture() refers to a label or a program part and returns an assertion about the state of the program at the label or the function of the program part. Verify() refers to a label or a program part and tests a unary assertion about the state of the program at the label, or a binary assertion about the function of the program part. Establish() refers to a label or a program part and modifies the program code to make Verify() return TRUE at that label or program part, if it did not originally. We discuss the foundations of this tool as well as a preliminary implementation.
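As a rough illustration of this vocabulary, here is a hedged Python sketch that is not the paper's tool: it runs a toy straight-line program concretely, so Capture() returns the reachable state at a label rather than a symbolic assertion, and Verify() tests a unary assertion at that label; Assume() and Establish() are omitted.

```python
# Illustrative sketch only (not the paper's tool): a toy, concrete reading of
# the Capture/Verify vocabulary over straight-line integer programs.
# A program is a list of (label, statement) pairs; statements are Python
# assignments executed over a dict of variables.

def capture(program, label, init=None):
    """Return the concrete state reached just before `label` (a stand-in for
    the assertion Capture() would return)."""
    state = dict(init or {})
    for lbl, stmt in program:
        if lbl == label:
            return dict(state)
        exec(stmt, {}, state)   # toy semantics: run the assignment
    return dict(state)          # label past the last statement: final state

def verify(program, label, assertion, init=None):
    """Test a unary assertion (a predicate over the state) at `label`."""
    return bool(assertion(capture(program, label, init)))

# Example: check that y == x + 1 holds at label "L3".
prog = [("L1", "x = 3"), ("L2", "y = x + 1"), ("L3", "z = y * 2")]
print(verify(prog, "L3", lambda s: s["y"] == s["x"] + 1, init={}))  # True
```

In the paper's setting, Capture() would return a symbolic assertion rather than a concrete state; the concrete version here is only meant to fix intuitions.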
- Award ID(s): 2043104
- PAR ID: 10538758
- Publisher / Repository: IEEE
- Date Published:
- ISBN: 979-8-3503-5939-8
- Page Range / eLocation ID: 845 to 846
- Subject(s) / Keyword(s): Assume(), Capture(), Verify(), Establish(); symbolic execution; while loops; Mathematica (C) Wolfram Research; invariant relations
- Format(s): Medium: X
- Location: Chiang Mai, Thailand
- Sponsoring Org: National Science Foundation
More Like this
-
Analyzing multithreaded programs is notoriously hard due to the exponential number of thread interleavings. Although race detectors can help developers find and fix such bugs before the code is deployed, multithreaded code may still be buggy due to memory errors and assertion violations that are not due to race conditions. This paper presents a property-directed symbolic execution of multithreaded code. Our approach, named SIFT, differs from previous work on detecting errors in multithreaded code by being property-directed and by handling both memory safety and assertion checking that can be further customized by the user. SIFT can detect bugs that may or may not be due to data races, and works in an iterative way. In each step, it explores the state space using selective scheduling based on a set of interleaving points that have been inferred in the previous step. We have developed three partitioning strategies for improved effectiveness and performance. We have implemented SIFT on top of the KLEE symbolic execution engine and applied it to various real-world and academic benchmarks. SIFT could detect more vulnerabilities than a state-of-the-art memory vulnerability detector. (A schematic sketch of this property-directed iteration appears after this list.)
-
This paper considers the callback reachability problem --- determining if a callback can be called by an event-driven framework in an unexpected state. Event-driven programming frameworks are pervasive for creating user-interactive applications (apps) on just about every modern platform. Control flow between callbacks is determined by the framework and largely opaque to the programmer. This opacity of the callback control flow not only causes difficulty for the programmer but also complicates the development of static analyses. Previous static analysis techniques address this opacity either by assuming an arbitrary framework implementation or by attempting to eagerly specify all possible callback control flow, but this is either too coarse to prove properties requiring callback-ordering constraints or too burdensome and tricky to get right. Instead, we present a middle way where the callback control flow can be gradually refined in a targeted manner to prove assertions of interest. The key insight behind this middle way is to reason about the history of method invocations at the boundary between app and framework code --- enabling a decoupling of the specification of callback control flow from the analysis of app code. We call the sequence of such boundary-method invocations message histories and develop message-history logics to do this reasoning. In particular, we define the notion of an application-only transition system with boundary transitions, a message-history program logic for programs with such transitions, and a temporal specification logic for capturing callback control flow in a targeted and compositional manner. Then, to utilize the logics in a goal-directed verifier, we define a way to combine, after the fact, an assertion about message histories with a specification of callback control flow. We implemented a prototype message history-based verifier called Historia and provide evidence that our approach is uniquely capable of distinguishing between buggy and fixed versions on challenging examples drawn from real-world issues and that our targeted specification approach enables proving the absence of multi-callback bug patterns in real-world open-source Android apps. (A small illustrative message-history check appears after this list.)
-
Industrial deployments of automated program repair (APR), e.g., at Facebook and Bloomberg, signal a new milestone for this exciting and potentially impactful technology. In these deployments, developers use APR-generated patch suggestions as part of a human-driven debugging process. Unfortunately, little is known about how using patch suggestions affects developers during debugging. This paper conducts a controlled user study with 40 developers with a median of 6 years of experience. The developers engage in debugging tasks on nine naturally-occurring defects in real-world, open-source, Java projects, using Recoder, SimFix, and TBar, three state-of-the-art APR tools. For each debugging task, the developers have access either to the project's tests alone or additionally to code suggestions that make all the tests pass. These suggestions are either developer-written or APR-generated, and the APR-generated ones can be correct or deceptive. Deceptive suggestions, which are a common APR occurrence, make all the available tests pass but fail to generalize to the intended specification. Through a total of 160 debugging sessions, we find that access to a code suggestion significantly increases the odds of submitting a patch. Correct APR suggestions increase the odds of debugging success by 14,000% (i.e., an odds ratio of roughly 141), but deceptive suggestions decrease the odds of success by 65%. Correct suggestions also speed up debugging. Surprisingly, we observe no significant difference in how novice and experienced developers are affected by APR, suggesting that APR may find uses across the experience spectrum. Overall, developers come away with a strong positive impression of APR, suggesting promise for APR-mediated, human-driven debugging, despite existing challenges in APR-generated repair quality.
-
An efficient optimizing compiler can perform many cascading rewrites in a single pass, using auxiliary data structures such as variable binding maps, delayed substitutions, and occurrence counts. Such optimizers often perform transformations according to relatively simple rewrite rules, but the subtle interactions between the data structures needed for efficiency make them tricky to write and trickier to prove correct. We present a system for semi-automatically deriving both an efficient program transformation and its correctness proof from a list of rewrite rules and specifications of the auxiliary data structures it requires. Dependent types ensure that the holes left behind by our system (for the user to fill in) are filled in correctly, allowing the user low-level control over the implementation without having to worry about getting it wrong. We implemented our system in Coq (though it could be implemented in other logics as well), and used it to write optimization passes that perform uncurrying, inlining, dead code elimination, and static evaluation of case expressions and record projections. The generated implementations are sometimes faster, and at most 40% slower, than hand-written counterparts on a small set of benchmarks; in some cases, they require significantly less code to write and prove correct. (A toy occurrence-count-driven dead-code-elimination pass appears after this list.)
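For the SIFT abstract above, the following is a hedged Python sketch of the iterative, property-directed loop it describes. It is schematic only; `explore` and `infer_interleaving_points` are hypothetical stand-ins, not SIFT's actual API (SIFT itself is built on KLEE).

```python
# Illustrative sketch (not SIFT's implementation): the iterative skeleton of a
# property-directed search over thread interleavings. Exploration and the
# inference of interleaving points are stubbed out.

def explore(program, interleaving_points):
    """Stand-in for selective-scheduling symbolic execution; returns
    (violations_found, facts_learned_this_round)."""
    return [], set()

def infer_interleaving_points(program, facts):
    """Stand-in for inferring the next round's interleaving points."""
    return set(facts)

def sift_loop(program, max_rounds=5):
    interleaving_points = set()            # round 1: no forced context switches
    for _ in range(max_rounds):
        violations, facts = explore(program, interleaving_points)
        if violations:                     # a memory-safety or assertion bug
            return violations
        new_points = infer_interleaving_points(program, facts)
        if new_points <= interleaving_points:
            break                          # no further refinement possible
        interleaving_points |= new_points
    return []

print(sift_loop("toy-program"))            # [] : no violation in this stub
```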
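For the callback-reachability abstract above, here is a hedged Python sketch of checking one ordering property over a recorded message history (the sequence of boundary-method invocations). The method names and the check are hypothetical illustrations, not Historia's logics or implementation.

```python
# Illustrative sketch (not Historia): checking one callback-ordering property
# over a recorded message history, i.e. the sequence of app/framework
# boundary-method invocations. Method names are hypothetical.

def violates_use_after_unsubscribe(history):
    """Property: after the app calls unsubscribe(), the framework must not
    invoke the onClick() callback again."""
    unsubscribed = False
    for direction, method in history:   # "ci": app->framework, "cb": framework->app
        if direction == "ci" and method == "unsubscribe":
            unsubscribed = True
        if direction == "cb" and method == "onClick" and unsubscribed:
            return True                 # buggy ordering reached
    return False

buggy_history = [("cb", "onCreate"), ("ci", "subscribe"),
                 ("ci", "unsubscribe"), ("cb", "onClick")]
fixed_history = [("cb", "onCreate"), ("ci", "subscribe"),
                 ("cb", "onClick"), ("ci", "unsubscribe")]
print(violates_use_after_unsubscribe(buggy_history))   # True
print(violates_use_after_unsubscribe(fixed_history))   # False
```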
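For the rewrite-rule compiler abstract above, here is a hedged Python toy showing how an auxiliary occurrence-count map can drive dead-code elimination in a single backward pass over a small let-style program. It illustrates the general idea only, not the Coq-based system the abstract describes.

```python
# Illustrative sketch (not the Coq system described above): occurrence counts
# as an auxiliary structure for dead-code elimination in a toy let-language.
# A program is a list of (name, used_vars, expr_text) bindings plus a result
# variable; a binding is dead if its name is never referenced downstream.

from collections import Counter

def dead_code_eliminate(bindings, result_var):
    # One backward pass: count occurrences, keep only live bindings.
    counts = Counter([result_var])
    live = []
    for name, used_vars, text in reversed(bindings):
        if counts[name] > 0:              # binding is referenced later: keep it
            live.append((name, used_vars, text))
            counts.update(used_vars)      # its operands become live too
    return list(reversed(live))

prog = [("a", [], "1"),
        ("b", ["a"], "a + 2"),
        ("c", [], "42"),                  # never used: should be removed
        ("d", ["b"], "b * b")]
print(dead_code_eliminate(prog, "d"))
# [('a', [], '1'), ('b', ['a'], 'a + 2'), ('d', ['b'], 'b * b')]
```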