This content will become publicly available on June 22, 2026

Title: Static Program Reduction via Type-Directed Slicing
A traditional program slicer constructs a smaller variant of a target program that computes the same result with respect to some target variable; that is, program slicing preserves the original program’s run-time semantics. We propose type-directed slicing, which constructs a smaller program that guarantees that a typechecker will produce the same result on the sliced program when considering only a target program location; that is, a type-directed slicer preserves the target program’s compile-time semantics, from the view of a specific typechecker, with respect to some location. Type-directed slicing is a useful debugging aid for designers and maintainers of typecheckers. When a typechecker produces an unexpected result (a crash, a false positive warning, a missed warning, etc.) on a large codebase, the user typically reports a bug to the maintainers of the typechecker without an accompanying test case. State-of-the-art approaches to this program reduction problem are dynamic: they require repeatedly running the typechecker to validate minimizations. A type-directed slicer solves this problem statically, without rerunning the typechecker, by exploiting the modularity inherent in a typechecker’s type rules. Our prototype type-directed slicer for Java is fully automatic, can operate on incomplete programs, and is fast. It produces a small test case that preserves typechecker misbehavior for 25 of 28 (89%) historical bugs from the issue trackers of three widely used typecheckers: the Java compiler itself, NullAway, and the Checker Framework; in each of these 25 cases, it preserved the typechecker’s behavior even without the classpath of the target program. It also runs in under a minute on each benchmark, whose size ranges up to millions of lines of code, on a free-tier CI runner.
Award ID(s):
2414110
PAR ID:
10617583
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Software Engineering
Volume:
2
Issue:
ISSTA
ISSN:
2994-970X
Page Range / eLocation ID:
2068 to 2090
Subject(s) / Keyword(s):
program reduction, type systems, typechecker, program minimization
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The goal of programmatic Learning from Demonstration (LfD) is to learn a policy in a programming language that can be used to control a robot’s behavior from a set of user demonstrations. This paper presents a new programmatic LfD algorithm that targets long-horizon robot tasks which require synthesizing programs with complex control flow structures, including nested loops with multiple conditionals. Our proposed method first learns a program sketch that captures the target program’s control flow and then completes this sketch using an LLM-guided search procedure that incorporates a novel technique for proving unrealizability of programming-by-demonstration problems. We have implemented our approach in a new tool called prolex and present the results of a comprehensive experimental evaluation on 120 benchmarks involving complex tasks and environments. We show that, given a 120 second time limit, prolex can find a program consistent with the demonstrations in 80% of the cases. Furthermore, for 81% of the tasks for which a solution is returned, prolex is able to find the ground truth program with just one demonstration. In comparison, CVC5, a syntax-guided synthesis tool, is only able to solve 25% of the cases even when given the ground truth program sketch, and an LLM-based approach, GPT-Synth, is unable to solve any of the tasks due to the environment complexity.
  2. Parallel programs are frequently modeled as dependency or cost graphs, which can be used to detect various bugs, or simply to visualize the parallel structure of the code. However, such graphs reflect just one particular execution and are typically constructed in a post-hoc manner. Graph types, which were introduced recently to mitigate this problem, can be assigned statically to a program by a type system and compactly represent the family of all graphs that could result from the program. Unfortunately, prior work is restricted in its treatment of futures, an increasingly common and especially dynamic form of parallelism. In short, each instance of a future must be statically paired with a vertex name. Previously, this led to the restriction that futures could not be placed in collections or be used to construct data structures. Doing so is not a niche exercise: such structures form the basis of numerous algorithms that use forms of pipelining to achieve performance not attainable without futures. All but the most limited of these examples are out of reach of prior graph type systems. In this paper, we propose a graph type system that allows for almost arbitrary combinations of futures and recursive data types. We do so by indexing datatypes with a type-level vertex structure, a codata structure that supplies unique vertex names to the futures in a data structure. We prove the soundness of the system in a parallel core calculus annotated with vertex structures and associated operations. Although the calculus is annotated, this is merely for convenience in defining the type system. We prove that it is possible to annotate arbitrary recursive types with vertex structures, and show using a prototype inference engine that these annotations can be inferred from OCaml-like source code for several complex parallel algorithms.
  3. We present a novel symbolic reasoning engine for SQL which can efficiently generate an input I for n queries P1, ..., Pn, such that their outputs on I satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each Pi, that is, a subset of Pi’s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.
  4. The tensor programming abstraction is a foundational paradigm which allows users to write high performance programs via a high-level imperative interface. Recent work on sparse tensor compilers has extended this paradigm to sparse tensors (i.e., tensors where most entries are not explicitly represented). With these systems, users define the semantics of the program and the algorithmic decisions in a concise language that can be compiled to efficient low-level code. However, these systems still require users to make complex decisions about program structure and memory layouts to write efficient programs. This work presents Galley, a system for declarative tensor programming that allows users to write efficient tensor programs without making complex algorithmic decisions. Galley is the first system to perform cost-based lowering of sparse tensor algebra to the imperative language of sparse tensor compilers, and the first to optimize arbitrary operators beyond Σ and *. First, it decomposes the input program into a sequence of aggregation steps through a novel extension of the FAQ framework. Second, Galley optimizes and converts each aggregation step to a concrete program, which is compiled and executed with a sparse tensor compiler. We show that Galley produces programs that are 1-300x faster than competing methods for machine learning over joins and 5-20x faster than a state-of-the-art relational database for subgraph counting workloads with a minimal optimization overhead.
  5. Nondeterminism introduced by race conditions and message reorderings makes parallel and distributed programming hard. Nevertheless, promising approaches such as LVars and CRDTs address this problem by introducing a partial order structure on shared state that describes how the state evolves over time. Monotone programs that respect the order are deterministic. Datalog-inspired languages incorporate this idea of monotonicity in a first-class way, but they are not general-purpose. We would like parallel and distributed languages to be as natural to use as any functional language, without sacrificing expressivity, and with a formal basis of study as appealing as the lambda calculus. This paper presents λ, a core language for deterministic parallelism that embodies the ideas above. In λ, values may increase over time according to a streaming order and all computations are monotone with respect to that order. The streaming order coincides with the approximation order found in Scott semantics and so unifies the foundations of functional programming with the foundations of deterministic distributed computation. The resulting lambda calculus has a computationally adequate model rooted in domain theory. It integrates the compositionality and power of abstraction characteristic of functional programming with the declarative nature of Datalog.