NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Gauguin, Descartes, Bayes: A Diurnal Golem’s Brain

https://doi.org/10.1145/3759429.3762631

Chandra, Kartik; Liu, Amanda; Ragan-Kelley, Jonathan; Tenenbaum, Joshua B (October 2025, ACM)

A "quine" is a deterministic program that prints itself. In this essay, I will show you a "gauguine": a probabilistic program that infers itself. A gauguine is repeatedly asked to guess its own source code. Initially, its chances of guessing correctly are of course minuscule. But as the gauguine observes more and more of its own previous guesses, it detects patterns of behavior and gains information about its inner workings. This information allows it to bootstrap self-knowledge, and ultimately discover its own source code. We will discuss how—and why—we might write a gauguine, and what we stand to learn by constructing one.
more » « less
Free, publicly-accessible full text available October 9, 2026
Lightweight and Locality-Aware Composition of Black-Box Subroutines

https://doi.org/10.1145/3729292

Bansal, Manya; Sharlet, Dillon; Ragan-Kelley, Jonathan; Amarasinghe, Saman (June 2025, Proceedings of the ACM on Programming Languages)

Subroutines are essential building blocks in software design: users encapsulate common functionality in libraries and write applications by composing calls to subroutines. Unfortunately, performance may be lost at subroutine boundaries due to reduced locality and increased memory consumption. Operator fusion helps recover the performance lost at composition boundaries. Previous solutions fuse operators by manually rewriting code into monolithic fused subroutines, or by relying on heavy-weight compilers to generate code that performs fusion. Both approaches require a semantic understanding of the entire computation, breaking the decoupling necessary for modularity and reusability of subroutines. In this work, we attempt to identify the minimal ingredients required to fuse computations, enabling composition of subroutines without sacrificing performance or modularity. We find that, unlike previous approaches that require a semantic understanding of the computation, most opportunities for fusion require understanding only data production and consumption patterns.Exploiting this insight, we add fusion on top of black-box subroutines by proposing a lightweight enrichment of subroutine declarations to expose data-dependence patterns. We implement our approach in a system called Fern, and demonstrate Fern’s benefits by showing that it is competitive with state-of-the-art, high-performance libraries with manually fused operators, can fuse across library and domain boundaries for unforeseen workloads, and can deliver speedups of up to 5× over unfused code.
more » « less
Free, publicly-accessible full text available June 10, 2026
Exo 2: Growing a Scheduling Language

https://doi.org/10.1145/3669940.3707218

Ikarashi, Yuka; Qian, Kevin; Droubi, Samir; Reinking, Alex; Bernstein, Gilbert Louis; Ragan-Kelley, Jonathan (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026
A Verified Compiler for a Functional Tensor Language

https://doi.org/10.1145/3656390

Liu, Amanda; Bernstein, Gilbert; Chlipala, Adam; Ragan-Kelley, Jonathan (June 2024, Proceedings of the ACM on Programming Languages)

Producing efficient array code is crucial in high-performance domains like image processing and machine learning. It requires the ability to control factors like compute intensity and locality by reordering computations into different stages and granularities with respect to where they are stored. However, traditional pure, functional tensor languages struggle to do so. In a previous publication, we introduced ATL as a pure, functional tensor language capable of systematically decoupling compute and storage order via a set of high-level combinators known as reshape operators. Reshape operators are a unique functional-programming construct since they manipulate storage location in the generated code by modifying the indices that appear on the left-hand sides of storage expressions. We present a formal correctness proof for an implementation of the compilation algorithm, marking the first verification of a lowering algorithm targeting imperative loop nests from a source functional language that enables separate control of compute and storage ordering. One of the core difficulties of this proof required properly formulating the complex invariants to ensure that these storage-index remappings were well-formed. Notably, this exercise revealed asoundness bugin the original published compilation algorithm regarding the truncation reshape operators. Our fix is a new type system that captures safety conditions that were previously implicit and enables us to prove compiler correctness for well-typed source programs. We evaluate this type system and compiler implementation on a range of common programs and optimizations, including but not limited to those previously studied to demonstrate performance comparable to established compilers like Halide.
more » « less
Inferring the Future by Imagining the Past

Chandra, Kartik; Chen, Tony; Li, Tzu-Mao; Ragan-Kelley, Jonathan; Tenenbaum, Joshua (December 2023, Proceedings of 2023 Conference on Neural Information Processing Systems)

A single panel of a comic book can say a lot: it can depict not only where the characters currently are, but also their motions, their motivations, their emotions, and what they might do next. More generally, humans routinely infer complex sequences of past and future events from a static snapshot of a dynamic scene, even in situations they have never seen before. In this paper, we model how humans make such rapid and flexible inferences. Building on a long line of work in cognitive science, we offer a Monte Carlo algorithm whose inferences correlate well with human intuitions in a wide variety of domains, while only using a small, cognitively-plausible number of samples. Our key technical insight is a surprising connection between our inference problem and Monte Carlo path tracing, which allows us to apply decades of ideas from the computer graphics community to this seemingly-unrelated theory of mind task.
more » « less
Full Text Available
SLANG.D: Fast, Modular and Differentiable Shader Programming

https://doi.org/10.1145/3618353

Bangaru, Sai_Praveen; Wu, Lifan; Li, Tzu-Mao; Munkberg, Jacob; Bernstein, Gilbert; Ragan-Kelley, Jonathan; Durand, Frédo; Lefohn, Aaron; He, Yong (December 2023, ACM Transactions on Graphics)

We introduce SLANG.D, an extension to the Slang shading language that incorporates first-class automatic differentiation support. The new shading language allows us to transform a Direct3D-based path tracer to be fully differentiable with minor modifications to existing code. SLANG.D enables a shared ecosystem between machine learning frameworks and pre-existing graphics hardware API-based rendering systems, promoting the interchange of components and ideas across these two domains. Our contributions include a differentiable type system designed to ensure type safety and semantic clarity in codebases that blend differentiable and non-differentiable code, language primitives that automatically generate both forward and reverse gradient propagation methods, and a compiler architecture that generates efficient derivative propagation shader code for graphics pipelines. Our compiler supports differentiating code that involves arbitrary control-flow, dynamic dispatch, generics and higher-order differentiation, while providing developers flexible control of checkpointing and gradient aggregation strategies for best performance. Our system allows us to differentiate an existing real-time path tracer, Falcor, with minimal change to its shader code. We show that the compiler-generated derivative kernels perform as efficiently as handwritten ones. In several benchmarks, the SLANG.D code achieves significant speedup when compared to prior automatic differentiation systems.
more » « less
Acting as Inverse Inverse Planning

https://doi.org/10.1145/3588432.3591510

Chandra, Kartik; Li, Tzu-Mao; Tenenbaum, Joshua; Ragan-Kelley, Jonathan (July 2023, ACM)

Great storytellers know how to take us on a journey. They direct characters to act—not necessarily in the most rational way—but rather in a way that leads to interesting situations, and ultimately creates an impactful experience for audience members looking on. If audience experience is what matters most, then can we help artists and animators directly craft such experiences, independent of the concrete character actions needed to evoke those experiences? In this paper, we offer a novel computational framework for such tools. Our key idea is to optimize animations with respect to simulated audience members’ experiences. To simulate the audience, we borrow an established principle from cognitive science: that human social intuition can be modeled as “inverse planning,” the task of inferring an agent’s (hidden) goals from its (observed) actions. Building on this model, we treat storytelling as “inverse inverse planning,” the task of choosing actions to manipulate an inverse planner’s inferences. Our framework is grounded in literary theory, naturally capturing many storytelling elements from first principles. We give a series of examples to demonstrate this, with supporting evidence from human subject studies.
more » « less
Fast Instruction Selection for Fast Digital Signal Processing

https://doi.org/10.1145/3623278.3624768

Root, Alexander J; Ahmad, Maaz_Bin Safeer; Sharlet, Dillon; Adams, Andrew; Kamil, Shoaib; Ragan-Kelley, Jonathan (March 2023, ACM)

Modern vector processors support a wide variety of instructions for fixed-point digital signal processing. These instructions support a proliferation of rounding, saturating, and type conversion modes, and are often fused combinations of more primitive operations. While these are common idioms in fixed-point signal processing, it is difficult to use these operations in portable code. It is challenging for programmers to write down portable integer arithmetic in a C-like language that corresponds exactly to one of these instructions, and even more challenging for compilers to recognize when these instructions can be used. Our system, Pitchfork, defines a portable fixed-point intermediate representation, FPIR, that captures common idioms in fixed-point code. FPIR can be used directly by programmers experienced with fixed-point, or Pitchfork can automatically lift from integer operations into FPIR using a term-rewriting system (TRS) composed of verified manual and automatically-synthesized rules. Pitchfork then lowers from FPIR into target-specific fixed-point instructions using a set of target-specific TRSs. We show that this approach improves runtime performance of portably-written fixed-point signal processing code in Halide, across a range of benchmarks, by geomean 1.31× on x86 with AVX2, 1.82× on ARM Neon, and 2.44× on Hexagon HVX compared to a standard LLVM-based compiler flow, while maintaining or improving existing compile times.
more » « less
Full Text Available
Gradient Descent: The Ultimate Optimizer

Chandra, Kartik; Xie, Audrey; Ragan-Kelley, Jonathan; Meijer, Erik (December 2022, Advances in neural information processing systems)

Working with any gradient-based machine learning algorithm involves the tedious task of tuning the optimizer's hyperparameters, such as its step size. Recent work has shown how the step size can itself be optimized alongside the model parameters by manually deriving expressions for "hypergradients" ahead of time. We show how to automatically compute hypergradients with a simple and elegant modification to backpropagation. This allows us to easily apply the method to other optimizers and hyperparameters (eg momentum coefficients). We can even recursively apply the method to its own hyper-hyperparameters, and so on ad infinitum. As these towers of optimizers grow taller, they become less sensitive to the initial choice of hyperparameters. We present experiments validating this for MLPs, CNNs, and RNNs. Finally, we provide a simple PyTorch implementation of this algorithm (see http://people. csail. mit. edu/kach/gradient-descent-the-ultimate-optimizer).
more » « less
Designing Perceptual Puzzles by Differentiating Probabilistic Programs

https://doi.org/10.1145/3528233.3530715

Chandra, Kartik; Li, Tzu-Mao; Tenenbaum, Joshua; Ragan-Kelley, Jonathan (August 2022, ACM SIGGRAPH Conference Proceedings)

Full Text Available

Search for: All records