NSF PAR Search | NSF Public Access Repository

Circuit Optimization using Arithmetic Table Lookups

https://doi.org/10.1145/3729258

Malik, Raghav; Paranjape, Vedant; Kulkarni, Milind (June 2025, Proceedings of the ACM on Programming Languages)

Fully Homomorphic Encryption (FHE) is a cryptographic technique that enables privacy-preserving computation. State-of-the-art Boolean FHE implementations provide a very low-level interface, usually exposing a limited set of Boolean gates that programmers must use to write their FHE applications. This programming model is unnecessarily restrictive: many Boolean FHE schemes support programmable bootstrapping, an operation that allows evaluation of an arbitrary fixed-size lookup table. However, most modern FHE compilers are only capable of reasoning about traditional Boolean circuits, and therefore struggle to take full advantage of programmable bootstrapping. We present COATL, an FHE compiler that makes use of programmable bootstrapping to produce circuits that are smaller and more efficient than their traditional Boolean counterparts. COATL generates circuits using arithmetic lookup tables, a novel abstraction we introduce for reasoning about computations in Boolean FHE programs. We demonstrate on a variety of benchmarks that COATL can generate circuits that run up to 1.5× faster than those generated by other state-of-the-art compilation strategies.
Free, publicly-accessible full text available June 10, 2026
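As a rough illustration of why a lookup table over an arithmetic combination of bits can replace several Boolean gates, the plaintext sketch below computes a full adder with two table lookups on the sum of the input bits. It is not COATL's algorithm and involves no encryption; the helper names `arithmetic_lut` and `full_adder`, and the choice of unit weights, are assumptions made for this example.

```python
from itertools import product


def arithmetic_lut(weights, table):
    """Build a function that sums weighted input bits and looks the sum up in `table`.

    Hypothetical helper: a plaintext stand-in for a lookup applied to a linear
    combination of bits. No ciphertexts or bootstrapping are modeled here.
    """
    def evaluate(*bits):
        index = sum(w * b for w, b in zip(weights, bits))
        return table[index]
    return evaluate


# With unit weights, the sum of three bits ranges over 0..3, so each table has 4 entries.
majority = arithmetic_lut(weights=(1, 1, 1), table=(0, 0, 1, 1))  # carry-out bit
xor3 = arithmetic_lut(weights=(1, 1, 1), table=(0, 1, 0, 1))      # sum bit


def full_adder(a, b, carry_in):
    """Return (sum_bit, carry_out) using two lookups instead of a gate network."""
    return xor3(a, b, carry_in), majority(a, b, carry_in)


# Check the adder against ordinary integer addition for all eight inputs.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert a + b + c == s + 2 * carry
```

A gate-level full adder typically needs around five two-input gates, whereas this formulation needs two lookups on a shared sum; whether that translates into a speedup under a particular FHE scheme depends on details not covered in this sketch.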
RT-BarnesHut: Accelerating Barnes-Hut Using Ray-Tracing Hardware

Nagarajan, Vani; Gangaraju, Rohan; Sundararajah, Kirshanthan; Pelenitsyn, Artem; Kulkarni, Milind (February 2025, ACM)

Free, publicly-accessible full text available February 28, 2026
SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest Restructuring

https://doi.org/10.1145/3689730

Dias, Adhitha; Anderson, Logan; Sundararajah, Kirshanthan; Pelenitsyn, Artem; Kulkarni, Milind (October 2024, Proceedings of the ACM on Programming Languages)

Automated code generation and performance enhancements for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to generate asymptotically better schedules for complex sparse tensor expressions using kernel fission and fusion. We present generalized loop restructuring transformations to reduce asymptotic time complexity and memory footprint. Furthermore, we present an auto-scheduler with a partially ordered set (poset)-based cost model that uses both time and auxiliary memory complexities to prune the search space of schedules. In addition, we highlight the use of Satisfiability Modulo Theories (SMT) solvers in sparse auto-schedulers to narrow the approximated Pareto frontier of better schedules down to the smallest possible number of schedules, with user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select better-performing schedules and generate code for them. Our results show that the schedules provided by the auto-scheduler achieve orders-of-magnitude speedup compared to the code generated by the Tensor Algebra Compiler (TACO) for several computations on different real-world tensors.
Full Text Available
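The idea of pruning schedules with a partial order over time and memory costs can be sketched as a plain Pareto filter. This is a simplified illustration only, not SparseAuto's cost model or its SMT-based pruning; the schedule names and the (time exponent, memory exponent) cost pairs are hypothetical.

```python
def dominates(a, b):
    """True if cost `a` is no worse than `b` in every dimension and differs from it."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_frontier(schedules):
    """Keep only the schedules whose cost is not dominated by any other schedule."""
    return {
        name: cost
        for name, cost in schedules.items()
        if not any(dominates(other, cost) for other in schedules.values())
    }


# Hypothetical candidates: (time exponent, auxiliary memory exponent).
candidates = {
    "unfused": (4, 0),
    "fused_with_temp": (3, 1),
    "fused_big_temp": (3, 2),   # dominated by "fused_with_temp"
    "naive": (5, 1),            # dominated by "unfused" and "fused_with_temp"
}

frontier = pareto_frontier(candidates)
print(sorted(frontier))
```

Because the two cost dimensions are only partially ordered, "unfused" and "fused_with_temp" are incomparable and both survive; choosing between them needs extra information, such as user-defined constraints.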
Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals

https://doi.org/10.1145/3652605

Singhal, Vidush; Sakka, Laith; Sundararajah, Kirshanthan; Newton, Ryan; Kulkarni, Milind (June 2024, ACM Transactions on Architecture and Code Optimization)

Many applications are designed to perform traversals on tree-like data structures. Fusing and parallelizing these traversals enhance the performance of applications. Fusing multiple traversals improves the locality of the application. The runtime of an application can be significantly reduced by extracting parallelism and utilizing multi-threading. Prior frameworks have tried to fuse and parallelize tree traversals using coarse-grained approaches, leading to missed fine-grained opportunities for improving performance. Other frameworks have successfully supported fine-grained fusion on heterogeneous tree types but fall short regarding parallelization. We introduce a new framework, Orchard, built on top of Grafter. Orchard’s novelty lies in allowing the programmer to transform tree traversal applications by automatically applying fine-grained fusion and extracting heterogeneous parallelism. Orchard allows the programmer to write general tree traversal applications in a simple and elegant embedded Domain-Specific Language (eDSL). We show that the combination of fine-grained fusion and heterogeneous parallelism performs better than each alone when the conditions are met.
Full Text Available
Garbage Collection for Mostly Serialized Heaps

https://doi.org/10.1145/3652024.3665512

Koparkar, Chaitanya S; Singhal, Vidush; Gupta, Aditya; Rainey, Mike; Vollmer, Michael; Pelenitsyn, Artem; Tobin-Hochstadt, Sam; Kulkarni, Milind; Newton, Ryan R (June 2024, ACM)

Full Text Available
Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing

https://doi.org/10.1145/3650200.3656601

Mandarapu, Durga Keerthi; Nagarajan, Vani; Pelenitsyn, Artem; Kulkarni, Milind (May 2024, ACM)

Full Text Available
Optimizing Layout of Recursive Datatypes with Marmoset: Or, Algorithms + Data Layouts = Efficient Programs

https://doi.org/10.4230/LIPICS.ECOOP.2024.38

Singhal, Vidush; Koparkar, Chaitanya; Zullo, Joseph; Pelenitsyn, Artem; Vollmer, Michael; Rainey, Mike; Newton, Ryan; Kulkarni, Milind (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Aldrich, Jonathan; Salvaneschi, Guido (Ed.)
While programmers know that memory representation of data structures can have significant effects on performance, compiler support to optimize the layout of those structures is an under-explored field. Prior work has optimized the layout of individual, non-recursive structures without considering how collections of those objects in linked or recursive data structures are laid out. This work introduces Marmoset, a compiler that optimizes the layouts of algebraic datatypes (ADTs), with a special focus on producing highly optimized, packed data layouts where recursive structures can be traversed with minimal pointer chasing. Marmoset performs an analysis of how a recursive ADT is used across functions to choose a global layout that promotes simple, strided access for that ADT in memory. It does so by building and solving a constraint system to minimize an abstract cost model, yielding a predicted efficient layout for the ADT. Marmoset then builds on top of Gibbon, a prior compiler for packed, mostly-serial representations, to synthesize optimized ADTs. We show experimentally that Marmoset is able to choose optimal layouts across a series of microbenchmarks and case studies, outperforming both Gibbon’s baseline approach and MLton, a Standard ML compiler that uses traditional pointer-heavy representations.
Full Text Available
RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search

https://doi.org/10.1145/3577193.3593738

Nagarajan, Vani; Mandarapu, Durga; Kulkarni, Milind (June 2023, ACM)

Full Text Available
RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware

https://doi.org/10.1109/IPDPS54959.2023.00100

Nagarajan, Vani; Kulkarni, Milind (May 2023, IEEE)

Full Text Available
