NSF PAR Search | NSF Public Access Repository

RT-BarnesHut: Accelerating Barnes-Hut Using Ray-Tracing Hardware

Nagarajan, Vani; Gangaraju, Rohan; Sundararajah, Kirshanthan; Pelenitsyn, Artem; Kulkarni, Milind (February 2025, ACM)

Free, publicly-accessible full text available February 28, 2026

SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest Restructuring

https://doi.org/10.1145/3689730

Dias, Adhitha; Anderson, Logan; Sundararajah, Kirshanthan; Pelenitsyn, Artem; Kulkarni, Milind (October 2024, Proceedings of the ACM on Programming Languages)

Automated code generation and performance enhancements for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to generate asymptotically better schedules for complex sparse tensor expressions using kernel fission and fusion. We present generalized loop restructuring transformations to reduce asymptotic time complexity and memory footprint. Furthermore, we present an auto-scheduler whose partially ordered set (poset)-based cost model uses both time and auxiliary memory complexities to prune the search space of schedules. In addition, we highlight the use of Satisfiability Modulo Theories (SMT) solvers in sparse auto-schedulers to approximate the Pareto frontier of better schedules with the smallest possible number of schedules, subject to user-defined constraints available at compile time. Finally, we show that our auto-scheduler can select better-performing schedules and generate code for them. Our results show that the schedules provided by the auto-scheduler achieve orders-of-magnitude speedups compared to the code generated by the Tensor Algebra Compiler (TACO) for several computations on different real-world tensors.
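The pruning idea in this abstract can be illustrated with a small sketch. It is not the paper's cost model: the `Schedule` type, the schedule names, and the use of integer exponents as stand-ins for symbolic time and auxiliary-memory complexities are all assumptions made for illustration. Under a partial order on (time, memory), a schedule is dropped when another schedule is no worse on both measures and strictly better on at least one.

```python
from typing import List, NamedTuple


class Schedule(NamedTuple):
    name: str       # hypothetical label for a candidate schedule
    time_exp: int   # assumed stand-in for the asymptotic time complexity
    mem_exp: int    # assumed stand-in for the auxiliary memory complexity


def dominates(a: Schedule, b: Schedule) -> bool:
    """True if a is no worse than b on both measures and strictly better on one."""
    no_worse = a.time_exp <= b.time_exp and a.mem_exp <= b.mem_exp
    strictly_better = a.time_exp < b.time_exp or a.mem_exp < b.mem_exp
    return no_worse and strictly_better


def pareto_frontier(schedules: List[Schedule]) -> List[Schedule]:
    """Keep only the schedules that no other schedule dominates."""
    return [s for s in schedules
            if not any(dominates(other, s) for other in schedules)]


candidates = [
    Schedule("unfused", 3, 0),
    Schedule("fused", 2, 1),
    Schedule("fused_big_temp", 2, 2),
    Schedule("slow_and_big", 3, 1),
]
frontier = pareto_frontier(candidates)
print([s.name for s in frontier])
```

Here `unfused` and `fused` are incomparable (one is faster, the other uses less memory), so both survive, which is why a partial order rather than a single scalar cost is needed.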
Full Text Available

Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing

https://doi.org/10.1145/3650200.3656601

Mandarapu, Durga Keerthi; Nagarajan, Vani; Pelenitsyn, Artem; Kulkarni, Milind (May 2024, ACM)

Full Text Available

Garbage Collection for Mostly Serialized Heaps

https://doi.org/10.1145/3652024.3665512

Koparkar, Chaitanya S; Singhal, Vidush; Gupta, Aditya; Rainey, Mike; Vollmer, Michael; Pelenitsyn, Artem; Tobin-Hochstadt, Sam; Kulkarni, Milind; Newton, Ryan R (June 2024, ACM)

Full Text Available

Optimizing Layout of Recursive Datatypes with Marmoset: Or, Algorithms + Data Layouts = Efficient Programs

https://doi.org/10.4230/LIPICS.ECOOP.2024.38

Singhal, Vidush; Koparkar, Chaitanya; Zullo, Joseph; Pelenitsyn, Artem; Vollmer, Michael; Rainey, Mike; Newton, Ryan; Kulkarni, Milind (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Aldrich, Jonathan; Salvaneschi, Guido (Ed.)
While programmers know that memory representation of data structures can have significant effects on performance, compiler support to optimize the layout of those structures is an under-explored field. Prior work has optimized the layout of individual, non-recursive structures without considering how collections of those objects in linked or recursive data structures are laid out. This work introduces Marmoset, a compiler that optimizes the layouts of algebraic datatypes, with a special focus on producing highly optimized, packed data layouts where recursive structures can be traversed with minimal pointer chasing. Marmoset performs an analysis of how a recursive ADT is used across functions to choose a global layout that promotes simple, strided access for that ADT in memory. It does so by building and solving a constraint system to minimize an abstract cost model, yielding a predicted efficient layout for the ADT. Marmoset then builds on top of Gibbon, a prior compiler for packed, mostly-serial representations, to synthesize optimized ADTs. We show experimentally that Marmoset is able to choose optimal layouts across a series of microbenchmarks and case studies, outperforming both Gibbon’s baseline approach and MLton, a Standard ML compiler that uses traditional pointer-heavy representations.
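As a rough illustration of what a packed layout means, the sketch below serializes a pointer-based binary tree into a flat preorder buffer and then sums its values with a single linear scan. This is not Marmoset's or Gibbon's representation: the `Node` class, the `LEAF`/`NODE` tags, and the choice of preorder are assumptions, and a Python list only approximates a contiguous buffer.

```python
from dataclasses import dataclass
from typing import List, Optional

LEAF, NODE = 0, 1  # assumed one-slot tags marking each constructor


@dataclass
class Node:
    """Pointer-based binary tree; None stands for an empty leaf."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def pack(tree: Optional[Node], out: List[int]) -> List[int]:
    """Serialize in preorder: NODE, value, <left subtree>, <right subtree>."""
    if tree is None:
        out.append(LEAF)
    else:
        out.append(NODE)
        out.append(tree.value)
        pack(tree.left, out)
        pack(tree.right, out)
    return out


def sum_packed(buf: List[int]) -> int:
    """Sum all values with one left-to-right scan and no pointer chasing."""
    total, i = 0, 0
    while i < len(buf):
        if buf[i] == NODE:
            total += buf[i + 1]  # the value sits right after its tag
            i += 2
        else:
            i += 1               # a LEAF tag carries no payload
    return total


tree = Node(1, Node(2), Node(3))
packed = pack(tree, [])
print(packed, sum_packed(packed))
```

The field order matters: a traversal that visits fields in the same order they are stored reads the buffer sequentially, which is the kind of access pattern a layout optimizer would try to match to how the datatype is used.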
Full Text Available

Type stability in Julia: avoiding performance pathologies in JIT compilation

https://doi.org/10.1145/3485527

Pelenitsyn, Artem; Belyakova, Julia; Chung, Benjamin; Tate, Ross; Vitek, Jan (October 2021, Proceedings of the ACM on Programming Languages)

As a scientific programming language, Julia strives for performance but also provides high-level productivity features. To avoid performance pathologies, Julia users are expected to adhere to a coding discipline that enables so-called type stability. Informally, a function is type stable if the type of the output depends only on the types of the inputs, not their values. This paper provides a formal definition of type stability as well as a stronger property of type groundedness, shows that groundedness enables compiler optimizations, and proves the compiler correct. We also perform a corpus analysis to uncover how these type-related properties manifest in practice.
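The informal definition above can be shown with a small analogue. The sketch is in Python rather than Julia, so it only illustrates the property and says nothing about Julia's compiler; the function names and the `output_types` helper are hypothetical. Both functions take an `int`, but only one always returns the same type.

```python
from typing import Callable, Iterable, Set, Union


def unstable(x: int) -> Union[int, float]:
    # Output type depends on the *value* of x: int when positive, float otherwise.
    return x if x > 0 else 0.0


def stable(x: int) -> int:
    # Output type depends only on the input *type*: always int.
    return x if x > 0 else 0


def output_types(f: Callable[[int], object], inputs: Iterable[int]) -> Set[type]:
    """Collect the runtime types f returns over inputs that share one type."""
    return {type(f(x)) for x in inputs}


print(output_types(unstable, [-1, 2]))
print(output_types(stable, [-1, 2]))
```

A single observed output type over sample inputs is only evidence of stability, not a proof; the paper gives a formal definition instead of relying on this kind of testing.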
