

This content will become publicly available on June 17, 2026

Title: Galley: Modern Query Optimization for Sparse Tensor Programs
The tensor programming abstraction is a foundational paradigm that allows users to write high-performance programs via a high-level imperative interface. Recent work on sparse tensor compilers has extended this paradigm to sparse tensors (i.e., tensors where most entries are not explicitly represented). With these systems, users define the semantics of the program and the algorithmic decisions in a concise language that can be compiled to efficient low-level code. However, these systems still require users to make complex decisions about program structure and memory layouts to write efficient programs. This work presents Galley, a system for declarative tensor programming that allows users to write efficient tensor programs without making complex algorithmic decisions. Galley is the first system to perform cost-based lowering of sparse tensor algebra to the imperative language of sparse tensor compilers, and the first to optimize arbitrary operators beyond Σ and *. First, it decomposes the input program into a sequence of aggregation steps through a novel extension of the FAQ framework. Second, Galley optimizes and converts each aggregation step to a concrete program, which is compiled and executed with a sparse tensor compiler. We show that Galley produces programs that are 1-300x faster than competing methods for machine learning over joins and 5-20x faster than a state-of-the-art relational database for subgraph counting workloads, with minimal optimization overhead.
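The abstract's point about cost-based lowering can be illustrated with a toy example: the same declarative sparse expression admits several evaluation orders with very different costs, and a declarative front end leaves that choice to an optimizer rather than the programmer. The sketch below is a minimal illustration in plain scipy/numpy, not Galley's actual API; the tensor shapes and densities are made up.

```python
# Minimal sketch (not Galley's API): the same declarative expression
#   out[i] = sum_j sum_k A[i, j] * B[j, k] * v[k]
# can be lowered in different orders with very different costs.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
A = sp.random(2000, 2000, density=0.001, random_state=rng, format="csr")
B = sp.random(2000, 2000, density=0.001, random_state=rng, format="csr")
v = rng.standard_normal(2000)

# Plan 1: materialize the sparse-sparse product first, then multiply by v.
out1 = (A @ B) @ v

# Plan 2: contract with the dense vector first, so only cheap sparse-dense
# matrix-vector products are ever materialized.
out2 = A @ (B @ v)

assert np.allclose(out1, out2)
# Both plans compute the same result; a cost-based optimizer of the kind
# described above picks the cheaper plan from the declarative spec.
```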
Award ID(s):
2314527 2312195
PAR ID:
10627045
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM SIGMOD
Date Published:
Journal Name:
Proceedings of the ACM on Management of Data
Volume:
3
Issue:
3
ISSN:
2836-6573
Page Range / eLocation ID:
1 to 24
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fully Homomorphic Encryption (FHE) is a cryptographic technique that enables privacy-preserving computation. State-of-the-art Boolean FHE implementations provide a very low-level interface, usually exposing a limited set of Boolean gates that programmers must use to write their FHE applications. This programming model is unnecessarily restrictive: many Boolean FHE schemes support programmable bootstrapping, an operation that allows evaluation of an arbitrary fixed-size lookup table. However, most modern FHE compilers are only capable of reasoning about traditional Boolean circuits, and therefore struggle to take full advantage of programmable bootstrapping. We present COATL, an FHE compiler that makes use of programmable bootstrapping to produce circuits that are smaller and more efficient than their traditional Boolean counterparts. COATL generates circuits using arithmetic lookup tables, a novel abstraction we introduce for reasoning about computations in Boolean FHE programs. We demonstrate on a variety of benchmarks that COATL can generate circuits that run up to 1.5× faster than those generated by other state-of-the-art compilation strategies.
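As a plaintext illustration of the idea this abstract describes, the sketch below contrasts computing a small function gate by gate with evaluating it as a single fixed-size lookup table, which is roughly what programmable bootstrapping enables on an encrypted value. This is a toy model in ordinary Python, not COATL or any real FHE library.

```python
# Toy plaintext model (not an FHE library): compare a gate-level circuit with
# a single lookup-table evaluation of the same 3-bit function f(x) = popcount(x).

def popcount_gates(b2: int, b1: int, b0: int) -> tuple[int, int]:
    """Gate-by-gate popcount of three bits, the way a Boolean-only FHE
    backend would express it (each gate costs one homomorphic operation)."""
    s = b0 ^ b1              # XOR gate
    c = b0 & b1              # AND gate
    low = s ^ b2             # XOR gate
    high = c | (s & b2)      # AND + OR gates
    return high, low

# Programmable bootstrapping instead evaluates an arbitrary fixed-size table
# in one shot; here the "table" is just a precomputed list indexed by x.
POPCOUNT_TABLE = [bin(x).count("1") for x in range(8)]

def popcount_lut(x: int) -> int:
    return POPCOUNT_TABLE[x]

for x in range(8):
    b2, b1, b0 = (x >> 2) & 1, (x >> 1) & 1, x & 1
    high, low = popcount_gates(b2, b1, b0)
    assert 2 * high + low == popcount_lut(x)
```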
  2. Tensor programs often need to process large tensors (vectors, matrices, or higher-order tensors) that require a specialized storage format for their memory layout. Several such layouts have been proposed in the literature, such as the Coordinate format, the Compressed Sparse Row format, and many others, which were designed to optimally store tensors with specific sparsity properties. However, existing tensor processing systems require specialized extensions in order to take advantage of every new storage format. In this paper we describe a system that allows users to define flexible storage formats in a declarative tensor query language, similar to the language used by the tensor program. The programmer only needs to write storage mappings, which describe, in a declarative way, how the tensors are laid out in main memory. Then, we describe a cost-based optimizer that optimizes the tensor program for the specific memory layout. We empirically demonstrate significant performance improvements compared to state-of-the-art tensor processing systems.
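As background for the storage formats named in this abstract, the sketch below spells out how the same sparse matrix is laid out in the Coordinate (COO) and Compressed Sparse Row (CSR) formats. The arrays shown are the standard definitions of those formats, not this paper's storage-mapping language.

```python
# Standard COO and CSR layouts of the same 3x4 sparse matrix
# (illustrative only; this is not the paper's declarative mapping language).
import numpy as np
import scipy.sparse as sp

dense = np.array([[0, 5, 0, 0],
                  [3, 0, 0, 2],
                  [0, 0, 4, 0]])

coo = sp.coo_matrix(dense)
# COO stores one (row, col, value) triple per nonzero.
print(coo.row)   # [0 1 1 2]
print(coo.col)   # [1 0 3 2]
print(coo.data)  # [5 3 2 4]

csr = sp.csr_matrix(dense)
# CSR compresses the row coordinates into per-row offsets (indptr),
# so row i's nonzeros live in data[indptr[i]:indptr[i+1]].
print(csr.indptr)   # [0 1 3 4]
print(csr.indices)  # [1 0 3 2]
print(csr.data)     # [5 3 2 4]
```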
  3. Producing efficient array code is crucial in high-performance domains like image processing and machine learning. It requires the ability to control factors like compute intensity and locality by reordering computations into different stages and granularities with respect to where they are stored. However, traditional pure, functional tensor languages struggle to do so. In a previous publication, we introduced ATL as a pure, functional tensor language capable of systematically decoupling compute and storage order via a set of high-level combinators known as reshape operators. Reshape operators are a unique functional-programming construct since they manipulate storage location in the generated code by modifying the indices that appear on the left-hand sides of storage expressions. We present a formal correctness proof for an implementation of the compilation algorithm, marking the first verification of a lowering algorithm targeting imperative loop nests from a source functional language that enables separate control of compute and storage ordering. One of the core difficulties of this proof was properly formulating the complex invariants needed to ensure that these storage-index remappings are well-formed. Notably, this exercise revealed a soundness bug in the original published compilation algorithm regarding the truncation reshape operators. Our fix is a new type system that captures safety conditions that were previously implicit and enables us to prove compiler correctness for well-typed source programs. We evaluate this type system and compiler implementation on a range of common programs and optimizations, including but not limited to those previously studied, demonstrating performance comparable to established compilers like Halide.
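The compute/storage decoupling this abstract refers to can be pictured with a tiny staging example: the same two-stage blur either recomputes its producer inline or materializes it into a separate buffer, changing where intermediate values are stored without changing the result. The sketch below is a generic illustration in plain numpy, not ATL's combinators or its generated code.

```python
# Generic illustration (not ATL's reshape operators): the same two-stage blur
# computed with the producer inlined versus materialized into its own buffer.
import numpy as np

x = np.arange(10, dtype=float)

def blur_inline(x):
    # Producer values are recomputed inside the consumer loop; nothing extra
    # is stored beyond the output.
    out = np.empty(len(x) - 2)
    for i in range(len(out)):
        left = (x[i] + x[i + 1]) / 2        # blurx[i], recomputed
        right = (x[i + 1] + x[i + 2]) / 2   # blurx[i+1], recomputed
        out[i] = (left + right) / 2
    return out

def blur_staged(x):
    # Producer is materialized into a separate storage buffer first;
    # the consumer then reads from that buffer.
    blurx = np.empty(len(x) - 1)
    for i in range(len(blurx)):
        blurx[i] = (x[i] + x[i + 1]) / 2
    out = np.empty(len(blurx) - 1)
    for i in range(len(out)):
        out[i] = (blurx[i] + blurx[i + 1]) / 2
    return out

# Same values, different storage and compute order.
assert np.allclose(blur_inline(x), blur_staged(x))
```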
  4. This paper introduces a method to synthesize a 3D tensor field within a constrained geometric domain represented as a tetrahedral mesh. Whereas previous techniques optimize for isotropic fields, we focus on anisotropic tensor fields that are smooth and aligned with the domain boundary or user guidance. The key ingredient of our method is a novel computational design framework, built on top of the symmetric orthogonally decomposable (odeco) tensor representation, to optimize the stretching ratios and orientations for each tensor in the domain. In contrast to past techniques designed only for isotropic tensors, we demonstrate the efficacy of our approach in generating smooth volumetric tensor fields with high anisotropy and shape conformity, especially for domains with complex shapes. We apply these anisotropic tensor fields to various applications, such as anisotropic meshing, structural mechanics, and fabrication.
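To make the "stretching ratios and orientations" in this abstract concrete, the sketch below builds a single anisotropic symmetric second-order tensor from a chosen rotation and per-axis stretch ratios via an eigendecomposition. This is only the standard linear-algebra construction of such a tensor, not the paper's odeco optimization framework; the ratios and angles are made up.

```python
# Build one anisotropic symmetric 3x3 tensor from an orientation (rotation)
# and per-axis stretching ratios: T = R @ diag(ratios) @ R.T.
# Standard linear algebra only; not the paper's odeco design framework.
import numpy as np
from scipy.spatial.transform import Rotation

ratios = np.array([4.0, 1.0, 0.25])          # strong stretch along one axis
R = Rotation.from_euler("xyz", [30, 45, 10], degrees=True).as_matrix()

T = R @ np.diag(ratios) @ R.T

# The tensor is symmetric; its eigenvalues recover the stretching ratios
# and its eigenvectors recover the chosen orientation.
assert np.allclose(T, T.T)
assert np.allclose(np.sort(np.linalg.eigvalsh(T)), np.sort(ratios))
```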
  5. Fully Homomorphic Encryption (FHE) enables computing on encrypted data, letting clients securely offload computation to untrusted servers. While enticing, FHE has two key challenges that limit its applicability: it has high performance overheads (10,000× over unencrypted computation) and it is extremely hard to program. Recent hardware accelerators and algorithmic improvements have reduced FHE's overheads and enabled large applications to run under FHE. These large applications exacerbate FHE's programmability challenges. Writing FHE programs directly is hard because FHE schemes expose a restrictive, low-level interface that prevents abstraction and composition. Specifically, FHE requires packing encrypted data into large vectors (tens of thousands of elements long), FHE provides limited operations on these vectors, and values have noise that grows with each operation, which creates unintuitive performance tradeoffs. As a result, translating large applications, like neural networks, into efficient FHE circuits takes substantial tedious work. We address FHE's programmability challenges with the Fhelipe FHE compiler. Fhelipe exposes a simple, numpy-style tensor programming interface, and compiles high-level tensor programs into efficient FHE circuits. Fhelipe's key contribution is automatic data packing, which chooses data layouts for tensors and packs them into ciphertexts to maximize performance. Our novel framework considers a wide range of layouts and optimizes them analytically. This lets Fhelipe compile large FHE programs efficiently, unlike prior FHE compilers, which either use inefficient layouts or do not scale beyond tiny programs. We evaluate Fhelipe on both a state-of-the-art FHE accelerator and a CPU. Fhelipe is the first compiler that matches or exceeds the performance of large hand-optimized FHE applications, like deep neural networks, and it outperforms a state-of-the-art FHE compiler by gmean 18.5×. At the same time, Fhelipe dramatically simplifies programming, reducing code size by 10–48×.
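The layout question this abstract raises shows up even in plaintext: packing a matrix row-major into fixed-size "slot" vectors supports elementwise operations, but a diagonal packing lets a matrix-vector product be computed with only slot-wise multiplies and rotations, the operations vectorized FHE schemes actually provide. The sketch below models this in plain numpy; it is a generic illustration of the classic diagonal layout, not Fhelipe's packing algorithm.

```python
# Plaintext model of ciphertext "slots": a matrix-vector product computed with
# only slot-wise multiplies and rotations by packing the matrix diagonally.
# Generic illustration of the classic diagonal layout, not Fhelipe's algorithm.
import numpy as np

n = 4
M = np.arange(n * n, dtype=float).reshape(n, n)
v = np.array([1.0, 2.0, 3.0, 4.0])

# Layout choice 1: row-major packing (fine for elementwise ops, but a matvec
# would then need one separate inner product per row).
row_major = M.reshape(-1)

# Layout choice 2: diagonal packing. diag_k[i] = M[i, (i + k) % n].
diagonals = [np.array([M[i, (i + k) % n] for i in range(n)]) for k in range(n)]

# With diagonal packing, y = sum_k diag_k * rotate(v, k), where rotate is the
# slot-rotation primitive FHE schemes expose (modeled here with np.roll).
y = sum(diagonals[k] * np.roll(v, -k) for k in range(n))

assert np.allclose(y, M @ v)
```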