NSF PAR Search | NSF Public Access Repository


Flo: A Semantic Foundation for Progressive Stream Processing

https://doi.org/10.1145/3704845

Laddad, Shadaj; Cheung, Alvin; Hellerstein, Joseph M; Milano, Mae (January 2025, Proceedings of the ACM on Programming Languages)

Streaming systems are present throughout modern applications, processing continuous data in real-time. Existing streaming languages have a variety of semantic models and guarantees that are often incompatible. Yet all these languages are considered streaming---what do they have in common? In this paper, we identify two general yet precise semantic properties: streaming progress and eager execution. Together, they ensure that streaming outputs are deterministic and kept fresh with respect to streaming inputs. We formally define these properties in the context of Flo, a parameterized streaming language that abstracts over dataflow operators and the underlying structure of streams. It leverages a lightweight type system to distinguish bounded streams, which allow operators to block on termination, from unbounded ones. Furthermore, Flo provides constructs for dataflow composition and nested graphs with cycles. To demonstrate the generality of our properties, we show how key ideas from representative streaming and incremental computation systems---Flink, LVars, and DBSP---have semantics that can be modeled in Flo and guarantees that map to our properties.
Free, publicly-accessible full text available January 7, 2026
Inferring Visualization Intent from Conversation

https://doi.org/10.1145/3627673.3679589

Li, Haotian; Chalapathi, Nithin; Qu, Huamin; Cheung, Alvin; Parameswaran, Aditya G (October 2024, ACM)

Full Text Available
ADELT: Transpilation between Deep Learning Frameworks

https://doi.org/10.24963/ijcai.2024/694

Gong, Linyuan; Wang, Jiayi; Cheung, Alvin (August 2024, International Joint Conferences on Artificial Intelligence Organization)

We propose the Adversarial DEep Learning Transpiler (ADELT), a novel approach to source-to-source transpilation between deep learning frameworks. ADELT uniquely decouples code skeleton transpilation and API keyword mapping. For code skeleton transpilation, it uses few-shot prompting on large language models (LLMs), while for API keyword mapping, it uses contextual embeddings from a code-specific BERT. These embeddings are trained in a domain-adversarial setup to generate a keyword translation dictionary. ADELT is trained on an unlabeled web-crawled deep learning corpus, without relying on any hand-crafted rules or parallel data. It outperforms state-of-the-art transpilers, improving pass@1 rate by 16.2 pts and 15.0 pts for PyTorch-Keras and PyTorch-MXNet transpilation pairs respectively. We provide open access to our code at https://github.com/gonglinyuan/adelt
Full Text Available
QED: A Powerful Query Equivalence Decider for SQL

https://doi.org/10.14778/3681954.3682024

Wang, Shuxian; Pan, Sicheng; Cheung, Alvin (July 2024, Proceedings of the VLDB Endowment)

Checking query equivalence is of great significance in database systems. Prior work in automated query equivalence checking took the first steps in formally modeling and reasoning about query optimization rules, but only supports a limited number of query features. In this paper, we present Qed, a new framework for query equivalence checking based on bag semantics. Qed uses a new formalism called Q-expressions that models queries using different normal forms for efficient equivalence checking, and models features such as integrity constraints and NULLs in a principled way unlike prior work. Our formalism also allows us to define a new query fragment that encompasses many real-world queries with a complete equivalence checking algorithm, assuming a complete first-order theory solver. Empirically, Qed can verify 299 out of 444 query pairs extracted from the Calcite framework and 979 out of 1287 query pairs extracted from CockroachDB, which is more than 2× the number of cases proven by the prior state-of-the-art solver.
Full Text Available
Syntactic Code Search with Sequence-to-Tree Matching: Supporting Syntactic Search with Incomplete Code Fragments

https://doi.org/10.1145/3656460

Matute, Gabriel; Ni, Wode; Barik, Titus; Cheung, Alvin; Chasins, Sarah E (June 2024, Proceedings of the ACM on Programming Languages)

Lightweight syntactic analysis tools like Semgrep and Comby leverage the tree structure of code, making them more expressive than string and regex search. Unlike traditional language frameworks (e.g., ESLint) that analyze codebases via explicit syntax tree manipulations, these tools use query languages that closely resemble the source language. However, state-of-the-art matching techniques for these tools require queries to be complete and parsable snippets, which makes in-progress query specifications useless. We propose a new search architecture that relies only on tokenizing (not parsing) a query. We introduce a novel language and matching algorithm to support tree-aware wildcards on this architecture by building on tree automata. We also present stsearch, a syntactic search tool leveraging our approach. In contrast to past work, our approach supports syntactic search even for previously unparsable queries. We show empirically that stsearch can support all tokenizable queries, while still providing results comparable to Semgrep for existing queries. Our work offers evidence that lightweight syntactic code search can accept in-progress specifications, potentially improving support for interactive settings. CCS Concepts: • Software and its engineering → Formal language definitions; Software maintenance tools; • Information systems → Query representation; • Theory of computation → Tree languages.
To Tile or not to Tile, That is the Question

https://doi.org/10.1109/IPDPSW63119.2024.00096

Haan, Altan; Popovici, Doru Thom; Sen, Koushik; Iancu, Costin; Cheung, Alvin (May 2024, IEEE)

Full Text Available
Spatialyze: A Geospatial Video Analytics System with Spatial-Aware Optimizations

https://doi.org/10.14778/3665844.3665846

Kittivorawong, Chanwut; Ge, Yongming; Helal, Yousef; Cheung, Alvin (May 2024, Proceedings of the VLDB Endowment)

Videos that are shot using commodity hardware such as phones and surveillance cameras record various metadata such as time and location. We encounter such geospatial videos on a daily basis and such videos have been growing in volume significantly. Yet, we do not have data management systems that allow users to interact with such data effectively. In this paper, we describe Spatialyze, a new framework for end-to-end querying of geospatial videos. Spatialyze comes with a domain-specific language where users can construct geospatial video analytic workflows using a 3-step, declarative, build-filter-observe paradigm. Internally, Spatialyze leverages the declarative nature of such workflows, the temporal-spatial metadata stored with videos, and physical behavior of real-world objects to optimize the execution of workflows. Our results using real-world videos and workflows show that Spatialyze can reduce execution time by up to 5.3×, while maintaining up to 97.1% accuracy compared to unoptimized execution.
Full Text Available
Tenspiler: A Verified-Lifting-Based Compiler for Tensor Operations

https://doi.org/10.4230/LIPIcs.ECOOP.2024.32

Qiu, Jie; Cai, Colin; Bhatia, Sahil; Hasabnis, Niranjan; Seshia, Sanjit A; Cheung, Alvin (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Aldrich, Jonathan; Salvaneschi, Guido (Ed.)
Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to utilize them, code must be written specifically using the APIs / ISAs of such software frameworks or hardware accelerators. Importantly, given the fast pace of innovation in these domains, code written today quickly becomes legacy as new frameworks and accelerators are developed, and migrating such legacy code manually is a considerable effort. To enable developers in leveraging such DSLs while preserving their current programming paradigm, we present Tenspiler, a verified-lifting-based compiler that uses program synthesis to translate sequential programs written in general-purpose programming languages (e.g., C++ or Python code that does not leverage any specialized framework or accelerator) into tensor operations. Central to Tenspiler is our carefully crafted yet simple intermediate language, named TensIR, that expresses tensor operations. TensIR enables efficient lifting, verification, and code generation. Unlike classical pattern-matching-based compilers, Tenspiler uses program synthesis to translate input code into TensIR, which is then compiled to the target API / ISA. Currently, Tenspiler already supports six DSLs, spanning a broad spectrum of software and hardware environments. Furthermore, we show that new backends can be easily supported by Tenspiler by adding simple pattern-matching rules for TensIR. 
Using 10 real-world code benchmark suites, our experimental evaluation shows that by translating code to be executed on 6 different software frameworks and hardware devices, Tenspiler offers on average 105× kernel and 9.65× end-to-end execution time improvement over the fully-optimized sequential implementation of the same benchmarks.
Full Text Available
SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics

https://doi.org/10.18653/v1/2024.naacl-long.345

Ardakani, Arash; Haan, Altan; Tan, Shangyin; Popovici, Doru Thom; Cheung, Alvin; Iancu, Costin; Sen, Koushik (January 2024, Association for Computational Linguistics)

Full Text Available
Towards Auto-Generated Data Systems

https://doi.org/10.14778/3611540.3611635

Cheung, Alvin; Ahmad, Maaz Bin; Haynes, Brandon; Kittivorawong, Chanwut; Laddad, Shadaj; Liu, Xiaoxuan; Wang, Chenglong; Yan, Cong (August 2023, Proceedings of the VLDB Endowment)

After decades of progress, database management systems (DBMSs) are now the backbones of many data applications that we interact with on a daily basis. Yet, with the emergence of new data types and hardware, building and optimizing new data systems remain as difficult as in the heyday of relational databases. In this paper, we summarize our work towards automating the building and optimization of data systems. Drawing from our own experience, we further argue that any automation technique must address three aspects: user specification, code generation, and result validation. We conclude by discussing a case study using video data processing, along with opportunities for future research towards designing data systems that are automatically generated.
Full Text Available
