Membership Testing for Semantic Regular Expressions

Huang, Yifei; Amini, Matin; Le_Glaunec, Alexis; Mamouras, Konstantinos; Raghothaman, Mukund

doi:10.1145/3729300

Citation Details

This content will become publicly available on June 10, 2026


This paper is about semantic regular expressions (SemREs). This is a concept that was recently proposed by Smore (Chen et al. 2023) in which classical regular expressions are extended with a primitive to query external oracles such as databases and large language models (LLMs). SemREs can be used to identify lines of text containing references to semantic concepts such as cities, celebrities, political entities, etc. The focus in their paper was on automatically synthesizing semantic regular expressions from positive and negative examples. In this paper, we study the membership testing problem. First, we present a two-pass NFA-based algorithm to determine whether a string w matches a SemRE r in O(|r|²|w|² + |r||w|³) time, assuming the oracle responds to each query in unit time. In common situations, where oracle queries are not nested, we show that this procedure runs in O(|r|²|w|²) time. Experiments with a prototype implementation of this algorithm validate our theoretical analysis, and show that the procedure massively outperforms a dynamic programming-based baseline and incurs roughly a 2× overhead over the time needed for interaction with the oracle. Second, we establish connections between SemRE membership testing and the triangle finding problem from graph theory, which suggest that developing algorithms which are simultaneously practical and asymptotically faster might be challenging. Furthermore, algorithms for classical regular expressions primarily aim to optimize their time and memory consumption. In contrast, an important consideration in our setting is to minimize the cost of invoking the oracle. We demonstrate an Ω(|w|²) lower bound on the number of oracle queries necessary to make this determination.
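To make the membership problem concrete, here is a minimal sketch of a naive dynamic-programming membership test, roughly in the spirit of the baseline the abstract mentions. It is not the paper's two-pass NFA algorithm. The syntax-tree classes (`Eps`, `Char`, `AnyChar`, `Cat`, `Alt`, `Star`, `Sem`), the `oracle(query, substring)` interface, and the toy `city_oracle` are all hypothetical, chosen only for illustration. Because the abstract stresses minimizing the cost of invoking the oracle, the sketch memoizes oracle answers and only asks the oracle once the syntactic part of a semantic subexpression has already matched.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, Dict, Tuple, Union

# Hypothetical SemRE syntax tree: classical regex constructors plus Sem,
# which additionally requires the oracle to accept the matched substring.
@dataclass(frozen=True)
class Eps:
    pass

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class AnyChar:
    pass

@dataclass(frozen=True)
class Cat:
    left: "SemRE"
    right: "SemRE"

@dataclass(frozen=True)
class Alt:
    left: "SemRE"
    right: "SemRE"

@dataclass(frozen=True)
class Star:
    body: "SemRE"

@dataclass(frozen=True)
class Sem:
    body: "SemRE"
    query: str

SemRE = Union[Eps, Char, AnyChar, Cat, Alt, Star, Sem]
Oracle = Callable[[str, str], bool]

def matches(r: SemRE, w: str, oracle: Oracle) -> Tuple[bool, int]:
    """Return (does w match r?, number of distinct oracle queries issued)."""
    answers: Dict[Tuple[str, str], bool] = {}

    def ask(query: str, s: str) -> bool:
        # Memoize oracle answers so each (query, substring) pair is asked once.
        if (query, s) not in answers:
            answers[(query, s)] = oracle(query, s)
        return answers[(query, s)]

    @lru_cache(maxsize=None)
    def m(node: SemRE, i: int, j: int) -> bool:
        # True iff the substring w[i:j] matches node.
        if isinstance(node, Eps):
            return i == j
        if isinstance(node, Char):
            return j == i + 1 and w[i] == node.c
        if isinstance(node, AnyChar):
            return j == i + 1
        if isinstance(node, Alt):
            return m(node.left, i, j) or m(node.right, i, j)
        if isinstance(node, Cat):
            # Try every split point k: O(|w|) work per (i, j) pair.
            return any(m(node.left, i, k) and m(node.right, k, j)
                       for k in range(i, j + 1))
        if isinstance(node, Star):
            if i == j:
                return True
            # Peel off a non-empty first iteration so the recursion makes progress.
            return any(m(node.body, i, k) and m(node, k, j)
                       for k in range(i + 1, j + 1))
        if isinstance(node, Sem):
            # Consult the oracle only after the syntactic part has matched.
            return m(node.body, i, j) and ask(node.query, w[i:j])
        raise TypeError(f"unknown SemRE node: {node!r}")

    result = m(r, 0, len(w))
    return result, len(answers)

# Usage with a toy oracle that knows two cities.
def city_oracle(query: str, s: str) -> bool:
    return query == "city" and s in {"Paris", "Rome"}

dot_star = Star(AnyChar())
contains_city = Cat(dot_star, Cat(Sem(dot_star, "city"), dot_star))
print(matches(contains_city, "I love Rome", city_oracle)[0])  # True
print(matches(contains_city, "I love Oslo", city_oracle)[0])  # False
```

The `Cat` case is what makes this baseline cubic in |w|: each of the O(|w|²) substrings tries O(|w|) split points. The `Sem` case in `contains_city` may also query the oracle on every distinct substring of `w`. This gives some intuition for why the number of oracle calls, and not just the running time, is worth bounding.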

Award ID(s): 2107429, 2107261, 2313062

PAR ID: 10625805

Author(s) / Creator(s): Huang, Yifei; Amini, Matin; Le_Glaunec, Alexis; Mamouras, Konstantinos; Raghothaman, Mukund

Publisher / Repository: ACM

Date Published: 2025-06-10

Journal Name: Proceedings of the ACM on Programming Languages

Volume: 9

Issue: PLDI

ISSN: 2475-1421

Page Range / eLocation ID: 1245 to 1268

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article:
https://doi.org/10.1145/3729300
