skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Robust Theory of Series Parallel Graphs
Motivated by distributed data processing applications, we introduce a class of labeled directed acyclic graphs constructed using sequential and parallel composition operations, and study automata and logics over them. We show that deterministic and non-deterministic acceptors over such graphs have the same expressive power, which can be equivalently characterized by Monadic Second-Order logic and the graded µ-calculus. We establish closure under composition operations and decision procedures for membership, emptiness, and inclusion. A key feature of our graphs, calledsynchronized series-parallel graphs(SSPG), is that parallel composition introduces a synchronization edge from the newly introduced source vertex to the sink. The transfer of information enabled by such edges is crucial to the determinization construction, which would not be possible for the traditional definition of series-parallel graphs. SSPGs allow both ordered ranked parallelism and unordered unranked parallelism. The latter feature means that in the corresponding automata, the transition function needs to account for an arbitrary number of predecessors by counting each type of state only up to a specified constant, thus leading to a notion ofcounting complexitythat is distinct from the classical notion of state complexity. The determinization construction translates a nondeterministic automaton withnstates andkcounting complexity to a deterministic automaton with 2n2states andkncounting complexity, and both these bounds are shown to be tight. Furthermore, for nondeterministic automata a bound of 2 on counting complexity suffices without loss of expressiveness.  more » « less
Award ID(s):
1763514
PAR ID:
10601875
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Association for Computing Machinery (ACM)
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
7
Issue:
POPL
ISSN:
2475-1421
Format(s):
Medium: X Size: p. 1058-1088
Size(s):
p. 1058-1088
Sponsoring Org:
National Science Foundation
More Like this
  1. Bouajjani, A.; Holík, L.; Wu, Z. (Ed.)
    When omega-regular objectives were first proposed in model-free reinforcement learning (RL) for controlling MDPs, deterministic Rabin automata were used in an attempt to provide a direct translation from their transitions to scalar values. While these translations failed, it has turned out that it is possible to repair them by using good-for-MDPs (GFM) Buechi automata instead. These are nondeterministic Buechi automata with a restricted type of nondeterminism, albeit not as restricted as in good-for-games automata. Indeed, deterministic Rabin automata have a pretty straightforward translation to such GFM automata, which is bi-linear in the number of states and pairs. Interestingly, the same cannot be said for deterministic Streett automata: a translation to nondeterministic Rabin or Buechi automata comes at an exponential cost, even without requiring the target automaton to be good-for-MDPs. Do we have to pay more than that to obtain a good-for-MDPs automaton? The surprising answer is that we have to pay significantly less when we instead expand the good-for-MDPs property to alternating automata: like the nondeterministic GFM automata obtained from deterministic Rabin automata, the alternating good-for-MDPs automata we produce from deterministic Streett automata are bi-linear in the size of the deterministic automaton and its index. They can therefore be exponentially more succinct than the minimal nondeterministic Buechi automaton. 
    more » « less
  2. The determinization of a nondeterministic Büchi automaton (NBA) is a fundamental construction of automata theory, with applications to probabilistic verification and reactive synthesis. The standard determinization constructions, such as the ones based on the Safra-Piterman’s approach, work on the whole NBA. In this work we propose a divide-and-conquer determinization approach. To this end, we first classify the strongly connected components (SCCs) of the given NBA as inherently weak, deterministic accepting, and nondeterministic accepting. We then present how to determinize each type of SCC independently from the others; this results in an easier handling of the determinization algorithm that takes advantage of the structure of that SCC. Once all SCCs have been determinized, we show how to compose them so to obtain the final equivalent deterministic Emerson-Lei automaton, which can be converted into a deterministic Rabin automaton without blow-up of states and transitions. We implement our algorithm in our tool COLA and empirically evaluate COLA with the state-of-the-art tools Spot and Owl on a large set of benchmarks from the literature. The experimental results show that our prototype COLA outperforms Spot and Owl regarding the number of states and transitions. 
    more » « less
  3. Multi-pattern matching is widely used in modern software for applications requiring high throughput such as protein search, network traffic inspection, virus or spam detection. Graphics Processor Units (GPUs) excel at executing massively parallel workloads. Regular expression (regex) matching is typically performed by simulating the execution of deterministic finite automata (DFAs) or nondeterministic finite automata (NFAs). The natural implementations of these automata simulation algorithms on GPUs are highly inefficient because they give rise to irregular memory access patterns. This paper presents HybridSA, a heterogeneous CPU-GPU parallel engine for multi-pattern matching. HybridSA uses bit parallelism to efficiently simulate NFAs on GPUs, thus reducing the number of memory accesses and increasing the throughput. Our bit-parallel algorithms extend the classical shift-and algorithm for string matching to a large class of regular expressions and reduce automata simulation to a small number of bitwise operations. We have developed a compiler to translate regular expressions into bit masks, perform optimizations, and choose the best algorithms to run on the GPU. The majority of the regular expressions are accelerated on the GPU, while the patterns that exhibit random memory accesses are executed on the CPU in parallel. We evaluate HybridSA against state-of-the-art CPU and GPU engines, as well as a hybrid combination of the two. HybridSA achieves between 4 and 60 times higher throughput than the state-of-the-art CPU engine and between 4 and 233 times better than the state-of-the-art GPU engine across a collection of real-world benchmarks. 
    more » « less
  4. Finite-state automata serve as compute kernels for many application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1)~input stream level, 2)~automaton-level and 3)~state-level. Among these, only state-level parallelism is intrinsic to automata while the other two levels of parallelism depend on the number of automata and input streams to be processed. As GPU resources increase, a parallelism-limited automata processing task can underutilize GPU compute resources. To this end, we propose AsyncAP, a low-overhead approach that optimizes for both scalability and throughput. Our insight is that most automata processing tasks have an additional source of parallelism originating from the input symbols which has not been leveraged before. Making the matching process associated with the automata tasks asynchronous, i.e., parallel GPU threads start processing an input stream from different input locations instead of processing it serially, improves throughput significantly and scales with input length. When the task does not have enough parallelism to utilize all the GPU cores, detailed evaluation across 12 evaluated applications shows that AsyncAP achieves up to 58× speedup on average over the state-of-the-art GPU automata processing engine. When the tasks have enough parallelism to utilize GPU cores, AsyncAP still achieves 2.4× speedup. 
    more » « less
  5. We present an algorithm for constructing a depth-first search tree in planar digraphs; the algorithm can be implemented in the complexity class UL, which is contained in nondeterministic logspace NL, which in turn lies in NC^2. Pior to this (for more than a quarter-century), the fastest uniform deterministic parallel algorithm for this problem was O(log^10 n) (corresponding to the complexity class AC^10 ⊆ NC^11). We also consider the problem of computing depth-first search trees in other classes of graphs, and obtain additional new upper bounds. 
    more » « less