skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Programming Techniques for the Automata Processor
The Micron Automata Processor (AP) is a novel co-processor accelerator that supports the parallel execution of multiple Nondeterministic Finite Automata (NFA) programmed directly into hardware over a single data-stream. In this paper, we present a number of programming techniques to develop automata that execute efficiently on this processor. First, we present general techniques to transform NFAs defined in their classical representation to the representation used by the AP, and optimize the same. Then, we present automata development techniques using simple but powerful generic building blocks. All the above techniques are generic in nature and can be useful to application developers working on this new upcoming co-processor architecture.  more » « less
Award ID(s):
1448333
PAR ID:
10124163
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2016 45th International Conference on Parallel Processing (ICPP)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we study the acceleration of applications that require searching for all occurrences of thousands of string-patterns in an input data-stream, using the Automata Processor (AP). For this purpose, we use two applications from two fields, namely, network security and bioinformatics. The first application, called Fast-SNAP (for Fast-SNort using AP), scans network data for 4312 signatures of intrusion derived from the popular open-source Snort database. Using the resources of a single AP board, Fast-SNAP can scan for all these signatures at 10.3 Gbps. The second application, called PROTOMATA (for PROTein autOMATA), looks for all occurrences of 1308 protein motifs from the PROSITE database in protein sequences. PROTOMATA is up to half a million times faster than its single-CPU-based counterpart. The techniques developed to program these applications may be useful in the design and development of similar applications using this new hardware accelerator. 
    more » « less
  2. The Automata Processor was designed for string-pattern matching. In this paper, we showcase its use to execute integer and floating-point comparisons and apply the same to accelerate interval stabbing queries. An interval stabbing query determines which of the intervals in a set overlap a query point. Such queries are often used in computational geometry, pattern matching, database management systems, and geographic information systems. The check for each interval is programmed as a single automaton and multiple automata are executed in parallel to provide significant performance gains. While handling 32-bit integers or single-precision floating-point numbers, up to 2.75 trillion comparisons can be executed per second, whereas 0.79 trillion comparisons per second can be completed for 64-bit integers or double-precision floating-point numbers. Additionally, our solution leaves the intervals in the set unordered; allowing addition or deletion of an interval in constant time. This is not possible for contemporary solutions wherein the intervals are ordered, making the query times faster, but making the updating of intervals complex. Our automata designs exemplify techniques that maximize resource utilization and minimize performance bottlenecks, which may be useful to future application developers on this processor. Their modular design allows them to become constituent parts of larger automata, where the numerical comparisons are part of the overall pattern matching operation. We have validated the designs on hardware, and the routines to generate the necessary automata and execute them on the AP will be made available as software libraries shortly. 
    more » « less
  3. Finite State Automata are widely used to accelerate pattern matching in many emerging application domains like DNA sequencing and XML parsing. Conventional CPUs and compute-centric accelerators are bottlenecked by memory bandwidth and irregular memory access patterns in automata processing. We present Cache Automaton, which repurposes last-level cache for automata processing, and a compiler that automates the process of mapping large real world Non-Deterministic Finite Automata (NFAs) to the proposed architecture. Cache Automaton extends a conventional last-level cache architecture with components to accelerate two phases in NFA processing: state-match and state-transition. State-matching is made efficient using a sense-amplifier cycling technique that exploits spatial locality in symbol matches. State-transition is made efficient using a new compact switch architecture. By overlapping these two phases for adjacent symbols we realize an efficient pipelined design. We evaluate two designs, one optimized for performance and the other optimized for space, across a set of 20 diverse benchmarks. The performance optimized design provides a speedup of 15× over DRAM-based Micron's Automata Processor and 3840× speedup over processing in a conventional x86 CPU. The proposed design utilizes on an average 1.2MB of cache space across benchmarks, while consuming 2.3nJ of energy per input symbol. Our space optimized design can reduce the cache utilization to 0.72MB, while still providing a speedup of 9× over AP. 
    more » « less
  4. Part-of-speech (POS) tagging is the foundation of many natural language processing applications. Rule-based POS tagging is a wellknown solution, which assigns tags to the words using a set of predefined rules. Many researchers favor statistical-based approaches over rule-based methods for better empirical accuracy. However, until now, the computational cost of rule-based POS tagging has made it difficult to study whether more complex rules or larger rulesets could lead to accuracy competitive with statistical approaches. In this paper, we leverage two hardware accelerators, the Automata Processor (AP) and Field Programmable Gate Arrays (FPGA), to accelerate rule-based POS tagging by converting rules to regular expressions and exploiting the highly-parallel regular-expressionmatching ability of these accelerators. We study the relationship between rule set size and accuracy, and observe that adding more rules only poses minimal overhead on the AP and FPGA. This allows a substantial increase in the number and complexity of rules, leading to accuracy improvement. Our experiments on Treebank and Brown corpora achieve up to 2,600X and 1,914X speedups on the AP and on the FPGA respectively over rule-based methods on the CPU in the rule-matching stage, up to 58× speedup over the Perceptron POS tagger on the CPU in total testing time, and up to 253× speedup over the LSTM tagger on the GPU in total testing time, while showing a competitive accuracy compared to neural-network and statistical solutions. 
    more » « less
  5. The notion of comparison between system runs is fundamental in formal verification. This concept is implicitly present in the verification of qualitative systems, and is more pronounced in the verification of quantitative systems. In this work, we identify a novel mode of comparison in quantitative systems: the online comparison of the aggregate values of two sequences of quantitative weights. This notion is embodied by comparator automata (comparators, in short), a new class of automata that read two infinite sequences of weights synchronously and relate their aggregate values. Weshowthat aggregate functions that can be represented with B¨uchi automaton result in comparators that are finite-state and accept by the B¨uchi condition as well. Such ω-regular comparators further lead to generic algorithms for a number of well-studied problems, including the quantitative inclusion and winning strategies in quantitative graph games with incomplete information, as well as related non-decision problems, such as obtaining a f inite representation of all counterexamples in the quantitative inclusion problem. We study comparators for two aggregate functions: discounted-sum and limit-average. We prove that the discounted-sum comparator is ω-regular iff the discount-factor is an integer. Not every aggregate function, however, has an ω-regular comparator. Specifically, we show that the language of sequence-pairs for which limit-average aggregates exist is neither ω-regular nor ω-context-free. Given this result, we introduce the notion of prefixaverage as a relaxation of limit-average aggregation, and show that it admits ω-context-free comparators i.e. comparator automata expressed by B¨uchi pushdown automata. 
    more » « less