Regulator: Dynamic Analysis to Detect ReDoS

McLaughlin, R.

Regular expressions (regexps) are a convenient way for programmers to express complex string searching logic. Sev- eral popular programming languages expose an interface to a regexp matching subsystem, either by language-level primi- tives or through standard libraries. The implementations be- hind these matching systems vary greatly in their capabilities and running-time characteristics. In particular, backtracking matchers may exhibit worst-case running-time that is either linear, polynomial, or exponential in the length of the string being searched. Such super-linear worst-case regexps expose applications to Regular Expression Denial-of-Service (Re- DoS) when inputs can be controlled by an adversarial attacker. In this work, we investigate the impact of ReDoS in back- tracking engines, a popular type of engine used by most programming languages. We evaluate several existing tools against a dataset of broadly collected regexps, and find that despite extensive theoretical work in this field, none are able to achieve both high precision and high recall. To address this gap in existing work, we develop REGULATOR, a novel dy- namic, fuzzer-based analysis system for identifying regexps vulnerable to ReDoS. We implement this system by directly instrumenting a popular backtracking regexp engine, which increases the scope of supported regexp syntax and features over prior work. Finally, we evaluate this system against three common regexp datasets, and demonstrate a seven-fold in- crease in true positives discovered when comparing against existing tools.

More Like this