Title: MLRegTest: A benchmark for the machine learning of regular languages [DATASET]
MLRegTest is a benchmark for machine learning systems on sequence classification, which contains training, development, and test sets from 1,800 regular languages. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). The logical complexity and choice of literal provide a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such long-distance dependencies.
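As a rough sketch of how a benchmark organized this way can be navigated programmatically, the snippet below enumerates the logic-level-by-literal-kind grid described in the abstract and reads one train/dev/test split. The directory layout, file naming, and tab-separated format are assumptions made for illustration only; consult the documentation in the Dryad package for the actual structure.

    # Illustrative sketch only: the file names and two-column TSV layout
    # below are assumptions, not the documented structure of the package.
    from itertools import product
    from pathlib import Path

    LOGIC_LEVELS = ["monadic second order", "first order", "propositional", "monomial"]
    LITERAL_KINDS = ["string", "tier-string", "subsequence"]  # the benchmark also combines these

    def language_classes():
        """Enumerate the logic-level x literal-kind grid used to organize the 1,800 languages."""
        return list(product(LOGIC_LEVELS, LITERAL_KINDS))

    def load_split(data_dir, language_id, split):
        """Read one split ("train", "dev", or "test") as (string, label) pairs.

        Assumes a hypothetical tab-separated file named <language_id>_<split>.txt;
        adjust to the format documented in the dataset's README.
        """
        path = Path(data_dir) / f"{language_id}_{split}.txt"  # hypothetical naming
        pairs = []
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                string, label = line.rstrip("\n").split("\t")
                pairs.append((string, label))
        return pairs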
Award ID(s):
2125295
PAR ID:
10615154
Author(s) / Creator(s):
Publisher / Repository:
Dryad
Date Published:
Subject(s) / Keyword(s):
sequence classification; Supervised machine learning; regular languages; logical complexity; Recurrent neural networks; FOS: Computer and information sciences
Format(s):
Medium: X Size: 106406182796 bytes
Size(s):
106406182796 bytes
Right(s):
Creative Commons Zero v1.0 Universal
Sponsoring Org:
National Science Foundation
More Like this
  1. Edwards, Peter (Ed.)
    Game theory is used by all behavioral sciences, but its development has long centered on the economic interpretation of equilibrium outcomes in relatively simple games and toy systems. Game theory has another potential use, however: the high-level design of large game compositions that express complex architectures and represent real-world institutions faithfully. Compositional game theory, grounded in the mathematics underlying programming languages, and introduced here as a general computational framework, increases the parsimony of game representations through abstraction and modularity, accelerates search and design, and helps theorists across disciplines express real-world institutional complexity in well-defined ways. Relative to existing approaches in game theory, compositional game theory is especially promising for solving game systems with long-range dependencies, for comparing large numbers of structurally related games, and for nesting games into the larger logical or strategic flows typical of real-world policy or institutional systems.
  2. Semantic automata were developed to compare the complexity of generalized quantifiers in terms of the string languages that describe their truth conditions. An important point that has gone unnoticed so far is that these string languages are remarkably simple for most quantifiers, in particular those that can be realized by a single lexical item. Whereas complex quantifiers such as "an even number of" correspond to specific regular languages, the lexical quantifiers "every", "no", and "some", as well as numerals, do not reach this level of complexity. Instead, they all stay close to the bottom of the so-called subregular hierarchy. What is more, the class of tier-based strictly local languages provides a remarkably tight characterization of the class of lexical quantifiers. A significant number of recent publications have also argued for the central role of tier-based strict locality in phonology, morphology, and syntax. This suggests that subregularity in general, and tier-based strict locality in particular, may be a unifying property of natural language across all its submodules. [A toy illustration of these string languages appears after this list.]
  3. This paper investigates bounds on the generative capacity of prosodic processes, by focusing on the complexity of recursive prosody in coordination contexts in English (Wagner, 2010). Although all phonological processes and most prosodic processes are computationally regular string languages, we show that recursive prosody is not. The output string language is instead parallel multiple context-free (Seki et al., 1991). We evaluate the complexity of the pattern over strings, and then move on to a characterization over trees that requires the expressivity of multi bottom-up tree transducers. In doing so, we provide a foundation for future mathematically grounded investigations of the syntax-prosody interface. 
  4. Regular expressions (regexps) are a convenient way for programmers to express complex string searching logic. Several popular programming languages expose an interface to a regexp matching subsystem, either by language-level primitives or through standard libraries. The implementations behind these matching systems vary greatly in their capabilities and running-time characteristics. In particular, backtracking matchers may exhibit worst-case running time that is either linear, polynomial, or exponential in the length of the string being searched. Such super-linear worst-case regexps expose applications to Regular Expression Denial-of-Service (ReDoS) when inputs can be controlled by an adversarial attacker. In this work, we investigate the impact of ReDoS in backtracking engines, a popular type of engine used by most programming languages. We evaluate several existing tools against a dataset of broadly collected regexps, and find that despite extensive theoretical work in this field, none are able to achieve both high precision and high recall. To address this gap in existing work, we develop REGULATOR, a novel dynamic, fuzzer-based analysis system for identifying regexps vulnerable to ReDoS. We implement this system by directly instrumenting a popular backtracking regexp engine, which increases the scope of supported regexp syntax and features over prior work. Finally, we evaluate this system against three common regexp datasets, and demonstrate a seven-fold increase in true positives discovered when comparing against existing tools. [A minimal demonstration of catastrophic backtracking appears after this list.]
  5. Compositional compiler verification is a difficult problem that focuses on separate compilation of program components with possibly different verified compilers. Logical relations are widely used in proving correctness of program transformations in higher-order languages; however, they do not scale to compositional verification of multi-pass compilers due to their lack of transitivity. The only known technique that applies to compositional verification of multi-pass compilers for higher-order languages is parametric inter-language simulations (PILS), which is, however, significantly more complicated than traditional proof techniques for compiler correctness. In this paper, we present a novel verification framework for lightweight compositional compiler correctness. We demonstrate that by imposing the additional restriction that program components are compiled by pipelines that go through the same sequence of intermediate representations, logical relation proofs can be transitively composed in order to derive an end-to-end compositional specification for multi-pass compiler pipelines. Unlike traditional logical-relation frameworks, our framework supports divergence preservation, even when transformations reduce the number of program steps. We achieve this by parameterizing our logical relations with a pair of relational invariants. We apply this technique to verify a multi-pass, optimizing middle-end pipeline for CertiCoq, a compiler from Gallina (Coq's specification language) to C. The pipeline optimizes and closure-converts an untyped functional intermediate language (ANF or CPS) to a subset of that language without nested functions, which can be easily code-generated to low-level languages. Notably, our pipeline performs more complex closure-allocation optimizations than the state of the art in verified compilation. Using our novel verification framework, we prove an end-to-end theorem for our pipeline that covers both termination and divergence and applies to whole-program and separate compilation, even when different modules are compiled with different optimizations. Our results are mechanized in the Coq proof assistant. [A toy illustration of chaining per-pass correctness relations appears after this list.]
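For the second item above (semantic automata and lexical quantifiers), the sketch below illustrates how the truth conditions of "every", "no", and "some" can be stated as tier-based strictly 2-local constraints over the standard semantic-automata encoding, in which each A-individual in "Q A B" is written "1" if it is also a B and "0" otherwise. The acceptor and the particular tier grammars are illustrative assumptions, not taken from the paper.

    def tsl2_accepts(string, tier, forbidden):
        """Accept iff no forbidden bigram occurs in the boundary-marked tier projection."""
        projection = [">"] + [symbol for symbol in string if symbol in tier] + ["<"]
        return all(pair not in forbidden for pair in zip(projection, projection[1:]))

    QUANTIFIERS = {
        # "every A is B": no A-individual falls outside B, i.e. no "0" may occur.
        "every": ({"0"}, {(">", "0")}),
        # "no A is B": no "1" may occur.
        "no": ({"1"}, {(">", "1")}),
        # "some A is B": at least one "1" must occur; forbidding an empty tier
        # projection is exactly where the tier does work that plain strict
        # locality cannot.
        "some": ({"1"}, {(">", "<")}),
    }

    for name, (tier, forbidden) in QUANTIFIERS.items():
        for sample in ["111", "101", "000", ""]:
            print(name, repr(sample), tsl2_accepts(sample, tier, forbidden))

By contrast, a quantifier like "an even number of" must track the parity of the "1"s, which no finite set of forbidden tier factors can express, matching the contrast the abstract draws between lexical and complex quantifiers.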
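For the fourth item (ReDoS in backtracking engines), here is a minimal demonstration of super-linear matching using Python's backtracking `re` module rather than the instrumented engine the paper targets. The pattern r"(a+)+$" forces the matcher to retry every way of splitting the run of "a"s once the trailing "b" makes the overall match fail, so the running time grows roughly exponentially in the input length.

    # Minimal catastrophic-backtracking demonstration; expect the last few
    # iterations to take noticeably longer than the first.
    import re
    import time

    pattern = re.compile(r"(a+)+$")

    for n in range(16, 25, 2):
        subject = "a" * n + "b"             # the trailing "b" guarantees failure
        start = time.perf_counter()
        pattern.match(subject)              # backtracks over every split of the "a" run
        elapsed = time.perf_counter() - start
        print(f"n={n:2d}  {elapsed:.3f}s")  # time roughly doubles per extra "a"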
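For the fifth item (lightweight compositional compiler correctness), the following deliberately tiny sketch illustrates only the structural point that per-pass correctness relations chain into an end-to-end one when every pass stays within the same intermediate representation. The IR, the two passes, and the extensional "same value" relation are invented for illustration; they are nothing like the step-indexed logical relations, divergence preservation, or CertiCoq pipeline in the paper.

    # Toy IR: ("lit", n), ("var", name), or ("add", e1, e2).
    def evaluate(expr, env):
        """Reference semantics shared by the source and target of every pass."""
        tag = expr[0]
        if tag == "lit":
            return expr[1]
        if tag == "var":
            return env[expr[1]]
        _, left, right = expr
        return evaluate(left, env) + evaluate(right, env)

    def fold_constants(expr):
        """Pass 1: collapse additions of two literals."""
        if expr[0] in ("lit", "var"):
            return expr
        _, left, right = expr
        left, right = fold_constants(left), fold_constants(right)
        if left[0] == "lit" and right[0] == "lit":
            return ("lit", left[1] + right[1])
        return ("add", left, right)

    def drop_add_zero(expr):
        """Pass 2: remove additions of literal zero, staying in the same IR."""
        if expr[0] in ("lit", "var"):
            return expr
        _, left, right = expr
        left, right = drop_add_zero(left), drop_add_zero(right)
        if right == ("lit", 0):
            return left
        if left == ("lit", 0):
            return right
        return ("add", left, right)

    def related(src, tgt, env):
        """Per-pass correctness relation: both sides compute the same value."""
        return evaluate(src, env) == evaluate(tgt, env)

    env = {"x": 7}
    program = ("add", ("add", ("lit", -1), ("lit", 1)), ("add", ("var", "x"), ("lit", 0)))
    after_p1 = fold_constants(program)      # ("add", ("lit", 0), ("add", ("var", "x"), ("lit", 0)))
    after_p2 = drop_add_zero(after_p1)      # ("var", "x")
    assert related(program, after_p1, env) and related(after_p1, after_p2, env)
    assert related(program, after_p2, env)  # chained end-to-end guarantee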