Automated Testing Linguistic Capabilities of NLP Models

Lee, Jaeseong; Chen, Simin; Mordahl, Austin; Liu, Cong; Yang, Wei; Wei, Shiyi

doi:10.1145/3672455

Natural language processing (NLP) has gained widespread adoption in the development of real-world applications. However, the black-box nature of neural networks in NLP applications poses a challenge when evaluating their performance, let alone ensuring it. Recent research has proposed testing techniques to enhance the trustworthiness of NLP-based applications. However, most existing works use a single, aggregated metric (i.e., accuracy), which makes it difficult for users to assess NLP model performance on fine-grained aspects, such as linguistic capabilities (LCs). To address this limitation, we present ALiCT, an automated testing technique for validating NLP applications based on their LCs. ALiCT takes user-specified LCs as inputs and produces a diverse test suite with test oracles for each given LC. We evaluate ALiCT on two widely adopted NLP tasks, sentiment analysis and hate speech detection, in terms of diversity, effectiveness, and consistency. Using Self-BLEU and syntactic diversity metrics, our findings reveal that ALiCT generates test cases that are 190% and 2213% more diverse in semantics and syntax, respectively, compared to those produced by state-of-the-art techniques. In addition, ALiCT is capable of producing a larger number of NLP model failures in 22 out of 25 LCs over the two NLP applications.

Award ID(s): 2146443; 2312397

PAR ID: 10592632

Author(s) / Creator(s): Lee, Jaeseong; Chen, Simin; Mordahl, Austin; Liu, Cong; Yang, Wei; Wei, Shiyi

Publisher / Repository: ACM

Date Published: 2024-09-30

Journal Name: ACM Transactions on Software Engineering and Methodology

Volume: 33

Issue: 7

ISSN: 1049-331X

Page Range / eLocation ID: 1 to 33

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1145/3672455
