Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation

Kashefi, Omid; Hwa, Rebecca

doi:10.18653/v1/2020.wnut-1.26

Citation Details

Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation

Data augmentation has been shown to be effective in providing more training data for machine learning and resulting in more robust classifiers. However, for some problems, there may be multiple augmentation heuristics, and the choices of which one to use may significantly impact the success of the training. In this work, we propose a metric for evaluating augmentation heuristics; specifically, we quantify the extent to which an example is “hard to distinguish” by considering the difference between the distribution of the augmented samples of different classes. Experimenting with multiple heuristics in two prediction tasks (positive/negative sentiment and verbosity/conciseness) validates our claims by revealing the connection between the distribution difference of different classes and the classification accuracy. more »

Award ID(s):: 1735752

PAR ID:: 10248657

Author(s) / Creator(s):: Kashefi, Omid; Hwa, Rebecca

Date Published:: 2020-11-01

Journal Name:: Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Page Range / eLocation ID:: 200 to 208

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.18653/v1/2020.wnut-1.26

More Like this