Confli-T5: An AutoPrompt Pipeline for Conflict Related Text Augmentation

Parolin, Erick Skorupa; Hu, Yibo; Khan, Latifur; Brandt, Patrick T.; Osorio, Javier; D'Orazio, Vito

doi:10.1109/BigData55660.2022.10020509

Recent advances in natural language processing (NLP) and Big Data technologies have been crucial in helping scientists analyze political unrest and violence, prevent harm, and promote global conflict management. Government agencies and public security organizations have invested heavily in deep learning-based applications to study global conflicts and political violence. However, such applications, which involve text classification, information extraction, and other NLP-related tasks, require extensive human effort to annotate and label texts. While limited labeled data may drastically hurt model performance through over-fitting, heavy annotation demands may make real-world applications impractical. To address this problem, we propose Confli-T5, a prompt-based method that leverages domain knowledge from an existing political science ontology to generate synthetic but realistic labeled text samples in the conflict and mediation domain. Our model can generate textual data from the ground up and employs our novel Double Random Sampling mechanism to improve the quality (coherence and consistency) of the generated samples. We conduct experiments on six standard datasets relevant to political science studies to show the superiority of Confli-T5. Our code is publicly available ...

Award ID(s): 1931541

PAR ID: 10470312

Author(s) / Creator(s): Parolin, Erick Skorupa; Hu, Yibo; Khan, Latifur; Brandt, Patrick T.; Osorio, Javier; D'Orazio, Vito

Publisher / Repository: IEEE

Date Published: 2022-12-17

ISBN: 978-1-6654-8045-1

Page Range / eLocation ID: 1906 to 1913

Format(s): Medium: X

Location: Osaka, Japan

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/BigData55660.2022.10020509
