ConfliBERT-Spanish: A Pre-trained Spanish Language Model for Political Conflict and Violence

Yang, Wooseong; Alsarra, Sultan; Abdeljaber, Luay; Zawad, Niamat; Delaram, Zeinab; Osorio, Javier; Khan, Latifur; Brandt, Patrick T; D’Orazio, Vito

doi:10.1109/CiSt56084.2023.10409883

This article introduces ConfliBERT-Spanish, a pre-trained language model specialized in political conflict and violence for Spanish-language text. Our methodology uses a large corpus focused on politics and violence to extend existing pre-trained models that process Spanish text. We compare ConfliBERT-Spanish against Multilingual BERT and BETO baselines on binary classification, multi-label classification, and named entity recognition. ConfliBERT-Spanish consistently outperforms the baseline models across all tasks. These results indicate that our domain-specific, language-specific cyberinfrastructure can greatly enhance the performance of NLP models for Latin American conflict analysis. This methodological advancement opens opportunities for researchers and practitioners in the security sector to analyze large amounts of information with high accuracy, better equipping them to meet the dynamic and complex security challenges affecting the region.