

This content will become publicly available on January 1, 2026

Title: Examining Spanish Counseling with MIDAS: a Motivational Interviewing Dataset in Spanish
Award ID(s):
2306372
PAR ID:
10616031
Author(s) / Creator(s):
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Page Range / eLocation ID:
866 to 872
Format(s):
Medium: X
Location:
Albuquerque, New Mexico
Sponsoring Org:
National Science Foundation
More Like this
  1. Neural Machine Translation (NMT) trains a neural network with an encoder-decoder architecture. However, the quality of neural translations depends heavily on the availability of a large bilingual training corpus. In this paper, we explore the performance of translations produced by attention-based NMT systems for the low-resource Spanish-Persian language pair. We analyze the errors the NMT systems make in Persian and provide an in-depth comparison of system performance as sentence length and training-set size vary. We evaluate our translation results using BLEU and human evaluation measures of adequacy, fluency, and overall rating. (A minimal BLEU evaluation sketch follows this list.)
  2. This article introduces ConfliBERT-Spanish, a pre-trained language model specialized in political conflict and violence for text written in the Spanish language. Our methodology relies on a large corpus specialized in politics and violence to extend the capacity of pre-trained models capable of processing text in Spanish. We assess the performance of ConfliBERT-Spanish against Multilingual BERT and BETO baselines on binary classification, multi-label classification, and named entity recognition. ConfliBERT-Spanish consistently outperforms the baseline models across all tasks, showing that our domain-specific, language-specific cyberinfrastructure can greatly enhance the performance of NLP models for Latin American conflict analysis. This methodological advancement opens vast opportunities to help researchers and practitioners in the security sector analyze large amounts of information with high accuracy, better equipping them to meet the dynamic and complex security challenges affecting the region. (A classification sketch using a model of this kind follows this list.)
  3. This paper details the development and features of the CNN-corpus in Spanish, possibly the largest test corpus for single-document extractive text summarization in the Spanish language. Its current version encompasses 1,117 well-written Spanish texts, each of which has an abstractive and an extractive summary. The development methodology adopted allows good-quality qualitative and quantitative assessment of summarization strategies and tools for the Spanish language.
  4. Today, Spanish-speaking countries face widespread political crises. These political conflicts are covered in a large volume of news articles from Spanish-language agencies. Our goal is to create a fully functioning system that parses real-time Spanish text and generates scalable event codes. Rather than translating Spanish text into English and using English event coders, we aim to build a tool that works on raw Spanish text with Spanish event coders, for better flexibility, coverage, and cost. To accommodate the processing of a large number of Spanish articles, we adapt a distributed framework based on Apache Spark. We highlight how to extend the existing ontology to support the automated coding process for Spanish texts. We also present experimental data that give insight into the data collection process, including filtering out unrelated articles, scaling the framework, and gathering basic statistics on the dataset. (A Spark filtering sketch follows this list.)
  5. We asked whether increased exposure to iambs, two-syllable words with stress on the second syllable (e.g., guitar), by way of another language, Spanish, facilitates English-learning infants' segmentation of iambs. Spanish has twice as many iambic words (40%) as English (20%). Using the Headturn Preference Procedure, we tested bilingual Spanish-English-learning 8-month-olds' ability to segment English iambs; monolingual English-learning infants succeed at this task only by 11 months. We showed that at 8 months, bilingual Spanish-English-learning infants successfully segmented English iambs, and not simply the stressed syllable, unlike their monolingual English-learning peers. At the same age, bilingual infants failed to segment Spanish iambs, just like their monolingual Spanish-learning peers. These results cannot be explained by bilingual infants' reliance on transitional-probability cues to segment words in both of their native languages, because the statistical cues were comparable in the two languages. Instead, based on their accelerated development, we argue for autonomous but interdependent development of the two languages of bilingual infants. (A short worked example of transitional probability follows this list.)
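The sketches below illustrate, under stated assumptions, some of the techniques mentioned in the items above. First, as a companion to item 1, a minimal sketch of corpus-level BLEU scoring with the sacrebleu package; the hypothesis and reference sentences are invented placeholders, not data from that paper.

```python
# Minimal sketch: corpus-level BLEU for candidate translations against references,
# using the sacrebleu package (pip install sacrebleu). The sentences below are
# placeholders, not data from the paper in item 1.
import sacrebleu

# Hypothetical system outputs and one stream of reference translations (Persian targets).
hypotheses = [
    "این یک جملهٔ آزمایشی است.",
    "کتاب روی میز است.",
]
references = [[
    "این یک جملهٔ آزمایشی است.",
    "کتاب روی میز قرار دارد.",
]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```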
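For item 2, the sketch below shows how a Spanish BERT-style classifier can be loaded and queried with Hugging Face transformers. The model identifier is a placeholder, since the exact ConfliBERT-Spanish checkpoint name is not given here; substitute the checkpoint released by the authors.

```python
# Minimal sketch: binary text classification with a Spanish BERT-style model via
# Hugging Face transformers. The model id is a placeholder (assumption), not the
# actual ConfliBERT-Spanish checkpoint name.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "path/to/confli-bert-spanish"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

text = "El grupo armado atacó la estación de policía."  # example input sentence
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(dim=-1).item()
print("conflict-related" if pred == 1 else "not conflict-related")
```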
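For item 4, a minimal PySpark sketch of the article-filtering step that discards unrelated articles before event coding. The input path, column names, and keyword list are illustrative assumptions, not the paper's actual schema or ontology.

```python
# Minimal sketch: filtering unrelated Spanish news articles in a distributed
# Apache Spark job before event coding. Paths, column names, and keywords are
# illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spanish-event-coding").getOrCreate()

articles = spark.read.json("articles.jsonl")  # hypothetical input path

# Keep only articles whose text mentions protest- or conflict-related terms.
keywords = ["protesta", "conflicto", "huelga", "manifestación"]
pattern = "|".join(keywords)
related = articles.filter(F.lower(F.col("texto")).rlike(pattern))

related.select("titulo", "fecha", "texto").write.parquet("related_articles.parquet")
```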
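For item 5, a short worked example of the forward transitional probability referred to there, TP(x -> y) = count(x y) / count(x), computed over a toy syllable stream invented for illustration.

```python
# Minimal sketch: forward transitional probability between adjacent syllables,
# TP(x -> y) = count(x y) / count(x). The syllable stream is a toy example.
from collections import Counter

syllables = ["gui", "tar", "ba", "na", "na", "gui", "tar",
             "me", "sa", "gui", "tar", "ba", "na", "na"]

bigrams = Counter(zip(syllables, syllables[1:]))
unigrams = Counter(syllables[:-1])  # denominators: every syllable that has a successor

def transitional_probability(x: str, y: str) -> float:
    """Estimate P(y follows x) from the syllable stream."""
    return bigrams[(x, y)] / unigrams[x] if unigrams[x] else 0.0

print(transitional_probability("gui", "tar"))  # 1.0: "tar" always follows "gui" (within-word cue)
print(transitional_probability("tar", "ba"))   # ~0.67: "ba" follows "tar" only sometimes (across-word)
```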