CareCorpus+: Expanding and Augmenting Caregiver Strategy Data to Support Pediatric Rehabilitation

Farzana, Shahla; Lucero, Ivana; Villegas, Vivian; Kaelin, Vera C; Khetani, Mary; Parde, Natalie

doi:10.18653/v1/2024.emnlp-main.392

Caregiver strategy classification in pediatric rehabilitation contexts is strongly motivated by real-world clinical constraints but highly under-resourced and seldom studied in natural language processing settings. We introduce a large dataset of 4,037 caregiver strategies in this setting, a five-fold increase over the nearest contemporary dataset. These strategies are manually categorized into clinically established constructs with high agreement (𝜅 = 0.68–0.89). We also propose two techniques to further address identified data constraints. First, we manually supplement target task data with relevant publicly available data from online child health forums. Next, we propose a novel data augmentation technique to generate synthetic caregiver strategies with high downstream task utility. Extensive experiments showcase the quality of our dataset. They also establish evidence that both the publicly available data and the synthetic strategies result in large performance gains, with relative F1 increases of 22.6% and 50.9%, respectively.

Award ID(s): 2125411

PAR ID: 10636435

Author(s) / Creator(s): Farzana, Shahla; Lucero, Ivana; Villegas, Vivian; Kaelin, Vera C; Khetani, Mary; Parde, Natalie

Publisher / Repository: Association for Computational Linguistics

Date Published: 2024-11-12

Page Range / eLocation ID: 6912 to 6927

Location: Miami, Florida, USA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.18653/v1/2024.emnlp-main.392
