This content will become publicly available on August 20, 2024

Title: An Equitable Framework for Automatically Assessing Children's Oral Narrative Language Abilities
This work proposes a novel framework for automatically scoring children’s oral narrative language abilities. We use audio recordings of 3rd- to 8th-grade students in the Atlanta, Georgia area as they take a portion of the Test of Narrative Language. We design a system that extracts linguistic features and fine-tuned BERT-based self-supervised learning representations from state-of-the-art ASR transcripts, and we predict manual test scores from the extracted features. This framework significantly outperforms a deterministic method based on the assessment’s scoring rubric. Last, we evaluate system performance across students’ reading levels, dialects, and diagnosed learning/language disabilities to establish fairness across diverse demographics of students. Using this system, we achieve approximately 98% accuracy in classifying student scores. We also identify key areas of improvement for this type of system across demographic groups and reading ability.
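To make the "features in, score out" pipeline concrete, here is a minimal sketch. It assumes the ASR transcripts already exist as plain strings. It replaces the paper’s fine-tuned BERT representations and learned predictor with four hand-picked lexical features (word count, type-token ratio, mean sentence length, rate of narrative connectives) and a nearest-centroid classifier. The names `linguistic_features`, `fit_nearest_centroid`, `predict`, and `CONNECTIVES`, along with the toy transcripts and "low"/"high" score bands, are hypothetical and used only for illustration.

```python
import re

# Hypothetical list of narrative connectives; not the paper's actual feature set.
CONNECTIVES = {"then", "because", "so", "but", "when", "after", "before", "finally"}


def linguistic_features(transcript: str) -> list[float]:
    """Return [word count, type-token ratio, mean sentence length, connective rate]."""
    words = re.findall(r"[a-z']+", transcript.lower())
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    n_words = len(words)
    if n_words == 0:
        return [0.0, 0.0, 0.0, 0.0]
    type_token_ratio = len(set(words)) / n_words
    mean_sentence_len = n_words / max(len(sentences), 1)
    connective_rate = sum(w in CONNECTIVES for w in words) / n_words
    return [float(n_words), type_token_ratio, mean_sentence_len, connective_rate]


def fit_nearest_centroid(X, y):
    """Z-score each feature, then store one mean vector (centroid) per score label."""
    n_feat = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(n_feat)]
    # Population std; fall back to 1.0 for constant features to avoid division by zero.
    stds = [
        (sum((row[j] - means[j]) ** 2 for row in X) / len(X)) ** 0.5 or 1.0
        for j in range(n_feat)
    ]

    def standardize(row):
        return [(row[j] - means[j]) / stds[j] for j in range(n_feat)]

    centroids = {}
    for label in set(y):
        rows = [standardize(r) for r, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return standardize, centroids


def predict(model, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    standardize, centroids = model
    z = standardize(features)
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])),
    )


# Made-up toy transcripts and score bands, for illustration only.
train = [
    ("Dog ran.", "low"),
    ("Cat sat down.", "low"),
    ("The girl lost her kite, so she looked everywhere. "
     "Then she found it because her brother helped.", "high"),
    ("When the boy woke up, he saw the frog was gone. "
     "After that he searched the woods and finally found it.", "high"),
]
X = [linguistic_features(text) for text, _ in train]
y = [label for _, label in train]
model = fit_nearest_centroid(X, y)
print(predict(model, linguistic_features("The bird flew.")))
```

Standardizing before computing distances keeps the raw word count from dominating the ratio-valued features. A real system along the lines of the abstract would concatenate transformer embeddings with the linguistic features and train a stronger classifier. It would also report accuracy separately per reading level, dialect, and disability group to check fairness.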
Journal Name:
Proceedings of Interspeech 2023
Page Range / eLocation ID:
4608 to 4612
Sponsoring Org:
National Science Foundation
More like this
  1. Abstract

    Design features of American Sign Language (ASL)-English bilingual storybook apps on tablet computers, based on learning research, are intended to facilitate independent and interactive learning of English print literacy and ASL skill among young learners. In 2013, the Science of Learning Center on Visual Language and Visual Learning introduced the first in a series of storybook apps for the iPad based on literacy and reading research. The current study, employing a sample of signing deaf children, examined children’s self-motivated engagement with the various design features presented in the earliest of the apps, The Baobab, and analyzed the relationships of engagement with ASL skill, age of first exposure to ASL, ASL narrative ability, and grade-appropriate English reading ability. Results indicated a robust level of engagement with the app, and a relationship between engagement with app pages specifically targeting reading and both early exposure to ASL and ASL skill level. No evidence was found of relationships between narrative and vocabulary skills and app reading engagement. Topics for future research and strategies for app improvement are discussed.

  2. Abstract

    Many studies have claimed to find that reading fiction leads to improvements in social cognition. But this work has left open the critical question of whether any type of narrative, fictional or nonfictional, might have similar effects. To address this question, as well as to test whether framing a narrative as fiction matters, the current studies presented participants (N = 268 in Study 1; N = 362 in Study 2) with literary fiction texts, narrative nonfiction texts, expository nonfiction texts, or no texts. We tested their theory-of-mind abilities using the picture-based Reading the Mind in the Eyes task and a text-based test of higher-order social cognition. Reading anything was associated with higher scores compared to reading nothing, but the effects of framing and text type were inconsistent. These results suggest that prior claims regarding positive effects of reading fiction on mentalizing should be seen as tenuous; other mechanisms may be driving previously published effects.
  3. Literacy assessment is essential for effective literacy instruction and training. However, traditional paper-based literacy assessments are typically decontextualized and may cause stress and anxiety for test takers. In contrast, serious games and game environments allow for the assessment of literacy in more authentic and engaging ways, which may increase the assessment’s validity and reliability. The primary objective of this study is to examine the feasibility of a novel approach for stealthily assessing literacy skills using games in an intelligent tutoring system (ITS) designed for reading comprehension strategy training. We investigated the degree to which learners’ game performance and enjoyment predicted their scores on standardized reading tests. Amazon Mechanical Turk participants (n = 211) played three games in iSTART and self-reported their level of game enjoyment after each game. Participants also completed the Gates–MacGinitie Reading Test (GMRT), which includes vocabulary knowledge and reading comprehension measures. The results indicated that participants’ performance in each game, as well as their combined performance across all three games, predicted their literacy skills. However, the relations between game enjoyment and literacy skills varied across games. These findings suggest the potential of leveraging serious games to assess students’ literacy skills and to improve the adaptivity of game-based learning environments.
  4. Question answering (QA) is a fundamental means to facilitate assessment and training of narrative comprehension skills for both machines and young children, yet there is a scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on reading education research, we introduce FairytaleQA, a dataset focusing on the narrative comprehension of kindergarten to eighth-grade students. Generated by educational experts based on an evidence-based theoretical framework, FairytaleQA consists of 10,580 explicit and implicit questions derived from 278 child-friendly stories, covering seven types of narrative elements or relations. Our dataset is valuable in two ways. First, we ran existing QA models on our dataset and confirmed that this annotation helps assess models’ fine-grained learning skills. Second, the dataset supports the question generation (QG) task in the education domain. Through benchmarking with QG models, we show that the QG model trained on FairytaleQA is capable of asking high-quality and more diverse questions.
  5. We introduce a novel framework for delexicalized dependency parsing in a new language. We show that useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous work’s interpretable typological features that require parsed corpora or expert categorization of the language. Our best method improved attachment scores on held-out test languages by an average of 5.6 percentage points over past work that does not inspect the unparsed data (McDonald et al., 2011), and by 20.7 points over past “grammar induction” work that does not use training languages (Naseem et al., 2010). 
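Item 5 describes extracting features of a target language automatically from an unparsed corpus that contains only gold POS sequences. The sketch below shows what simple corpus-level features of that kind could look like. It is a hand-crafted stand-in, not the paper’s method, which learns its feature extractor end-to-end with a neural network. It computes POS bigram relative frequencies and a crude word-order cue: how often one tag precedes another when the two are adjacent. This cue is loosely in the spirit of the typological features the abstract compares against. The function names, Universal Dependencies-style tags, and toy corpus are hypothetical.

```python
from collections import Counter


def bigram_features(corpus):
    """Relative frequency of each POS bigram, with sentence-boundary markers."""
    counts = Counter()
    for sent in corpus:
        tags = ["<s>"] + list(sent) + ["</s>"]
        counts.update(zip(tags, tags[1:]))
    total = sum(counts.values())
    return {bigram: c / total for bigram, c in counts.items()}


def order_preference(corpus, a, b):
    """Fraction of adjacent a/b pairs in which tag a comes first (None if never adjacent)."""
    a_first = b_first = 0
    for sent in corpus:
        for x, y in zip(sent, sent[1:]):
            if (x, y) == (a, b):
                a_first += 1
            elif (x, y) == (b, a):
                b_first += 1
    total = a_first + b_first
    return a_first / total if total else None


# Toy "unparsed" corpus: gold POS sequences only (UD-style tags, made up).
corpus = [
    ["DET", "ADJ", "NOUN", "VERB"],
    ["DET", "NOUN", "ADJ", "VERB"],
    ["PRON", "VERB", "DET", "ADJ", "NOUN"],
]
features = bigram_features(corpus)
print(features[("<s>", "DET")])                 # 2 of 16 bigrams -> 0.125
print(order_preference(corpus, "ADJ", "NOUN"))  # ADJ first in 2 of 3 adjacent pairs
```

Vectors like these need no parsed treebank, which is the property the abstract emphasizes. A learned extractor can additionally discover which corpus statistics actually help the parser, rather than relying on a fixed, hand-chosen set.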