Research has revealed benefits and interest among Deaf and Hard-of-Hearing (DHH) adults in reading-assistance tools powered by Automatic Text Simplification (ATS), a technology whose development benefits from evaluations by specific user groups. While prior work has provided guidance for evaluating text complexity among DHH adults, researchers lack guidance for evaluating the fluency of automatically simplified texts, which may contain errors from the simplification process. Thus, we conduct methodological research on the effectiveness of metrics (including reading speed; comprehension questions; and subjective judgements of understandability, readability, grammaticality, and system performance) for evaluating texts controlled to be at different levels of fluency, when measured among DHH participants at different literacy levels. Reading speed and grammaticality judgements effectively distinguished fluency levels among participants across literacy levels. Readability and understandability judgements, however, only worked among participants with higher literacy. Our findings provide methodological guidance for designing ATS evaluations with DHH participants.
more »
« less
Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension
Abstract Automatic text simplification (TS) aims to automate the process of rewriting text to make it easier for people to read. A pre-requisite for TS to be useful is that it should convey information that is consistent with the meaning of the original text. However, current TS evaluation protocols assess system outputs for simplicity and meaning preservation without regard for the document context in which output sentences occur and for how people understand them. In this work, we introduce a human evaluation framework to assess whether simplified texts preserve meaning using reading comprehension questions. With this framework, we conduct a thorough human evaluation of texts by humans and by nine automatic systems. Supervised systems that leverage pre-training knowledge achieve the highest scores on the reading comprehension tasks among the automatic controllable TS systems. However, even the best-performing supervised system struggles with at least 14% of the questions, marking them as “unanswerable” based on simplified content. We further investigate how existing TS evaluation metrics and automatic question-answering systems approximate the human judgments we obtained.
more »
« less
- Award ID(s):
- 2147292
- PAR ID:
- 10609354
- Publisher / Repository:
- Association for Computational Linguistics
- Date Published:
- Journal Name:
- Transactions of the Association for Computational Linguistics
- Volume:
- 12
- Issue:
- 00
- ISSN:
- 2307-387X
- Page Range / eLocation ID:
- 432 to 448
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Research has explored using Automatic Text Simplification for reading assistance, with prior work identifying benefits and interests from Deaf and Hard-of-Hearing (DHH) adults. While the evaluation of these technologies remains a crucial aspect of research in the area, researchers lack guidance in terms of how to evaluate text complexity with DHH readers. Thus, in this work we conduct methodological research to evaluate metrics identified from prior work (including reading speed, comprehension questions, and subjective judgements of understandability and readability) in terms of their effectiveness for evaluating texts modified to be at various complexity levels with DHH adults at different literacy levels. Subjective metrics and low-linguistic-complexity comprehension questions distinguished certain text complexity levels with participants with lower literacy. Among participants with higher literacy, only subjective judgements of text readability distinguished certain text complexity levels. For all metrics, participants with higher literacy scored higher or provided more positive subjective judgements overall.more » « less
-
The recent explosion in question answering research produced a wealth of both factoid reading comprehension (RC) and commonsense reasoning datasets. Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. We present QuAIL, the first RC dataset to combine text-based, world knowledge and unanswerable questions, and to provide question type annotation that would enable diagnostics of the reasoning strategies by a given QA system. QuAIL contains 15K multi-choice questions for 800 texts in 4 domains. Crucially, it offers both general and text-specific questions, unlikely to be found in pretraining data. We show that QuAIL poses substantial challenges to the current state-of-the-art systems, with a 30% drop in accuracy compared to the most similar existing dataset.more » « less
-
Abstract We know that reading involves coordination between textual characteristics and visual attention, but research linking eye movements during reading and comprehension assessed after reading is surprisingly limited, especially for reading long connected texts. We tested two competing possibilities: (a) the weak association hypothesis: Links between eye movements and comprehension are weak and short‐lived, versus (b) the strong association hypothesis: The two are robustly linked, even after a delay. Using a predictive modeling approach, we trained regression models to predict comprehension scores from global eye movement features, using participant‐level cross‐validation to ensure that the models generalize across participants. We used data from three studies in which readers (Ns = 104, 130, 147) answered multiple‐choice comprehension questions ~30 min after reading a 6,500‐word text, or after reading up to eight 1,000‐word texts. The models generated accurate predictions of participants' text comprehension scores (correlations between observed and predicted comprehension: 0.384, 0.362, 0.372,ps < .001), in line with the strong association hypothesis. We found that making more, but shorter fixations, consistently predicted comprehension across all studies. Furthermore, models trained on one study's data could successfully predict comprehension on the others, suggesting generalizability across studies. Collectively, these findings suggest that there is a robust link between eye movements and subsequent comprehension of a long connected text, thereby connecting theories of low‐level eye movements with those of higher order text processing during reading.more » « less
-
What can eye movements reveal about reading, a complex skill ubiquitous in everyday life? Research suggests that gaze can measure short-term comprehension for facts, but it is unknown whether it can measure long-term, deep comprehension. We tracked gaze while 147 participants read long, connected, in-formative texts and completed assessments of rote (factual) and inference (connecting ideas) comprehension while reading a text, after reading a text, after reading five texts, and after a seven-day delay. Gaze-based student-independent computa-tional models predicted both immediate and long-term rote and inference comprehension with moderate accuracies. Surprising-ly, the models were most accurate for comprehension assessed after reading all texts and predicted comprehension even after a week-long delay. This shows that eye movements can provide a lens into the cognitive processes underlying reading compre-hension, including inference formation, and the consolidation of information into long-term memory, which has implications for intelligent student interfaces that can automatically detect and repair comprehension in real-time.more » « less
An official website of the United States government

