


Title: Evaluating Factuality in Text Simplification
Automated simplification models aim to make input texts more readable. Such methods have the potential to make complex information accessible to a wider audience, e.g., by providing access to recent medical literature that might otherwise be impenetrable for a lay reader. However, such models risk introducing errors into automatically simplified texts, for instance by inserting statements unsupported by the corresponding original text or by omitting key information. Providing more readable but inaccurate versions of texts may in many cases be worse than providing no such access at all. The problem of factual accuracy (and the lack thereof) has received heightened attention in the context of summarization models, but the factuality of automatically simplified texts has not been investigated. We introduce a taxonomy of errors that we use to analyze both references drawn from standard simplification datasets and state-of-the-art model outputs. We find that both often contain errors that are not captured by existing evaluation metrics, motivating a need for research into ensuring the factual accuracy of automated simplification models.
Award ID(s):
1850153 2107524
PAR ID:
10350187
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics
Page Range / eLocation ID:
7331 to 7345
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text simplification has focused on monolingual settings, so the resulting simplified evidence is available in only one language (most often, English). This work addresses this limitation via multilingual simplification, i.e., directly simplifying complex texts into simplified texts in multiple languages. We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain, covering four languages: English, Spanish, French, and Farsi. We evaluate fine-tuned and zero-shot models across these languages with extensive human assessments and analyses. Although models can generate viable simplified texts, we identify several outstanding challenges that this dataset might be used to address.
  2. Automated text simplification, a technique useful for making text more accessible to people such as children and emergent bilinguals, is often thought of as a monolingual translation task from complex sentences to simplified sentences using encoder-decoder models. This view fails to account for elaborative simplification, where new information is added into the simplified text. This paper proposes to view elaborative simplification through the lens of the Question Under Discussion (QUD) framework, providing a robust way to investigate what writers elaborate upon, how they elaborate, and how elaborations fit into the discourse context by viewing elaborations as explicit answers to implicit questions. We introduce ELABQUD, consisting of 1.3K elaborations accompanied by implicit QUDs, to study these phenomena. We show that explicitly modeling QUD (via question generation) not only provides essential insight into elaborative simplification and how elaborations connect with the rest of the discourse, but also substantially improves the quality of elaboration generation.
  3. Research has revealed benefits and interest among Deaf and Hard-of-Hearing (DHH) adults in reading-assistance tools powered by Automatic Text Simplification (ATS), a technology whose development benefits from evaluations by specific user groups. While prior work has provided guidance for evaluating text complexity among DHH adults, researchers lack guidance for evaluating the fluency of automatically simplified texts, which may contain errors from the simplification process. Thus, we conduct methodological research on the effectiveness of metrics (including reading speed; comprehension questions; and subjective judgements of understandability, readability, grammaticality, and system performance) for evaluating texts controlled to be at different levels of fluency, when measured among DHH participants at different literacy levels. Reading speed and grammaticality judgements effectively distinguished fluency levels among participants across literacy levels. Readability and understandability judgements, however, only worked among participants with higher literacy. Our findings provide methodological guidance for designing ATS evaluations with DHH participants. 
  4. What can eye movements reveal about reading, a complex skill ubiquitous in everyday life? Research suggests that gaze can measure short-term comprehension for facts, but it is unknown whether it can measure long-term, deep comprehension. We tracked gaze while 147 participants read long, connected, informative texts and completed assessments of rote (factual) and inference (connecting ideas) comprehension while reading a text, after reading a text, after reading five texts, and after a seven-day delay. Gaze-based student-independent computational models predicted both immediate and long-term rote and inference comprehension with moderate accuracies. Surprisingly, the models were most accurate for comprehension assessed after reading all texts and predicted comprehension even after a week-long delay. This shows that eye movements can provide a lens into the cognitive processes underlying reading comprehension, including inference formation and the consolidation of information into long-term memory, which has implications for intelligent student interfaces that can automatically detect and repair comprehension in real time.
  5. Access to higher education is critical for minority populations and emergent bilingual students. However, the language used by higher education institutions to communicate with prospective students is often too complex; concretely, many institutions in the US publish admissions application instructions far above the average reading level of a typical high school graduate, often near the 13th or 14th grade level. This creates an unnecessary barrier between students and access to higher education. This work aims to tackle this challenge via text simplification. We present PSAT (Professionally Simplified Admissions Texts), a dataset with 112 admissions instructions randomly selected from higher education institutions across the US. These texts are professionally simplified, then verified and accepted by subject-matter experts who are full-time employees in admissions offices at various institutions. Additionally, PSAT comes with manual alignments of 1,883 original-simplified sentence pairs. The result is a first-of-its-kind corpus for the evaluation and fine-tuning of text simplification systems in a high-stakes genre distinct from existing simplification resources.