Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text simplification has focused on monolingual settings, so the resulting simplified evidence is available in only one language (most often, English). This work addresses this limitation via multilingual simplification, i.e., directly simplifying complex texts into simplified texts in multiple languages. We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages: English, Spanish, French, and Farsi. We evaluate fine-tuned and zero-shot models across these languages with extensive human assessments and analyses. Although models can generate viable simplified texts, we identify several outstanding challenges that this dataset might be used to address.
Text Simplification of College Admissions Instructions: A Professionally Simplified and Verified Corpus
Access to higher education is critical for minority populations and emergent bilingual students. However, the language used by higher education institutions to communicate with prospective students is often too complex; concretely, many institutions in the US publish admissions application instructions far above the average reading level of a typical high school graduate, often near the 13th or 14th grade level. This leads to an unnecessary barrier between students and access to higher education. This work aims to tackle this challenge via text simplification. We present PSAT (Professionally Simplified Admissions Texts), a dataset with 112 admissions instructions randomly selected from higher education institutions across the US. These texts are then professionally simplified, and verified and accepted by subject-matter experts who are full-time employees in admissions offices at various institutions. Additionally, PSAT comes with manual alignments of 1,883 original-simplified sentence pairs. The result is a first-of-its-kind corpus for the evaluation and fine-tuning of text simplification systems in a high-stakes genre distinct from existing simplification resources.
- PAR ID: 10432260
- Date Published:
- Journal Name: Proceedings of the 29th International Conference on Computational Linguistics
- Page Range / eLocation ID: 6505–6515
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Automated simplification models aim to make input texts more readable. Such methods have the potential to make complex information accessible to a wider audience, e.g., providing access to recent medical literature which might otherwise be impenetrable for a lay reader. However, such models risk introducing errors into automatically simplified texts, for instance by inserting statements unsupported by the corresponding original text, or by omitting key information. Providing more readable but inaccurate versions of texts may in many cases be worse than providing no such access at all. The problem of factual accuracy (and the lack thereof) has received heightened attention in the context of summarization models, but the factuality of automatically simplified texts has not been investigated. We introduce a taxonomy of errors that we use to analyze both references drawn from standard simplification datasets and state-of-the-art model outputs. We find that both often contain errors that existing evaluation metrics do not capture, motivating a need for research into ensuring the factual accuracy of automated simplification models.
- Research has revealed benefits and interest among Deaf and Hard-of-Hearing (DHH) adults in reading-assistance tools powered by Automatic Text Simplification (ATS), a technology whose development benefits from evaluations by specific user groups. While prior work has provided guidance for evaluating text complexity among DHH adults, researchers lack guidance for evaluating the fluency of automatically simplified texts, which may contain errors from the simplification process. Thus, we conduct methodological research on the effectiveness of metrics (including reading speed; comprehension questions; and subjective judgements of understandability, readability, grammaticality, and system performance) for evaluating texts controlled to be at different levels of fluency, when measured among DHH participants at different literacy levels. Reading speed and grammaticality judgements effectively distinguished fluency levels among participants across literacy levels. Readability and understandability judgements, however, only worked among participants with higher literacy. Our findings provide methodological guidance for designing ATS evaluations with DHH participants.
- Many undergraduate neuroscience trainees aspire to earn a PhD. In recent years the number, demographics, and previous experiences of PhD applicants in neuroscience have changed. This has necessitated both a reconsideration of admissions processes to ensure equity for an increasingly diverse applicant pool and renewed efforts to expand access to the training and research experiences required for admission to graduate programs. Here, we describe both facets of graduate school admissions by demystifying the process and providing faculty with tools and resources to help undergraduate students successfully navigate it. We discuss admissions requirements and processes at two graduate institutions, highlighting holistic approaches to evaluating students, the ever-increasing research experience expectations, and the decreasing reliance on the GRE. With a particular focus on improving equity, diversity, inclusion and belonging, we discuss resources for applying to graduate school that are available for students from underrepresented populations, including summer institutes and fellowship programs and intentional relationships with minority serving institutions (MSIs) to foster bi-directional engagement between undergraduate programs at MSIs and graduate institutions. With diverse perspectives as faculty involved in undergraduate education, graduate programs, and post-baccalaureate training programs, we provide recommendations and resources for how to help all trainees, especially those from populations underrepresented in the STEM workforce, succeed in the current graduate education admissions landscape.
- Automatic text simplification (TS) aims to automate the process of rewriting text to make it easier for people to read. A pre-requisite for TS to be useful is that it should convey information that is consistent with the meaning of the original text. However, current TS evaluation protocols assess system outputs for simplicity and meaning preservation without regard for the document context in which output sentences occur and for how people understand them. In this work, we introduce a human evaluation framework to assess whether simplified texts preserve meaning using reading comprehension questions. With this framework, we conduct a thorough human evaluation of texts simplified by humans and by nine automatic systems. Supervised systems that leverage pre-training knowledge achieve the highest scores on the reading comprehension tasks among the automatic controllable TS systems. However, even the best-performing supervised system struggles with at least 14% of the questions, marking them as “unanswerable” based on simplified content. We further investigate how existing TS evaluation metrics and automatic question-answering systems approximate the human judgments we obtained.

