NSF PAR Search | NSF Public Access Repository


InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification

Trienes, Jan; Joseph, Sebastian; Schlötterer, Jörg; Seifert, Christin; Lo, Kyle; Xu, Wei; Wallace, Byron C; Li, Jessy (August 2024, Association for Computational Linguistics)

Full Text Available
Multilingual Simplification of Medical Texts

https://doi.org/10.18653/v1/2023.emnlp-main.1037

Joseph, Sebastian; Kazanas, Kathryn; Reina, Keziah; Ramanathan, Vishnesh; Xu, Wei; Wallace, Byron; Li, Junyi (December 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)

Automated text simplification aims to produce simple versions of complex texts. This task is especially useful in the medical domain, where the latest medical findings are typically communicated via complex and technical articles. This creates barriers for laypeople seeking access to up-to-date medical findings, consequently impeding progress on health literacy. Most existing work on medical text simplification has focused on monolingual settings, with the result that such evidence is available in only one language (most often, English). This work addresses this limitation via multilingual simplification, i.e., directly simplifying complex texts into simplified texts in multiple languages. We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages: English, Spanish, French, and Farsi. We evaluate fine-tuned and zero-shot models across these languages with extensive human assessments and analyses. Although models can generate viable simplified texts, we identify several outstanding challenges that this dataset might be used to address.
Full Text Available
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification

https://doi.org/10.18653/v1/2024.acl-long.234

Trienes, Jan; Joseph, Sebastian; Schlötterer, Jörg; Seifert, Christin; Lo, Kyle; Xu, Wei; Wallace, Byron; Li, Junyi Jessy (January 2024, Association for Computational Linguistics)

Full Text Available
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence

https://doi.org/10.18653/v1/2024.acl-long.459

Joseph, Sebastian; Chen, Lily; Trienes, Jan; Göke, Hannah; Coers, Monika; Xu, Wei; Wallace, Byron; Li, Junyi Jessy (January 2024, Association for Computational Linguistics)

Full Text Available
Summarizing, Simplifying, and Synthesizing Medical Evidence using GPT-3 (with Varying Success)

Shaib, Chantal; Li, Millicent; Joseph, Sebastian; Marshall, Iain; Li, Junyi Jessy; Wallace, Byron (July 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers))

Large language models, particularly GPT-3, are able to produce high quality summaries of general domain news articles in few- and zero-shot settings. However, it is unclear if such models are similarly capable in more specialized domains such as biomedicine. In this paper we enlist domain experts (individuals with medical training) to evaluate summaries of biomedical articles generated by GPT-3, given no supervision. We consider both single- and multi-document settings. In the former, GPT-3 is tasked with generating regular and plain-language summaries of articles describing randomized controlled trials; in the latter, we assess the degree to which GPT-3 is able to synthesize evidence reported across a collection of articles. We design an annotation scheme for evaluating model outputs, with an emphasis on assessing the factual accuracy of generated summaries. We find that while GPT-3 is able to summarize and simplify single biomedical articles faithfully, it struggles to provide accurate aggregations of findings over multiple documents. We release all data, code, and annotations used in this work.
Full Text Available
