NSF PAR Search | NSF Public Access Repository

Offset Unlearning for Large Language Models

Huang, James Y; Zhou, Wenxuan; Wang, Fei; Morstatter, Fred; Zhang, Sheng; Poon, Hoifung; Chen, Muhao (May 2025, Transactions on Machine Learning Research)

Free, publicly-accessible full text available May 1, 2026
From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning

https://doi.org/10.18653/v1/2025.naacl-long.170

Xu, Nan; Wang, Fei; Zhang, Sheng; Poon, Hoifung; Chen, Muhao (January 2025, Association for Computational Linguistics)

Full Text Available
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

Zhou, Wenxuan; Zhang, Sheng; Gu, Yu; Chen, Muhao; Poon, Hoifung (May 2024, International Conference on Learning Representations)

Full Text Available
Context-faithful Prompting for Large Language Models

https://doi.org/10.18653/v1/2023.findings-emnlp.968

Zhou, Wenxuan; Zhang, Sheng; Poon, Hoifung; Chen, Muhao (January 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)

Large language models (LLMs) encode parametric knowledge about world facts and have shown remarkable performance in knowledge-driven NLP tasks. However, their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks (e.g., knowledge acquisition tasks). In this paper, we seek to assess and enhance LLMs’ contextual faithfulness in two aspects: knowledge conflict and prediction with abstention. We demonstrate that LLMs’ faithfulness can be significantly improved using carefully designed prompting strategies. In particular, we identify opinion-based prompts and counterfactual demonstrations as the most effective methods. Opinion-based prompts reframe the context as a narrator’s statement and inquire about the narrator’s opinions, while counterfactual demonstrations use instances containing false facts to improve faithfulness in knowledge conflict situations. Neither technique requires additional training. We conduct experiments on three datasets of two standard NLP tasks, machine reading comprehension and relation extraction, and the results demonstrate significant improvement in faithfulness to contexts. Code and data are released at https://github.com/wzhouad/context-faithful-llm.
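The two prompting strategies named in the abstract can be illustrated with a minimal sketch. The template wording, the narrator name "Bob", and the function name `opinion_based_prompt` are assumptions made for illustration; the templates actually used by the authors are in the repository linked above.

```python
from typing import Sequence, Tuple


def opinion_based_prompt(
    context: str,
    question: str,
    narrator: str = "Bob",  # hypothetical narrator name
    demonstrations: Sequence[Tuple[str, str, str]] = (),
) -> str:
    """Frame the context as a narrator's statement and ask for the narrator's opinion.

    Each demonstration is a (context, question, answer) triple. Passing triples
    whose context states a false fact, with the answer that follows that context,
    gives counterfactual demonstrations.
    """

    def frame(ctx: str, q: str) -> str:
        # Drop the trailing question mark so the opinion clause can be appended.
        q = q.rstrip().rstrip("?").rstrip()
        return (
            f'{narrator} said, "{ctx}"\n'
            f"Q: {q} in {narrator}'s opinion?\n"
            "A:"
        )

    parts = [f"{frame(c, q)} {a}" for c, q, a in demonstrations]
    parts.append(frame(context, question))
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = [("The Eiffel Tower is in Rome.", "Where is the Eiffel Tower?", "Rome")]
    print(
        opinion_based_prompt(
            "The capital of France is Lyon.",
            "What is the capital of France?",
            demonstrations=demo,
        )
    )
```

The returned string would be sent to any LLM completion endpoint; no model call is made here, and no training step is involved, which matches the abstract's point that neither technique requires additional training.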
EZLearn: Exploiting Organic Supervision in Automated Data Annotation

https://doi.org/10.24963/ijcai.2018/568

Grechkin, Maxim; Poon, Hoifung; Howe, Bill (July 2018, Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence)

Many real-world applications require automated data annotation, such as identifying tissue origins based on gene expressions and classifying images into semantic categories. Annotation classes are often numerous and subject to changes over time, and annotating examples has become the major bottleneck for supervised learning methods. In science and other high-value domains, large repositories of data samples are often available, together with two sources of organic supervision: a lexicon for the annotation classes, and text descriptions that accompany some data samples. Distant supervision has emerged as a promising paradigm for exploiting such indirect supervision by automatically annotating examples where the text description contains a class mention in the lexicon. However, due to linguistic variations and ambiguities, such training data is inherently noisy, which limits the accuracy in this approach. In this paper, we introduce an auxiliary natural language processing system for the text modality, and incorporate co-training to reduce noise and augment signal in distant supervision. Without using any manually labeled data, our EZLearn system learned to accurately annotate data samples in functional genomics and scientific figure comprehension, substantially outperforming state-of-the-art supervised methods trained on tens of thousands of annotated examples.
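The distant supervision step described in the abstract, labeling a sample when its text description mentions a class from the lexicon, can be sketched as follows. This is an illustrative sketch only: the function name `distant_labels`, the toy lexicon, and the whole-word matching rule are assumptions, and the auxiliary NLP system and co-training that EZLearn adds to reduce noise are not shown.

```python
import re
from typing import Dict, List, Set


def distant_labels(description: str, lexicon: Dict[str, List[str]]) -> Set[str]:
    """Return every class whose lexicon terms appear as whole words in the description."""
    text = description.lower()
    labels = set()
    for cls, terms in lexicon.items():
        for term in terms:
            # Word boundaries keep "liver" from matching inside "delivery".
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", text):
                labels.add(cls)
                break
    return labels


if __name__ == "__main__":
    # Hypothetical two-class lexicon for tissue origin.
    lexicon = {"liver": ["liver", "hepatic"], "lung": ["lung", "pulmonary"]}
    print(distant_labels("Hepatic tissue sample from adult donor", lexicon))
    print(distant_labels("Sample shipped by express delivery", lexicon))
```

Labels produced this way are noisy, since a description can mention a class without the sample belonging to it, which is the limitation the abstract says co-training is meant to address.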
Full Text Available
