NSF PAR Search | NSF Public Access Repository

Offset Unlearning for Large Language Models

Huang, James Y; Zhou, Wenxuan; Wang, Fei; Morstatter, Fred; Zhang, Sheng; Poon, Hoifung; Chen, Muhao (May 2025, Transactions on Machine Learning Research)

Free, publicly-accessible full text available May 1, 2026
ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails

https://doi.org/10.18653/v1/2025.findings-acl.704

Wen, Xiaofei; Zhou, Wenxuan; Mo, Wenjie Jacky; Chen, Muhao (January 2025, Association for Computational Linguistics)

Full Text Available
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition

Zhou, Wenxuan; Zhang, Sheng; Gu, Yu; Chen, Muhao; Poon, Hoifung (May 2024, International Conference on Learning Representations)

Full Text Available
Getting Sick After Seeing a Doctor? Diagnosing and Mitigating Knowledge Conflicts in Event Temporal Reasoning

https://doi.org/10.18653/v1/2024.findings-naacl.244

Fang, Tianqing; Wang, Zhaowei; Zhou, Wenxuan; Zhang, Hongming; Song, Yangqiu; Chen, Muhao (January 2024, Association for Computational Linguistics)

Full Text Available
Parameter-Efficient Tuning with Special Token Adaptation

Yang, Xiaocong; Huang, James Y.; Zhou, Wenxuan; Chen, Muhao (May 2023, Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL))
Vlachos, Andreas; Augenstein, Isabelle (Eds.)
Parameter-efficient tuning aims at updating only a small subset of parameters when adapting a pretrained model to downstream tasks. In this work, we introduce PASTA, in which we only modify the special token representations (e.g., [SEP] and [CLS] in BERT) before the self-attention module at each layer in Transformer-based models. PASTA achieves comparable performance to fine-tuning in natural language understanding tasks including text classification and NER with up to only 0.029% of total parameters trained. Our work not only provides a simple yet effective way of parameter-efficient tuning, which has a wide range of practical applications when deploying finetuned models for multiple tasks, but also demonstrates the pivotal role of special tokens in pretrained language models.
Full Text Available
An Improved Baseline for Sentence-level Relation Extraction

Zhou, Wenxuan; Chen, Muhao (November 2022, Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing)

Sentence-level relation extraction (RE) aims at identifying the relationship between two entities in a sentence. Many efforts have been devoted to this problem, yet the best-performing methods are still far from perfect. In this paper, we revisit two problems that affect the performance of existing RE models, namely entity representation and noisy or ill-defined labels. Our improved RE baseline, which incorporates entity representations with typed markers, achieves an F1 of 74.6% on TACRED, significantly outperforming previous SOTA methods. Furthermore, the new baseline achieves an F1 of 91.1% on the refined Re-TACRED dataset, demonstrating that pretrained language models (PLMs) achieve high performance on this task. We release our code to the community for future research.
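As a rough illustration of what "entity representations with typed markers" can look like at the input level, the sketch below wraps the subject and object spans in markers that carry the entity type. The marker strings (`<S:TYPE>`, `<O:TYPE>`), the function name, and the span convention are assumptions made for this example; the abstract does not specify the exact marker format the authors use.

```python
from typing import List, Tuple

# (start index, end index exclusive, entity type); a hypothetical span convention.
Span = Tuple[int, int, str]


def add_typed_markers(tokens: List[str], subj: Span, obj: Span) -> List[str]:
    """Wrap the subject and object spans in type-carrying marker tokens.

    The marker format is illustrative only, not the format used in the paper.
    """
    out: List[str] = []
    for i, tok in enumerate(tokens):
        # Opening markers go before the first token of each span.
        if i == subj[0]:
            out.append(f"<S:{subj[2]}>")
        if i == obj[0]:
            out.append(f"<O:{obj[2]}>")
        out.append(tok)
        # Closing markers go after the last token of each span.
        if i == subj[1] - 1:
            out.append(f"</S:{subj[2]}>")
        if i == obj[1] - 1:
            out.append(f"</O:{obj[2]}>")
    return out


marked = add_typed_markers(
    ["Bill", "Gates", "founded", "Microsoft"],
    subj=(0, 2, "PERSON"),
    obj=(3, 4, "ORGANIZATION"),
)
print(" ".join(marked))
```

The marked sequence would then be fed to a pretrained language model, with the relation classifier reading the representations at the marker positions; that downstream step is not shown here.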
Full Text Available
Context-faithful Prompting for Large Language Models

https://doi.org/10.18653/v1/2023.findings-emnlp.968

Zhou, Wenxuan; Zhang, Sheng; Poon, Hoifung; Chen, Muhao (January 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)

Large language models (LLMs) encode parametric knowledge about world facts and have shown remarkable performance in knowledge-driven NLP tasks. However, their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks (e.g., knowledge acquisition tasks). In this paper, we seek to assess and enhance LLMs’ contextual faithfulness in two aspects: knowledge conflict and prediction with abstention. We demonstrate that LLMs’ faithfulness can be significantly improved using carefully designed prompting strategies. In particular, we identify opinion-based prompts and counterfactual demonstrations as the most effective methods. Opinion-based prompts reframe the context as a narrator’s statement and inquire about the narrator’s opinions, while counterfactual demonstrations use instances containing false facts to improve faithfulness in knowledge conflict situations. Neither technique requires additional training. We conduct experiments on three datasets of two standard NLP tasks, machine reading comprehension and relation extraction, and the results demonstrate significant improvement in faithfulness to contexts. Code and data are released at https://github.com/wzhouad/context-faithful-llm.
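The opinion-based prompting idea described in the abstract can be sketched as a simple template function: the context is attributed to a narrator, and the question asks for that narrator's opinion. The narrator name, the exact wording, and the function name below are assumptions for illustration; the authors' released repository is the reference for the actual templates.

```python
def opinion_based_prompt(context: str, question: str, narrator: str = "Bob") -> str:
    """Reframe a context as a narrator's statement and ask for the narrator's opinion.

    The template wording is a hypothetical example, not the paper's exact prompt.
    """
    # Drop the trailing question mark so the opinion clause can be appended.
    stem = question.strip().rstrip("?").strip()
    return (
        f'{narrator} said, "{context.strip()}"\n'
        f"Q: {stem} in {narrator}'s opinion?\n"
        "A:"
    )


prompt = opinion_based_prompt(
    context="The capital of France is Lyon.",
    question="What is the capital of France?",
)
print(prompt)
```

Because the question is about what the narrator believes rather than about the world, an answer drawn from the context ("Lyon") is the faithful one even when it conflicts with the model's parametric knowledge.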
GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding

https://doi.org/10.18653/v1/2023.emnlp-main.317

Li, Zekun; Zhou, Wenxuan; Chiang, Yao-Yi; Chen, Muhao (January 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)

Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize valuable geospatial information in large, widely available geographical databases, e.g., OpenStreetMap. This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. GeoLM leverages geo-entity mentions as anchors to connect linguistic information in text corpora with geospatial information extracted from geographical databases. GeoLM connects the two types of context through contrastive learning and masked language modeling. It also incorporates a spatial coordinate embedding mechanism to encode distance and direction relations to capture geospatial context. In the experiment, we demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity typing, which bridge the gap between natural language processing and geospatial sciences. The code is publicly available at https://github.com/knowledge-computing/geolm.
A Causal View of Entity Bias in (Large) Language Models

https://doi.org/10.18653/v1/2023.findings-emnlp.1013

Wang, Fei; Mo, Wenjie; Wang, Yiwei; Zhou, Wenxuan; Chen, Muhao (January 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)

Entity bias widely affects pretrained (large) language models, causing them to rely on (biased) parametric knowledge to make unfaithful predictions. Although causality-inspired methods have shown great potential to mitigate entity bias, it is hard to precisely estimate the parameters of underlying causal models in practice. The rise of black-box LLMs also makes the situation even worse, because of their inaccessible parameters and uncalibrated logits. To address these problems, we propose a specific structured causal model (SCM) whose parameters are comparatively easier to estimate. Building upon this SCM, we propose causal intervention techniques to mitigate entity bias for both white-box and black-box settings. The proposed causal intervention perturbs the original entity with neighboring entities. This intervention reduces specific biasing information pertaining to the original entity while still preserving sufficient semantic information from similar entities. Under the white-box setting, our training-time intervention improves OOD performance of PLMs on relation extraction (RE) and machine reading comprehension (MRC) by 5.7 points and by 9.1 points, respectively. Under the black-box setting, our in-context intervention effectively reduces the entity-based knowledge conflicts of GPT-3.5, achieving up to 20.5 points of improvement of exact match accuracy on MRC and up to 17.6 points of reduction in memorization ratio on RE.
Robust Natural Language Understanding with Residual Attention Debiasing

https://doi.org/10.18653/v1/2023.findings-acl.32

Wang, Fei; Huang, James Y.; Yan, Tianyi; Zhou, Wenxuan; Chen, Muhao (January 2023, Findings of the Association for Computational Linguistics: ACL 2023)

Natural language understanding (NLU) models often suffer from unintended dataset biases. Among bias mitigation methods, ensemble-based debiasing methods, especially product-of-experts (PoE), have stood out for their impressive empirical success. However, previous ensemble-based debiasing methods typically apply debiasing on top-level logits without directly addressing biased attention patterns. Attention serves as the main medium of feature interaction and aggregation in PLMs and plays a crucial role in providing robust prediction. In this paper, we propose REsidual Attention Debiasing (READ), an end-to-end debiasing method that mitigates unintended biases from attention. Experiments on three NLU benchmarks show that READ significantly improves the out-of-distribution (OOD) performance of BERT-based models, including +12.9% accuracy on HANS, +11.0% accuracy on FEVER-Symmetric, and +2.7% F1 on PAWS. Detailed analyses demonstrate the crucial role of unbiased attention in robust NLU models and that READ effectively mitigates biases in attention.
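For readers unfamiliar with the product-of-experts baseline that the abstract contrasts with, the following minimal sketch shows the usual logit-level combination: the log-probabilities of a main model and a bias-only model are summed and renormalized. This illustrates the prior PoE approach only, not READ itself; the function names are chosen for this example.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def poe_log_probs(main_logits: List[float], bias_logits: List[float]) -> List[float]:
    """Product of experts: add the two experts' log-probabilities, then renormalize."""
    summed = [
        a + b
        for a, b in zip(log_softmax(main_logits), log_softmax(bias_logits))
    ]
    return log_softmax(summed)


# With an uninformative (uniform) bias expert, PoE reduces to the main model.
print(poe_log_probs([2.0, 0.0], [0.0, 0.0]))
print(log_softmax([2.0, 0.0]))
```

In typical PoE debiasing, the training loss is computed on the combined distribution while only the main model is used at test time, so the main model is discouraged from relearning what the bias-only expert already explains.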
Full Text Available
