NSF PAR Search | NSF Public Access Repository

Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification

https://doi.org/10.18653/v1/2024.emnlp-main.219

Peng, Letian; Gu, Yi; Dong, Chengyu; Wang, Zihan; Shang, Jingbo (November 2024, Association for Computational Linguistics)

Full Text Available
Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs

Zhou, Shang; Yao, Feng; Dong, Chengyu; Wang, Zihan; Shang, Jingbo (August 2024, Findings of the Association for Computational Linguistics: ACL 2024)
Evaluating the Smooth Control of Attribute Intensity in Text Generation with LLMs

https://doi.org/10.18653/v1/2024.findings-acl.258

Zhou, Shang; Yao, Feng; Dong, Chengyu; Wang, Zihan; Shang, Jingbo (January 2024, Association for Computational Linguistics)

Full Text Available
Debiasing Made State-of-the-art: Revisiting the Simple Seed-based Weak Supervision for Text Classification

https://doi.org/10.18653/v1/2023.emnlp-main.32

Dong, Chengyu; Wang, Zihan; Shang, Jingbo (July 2023, Association for Computational Linguistics)
SELFOOD: Self-Supervised Out-Of-Distribution Detection via Learning to Rank

https://doi.org/10.18653/v1/2023.findings-emnlp.719

Mekala, Dheeraj; Samavedhi, Adithya; Dong, Chengyu; Shang, Jingbo (July 2023, Association for Computational Linguistics)
Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting

Dong, Chengyu; Liu, Liyuan; Shang, Jingbo (January 2022, Advances in neural information processing systems)

We show that label noise exists in adversarial training. This label noise arises from a mismatch between the true label distribution of adversarial examples and the labels inherited from clean examples: the adversarial perturbation distorts the true label distribution, but the common practice of inheriting labels from clean examples neglects this distortion. Recognizing label noise sheds light on the prevalence of robust overfitting in adversarial training and explains its intriguing dependence on perturbation radius and data quality. Our label noise perspective also aligns well with our observations of epoch-wise double descent in adversarial training. Guided by our analyses, we propose a method that automatically calibrates the labels to address label noise and robust overfitting. Our method achieves consistent performance improvements across various models and datasets without introducing new hyper-parameters or additional tuning.
Full Text Available
LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification

Mekala, Dheeraj; Dong, Chengyu; Shang, Jingbo (January 2022, Findings of the Association for Computational Linguistics: EMNLP 2022)

Weakly supervised text classification methods typically train a deep neural classifier on pseudo-labels. The quality of the pseudo-labels is crucial to final performance, but they are inevitably noisy due to their heuristic nature, so selecting the correct ones has great potential to boost performance. One straightforward solution is to select samples based on the softmax probability scores that the neural classifier assigns to their pseudo-labels. However, we show through our experiments that such solutions are ineffective and unstable because poorly calibrated models produce erroneously high-confidence predictions. Recent studies on the memorization effects of deep neural models suggest that these models first memorize training samples with clean labels and then those with noisy labels. Inspired by this observation, we propose LOPS, a novel pseudo-label selection method that takes the learning order of samples into consideration. We hypothesize that the learning order reflects the probability of wrong annotation in terms of ranking, and therefore propose to select the samples that are learnt earlier. LOPS can be viewed as a strong performance-boost plug-in for most existing weakly supervised text classification methods, as confirmed in extensive experiments on four real-world datasets.
Full Text Available
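
The learning-order idea in the LOPS abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `keep_fraction` parameter, and the definition of "learnt" as the first epoch at which the classifier's prediction agrees with the pseudo-label are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def first_learned_epochs(
    epoch_preds: Sequence[Sequence[int]],
    pseudo_labels: Sequence[int],
) -> List[int]:
    """For each sample, return the first epoch whose prediction matches
    its pseudo-label, or len(epoch_preds) if it is never matched.

    epoch_preds[e][i] is the class predicted for sample i after epoch e
    (a hypothetical log that the training loop would have to record).
    """
    n_epochs = len(epoch_preds)
    first = []
    for i, label in enumerate(pseudo_labels):
        epoch = next(
            (e for e in range(n_epochs) if epoch_preds[e][i] == label),
            n_epochs,  # sentinel: never learnt
        )
        first.append(epoch)
    return first


def select_by_learning_order(
    epoch_preds: Sequence[Sequence[int]],
    pseudo_labels: Sequence[int],
    keep_fraction: float = 0.5,
) -> Tuple[List[int], List[int]]:
    """Keep the earliest-learnt fraction of samples (assumed selection rule)."""
    first = first_learned_epochs(epoch_preds, pseudo_labels)
    n_epochs = len(epoch_preds)
    n_keep = int(keep_fraction * len(pseudo_labels))
    # sorted() is stable, so ties keep their original sample order.
    ranked = sorted(range(len(pseudo_labels)), key=lambda i: first[i])
    # Never-learnt samples are dropped even if they fall inside the budget.
    selected = [i for i in ranked[:n_keep] if first[i] < n_epochs]
    return sorted(selected), first


if __name__ == "__main__":
    preds = [
        [0, 1, 1, 0],  # predictions after epoch 0
        [0, 1, 0, 0],  # predictions after epoch 1
        [0, 0, 0, 1],  # predictions after epoch 2
    ]
    labels = [0, 0, 0, 1]  # noisy pseudo-labels
    print(select_by_learning_order(preds, labels, keep_fraction=0.5))
```

Unlike a softmax-confidence filter, this ranking uses only when a sample is first fit, so it does not depend on how well calibrated the classifier's probabilities are.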