NSF PAR Search | NSF Public Access Repository


LLMs Can Reason Faster Only If We Let Them

Sel, Bilgehan; Huang, Lifu; Ramakrishnan, Naren; Jia, Ruoxi; Jin, Ming (June 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available June 18, 2026
Model Residuals as Shields: A Two-Level Formulation to Defend Smart Grids From Poisoning Attacks

https://doi.org/10.1109/JIOT.2025.3575005

Lin, Tung-Wei; Roy, Padmaksha; Zeng, Yi; Jin, Ming; Jia, Ruoxi; Liu, Chen-Ching; Sangiovanni-Vincentelli, Alberto (August 2025, IEEE Internet of Things Journal)

Free, publicly-accessible full text available August 1, 2026
Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning

Dabas, Mahavir; Chen, Si; Fleming, Charles; Jin, Ming; Jia, Ruoxi (May 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available May 1, 2026
Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

Wang, Jiachen T; Yang, Tianji; Zou, James; Kwon, Yongchan; Jia, Ruoxi (February 2025, Proceedings of the 41st International Conference on Machine Learning (ICML 2024))

Free, publicly-accessible full text available February 3, 2026
FASTTRACK: Reliable Fact Tracing via Clustering and LLM-Powered Evidence Validation

Chen, Si; Kang, Feiyang; Yu, Ning; Jia, Ruoxi (December 2024, The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024) Findings)

Full Text Available
DiPT: Enhancing LLM Reasoning through Diversified Perspective-Taking

https://doi.org/10.18653/v1/2025.findings-naacl.356

Just, Hoang Anh; Dabas, Mahavir; Huang, Lifu; Jin, Ming; Jia, Ruoxi (January 2025, Association for Computational Linguistics)

Full Text Available
Data Valuation in the Absence of a Reliable Validation Set

Jahagirdar, Himanshu; Wang, Jiachen T; Jia, Ruoxi (October 2024, Transactions on Machine Learning Research)

Full Text Available
Mind Control through Causal Inference: Predicting Clean Images from Poisoned Data

Hu, Mengxuan; Guan, Zihan; Zeng, Yi; Guo, Junfeng; Zhou, Zhongliang; Zhang, Jielu; Jia, Ruoxi; Vullikanti, Anil Kumar; Li, Sheng (January 2025, International Conference on Learning Representations (ICLR))

Anti-backdoor learning, which aims to train clean models directly from poisoned datasets, is an important defense against backdoor attacks. However, existing methods usually fail to recover backdoored samples to their original, correct labels and generalize poorly to large pre-trained models because their training is not end-to-end, making them unsuitable for protecting the increasingly prevalent large pre-trained models. To bridge this gap, we first revisit the anti-backdoor learning problem from a causal perspective. Our theoretical causal analysis reveals that incorporating both images and the associated attack indicators preserves the model's integrity. Building on this analysis, we introduce an end-to-end method, Mind Control through Causal Inference (MCCI), to train clean models directly from poisoned datasets. This approach leverages both the image and the attack indicator to train the model. Under this training paradigm, the model's perception of whether an input is clean or backdoored can be controlled. In particular, when fake non-attack indicators are introduced, the model perceives all inputs as clean and makes correct predictions, even for poisoned samples. Extensive experiments demonstrate that our method achieves state-of-the-art performance, efficiently recovering the original correct predictions for poisoned samples and enhancing accuracy on clean samples.
Free, publicly-accessible full text available January 22, 2026
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

https://doi.org/10.18653/v1/2024.emnlp-main.732

Zeng, Yi; Sun, Weiyu; Huynh, Tran; Song, Dawn; Li, Bo; Jia, Ruoxi (November 2024, Association for Computational Linguistics)

Full Text Available
