NSF PAR Search | NSF Public Access Repository

PSBD: Prediction shift uncertainty unlocks backdoor detection

Li, Wei; Chen, Pin-Yu; Liu, Sijia; Wang, Ren (July 2025, Proceedings of the Computer Vision and Pattern Recognition Conference)

Free, publicly-accessible full text available July 14, 2026
Revisiting Mode Connectivity in Neural Networks with Bezier Surface

Ren, Jie; Chen, Pin-Yu; Wang, Ren (April 2025, The Thirteenth International Conference on Learning Representations)

Understanding the loss landscapes of neural networks (NNs) is critical for optimizing model performance. Previous research has identified the phenomenon of mode connectivity on curves, where two well-trained NNs can be connected by a continuous path in parameter space along which the loss stays nearly constant. In this work, we extend the concept of mode connectivity to explore connectivity on surfaces, significantly broadening its applicability and unlocking new opportunities. While initial attempts to connect models via linear surfaces in parameter space were unsuccessful, we propose a novel optimization technique that consistently discovers Bézier surfaces with low loss and high accuracy that connect multiple NNs in a nonlinear manner. We further demonstrate that even without optimization, mode connectivity exists in certain cases of Bézier surfaces, where the models are carefully selected and combined linearly. This approach provides a deeper and more comprehensive understanding of the loss landscape and offers a novel way to identify models with enhanced performance for model averaging and output ensembling. We demonstrate the effectiveness of our method on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets using VGG16, ResNet18, and ViT architectures.
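As a rough illustration of the surface construction this abstract refers to, the sketch below evaluates one point on a tensor-product Bézier surface whose control points are flattened parameter vectors. The function names, the nested-list layout, and the toy 2x2 control grid are assumptions made for illustration and are not taken from the paper; the paper's optimization of the control points is not shown.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * (t ** i) * ((1 - t) ** (n - i))


def bezier_surface_point(control, u, v):
    """Evaluate a tensor-product Bezier surface at (u, v) in [0, 1]^2.

    control[i][j] is a flattened parameter vector (hypothetical layout);
    the grid has (m + 1) x (n + 1) control points.
    """
    m = len(control) - 1
    n = len(control[0]) - 1
    dim = len(control[0][0])
    point = [0.0] * dim
    for i in range(m + 1):
        bu = bernstein(m, i, u)
        for j in range(n + 1):
            weight = bu * bernstein(n, j, v)
            for k in range(dim):
                point[k] += weight * control[i][j][k]
    return point


# Toy example: four "models" with two parameters each sit at the corners.
corners = [
    [[0.0, 0.0], [2.0, 0.0]],
    [[0.0, 2.0], [2.0, 2.0]],
]
center = bezier_surface_point(corners, 0.5, 0.5)
```

With only corner control points, as here, the surface reduces to bilinear interpolation between the four parameter vectors; adding interior control points is what makes the surface nonlinear.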
Free, publicly-accessible full text available April 24, 2026
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Li, Hongkang; Lu, Songtao; Chen, Pin-Yu; Cui, Xiaodong; Wang, Meng (April 2025, The Thirteenth International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available April 30, 2026
Modular Prompt Learning Improves Vision-Language Models

https://doi.org/10.1109/ICASSP49660.2025.10889690

Huang, Zhenhan; Pedapati, Tejaswini; Chen, Pin-Yu; Gao, Jianxi (April 2025, IEEE)

Free, publicly-accessible full text available April 6, 2026
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

Li, Hongkang; Zhang, Yihua; Zhang, Shuai; Chen, Pin-Yu; Liu, Sijia; Wang, Meng (April 2025, The Thirteenth International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available April 30, 2026
CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling

https://doi.org/10.14722/ndss.2025.230915

Zhang, Kaiyuan; Cheng, Siyuan; Shen, Guangyu; Ribeiro, Bruno; An, Shengwei; Chen, Pin-Yu; Zhang, Xiangyu; Li, Ninghui (February 2025, Internet Society)

Federated learning collaboratively trains a neural network on a global server, where each local client receives the current global model weights and sends back parameter updates (gradients) based on its local private data. The process of sending these model updates may leak a client’s private data. Existing gradient inversion attacks can exploit this vulnerability to recover private training instances from a client’s gradient vectors. Recently, researchers have proposed advanced gradient inversion techniques that existing defenses struggle to handle effectively. In this work, we present a novel defense tailored for large neural network models. Our defense capitalizes on the high dimensionality of the model parameters to perturb gradients within a subspace orthogonal to the original gradient. By leveraging cold posteriors over orthogonal subspaces, our defense implements a refined gradient update mechanism. This enables the selection of an optimal gradient that not only safeguards against gradient inversion attacks but also maintains model utility. We conduct comprehensive experiments across three different datasets and evaluate our defense against various state-of-the-art attacks and defenses. Code is available at https://censor-gradient.github.io.
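The idea of perturbing a gradient inside the subspace orthogonal to it can be sketched as follows. This is only a minimal illustration of the projection step, assuming Gaussian noise and a single Gram-Schmidt projection; the function name and parameters are hypothetical, and the Bayesian sampling with cold posteriors and the gradient selection described in the abstract are not reproduced here.

```python
import random


def orthogonal_perturbation(grad, scale=1.0, seed=0):
    """Return a random vector orthogonal to `grad` (hypothetical helper).

    Gaussian noise is drawn, then its component along `grad` is removed
    by projection, so the result lies in the orthogonal complement.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in grad]
    grad_sq = sum(g * g for g in grad)
    if grad_sq == 0.0:
        # A zero gradient has no direction to project out.
        return [scale * z for z in noise]
    coeff = sum(z * g for z, g in zip(noise, grad)) / grad_sq
    return [scale * (z - coeff * g) for z, g in zip(noise, grad)]


# Example: the perturbation has (numerically) zero inner product with grad.
grad = [1.0, 2.0, -3.0, 0.5]
delta = orthogonal_perturbation(grad, scale=0.1, seed=1)
inner = sum(d * g for d, g in zip(delta, grad))
```

In high-dimensional models the orthogonal complement of a single gradient is almost the entire parameter space, which is the property the abstract says the defense capitalizes on.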
Free, publicly-accessible full text available February 24, 2026
SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing

https://doi.org/10.1145/3637528.3671586

Yin, Changchang; Chen, Pin-Yu; Yao, Bingsheng; Wang, Dakuo; Caterino, Jeffrey; Zhang, Ping (August 2024, ACM)

Full Text Available
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Qi, Xiangyu; Zeng, Yi; Xie, Tinghao; Chen, Pin-Yu; Jia, Ruoxi; Mittal, Prateek; Henderson, Peter (May 2024, ICLR)

Full Text Available
Network properties determine neural network performance

https://doi.org/10.1038/s41467-024-48069-8

Jiang, Chunheng; Huang, Zhenhan; Pedapati, Tejaswini; Chen, Pin-Yu; Sun, Yizhou; Gao, Jianxi (July 2024, Nature Communications)

Machine learning influences numerous aspects of modern society, empowers new technologies, from AlphaGo to ChatGPT, and increasingly materializes in consumer products such as smartphones and self-driving cars. Despite the vital role and broad applications of artificial neural networks, we lack systematic approaches, such as network science, to understand their underlying mechanism. The difficulty is rooted in many possible model configurations, each with different hyper-parameters and weighted architectures determined by noisy data. We bridge the gap by developing a mathematical framework that maps the neural network’s performance to the network characters of the line graph governed by the edge dynamics of stochastic gradient descent differential equations. This framework enables us to derive a neural capacitance metric to universally capture a model’s generalization capability on a downstream task and predict model performance using only early training results. The numerical results on 17 pre-trained ImageNet models across five benchmark datasets and one NAS benchmark indicate that our neural capacitance metric is a powerful indicator for model selection based only on early training results and is more efficient than state-of-the-art methods.
MulBERRY: Enabling Bit-Error Robustness for Energy-Efficient Multi-Agent Autonomous Systems

https://doi.org/10.1145/3620665.3640420

Wan, Zishen; Chandramoorthy, Nandhini; Swaminathan, Karthik; Chen, Pin-Yu; Bhardwaj, Kshitij; Reddi, Vijay Janapa; Raychowdhury, Arijit (April 2024, Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

Full Text Available
