NSF PAR Search | NSF Public Access Repository

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Xiao, Teng; Yuan, Yige; Chen, Zhengyu; Li, Mingxiao; Liang, Shangsong; Ren, Zhaochun; Honavar, Vasant G (July 2025, Proceedings of the International Conference on Learning Representations (ICLR 2025))

Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models. In this paper, we propose a simple yet effective hyperparameter-free preference optimization algorithm for alignment. We observe that promising performance can be achieved simply by optimizing inverse perplexity, which is calculated as the inverse of the exponentiated average log-likelihood of the chosen and rejected responses in the preference dataset. The resulting simple learning objective, SimPER, is easy to implement and eliminates the need for expensive hyperparameter tuning and a reference model, making it both computationally and memory efficient. Extensive experiments on widely used real-world benchmarks, including MT-Bench, AlpacaEval 2, and 10 key benchmarks of the Open LLM Leaderboard with 5 base models, demonstrate that SimPER consistently and significantly outperforms existing approaches—even without any hyperparameters or a reference model. For example, despite its simplicity, SimPER outperforms state-of-the-art methods by up to 5.7 points on AlpacaEval 2 and achieves the highest average ranking across 10 benchmarks on the Open LLM Leaderboard. The source code for SimPER is publicly available at: https://github.com/tengxiao1/SimPER.
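The objective described in the abstract above can be illustrated with a minimal sketch. It assumes the loss takes the form "negative inverse perplexity of the chosen response plus inverse perplexity of the rejected response", computed from per-token log-probabilities. The function names and the toy log-probabilities are hypothetical and are not taken from the SimPER repository.

```python
import math


def inverse_perplexity(token_logprobs):
    """Return exp(average token log-likelihood), i.e. 1 / perplexity."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def simper_style_loss(chosen_logprobs, rejected_logprobs):
    """Assumed form of the objective: minimizing this value raises the
    inverse perplexity of the chosen response and lowers that of the
    rejected response. No hyperparameter or reference model is involved."""
    return (-inverse_perplexity(chosen_logprobs)
            + inverse_perplexity(rejected_logprobs))


# Toy per-token log-probabilities (made up for illustration only).
chosen = [-0.1, -0.2, -0.3]
rejected = [-1.0, -2.0, -3.0]
loss = simper_style_loss(chosen, rejected)
print(loss)
```

Because each inverse perplexity lies in (0, 1], this sketch of the loss is bounded between -1 and 1. In practice the per-token log-probabilities would come from the policy model being fine-tuned.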
Free, publicly-accessible full text available July 28, 2026
On a Connection Between Imitation Learning and RLHF

Xiao, Teng; Yuan, Yige; Li, Mingxiao; Chen, Zhengyu; Honavar, Vasant G (April 2025, International Conference on Learning Representations (ICLR 2025))

This work studies the alignment of large language models with preference data from an imitation learning perspective. We establish a close theoretical connection between reinforcement learning from human feedback (RLHF) and imitation learning (IL), revealing that RLHF implicitly performs imitation learning on the preference data distribution. Building on this connection, we propose DIL, a principled framework that directly optimizes the imitation learning objective. DIL provides a unified imitation learning perspective on alignment, encompassing existing alignment algorithms as special cases while naturally introducing new variants. By bridging IL and RLHF, DIL offers new insights into alignment with RLHF. Extensive experiments demonstrate that DIL outperforms existing methods on various challenging benchmarks. The code for DIL is available at https://github.com/tengxiao1/DIL.
Free, publicly-accessible full text available April 28, 2026
Reconsidering Learning Objectives in Unbiased Recommendation: A Distribution Shift Perspective

https://doi.org/10.1145/3580305.3599487

Xiao, Teng; Chen, Zhengyu; Wang, Suhang (August 2023, In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023))

Full Text Available
Representation Matters When Learning From Biased Feedback in Recommendation

https://doi.org/10.1145/3511808.3557431

Xiao, Teng; Chen, Zhengyu; Wang, Suhang (October 2022, In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM '22))

Full Text Available
Decoupled Self-supervised Learning for Graphs

Xiao, Teng; Chen, Zhengyu; Guo, Zhimeng; Zhuang, Zeyang; Wang, Suhang (December 2022, In Proceedings of Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022))

Full Text Available
Learning How to Propagate Messages in Graph Neural Networks

https://doi.org/10.1145/3447548.3467451

Xiao, Teng; Chen, Zhengyu; Wang, Donglin; Wang, Suhang (August 2021, 2021 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining)
Full Text Available
High-Throughput Dynamic Time Warping Accelerator for Time-Series Classification With Pipelined Mixed-Signal Time-Domain Computing

https://doi.org/10.1109/JSSC.2020.3021066

Chen, Zhengyu; Gu, Jie (February 2021, IEEE Journal of Solid-State Circuits)
Full Text Available
15.3 A 65nm 3T Dynamic Analog RAM-Based Computing-in-Memory Macro and CNN Accelerator with Retention Enhancement, Adaptive Analog Sparsity and 44TOPS/W System Energy Efficiency

https://doi.org/10.1109/ISSCC42613.2021.9366045

Chen, Zhengyu; Chen, Xi; Gu, Jie (February 2021, International Solid-State Circuit Conference)

Full Text Available
A Mixed-Signal Time-Domain Generative Adversarial Network Accelerator with Efficient Subthreshold Time Multiplier and Mixed-Signal On-Chip Training for Low Power Edge Devices

https://doi.org/10.1109/VLSICircuits18222.2020.9162829

Chen, Zhengyu; Fu, Sihua; Cao, Qiankai; Gu, Jie (June 2020, Symposium on VLSI Circuits)
Full Text Available
A Mixed-signal Time-Domain Generative Adversarial Network Accelerator with Efficient Subthreshold Time Multiplier and Mixed-signal On-chip Training for Low Power Edge Devices

Chen, Zhengyu; Fu, Sihua; Cao, Qiankai; Gu, Jie (June 2020, Symposium on VLSI Circuits)

Full Text Available
