Search for: All records

Award ID contains: 2403401

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. We investigate the problem of identifying the optimal scoring rule within the principal-agent framework for the online information acquisition problem. We focus on the principal's perspective, seeking to determine the desired scoring rule through interactions with the agent. To address this challenge, we propose two algorithms, OIAFC and OIAFB, tailored to the fixed-confidence and fixed-budget settings, respectively. Our theoretical analysis demonstrates that OIAFC can extract the desired $(\epsilon, \delta)$-scoring rule with an efficient instance-dependent or instance-independent sample complexity. Our analysis also shows that OIAFB matches the instance-independent performance bound of OIAFC, and that the two algorithms share the same complexity across the fixed-confidence and fixed-budget settings.
    Free, publicly-accessible full text available July 13, 2026
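    Since the abstract does not spell out OIAFC's mechanics, the following is a minimal, hypothetical Python sketch of the generic fixed-confidence setting it refers to: successive elimination over a finite set of candidate scoring rules, stopping once a candidate is certified $(\epsilon, \delta)$-good. The candidate set, the Hoeffding-style confidence radius, and the Bernoulli feedback are all illustrative assumptions, not the authors' algorithm.

      import math
      import random

      def confidence_radius(t, delta, k):
          # Hoeffding-style radius with a union bound over the k candidates
          # (a simplification; the per-round union bound is omitted).
          return math.sqrt(math.log(2 * k / delta) / (2 * t))

      def fixed_confidence_identify(pull, k, epsilon, delta):
          """Return a candidate within epsilon of the best, w.p. >= 1 - delta."""
          active = list(range(k))
          mean = [0.0] * k
          t = 0
          while True:
              t += 1
              for c in active:
                  mean[c] += (pull(c) - mean[c]) / t  # one interaction with the agent
              rad = confidence_radius(t, delta, k)
              best = max(active, key=lambda c: mean[c])
              # Drop candidates that are provably more than epsilon suboptimal.
              active = [c for c in active if mean[c] + rad >= mean[best] - rad - epsilon]
              if 2 * rad <= epsilon or len(active) == 1:
                  return best

      # Toy usage with Bernoulli feedback (purely illustrative).
      means = [0.3, 0.5, 0.7]
      pull = lambda c: 1.0 if random.random() < means[c] else 0.0
      print(fixed_confidence_identify(pull, 3, epsilon=0.1, delta=0.05))

    A fixed-budget variant of the same loop would instead run for a preset number of rounds and return the empirical best at the end, which is the distinction the abstract draws between OIAFC and OIAFB.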
  2. Fair influence maximization in networks has been actively studied to ensure equity in fields such as viral marketing and public health. Existing studies often assume an offline setting, in which the learner identifies a set of seed nodes with known per-edge activation probabilities. In this paper, we study the problem of fair online influence maximization, i.e., without knowing the ground-truth activation probabilities. The learner aims to maximally propagate information across demographic groups while interactively selecting seed nodes and observing activation feedback on the fly. We propose the Fair Online Influence Maximization (FOIM) framework, which can solve the online influence maximization problem under a wide range of fairness notions. Given a fairness notion, FOIM combines a combinatorial multi-armed bandit algorithm for balancing exploration and exploitation with an offline fair influence maximization oracle for seed node selection. FOIM enjoys sublinear regret when the fairness notion satisfies two mild conditions, monotonicity and bounded smoothness. Our analyses show that common fairness notions, including maximin fairness, diversity fairness, and the welfare function, all satisfy these conditions, and we prove the corresponding regret upper bounds under each of them. Extensive empirical evaluations on three real-world networks demonstrate the efficacy of the proposed framework.
    Free, publicly-accessible full text available June 28, 2026
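    The abstract's recipe, a combinatorial multi-armed bandit (CMAB) learner wrapped around an offline fair influence maximization oracle, can be sketched as follows. This is a generic CUCB-style loop under assumed edge-level semi-bandit feedback; the fair_oracle and observe_feedback callbacks are placeholders, and FOIM's actual estimator and fairness oracle are specified in the paper.

      import math

      def foim_style_loop(edges, fair_oracle, observe_feedback, k, rounds):
          """edges: iterable of (u, v) pairs; fair_oracle(probs, k) -> size-k seed set."""
          trials = {e: 0 for e in edges}
          successes = {e: 0 for e in edges}
          for t in range(1, rounds + 1):
              ucb = {}
              for e in edges:
                  if trials[e] == 0:
                      ucb[e] = 1.0  # force exploration of unobserved edges
                  else:
                      mean = successes[e] / trials[e]
                      bonus = math.sqrt(1.5 * math.log(t) / trials[e])
                      ucb[e] = min(1.0, mean + bonus)
              # Offline fair-IM oracle run on optimistic activation probabilities.
              seeds = fair_oracle(ucb, k)
              # Semi-bandit feedback: activation outcome for each observed edge.
              for e, activated in observe_feedback(seeds):
                  trials[e] += 1
                  successes[e] += int(activated)
          return {e: successes[e] / max(trials[e], 1) for e in edges}

    Sublinear regret for loops of this shape hinges on the two conditions the abstract names: the fairness objective must be monotone in the activation probabilities and boundedly smooth, so optimism at the edge level translates into near-optimal fair spread.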
  3. Reinforcement Learning from Human Feedback (RLHF) has become the predominant approach for language model (LM) alignment. At its core, RLHF uses a margin-based loss for preference optimization, which specifies ideal LM behavior only through the difference between preferred and dispreferred responses. In this paper, we identify a common pitfall of margin-based methods: the under-specification of ideal LM behavior on preferred and dispreferred responses individually, which leads to two unintended consequences as the margin increases: (1) the probability of dispreferred (e.g., unsafe) responses may increase, resulting in potential safety alignment failures; and (2) the probability of preferred responses may decrease, even when those responses are ideal. We demystify the reasons behind these problematic behaviors: margin-based losses couple the change in the preferred probability to the gradient of the dispreferred one, and vice versa, often preventing the preferred probability from increasing while the dispreferred one decreases, and thus causing a synchronized increase or decrease in both probabilities. We term this effect, inherent in margin-based objectives, gradient entanglement. Formally, we derive conditions under which gradient entanglement becomes concerning for general margin-based alignment objectives: when the inner product of the gradients of the preferred and dispreferred log-probabilities is large relative to the individual gradient norms. We theoretically investigate why such inner products can be large when aligning language models and empirically validate our findings. The empirical implications of our framework extend to explaining important differences in the training dynamics of various preference optimization algorithms and to suggesting algorithm designs that mitigate the under-specification issue of margin-based methods, thereby improving language model alignment.
    Free, publicly-accessible full text available April 24, 2026
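    The condition at the heart of this abstract can be made concrete with a first-order calculation. Assuming a DPO-style margin loss L = -log sigmoid(log p_w - log p_l) (one member of the general family the paper analyzes), a gradient step changes the two log-probabilities, up to a shared positive factor, by d log p_w = ||g_w||^2 - <g_w, g_l> and d log p_l = <g_w, g_l> - ||g_l||^2, where g_w and g_l are the parameter gradients of the preferred and dispreferred log-probabilities. The toy vectors below are made up to show when the inner product dominates the norms:

      import numpy as np

      def first_order_changes(g_w, g_l):
          # Per-step changes in (log p_w, log p_l), up to a positive scalar factor.
          inner = float(g_w @ g_l)
          return float(g_w @ g_w) - inner, inner - float(g_l @ g_l)

      # Nearly orthogonal gradients: preferred rises, dispreferred falls, as intended.
      g_w = np.array([1.0, 0.1]); g_l = np.array([0.1, 1.0])
      print(first_order_changes(g_w, g_l))  # approx (0.81, -0.81)

      # Strongly aligned gradients: the inner product exceeds ||g_l||^2, so the
      # dispreferred log-probability *increases* along with the preferred one.
      g_w = np.array([2.0, 0.0]); g_l = np.array([1.0, 0.1])
      print(first_order_changes(g_w, g_l))  # approx (2.0, 0.99)

    The second case is exactly the entanglement failure mode described above: both probabilities move together once the gradients are sufficiently aligned.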
  4. Monitoring a large population of dynamic processes with limited resources presents a significant challenge across various industrial sectors, due to 1) the inherent disparity between the available monitoring resources and the extensive number of processes to be monitored and 2) the unpredictable and heterogeneous dynamics of these processes. Online learning approaches, commonly referred to as bandit methods, have shown notable potential for addressing this issue by dynamically allocating resources and balancing the exploitation of high-reward processes against the exploration of uncertain ones. However, most online learning algorithms are designed for either 1) a centralized setting that requires data sharing across processes for accurate predictions or 2) a homogeneity assumption under which a single global model is estimated from decentralized data. To overcome these limitations and enable online learning in a heterogeneous population under a decentralized setting, we propose a federated collaborative online monitoring method. Our approach uses representation learning to capture the latent representative models within the population and introduces a novel federated collaborative UCB algorithm to estimate these models from sequentially observed decentralized data. This strategy facilitates informed allocation of monitoring resources. The efficacy of our method is demonstrated through theoretical analysis, simulation studies, and an application to decentralized cognitive degradation monitoring in Alzheimer's disease.
    Free, publicly-accessible full text available April 11, 2026
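    As a rough illustration of the decentralized pattern this abstract describes, the sketch below blends each process's local estimate with a shared group-level model (a stand-in for the learned latent representations) and spends the monitoring budget on the highest-UCB processes. The 50/50 blend, the known group assignments, and the UCB form are illustrative assumptions, not the paper's federated collaborative UCB algorithm.

      import numpy as np

      def allocate(local_means, counts, t, budget, groups, group_means):
          # Blend each process's local estimate with its group's shared model
          # (a stand-in for the learned latent representation), then add a UCB bonus.
          blended = 0.5 * local_means + 0.5 * group_means[groups]
          bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
          ucb = blended + bonus
          return np.argsort(-ucb)[:budget]  # monitor the top-`budget` processes

      # Toy round: 6 processes, 2 latent groups, budget of 2 monitoring resources.
      rng = np.random.default_rng(0)
      groups = np.array([0, 0, 0, 1, 1, 1])
      group_means = np.array([0.2, 0.8])
      local_means = rng.normal(group_means[groups], 0.1)
      counts = np.ones(6)
      print(allocate(local_means, counts, t=1, budget=2,
                     groups=groups, group_means=group_means))

    In a full federated loop, the server would re-estimate the group-level models from the clients' sequentially observed statistics rather than take them as given, which is the role the paper assigns to representation learning.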