NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Provably Efficient Algorithm for Best Scoring Rule Identification in Online Principal-Agent Information Acquisition

Wang, Ziechen; Li, Chuanhao; Wang, Huazheng (July 2025, The 42nd International Conference on Machine Learning (ICML 2025))

We investigate the problem of identifying the optimal scoring rule within the principal-agent framework for the online information acquisition problem. We focus on the principal's perspective, seeking to determine the desired scoring rule through interactions with the agent. To address this challenge, we propose two algorithms, OIAFC and OIAFB, tailored to the fixed-confidence and fixed-budget settings, respectively. Our theoretical analysis demonstrates that OIAFC can extract the desired $$(\epsilon,\delta)$$-scoring rule with either an efficient instance-dependent sample complexity or an instance-independent sample complexity. Our analysis also shows that OIAFB matches the instance-independent performance bound of OIAFC and that both algorithms share the same complexity across the fixed-confidence and fixed-budget settings.
Free, publicly-accessible full text available July 13, 2026
Fair Online Influence Maximization

Wang, Xiangqi; Zhang, Shaokun; Aguilar Escamilla, Jose E; Wu, Qingyun; Zhang, Xiangliang; Kang, Jian; Wang, Huazheng (June 2025, Transactions on Machine Learning Research)

Fair influence maximization in networks has been actively studied to ensure equity in fields such as viral marketing and public health. Existing studies often assume an offline setting, in which the learner identifies a set of seed nodes with known per-edge activation probabilities. In this paper, we study fair online influence maximization, i.e., the setting in which the ground-truth activation probabilities are unknown. The learner aims to maximally propagate information among demographic groups while interactively selecting seed nodes and observing activation feedback on the fly. We propose the Fair Online Influence Maximization (FOIM) framework, which solves the online influence maximization problem under a wide range of fairness notions. Given a fairness notion, FOIM combines a combinatorial multi-armed bandit algorithm, which balances exploration and exploitation, with an offline fair influence maximization oracle for seed-node selection. FOIM enjoys sublinear regret when the fairness notion satisfies two mild conditions: monotonicity and bounded smoothness. Our analyses show that common fairness notions, including maximin fairness, diversity fairness, and the welfare function, all satisfy these conditions, and we prove the corresponding regret upper bounds under these notions. Extensive empirical evaluations on three real-world networks demonstrate the efficacy of the proposed framework.
Free, publicly-accessible full text available June 28, 2026
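The exploration half of the FOIM recipe (a combinatorial bandit feeding an offline oracle) can be sketched with a CUCB-style optimistic index. This is an assumption-laden illustration, not the paper's algorithm: it assumes per-edge Bernoulli activation feedback, uses the familiar confidence bonus $$\sqrt{3\ln t / (2 T_e)}$$ (with $$T_e$$ the number of times edge $$e$$ was observed), and pairs it with one fairness notion, maximin fairness, taken here as the smallest fraction of any demographic group that is reached. The function names and dictionary layout are hypothetical.

```python
import math
from typing import Dict, Hashable


def cucb_edge_estimates(
    successes: Dict[Hashable, int],
    trials: Dict[Hashable, int],
    t: int,
) -> Dict[Hashable, float]:
    """Optimistic (UCB-style) per-edge activation probabilities.

    successes[e] counts observed activations along edge e, trials[e] counts
    how often e was observed, and t is the current round (t >= 1).
    """
    estimates: Dict[Hashable, float] = {}
    for edge, n in trials.items():
        if n == 0:
            # Unobserved edges get the most optimistic value.
            estimates[edge] = 1.0
            continue
        mean = successes.get(edge, 0) / n
        bonus = math.sqrt(3.0 * math.log(t) / (2.0 * n))
        # Clip to a valid probability before handing it to an oracle.
        estimates[edge] = min(1.0, mean + bonus)
    return estimates


def maximin_fairness(reached: Dict[str, int], group_size: Dict[str, int]) -> float:
    """Smallest fraction of any demographic group that was reached."""
    return min(reached.get(g, 0) / n for g, n in group_size.items())
```

In a full loop, the optimistic estimates would be passed to an offline fair influence-maximization oracle (not shown) that picks the next seed set, and the activation feedback observed on the fly would update `successes` and `trials`. Optimism makes under-explored edges look attractive until they have been observed often enough.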
FCOM: A Federated Collaborative Online Monitoring Framework via Representation Learning

https://doi.org/10.1609/aaai.v39i17.33975

Kosolwattana, Tanapol; Wang, Huazheng; Al Kontar, Raed; Lin, Ying (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Monitoring a large population of dynamic processes with limited resources presents a significant challenge across various industrial sectors. This is due to 1) the inherent disparity between the available monitoring resources and the extensive number of processes to be monitored and 2) the unpredictable and heterogeneous dynamics inherent in the progression of these processes. Online learning approaches, commonly referred to as bandit methods, have demonstrated notable potential in addressing this issue by dynamically allocating resources and effectively balancing the exploitation of high-reward processes and the exploration of uncertain ones. However, most online learning algorithms either 1) assume a centralized setting that requires data sharing across processes for accurate predictions or 2) rely on a homogeneity assumption, estimating a single global model from decentralized data. To overcome these limitations and enable online learning in a heterogeneous population under a decentralized setting, we propose a federated collaborative online monitoring method. Our approach uses representation learning to capture the latent representative models within the population and introduces a novel federated collaborative UCB algorithm to estimate these models from sequentially observed decentralized data. This strategy supports informed allocation of monitoring resources. The efficacy of our method is demonstrated through theoretical analysis, simulation studies, and an application to decentralized cognitive degradation monitoring in Alzheimer’s disease.
Free, publicly-accessible full text available April 11, 2026
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement

Yuan, Hui; Zeng, Yifan; Wu, Yue; Wang, Huazheng; Wang, Mengdi; Liu, Leqi (April 2025, The Thirteenth International Conference on Learning Representations (ICLR 2025))

Reinforcement Learning from Human Feedback (RLHF) has become the predominant approach for language model (LM) alignment. At its core, RLHF uses a margin-based loss for preference optimization, specifying ideal LM behavior only by the difference between preferred and dispreferred responses. In this paper, we identify a common pitfall of margin-based methods, namely the under-specification of ideal LM behavior on preferred and dispreferred responses individually, which leads to two unintended consequences as the margin increases: (1) the probability of dispreferred (e.g., unsafe) responses may increase, resulting in potential safety alignment failures; and (2) the probability of preferred responses may decrease, even when those responses are ideal. We explain the reasons behind these problematic behaviors: margin-based losses couple the change in the preferred probability to the gradient of the dispreferred one, and vice versa, often preventing the preferred probability from increasing while the dispreferred one decreases, and thus causing a synchronized increase or decrease in both probabilities. We term this effect, inherent in margin-based objectives, gradient entanglement. Formally, we derive conditions for general margin-based alignment objectives under which gradient entanglement becomes concerning: the inner product of the gradients of preferred and dispreferred log-probabilities is large relative to the individual gradient norms. We theoretically investigate why such inner products can be large when aligning language models and empirically validate our findings. The empirical implications of our framework extend to explaining important differences in the training dynamics of various preference optimization algorithms and to suggesting potential algorithm designs that mitigate the under-specification issue of margin-based methods, thereby improving language model alignment.
Free, publicly-accessible full text available April 24, 2026
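The gradient-entanglement condition described above can be checked with a first-order calculation. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes a DPO-style logistic loss $$\ell(m) = -\log\sigma(\beta m)$$ as one example of a margin-based objective, with margin $$m = \log\pi_w - \log\pi_l$$, a single plain gradient-descent step, and toy vectors `g_w`, `g_l` standing in for the gradients of the preferred and dispreferred log-probabilities. With $$c = -\ell'(m) > 0$$ and step size $$\eta$$, the step is $$\theta \leftarrow \theta + \eta c (g_w - g_l)$$, so to first order $$\Delta\log\pi_w = \eta c(\|g_w\|^2 - \langle g_w, g_l\rangle)$$ and $$\Delta\log\pi_l = \eta c(\langle g_w, g_l\rangle - \|g_l\|^2)$$. The preferred log-probability falls when $$\langle g_w, g_l\rangle > \|g_w\|^2$$, and the dispreferred one rises when $$\langle g_w, g_l\rangle > \|g_l\|^2$$.

```python
import math
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dpo_coefficient(margin: float, beta: float) -> float:
    """c = -l'(m) for l(m) = -log(sigmoid(beta * m)); always positive.

    Equals beta * sigmoid(-beta * m). Very large beta * m can overflow
    math.exp in this simple sketch.
    """
    return beta / (1.0 + math.exp(beta * margin))


def first_order_changes(
    g_w: Sequence[float], g_l: Sequence[float], lr: float, coeff: float
) -> Tuple[float, float]:
    """First-order change in (log pi_w, log pi_l) after one gradient step.

    The step is theta <- theta + lr * coeff * (g_w - g_l), where g_w and g_l
    are the gradients of the preferred and dispreferred log-probabilities.
    """
    inner = dot(g_w, g_l)
    step = lr * coeff
    d_logp_w = step * (dot(g_w, g_w) - inner)
    d_logp_l = step * (inner - dot(g_l, g_l))
    return d_logp_w, d_logp_l
```

For example, with `g_w = [1.0, 0.0]`, `g_l = [2.0, 0.0]` and `lr = coeff = 1.0`, both changes are negative (-1.0 and -2.0), a synchronized decrease; swapping the two vectors gives 2.0 and 1.0, so the dispreferred log-probability rises along with the preferred one. Orthogonal gradients give the intended +1.0 and -1.0.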
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning

Chakraborty, Souradip; Bedi, Amrit; Koppel, Alec; Wang, Huazheng; Manocha, Dinesh; Wang, Mengdi; Huang, Furong (January 2024, The Twelfth International Conference on Learning Representations)

Incentivizing Exploration in Linear Contextual Bandits under Information Gap

https://doi.org/10.1145/3604915.3608794

Wang, Huazheng; Xu, Haifeng; Li, Chuanhao; Liu, Zhiyuan; Wang, Hongning (September 2023, ACM)

Full Text Available
Learning Kernelized Contextual Bandits in a Distributed and Asynchronous Environment

Li, Chuanhao; Wang, Huazheng; Wang, Mengdi; Wang, Hongning (January 2023, International Conference on Learning Representations)

Full Text Available
Incentivizing Exploration in Linear Bandits under Information Gap

Wang, Huazheng; Xu, Haifeng; Li, Chuanhao; Liu, Zhiyuan; Wang, Hongning (January 2023, 17th ACM Conference on Recommender Systems (RecSys'2023))

Full Text Available
Dynamic Global Sensitivity for Differentially Private Contextual Bandits

https://doi.org/10.1145/3523227.3546781

Wang, Huazheng; Zhao, David; Wang, Hongning (September 2022, Proceedings of the 16th ACM Conference on Recommender Systems)

We propose a differentially private linear contextual bandit algorithm that uses a tree-based mechanism to add Laplace or Gaussian noise to model parameters. Our key insight is that as the model converges during online updates, the global sensitivity of its parameters shrinks over time (hence the name dynamic global sensitivity). Compared with existing solutions, our dynamic global sensitivity analysis allows us to inject less noise to obtain $$(\epsilon, \delta)$$-differential privacy, with the added regret caused by noise injection in $$\tilde O(\log{T}\sqrt{T}/\epsilon)$$. We provide a rigorous theoretical analysis of the amount of noise added via dynamic global sensitivity and of the corresponding regret upper bound of our proposed algorithm. Experimental results on both synthetic and real-world datasets confirm the algorithm's advantage over existing solutions.
Full Text Available
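The tree-based mechanism mentioned in this abstract can be illustrated with the standard binary counting mechanism for private running sums. This is a minimal sketch under assumptions not stated in the abstract: it releases a scalar prefix sum rather than the model parameters the paper perturbs, uses Laplace noise with a fixed, caller-chosen scale (`noise_scale`, a hypothetical parameter), and does not implement the shrinking, dynamic global sensitivity that is the paper's key idea; calibrating the scale to a target $$\epsilon$$ is also left out. What it shows is the structure: each value enters $$O(\log T)$$ tree nodes, each node is perturbed once, and each released prefix sum at step $$t$$ combines at most $$\lfloor \log_2 t \rfloor + 1$$ noisy nodes.

```python
import random
from typing import List, Optional


class BinaryTreeMechanism:
    """Noisy running sums via the binary (tree-based) counting mechanism."""

    def __init__(self, noise_scale: float, rng: Optional[random.Random] = None):
        self.noise_scale = noise_scale
        self.rng = rng or random.Random()
        self.exact: List[float] = []  # exact partial sum stored at each tree level
        self.noisy: List[float] = []  # noisy copy released for each tree level
        self.t = 0

    def _laplace(self) -> float:
        """Draw Laplace(0, noise_scale) noise (zero if the scale is zero)."""
        if self.noise_scale == 0:
            return 0.0
        # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
        b = self.noise_scale
        return self.rng.expovariate(1.0 / b) - self.rng.expovariate(1.0 / b)

    def add(self, x: float) -> float:
        """Insert the value for step t and return the noisy prefix sum."""
        self.t += 1
        level = (self.t & -self.t).bit_length() - 1  # index of lowest set bit of t
        while len(self.exact) <= level:
            self.exact.append(0.0)
            self.noisy.append(0.0)
        # Merge every lower level into the new node, then clear those levels.
        self.exact[level] = x + sum(self.exact[:level])
        for j in range(level):
            self.exact[j] = 0.0
            self.noisy[j] = 0.0
        self.noisy[level] = self.exact[level] + self._laplace()
        # The set bits of t select the nodes that cover steps 1..t.
        return sum(self.noisy[j] for j in range(len(self.noisy)) if (self.t >> j) & 1)
```

With `noise_scale=0.0` the mechanism returns exact prefix sums (feeding 1, 2, ..., 8 yields 1, 3, 6, 10, 15, 21, 28, 36), which is a convenient sanity check before turning the noise on.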
When Are Linear Stochastic Bandits Attackable?

Wang, Huazheng; Xu, Haifeng; Wang, Hongning (July 2022, Proceedings of the 39th International Conference on Machine Learning)
Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. Perhaps surprisingly, we first show that some attack goals can never be achieved. This is in sharp contrast to context-free stochastic bandits and is intrinsically due to the correlation among arms in linear stochastic bandits. Motivated by this finding, this paper studies the attackability of a $$k$$-armed linear bandit environment. We first provide a complete characterization of attackability, with necessary and sufficient conditions, based on the geometry of the arms’ context vectors. We then propose a two-stage attack method against LinUCB and Robust Phase Elimination. The method first determines whether the given environment is attackable and, if so, poisons the rewards to force the algorithm to pull a target arm a linear number of times at only a sublinear cost. Numerical experiments further validate the effectiveness and cost efficiency of the proposed attack method.
Full Text Available
