NSF PAR Search | NSF Public Access Repository

Incentivized Truthful Communication for Federated Bandits

Wei, Zhepei; Li, Chuanhao; Ren, Tianze; Xu, Haifeng; Wang, Hongning (May 2024, The Eleventh International Conference on Learning Representations (ICLR'2024))
Federated Linear Contextual Bandits with Heterogeneous Clients

Blaser, Ethan; Li, Chuanhao; Wang, Hongning (May 2024, 27th International Conference on Artificial Intelligence and Statistics (AISTATS'2024))
Meta-reinforcement Learning via Exploratory Task Clustering

Chu, Zhendong; Wang, Hongning (February 2024, 38th AAAI Conference on Artificial Intelligence (AAAI'2024))
Incentivized Communication for Federated Bandits

Wei, Zhepei; Li, Chuanhao; Xu, Haifeng; Wang, Hongning (December 2023, Thirty-Seventh Conference on Neural Information Processing Systems (NeurIPS'2023))
COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation

https://doi.org/10.18653/v1/2023.emnlp-main.819

Wang, Nan; Wang, Qifan; Wang, Yi-Chia; Sanjabi, Maziar; Liu, Jingzhou; Firooz, Hamed; Wang, Hongning; Nie, Shaoliang (December 2023, 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP'2023))
E-ADDA: Unsupervised Adversarial Domain Adaptation Enhanced by a New Mahalanobis Distance Loss for Smart Computing

https://doi.org/10.1109/SMARTCOMP58114.2023.00039

Gao, Ye; Baucom, Brian; Rose, Karen; Gordon, Kristina; Wang, Hongning; Stankovic, John (June 2023, IEEE International Conference on Smart Computing (SMARTCOMP))
Spectral Augmentation for Self-Supervised Learning on Graphs

Lin, Lu; Chen, Jinghui (May 2023, The Eleventh International Conference on Learning Representations)

Learning from a Learning User for Optimal Recommendations

Yao, Fan; Li, Chuanhao; Nekipelov, Denis; Wang, Hongning; Xu, Haifeng (July 2022, Proceedings of the 39th International Conference on Machine Learning)
Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
In real-world recommendation problems, especially those with a formidably large item space, users have to gradually learn to estimate the utility of any fresh recommendations from their experience with previously consumed items. This in turn affects their interaction dynamics with the system and can invalidate previous algorithms built on the omniscient user assumption. In this paper, we formalize a model to capture such "learning users" and design an efficient system-side learning solution, coined Noise-Robust Active Ellipsoid Search (RAES), to confront the challenges brought by the non-stationary feedback from such a learning user. Interestingly, we prove that the regret of RAES deteriorates gracefully as the convergence rate of user learning becomes worse, until reaching linear regret when the user's learning fails to converge. Experiments on synthetic datasets demonstrate the strength of RAES for such a contemporaneous system-user learning problem. Our study provides a novel perspective on modeling the feedback loop in recommendation problems.
When Are Linear Stochastic Bandits Attackable?

Wang, Huazheng; Xu, Haifeng; Wang, Hongning (July 2022, Proceedings of the 39th International Conference on Machine Learning)
Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
We study adversarial attacks on linear stochastic bandits: by manipulating the rewards, an adversary aims to control the behaviour of the bandit algorithm. Perhaps surprisingly, we first show that some attack goals can never be achieved. This is in sharp contrast to context-free stochastic bandits, and is intrinsically due to the correlation among arms in linear stochastic bandits. Motivated by this finding, this paper studies the attackability of a $k$-armed linear bandit environment. We first provide a complete necessary and sufficient characterization of attackability based on the geometry of the arms' context vectors. We then propose a two-stage attack method against LinUCB and Robust Phase Elimination. The method first determines whether the given environment is attackable; if it is, the method poisons the rewards to force the algorithm to pull a target arm a linear number of times using only a sublinear cost. Numerical experiments further validate the effectiveness and cost-efficiency of the proposed attack method.
Learning the Optimal Recommendation from Explorative Users

https://doi.org/10.1609/aaai.v36i9.21178

Yao, Fan; Li, Chuanhao; Nekipelov, Denis; Wang, Hongning; Xu, Haifeng (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

We propose a new problem setting to study the sequential interactions between a recommender system and a user. Instead of assuming the user is omniscient, static, and explicit, as the classical practice does, we sketch a more realistic user behavior model, under which the user: 1) rejects recommendations if they are clearly worse than others; 2) updates her utility estimation based on rewards from her accepted recommendations; 3) withholds realized rewards from the system. We formulate the interactions between the system and such an explorative user in a K-armed bandit framework and study the problem of learning the optimal recommendation on the system side. We show that efficient system learning is still possible but is more difficult. In particular, the system can identify the best arm with probability at least $1-\delta$ within $O(1/\delta)$ interactions, and we prove this is tight. Our finding contrasts with the result for the problem of best arm identification with fixed confidence, in which the best arm can be identified with probability $1-\delta$ within $O(\log(1/\delta))$ interactions. This gap illustrates the inevitable cost the system has to pay when it learns from an explorative user's revealed preferences on its recommendations rather than from the realized rewards.
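To give a feel for the gap described in the abstract above, the following minimal sketch tabulates the two growth rates, 1/delta and log(1/delta), for a few confidence levels. It is only an illustration of the scaling: the constants hidden by the O-notation are dropped, the natural logarithm is an assumption, and the function name `interactions_scale` is hypothetical and does not come from the paper.

```python
import math


def interactions_scale(delta: float) -> tuple[float, float]:
    """Return (1/delta, ln(1/delta)): the two growth rates with all constants dropped.

    This is NOT the paper's algorithm or bound; it only compares the two
    orders of growth named in the abstract.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    inverse = 1.0 / delta
    return inverse, math.log(inverse)


if __name__ == "__main__":
    # Smaller delta means higher required confidence (1 - delta).
    for delta in (0.1, 0.01, 0.001):
        inverse, log_inverse = interactions_scale(delta)
        print(f"delta={delta:g}  1/delta={inverse:.0f}  ln(1/delta)={log_inverse:.2f}")
```

As delta shrinks, the first quantity grows far faster than the second, which is the qualitative cost of learning from revealed preferences that the abstract points to.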
