NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Huang, Yixiao; Zhu, Hanlin; Guo, Tianyu; Jiao, Jiantao; Sojoudi, Somayeh; Jordan, Michael I; Russell, Stuart; Mei, Song (September 2025, Conference on Neural Information Processing Systems)

Free, publicly-accessible full text available September 18, 2026

Statistical complexity and optimal algorithms for nonlinear ridge bandits

https://doi.org/10.1214/24-AOS2395

Rajaraman, Nived; Han, Yanjun; Jiao, Jiantao; Ramchandran, Kannan (December 2024, The Annals of Statistics)

Full Text Available

Toxicity Detection for Free

Hu, Zhanhao; Piet, Julien; Zhao, Geng; Jiao, Jiantao; Wagner, David (December 2024, Advances in Neural Information Processing Systems)

Full Text Available

Online Learning in Stackelberg Games with an Omniscient Follower

Zhao, Geng; Zhu, Banghua; Jiao, Jiantao; Jordan, Michael (August 2023, Proceedings of Machine Learning Research)

We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader’s move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Differing from the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and that they always best-respond to the leader’s actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games.
Full Text Available

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

https://doi.org/10.1109/TIT.2022.3185139

Rashidinejad, Paria; Zhu, Banghua; Ma, Cong; Jiao, Jiantao; Russell, Stuart (December 2022, IEEE Transactions on Information Theory)

Full Text Available

Generalized resilience and robust statistics

https://doi.org/10.1214/22-AOS2186

Zhu, Banghua; Jiao, Jiantao; Steinhardt, Jacob (August 2022, The Annals of Statistics)

Full Text Available

Minimax Off-Policy Evaluation for Multi-Armed Bandits

https://doi.org/10.1109/TIT.2022.3162335

Ma, Cong; Zhu, Banghua; Jiao, Jiantao; Wainwright, Martin J. (August 2022, IEEE Transactions on Information Theory)

Full Text Available

Robust Estimation for Nonparametric Families via Generative Adversarial Networks

Zhu, Banghua; Jiao, Jiantao; Jordan, Michael (January 2022, IEEE International Symposium on Information Theory 2022)

Full Text Available

Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

https://doi.org/10.1613/jair.1.13326

Zhai, Yuexiang; Baek, Christina; Zhou, Zhengyuan; Jiao, Jiantao; Ma, Yi (January 2022, Journal of Artificial Intelligence Research)

In many goal-reaching reinforcement learning (RL) tasks, it has been empirically verified that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way intermediate states, which can only be visited once per episode, and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity and the pursuit of the shortest path in the OWMP setting: adding intermediate rewards significantly reduces the computational complexity of reaching the goal but the agent may not find the shortest path, whereas with sparse terminal rewards, the agent finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and some popular deep RL algorithms.
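
The effect described in this abstract can be illustrated on a toy problem. The sketch below is not the paper's construction: it assumes a hypothetical one-way chain of states with two actions ("stay" and "forward"), a discount factor of 0.9, and a unit reward for entering any state in a chosen set. The function name and parameters are illustrative. It counts how many synchronous value-iteration sweeps, starting from all-zero values, are needed before the greedy policy strictly prefers moving forward in every state.

```python
def iterations_until_goal_directed(n_states, rewarded, gamma=0.9):
    """Count synchronous value-iteration sweeps (starting from V = 0) until the
    greedy policy strictly prefers 'forward' over 'stay' in every state.

    Assumed toy model: states 0..n_states on a one-way chain, with state
    n_states terminal. Entering any state in `rewarded` pays reward 1.
    """
    def reward(s):
        # Reward for taking 'forward' from state s, i.e. entering state s + 1.
        return 1.0 if s + 1 in rewarded else 0.0

    values = [0.0] * (n_states + 1)  # the terminal state's value stays 0
    for sweep in range(n_states + 1):
        q_forward = [reward(s) + gamma * values[s + 1] for s in range(n_states)]
        q_stay = [gamma * values[s] for s in range(n_states)]
        if all(f > w for f, w in zip(q_forward, q_stay)):
            return sweep
        # Synchronous backup: every state is updated from the previous values.
        values = [max(f, w) for f, w in zip(q_forward, q_stay)] + [0.0]
    raise RuntimeError("greedy policy never became goal-directed")


# Sparse terminal reward only, versus extra rewards at subgoals 5, 10 and 15.
sparse = iterations_until_goal_directed(20, {20})
with_subgoals = iterations_until_goal_directed(20, {5, 10, 15, 20})
print(sparse, with_subgoals)
```

In this toy chain, reward information moves back one state per sweep, so the sweep count is governed by the longest gap between rewarded states. The subgoal variant therefore needs fewer sweeps than the sparse one. The chain has a single path, so it does not exhibit the shortest-path trade-off the abstract describes for the OWMP setting.
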
Full Text Available

Robust estimation via generalized quasi-gradients

https://doi.org/10.1093/imaiai/iaab018

Zhu, Banghua; Jiao, Jiantao; Steinhardt, Jacob (August 2021, Information and Inference: A Journal of the IMA)

We explore why many recently proposed robust estimation problems are efficiently solvable, even though the underlying optimization problems are non-convex. We study the loss landscape of these robust estimation problems, and identify the existence of ‘generalized quasi-gradients’. Whenever these quasi-gradients exist, a large family of no-regret algorithms are guaranteed to approximate the global minimum; this includes the commonly used filtering algorithm. For robust mean estimation of distributions under bounded covariance, we show that any first-order stationary point of the associated optimization problem is an approximate global minimum if and only if the corruption level $\epsilon < 1/3$. Consequently, any optimization algorithm that approaches a stationary point yields an efficient robust estimator with breakdown point $1/3$. With carefully designed initialization and step size, we improve this to $1/2$, which is optimal. For other tasks, including linear regression and joint mean and covariance estimation, the loss landscape is more rugged: there are stationary points arbitrarily far from the global minimum. Nevertheless, we show that generalized quasi-gradients exist and construct efficient algorithms. These algorithms are simpler than previous ones in the literature, and for linear regression we improve the estimation error from $O(\sqrt{\epsilon})$ to the optimal rate of $O(\epsilon)$ for small $\epsilon$ assuming certified hypercontractivity. For mean estimation with near-identity covariance, we show that a simple gradient descent algorithm achieves breakdown point $1/3$ and iteration complexity $\tilde{O}(d/\epsilon^2)$.
Full Text Available
