NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems

https://doi.org/10.1145/3579371.3589074

Karandikar, Sagar; Udipi, Aniruddha N.; Choi, Junsun; Whangbo, Joonho; Zhao, Jerry; Kanev, Svilen; Lim, Edwin; Alakuijala, Jyrki; Madduri, Vrishab; Shao, Yakun Sophia; et al (June 2023, ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture)

Full Text Available
RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation

https://doi.org/10.1145/3579371.3589099

Nikiforov, Dima; Dong, Shengjun Chris; Zhang, Chengyi Lux; Kim, Seah; Nikolic, Borivoje; Shao, Yakun Sophia (June 2023, ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture)

Full Text Available
A new similarity measure for covariate shift with applications to nonparametric regression

Pathak, Reese; Ma, Cong; Wainwright, Martin J. (July 2022, Proceedings of the International Conference on Machine Learning)

We study covariate shift in the context of nonparametric regression. We introduce a new measure of distribution mismatch between the source and target distributions using the integrated ratio of probabilities of balls at a given radius. We use the scaling of this measure with respect to the radius to characterize the minimax rate of estimation over a family of H{ö}lder continuous functions under covariate shift. In comparison to the recently proposed notion of transfer exponent, this measure leads to a sharper rate of convergence and is more fine-grained. We accompany our theory with concrete instances of covariate shift that illustrate this sharp difference.
more » « less
Full Text Available
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning.

Zanette, Andrea; Wainwright, Martin (July 2022, International Conference on Machine Learning)

The Q-learning algorithm is a simple and widely-used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed even with linear function approximation. In practice, tools such as target networks and experience replay appear to be essential, but the individual contribution of each of these mechanisms is not well understood theoretically. This work proposes an exploration variant of the basic Q-learning protocol with linear function approximation. Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state of the art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of step elapsed. We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error. The algorithm also exhibits a form of instance-dependence, in that its performance depends on the "effective" feature dimension.
more » « less
Full Text Available
Bellman Residual Orthogonalization for Offline Reinforcement Learning

Zanette, Andrea; Wainwright, Martin J. (April 2022, ArXivorg)

We propose and analyze a reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along an user-defined space of test functions. Focusing on applications to model-free offline RL with function approximation, we exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class. We prove an oracle inequality on our policy optimization procedure in terms of a trade-off between the value and uncertainty of an arbitrary comparator policy. Different choices of test function spaces allow us to tackle different problems within a common framework. We characterize the loss of efficiency in moving from on-policy to off-policy data using our procedures, and establish connections to concentrability coefficients studied in past work. We examine in depth the implementation of our methods with linear function approximation, and provide theoretical guarantees with polynomial-time implementations even when Bellman closure does not hold.
more » « less
Full Text Available
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Zanette, Andrea; Brunskill, Emma; Wainwright, Martin J. (January 2021, NEURIPS Conference 2021)

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared to the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action value function of the actor's policies; this is a more general setting than the low-rank MDP model. Despite the added generality, the procedure is computationally tractable as it involves the solution of a sequence of second-order programs. We prove an upper bound on the suboptimality gap of the policy returned by the procedure that depends on the data coverage of any arbitrary, possibly data dependent comparator policy. The achievable guarantee is complemented with a minimax lower bound that is matching up to logarithmic factors.
more » « less
Full Text Available

Search for: All records