NSF PAR Search | NSF Public Access Repository

Online Learning of Unknown Dynamics for Model-Based Controllers in Legged Locomotion

Sun, Yu; Ubellacker, Wyatt L.; Ma, Wen-Loong; Zhang, Xiang; Wang, Changhao; Csomay-Shanklin, Noel V.; Tomizuka, Masayoshi; Sreenath, Koushil; Ames, Aaron D. (October 2021, International Conference on Intelligent Robots and Systems)
Full Text Available
Towards a Dimension-Free Understanding of Adaptive Linear Control

Perdomo, Juan C; Simchowitz, Max; Agarwal, Alekh; Bartlett, Peter (October 2021, Proceedings of Thirty Fourth Conference on Learning Theory)
Full Text Available
Outside the Echo Chamber: Optimizing the Performative Risk

Miller, John; Perdomo, Juan Carlos; Zrnic, Tijana (July 2021, International Conference on Machine Learning (ICML) 2021)
Full Text Available
Revisiting Design Choices in Proximal Policy Optimization

Hsu, Chloe Ching-Yun; Mendler-Dünner, Celestine; Hardt, Moritz (September 2020, arXiv.org)
Proximal Policy Optimization (PPO) is a popular deep policy gradient algorithm. In standard implementations, PPO regularizes policy updates with clipped probability ratios, and parameterizes policies with either continuous Gaussian distributions or discrete Softmax distributions. These design choices are widely accepted, and motivated by empirical performance comparisons on MuJoCo and Atari benchmarks. We revisit these practices outside the regime of current benchmarks, and expose three failure modes of standard PPO. We explain why standard design choices are problematic in these cases, and show that alternative choices of surrogate objectives and policy parameterizations can prevent the failure modes. We hope that our work serves as a reminder that many algorithmic design choices in reinforcement learning are tied to specific simulation environments. We should not implicitly accept these choices as a standard part of a more general algorithm.
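The clipped probability ratio mentioned in this abstract can be illustrated with a short sketch. This is the commonly cited form of the PPO clipped surrogate objective, written in NumPy for illustration only; it is not taken from the paper's code. The function name `ppo_clipped_surrogate`, the `clip_eps` default of 0.2, and the toy inputs are all assumptions.

```python
import numpy as np


def ppo_clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps is an assumed default.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping limits how much the ratio can contribute to the objective.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The elementwise minimum gives a pessimistic bound on the improvement.
    return float(np.mean(np.minimum(unclipped, clipped)))


# Toy data (hypothetical): two sampled actions with opposite-sign advantages.
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.9, 0.1]))
advantages = np.array([1.0, -1.0])

objective = ppo_clipped_surrogate(logp_new, logp_old, advantages)
```

In this toy case the ratios are 1.8 and 0.2, so both terms fall outside the clipping range and the objective is determined by the clipped values 1.2 and 0.8.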
Full Text Available
Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

Sun, Yu; Wang, Xiaolong; Liu, Zhuang; Miller, John; Efros, Alexei A.; Hardt, Moritz (April 2020, ICML 2020)
In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions. We turn a single unlabeled test sample into a self-supervised learning problem, on which we update the model parameters before making a prediction. This also extends naturally to data in an online stream. Our simple approach leads to improvements on diverse image classification benchmarks aimed at evaluating robustness to distribution shifts.
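The update-then-predict loop described in this abstract can be sketched on a toy linear model. Everything here is an assumption made for illustration: the shared feature map `W`, the reconstruction head `U` used as a stand-in self-supervised task (not necessarily the auxiliary task used in the paper), the main head `v`, and the step-size rule. It is a minimal sketch, not the authors' implementation.

```python
import numpy as np


def self_supervised_loss(W, U, x):
    """Stand-in auxiliary loss: 0.5 * ||U W x - x||^2 (needs no label)."""
    residual = U @ (W @ x) - x
    return 0.5 * float(residual @ residual)


def adapt_on_sample(W, U, x):
    """One gradient step on the auxiliary loss w.r.t. the shared features W."""
    residual = U @ (W @ x) - x
    grad_W = np.outer(U.T @ residual, x)
    # Assumed step size 1/L, where L = ||U||_2^2 * ||x||^2 bounds the
    # curvature of this quadratic loss in W, so the loss cannot increase.
    step = 1.0 / (np.linalg.norm(U, 2) ** 2 * float(x @ x))
    return W - step * grad_W


def predict(W, v, x):
    """Main-task prediction from the (possibly adapted) shared features."""
    return float(v @ (W @ x))


# Hypothetical parameters standing in for a trained model.
rng = np.random.default_rng(0)
d, k = 4, 3
W = rng.normal(size=(k, d))   # shared feature extractor
U = rng.normal(size=(d, k))   # self-supervised head
v = rng.normal(size=k)        # main-task head
x_new = rng.normal(size=d)    # single unlabeled test sample

loss_before = self_supervised_loss(W, U, x_new)
W_adapted = adapt_on_sample(W, U, x_new)
loss_after = self_supervised_loss(W_adapted, U, x_new)
prediction = predict(W_adapted, v, x_new)
```

The original `W` is left untouched, so each test sample is adapted from the same starting point. In an online stream, one could instead carry `W_adapted` forward to the next sample.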
Full Text Available
Stable Recurrent Models

Miller, John; Hardt, Moritz (April 2019, In Proceedings of ICLR 2019)

Full Text Available
