NSF PAR Search | NSF Public Access Repository


Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits

Si, Nian; Zhang, Fan; Zhou, Zhengyuan; Blanchet, Jose (July 2020, Proceedings of the 37th International Conference on Machine Learning)
Daumé III, Hal (Ed.)
Policy learning using historical observational data is an important problem that has found widespread applications. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that generated the data, an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with bandit observational data. We propose a novel learning algorithm that learns a policy robust to adversarial perturbations and unknown covariate shifts. We first present a policy evaluation procedure for the ambiguous environment and then give a heuristic algorithm to solve the distributionally robust policy learning problem efficiently. Additionally, we provide extensive simulations to demonstrate the robustness of our policy.
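As a rough illustration of what evaluating a policy under an ambiguous environment can look like, the sketch below computes a worst-case mean reward over all distributions within a KL-divergence ball around the empirical, importance-weighted data. The KL ball, the grid search over the dual variable, and the name `kl_robust_value` are assumptions made for this example; the abstract does not specify the ambiguity set or the procedure the authors use.

```python
import math

def kl_robust_value(rewards, weights, delta, alphas=None):
    """Worst-case mean reward over a KL ball of radius delta (illustrative sketch).

    Uses the standard dual form
        sup_{alpha > 0}  -alpha * log E[exp(-R / alpha)] - alpha * delta,
    where the expectation uses self-normalized importance weights.
    """
    total = sum(weights)
    # Keep only samples with positive weight; normalize weights to probabilities.
    pairs = [(r, w / total) for r, w in zip(rewards, weights) if w > 0]
    r_min = min(r for r, _ in pairs)
    if alphas is None:
        # Hypothetical log-spaced grid for the dual variable alpha.
        alphas = [10 ** (k / 10) for k in range(-30, 31)]
    best = r_min  # limit of the dual objective as alpha -> 0
    for alpha in alphas:
        # Shift by r_min so every exponent is <= 0 (numerically stable).
        s = sum(p * math.exp(-(r - r_min) / alpha) for r, p in pairs)
        value = r_min - alpha * math.log(s) - alpha * delta
        best = max(best, value)
    return best
```

Here `weights` would typically be inverse-propensity weights for the logged actions that agree with the policy being evaluated. A larger `delta` allows a larger shift away from the logged data and can only lower the returned value.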
Full Text Available
Robust Bayesian Classification Using An Optimistic Score Ratio

Nguyen, Viet Anh; Si, Nian; Blanchet, Jose (July 2020, Proceedings of Machine Learning Research)
Daumé III, Hal (Ed.)
We build a Bayesian contextual classification model using an optimistic score ratio for robust binary classification when there is limited information on the class-conditional, or contextual, distribution. The optimistic score searches for the distribution that most plausibly explains the observed outcomes in the testing sample among all distributions belonging to the contextual ambiguity set, which is prescribed using a limited structural constraint on the mean vector and the covariance matrix of the underlying contextual distribution. We show that the Bayesian classifier using the optimistic score ratio is conceptually attractive, delivers solid statistical guarantees, and is computationally tractable. We showcase the power of the proposed optimistic score ratio classifier on both synthetic and empirical data.
Full Text Available
Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Li, Z.; E., Shen; Lin, K.; Keutzer, K.; Klein, D.; Gonzalez, J.E. (January 2020, Proceedings of Machine Learning Research)
Daumé III, Hal (Ed.)
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, counterintuitively, the most compute-efficient training strategy is to train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.
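For readers unfamiliar with the two compression techniques named in the abstract, the sketch below shows generic versions of each on a flat list of weights: magnitude pruning and symmetric uniform quantization. These are textbook variants chosen for illustration; the function names are hypothetical, and the abstract does not say which pruning or quantization schemes the authors apply.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_uniform(weights, bits):
    """Round weights to a symmetric signed grid with `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0 or levels == 0:
        return [0.0 for _ in weights]
    scale = max_abs / levels              # step size of the grid
    return [round(w / scale) * scale for w in weights]
```

Pruning removes parameters outright, while quantization keeps every parameter at reduced precision; the two can be combined by pruning first and quantizing the surviving weights.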
Full Text Available
Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits

Si, Nian; Zhang, Fan; Zhou, Zhengyuan; Blanchet, Jose (August 2020, Proceedings of Machine Learning Research)
Daumé III, Hal; Singh, Aarti (Eds.)
Policy learning using historical observational data is an important problem that has found widespread applications. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that generated the data, an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with bandit observational data. We propose a novel learning algorithm that learns a policy robust to adversarial perturbations and unknown covariate shifts. We first present a policy evaluation procedure for an ambiguous environment and then give a heuristic algorithm to solve the distributionally robust policy learning problem efficiently. Additionally, we provide extensive simulations to demonstrate the robustness of our policy.
Full Text Available
Learning Selection Strategies in Buchberger’s Algorithm

Peifer, Dylan; Stillman, Michael; Halpern-Leistner, Daniel (July 2020, Proceedings of the 37th International Conference on Machine Learning)
Daumé III, Hal; Singh, Aarti (Eds.)
Studying the set of exact solutions of a system of polynomial equations largely depends on a single iterative algorithm, known as Buchberger’s algorithm. Optimized versions of this algorithm are crucial for many computer algebra systems (e.g., Mathematica, Maple, Sage). We introduce a new approach to Buchberger’s algorithm that uses reinforcement learning agents to perform S-pair selection, a key step in the algorithm. We then study how the difficulty of the problem depends on the choices of domain and distribution of polynomials, about which little is known. Finally, we train a policy model using proximal policy optimization (PPO) to learn S-pair selection strategies for random systems of binomial equations. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation.
Full Text Available
Deep Molecular Programming: A Natural Implementation of Binary-Weight ReLU Neural Networks

Vasic, Marko; Chalk, Cameron; Khurshid, Sarfraz; Soloveichik, David (January 2020, Proceedings of the 37th International Conference on Machine Learning)
Daumé III, Hal; Singh, Aarti (Eds.)
Embedding computation in molecular contexts incompatible with traditional electronics is expected to have wide-ranging impact in synthetic biology, medicine, nanofabrication, and other fields. A key remaining challenge lies in developing programming paradigms for molecular computation that are well aligned with the underlying chemical hardware and do not attempt to shoehorn in ill-fitting electronics paradigms. We discover a surprisingly tight connection between a popular class of neural networks (binary-weight ReLU, also known as BinaryConnect) and a class of coupled chemical reactions that are absolutely robust to reaction rates. The robustness of rate-independent chemical computation makes it a promising target for bioengineering implementation. We show how a BinaryConnect neural network trained in silico using well-founded deep learning optimization techniques can be compiled to an equivalent chemical reaction network, providing a novel molecular programming paradigm. We illustrate such translation on the paradigmatic IRIS and MNIST datasets. Toward intended applications of chemical computation, we further use our method to generate a chemical reaction network that can discriminate between different virus types based on gene expression levels. Our work sets the stage for rich knowledge transfer between neural network and molecular programming communities.
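To make the network class concrete, the sketch below runs a forward pass through a small binary-weight ReLU network, where every weight is either +1 or -1. It only illustrates the kind of network the abstract refers to; it does not show the compilation to a chemical reaction network, and the function names and the bias-free layout are assumptions made for this example.

```python
def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)

def binary_layer(inputs, weights):
    """One layer whose weights are all +1 or -1, followed by ReLU.

    `weights` is a list of rows, one row of +1/-1 entries per output unit.
    """
    for row in weights:
        assert all(w in (-1, 1) for w in row), "weights must be +1 or -1"
    return [relu(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(inputs, layers):
    """Apply a sequence of binary-weight ReLU layers (no biases in this sketch)."""
    activations = list(inputs)
    for weights in layers:
        activations = binary_layer(activations, weights)
    return activations
```

With weights restricted to +1 and -1, each unit only adds or subtracts its inputs before the ReLU, so no multiplication by learned real-valued weights is needed.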
Full Text Available
