Search: all records where Creators/Authors contains "Brunskill, Emma"

Total resources: 7 (3 conference papers, 4 journal articles; full text available for all 7)


Universal Off-Policy Evaluation

Chandak, Yash; Niekum, Scott; Castro da Silva, Bruno; Learned-Miller, Erik; Brunskill, Emma; Thomas, Philip (December 2021, Advances in Neural Information Processing Systems)

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO)—one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO’s applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
Full Text Available
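The basic mechanism behind estimating the whole return distribution off-policy can be illustrated with importance sampling: reweight each logged trajectory by the ratio of evaluation-policy to behavior-policy action probabilities, estimate the cumulative distribution function (CDF) of returns at a threshold nu, and read parameters such as quantiles off that CDF. The Python below is a minimal sketch under stated assumptions, not the authors' UnO implementation. It assumes full observability, behavior-policy probabilities logged at every step, and a hypothetical trajectory format of (pi_prob, beta_prob, reward) triples, and it omits the high-confidence bounds entirely.

```python
from typing import List, Sequence, Tuple

# Hypothetical format (an assumption of this sketch): one (pi_prob, beta_prob, reward)
# triple per step, where pi_prob and beta_prob are the probabilities that the
# evaluation policy pi and the behavior policy beta assign to the logged action.
Trajectory = List[Tuple[float, float, float]]


def discounted_return(traj: Trajectory, gamma: float = 1.0) -> float:
    """Return G = sum_t gamma^t * r_t for one trajectory."""
    return sum(gamma ** t * r for t, (_, _, r) in enumerate(traj))


def importance_weight(traj: Trajectory) -> float:
    """Trajectory-level likelihood ratio: product of pi(a|s) / beta(a|s)."""
    w = 1.0
    for pi_p, beta_p, _ in traj:
        w *= pi_p / beta_p
    return w


def off_policy_cdf(trajs: Sequence[Trajectory], nu: float, gamma: float = 1.0) -> float:
    """Ordinary importance-sampling estimate of P_pi(G <= nu), clipped at 1.

    The weights are non-negative, so the estimate is non-decreasing in nu."""
    total = sum(importance_weight(tr) for tr in trajs if discounted_return(tr, gamma) <= nu)
    return min(total / len(trajs), 1.0)


def off_policy_quantile(trajs: Sequence[Trajectory], alpha: float, gamma: float = 1.0) -> float:
    """Smallest observed return nu whose estimated CDF reaches alpha."""
    returns = sorted(discounted_return(tr, gamma) for tr in trajs)
    for nu in returns:
        if off_policy_cdf(trajs, nu, gamma) >= alpha:
            return nu
    # Fallback (an assumption of this sketch): the importance-sampling estimate can
    # stay below alpha when the weights sum to less than n, so report the largest return.
    return returns[-1]
```

For example, with two one-step trajectories logged under a uniform behavior policy, `[[(0.9, 0.5, 1.0)], [(0.1, 0.5, 0.0)]]`, the estimated CDF at 0 is 0.1 and the estimated median is 1.0, matching an evaluation policy that earns reward 0 with probability 0.1.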
Learning When-to-Treat Policies

https://doi.org/10.1080/01621459.2020.1831925

Nie, Xinkun; Brunskill, Emma; Wager, Stefan (January 2021, Journal of the American Statistical Association)

Full Text Available
Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning

Zanette, Andrea; Brunskill, Emma; Wainwright, Martin J. (January 2021, NeurIPS Conference 2021)

Actor-critic methods are widely used in offline reinforcement learning practice, but are not well understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared to the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action value function of the actor's policies; this is a more general setting than the low-rank MDP model. Despite the added generality, the procedure is computationally tractable as it involves the solution of a sequence of second-order programs. We prove an upper bound on the suboptimality gap of the policy returned by the procedure that depends on the data coverage of any arbitrary, possibly data-dependent comparator policy. The achievable guarantee is complemented with a minimax lower bound that matches it up to logarithmic factors.
Full Text Available
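The pessimism principle mentioned in this abstract can be illustrated in its simplest one-step (contextual bandit) form: rather than trusting each action's empirical mean reward, act on a lower confidence bound that penalizes actions the logged data rarely covers. This is a generic lower-confidence-bound sketch, not the paper's actor-critic algorithm or its second-order-program critic; the bonus `c / sqrt(n)` and the function name `pessimistic_policy` are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def pessimistic_policy(
    data: List[Tuple[Hashable, Hashable, float]],
    actions: List[Hashable],
    c: float = 1.0,
) -> Dict[Hashable, Hashable]:
    """For each logged state, pick the action with the highest lower confidence
    bound on mean reward. Unseen (state, action) pairs get -inf: maximal pessimism."""
    sums: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    counts: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)
    for s, a, r in data:
        sums[(s, a)] += r
        counts[(s, a)] += 1

    policy: Dict[Hashable, Hashable] = {}
    for s in {state for state, _, _ in data}:
        best_action, best_bound = None, -math.inf
        for a in actions:
            n = counts[(s, a)]
            # Empirical mean minus an uncertainty penalty that shrinks with coverage.
            bound = sums[(s, a)] / n - c / math.sqrt(n) if n > 0 else -math.inf
            if best_action is None or bound > best_bound:
                best_action, best_bound = a, bound
        policy[s] = best_action
    return policy
```

With one observation of action `"a"` (reward 1.0) and 100 observations of action `"b"` (reward 0.8), a greedy rule picks `"a"`, while the pessimistic rule picks `"b"` because its bound is 0.8 - 0.1 = 0.7 versus 1.0 - 1.0 = 0 for `"a"`.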
Scaling up behavioral science interventions in online education

https://doi.org/10.1073/pnas.1921417117

Kizilcec, René F.; Reich, Justin; Yeomans, Michael; Dann, Christoph; Brunskill, Emma; Lopez, Glenn; Turkay, Selen; Williams, Joseph Jay; Tingley, Dustin (June 2020, Proceedings of the National Academy of Sciences)

Online education is rapidly expanding in response to rising demand for higher and continuing education, but many online students struggle to achieve their educational goals. Several behavioral science interventions have shown promise in raising student persistence and completion rates in a handful of courses, but evidence of their effectiveness across diverse educational contexts is limited. In this study, we test a set of established interventions over 2.5 years, with one-quarter million students, from nearly every country, across 247 online courses offered by Harvard, the Massachusetts Institute of Technology, and Stanford. We hypothesized that the interventions would produce medium-to-large effects as in prior studies, but this was not supported by our results. Instead, using an iterative scientific process of cyclically preregistering new hypotheses in between waves of data collection, we identified individual, contextual, and temporal conditions under which the interventions benefit students. Self-regulation interventions raised student engagement in the first few weeks but not final completion rates. Value-relevance interventions raised completion rates in developing countries to close the global achievement gap, but only in courses with a global gap. We found minimal evidence that state-of-the-art machine learning methods can forecast the occurrence of a global gap or learn effective individualized intervention policies. Scaling behavioral science interventions across various online learning contexts can reduce their average effectiveness by an order of magnitude. However, iterative scientific investigations can uncover what works where for whom.
Full Text Available
Offline Contextual Bandits with High Probability Fairness Guarantees

Metevier, Blossom; Giguere, Stephen; Brockman, Sarah; Kobren, Ari; Brun, Yuriy; Brunskill, Emma; Thomas, Philip (December 2019, Advances in Neural Information Processing Systems)

We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of RobinHood, which includes a proof that it will not return an unfair solution with probability greater than a user-specified threshold. We validate our algorithm on three applications: a tutoring system in which we conduct a user study and consider multiple unique fairness definitions; a loan approval setting (using the Statlog German credit data set) in which well-known fairness definitions are applied; and criminal recidivism (using data released by ProPublica). In each setting, our algorithm is able to produce fair policies that achieve performance competitive with other offline and online contextual bandit algorithms.
Full Text Available
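A guarantee of the kind described in the RobinHood abstract, that an unfair solution is returned with probability at most a user-specified threshold, is typically enforced by a safety test: a candidate is returned only if a confidence bound computed on held-out data certifies that the constraint holds; otherwise the algorithm reports that no solution was found. The sketch below shows that pattern for a single constraint of the form E[g] <= 0, using Hoeffding's inequality. It assumes i.i.d. samples of g bounded in [lo, hi], and the names `hoeffding_upper_bound` and `safety_test` are hypothetical. RobinHood itself supports multiple user-defined fairness constraints and its own bound construction, which this sketch does not reproduce.

```python
import math
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def hoeffding_upper_bound(samples: Sequence[float], delta: float, lo: float, hi: float) -> float:
    """(1 - delta)-confidence upper bound on E[X] for i.i.d. X in [lo, hi]:
    mean + (hi - lo) * sqrt(ln(1 / delta) / (2 n))."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n
    return mean + (hi - lo) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def safety_test(candidate: T, g_samples: Sequence[float], delta: float, lo: float, hi: float) -> Optional[T]:
    """Return the candidate only if the bound certifies E[g] <= 0 with probability
    at least 1 - delta; otherwise return None ("No Solution Found")."""
    if hoeffding_upper_bound(g_samples, delta, lo, hi) <= 0.0:
        return candidate
    return None
```

Here `g_samples` would be per-example estimates of the constraint (for instance, a hypothetical difference in outcomes between groups minus an allowed tolerance) computed on data not used to choose the candidate. With 200 samples of -0.5 in [-1, 1] and delta = 0.05, the bound is about -0.33, so the candidate passes; with only 20 samples of -0.05, the bound is about 0.50 and the test returns None.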
Preventing undesirable behavior of intelligent machines

https://doi.org/10.1126/science.aag3311

Thomas, Philip S.; Castro da Silva, Bruno; Barto, Andrew G.; Giguere, Stephen; Brun, Yuriy; Brunskill, Emma (November 2019, Science)

Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple data analysis and pattern recognition tools to complex systems that achieve superhuman performance on various tasks. Ensuring that they do not exhibit undesirable behavior—that they do not, for example, cause harm to humans—is therefore a pressing problem. We propose a general and flexible framework for designing machine learning algorithms. This framework simplifies the problem of specifying and regulating undesirable behavior. To show the viability of this framework, we used it to create machine learning algorithms that precluded the dangerous behavior caused by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.
Full Text Available
Combining adaptivity with progression ordering for intelligent tutoring systems

https://doi.org/10.1145/3231644.3231672

Mu, Tong; Wang, Shuhan; Andersen, Erik; Brunskill, Emma (June 2018, Work in Progress, Learning at Scale 2018)

Learning at scale (LAS) systems like Massive Open Online Courses (MOOCs) have hugely expanded access to high-quality educational materials; however, such materials are frequently time- and resource-expensive to create. In this work we propose a new approach for automatically and adaptively sequencing practice activities for a particular learner and explore its application for foreign language learning. We evaluate our system through simulation and are in the process of running an experiment. Our simulation results suggest that such an approach may be significantly better than an expert system when there is high variability in the rate of learning among the students and if mastering prerequisites before advancing is important. They also suggest it is likely to be no worse than an expert system if our generated curriculum approximately describes the necessary structure of learning in students.
Full Text Available