skip to main content


Search for: All records

Creators/Authors contains: "Brunskill, Emma"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO)—one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO’s applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts. 
    more » « less
  2. Actor-critic methods are widely used in offline reinforcement learning practice, but are not so well-understood theoretically. We propose a new offline actor-critic algorithm that naturally incorporates the pessimism principle, leading to several key advantages compared to the state of the art. The algorithm can operate when the Bellman evaluation operator is closed with respect to the action value function of the actor's policies; this is a more general setting than the low-rank MDP model. Despite the added generality, the procedure is computationally tractable as it involves the solution of a sequence of second-order programs. We prove an upper bound on the suboptimality gap of the policy returned by the procedure that depends on the data coverage of any arbitrary, possibly data dependent comparator policy. The achievable guarantee is complemented with a minimax lower bound that is matching up to logarithmic factors. 
    more » « less
  3. Online education is rapidly expanding in response to rising demand for higher and continuing education, but many online students struggle to achieve their educational goals. Several behavioral science interventions have shown promise in raising student persistence and completion rates in a handful of courses, but evidence of their effectiveness across diverse educational contexts is limited. In this study, we test a set of established interventions over 2.5 y, with one-quarter million students, from nearly every country, across 247 online courses offered by Harvard, the Massachusetts Institute of Technology, and Stanford. We hypothesized that the interventions would produce medium-to-large effects as in prior studies, but this is not supported by our results. Instead, using an iterative scientific process of cyclically preregistering new hypotheses in between waves of data collection, we identified individual, contextual, and temporal conditions under which the interventions benefit students. Self-regulation interventions raised student engagement in the first few weeks but not final completion rates. Value-relevance interventions raised completion rates in developing countries to close the global achievement gap, but only in courses with a global gap. We found minimal evidence that state-of-the-art machine learning methods can forecast the occurrence of a global gap or learn effective individualized intervention policies. Scaling behavioral science interventions across various online learning contexts can reduce their average effectiveness by an order-of-magnitude. However, iterative scientific investigations can uncover what works where for whom. 
    more » « less
  4. We present RobinHood, an offline contextual bandit algorithm designed to satisfy a broad family of fairness constraints. Our algorithm accepts multiple fairness definitions and allows users to construct their own unique fairness definitions for the problem at hand. We provide a theoretical analysis of RobinHood, which includes a proof that it will not return an unfair solution with probability greater than a user-specified threshold. We validate our algorithm on three applications: a tutoring system in which we conduct a user study and consider multiple unique fairness definitions; a loan approval setting (using the Statlog German credit data set) in which well-known fairness definitions are applied; and criminal recidivism (using data released by ProPublica). In each setting, our algorithm is able to produce fair policies that achieve performance competitive with other offline and online contextual bandit algorithms. 
    more » « less
  5. Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple data analysis and pattern recognition tools to complex systems that achieve superhuman performance on various tasks. Ensuring that they do not exhibit undesirable behavior—that they do not, for example, cause harm to humans—is therefore a pressing problem. We propose a general and flexible framework for designing machine learning algorithms. This framework simplifies the problem of specifying and regulating undesirable behavior. To show the viability of this framework, we used it to create machine learning algorithms that precluded the dangerous behavior caused by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning. 
    more » « less
  6. Learning at scale (LAS) systems like Massive Open Online Classes (MOOCs) have hugely expanded access to high quality educational materials however, such materials are frequently time and resource expensive to create. In this work we propose a new approach for automatically and adaptively sequencing practice activities for a particular learner and explore its application for foreign language learning. We evaluate our system through simulation and are in the process of running an experiment. Our simulation results suggest that such an approach may be significantly better than an expert system when there is high variability in the rate of learning among the students and if mastering prerequisites before advancing is important. They also suggest it is likely to be no worse than an expert system if our generated curriculum approximately describes the necessary structure of learning in students. 
    more » « less