Search for: All records

Creators/Authors contains: "Niekum, Scott"

« Prev Next »

Total Resources

18

Resource Type
Conference Paper

16

Conference Proceeding

0

Dataset

0

Journal Article

2

Workshop Report

0

Availability
Full Text / Resource Available

18

Citation Only

0

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Fairness Guarantees Under Demographic Shift

Giguere, Stephen ; Metevier, Blossom ; Brun, Yuriy ; Castro da Silva, Bruno ; Thomas, Philip ; Niekum, Scott ( April 2022 , International Conference on Learning Representations)

Full Text Available
Fairness Guarantees under Demographic Shift

Giguere, Stephen ; Metevier, Blossom ; Brun, Yuriy ; da Silva, Bruno Castro ; Thomas, Philip S. ; Niekum, Scott ( April 2022 , Proceedings of the 10th International Conference on Learning Representations (ICLR))

Recent studies found that using machine learning for social applications can lead to injustice in the form of racist, sexist, and otherwise unfair and discriminatory outcomes. To address this challenge, recent machine learning algorithms have been designed to limit the likelihood such unfair behavior occurs. However, these approaches typically assume the data used for training is representative of what will be encountered in deployment, which is often untrue. In particular, if certain subgroups of the population become more or less probable in deployment (a phenomenon we call demographic shift), prior work's fairness assurances are often invalid. In this paper, we consider the impact of demographic shift and present a class of algorithms, called Shifty algorithms, that provide high-confidence behavioral guarantees that hold under demographic shift when data from the deployment environment is unavailable during training. Shifty, the first technique of its kind, demonstrates an effective strategy for designing algorithms to overcome demographic shift's challenges. We evaluate Shifty using the UCI Adult Census dataset, as well as a real-world dataset of university entrance exams and subsequent student success. We show that the learned models avoid bias under demographic shift, unlike existing methods. Our experiments demonstrate that our algorithm's high-confidence fairness guarantees are valid in practice and that our algorithm is an effective tool for training models that are fair when demographic shift occurs.
more » « less
Full Text Available
Fairness Guarantees Under Demographic Shift

Giguere, Stephen ; Metevier, Blossom ; Castro da Silva, Bruno ; Brun, Yuriy ; Thomas, Philip ; Niekum, Scott ( January 2022 , International Conference on Learning Representations)

Full Text Available
Universal Off-Policy Evaluation

Chandak, Yash ; Niekum, Scott ; Castro da Silva, Bruno ; Learned-Miller, Erik ; Brunskill, Emma ; Thomas, Philip ( December 2021 , Advances in neural information processing systems)

When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return. In this paper, we take the first steps towards a universal off-policy estimator (UnO)—one that provides off-policy estimates and high-confidence bounds for any parameter of the return distribution. We use UnO for estimating and simultaneously bounding the mean, variance, quantiles/median, inter-quantile range, CVaR, and the entire cumulative distribution of returns. Finally, we also discuss UnO’s applicability in various settings, including fully observable, partially observable (i.e., with unobserved confounders), Markovian, non-Markovian, stationary, smoothly non-stationary, and discrete distribution shifts.
more » « less
Full Text Available
Importance sampling in reinforcement learning with an estimated behavior policy

Hanna, Josiah ; Niekum, Scott ; Stone, Peter ( June 2021 , Machine learning)
null (Ed.)
Full Text Available
Value Alignment Verification

Brown, Daniel S ; Schneider, Jordan ; Dragan, Anca ; Niekum, Scott ( July 2021 , International Conference on Machine Learning)
null (Ed.)
Full Text Available
ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory

Jain, Ajinkya ; Lioutikov, Rudolf ; Chuck, Caleb ; Niekum, Scott ( June 2021 , IEEE International Conference on Robotics and Automation)
null (Ed.)
Full Text Available
Importance sampling in reinforcement learning with an estimated behavior policy

https://doi.org/10.1007/s10994-020-05938-9

Hanna, Josiah P. ; Niekum, Scott ; Stone, Peter ( May 2021 , Machine Learning)

Abstract
In reinforcement learning, importance sampling is a widely used method for evaluating an expectation under the distribution of data of one policy when the data has in fact been generated by a different policy. Importance sampling requires computing the likelihood ratio between the action probabilities of a target policy and those of the data-producing behavior policy. In this article, we study importance sampling where the behavior policy action probabilities are replaced by their maximum likelihood estimate of these probabilities under the observed data. We show this general technique reduces variance due to sampling error in Monte Carlo style estimators. We introduce two novel estimators that use this technique to estimate expected values that arise in the RL literature. We find that these general estimators reduce the variance of Monte Carlo sampling methods, leading to faster learning for policy gradient algorithms and more accurate off-policy policy evaluation. We also provide theoretical analysis showing that our new estimators are consistent and have asymptotically lower variance than Monte Carlo estimators.

more » « less
Value Alignment Verification

Brown, Daniel ; Schneider, Jordan ; Dragan, Anca ; Niekum, Scott ( January 2021 , 38th International Conference on Machine Learning)

As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent’s performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human’s values. The goal is to construct a kind of “driver’s test” that a human can give to any agent which will verify value alignment via a minimal number of queries. We study alignment verification problems with both idealized humans that have an explicit reward function as well as problems where they have implicit values. We analyze verification of exact value alignment for rational agents and propose and analyze heuristic and approximate value alignment verification tests in a wide range of gridworlds and a continuous autonomous driving domain. Finally, we prove that there exist sufficient conditions such that we can verify exact and approximate alignment across an infinite set of test environments via a constant- query-complexity alignment test.
more » « less
Full Text Available
One-Shot Learning of Multi-Step Tasks from Observation via Activity Localization in Auxiliary Video

Goo, Wonjoon ; Niekum, Scott ( May 2019 , IEEE International Conference on Robotics and Automation)

Full Text Available

« Prev Next »