Preventing undesirable behavior of intelligent machines

Thomas, Philip S.; Castro da Silva, Bruno; Barto, Andrew G.; Giguere, Stephen; Brun, Yuriy; Brunskill, Emma

doi:10.1126/science.aag3311

Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple data analysis and pattern recognition tools to complex systems that achieve superhuman performance on various tasks. Ensuring that they do not exhibit undesirable behavior—that they do not, for example, cause harm to humans—is therefore a pressing problem. We propose a general and flexible framework for designing machine learning algorithms. This framework simplifies the problem of specifying and regulating undesirable behavior. To show the viability of this framework, we used it to create machine learning algorithms that precluded the dangerous behavior caused by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.

Award ID(s): 1763423, 1453474

PAR ID: 10172836

Author(s) / Creator(s): Thomas, Philip S.; Castro da Silva, Bruno; Barto, Andrew G.; Giguere, Stephen; Brun, Yuriy; Brunskill, Emma

Date Published: 2019-11-21

Journal Name: Science

Volume: 366

Issue: 6468

ISSN: 0036-8075

Page Range / eLocation ID: 999 to 1004

Format(s): Medium: X

Sponsoring Org: National Science Foundation

