Title: Preventing undesirable behavior of intelligent machines
Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple data analysis and pattern recognition tools to complex systems that achieve superhuman performance on various tasks. Ensuring that they do not exhibit undesirable behavior—that they do not, for example, cause harm to humans—is therefore a pressing problem. We propose a general and flexible framework for designing machine learning algorithms. This framework simplifies the problem of specifying and regulating undesirable behavior. To show the viability of this framework, we used it to create machine learning algorithms that precluded the dangerous behavior caused by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.
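The abstract does not spell out the mechanism, but a minimal sketch of one way a behavioral constraint can be regulated is shown below, assuming a held-out safety dataset and a Hoeffding-style high-confidence bound. The function names, the [-1, 1] bound on the behavior measure, and the abstain-on-failure policy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def passes_safety_test(g_samples, delta=0.05):
    """High-confidence check that undesirable behavior is absent.

    g_samples: i.i.d. measurements of a behavior function g on held-out safety data,
    assumed bounded in [-1, 1], with g <= 0 meaning the behavior is acceptable.
    Returns True only if a (1 - delta)-confidence Hoeffding upper bound on E[g] is <= 0.
    """
    n = len(g_samples)
    upper_bound = np.mean(g_samples) + 2.0 * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return upper_bound <= 0.0

def train_with_constraint(fit_candidate, evaluate_g, train_data, safety_data, delta=0.05):
    """Fit a candidate model, then release it only if it passes the safety test."""
    model = fit_candidate(train_data)              # any standard learning procedure
    if passes_safety_test(evaluate_g(model, safety_data), delta):
        return model
    return None                                    # abstain rather than risk undesirable behavior

# Example: behavior measurements that are confidently acceptable vs. inconclusive.
rng = np.random.default_rng(0)
print(passes_safety_test(rng.uniform(-0.9, -0.5, size=2000)))  # True
print(passes_safety_test(rng.uniform(-0.2, 0.2, size=50)))     # False
```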
Award ID(s):
1763423 1453474
NSF-PAR ID:
10172836
Date Published:
Journal Name:
Science
Volume:
366
Issue:
6468
ISSN:
0036-8075
Page Range / eLocation ID:
999 to 1004
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many real-world analytics problems involve two significant challenges: prediction and optimization. Because of the typically complex nature of each challenge, the standard paradigm is predict-then-optimize. By and large, machine learning tools are intended to minimize prediction error and do not account for how the predictions will be used in the downstream optimization problem. In contrast, we propose a new and very general framework, called Smart “Predict, then Optimize” (SPO), which directly leverages the optimization problem structure—that is, its objective and constraints—for designing better prediction models. A key component of our framework is the SPO loss function, which measures the decision error induced by a prediction. Training a prediction model with respect to the SPO loss is computationally challenging, and, thus, we derive, using duality theory, a convex surrogate loss function, which we call the SPO+ loss. Most importantly, we prove that the SPO+ loss is statistically consistent with respect to the SPO loss under mild conditions. Our SPO+ loss function can tractably handle any polyhedral, convex, or even mixed-integer optimization problem with a linear objective. Numerical experiments on shortest-path and portfolio-optimization problems show that the SPO framework can lead to significant improvement under the predict-then-optimize paradigm, in particular, when the prediction model being trained is misspecified. We find that linear models trained using SPO+ loss tend to dominate random-forest algorithms, even when the ground truth is highly nonlinear. This paper was accepted by Yinyu Ye, optimization. Supplemental Material: Data and the online appendix are available at https://doi.org/10.1287/mnsc.2020.3922 
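    As a concrete illustration of the decision-error idea, the sketch below computes the SPO loss and the SPO+ surrogate for a linear objective over a small finite feasible set. The two-route feasible set and the cost vectors are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def best_decision(cost, W):
    """Oracle: return the feasible decision minimizing cost^T w over a finite set W."""
    return min(W, key=lambda w: float(np.dot(cost, w)))

def spo_loss(pred_cost, true_cost, W):
    """Excess true cost incurred by optimizing against the predicted cost vector."""
    w_pred = best_decision(pred_cost, W)
    w_true = best_decision(true_cost, W)
    return float(np.dot(true_cost, w_pred) - np.dot(true_cost, w_true))

def spo_plus_loss(pred_cost, true_cost, W):
    """Convex surrogate: max_w {(c - 2*c_hat)^T w} + 2*c_hat^T w*(c) - c^T w*(c)."""
    c, ch = true_cost, pred_cost
    w_true = best_decision(c, W)
    inner_max = max(float(np.dot(c - 2.0 * ch, w)) for w in W)
    return inner_max + 2.0 * float(np.dot(ch, w_true)) - float(np.dot(c, w_true))

# Tiny example: two candidate decisions (indicator vectors over two routes).
W = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(spo_loss(np.array([1.0, 2.0]), np.array([3.0, 1.0]), W))       # bad prediction -> positive loss
print(spo_plus_loss(np.array([1.0, 2.0]), np.array([3.0, 1.0]), W))  # surrogate upper bound
```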
  2. In recent years, there has been a growing need to train machine learning models on huge volumes of data. Therefore, designing efficient distributed optimization algorithms for empirical risk minimization (ERM) has become an active and challenging research topic. In this paper, we propose a flexible framework for distributed ERM training that works by solving the dual problem and provides a unified description and comparison of existing methods. Our approach requires only approximate solutions of the sub-problems involved in the optimization process, and it is versatile enough to be applied to many large-scale machine learning problems, including classification, regression, and structured prediction. We show that our framework enjoys global linear convergence for a broad class of non-strongly-convex problems, and a refined analysis shows that some specific choices of the sub-problems achieve much faster convergence than existing approaches. This improved convergence rate is also reflected in the superior empirical performance of our method.
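    The sketch below illustrates the dual view on a single machine, using ridge regression as the ERM instance with exact closed-form coordinate updates on the dual variables. It is a baseline illustration of dual ERM training, not the paper's distributed framework or its approximate sub-problem solvers.

```python
import numpy as np

def sdca_ridge(X, y, lam=0.1, epochs=20, seed=0):
    """Stochastic dual coordinate ascent for ridge regression:
    min_w (1/n) * sum_i 0.5*(x_i^T w - y_i)^2 + (lam/2)*||w||^2,
    solved through its dual with one dual variable alpha_i per example."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)          # dual variables
    w = np.zeros(d)              # primal iterate, kept equal to X^T alpha / (lam * n)
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Exact maximizer of the dual objective over the single coordinate alpha_i.
            delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + (X[i] @ X[i]) / (lam * n))
            alpha[i] += delta
            w += delta * X[i] / (lam * n)
    return w

# Tiny usage example on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=200)
print(sdca_ridge(X, y, lam=0.01))    # should approximate w_true
```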
  3. Physical activity is a cornerstone of managing chronic conditions and one of the most critical factors in reducing the risk of cardiovascular disease, the leading cause of death in the United States. App-based lifestyle interventions have been used to promote physical activity in people with, or at risk for, chronic conditions. However, these mHealth tools have remained largely static and do not adapt to the changing behavior of the user. In a step toward designing adaptive interventions, we propose BeWell24Plus, a framework for monitoring activity and user engagement and for developing computational models for outcome prediction and intervention design. In particular, we focus on devising algorithms that combine data about physical activity and engagement with the app to predict future physical activity performance. Knowing in advance how active a person is going to be the next day can help with designing adaptive interventions that help individuals achieve their physical activity goals. Our technique combines the recent history of a person's physical activity with app engagement metrics, such as when, how often, and for how long the app was used, to forecast activity in the near future. We formulate the problem of multimodal activity forecasting and propose an LSTM-based realization of our proposed model architecture, which estimates physical activity outcomes in advance by examining the history of app usage and physical activity of the user. We demonstrate the effectiveness of our forecasting approach using data collected from 58 prediabetic people in a 9-month user study. We show that our multimodal forecasting approach outperforms single-modality forecasting by 2.2% to 11.1% in mean absolute error.
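    The sketch below shows one way such a multimodal forecaster can be wired up, concatenating daily activity and app-engagement features and feeding them to an LSTM that predicts the next day's activity. The feature counts, hidden size, and layer choices are assumptions, not the BeWell24Plus architecture.

```python
import torch
import torch.nn as nn

class ActivityForecaster(nn.Module):
    """Toy multimodal forecaster: past days of (activity, engagement) features -> next-day activity."""
    def __init__(self, n_activity_feats=3, n_engagement_feats=3, hidden=32):
        super().__init__()
        input_size = n_activity_feats + n_engagement_feats   # concatenated modalities per day
        self.lstm = nn.LSTM(input_size, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                      # predicted next-day activity level

    def forward(self, past_days):
        # past_days: (batch, history_length, n_activity_feats + n_engagement_feats)
        _, (h_n, _) = self.lstm(past_days)
        return self.head(h_n[-1]).squeeze(-1)

# Usage with random tensors standing in for a 7-day history (illustrative only).
model = ActivityForecaster()
history = torch.randn(16, 7, 6)                        # 16 users, 7 days, 6 features per day
pred = model(history)                                  # (16,) forecasted activity
loss = nn.functional.l1_loss(pred, torch.randn(16))    # mean absolute error, the paper's reported metric
loss.backward()
```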
  4. Skolnick, Jeffrey (Ed.)
    Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain “dark”—i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracies in predicted structures on function prediction by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step that uses gene families in the test data that differ from those in the training and development data sets, to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art machine learning and protein-ligand docking techniques when applied to dark gene families, and demonstrated its generalization power for target identification and compound screening under out-of-distribution (OOD) scenarios. Furthermore, in an external validation of multi-target compound screening, PortalCG surpassed the rational designs of medicinal chemists. Our results also suggest that a differentiable sequence-structure-function deep learning framework, in which protein structural information serves as an intermediate layer, could be superior to the conventional methodology in which predicted protein structures are used directly for compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggest that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space.
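    The sketch below illustrates the family-level holdout described in component (iv): entire gene families are assigned to either the training or the test split, so test families are never seen during training. scikit-learn's GroupShuffleSplit and the toy family labels are illustrative stand-ins, not PortalCG's pipeline.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy protein-ligand records: each row belongs to a gene family (the "group").
families = np.array(["kinase", "kinase", "gpcr", "gpcr",
                     "protease", "protease", "ion_channel", "ion_channel"])
X = np.arange(len(families)).reshape(-1, 1)                        # placeholder features
y = np.random.default_rng(0).integers(0, 2, size=len(families))    # placeholder labels

# Hold out whole families so the test set stresses out-of-distribution generalization.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=families))

assert set(families[train_idx]).isdisjoint(set(families[test_idx]))  # no family leakage
print("train families:", sorted(set(families[train_idx])))
print("test families: ", sorted(set(families[test_idx])))
```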
  5. Multimarginal Optimal Transport (MOT) has attracted significant interest due to applications in machine learning, statistics, and the sciences. However, in most applications, the success of MOT is severely limited by a lack of efficient algorithms. Indeed, MOT in general requires exponential time in the number of marginals $k$ and their support sizes $n$. This paper develops a general theory about what “structure” makes MOT solvable in $\mathrm{poly}(n,k)$ time. We develop a unified algorithmic framework for solving MOT in $\mathrm{poly}(n,k)$ time by characterizing the structure that different algorithms require in terms of simple variants of the dual feasibility oracle. This framework has several benefits. First, it enables us to show that the Sinkhorn algorithm, which is currently the most popular MOT algorithm, requires strictly more structure than other algorithms do to solve MOT in $\mathrm{poly}(n,k)$ time. Second, our framework makes it much simpler to develop $\mathrm{poly}(n,k)$-time algorithms for a given MOT problem. In particular, it is necessary and sufficient to (approximately) solve the dual feasibility oracle—which is much more amenable to standard algorithmic techniques. We illustrate this ease-of-use by developing $\mathrm{poly}(n,k)$-time algorithms for three general classes of MOT cost structures: (1) graphical structure; (2) set-optimization structure; and (3) low-rank plus sparse structure. For structure (1), we recover the known result that Sinkhorn has $\mathrm{poly}(n,k)$ runtime; moreover, we provide the first $\mathrm{poly}(n,k)$-time algorithms for computing solutions that are exact and sparse. For structures (2)-(3), we give the first $\mathrm{poly}(n,k)$-time algorithms, even for approximate computation. Together, these three structures encompass many—if not most—current applications of MOT.
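    The sketch below sets up an unstructured entropic multimarginal Sinkhorn iteration for k = 3 marginals, materializing the full n^k cost tensor. It illustrates why the generic problem scales exponentially in k and why the structural assumptions above are needed for poly(n, k) algorithms; the regularization parameter and the random cost tensor are illustrative assumptions.

```python
import numpy as np

def multimarginal_sinkhorn(C, mus, eps=0.1, iters=500):
    """Naive entropic multimarginal OT for k = 3 marginals.

    C: cost tensor of shape (n, n, n); storing it already costs n^k memory,
    which is exactly what structure-exploiting poly(n, k) algorithms avoid.
    mus: list of three marginal distributions, each of length n.
    """
    K = np.exp(-C / eps)                       # Gibbs kernel
    u = [np.ones_like(m) for m in mus]         # one scaling vector per marginal
    for _ in range(iters):
        # Rescale each scaling vector so the corresponding marginal is matched.
        u[0] = mus[0] / np.einsum('ijk,j,k->i', K, u[1], u[2])
        u[1] = mus[1] / np.einsum('ijk,i,k->j', K, u[0], u[2])
        u[2] = mus[2] / np.einsum('ijk,i,j->k', K, u[0], u[1])
    return np.einsum('ijk,i,j,k->ijk', K, u[0], u[1], u[2])   # approximate coupling

# Tiny usage example with uniform marginals and a random cost (illustrative only).
rng = np.random.default_rng(0)
n = 5
C = rng.random((n, n, n))
mus = [np.full(n, 1.0 / n) for _ in range(3)]
P = multimarginal_sinkhorn(C, mus)
print(P.sum(axis=(1, 2)))   # approximately the first marginal
print((P * C).sum())        # approximate MOT cost
```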