

Title: Distributionally Robust Decision Making Leveraging Conditional Distributions
Distributionally robust optimization (DRO) is a powerful tool for decision making under uncertainty. It is particularly appealing because of its ability to leverage existing data. However, many practical problems call for decision-making with some auxiliary information, and DRO in the context of conditional distributions is not straightforward. We propose a conditional kernel distributionally robust optimization (CKDRO) method that enables robust decision making under conditional distributions through kernel DRO and the conditional mean operator in the reproducing kernel Hilbert space (RKHS). In particular, we consider problems where there is a correlation between the unknown variable y and an auxiliary observable variable x. Given past data of the two variables and a queried auxiliary variable, CKDRO represents the conditional distribution P(y|x) as the conditional mean operator in the RKHS and quantifies the ambiguity set in the RKHS as well; the size of the ambiguity set depends on both the size of the dataset and the query point. To justify the use of the RKHS, we demonstrate that the ambiguity set defined in the RKHS can be viewed as a ball under a metric similar to the Wasserstein metric. The DRO problem is then dualized and solved via a finite-dimensional convex program. The proposed CKDRO approach is applied to a generation scheduling problem, where its results are superior to common benchmarks in terms of quality and robustness.
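As a rough illustration of the conditional mean operator the abstract builds on (a generic textbook estimator, not the authors' implementation; the helper names, kernel choice, and toy data are all assumptions), the empirical conditional mean embedding reduces E[f(Y)|X=x] to a weighted sum over observed samples:

```python
import numpy as np

def rbf(A, B, gamma=10.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def cme_weights(X, x_query, lam=1e-3):
    """Weights beta(x) of the empirical conditional mean embedding:
    E[f(Y) | X = x] is approximated by sum_i beta_i(x) * f(y_i)."""
    n = X.shape[0]
    K = rbf(X, X)
    k_x = rbf(X, x_query[None, :]).ravel()
    return np.linalg.solve(K + n * lam * np.eye(n), k_x)

# Toy data: the unknown y is correlated with the auxiliary variable x
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
Y = 2.0 * X[:, 0] + 0.1 * rng.standard_normal(200)

beta = cme_weights(X, np.array([0.5]))
cond_mean = beta @ Y  # plug-in estimate of E[Y | x = 0.5]
```

The same weights beta(x) are what a kernel-DRO formulation would center its ambiguity set on; here they are only used for a plug-in conditional expectation.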
Award ID(s):
2144634
NSF-PAR ID:
10390336
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the IEEE 61st Conference on Decision and Control (CDC)
Page Range / eLocation ID:
5652-5659
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Distributionally robust optimization (DRO) has been introduced for solving stochastic programs in which the distribution of the random variables is unknown and must be estimated by samples from that distribution. A key element of DRO is the construction of the ambiguity set, which is a set of distributions that contains the true distribution with a high probability. Assuming that the true distribution has a probability density function, we propose a class of ambiguity sets based on confidence bands of the true density function. As examples, we consider the shape-restricted confidence bands and the confidence bands constructed with a kernel density estimation technique. The former allows us to incorporate the prior knowledge of the shape of the underlying density function (e.g., unimodality and monotonicity), and the latter enables us to handle multidimensional cases. Furthermore, we establish the convergence of the optimal value of DRO to that of the underlying stochastic program as the sample size increases. The DRO with our ambiguity set involves functional decision variables and infinitely many constraints. To address this challenge, we apply duality theory to reformulate the DRO to a finite-dimensional stochastic program, which is amenable to a stochastic subgradient scheme as a solution method. 
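A minimal sketch of a density-band ambiguity set in the spirit of this abstract, using a pointwise bootstrap band around a kernel density estimate (the paper's actual band constructions differ; the grid, bandwidth, and confidence level here are assumptions):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, 300)
grid = np.linspace(-3.0, 3.0, 61)

# Point estimate of the unknown density via kernel density estimation
f_hat = gaussian_kde(data)(grid)

# Pointwise bootstrap confidence band: refit the KDE on resamples and
# take percentiles at each grid point
B = 200
boot = np.empty((B, grid.size))
for b in range(B):
    resample = rng.choice(data, size=data.size, replace=True)
    boot[b] = gaussian_kde(resample)(grid)
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)

# The ambiguity set is then (informally) all densities squeezed between
# `lower` and `upper` that integrate to one; the DRO optimizes the
# worst case over this set.
```

The band widens with smaller samples, so the resulting DRO is automatically more conservative when data are scarce.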
  2. We consider the problem of learning the underlying structure of a general discrete pairwise Markov network. Existing approaches that rely on empirical risk minimization may perform poorly in settings with noisy or scarce data. To overcome these limitations, we propose a computationally efficient and robust learning method for this problem with near-optimal sample complexities. Our approach builds upon distributionally robust optimization (DRO) and maximum conditional log-likelihood. The proposed DRO estimator minimizes the worst-case risk over an ambiguity set of adversarial distributions within bounded transport cost or f-divergence of the empirical data distribution. We show that the primal minimax learning problem can be efficiently solved by leveraging sufficient statistics and greedy maximization in the ostensibly intractable dual formulation. Based on DRO’s approximation to Lipschitz and variance regularization, we derive near-optimal sample complexities matching existing results. Extensive empirical evidence with different corruption models corroborates the effectiveness of the proposed methods. 
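The variance-regularization view of f-divergence DRO mentioned above can be sketched with a standard one-line bound (this is the generic chi-square-ball approximation, not the paper's exact estimator; the function name and toy losses are assumptions):

```python
import numpy as np

def worst_case_risk_chi2(losses, rho):
    """Variance-regularized approximation of the worst-case expected
    loss over a chi-square-divergence ball of radius rho around the
    empirical distribution: mean + sqrt(2 * rho * variance)."""
    return losses.mean() + np.sqrt(2.0 * rho * losses.var(ddof=1))

losses = np.array([0.1, 0.5, 0.2, 0.9, 0.3])
empirical = losses.mean()
robust = worst_case_risk_chi2(losses, rho=0.1)
```

The robust risk always dominates the empirical risk and grows with the ball radius rho, which is exactly the regularization effect the sample-complexity analysis exploits.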
  3. In a chance constrained program (CCP), decision makers seek the best decision whose probability of violating the uncertain constraints is within a prespecified risk level. As a CCP is often nonconvex and difficult to solve to optimality, much effort has been devoted to developing convex inner approximations for it, among which the conditional value-at-risk (CVaR) approximation has been known as the best for more than a decade. This paper studies and generalizes ALSO-X, originally proposed by Ahmed, Luedtke, SOng, and Xie in 2017, for solving a CCP. We first show that ALSO-X resembles a bilevel optimization, where the upper-level problem is to find the best objective function value and enforce the feasibility of the CCP for a given decision from the lower-level problem, and the lower-level problem is to minimize the expectation of constraint violations subject to the upper bound on the objective function value provided by the upper-level problem. This interpretation motivates us to prove that when the uncertain constraints are convex in the decision variables, ALSO-X always outperforms the CVaR approximation. We further show (i) sufficient conditions under which ALSO-X can recover an optimal solution to a CCP; (ii) an equivalent bilinear programming formulation of a CCP, inspiring us to enhance ALSO-X with a convergent alternating minimization method (ALSO-X+); and (iii) an extension of ALSO-X and ALSO-X+ to distributionally robust chance constrained programs (DRCCPs) under the ∞-Wasserstein ambiguity set. Our numerical study demonstrates the effectiveness of the proposed methods.
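The bilevel interpretation above can be illustrated on a one-dimensional toy CCP, minimize z subject to P(xi > z) <= eps, where the inner problem collapses to a feasibility check (a didactic sketch under these assumptions, not the ALSO-X algorithm as stated in the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
xi = rng.normal(0.0, 1.0, 1000)  # sampled scenarios of the uncertainty
eps = 0.1                        # risk level

# Outer loop: bisect on the objective bound t.  For this toy problem
# the lower-level problem (minimize expected violation s.t. z <= t) is
# solved by z = t, so each iteration only checks chance feasibility.
lo, hi = xi.min(), xi.max()
for _ in range(60):
    t = 0.5 * (lo + hi)
    if np.mean(xi > t) <= eps:
        hi = t  # inner solution is chance-feasible: tighten the bound
    else:
        lo = t
z_alsox = hi  # approximately the empirical (1 - eps)-quantile

# CVaR inner approximation of the same CCP: z must cover the average
# of the worst eps-fraction of scenarios, which is more conservative.
z_cvar = np.sort(xi)[-int(eps * xi.size):].mean()
```

On this example the bisection recovers the chance-constrained optimum (a quantile), while the CVaR solution sits strictly above it, matching the dominance result claimed in the abstract.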
  4.
    Multiple Instance Learning (MIL) provides a promising solution to many real-world problems, where labels are only available at the bag level but missing for instances due to a high labeling cost. As a powerful Bayesian nonparametric model, Gaussian Processes (GPs) have been extended from classical supervised learning to MIL settings, aiming to identify the most likely positive (or least negative) instance from a positive (or negative) bag using only the bag-level labels. However, solely focusing on a single instance in a bag makes the model less robust to outliers or multi-modal scenarios, where a single bag contains a diverse set of positive instances. We propose a general GP mixture framework that simultaneously considers multiple instances through a latent mixture model. By adding a top-k constraint, the framework is equivalent to choosing the top-k most positive instances, making it more robust to outliers and multimodal scenarios. We further introduce a Distributionally Robust Optimization (DRO) constraint that removes the limitation of specifying a fixed k value. To ensure prediction power over the high-dimensional data (e.g., videos and images) that are common in MIL, we augment the GP kernel by using a deep neural network to learn adaptive basis functions in place of fixed ones, so that the covariance structure of high-dimensional data can be accurately captured. Experiments are conducted on highly challenging real-world video anomaly detection tasks to demonstrate the effectiveness of the proposed model.
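The effect of the top-k constraint described above can be sketched with a hard top-k bag aggregator over per-instance scores (a simplified stand-in for the latent mixture model; the function name and toy scores are assumptions, and the DRO variant would optimize instance weights over an ambiguity set instead of fixing k):

```python
import numpy as np

def bag_score_topk(instance_scores, k):
    """Score a bag by the mean of its top-k instance scores: k = 1
    recovers the classic most-positive-instance rule, while k > 1
    averages over several positive instances, which is more robust
    to a single outlier instance."""
    return np.sort(instance_scores)[-k:].mean()

bag = np.array([0.1, 0.9, 0.8, 0.05, 0.7])
s1 = bag_score_topk(bag, k=1)  # only the most positive instance
s3 = bag_score_topk(bag, k=3)  # average of the three most positive
```

With k = 1 the bag score is hostage to one instance; with k = 3 the score reflects whether the bag contains a diverse set of positive instances, which is the multimodal case the abstract targets.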