Title: Distributionally Robust Submodular Maximization
Submodular functions have applications throughout machine learning, but in many settings we do not have direct access to the underlying function f. We focus on stochastic functions that are given as an expectation of functions over a distribution P. In practice, we often have only a limited set of samples f_i from P. The standard approach indirectly optimizes f by maximizing the sum of the f_i. However, this ignores generalization to the true (unknown) distribution. In this paper, we achieve better performance on the actual underlying function f by directly optimizing a combination of bias and variance. Algorithmically, we accomplish this by showing how to carry out distributionally robust optimization (DRO) for submodular functions, providing efficient algorithms backed by theoretical guarantees which leverage several novel contributions to the general theory of DRO. We also show compelling empirical evidence that DRO improves generalization to the unknown stochastic submodular function.
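To make the setup concrete, below is a minimal sketch of one standard way to instantiate a robust objective over the samples: greedily maximizing a CVaR-style worst-case average that keeps only the lowest-valued alpha-fraction of the sampled functions. The function names, the choice of ambiguity set, and the parameter alpha are illustrative assumptions, not necessarily the algorithm developed in the paper.

    import numpy as np

    # Illustrative sketch only: a CVaR-style distributionally robust greedy over
    # sampled submodular functions f_i (names and ambiguity set are assumptions).

    def robust_value(S, sample_fns, alpha=0.3):
        # Worst-case average of f_i(S) over the alpha-fraction of lowest-valued
        # samples; this solves the inner minimization over the ambiguity set.
        vals = np.sort([f(S) for f in sample_fns])
        k = max(1, int(alpha * len(vals)))
        return vals[:k].mean()

    def robust_greedy(ground_set, sample_fns, budget, alpha=0.3):
        # Greedily add the element with the largest robust marginal gain.
        S = set()
        for _ in range(budget):
            base = robust_value(S, sample_fns, alpha)
            candidates = ground_set - S
            if not candidates:
                break
            e = max(candidates,
                    key=lambda e: robust_value(S | {e}, sample_fns, alpha))
            if robust_value(S | {e}, sample_fns, alpha) - base <= 0:
                break
            S.add(e)
        return S

For instance, each f_i could be the coverage function of one sampled graph; the robust greedy then prefers sets whose value is stable across samples rather than merely best on average.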
Award ID(s): 1717610
NSF-PAR ID: 10100048
Journal Name: Proceedings of the International Workshop on Artificial Intelligence and Statistics
ISSN: 1525-531X
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We consider optimal transport-based distributionally robust optimization (DRO) problems with locally strongly convex transport cost functions and affine decision rules. Under conventional convexity assumptions on the underlying loss function, we obtain structural results about the value function, the optimal policy, and the worst-case optimal transport adversarial model. These results expose a rich structure embedded in the DRO problem (e.g., strong convexity even when the non-DRO problem is not strongly convex, and a suitable scaling of the Lagrangian for the DRO constraint), which is crucial for the design of efficient algorithms. As a consequence of these results, one can develop efficient optimization procedures that have the same sample and iteration complexity as a natural non-DRO benchmark algorithm, such as stochastic gradient descent.
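For orientation, the strong-duality identity that underlies optimal transport-based DRO (stated here in its standard form, under regularity conditions; the paper's precise setting with affine decision rules may differ) is

\[
\sup_{P:\, D_c(P, P_n) \le \delta} \mathbb{E}_P[\ell(\theta, Z)]
  \;=\; \inf_{\lambda \ge 0} \Big\{ \lambda \delta + \mathbb{E}_{P_n}\big[\, \sup_{z} \{ \ell(\theta, z) - \lambda\, c(z, Z) \} \,\big] \Big\},
\]

where D_c is the optimal transport cost induced by the transport cost function c, P_n is the empirical distribution, and \lambda is the multiplier whose scaling the abstract refers to.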

     
  2. We study a general stochastic ranking problem in which an algorithm needs to adaptively select a sequence of elements so as to “cover” a random scenario (drawn from a known distribution) at minimum expected cost. The coverage of each scenario is captured by an individual submodular function, in which the scenario is said to be covered when its function value goes above a given threshold. We obtain a logarithmic factor approximation algorithm for this adaptive ranking problem, which is the best possible (unless P = NP). This problem unifies and generalizes many previously studied problems with applications in search ranking and active learning. The approximation ratio of our algorithm either matches or improves the best result known in each of these special cases. Furthermore, we extend our results to an adaptive vehicle-routing problem, in which costs are determined by an underlying metric. This routing problem is a significant generalization of the previously studied adaptive traveling salesman and traveling repairman problems. Our approximation ratio nearly matches the best bound known for these special cases. Finally, we present experimental results for some applications of adaptive ranking. 
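As a point of reference, the non-adaptive, single-scenario special case of this coverage problem admits the classic cost-effectiveness greedy. The sketch below covers only that special case (the function names and the truncation at the threshold are illustrative); the paper's adaptive, multi-scenario algorithm is substantially more involved.

    def greedy_cover(ground_set, f, cost, threshold):
        # Classic greedy for submodular cover: repeatedly add the element with
        # the best truncated-gain-per-cost ratio until f(S) reaches the threshold.
        S = set()

        def ratio(e):
            gain = min(f(S | {e}), threshold) - f(S)
            return gain / cost(e)

        while f(S) < threshold:
            candidates = ground_set - S
            if not candidates:
                break  # threshold unreachable with the remaining elements
            e = max(candidates, key=ratio)
            if ratio(e) <= 0:
                break
            S.add(e)
        return S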
  3. Distributionally robust optimization (DRO) has been introduced for solving stochastic programs in which the distribution of the random variables is unknown and must be estimated by samples from that distribution. A key element of DRO is the construction of the ambiguity set, a set of distributions that contains the true distribution with high probability. Assuming that the true distribution has a probability density function, we propose a class of ambiguity sets based on confidence bands of the true density function. As examples, we consider shape-restricted confidence bands and confidence bands constructed with a kernel density estimation technique. The former allows us to incorporate prior knowledge of the shape of the underlying density function (e.g., unimodality and monotonicity), and the latter enables us to handle multidimensional cases. Furthermore, we establish the convergence of the optimal value of DRO to that of the underlying stochastic program as the sample size increases. The DRO with our ambiguity set involves functional decision variables and infinitely many constraints. To address this challenge, we apply duality theory to reformulate the DRO as a finite-dimensional stochastic program, which is amenable to a stochastic subgradient scheme as a solution method.
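In symbols, a confidence-band ambiguity set of the kind described can be written as (a plausible rendering of the construction; the paper's exact definitions may differ)

\[
\mathcal{P} \;=\; \Big\{ p \;:\; l(x) \le p(x) \le u(x) \ \text{for all } x, \ \int p(x)\, dx = 1 \Big\},
\]

where [l, u] is a confidence band for the true density, and the resulting DRO problem is \min_{\theta} \sup_{p \in \mathcal{P}} \int \ell(\theta, x)\, p(x)\, dx.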
    more » « less
  4. In the classical selection problem, the input consists of a collection of elements and the goal is to pick a subset of elements from the collection such that some objective function f is maximized. This problem has been studied extensively in the data-mining community and has multiple applications, including influence maximization in social networks, team formation, and recommender systems. A particularly popular formulation that captures the needs of many such applications is one where the objective function f is a monotone and non-negative submodular function. In these cases, the corresponding computational problem can be solved using a simple greedy (1 - 1/e)-approximation algorithm. In this paper, we consider a generalization of the above formulation where the goal is to optimize a function that maximizes the submodular function f minus a linear cost function c. This formulation is often more natural, particularly when one needs to strike a balance between the value of the objective function and the cost paid to pick the selected elements. We address variants of this problem both in an offline setting, where the collection is known a priori, and in online settings, where the elements of the collection arrive in an online fashion. We demonstrate that by using simple variants of the standard greedy algorithm (used for submodular optimization) we can design algorithms that have provable approximation guarantees, are extremely efficient, and work very well in practice.
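One simple member of the family of greedy variants mentioned above adds elements only while the marginal profit stays positive. The sketch below is a hedged illustration (the names are ours, and the paper's algorithms and guarantees may differ), not the paper's method.

    def profit_greedy(ground_set, f, c, budget):
        # Greedy on the profit h(S) = f(S) - c(S): add the element with the
        # largest marginal profit f(S | {e}) - f(S) - c(e); stop when none is
        # positive, since further elements would cost more than they add.
        S = set()
        for _ in range(budget):
            candidates = ground_set - S
            if not candidates:
                break
            e = max(candidates, key=lambda e: f(S | {e}) - f(S) - c(e))
            if f(S | {e}) - f(S) - c(e) <= 0:
                break
            S.add(e)
        return S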