In this paper, we study a class of online optimization problems with long-term budget constraints in which the objective functions are not necessarily concave (nor convex) but instead satisfy the Diminishing Returns (DR) property. In this online setting, a sequence of monotone DR-submodular objective functions and linear budget functions arrive over time, and, given a limited total budget, the goal is to take an action at each round, before observing the utility and budget functions arriving at that round, so as to achieve a sub-linear regret bound while keeping the total budget violation sub-linear as well. Prior work has shown that achieving sub-linear regret and sub-linear total budget violation simultaneously is impossible if the utility and budget functions are chosen adversarially. We therefore modify the notion of regret by comparing the agent against the best fixed decision in hindsight that satisfies the budget constraint proportionally over every window of length W. We propose the Online Saddle Point Hybrid Gradient (OSPHG) algorithm to solve this class of online problems. For W = T, we recover the aforementioned impossibility result; however, if W is sub-linear in T, we show that it is possible to obtain sub-linear bounds for both the regret and the total budget violation.
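To make the saddle-point idea concrete, the following is a minimal Python sketch of one primal-dual round under simplifying assumptions: the feasible set is the box [0, 1]^d, a single scalar dual variable tracks budget use, and fixed step sizes replace the tuned schedules and window-W benchmark of the paper. The function and parameter names are illustrative; this is a sketch of the general technique, not the exact OSPHG update.

```python
import numpy as np

def project_box(x, lo=0.0, hi=1.0):
    """Project onto the box [lo, hi]^d (stand-in for the true feasible set)."""
    return np.clip(x, lo, hi)

def primal_dual_round(x, lam, grad_f, g_coeff, budget_per_round, eta=0.05, mu=0.05):
    """One simplified primal-dual round (illustrative, not the paper's exact update).

    x                -- current decision in [0, 1]^d
    lam              -- scalar dual variable penalizing budget overuse
    grad_f           -- gradient of the just-revealed DR-submodular utility f_t at x
    g_coeff          -- coefficients of the linear budget function g_t(x) = <g_coeff, x>
    budget_per_round -- per-round share of the total budget
    eta, mu          -- primal / dual step sizes (tuned via T and W in the paper)
    """
    # Primal step: gradient ascent on the Lagrangian f_t(x) - lam * (g_t(x) - budget_per_round).
    x_new = project_box(x + eta * (grad_f - lam * g_coeff))
    # Dual step: raise lam when this round's budget share was exceeded, lower it otherwise.
    violation = float(g_coeff @ x_new) - budget_per_round
    lam_new = max(0.0, lam + mu * violation)
    return x_new, lam_new
```

The dual variable plays the role of a price on budget consumption: sustained overspending drives it up, which in turn pushes the primal iterate toward cheaper actions.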
An Optimal Learning Algorithm for Online Unconstrained Submodular Maximization
This paper considers a basic problem at the interface of submodular optimization and online learning. In the online unconstrained submodular maximization problem, there is a universe [n] = {1, 2, …, n} and a sequence of T nonnegative (not necessarily monotone) submodular functions arrive over time. The goal is to design a computationally efficient online algorithm, which chooses a subset of [n] at each time step as a function only of the past, such that the accumulated value of the chosen subsets is as close as possible to the maximum total value of a fixed subset in hindsight. Our main result is a polynomial time no-1/2-regret algorithm for this problem, meaning that for every sequence of nonnegative submodular functions, the algorithm's expected total value is at least 1/2 times that of the best subset in hindsight, up to an error term sublinear in T. The factor of 1/2 cannot be improved upon by any polynomial-time online algorithm when the submodular functions are presented as value oracles.
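For intuition about where the 1/2 factor comes from, the sketch below shows the classical randomized double-greedy routine of Buchbinder et al., which achieves a 1/2-approximation in expectation for a single offline nonnegative submodular function given by a value oracle. It is only the offline building block, not the paper's online algorithm; the oracle interface is an assumption of the sketch.

```python
import random

def double_greedy(n, value):
    """Randomized double greedy: 1/2-approximation in expectation for maximizing a
    nonnegative (not necessarily monotone) submodular set function given as a value
    oracle `value: frozenset -> float`. Offline subroutine only (illustrative)."""
    X, Y = set(), set(range(n))            # X grows from the empty set, Y shrinks from [n]
    for i in range(n):
        a = value(frozenset(X | {i})) - value(frozenset(X))   # gain of adding i to X
        b = value(frozenset(Y - {i})) - value(frozenset(Y))   # gain of removing i from Y
        a_plus, b_plus = max(a, 0.0), max(b, 0.0)
        p = 1.0 if a_plus + b_plus == 0 else a_plus / (a_plus + b_plus)
        if random.random() < p:
            X.add(i)                        # keep element i
        else:
            Y.discard(i)                    # drop element i
    return X                                # X == Y after the last iteration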
- Award ID(s):
- 1813188
- PAR ID:
- 10096361
- Date Published:
- Journal Name:
- 31st Annual Conference on Learning Theory (COLT)
- Page Range / eLocation ID:
- 1307–1325
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We study the problem of computing and learning non-anonymous reserve prices to maximize revenue. We first define the Maximizing Multiple Reserves (MMR) problem in single-parameter matroid environments, where the input is m valuation profiles v_1, …, v_m, indexed by the same n bidders, and the goal is to compute the vector r of (non-anonymous) reserve prices that maximizes the total revenue obtained on these profiles by the VCG mechanism with reserves r. We prove that the problem is APX-hard, even in the special case of single-item environments, and give a polynomial-time 1/2-approximation algorithm for it in arbitrary matroid environments. We then consider the online no-regret learning problem and show how to exploit the special structure of the MMR problem to translate our offline approximation algorithm into an online learning algorithm that achieves asymptotically time-averaged revenue at least 1/2 times that of the best fixed reserve prices in hindsight. On the negative side, we show that, quite generally, computational hardness for the offline optimization problem translates to computational hardness for obtaining vanishing time-averaged regret. Thus, our hardness result for the MMR problem implies that computationally efficient online learning requires approximation, even in the special case of single-item auction environments. (A sketch of single-item VCG revenue with non-anonymous reserves appears after this list.)
-
We consider an online optimization problem in which the reward functions are DR-submodular, and in addition to maximizing the total reward, the sequence of decisions must satisfy some convex constraints on average. Specifically, at each round t, upon committing to an action x_t, a DR-submodular utility function f_t and a convex constraint function g_t are revealed, and the goal is to maximize the overall utility while ensuring the average of the constraint functions is non-positive (so constraints are satisfied on average). Such cumulative constraints arise naturally in applications where the average resource consumption is required to remain below a specified threshold. We study this problem under an adversarial model and a stochastic model for the convex constraints, where the functions g_t can vary arbitrarily or according to an i.i.d. process over time. We propose a single algorithm which achieves sub-linear regret (with respect to the time horizon T) as well as sub-linear constraint violation bounds in both settings, without prior knowledge of the regime. Prior works have studied this problem in the special case of linear constraint functions. Our results not only improve upon the existing bounds under linear cumulative constraints, but also give the first sub-linear bounds for general convex long-term constraints.
-
We propose a novel combinatorial stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of n arms at each time step t in [T] is observed. SGB adopts an optimized stochastic-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms. Unlike existing methods that explore the entire set of unselected base arms during each selection step, our SGB algorithm samples only an optimized proportion of unselected arms and selects actions from this subset. We prove that our algorithm achieves a (1-1/e)-regret bound of O(n^(1/3) k^(2/3) T^(2/3) log(T)^(2/3)) for monotone stochastic submodular rewards, which outperforms the state-of-the-art in terms of the cardinality constraint k. Furthermore, we empirically evaluate the performance of our algorithm in the context of online constrained social influence maximization. Our results demonstrate that our proposed approach consistently outperforms the other algorithms, increasing the performance gap as k grows. (A sketch of the stochastic-greedy sampling step appears after this list.)
-
In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of T submodular functions over a common finite ground set U arrives online, and at each time-step the decision maker must choose at most k elements of U before observing the function. The decision maker obtains a profit equal to the function evaluated on the chosen set and aims to learn a sequence of sets that achieves low expected regret. In the full-information setting, we develop an (ε, δ)-DP algorithm with expected (1-1/e)-regret bound of O(k^2 log|U| sqrt(T log(k/δ)) / ε). This algorithm contains k ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide an (ε, δ + O(e^(-T^(1/3))))-DP algorithm with expected (1-1/e)-regret bound of O((sqrt(log(k/δ)) / ε) (k (|U| log|U|)^(1/3))^2 T^(2/3)). One challenge for privacy in this setting is that the payoff and feedback of expert i depend on the actions taken by her i-1 predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with feedforward may be of independent interest. (A sketch of the exponential mechanism, a standard private-selection primitive, appears after this list.)
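For the reserve-price item above: in the single-item special case, VCG with non-anonymous reserves reduces to a second-price auction in which bidder i's personal reserve r_i screens her participation and caps the winner's payment from below. The sketch below computes the revenue of one standard (eager-reserve) variant on a single valuation profile; names and the example data are illustrative, not from the paper.

```python
def single_item_revenue(values, reserves):
    """Revenue of a single-item second-price auction with non-anonymous (eager) reserves.
    Bidders below their personal reserve are removed; the highest remaining bidder wins
    and pays the larger of her reserve and the highest eligible competing value.
    Illustrative sketch, not code from the paper."""
    eligible = [i for i, v in enumerate(values) if v >= reserves[i]]
    if not eligible:
        return 0.0
    winner = max(eligible, key=lambda i: values[i])
    competitors = [values[i] for i in eligible if i != winner]
    return max(reserves[winner], max(competitors) if competitors else 0.0)

# Example: total revenue of one fixed reserve vector over m valuation profiles.
profiles = [[3.0, 5.0, 2.0], [4.0, 1.0, 6.0]]
r = [2.5, 4.0, 3.0]
total_revenue = sum(single_item_revenue(v, r) for v in profiles)
```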
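For the stochastic-greedy bandit item above: instead of scoring every unselected base arm at each greedy step, only a random subsample is scored, with the sample size chosen so that it contains a near-best arm with good probability. The sketch below shows that sampling step with estimated marginal gains standing in for the bandit feedback the algorithm would maintain; it is an illustration of the subsampling idea, not the SGB algorithm itself.

```python
import math
import random

def stochastic_greedy_select(n, k, estimated_gain, eps=0.1):
    """Greedily pick k of n base arms, scoring only a random subsample of size
    s = ceil((n / k) * log(1 / eps)) at each step (the classic stochastic-greedy
    sample size). `estimated_gain(S, a)` returns the estimated marginal gain of
    adding arm a to the current set S. Illustrative sketch only."""
    selected = set()
    s = max(1, math.ceil((n / k) * math.log(1.0 / eps)))
    for _ in range(k):
        candidates = [a for a in range(n) if a not in selected]
        sample = random.sample(candidates, min(s, len(candidates)))
        best = max(sample, key=lambda a: estimated_gain(selected, a))
        selected.add(best)
    return selected
```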
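For the differential-privacy item above: one standard primitive for privately selecting an item with large estimated marginal gain is the exponential mechanism, sketched below. This is a common building block for private selection, not the authors' algorithm; the score function and its sensitivity bound are assumptions of the sketch.

```python
import math
import random

def exponential_mechanism(items, score, eps, sensitivity):
    """Sample one item with probability proportional to exp(eps * score / (2 * sensitivity)).
    Standard exponential mechanism; (eps, 0)-differentially private when `score` has the
    stated sensitivity. Illustrative building block, not the paper's algorithm."""
    scores = [score(i) for i in items]
    m = max(scores)                        # subtract the max for numerical stability
    weights = [math.exp(eps * (s - m) / (2.0 * sensitivity)) for s in scores]
    r, acc = random.random() * sum(weights), 0.0
    for item, w in zip(items, weights):
        acc += w
        if r <= acc:
            return item
    return items[-1]
```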