Title: Differentially Private Online Submodular Maximization
In this work we consider the problem of online submodular maximization under a cardinality constraint with differential privacy (DP). A stream of T submodular functions over a common finite ground set U arrives online, and at each time step the decision maker must choose at most k elements of U before observing the function. The decision maker obtains a profit equal to the function evaluated on the chosen set and aims to learn a sequence of sets that achieves low expected regret. In the full-information setting, we develop an $$(\varepsilon,\delta)$$-DP algorithm with expected (1-1/e)-regret bound of $$O(k^2\log|U|\sqrt{T\log(k/\delta)}/\varepsilon)$$. This algorithm contains k ordered experts that learn the best marginal increments for each item over the whole time horizon while maintaining privacy of the functions. In the bandit setting, we provide an $$(\varepsilon,\delta+O(e^{-T^{1/3}}))$$-DP algorithm with expected (1-1/e)-regret bound of $$O(\sqrt{\log(k/\delta)}/\varepsilon \cdot (k(|U|\log|U|)^{1/3})^2 T^{2/3})$$. One challenge for privacy in this setting is that the payoff and feedback of expert i depend on the actions taken by her i-1 predecessors. This particular type of information leakage is not covered by post-processing, and new analysis is required. Our techniques for maintaining privacy with feedforward may be of independent interest.
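To make the full-information algorithm concrete, the following is a minimal sketch of the k-ordered-experts structure described above, under simplifying assumptions: each expert runs multiplicative weights on marginal gains perturbed with Gaussian noise before they are used for learning. The function name, the noise scale, and the learning rate are illustrative choices, not the paper's exact mechanism or its privacy accounting.

```python
import numpy as np

def dp_online_submodular(ground_set, k, T, functions, eps=1.0, delta=1e-5, rng=None):
    """Sketch: k ordered experts learn marginal increments from
    noise-perturbed (privatized) feedback. Illustrative only."""
    rng = rng or np.random.default_rng(0)
    n = len(ground_set)
    eta = np.sqrt(np.log(n) / T)                      # multiplicative-weights step size
    sigma = np.sqrt(2 * np.log(1.25 / delta)) / eps   # assumed Gaussian noise scale
    weights = np.ones((k, n))
    total = 0.0
    for t in range(T):
        # Each expert samples one element; duplicates are skipped,
        # so the played set has at most k elements.
        S = []
        for i in range(k):
            e = int(rng.choice(n, p=weights[i] / weights[i].sum()))
            if e not in S:
                S.append(e)
        f = functions[t]                              # f_t : frozenset -> [0, 1]
        total += f(frozenset(ground_set[j] for j in S))
        # Feedforward: expert i is scored on marginal gains on top of the
        # prefix chosen by experts 1..i-1, so its feedback depends on its
        # predecessors' actions -- the leakage discussed in the abstract.
        prefix = set()
        for i, chosen in enumerate(S):
            base = f(frozenset(ground_set[j] for j in prefix))
            for e in range(n):
                gain = f(frozenset(ground_set[j] for j in prefix | {e})) - base
                weights[i, e] *= np.exp(eta * (gain + rng.normal(0.0, sigma)))
            weights[i] /= weights[i].sum()            # renormalize to avoid overflow
            prefix.add(chosen)
    return total
```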
Award ID(s): 1850187, 1942772
PAR ID: 10223682
Author(s) / Creator(s): Perez-Salazar, Sebastian; Cummings, Rachel
Date Published:
Journal Name: Proceedings of Machine Learning Research
Volume: 130
ISSN: 2640-3498
Page Range / eLocation ID: 1279-1287
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Banerjee, Arindam; Fukumizu, Kenji (Ed.)
    Numerous tasks in machine learning and artificial intelligence have been modeled as submodular maximization problems. These problems usually involve sensitive data about individuals, and in addition to maximizing the utility, privacy concerns should be considered. In this paper, we study the general framework of non-negative monotone submodular maximization subject to matroid or knapsack constraints in both offline and online settings. For the offline setting, we propose a differentially private $$(1-\frac{\kappa}{e})$$-approximation algorithm, where $$\kappa\in[0,1]$$ is the total curvature of the submodular set function, which improves upon prior works in terms of approximation guarantee and query complexity under the same privacy budget. In the online setting, we propose the first differentially private algorithm, and we specify the conditions under which the regret bound scales as $$O(\sqrt{T})$$, i.e., privacy could be ensured while maintaining the same regret bound as the optimal regret guarantee in the non-private setting.
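As a point of reference for the offline result above, here is a minimal sketch of the standard exponential-mechanism greedy for the simpler cardinality-constrained case; the paper's algorithm handles matroid and knapsack constraints and obtains the curvature-dependent factor, which this sketch does not. The naive eps/k budget split and unit sensitivity are assumptions for illustration.

```python
import numpy as np

def dp_greedy(ground_set, k, f, eps, sensitivity=1.0, rng=None):
    """Private greedy under a cardinality constraint: each of the k
    steps selects an element via the exponential mechanism, scored by
    its marginal gain."""
    rng = rng or np.random.default_rng(0)
    eps_step = eps / k                                # naive composition over k steps
    S = set()
    for _ in range(k):
        candidates = [e for e in ground_set if e not in S]
        gains = np.array([f(S | {e}) - f(S) for e in candidates])
        # Exponential mechanism: Pr[e] proportional to exp(eps' * gain / (2 * sensitivity))
        logits = eps_step * gains / (2 * sensitivity)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        S.add(candidates[int(rng.choice(len(candidates), p=probs))])
    return S
```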
  2. We propose a novel combinatorial stochastic-greedy bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of n arms at each time step t in [T] is observed. SGB adopts an optimized stochastic-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms. Unlike existing methods that explore the entire set of unselected base arms during each selection step, our SGB algorithm samples only an optimized proportion of unselected arms and selects actions from this subset. We prove that our algorithm achieves a (1-1/e)-regret bound of $$O(n^{1/3} k^{2/3} T^{2/3} \log(T)^{2/3})$$ for monotone stochastic submodular rewards, which outperforms the state-of-the-art in terms of the cardinality constraint k. Furthermore, we empirically evaluate the performance of our algorithm in the context of online constrained social influence maximization. Our results demonstrate that our proposed approach consistently outperforms the other algorithms, increasing the performance gap as k grows.
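The subsampling step that drives SGB's improved dependence on k can be sketched in isolation. Below is the classical stochastic-greedy selection rule run against a generic marginal-gain estimator; the bandit layer (the optimized explore-then-commit schedule that builds such an estimator from joint-reward observations alone) is omitted, and est_gain is a hypothetical placeholder for it.

```python
import math
import random

def stochastic_greedy(n, k, est_gain, eps=0.1, rng=None):
    """Each of the k steps scores only about (n/k) * ln(1/eps) randomly
    sampled unselected arms instead of all of them."""
    rng = rng or random.Random(0)
    m = math.ceil((n / k) * math.log(1 / eps))        # per-step sample size
    S, chosen = [], set()
    for _ in range(k):
        remaining = [a for a in range(n) if a not in chosen]
        cands = rng.sample(remaining, min(m, len(remaining)))
        best = max(cands, key=lambda a: est_gain(S, a))
        S.append(best)
        chosen.add(best)
    return S
```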
  3. In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a target policy and asked to estimate the expected reward it will obtain when executed in a multi-armed bandit environment. Ours is the first work that focuses on such an optimal data collection strategy for policy evaluation involving heteroscedastic reward noise in the linear bandit setting. We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting with knowledge of the noise variances. This design minimizes the mean squared error (MSE) of the estimated value of the target policy and is termed the oracle design. Since the noise variance is typically unknown, we then introduce a novel algorithm, SPEED (Structured Policy Evaluation Experimental Design), that tracks the oracle design, and we derive its regret with respect to the oracle design. We show that the regret scales as $$\tilde{O}(d^3 n^{-3/2})$$ and prove a matching lower bound of $$\Omega(d^2 n^{-3/2})$$. Finally, we evaluate SPEED on a set of policy evaluation tasks and demonstrate that it achieves an MSE comparable to the optimal oracle and much lower than simply running the target policy.
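A minimal sketch of the estimator such a design feeds: weighted least squares with inverse-variance weights, plugged into the target policy's value. The oracle design itself (which pull distribution minimizes the resulting MSE) and the SPEED tracking algorithm are the paper's contributions and are not reproduced here; the names and toy data below are hypothetical.

```python
import numpy as np

def wls_policy_value(X, rewards, pulls, variances, target_probs):
    """Inverse-variance weighted least squares for the reward model
    r = x^T theta, then the plug-in value of the target policy."""
    w = 1.0 / variances[pulls]                 # heteroscedastic weights
    Xo = X[pulls]                              # features of the pulled arms
    A = Xo.T @ (w[:, None] * Xo)               # weighted Gram matrix
    b = Xo.T @ (w * rewards)
    theta_hat = np.linalg.solve(A, b)
    return target_probs @ (X @ theta_hat)      # E_{a ~ pi}[x_a^T theta_hat]

# Hypothetical usage: 5 arms in R^3, pulls from a uniform (non-optimal) design.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
theta = np.array([1.0, -0.5, 0.2])
variances = np.array([0.1, 0.5, 1.0, 0.2, 0.3])
pulls = rng.integers(0, 5, size=200)
rewards = X[pulls] @ theta + rng.normal(0.0, np.sqrt(variances[pulls]))
pi = np.full(5, 0.2)                           # uniform target policy
print(wls_policy_value(X, rewards, pulls, variances, pi))
```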
  4. In this paper, we study a class of online optimization problems with long-term budget constraints where the objective functions are not necessarily concave (nor convex) but instead satisfy the Diminishing Returns (DR) property. In this online setting, a sequence of monotone DR-submodular objective functions and linear budget functions arrive over time, and, given a limited total budget, the goal is to take an action at each time, before observing the utility and budget functions arriving at that round, so as to achieve a sub-linear regret bound while keeping the total budget violation sub-linear as well. Prior work has shown that achieving sub-linear regret and sub-linear total budget violation simultaneously is impossible if the utility and budget functions are chosen adversarially. Therefore, we modify the notion of regret by comparing the agent against the best fixed decision in hindsight that satisfies the budget constraint proportionally over any window of length W. We propose the Online Saddle Point Hybrid Gradient (OSPHG) algorithm to solve this class of online problems. For W = T, we recover the aforementioned impossibility result. However, if W is sublinear in T, we show that it is possible to obtain sub-linear bounds for both the regret and the total budget violation.
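The saddle-point structure behind OSPHG can be sketched as a generic online primal-dual update: ascend the utility penalized by a budget multiplier, then raise the multiplier by the observed overshoot. The actual algorithm, its handling of DR-submodularity, and the window-W benchmark are more involved; the names and step sizes here are assumptions.

```python
import numpy as np

def saddle_point_step(x, lam, grad_f, g_t, budget_rate, eta_x=0.05, eta_lam=0.05):
    """One primal-dual step for max f_t(x) subject to a long-term
    budget <g_t, x> <= budget_rate per round (schematic)."""
    # Primal: gradient ascent on the Lagrangian f_t(x) - lam * <g_t, x>,
    # projected back onto the box [0, 1]^n.
    x = np.clip(x + eta_x * (grad_f(x) - lam * g_t), 0.0, 1.0)
    # Dual: increase lam when the round's budget is exceeded.
    lam = max(0.0, lam + eta_lam * (g_t @ x - budget_rate))
    return x, lam
```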
  5. This paper considers a basic problem at the interface of submodular optimization and online learning. In the online unconstrained submodular maximization problem, there is a universe [n] = {1, 2, . . . , n} and a sequence of T nonnegative (not necessarily monotone) submodular functions arrive over time. The goal is to design a computationally efficient online algorithm, which chooses a subset of [n] at each time step as a function only of the past, such that the accumulated value of the chosen subsets is as close as possible to the maximum total value of a fixed subset in hindsight. Our main result is a polynomial-time no-1/2-regret algorithm for this problem, meaning that for every sequence of nonnegative submodular functions, the algorithm's expected total value is at least 1/2 times that of the best subset in hindsight, up to an error term sublinear in T. The factor of 1/2 cannot be improved upon by any polynomial-time online algorithm when the submodular functions are presented as value oracles.
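For context on where the 1/2 factor comes from, here is the randomized double-greedy of Buchbinder et al., which achieves a 1/2-approximation offline for nonnegative, not necessarily monotone, submodular functions. The paper's online no-1/2-regret algorithm is a different object; this sketch only illustrates the benchmark it matches.

```python
import random

def double_greedy(ground_set, f, rng=None):
    """Randomized double greedy: grow X from the empty set and shrink Y
    from the full set, settling each element with probability
    proportional to its positive marginal gain on either side."""
    rng = rng or random.Random(0)
    X, Y = set(), set(ground_set)
    for e in ground_set:
        a = f(X | {e}) - f(X)              # gain from adding e to X
        b = f(Y - {e}) - f(Y)              # gain from dropping e from Y
        ap, bp = max(a, 0.0), max(b, 0.0)
        if ap + bp == 0 or rng.random() < ap / (ap + bp):
            X.add(e)
        else:
            Y.discard(e)
    return X                               # X == Y after the last element
```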