<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Structured Robust Submodular Maximization: Offline and Online Algorithms</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>05/07/2019</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10106917</idno>
					<idno type="doi"></idno>
					<title level='j'>22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>N. Anari</author><author>N. Haghtalab</author><author>S. Naor</author><author>S. Pokutta</author><author>M. Singh</author><author>A. Torrico</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Constrained submodular function maximization has been used in subset selection problems such as selection of most informative sensor locations. While these models have been quite popular, the solutions obtained via this approach are unstable to perturbations in data defining the submodular functions. Robust submodular maximization has been proposed as a richer model that aims to overcome this discrepancy as well as increase the modeling scope of submodular optimization. In this work, we consider robust submodular maximization with structured combinatorial constraints and give efficient algorithms with provable guarantees. Our approach is applicable to constraints defined by single or multiple matroids, knapsack as well as distributionally robust criteria. We consider both the offline setting where the data defining the problem is known in advance as well as the online setting where the input data is revealed over time. For the offline setting, we give a nearly optimal bi-criteria approximation algorithm that relies on new extensions of the classical greedy algorithm. For the online version of the problem, we give an algorithm that returns a bi-criteria solution with sub-linear regret.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Constrained submodular function maximization has seen significant progress in recent years in the design and analysis of new algorithms with guarantees <ref type="bibr">(Calinescu et al., 2011;</ref><ref type="bibr">Ene and Nguyen, 2016;</ref><ref type="bibr">Buchbinder and Feldman, 2016;</ref><ref type="bibr">Sviridenko, 2004)</ref>, as well as numerous applications, especially in constrained subset selection problems <ref type="bibr">(Powers et al., 2016a;</ref><ref type="bibr">Lin and Bilmes, 2009;</ref><ref type="bibr">Krause and Guestrin, 2005;</ref><ref type="bibr">Krause et al., 2009,</ref><ref type="bibr">2008a,b)</ref> and more broadly in machine learning. A typical example is the problem of picking a subset of candidate sensor locations for spatial monitoring of certain phenomena such as temperature, pH values, humidity, etc. (see <ref type="bibr">(Krause et al., 2008a)</ref>). Here the goal is typically to find sensor locations that achieve the most coverage or give the most information about the observed phenomena. Submodularity naturally captures the decreasing marginal gain in the coverage, or in the information acquired about relevant phenomena, by using more sensors <ref type="bibr">(Das and Kempe, 2008)</ref>. While submodular optimization offers an attractive model for such scenarios, there are a few key shortcomings, which motivated robust submodular optimization (see <ref type="bibr">(Krause et al., 2008a)</ref>) in the cardinality case, so as to optimize against several functions simultaneously:</p><p>1. The sensors are typically used to measure various parameters at the same time. Observations for these parameters need to be modeled via different submodular functions.</p><p>2. 
Many of the phenomena being observed are nonstationary and highly variable in certain locations. To obtain a good solution, a common approach is to use different submodular functions to model different spatial regions.</p><p>3. The submodular functions are typically defined using data obtained from observations, and imprecise information can lead to unstable optimization problems. Thus, there is a desire to compute solutions that are robust to perturbations of the submodular functions.</p><p>Our main contribution is the development of new algorithms with provable guarantees for robust submodular optimization under a large class of combinatorial constraints. These include partition constraints, where local cardinality constraints are placed on disjoint parts of the ground set. More generally, we consider matroid and knapsack constraints. We provide bi-criteria approximations that trade off the approximation factor against the "size" of the solution, measured by the number ℓ of feasible sets S_1, …, S_ℓ whose union constitutes the final solution S. While this might be non-intuitive at first, it turns out that the union of feasible sets corresponds to an appropriate relaxation of the single cardinality constraint. Some special cases of interest are:</p><p>1. Partition constraints. Given a partition of the candidate sensor locations, the feasible sets correspond to subsets that satisfy a cardinality constraint on each part of the partition. The union of feasible sets here corresponds to relaxing the cardinality constraints separately for each part. This results in a stronger guarantee than relaxing the constraint globally, as would be the case with a single cardinality constraint.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2.</head><p>Gammoid. Given a directed graph and a subset of nodes T, the feasible sets correspond to subsets S that can reach T via disjoint paths in the graph. Gammoids appear in flow-based models, for example in reliable routing. The union of feasible sets now corresponds to sets S that can reach T via paths such that each vertex appears in few paths.</p><p>We consider both offline and online versions of the problem, where the data is either known a priori or is revealed over time, respectively. We give a simple and efficient greedy-like algorithm for the offline version of the problem. The analysis relies on new insights into the performance of the classical greedy algorithm for submodular maximization, when extended to produce a solution comprising a union of multiple feasible sets. For the online case, we introduce new technical ingredients that might be broadly applicable in online robust optimization. Our work significantly expands on previous works on robust submodular optimization that focused on a single cardinality constraint <ref type="bibr">(Krause et al., 2008a)</ref>.</p></div>
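The partition constraint above is simple enough to state as code. Below is a minimal independence-oracle sketch in Python; the function and variable names are ours, not the paper's.

```python
# Partition constraint: given disjoint parts P_1, ..., P_q of the ground set
# and budgets b_1, ..., b_q, a set S is feasible iff |S ∩ P_j| <= b_j for all j.
def partition_independent(S, parts, budgets):
    """Independence oracle for a partition matroid."""
    return all(len(S & P) <= b for P, b in zip(parts, budgets))

# Two parts with budgets 1 and 2: {0, 3, 4} is feasible, {0, 1} is not.
parts = [{0, 1, 2}, {3, 4}]
budgets = [1, 2]
```

The union of ℓ such feasible sets then violates each local budget by at most a factor ℓ, which is exactly the relaxation described above.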
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Problem Formulation</head><p>As we describe below, we study offline and online variations of robust submodular maximization under structured combinatorial constraints. While our results hold for more general constraints, we focus our attention first on matroid constraints, which generalize the partition as well as the gammoid structural constraints mentioned above. We discuss extensions to other classes of constraints in Appendix A.</p><p>Consider a non-negative set function f : 2^V → R_+. We denote the marginal value for any subset A ⊆ V and e ∈ V by f_A(e) := f(A + e) − f(A), where A + e := A ∪ {e}. Function f is submodular if and only if it satisfies the diminishing returns property, namely, f_A(e) ≥ f_B(e) for any e ∈ V and A ⊆ B ⊆ V \ {e}. Function f is monotone if f(A) ≤ f(B) for all A ⊆ B. Most of our results are concerned with optimization of monotone submodular functions.</p><p>A natural class of constraints considered in submodular optimization are matroid constraints. A matroid M = (V, I) consists of a ground set V and a non-empty, downward-closed family of sets I ⊆ 2^V satisfying the exchange property: for any A, B ∈ I with |A| &lt; |B|, there is e ∈ B \ A such that A + e ∈ I. Sets in such a family I are called independent sets, or simply put, feasible sets for the purpose of optimization.</p><p>We consider the robust variation of submodular optimization. That is, for a matroid M = (V, I) and a given collection of k monotone submodular functions f_1, …, f_k, our goal is to select a set S ∈ I that maximizes min_{i∈[k]} f_i(S). We define a (1 − ε)-approximately optimal solution S as one satisfying</p><p>min_{i∈[k]} f_i(S) ≥ (1 − ε) max_{S′∈I} min_{i∈[k]} f_i(S′).</p><p>(1)</p><p>We also consider the online variation of the above optimization problem in the presence of an adversary. In this setting, we are given a fixed matroid M = (V, I). At each time step t ∈ [T], we choose a set S_t. An adversary then selects a collection of k monotone submodular functions {f_i^t}_{i∈[k]}, and we receive the payoff min_{i∈[k]} E[f_i^t(S_t)], where the expectation is taken over any randomness in choosing S_t. We can then use the knowledge of the adversary's actions, i.e., oracle access to {f_i^t}_{i∈[k]}, in our future decisions. 
We consider non-adaptive adversaries whose choices {f_i^t}_{i∈[k]} are independent of S_τ for τ &lt; t. In other words, an adversarial sequence of functions {f_i^t}_{i∈[k], t∈[T]} is chosen upfront without being revealed to the optimization algorithm.</p><p>Our goal is to design an algorithm that maximizes the total payoff Σ_{t∈[T]} min_{i∈[k]} E[f_i^t(S_t)]. Thus, we would like to obtain a cumulative reward that competes with that of the fixed set S ∈ I we should have played had we known all the functions f_i^t in advance, i.e., compete with max_{S∈I} Σ_{t∈[T]} min_{i∈[k]} f_i^t(S). As in the offline optimization problem, we also consider competing with a (1 − ε) fraction of the above benchmark. In this case, Regret_{1−ε}(T) denotes how far we are from this goal. That is,</p><p>Regret_{1−ε}(T) := (1 − ε) max_{S∈I} Σ_{t∈[T]} min_{i∈[k]} f_i^t(S) − Σ_{t∈[T]} min_{i∈[k]} E[f_i^t(S_t)].</p><p>(2)</p><p>We desire algorithms whose (1 − ε)-regret is sublinear in T. That is, we get arbitrarily close to a (1 − ε) fraction of the benchmark as T → ∞.</p><p>The offline (Equation <ref type="formula">1</ref>) and online (Equation <ref type="formula">2</ref>) variations of robust monotone submodular maximization are known to be NP-hard to approximate to any polynomial factor when the algorithm's choices are restricted to the family of independent sets I <ref type="bibr">(Krause et al., 2008a)</ref>. Therefore, to obtain any reasonable approximation guarantee we need to relax the algorithm's constraint set. Such an approximation approach is called a bi-criteria approximation scheme, in which the algorithm outputs a set with a nearly optimal objective value, while ensuring that the set used is the union of only a few independent sets in I. More formally, to get a (1 − ε)-approximate solution, we may use a set S = S_1 ∪ ⋯ ∪ S_ℓ such that S_1, …, S_ℓ ∈ I and ℓ is a function of 1/ε and other parameters.</p></div>
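As a concrete toy instance of objective (1), one can take k coverage functions (which are monotone submodular) over a small ground set and brute-force the robust optimum under a cardinality constraint, the simplest matroid. The names below are illustrative choices of ours, not the paper's notation.

```python
import itertools

def coverage(groups):
    """Coverage function f(S) = |union of groups covered by S|; monotone submodular."""
    def f(S):
        covered = set()
        for e in S:
            covered |= groups[e]
        return len(covered)
    return f

# Ground set V = {0, 1, 2, 3} and k = 2 objectives f_1, f_2.
f1 = coverage({0: {1, 2}, 1: {2, 3}, 2: {4}, 3: set()})
f2 = coverage({0: {1}, 1: {1}, 2: {2, 3, 4}, 3: {5}})

def robust_value(S):
    return min(f(S) for f in (f1, f2))

# Brute-force max_{|S| <= 2} min_i f_i(S).
candidates = [set(S) for r in range(3) for S in itertools.combinations(range(4), r)]
opt = max(robust_value(S) for S in candidates)
```

Note that a set maximizing f_1 or f_2 alone need not maximize the robust value; the min couples the two objectives, which is what makes the problem harder than single-function maximization.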
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Our Results and Contributions</head><p>We present (nearly tight) bi-criteria approximation algorithms for the offline and online variations of robust monotone submodular optimization under matroid constraints. Throughout the paper, we assume that the matroid is accessible via an independence oracle and the submodular functions are accessible via a value oracle. Moreover, we use log to denote the logarithm with base 2 and ln to denote the natural logarithm.</p><p>For the offline setting of the problem we obtain the following result:</p><p>Theorem 1. For the offline robust submodular optimization problem (1), for any 0 &lt; ε &lt; 1, there is a polynomial time algorithm that runs in O(nr log(k/ε) · min{(nk/ε) log n, log_{1+ε}(max_{e,j} f_j(e))}) time and returns a set S_ALG = S_1 ∪ ⋯ ∪ S_ℓ such that</p><p>min_{i∈[k]} f_i(S_ALG) ≥ (1 − ε) max_{S∈I} min_{i∈[k]} f_i(S),</p><p>where ℓ = O(log(k/ε)) and S_1, …, S_ℓ ∈ I.</p><p>The algorithm that achieves this result is an extension of the greedy algorithm. It reuses the standard greedy algorithm of <ref type="bibr">Fisher et al. (1978)</ref> in an iterative scheme, so that it generates a small family of independent sets whose union achieves the (1 − ε)-guarantee. The argument is reminiscent of a well-known fact for submodular function maximization under cardinality constraints using the greedy algorithm: letting the greedy algorithm run longer results in better approximations at the expense of violating the cardinality constraint. Our extended greedy algorithm works in a similar spirit; however, it iteratively produces independent sets in the matroid. We present the main results and the corresponding proofs in Section 2. Additionally, we propose a second, randomized algorithm relying on continuous extensions of submodular functions that achieves tight bounds in line with the hardness result in <ref type="bibr">(Krause et al., 2008a)</ref> (see Appendix B). 
This algorithm also forms the basis of the online algorithm that we present later in Section 3. One might hope that similar results can be obtained even when the functions are non-monotone (but still submodular). As we show in Appendix B.3, this is not possible.</p><p>A natural question is whether our algorithm can be carried over into the online setting, where functions are revealed over time. For the online setting, we present the first results for robust submodular optimization.</p><p>Theorem 2. For the online robust submodular optimization problem, for any 0 &lt; ε &lt; 1, there is a randomized polynomial time algorithm that at each time t returns a set S_t = S_1^t ∪ ⋯ ∪ S_ℓ^t, with S_1^t, …, S_ℓ^t ∈ I and ℓ = O(ln(1/ε)), whose (1 − ε)-regret with respect to min_{i∈[k]} E[f_i^t(S_t)] is sublinear in T.</p><p>We remark that the guarantee of Theorem 2 holds with respect to the minimum of E[f_i^t(S_t)], as opposed to the guarantee of Theorem 1 that directly bounds the minimum of f_i(S). The solution of the online algorithm is a union of only O(ln(1/ε)) independent sets, in contrast to the offline solution, which is the union of O(log(k/ε)) independent sets. The main challenge in the online algorithm is to deal with the non-convexity and non-smoothness due to submodularity, exacerbated by the robustness criterion. Our approach to coping with the robustness criterion is to use the soft-min function −(1/α) ln(Σ_{i∈[k]} e^{−α g_i}), defined for a collection of smooth functions {g_i}_{i∈[k]} and a suitable parameter α &gt; 0. While the choice of the specific soft-min function is seemingly arbitrary, one feature is crucial for us: its gradient is a convex combination of the gradients of the g_i's. Using this observation, we use parallel instances of the Follow-the-Perturbed-Leader (FPL) algorithm, presented by <ref type="bibr">Kalai and Vempala (2005)</ref>, one for each discretization step in the continuous greedy algorithm. We believe that the algorithm might be of independent interest for performing online learning over a minimum of many functions, a common feature in robust optimization. 
The main result and a summary of its proof appear in Section 3.</p><p>Our main results naturally extend to other types of combinatorial constraints, such as knapsack constraints or multiple matroids. We describe these extensions in the Supplemental Material (Appendix A) due to space restrictions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">Related Work</head><p>Building on the classical work of <ref type="bibr">Nemhauser et al. (1978)</ref>, constrained submodular maximization problems have seen much progress recently (see for example <ref type="bibr">(Calinescu et al., 2011;</ref><ref type="bibr">Chekuri et al., 2010;</ref><ref type="bibr">Buchbinder et al., 2014,</ref><ref type="bibr">2016)</ref>). Robust submodular maximization generalizes submodular function maximization under a matroid constraint, for which a (1 − 1/e)-approximation is known <ref type="bibr">(Calinescu et al., 2011)</ref> and is optimal. The problem has been studied for constant k by Chekuri et al.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8984;</head><p>. Closely related to our problem is the submodular cover problem where we are given a submodular function f , a target b 2 R + , and the goal is to find a set S of minimum cardinality such that f (S) b. A simple reduction shows that robust submodular maximization under a cardinality constraint reduces to the submodular cover problem <ref type="bibr">(Krause et al., 2008a)</ref>. <ref type="bibr">Wolsey (1982)</ref> showed that the greedy algorithm gives an O(ln n &#9999; )-approximation, where the output set S satisfies f (S) (1 &#9999;)b. <ref type="bibr">Krause et al. (2008a)</ref> use this approximation to build a bi-criteria algorithm which achieves tight bounds. However, this approach falls short of achieving a tight bi-criteria approximation when the problem is defined over a matroid. <ref type="bibr">Powers et al. (2016b)</ref> considers the same robust problem with matroid constraints. However, they take a different approach by presenting a bi-criteria algorithm that outputs a feasible set that is good only for a fraction of the k monotone submodular functions. A deletionrobust submodular optimization model is presented in <ref type="bibr">(Krause et al., 2008a)</ref>, which is later studied by <ref type="bibr">Orlin et al. (2016)</ref>; <ref type="bibr">Bogunovic et al. (2017)</ref>; <ref type="bibr">Kazemi et al. (2018)</ref>. Influence maximization <ref type="bibr">(Kempe et al., 2003)</ref> in a network has been a successful application of submodular maximization and recently, <ref type="bibr">He and Kempe (2016)</ref> and <ref type="bibr">Chen et al. (2016)</ref> study the robust influence maximization problem. Robust optimization for non-convex objectives (including submodular functions) has been also considered by <ref type="bibr">Chen et al. (2017)</ref>, however with weaker guarantees than ours due to the extended generality. 
Specifically, their algorithm outputs O((r log k)/ε²) feasible sets whose union achieves a factor of (1 − 1/e − ε). Finally, <ref type="bibr">Wilder (2017)</ref> studies a similar problem in which the set of feasible solutions is the set of all distributions over independent sets of a matroid. In particular, for our setting <ref type="bibr">Wilder (2017)</ref> gives an algorithm that outputs O((log k)/ε³) feasible sets whose union obtains a (1 − 1/e)² fraction of the optimal solution. Our results are stronger than the ones obtained by <ref type="bibr">Chen et al. (2017)</ref> and <ref type="bibr">Wilder (2017)</ref>, since we provide the same guarantees using the union of fewer feasible sets. Other variants of the robust submodular maximization problem are studied by <ref type="bibr">Mitrovic et al. (2018)</ref>; <ref type="bibr">Staib et al. (2018)</ref>.</p><p>There has been some prior work on online submodular function maximization that we briefly review here. <ref type="bibr">Streeter and Golovin (2008)</ref> study the budgeted maximum submodular coverage problem and consider several feedback cases (denote by B an integral bound on the budget): in the full-information case, a (1 − 1/e)-expected regret of O(√(BT ln n)) is achieved, but the algorithm uses B experts, which may be very large. In a follow-up work, <ref type="bibr">Golovin et al. (2014)</ref> study the online submodular maximization problem under partition constraints, and then generalize it to general matroid constraints. For the latter, the authors present an online version of the continuous greedy algorithm, which relies on the Follow-the-Perturbed-Leader algorithm of <ref type="bibr">Kalai and Vempala (2005)</ref> and obtains a (1 − 1/e)-expected regret of O(√T). Similar to this approach, our bi-criteria online algorithm will also use the Follow-the-Perturbed-Leader algorithm as a subroutine.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The Offline Case</head><p>In this section, we consider offline robust optimization (Equation <ref type="formula">1</ref>) under matroid constraints.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Offline Algorithm and Analysis</head><p>In this section, we present a procedure that achieves a (nearly) tight bi-criteria approximation for the problem of interest and prove Theorem 1. First, we extend the standard greedy algorithm for maximizing a single submodular function under a matroid constraint to the bi-criteria setting and prove Theorem 3.</p><p>Observe that Algorithm 1 with ℓ = 1 is just the greedy algorithm presented by <ref type="bibr">Fisher et al. (1978)</ref>, which gives a 1/2-approximation. Extending the standard algorithm gives us the following result.</p><p>Theorem 3. For any ℓ ≥ 1 and monotone submodular function f : 2^V → R_+ with f(∅) = 0, the extended greedy Algorithm 1 returns sets S_1, …, S_ℓ ∈ I such that f(S_1 ∪ ⋯ ∪ S_ℓ) ≥ (1 − 1/2^ℓ) max_{S∈I} f(S).</p><p>Algorithm 1 Extended Greedy Algorithm for Submodular Optimization</p><p>Input: ℓ ≥ 1, monotone submodular function f : 2^V → R_+, matroid M = (V, I). Output: sets S_1, …, S_ℓ ∈ I.</p><p>1: for τ = 1, …, ℓ do 2: S_τ ← ∅ 3: while S_τ is not a basis of M do 4: Compute e* ∈ argmax_{e : S_τ + e ∈ I} f_{S_1 ∪ ⋯ ∪ S_{τ−1} ∪ S_τ}(e)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5:</head><p>Update S_τ ← S_τ + e*.</p><p>Proof. We use the following stronger statement: for any monotone non-negative submodular function <ref type="bibr">(Fisher et al., 1978)</ref>, the greedy algorithm, when run for a single iteration, returns a set S_1 ∈ I such that f(S_1) − f(∅) ≥ (1/2) max_{S∈I} {f(S) − f(∅)}. We use the above statement to prove our theorem by induction. For τ = 1, the claim follows directly. Consider any ℓ ≥ 2. Observe that the algorithm in iteration τ = ℓ is exactly the greedy algorithm run on the submodular function f′ : 2^V → R_+ defined by f′(S) := f(S ∪ S_1 ∪ ⋯ ∪ S_{ℓ−1}) − f(S_1 ∪ ⋯ ∪ S_{ℓ−1}); applying the stronger statement to f′ halves the remaining gap, which completes the induction.</p><p>We now apply Theorem 3 to the robust submodular problem, in which we are given monotone submodular functions f_1, …, f_k. First, given parameter ε &gt; 0, we obtain an estimate Λ of the value of the optimal solution OPT := max_{S∈I} min_{i∈[k]} f_i(S) via a binary search with a relative error of 1 − ε/2, i.e., (1 − ε/2) OPT ≤ Λ ≤ OPT. As in <ref type="bibr">(Krause et al., 2008a)</ref>, let g : 2^V → R_+ be defined for any S ⊆ V as follows:</p><p>g(S) := (1/k) Σ_{i∈[k]} min{f_i(S), Λ}.</p><p>(3)</p><p>Observe that max_{S∈I} g(S) = Λ whenever Λ ≤ OPT. Moreover, note that g is also a monotone submodular function.</p><p>Proof of Theorem 1. Consider the family of monotone submodular functions {f_i}_{i∈[k]} and define g as in equation (3), with parameter Λ having relative error 1 − ε/2. If we run the extended greedy Algorithm 1 on g with ℓ ≥ ⌈log(2k/ε)⌉, we get a set S_ALG = S_1 ∪ ⋯ ∪ S_ℓ, where S_j ∈ I for all j ∈ [ℓ]. Moreover, Theorem 3 implies that g(S_ALG) ≥ (1 − 2^{−ℓ})Λ ≥ (1 − ε/(2k))Λ. Since g is the average of k terms, each at most Λ, every term must satisfy min{f_i(S_ALG), Λ} ≥ Λ − k · (ε/(2k))Λ = (1 − ε/2)Λ. Therefore, we obtain min_{i∈[k]} f_i(S_ALG) ≥ (1 − ε/2)Λ ≥ (1 − ε/2)² OPT ≥ (1 − ε) OPT.</p><p>Running time analysis. In this section, we study the running time of the bi-criteria algorithm we just presented. To show that a set of polynomially many candidate values for Λ exists such that one of them satisfies (1 − ε/2) OPT ≤ Λ ≤ OPT, we simply try Λ = n f_i(e)(1 − ε/2)^j for all i ∈ [k], e ∈ V, and j = 0, …, ⌈ln_{1−ε/2}(1/n)⌉. 
Note that there exists an index i* ∈ [k] and a set S* ∈ I such that OPT = min_{i∈[k]} f_i(S*) = f_{i*}(S*). Because of submodularity and monotonicity we have max_{e∈S*} f_{i*}(e) ≤ OPT ≤ n · max_{e∈S*} f_{i*}(e), so one of the candidate values Λ = n f_{i*}(e)(1 − ε/2)^j is in the correct interval, obtaining (1 − ε/2) OPT ≤ Λ ≤ OPT.</p><p>We remark that the dependency of the running time on ε can be made logarithmic by running a binary search on j, as opposed to trying all j = 0, …, ⌈ln_{1−ε/2}(1/n)⌉. This would take at most (nk/ε) log n iterations. We could also run a binary search to get a value within a relative error of 1 − ε/2 of OPT, which would take log_{1+ε} OPT iterations. So, we consider the minimum of those two quantities, min{(nk/ε) log n, log_{1+ε} OPT}. Given that the extended greedy algorithm runs in O(nrℓ) time, where r is the rank of the matroid and ℓ = O(log(k/ε)) is the number of rounds, we conclude that the bi-criteria algorithm runs in time O(nr log(k/ε) · min{(nk/ε) log n, log_{1+ε} OPT}).</p><p>Continuous offline algorithm. As a final remark, we give a randomized version of the extended greedy algorithm based on continuous extensions of submodular functions. This algorithm outputs a random set S_ALG which is the union of O(ln(k/ε)) independent sets and which, with constant probability, has value close to the true optimum. The number of independent sets required for this result is smaller, up to a constant, than the number of sets obtained by the extended greedy algorithm, and optimal given the hardness results. The design of the continuous offline algorithm and its analysis are in Appendix B.</p></div>
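The two building blocks of this section, the extended greedy (Algorithm 1) and the truncated average g from equation (3), can be sketched compactly. The matroid here is a cardinality constraint for simplicity, and all function and variable names are ours:

```python
def extended_greedy(f, V, is_independent, ell):
    """Algorithm 1: run greedy ell times; pass tau maximizes marginals of f
    with respect to S_1 ∪ ... ∪ S_{tau-1} ∪ S_tau."""
    chosen, union = [], set()
    for _ in range(ell):
        S = set()
        while True:
            cands = [e for e in V if e not in S and is_independent(S | {e})]
            if not cands:          # S is a basis of the matroid
                break
            e_star = max(cands, key=lambda e: f(union | {e}) - f(union))
            S.add(e_star)
            union.add(e_star)
        chosen.append(S)
    return chosen, union

def truncated_avg(functions, lam):
    """g(S) = (1/k) * sum_i min{f_i(S), lam}; monotone submodular if the f_i are."""
    return lambda S: sum(min(f(S), lam) for f in functions) / len(functions)

V = range(4)
rank2 = lambda S: len(S) <= 2          # cardinality matroid of rank 2
chosen, union = extended_greedy(len, V, rank2, ell=2)
```

With f(S) = |S| the two greedy passes pick disjoint bases, so the union exhausts the ground set; for a general monotone submodular f, Theorem 3 guarantees f(union) ≥ (1 − 2^{−ℓ}) max_{S∈I} f(S).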
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Experimental results</head><p>In this section, we provide a simple computational experiment to exemplify our theoretical guarantees. Moreover, it illustrates that our algorithm performs much better on practical instances, in both the running time and the degree of violation of the constraints, than the worst-case guarantees given by Theorem 1.</p><p>We consider the movie recommendation problem, in which there is a ground set of n movies V and a set of users U. Each user u ∈ U rates a group of movies, assigning a value r_{e,u} ∈ {1, …, 5}, or zero if that user did not rate the movie. Our interest is to select a subset of the movies that is the most representative of all users' ratings. To capture this, we consider a facility-location function, i.e., f(A) := (1/(5|U|)) Σ_{u∈U} max_{e∈A} r_{e,u}. Observe that we scale by the maximum rating and the number of users. From this, consider a collection of monotone submodular functions that are perturbed versions of the facility-location objective, i.e., problem (1) corresponds to max_{A∈I} min_{i∈[k]} {f(A) + Σ_{e∈A∩B_i} ξ_e}, where f is the function defined above, B_i is a random set of fixed size, different for each i ∈ [k], and ξ ∈ [0, 1]^V is an error vector. For the experiments, we consider partition constraints. Formally, there is a partition {P_1, …, P_q} of the movies and I = {S : |S ∩ P_j| ≤ b, ∀j ∈ [q]} (same budget b for each part). We run the bi-criteria algorithm with the following parameters: number of rounds for the extended greedy ℓ = ⌈log(2k/ε)⌉, and approximation 1 − ε = 0.99. We used the MovieLens dataset of <ref type="bibr">Harper and Konstan (2016)</ref> with n = 1,000 movies and |U| = 1,000 users. We consider k = 20 objective functions, where the random sets are of size |B_i| = 100. We fixed the number of parts to be q = 10 (but not the composition) and the budget b = 5. We created 20 random instances in total, where each instance corresponds to a different composition {P_1, …, P_q}. An optimal solution to this problem has size q · b = 50, and Theorem 1 shows that the bi-criteria algorithm may select up to b · ⌈log(2k/ε)⌉ = 60 movies in each part (instead of 5), which leads to selecting 600 movies in total. However, in our experimental results we get a much smaller set that on average has 14.90 movies per part (with a standard deviation of 0.22). We also report results in terms of CPU time and number of function calls in Figure <ref type="figure">1</ref>. The average CPU time is 21.67 seconds with a standard deviation of 5.22. The average number of function evaluations is 42.79 × 10⁴ with a standard deviation of 7.07 × 10⁴.</p></div>
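The perturbed facility-location objectives of this section are easy to reproduce. A minimal sketch on synthetic ratings (the real experiment uses MovieLens data; all names here are our own):

```python
import random

random.seed(0)
n_movies, n_users, k = 50, 30, 5
# r[u][e] in {0, ..., 5}: rating of movie e by user u (0 = not rated).
r = [[random.randint(0, 5) for _ in range(n_movies)] for _ in range(n_users)]
xi = [random.random() for _ in range(n_movies)]                  # error vector in [0,1]^V
B = [set(random.sample(range(n_movies), 10)) for _ in range(k)]  # random perturbation sets

def f(A):
    """Facility location: f(A) = (1/(5|U|)) * sum_u max_{e in A} r[u][e]."""
    if not A:
        return 0.0
    return sum(max(r[u][e] for e in A) for u in range(n_users)) / (5 * n_users)

def f_i(i):
    """Perturbed objective f_i(A) = f(A) + sum_{e in A ∩ B_i} xi_e."""
    return lambda A: f(A) + sum(xi[e] for e in A & B[i])

objectives = [f_i(i) for i in range(k)]
robust = lambda A: min(g(A) for g in objectives)
```

The scaling keeps f in [0, 1], and since the perturbations are non-negative, each f_i dominates f pointwise; the robust objective min_i f_i is what the bi-criteria algorithm maximizes.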
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">The Online Case</head><p>In this section, we consider the online robust optimization problem (Equation <ref type="formula">2</ref>) under matroid constraints. We introduce an online bi-criteria algorithm that achieves a sublinear (1 − ε)-regret while using a solution S_t at time t that is a union of O(ln(1/ε)) independent sets from I. To start, let us first present definitions and known results that play a key role in this online optimization problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Background</head><p>For a set function f, its multilinear extension F : [0, 1]^V → R_+ is defined for any y ∈ [0, 1]^V as the expected value of f(S_y), where S_y is the random set generated by drawing each element e ∈ V independently with probability y_e. Formally,</p><p>F(y) := Σ_{S⊆V} f(S) Π_{e∈S} y_e Π_{e∉S} (1 − y_e).</p><p>(4)</p><p>Note that F is an extension of f, since for any subset S ⊆ V we have f(S) = F(1_S), where 1_S(e) = 1 if e ∈ S and zero otherwise. For all e ∈ V let</p><p>∂_e F(y) := E_{S∼y}[f(S + e) − f(S)] = (1 − y_e) ∇_e F(y).</p><p>(5)</p><p>We use ∂F(y) to denote the vector whose e-th coordinate is ∂_e F(y) as defined above. Furthermore, for a matroid M, we denote by P(M) = conv{1_I | I ∈ I} its matroid polytope. For any τ &gt; 0 we denote by τ · P(M) = conv{τ · 1_I | I ∈ I} the scaling of the matroid polytope.</p><p>The multilinear extension plays a crucial role in designing approximation algorithms for various constrained submodular optimization problems (see Appendix B.1 for a list of its useful properties). Notably, <ref type="bibr">Vondrák (2008)</ref> introduced the discretized continuous greedy algorithm that achieves a (1 − 1/e)-approximate solution for maximizing a single submodular function under matroid constraints (see <ref type="bibr">(Feldman et al., 2011)</ref> for the variant of the continuous greedy that we use). At a high level, this algorithm discretizes the interval [0, 1] into points {0, δ, 2δ, …, 1}. Starting at y_0 = 0, for each τ ∈ {δ, 2δ, …, 1} the algorithm uses an LP to compute the direction z_τ = argmax_{z∈P(M)} ∂F(y_{τ−δ}) · z. Then the algorithm takes a step in the direction of z_τ by setting y_{τ,e} ← y_{τ−δ,e} + δ · z_{τ,e}(1 − y_{τ−δ,e}) for all e ∈ V. Finally, it outputs a set S by rounding the fractional solution y_1. 
We will use this discretized version of the continuous greedy to construct our online algorithm in the following section.</p></div>
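The multilinear extension in equation (4) has exponentially many terms, but F(y) = E[f(S_y)] can be estimated by sampling. A Monte-Carlo sketch (our naming), sanity-checked on f(S) = |S|, for which F(y) = Σ_e y_e holds exactly:

```python
import random

def multilinear_estimate(f, y, samples=2000, seed=1):
    """Estimate F(y) = E[f(S_y)], where S_y contains each element e
    independently with probability y[e]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        S = {e for e, p in y.items() if rng.random() < p}
        total += f(S)
    return total / samples

# For f(S) = |S|, F(y) = sum_e y_e, so the estimate should be close to 1.5.
est = multilinear_estimate(len, {0: 0.5, 1: 0.5, 2: 0.5})
```

The same sampling idea yields estimates of the coordinates ∂_e F(y) in equation (5), by averaging the marginals f(S + e) − f(S) over sampled sets S.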
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Online Algorithm and Analysis</head><p>In Appendix B we provide a continuous randomized algorithm for the offline problem. Broadly speaking, at every step, this algorithm finds a feasible direction that improves all k functions and moves in that direction. We use an LP to find this direction, similar to the approach of <ref type="bibr">Vondrák (2008)</ref> for the case of k = 1. However, for the online robust optimization problem, we immediately face two challenges. First, it is not clear how to find a feasible direction z_t (as was found via an LP for the offline problem) that is good for all k submodular functions. To resolve this issue, we use a soft-min function that converts robust optimization over k functions into optimization of a single function. Second, robust optimization leads to non-convex and non-smooth optimization, combined with the online arrival of such submodular functions. To deal with this, we use the Follow-the-Perturbed-Leader (FPL) online algorithm introduced by <ref type="bibr">Kalai and Vempala (2005)</ref>.</p><p>For any collection of monotone submodular functions {f_i^t}_{i∈[k]} played by the adversary, we define the soft-min function with respect to the corresponding multilinear extensions</p><p>H_t(y) := −(1/α) ln(Σ_{i∈[k]} e^{−α F_i^t(y)}),</p><p>where α &gt; 0 is a suitable parameter. Recall that we assume the functions f_i^t take values in [0, 1]; then their multilinear extensions F_i^t also take values in [0, 1]. The following properties of the soft-min function as defined above are easy to verify and crucial for our result.</p><p>1. Approximation: min_{i∈[k]} F_i^t(y) − (ln k)/α ≤ H_t(y) ≤ min_{i∈[k]} F_i^t(y).</p><p>2. Gradient: ∇H_t(y) = Σ_{i∈[k]} p_i^t(y) ∇F_i^t(y), where p_i^t(y) ∝ e^{−α F_i^t(y)} for all i ∈ [k].</p><p>Note that as α increases, the soft-min function H_t becomes a better approximation of min_{i∈[k]} F_i^t; however, its smoothness degrades (see Property (17) in Appendix C.1). 
On the other hand, the second property shows that the gradient of the soft-min function is a convex combination of the gradients of the multilinear extensions, which allows us to optimize all the functions at the same time. Indeed, define ∂H_t(y) := Σ_{i∈[k]} p_i^t(y) ∂F_i^t(y). Since H_t is revealed only after we commit to our decision, we use the information from the gradients previously observed, in particular {∇H_1, …, ∇H_{t−1}}, to decide the set S_t. To deal with adversarial input functions, we use the FPL algorithm <ref type="bibr">(Kalai and Vempala, 2005)</ref> and the following guarantee about the algorithm.</p><p>Theorem 4 ((Kalai and Vempala, 2005)). Let s_1, …, s_T ∈ S be a sequence of rewards. The FPL Algorithm 4 (see Appendix C.3) with parameter η ≤ 1 outputs decisions d_1, …, d_T whose regret against the best fixed decision in hindsight is O(√T) for a suitable choice of η.</p><p>For completeness, we include the original setup and the algorithm in Appendix C.3.</p><p>Our online algorithm works as follows: first, given 0 &lt; ε &lt; 1, we denote ℓ := ⌈ln(1/ε)⌉. We consider the discretization indexed by τ ∈ {0, δ, 2δ, …, ℓ} and construct fractional solutions y_τ^t for each iteration t and discretization index τ. At each iteration t, ideally we would like to construct {y_τ^t}_{τ=0}^{ℓ} by running the continuous greedy algorithm using the soft-min function H_t and then play S_t using these fractional solutions. But in the online model, function H_t is revealed only after playing set S_t. To remedy this, we construct y_τ^t using the FPL algorithm based on the gradients {∇H_j}_{j=1}^{t−1} obtained from previous iterations. Thus we have multiple FPL instances, one for each discretization parameter, being run by the algorithm. 
Finally, at the end of iteration t, we have a fractional vector y_ℓ^t which belongs to ℓ · P(M) ∩ [0, 1]^V and therefore can be written, fractionally, as a union of ℓ independent sets using the matroid union theorem <ref type="bibr">(Schrijver, 2003)</ref>.</p><p>We round the fractional solution y_ℓ^t using the randomized swap rounding proposed by <ref type="bibr">Chekuri et al. (2010)</ref> for the ℓ-fold union matroid to obtain the set S_t to be played at time t. The following theorem from <ref type="bibr">(Chekuri et al., 2010)</ref> gives the necessary property of the randomized swap rounding that we use.</p><p>Theorem 5 (Theorem II.1, <ref type="bibr">(Chekuri et al., 2010)</ref>). Let f be a monotone submodular function and F be its multilinear extension. Let x ∈ P(M′) be a point in the polytope of matroid M′ and S′ a random independent set obtained from it by randomized swap rounding. Then, E[f(S′)] ≥ F(x).</p><p>Below, we formalize the details in Algorithm 2 (observe that ℓ/δ ∈ Z_+). Now, we provide a summary of the proof of Theorem 2; for a complete version we refer to Appendix C.2.</p></div>
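The soft-min H and its gradient weights p_i are straightforward to compute with a numerically stable log-sum-exp. The sketch below (names ours) treats the values F_i^t(y) at a fixed point y as plain numbers; both properties from Section 3.2 are visible: H lies within ln(k)/α of the true minimum, and the weights form a convex combination concentrated on the minimizing function.

```python
import math

def soft_min(values, alpha):
    """H = -(1/alpha) * ln(sum_i exp(-alpha * v_i)), computed with a stable shift."""
    m = min(values)
    return m - math.log(sum(math.exp(-alpha * (v - m)) for v in values)) / alpha

def gradient_weights(values, alpha):
    """p_i ∝ exp(-alpha * v_i): the convex-combination weights of the gradients."""
    m = min(values)
    w = [math.exp(-alpha * (v - m)) for v in values]
    z = sum(w)
    return [wi / z for wi in w]

vals, alpha = [0.9, 0.5, 0.7], 50.0
h = soft_min(vals, alpha)
p = gradient_weights(vals, alpha)
```

Increasing α tightens the approximation of the minimum while concentrating the weights p_i further, which is the smoothness trade-off mentioned above.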
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Algorithm 2 OnlineSoftMin algorithm</head><p>Input: learning parameter η &gt; 0, ε &gt; 0, α = n²T², discretization δ = 1/(n⁶T³), and ℓ = ⌈ln(1/ε)⌉. Output: sequence of sets S_1, …, S_T.</p><p>1: Sample q ∼ [0, 1/η]^V. 2: for t = 1 to T do 3: construct the fractional solutions {y_τ^t} via the FPL updates on the observed gradients, round y_ℓ^t, and play the resulting set S_t.</p><p>The recurrence obtained for H_t(y_τ^t) is similar to the one shown in <ref type="bibr">(Vondrák, 2008)</ref> for the discretized continuous greedy. Then, by iterating over the discretization steps τ, we obtain a (1 − ε) fraction of the benchmark up to error terms controlled by the regret of the FPL instances and the parameters α and δ. Finally, by doing swap rounding on each y_ℓ^t, and applying Theorem 5, we get the desired result.</p></div></body>
		</text>
</TEI>
