<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>The Power of Adaptivity for Stochastic Submodular Cover</title></titleStmt>
			<publicationStmt>
				<publisher>INFORMS</publisher>
				<date>05/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10521347</idno>
					<idno type="doi">10.1287/opre.2022.2388</idno>
					<title level='j'>Operations Research</title>
<idno>0030-364X</idno>
<biblScope unit="volume">72</biblScope>
<biblScope unit="issue">3</biblScope>					

					<author>Rohan Ghuge</author><author>Anupam Gupta</author><author>Viswanath Nagarajan</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<p>Adaptivity in Stochastic Submodular Cover</p> <p>Solutions to stochastic optimization problems are typically sequential decision processes that make decisions one by one, waiting for (and using) the feedback from each decision. Whereas such “adaptive” solutions achieve the best objective, they can be very time-consuming because of the need to wait for feedback after each decision. A natural question is are there solutions that only adapt (i.e., wait for feedback) a few times whereas still being competitive with the fully adaptive optimal solution? In “The Power of Adaptivity for Stochastic Submodular Cover,” Ghuge, Gupta, and Nagarajan resolve this question in the context of stochastic submodular cover, which is a fundamental stochastic covering problem. They provide algorithms that achieve a smooth trade-off between the number of adaptive “rounds” and the solution quality. The authors also demonstrate via experiments on real-world and synthetic data sets that, even for problems with more than 1,000 decisions, about six rounds of adaptivity suffice to obtain solutions nearly as good as fully adaptive solutions.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Submodularity is a fundamental notion that arises in applications such as image segmentation, data summarization, hypothesis identification, information gathering, and social networks (see, for example, <ref type="bibr">Simon et al. 2007</ref><ref type="bibr">, Lin and Bilmes 2011</ref><ref type="bibr">, Barinova et al. 2012</ref><ref type="bibr">, Sipos et al. 2012</ref><ref type="bibr">, Chen et al. 2014</ref><ref type="bibr">, Kempe et al. 2015</ref><ref type="bibr">, Radanovic et al. 2018)</ref>. Given a nonnegative, monotone, and integervalued submodular function f : 2 U &#8594; Z &#8805;0 , the submodular cover optimization problem requires us to pick a minimum-cost subset S of items to "cover" function f. In other words, we want that f(S) Q, where Q f (U) is the total coverage value.</p><p>Submodular cover arises in many applications in computer science, machine learning, and operations research (see <ref type="bibr">Wolsey 1982</ref><ref type="bibr">, Golovin and Krause 2011</ref><ref type="bibr">, Mirzasoleiman et al. 2015</ref><ref type="bibr">, Bateni et al. 2018)</ref>. For instance, consider a sensor deployment setting, in which we need to place a collection of sensors to monitor some phenomenon, such as air quality or traffic behavior <ref type="bibr">(Gonz&#225;lez-Banos 2001</ref><ref type="bibr">, Sun et al. 2019)</ref>. Each sensor covers a limited area depending on its sensing range and also obstructions and the local geography. The goal is to deploy the fewest sensors to entirely cover some target region. The covered area is a submodular function of the sensors deployed, so this is a special case of submodular cover. Alternatively, one may want to place sensors so as to achieve a target level of "information gain." In many settings (see, e.g., <ref type="bibr">Krause and Guestrin 2005)</ref>, the information gain can be quantified as a submodular function, so this is again a special case of submodular cover. See Section 4.2 for more details.</p><p>A different application arises in medical diagnosis, in which we know s possible conditions from which a patient may suffer along with the priors on their occurrence (see, e.g., <ref type="bibr">Garey and</ref><ref type="bibr">Graham 1974, Kosaraju et al. 1999)</ref>. Our goal is to perform tests to identify the correct condition as quickly as possible. This can be cast as submodular cover by viewing each test as eliminating all inconsistent conditions (or hypotheses): the number of eliminated hypotheses is a coverage function that is submodular. This is an instance of submodular cover with a coverage value of s -1 because once the s -1 inconsistent hypotheses are eliminated, the remaining one must be correct. See Section 4.3 for more details.</p><p>Observe that both these applications involve uncertain data: the precise area covered by a sensor may not be known before the sensor is deployed, and the precise outcome (positive/negative) of a test is not known until the test is performed. This uncertainty can be modeled using the framework of stochastic submodular optimization, turning the desired solution into a sequential decision process. In the simplest case of such a problem, we are given an underlying set of stochastic items, each of which is either active or inactive (with known probabilities). Only the active items contribute to the submodular function coverage. At each step of the sequential decision process, an item is probed, and its realization (i.e., active or inactive) is observed. The goal is to probe items in way that minimizes the expected total cost incurred until the submodular function is covered.</p><p>The process is typically adaptive so that information from previously probed items is used to identify the next item to probe. Such adaptive solutions are inherently fully sequential, which is undesirable if probing an item is time-consuming. For example, probing/ performing a test in medical diagnosis may involve a long wait for test results. Or, in sensor deployment, if probing a sensor corresponds to physically deploying it, the action may take hours or days. Therefore, we prefer solutions with only few rounds of adaptivity, in which several items are probed simultaneously in each round, and the solution can only adapt at the end of each round.</p><p>Motivated by this, we ask, can solutions with a few adaptive rounds approximate fully adaptive solutions for the stochastic submodular cover problem? We consider cases in which realizations of different items are both independent and allowed to be correlated. For both these situations we give nearly tight answers with smooth trade-offs between the number r of adaptive rounds and the solution quality relative to fully adaptive solutions.</p><p>We make the following main contributions:</p><p>&#8226; When items have independent realizations, for any integer r &#8805; 1, we provide an algorithm with r rounds of adaptivity that costs at most O(Q 1=r log Q) times the optimal fully adaptive solution (that may adapt an unbounded number of times). Here, Q is the maximal value of the submodular function to be covered; note that the function is integer-valued. Our performance guarantee nearly matches a previously known lower bound of &#937; 1 r 3 Q 1=r . &#8226; When items have correlated realizations (with a joint distribution of support size s), for any integer r &#8805; 1, we provide an algorithm with r rounds of adaptivity that costs at most O rs 1=r log (sQ) times the optimal fully adaptive solution.</p><p>&#8226; We also prove that the dependence on the support size s is necessary for correlated distributions. We show an &#937; s 1=r r log s multiplicative gap between r-round adaptive solutions and fully adaptive solutions (even when Q 1).</p><p>&#8226; Finally, we demonstrate the practical efficacy of our algorithms by testing them on both real-world and synthetic data sets. Even on instances with more than 1,000 items, we observe that about six rounds of adaptivity suffice to obtain solutions that are nearly as good as fully adaptive solutions.</p><p>Our algorithms are based on a greedy-style approach and, hence, are very easy to implement. Moreover, unlike prior work, they provide approximation guarantees that do not depend on item costs or probability values (that may even be exponentially worse than Q and s).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Problem Definition</head><p>In the stochastic submodular cover problem, the input is a collection of m random variables (or items) X {X 1 , : : : , X m }. Each item X i has a cost c i &#8712; R + and realizes to a random element of ground set U. Our results extend easily to the more general setting in which each item realizes to a subset of U rather than a single element; see Online Appendix E. Let the joint distribution of X be denoted by D. The random variables X i may or may not be independent; we discuss this issue in more detail at the end of this section. The realization of item X i is denoted by X i &#8712; U; this realization is only known when X i is probed at a cost of c i . Extending this notation, given a subset of items S &#8838; X , its realization is denoted S {X i : X i &#8712; S} &#8838; U.</p><p>In addition, we are given an integer-valued monotone submodular function f : 2 U &#8594; Z + with f(U) Q. A realization S of items S &#8838; X is feasible if and only if f(S) Q the maximal value; in this case, we also say that S covers f. The goal is to probe (possibly adaptively) a subset S &#8838; X of items that gets realized to a feasible set. We use the shorthand c(S) :</p><p>i:X i &#8712;S c i to denote the total cost of items in S &#8838; X and use c max : max i c i . The objective is to minimize the expected cost of probed items, for which the expectation is taken over the randomness in X . We assume that f(X) Q for every realization X of the full set of items X ; this ensures that the function can be covered with probability one. We consider the following types of solutions.</p><p>Definition 1. For an integer r &#8805; 1, an r-round adaptive solution in the set-based model proceeds as follows.</p><p>For each round k 1, : : : , r, it specifies a subset S k of items that is probed in parallel. The cost incurred in round k is c(S k ), the total cost of all probed items in that round. The choice of subset S k in round k can depend on realizations seen in all previous rounds 1, : : : , k -1.</p><p>In the set-based model, we allow solutions to be infeasible (i.e., to fail to cover f ) with some small probability &#951; &gt; 0. As shown in Online Appendix D, such a relaxed notion is necessary. In designing algorithms, <ref type="bibr">Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS</ref> we find it more convenient to work with a slightly different "permutation" model, which we define next.</p><p>Definition 2. For an integer r &#8805; 1, an r-round adaptive solution in the permutation model proceeds in r rounds of adaptivity. In each round k &#8712; {1, : : : , r}, the solution specifies an ordering of all remaining items and probes them in this order until some stopping rule. The decisions in round k, that is, the ordering and the stopping rule, can depend on the realizations seen in all previous rounds 1, : : : , k -1.</p><p>The permutation model is also used in prior literature <ref type="bibr">(Goemans and</ref><ref type="bibr">Vondr&#225;k 2006, Agarwal et al. 2019)</ref>. In the permutation model, solutions must be feasible with probability one. In Online Appendix D, we show that our algorithms in the permutation model can be converted into algorithms in the set-based model with similar approximation ratios. Henceforth, an r-round adaptive algorithm refers to one in the permutation model unless specified otherwise.</p><p>Setting r 1 in Definition 2 gives us a nonadaptive solution as studied in <ref type="bibr">Goemans and Vondr&#225;k (2006)</ref> and <ref type="bibr">Agarwal et al. (2019)</ref>. Setting r m gives us a fully adaptive solution that may adapt after every probe. Having more rounds leads to a smaller objective value, so fully adaptive solutions have the least objective value. Our performance guarantees are relative to an optimal fully adaptive solution; let OPT denote this solution and its cost. The r-round adaptivity gap is defined as follows: (1)</p><p>Setting r 1 gives the adaptivity gap.</p><p>1.1.1. Independent and Correlated Distributions. We first study the case in which the random variables X are independent. In keeping with existing literature <ref type="bibr">(Deshpande et al. 2016</ref><ref type="bibr">, Im et al. 2016</ref><ref type="bibr">, Agarwal et al. 2019)</ref>, we refer to this case simply as stochastic submodular cover. We then consider the case when the random variables X are correlated with a joint distribution D of support size s: the realizations in the support of D are also called scenarios. We refer to the correlated setting as scenario submodular cover as in <ref type="bibr">Grammel et al. (2016)</ref>.</p><p>1.1.2. Comparison with "Adaptive Submodularity." We note that some previous papers, such as <ref type="bibr">Golovin and Krause (2017)</ref> and <ref type="bibr">Esfandiari et al. (2021b)</ref>, model correlations in stochastic submodular cover in a different way. This involves assuming that the function f and the item distribution D satisfy the following condition (called adaptive submodularity): the conditional expected increase in f resulting from any item never increases when we condition on more realizations. Adaptive submodularity is more general than (independent) stochastic submodular cover, but it does not generalize scenario submodular cover. See Section 1.3 for more details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Results and Techniques</head><p>Our first result is when the items have independent distributions.</p><p>Theorem 1 (Independent Items: Permutation Model).</p><p>For any integer r &#8805; 1, there is an r-round adaptive algorithm for the stochastic submodular cover problem with expected cost O(Q 1=r &#8226; log Q) times the cost of an optimal adaptive solution.</p><p>This improves over the O(r Q 1=r log Q log (mc max )) bound of <ref type="bibr">Agarwal et al. (2019)</ref> by eliminating the dependence on the number of items m and the item costs (which could be much larger than Q). Moreover, our result nearly matches the lower bound of &#937; 1 r 3 Q 1=r given by <ref type="bibr">Agarwal et al. (2019)</ref>. Setting r log Q shows that O(log Q) adaptive rounds give an O(log Q)-approximation. By transforming this algorithm into a set-based solution (using Theorem 10 in Online Appendix D), we get the following.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Corollary 1 (Independent Items: Set-Based Model).</head><p>There is an O(log Q)-round algorithm for stochastic submodular cover in the set-based model that (i) has expected cost O(log Q) times the optimal adaptive cost and (ii) covers the function with probability at least 1 -1 Q . This approximation ratio of O(log Q) is the best possible (unless P NP) even with an arbitrary number (r m) of adaptive rounds. Corollary 1 improves upon an O(log 2 Q log (mc max ))-approximation in a logarithmic number of rounds <ref type="bibr">(Agarwal et al. 2019</ref>) and an O(log (mQc max ))-approximation in O(log m log (Qmc max )) set-based rounds <ref type="bibr">(Esfandiari et al. 2021a)</ref>.</p><p>Moreover, Theorem 1 (with r 1) implies an O(Qlog Q) gap between nonadaptive and fully adaptive solutions for stochastic set cover, a special case of submodular cover.</p><p>This resolves an open question of <ref type="bibr">Goemans and Vondr&#225;k (2006)</ref> up to an O(log Q) factor, in which Q is the number of objects in the set cover instance.</p><p>Our second set of results is for the case when items have correlated distributions. Recall that s denotes the support size of the joint distribution D, that is, the number of scenarios.</p><p>Theorem 2 (Correlated Items: Permutation Model). For any integer r &#8805; 1, there is an r-round adaptive algorithm for scenario submodular cover with cost O s 1=r (log s + r log Q) times the optimal adaptive cost.</p><p>We also obtain a 2r-round adaptive algorithm with a better cost guarantee of O s 1=r log (sQ) times the optimal adaptive cost (see Corollary 3). Setting r log s and using the conversion to a set-based solution (Theorem 10), we infer the following.</p><p>Corollary 2 (Correlated Items: Set-Based Model). There is an O(log s)-round algorithm for scenario submodular cover in the set-based model that (i) has expected cost O(log (sQ)) times the optimal adaptive cost and (ii) covers the function with probability at least 1 -1 s . This approximation guarantee is nearly the best possible even with an arbitrary number of adaptive rounds: there is an &#937;(log Q)-factor hardness of approximation even for the deterministic case (in which s 1). We note that scenario submodular cover generalizes the classic optimal decision tree problem, which is studied extensively; see, for instance, <ref type="bibr">Garey and Graham (1974)</ref>, <ref type="bibr">Loveland (1985)</ref>, <ref type="bibr">Kosaraju et al. (1999)</ref>, <ref type="bibr">Dasgupta (2004), and</ref><ref type="bibr">Gupta et al. (2017a)</ref>. Corollary 2 improves upon the fully adaptive O(log (sQ))-approximation for scenario submodular cover <ref type="bibr">(Grammel et al. 2016)</ref> by achieving the same approximation guarantee in just O(log s) rounds. It is also an improvement over the O log mQ c max p min -approximation in O log mlog Qm c max p min set-based rounds, which follows from Grammel et al. (2016) and Esfandiari et al. (2021a); here, p min &#8804; 1 s is the minimum probability of any scenario.</p><p>We note that, when the number of rounds is less than logarithmic, our result provides the first approximation guarantee even in the well-studied special case of optimal decision trees.</p><p>The results in Theorems 1 and 2 are incomparable: whereas the independent case has more structure in the distribution D, its support size is exponential. Finally, the dependence on the support size s is necessary in the correlated setting as our next result shows.</p><p>Theorem 3 (Correlated Items Lower Bound). For any integer r &#8805; 1, there is an instance of scenario submodular cover with Q 1 for which the cost of any r-round adaptive solution is &#937; s 1=r r log s times the optimal adaptive cost.</p><p>This lower bound is information-theoretic and does not depend on computational assumptions, whereas the upper bound of Theorem 2 is given by a polynomial algorithm.</p><p>Finally, our algorithms are also easy to implement. We test our algorithms on both synthetic and real data sets; these tests validate the practical performance of our algorithms. Specifically, we test our algorithm for the independent case (Theorem 1) on instances of stochastic set cover and our algorithm for the correlated case (Theorem 2) on instances of optimal decision tree.</p><p>For stochastic set cover, we use real-world data sets to generate instances with &#8776; 1,200 items. We observe a sharp improvement in performance within a few rounds of adaptivity and that six to seven rounds of adaptivity are nearly as good as fully adaptive solutions. For optimal decision tree, we use both real-world and synthetic data. The real-world data has &#8776; 400 scenarios, and the synthetic data has 10,000 scenarios. Again, we find that about six rounds of adaptivity suffice to obtain solutions as good as fully adaptive ones. We also compared our algorithms' cost to informationtheoretic lower bounds for both applications: our costs are typically within 50% of these lower bounds.</p><p>1.2.1. Techniques. The algorithms for the independent and correlated cases are similar in spirit but have some crucial technical differences. In each round of both algorithms, we iteratively compute a "score" for each item and greedily select the item of maximum score. This results in a nonadaptive list of all remaining items, and the items are probed in this order until a stopping rule is satisfied. The stopping rule in the independent case corresponds to reducing the remaining target (on the function value) by a factor of Q 1=r , whereas the stopping rule in the correlated case involves reducing the number of "compatible scenarios" by an s 1=r factor.</p><p>The analysis for Theorems 1 and 2 follows parallel lines at the beginning. For each integer i &#8805; 0, we relate the following two quantities: (i) the probability that the algorithm does not complete within cost &#945; &#8226; 2 i and (ii) the probability that the optimal adaptive solution does not complete within cost 2 i . The "stretch factor" &#945; corresponds to the approximation ratio and is chosen differently for the independent and correlated cases. In order to relate these noncompletion probabilities, we consider the total score G of items selected by the algorithm between costs &#945;2 i-1 and &#945;2 i . The crux of the analysis lies in giving lower and upper bounds on the total score G; this is when the arguments for the independent and correlated settings begin to differ.</p><p>In the independent case, the score of any item X e is an estimate of its relative marginal gain, for which we take an expectation over all previous items as well as X e . We also normalize this gain by the item's cost. See Equation (2) for the definition. In lower bounding the total score G, we use a variant of a sampling lemma from <ref type="bibr">Agarwal et al. (2019)</ref> as well as the constant-factor adaptivity gap for submodular maximization from <ref type="bibr">Bradac et al. (2019)</ref>. We also need to partition the outcome space (of all previous realizations) into "good" and "bad" outcomes: conditional on a good outcome, OPT has a high probability of completing before cost 2 i . Good outcomes are necessary in our proof of the sampling lemma, but luckily, the total probability of bad <ref type="bibr">Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS</ref> outcomes is small (and they can be effectively ignored). In upper bounding G, we consider the total score as a sum over decision/sample paths and use the fact that the sum of relative gains corresponds to a harmonic series.</p><p>In the correlated case, the score of any item X e is the sum of two terms: (i) its expected relative marginal gain as in the independent case and (ii) an estimate of the probability on "eliminated" scenarios. Both terms are needed because the algorithm needs to balance (i) covering the function and (ii) identifying the realized scenario (after which it is trivial to cover f ). Again, we normalize by the item's cost; see Equation ( <ref type="formula">17</ref>). In lower bounding the total score G, we partition the outcome space into good/okay/bad outcomes that correspond to a high conditional probability of OPT (i) covering function f by cost 2 i , (ii) eliminating a constant fraction of scenarios by cost 2 i , or (iii) neither of the two cases. Further, by restricting to outcomes that have a large number of compatible scenarios (else, the algorithm's stopping rule would apply), we can bound the number of relevant outcomes by s 1=r . Then, we consider OPT (up to cost 2 i ) conditional on all good/okay outcomes and show that one of these items has a high score. To upper bound G, we again consider the total score as a sum over decision paths.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3.">Related Work</head><p>A (1 + ln Q)-approximation algorithm for the basic submodular cover problem is obtained in <ref type="bibr">Wolsey (1982)</ref>. <ref type="bibr">Dinur and Steurer (2014)</ref> show that this is also the best possible (unless P NP) because submodular cover generalizes the set cover problem. In the past several years, there have been many papers on stochastic variants of submodular cover as this framework captures many different applications; see <ref type="bibr">Golovin and</ref><ref type="bibr">Krause (2011), Im et al. (2016)</ref>, <ref type="bibr">Deshpande et al. (2016)</ref>, <ref type="bibr">Grammel et al. (2016), and</ref><ref type="bibr">Navidi et al. (2020)</ref>.</p><p>The stochastic set cover problem was first studied by <ref type="bibr">Goemans and Vondr&#225;k (2006)</ref>, who show that the adaptivity gap lies between &#937;(d) and O(d 2 ), where d is the number of objects to be covered. Recently, <ref type="bibr">Agarwal et al. (2019)</ref> improved the upper bound to O(d log d log (mc max )). As a corollary of Theorem 1, we obtain a tighter O(d log d) adaptivity gap. Importantly, we eliminate the dependence of the items m and maximum cost, which may even be exponential in d.</p><p>An O(log Q) adaptive approximation algorithm for stochastic submodular cover (independent items) follows from the work of <ref type="bibr">Im et al. (2016)</ref>. <ref type="bibr">Liu et al. (2008)</ref>, <ref type="bibr">Golovin and</ref><ref type="bibr">Krause (2011, 2017)</ref>, and <ref type="bibr">Deshpande et al. (2016)</ref> obtain related results for special cases or with weaker bounds. As mentioned earlier, <ref type="bibr">Agarwal et al. (2019)</ref> obtain r-round adaptivity gaps for independent items.</p><p>Theorem 1 improves their bound by an O(r &#8226; log (mc max )) factor. We bypass computationally expensive steps in prior work, such as solving several instances of stochastic submodular maximization: so our algorithm is more efficient. Our analysis (outlined earlier) is also very different. Whereas we use a sampling lemma similar to <ref type="bibr">Agarwal et al. (2019)</ref>, this result is applied in different manner, and it only affects the analysis (see Lemma 4).</p><p>The scenario submodular cover problem was introduced by <ref type="bibr">Grammel et al. (2016)</ref> as a common generalization of several problems, including optimal decision tree <ref type="bibr">(Garey and</ref><ref type="bibr">Graham 1974, Gupta et al. 2017a)</ref>, equivalence class determination <ref type="bibr">(Cicalese et al. 2014)</ref>, and decision region determination <ref type="bibr">(Javdani et al. 2014)</ref>. <ref type="bibr">Grammel et al. (2016)</ref> obtain an O(log (sQ)) fully adaptive algorithm for it. The same approximation ratio (in a more general setting) is also obtained by <ref type="bibr">Navidi et al. (2020)</ref>. In the correlated setting, we are not aware of any prior work on limited rounds of adaptivity (when the number of rounds r &lt; log s). Some aspects of our analysis (e.g., good/okay/bad outcomes) are similar to that of <ref type="bibr">Navidi et al. (2020)</ref>, but additional work is needed as we have to bound the r-round adaptivity gap.</p><p>As noted earlier, the framework of adaptive submodularity from Golovin and Krause (2017) models correlations in stochastic submodular cover in a different way. Whereas stochastic submodular cover with independent items satisfies adaptive submodularity, scenario submodular cover does not. <ref type="bibr">Esfandiari et al. (2021a)</ref> give an O(log (mQ))-approximation in O(log m log (Qm)) rounds of adaptivity for adaptive submodular cover (AdSubCov ), assuming unit costs. So their result implies the same bounds for (independent) stochastic submodular cover. Although scenario submodular cover is not a special case of AdSubCov , <ref type="bibr">Grammel et al. (2016)</ref> reformulate scenario submodular cover as AdSubCov but with a different goal function that is indeed adaptive submodular. However, this new goal function has a larger Q-value of Q p min because of which the algorithm of Esfandiari et al. (2021a) only implies an O log mQ c max p min -approximation in O log m log Qm c max p min rounds (see Table <ref type="table">2</ref>). To the best of our knowledge, there are no algorithms for AdSubCov using fewer than squared-logarithmic rounds of adaptivity. Tables <ref type="table">1</ref> and <ref type="table">2</ref> provide a comparison of results for stochastic and scenario submodular cover. These results are in the set-based model with failure probability o(1).</p><p>The role of adaptivity is extensively studied for stochastic submodular maximization. <ref type="bibr">Asadpour and Nazerzadeh (2016)</ref> give a constant adaptivity gap under matroid constraints (on the probed items). <ref type="bibr">Gupta et al. (2017b)</ref> obtain a constant adaptivity gap for a very large class of prefixclosed constraints; the constant factor was subsequently Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover improved to two, which is also the best possible as shown by <ref type="bibr">Bradac et al. (2019)</ref>. We make use of this result in our analysis. More generally, the role of adaptivity is extensively studied for various stochastic maximization problems. <ref type="bibr">Dean et al. (2008)</ref> and <ref type="bibr">Bhalgat et al. (2011)</ref> study the stochastic knapsack problem; <ref type="bibr">Bansal et al. (2012)</ref> and <ref type="bibr">Behnezhad et al. (2020)</ref> examine the stochastic matching problem; <ref type="bibr">Gupta and Nagarajan (2013)</ref> obtain results for stochastic probing; and <ref type="bibr">Guha and Munagala (2009)</ref>, <ref type="bibr">Gupta et al. (2015)</ref>, and <ref type="bibr">Bansal and Nagarajan (2015)</ref> study stochastic orienteering. <ref type="bibr">Karbasi et al. (2021)</ref>, <ref type="bibr">Esfandiari et al. (2021b)</ref>, and <ref type="bibr">Gao et al. (2019)</ref> study the role of adaptivity and batch processing in online learning. Their work was motivated by noting that many real-world data observations are processed in batches. Recently, there have been several results examining the role of adaptivity in deterministic submodular optimization (see, for example, <ref type="bibr">Balkanski and Singer 2018</ref><ref type="bibr">, Balkanski et al. 2019</ref><ref type="bibr">, Chekuri and Quanrud 2019)</ref>. The motivation here was to parallelize function queries, which are often expensive. In many settings, there are algorithms using a (poly)logarithmic number of rounds that nearly match the best sequential (or fully adaptive) approximation algorithms. Whereas our motivation for parallelizing the expensive probing steps in stochastic submodular cover is similar to that in these two lines of work (online learning and deterministic submodular optimization), the techniques we use are very different.</p><p>A different (and extensively studied) model for stochastic online optimization is the setting of prophet inequalities, in which the algorithm observes several random items sequentially and needs to make immediate selection decisions. The classic setting, introduced by <ref type="bibr">Krengel and Sucheston (1977)</ref>, involves selecting a single item so as to maximize its expected value. See, for example, <ref type="bibr">Correa et al. (2018)</ref> for a recent survey. <ref type="bibr">Kleinberg and Weinberg (2019)</ref> extend the classic prophet inequality setting to one in which multiple items may be selected subject to a matroid constraint (the objective is a linear function of the selected items). See also the extension to multiple matroid constraints by <ref type="bibr">Feldman et al. (2021)</ref>. <ref type="bibr">Rubinstein and Singla (2017)</ref> generalize this setting even further to the case of submodular objectives. A key difference from our setting is that the benchmark in prophet inequalities is omniscient and allowed to be aware of all the random realizations before making its decisions, so the focus here is on competitive analysis, and constant competitive algorithms are known for all these settings. In contrast, for stochastic submodular cover, we cannot compare with such a strong omniscient benchmark: one cannot obtain competitive ratios better than O(Q). Given this, we compare with the best policy restricted in the same manner as the algorithm, that is, which is not aware of item realizations up front.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Stochastic Submodular Cover</head><p>We now consider the (independent) stochastic submodular cover problem and prove Theorem 1. For simplicity, we assume that costs c i are integers. Our results also hold for arbitrary costs (by replacing certain summations in the analysis by integrals). We work with the permutation model of rounds throughout this section. We use the notation S ~S to denote realization S drawn from the distribution induced by S; for example, E S~S f (S) is the expectation of f(S) over the randomness of S.</p><p>We find it convenient to solve a partial cover version of the stochastic submodular cover problem. In this partial cover version, we are given a parameter &#948; &#8712; [0, 1], and the goal is to probe some items R &#8838; X that realize to a set R achieving value f (R) &gt; Q(1 -&#948;). We are interested in a nonadaptive algorithm for this problem. Clearly, if </p><p>Table 2. Comparison of Results for Scenario Submodular Cover Scenario submodular cover Approximation guarantee Set-based rounds Grammel et al. (2016) O(log (sQ)) Fully-adaptive Esfandiari et al. (2021b), Grammel et al. (2016) O(log (mQ cmax pmin )) O(log m log (Qm cmax pmin )) Our result O(log sQ) O(log s) Note. s &#8804; 1=p min . Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research <ref type="bibr">, 2024</ref><ref type="bibr">, , vol. 72, no. 3, pp. 1156</ref><ref type="bibr">, -1176</ref><ref type="bibr">, , &#169; 2022 INFORMS INFORMS</ref> &#948; 1=Q, the integrality of the function f means that f(R) Q, and we solve the original (full-coverage) problem. Moreover, we can use this algorithm with different thresholds to also get the r-round algorithm. The main result of this section is the following.</p><p>Theorem 4. There is a nonadaptive algorithm for the partial cover version of stochastic submodular cover with cost O ln 1=&#948; &#948; times the optimal adaptive cost for the (full) submodular cover.</p><p>The algorithm first creates an ordering/list L of the items nonadaptively, that is, without knowing the realizations of the items. To do so, at each step, we pick a new item that maximizes a carefully defined score function (Equation ( <ref type="formula">2</ref>)). The score of an item cannot depend on the realizations of previous items on the list (because we are nonadaptive). The summation in the score (2) corresponds to the increase in the function value for the new item X e (scaled by the residual target) for a random realization of the previous items S; we also refer to this term as the incremental value of the item. The actual score is the ratio of the incremental value and the cost of item X e . Once this ordering L is specified, the algorithm starts probing and realizing the items in this order and does so until the realized value exceeds</p><p>Building the list nonadaptively 3: select an item X e &#8712; X \ S that maximizes: score(X e ) :</p><p>. Probing items on the list 7:</p><p>X e &#8592; first r.v. in list L not in R, and let X e &#8712; U be its realization. 8: R &#8592; R &#8746; {X e }, R &#8592; R &#8746; {X e } 9: return probed items R and their realizations R.</p><p>Given this partial covering algorithm, we immediately get an algorithm for the r-round version of the problem, in which we are allowed to make r rounds of adaptive decisions. Indeed, we can first set &#948; Q -1=r and solve the partial covering problem with this value of &#948;. Suppose we probe variables R, and their realizations are given by the set R &#8838; U. Then, we can condition on these values to get the marginal value function f R , where f R (S) f (R &#8746; S)f (R). We note that f R is submodular, and so we can inductively get an (r -1)-round solution for this problem. The following algorithm formalizes this.</p><p>Algorithm 2 (r-Round Adaptive Algorithm For Stochastic Submodular Cover SSC(r, X , f ))</p><p>1: run PARCA (X , f , Q, Q -1=r ) for round #1. 2: let R (respectively, R) denote the probed items (respectively, their realizations) in PARCA. 3: define residual submodular function f : f R . 4: recursively solve SSC(r -1, X \ R, f ).</p><p>Theorem 5. Algorithm 2 is an r-round adaptive algorithm for stochastic submodular cover with cost O(Q 1=r log Q) times the optimal adaptive cost.</p><p>Proof. We proceed by induction on the number of rounds r. Let OPT denote the cost of an optimal adaptive solution. The base case is r 1, in which case &#948; Q -1=r</p><p>We now consider r &gt; 1 and assume (inductively)</p><p>that Algorithm 2 finds an r -1-round O Q 1 r-1 log Qapproximation algorithm for any instance of stochastic submodular cover. Let &#948; Q -1=r . By Theorem 4, the expected cost in round 1 (step 1 in Algorithm 2) is</p><p>denote the maximal value of the residual submodular function</p><p>=r by the definition of the partial covering algorithm. The optimal solution OPT conditioned on the variables in R realizing to R gives a feasible adaptive solution to the residual problem of covering f ; we denote this conditional solution by OPT. We inductively get that the cost of our r -1-round solution on f is at most</p><p>where we used Q &lt; Q (r-1)=r . As this holds for every realization R, we can take expectations over R to get that the (unconditional) expected cost of the last r -1</p><p>Remark 1. Assuming that the scores (2) can be computed in polynomial time, it is clear that our entire algorithm can be implemented in polynomial time. In particular, if T denotes the time taken to calculate the score of one item, then the overall algorithm runs in time poly(m, T), Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1162</head><p>Operations <ref type="bibr">Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS</ref> where m is the number of items. We are not aware of a closed-form expression for the scores (for arbitrary submodular functions f ). However, as discussed in Section A.2, we can use sampling to estimate these scores to within a constant factor. Moreover, our analysis works even if we only choose an approximate maximizer for (2) at each step. It turns out that T poly(m, c max ) many samples suffice to estimate these scores. So the final runtime is poly(m, c max ); note that this does not depend on the number |U | of elements. We note that even the previous algorithms <ref type="bibr">(Agarwal et al. 2019</ref><ref type="bibr">, Esfandiari et al. 2021a</ref>) have a polynomial dependence on c max in their runtime. In the following analysis, we assume that the scores (2) are computed exactly; see Online Appendix A.2 for the sampling details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Analysis for a Call to PARCA</head><p>We now prove Theorem 4. We denote by OPT an optimal adaptive solution for the (full) covering problem on f. We denote by NA the nonadaptive strategy given by algorithm PARCA. Note that NA probes variables according to the order given by the list L and stops when the realized coverage exceeds &#964; : Q(1 -&#948;) (see Algorithm 1 for details). Now, we relate the expected costs of NA and OPT. We refer to the cumulative cost incurred (by either OPT or NA) until any point in the solution as time elapsing. We say that OPT is in phase i when it is in</p><p>where the constant factor is fixed later. We say that NA is in phase i when it is in time interval [&#945; &#8226; 2 i-1 , &#945; &#8226; 2 i ) for i &#8805; 1; phase 0 refers to the time interval [1, &#945;). Define</p><p>&#8226; u i : probability that NA goes beyond phase i, that is, has cost at least &#945; &#8226; 2 i .</p><p>&#8226; u * i : probability that OPT goes beyond phase i -1, that is, costs at least 2 i .</p><p>As all costs are integers, u * 0 1. For ease of notation, we also use OPT and NA to denote the random cost incurred by OPT and NA, respectively. The following lemma, which bounds the noncompletion probability u i in terms of u * i , forms the crux of the analysis. Lemma 1. For any phase i &#8805; 1, we have u i &#8804; u i-1 4 + 6 5 u * i : Lemma 1 implies Theorem 4 using standard techniques (see Online Appendix A.1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Proof of the Key Lemma (Lemma 1)</head><p>Recall the setting of Lemma 1 and fix the phase i &#8805; 1. Consider the list L generated by PARCA(X , f , Q, &#948;). Let NA denote both the nonadaptive strategy of Algorithm 1 as well as its cost. Note that NA probes items in the order given by L until a coverage value greater than &#964; : Q(1 -&#948;) is achieved. We assume (without loss of generality) that &#948; is a power of two, that is, &#948; 2 -z for some integer z &#8805; 0. Indeed, if this is not the case, we can use a power-of-two value &#948; , where &#948; 2 &#8804; &#948; &#8804; &#948;: this only increases the approximation ratio O 1 &#948; ln 1 &#948; in Theorem 4 by a constant factor.</p><p>For each time t &#8805; 0, let X e(t) denote the item that is selected at time t. In other words, this is the item that causes the cumulative cost of L to exceed t for the first time. We define the total gain as the following quantity: G :</p><p>which corresponds to the sum of scores over the time interval</p><p>The proof of Lemma 1 is completed by giving upper and lower bounds on G, which we do next. The lower bound views G as a sum over time steps, whereas the upper bound views G as a sum over decision paths.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2.2.1.</head><p>A Lower Bound for G. We show that G &#937;(&#945;&#948;)</p><p>To this end, we prove the contribution to</p><p>To prove the lower bound on score(X e(t) ), we show that there exists a set T of items with total cost O(2 i =&#948;) and incremental value &#937; u i -6 5 u * i ; see Lemma 5.</p><p>Henceforth, fix some time t &#8712; [&#945;2 i-1 , &#945;2 i ) in phase i of our algorithm. Let S be the set of chosen items (added to list L) until time t and let S denote its realization. Note that S is used crucially in the definition of score (2), and the total cost of the items S is at most t. We also need the following key definitions:</p><p>1. For any power-of-two &#952; &#8712; 1 2 j : 0 &#8804; j &#8804; log Q , we say that a realization S of S belongs to scale &#952; if and</p><p>Note that &#948; corresponds to some scale because it is a power of two. We use E &#952; to denote all outcomes of S that belong to scale &#952;. We also use S ~E&#952; to denote the conditional distribution of S corresponding to E &#952; .</p><p>2. For any scale &#952;, let r * i&#952; : P(OPT covers f with cost at most 2 i and S &#8712; E &#952; ).</p><p>3. Scale &#952; is called good if</p><p>, where P(E &#952; ) :</p><p>Here is an outline of the rest of the analysis. Lemma 2 characterizes the stopping rule in NA in terms of the scale &#952; of realization S. Lemma 3 lower bounds the total r * i&#952; value over good scales in terms of the noncompletion probabilities u i and u * i . Then, Lemma 4 shows that, for each good scale &#952;, there is a subset T &#952; of items <ref type="bibr">Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS</ref> in which (i) the total cost is O(2 i =&#952;) and (ii) the total incremental value is large conditional on realization S &#8712; E &#952; . Finally, Lemma 5 shows that taking all items in T &#952; (for all good scales &#952; &#8805; &#948;) provides a set T with total cost O(2 i =&#948;) and large incremental value (unconditionally). This can then be used to lower bound G as noted.</p><p>Lemma 2. The realization S is in a scale &#952; &#8804; &#948; if and only if f (S) &#8805; &#964;. Hence, NA terminates by time t if and only if the realization of S is in a scale &#952; &#8804; &#948;.</p><p>Proof. Note that, if the realization of S is in some scale &#952; &#8805; 2&#948;, then</p><p>and NA would not terminate before time t. On the other hand, if the realization of S is in a scale &#952; &#8804; &#948; (note that all scales are powers of two), then f (S) &#8805; Q -&#952;Q &#8805; &#964;, and NA would terminate before time t. w</p><p>Lemma 3. We have &#952;&gt;&#948;, good r * i&#952; &#8805; u i -</p><p>Proof. First, we upper bound &#952; not good r * i&#952; . Consider any scale &#952; that is not good. Then,</p><p>where the last inequality uses the fact that &#952; P(OPT &#8805; 2 i and S &#8712; E &#952; ) u * i . We now upper bound &#952;&#8804;&#948; r * i&#952; . By Lemma 2, if the realization S is in scale &#952; &#8804; &#948;, then NA ends before time t &#8804; &#945;2 i ; that is, it does not go beyond phase i. Hence,</p><p>We now use the fact that 1u * i &#952; r * i&#952; , where we sum over all scales. So &#952;&gt;&#948;, good</p><p>where we use (4) and (5). w</p><p>We now define a function g : 2 U &#8594; R &#8805;0 as follows:</p><p>g(T) :</p><p>We use f S (T) f (S &#8746; T)f (S). The second equality is by Lemma 2. The function g is monotone and submodular because f S is monotone and submodular for each S &#8838; U, and g is a nonnegative linear combination of such functions. Moreover, for any item X e , its incremental value (the summation in (2)) is E X e [ g(X e )], and we have score(X e ) 1 c e &#8226; E X e [ g(X e )]:</p><p>Constrained Stochastic Submodular Maximization.</p><p>Our analysis makes use of some known results for stochastic submodular maximization. Here, we are given as input a nonnegative monotone submodular function h : 2 U &#8594; R &#8805;0 and independent stochastic items X {X 1 , : : : , X m } such that each X i realizes to some element of the ground set U. There is a cost c i associated with each item X i and a budget B on the total cost. The goal is to select S &#8838; X (possibly adaptively) such that the total cost of S is at most B and it maximizes the expected value E S~S [h(S)]. Note that an adaptive solution selects items S sequentially (based on prior realizations), so the subset S is itself random. On the other hand, a nonadaptive solution involves selecting a deterministic subset S up front. The adaptivity gap for stochastic submodular maximization is at most two; see theorem 1 in <ref type="bibr">Bradac et al. (2019)</ref>. In other words, for any instance, the optimal adaptive value is at most two times the optimal nonadaptive value.</p><p>Lemma 4. For any good scale &#952;, there exists a subset T &#952; &#8838; X \ S with c(T &#952; ) &#8804; 144 &#952; &#8226; 2 i such that</p><p>Proof. We construct set T &#952; as follows. Initially, T &#952; &#8592; &#8709;. For each k 1, 2, : : : , 144 &#952; , 1. Sample realization S of S from E &#952; . 2. Let T k &#8838; X \ S be an optimal nonadaptive solution to the stochastic submodular maximization instance with items X \ S, submodular function f S , and cost budget B 2 i . Note that the distribution of items X \ S is independent of the realization S of S.</p><p>3. Set T &#952; &#8592; T &#952; &#8746; T k . By construction, c(T &#952; ) &#8804; 144 &#952; &#8226; 2 i . So we focus on the expected function value. Consider any S &#8712; E &#952; as the realization of S. Let T S denote the nonadaptive solution obtained in step 2 for realization S. Define w * i,S as the probability that OPT covers f with cost at most 2 i given that S realizes to S; that is,</p><p>,S : P(OPT covers f with cost at most 2 i |S S): Note that S&#8712;E &#952; w * i,S &#8226; P(S S) r * i&#952; . Let AD S denote OPT conditional on S S, restricted to items X \ S and Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover 1164 Operations Research, 2024, vol. 72, no. 3, pp. 1156-1176, &#169; 2022 INFORMS Downloaded from informs.org by [141.211.4.224] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p><p>until time 2 i . Note that AD S is a feasible adaptive solution to the stochastic submodular maximization instance in step 2: every decision path has total cost at most 2 i . The expected value of AD S (under function f S ) is at least (Qf (S)) &#8226; w * i,S by definition of w * i,S . As S &#8712; E &#952; , we have Qf (S) &#8805; &#952;Q 2 , which implies AD S has value at least &#952;Q 2 &#8226; w * i,S . Now, using the factor two adaptivity gap for stochastic submodular maximization (discussed earlier), it follows that the nonadaptive solution T S has expected value</p><p>Taking expectations over S (conditional on S &#8712; E &#952; ),</p><p>The first equality uses the fact that the item realizations in T S are independent of the realization S of S.</p><p>The left-hand side of the relation can be rewritten to give</p><p>For a contradiction to (7), suppose that</p><p>Subtracting Equation (9) from Equation (8) gives the following:</p><p>where the second inequality uses monotonicity of f and the last one its submodularity.</p><p>Let T (k) T 1 &#8746; T 2 : : : &#8746; T k denote the items selected until iteration k. Then,</p><p>where we define f (T (0) ) : 0. Note that, in each iteration k, the sample S is drawn independently and identically from E &#952; , and items T k T S are added.</p><p>The first inequality uses T (k-1) &#8838; T &#952; and submodularity, and the second inequality uses (10). Adding over all iterations k,</p><p>where the last inequality uses the fact that &#952; is a good scale. This is a contradiction because the maximum function value is Q. This completes the proof of (7). w Lemma 5. For any S &#8838; X , there exists a subset T &#8838; X \ S of total cost at most 144 &#948; &#8226; 2 i with</p><p>Proof. Let B denote the set of good scales &#952; with &#952; &gt; &#948;.</p><p>Note that &#952;&#8712;B 1 &#952; &#8804; 1 &#948; as the scales are powers of two. From Lemma 4, let T &#952; denote the items satisfying (7) for each scale &#952; &#8712; B. Define T &#8746; &#952;&#8712;B T &#952; . As claimed, the total cost is</p><p>Next, we bound the total incremental value as</p><p>where the inequality is by submodularity of g. By definition of function g (see ( <ref type="formula">6</ref>)),</p><p>In order to bound the last term, consider any &#952; &#8712; B. By (7) and T &#8839; T &#952; , it follows that</p><p>Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research, 2024, vol. 72, no. 3, pp. 1156-1176, &#169; 2022 INFORMS 1165 Downloaded from informs.org by [141.211.4.224] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p><p>Using the fact that &#952;Q 2 &lt; Qf (S) &#8804; &#952;Q for all S &#8712; E &#952; , we get</p><p>Combined with ( <ref type="formula">11</ref>) and ( <ref type="formula">12</ref>), we get X e &#8712;T E[ g(X e )] &#8805; 1 6 &#8226; &#952;&#8712;B r * i&#952; , which completes the proof of the lemma. w Using Lemma 5 and averaging, max</p><p>where &#946; O(1).</p><p>Combining (13) and Lemma 3, and using the greedy choice in step 3 (Algorithm 1), score(X e(t) ) &#8805; &#948; &#946;2 i u i -</p><p>We note that this inequality continues to hold (with a larger constant &#946;) even if we choose an item that only maximizes the score (2) within a constant factor. Using this inequality for each time t during phase i, we have</p><p>We use this lower bound for G in conjunction with an upper bound, which we prove next.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2.">An</head><p>Upper Bound for G. Here, we show that G O(ln (1=&#948;)) &#8226; u i-1 . We now consider the implementation of the nonadaptive list L and calculate G as a sum of contributions over the observed decision path. Let &#928; denote the (random) decision path followed by the nonadaptive strategy NA: this consists of a prefix of L along with their realizations. Denote by X 1 , X 2 , : : : , the sequence of realizations (each in U) observed on &#928;. So item X j is selected between time j-1 &#8467; 1 c &#8467; and j &#8467; 1 c &#8467; . Let h(p) index the first (last) item in &#928; (if any) that is selected (even partially) during phase i, that is, between time &#945;2 i-1 and &#945;2 i . For each index h &#8804; j &#8804; p, let t j denote the duration of time that item X j is selected during in phase i, so t j is the width of interval [</p><p>It follows that t j &#8804; c j . Define G(&#928;) : 0 if index h is undefined (i.e., &#928; terminates before phase i), and otherwise, G(&#928;) :</p><p>By the stopping criterion for L, the f value before the end of &#928; remains at most &#964; Q(1 -&#948;). So the denominator, that is, Qf ({X 1 , : : : , X j-1 }), is at least &#948;Q for all j.</p><p>Lemma 6. For any decision path &#928;, we have G(&#928;) &#8804; 1 + ln (1=&#948;).</p><p>Proof. For each h &#8804; j &#8804; p, let V j : f ({X 1 , : : : , X j }); also let V 0 0. For j &#8804; p -1, as noted, V j &#8804; &#964;; as f is integervalued, V j &#8712; {0, 1, : : : , &#964; }. We have</p><p>The second inequality uses Q -</p><p>The right-hand side of ( <ref type="formula">15</ref>) is then</p><p>where we used V p &#8804; Q. This completes the proof. w</p><p>Taking expectations over the various decision paths means G E &#928; [G(&#928;)], so Lemma 6 gives G &#8804; (1 + ln (1=&#948;)) &#8226; P(&#928; doesn t terminate before phase i)</p><p>2.2.3. Wrapping Up. To complete the proof of Lemma 1, we set &#945; :</p><p>&#948; and combine ( <ref type="formula">14</ref>) and ( <ref type="formula">16</ref>) to get</p><p>This completes the proof of Lemma 1 and, hence, Theorem 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Scenario Submodular Cover</head><p>In this section, we describe an r-round adaptive algorithm for the scenario submodular cover problem.</p><p>Operations <ref type="bibr">Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS Downloaded from informs.org by [141.211.4.224</ref>] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p><p>Again, we work with the permutation model of rounds (unless stated otherwise). As before, we have a collection of m stochastic items X {X 1 , : : : , X m } with costs {c i } m i 1 . In contrast to the independent case, the stochastic items here are correlated, and their joint distribution D is given as input. The goal is to minimize the expected cost of a solution S &#8838; X that realizes to a feasible set (i.e., f(S) Q). The joint distribution D specifies the (joint) probability that X realizes to any outcome X &#8712; U m . We refer to the realizations X &#8712; U m that have a nonzero probability of occurrence as scenarios. Let s denote the number of scenarios in D, that is, the support size of distribution D. The set of scenarios is denoted M {1, : : : , s}, and p &#969; denotes the probability of each scenario &#969; &#8712; M. Note that s &#969; 1 p &#969; 1. For each scenario &#969; &#8712; M and item X e , we denote by X e (&#969;) &#8712; U the realization of X e in scenario &#969;. The distribution D can be viewed as selecting a random realized scenario &#969; * &#8712; M according to the probabilities {p &#969; }, after which the item realizations are deterministically set to X 1 (&#969; * ), : : : , X m (&#969; * ) . However, an algorithm does not know the realized scenario &#969; * : it only knows the realizations of the probed items (using which it can infer a posterior distribution for &#969; * ). As stated in Section 1.1, our performance guarantee in this case depends on the support size s. We also show that such a dependence is necessary (even when Q is small).</p><p>For any subset S &#8838; X of items, we denote by S(&#969;) the realizations for items in S under scenario &#969;. We say that scenario &#969; is compatible with some realization {u e : X e &#8712; S} if and only if, X e (&#969;) u e for all items X e &#8712; S.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The Algorithm</head><p>Similar to the algorithm for the independent case, it is convenient to solve a partial cover version of the scenario submodular cover problem. However, the notion of partial progress is different: we use the number of compatible scenarios instead of function value. Formally, in the partial version, we are given a parameter &#948; &#8712; [0, 1], and the goal is to probe some items R that realize to a set R such that either (i) the number of compatible scenarios is less than &#948;s or (ii) the function f is fully covered. Clearly, if &#948; 1=s, then case (i) cannot happen (it corresponds to zero compatible scenarios), so the function f must be fully covered. We use this algorithm with different parameters &#948; to solve the rround version of the problem. We show the following. Theorem 6. There is a nonadaptive algorithm for the partial version of scenario submodular cover with cost O 1 &#948; ln 1 &#948; + log Q times the optimal adaptive cost of the (full) submodular cover.</p><p>The algorithm first creates an ordering/list L of the items nonadaptively, that is, without knowing the realizations of the items. To do so, at each step we pick a new item that maximizes a carefully defined score function (Equation ( <ref type="formula">17</ref>)). The score of an item depends on an estimate of progress toward (i) eliminating scenarios and (ii) covering function f. Before we state this score formally, we need some definitions. Definition 3. For any S &#8838; X , let H(S) denote the partition {Y 1 , : : : , Y &#8467; } of the scenarios M in which all scenarios in a part have the same realization for items in S. Let Z : {Y &#8712; H(S) : |Y | &#8805; &#948;s} be the set of "large" parts having size at least &#948;s.</p><p>In other words, scenarios &#969; and &#963; lie in the same part of H(S) if and only if S(&#969;) S(&#963;). Note that partition H(S) does not depend on the realization of S. Moreover, after probing and realizing items S, the set of compatible scenarios must be one of the parts in H(S). Also, the number of large parts |Z | &#8804; s &#948;s 1 &#948; as the number of scenarios |M| s. See Figure <ref type="figure">1</ref>(a) for an example. Notes. (a) S {X e1 , X e2 } and we partition the scenarios M based on outcomes of S to get H(S) {Y 1 , Y 2 , Y 3 }. (b) We further partition scenarios Y 2 based on realizations of X e3 .The part of Y 2 compatible with outcome X e3 2 is the largest cardinality part, that is, B e3 (Y 2 ). The shaded region is L e3 (Y 2 ).</p><p>Ghuge, <ref type="bibr">Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS</ref> Definition 4. For any X e &#8712; X and subset Z &#8838; M of scenarios, consider the partition of Z based on the realization of X e . Let B e (Z) &#8838; Z be the largest cardinality part and define L e (Z) : Z\B e (Z).</p><p>Note that L e (Z) comprises several parts of the preceding partition of Z. If the realized scenario &#969; * &#8712; L e (Z) and X e is selected, then at least half the scenarios in Z are eliminated (as being incompatible with X e ). Figure <ref type="figure">1</ref>(b) illustrates these definitions.</p><p>For any Z &#8712; H(S), note that the realizations S(&#969;) are identical for all &#969; &#8712; Z: we use S(Z) &#8838; U to denote the realization of S under each scenario in Z.</p><p>If S denotes the previously added items in list L, the score (17) of any item X e involves a term for each part Z &#8712; Z, which itself comes from two sources:</p><p>&#8226; Information gain &#969;&#8712;L e (Z) p &#969; , the total probability of scenarios in L e (Z).</p><p>&#8226; Relative function gain</p><p>, the expected relative gain obtained by including element X e , where the expectation is over the scenarios in part Z.</p><p>The overall score of item X e is the sum of these terms (over all parts in Z) normalized by the cost c e . In defining the score, we only focus on the large parts Z. If the realization of S corresponds to any other part, then the number of compatible scenarios is less than &#948;s (and the partial-cover algorithm would terminate).</p><p>Once the list L is specified, the algorithm starts probing and realizing the items in this order and does so until either (i) the number of compatible scenarios drops below &#948;s or (ii) the realized function value equals Q. Note that, in case (ii), the function is fully covered. See Algorithm 3 for a formal description of the nonadaptive partial-cover algorithm. Z&#8712;Z &#969;&#8712;L e (Z) X e &#8592; first r.v. in list L not in R 9:</p><p>X e v &#8712; U be the realization of X e . 10:</p><p>11: H &#8592; {&#969; &#8712; H : X e (&#969;) v} 12: return probed items R, realizations R and compatible scenarios H.</p><p>Given this partial covering algorithm, we immediately get an algorithm for the r-round version of the problem, in which we are allowed to make r rounds of adaptive decisions. Indeed, we can first set &#948; s -1=r and solve the partial covering problem. Suppose we probe the items R (with realizations R &#8838; U) and are left with compatible scenarios H &#8838; M. Then we can condition on scenarios H and the marginal value function f R (which is submodular) and inductively get an r -1-round solution for this problem. The following algorithm and result formalizes this.</p><p>Algorithm 4 (r-Round Adaptive Algorithm For Scenario Submodular Cover NSC(r, X , M, f ))</p><p>1: run SPARCA(X , M, f , Q, |M| -1=r ) for round one. Let R denote the probed items, R their realizations, and H &#8838; M the compatible scenarios returned by SPARCA.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2: define residual submodular function</head><p>Theorem 7. Algorithm 4 is an r-round adaptive algorithm for scenario submodular cover with cost O s 1=r (log s + r log Q)) times the optimal adaptive cost, where s is the number of scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Analysis for the Partial Covering Algorithm</head><p>We now prove Theorem 6. Consider any call to SPARCA. Let s |M | denote the number of scenarios in the instance. Recall that the goal is to probe items R &#8838; X with some realization R and compatible scenarios H &#8838; M such that (i) | H | &lt; &#948;s or (ii) f(R) Q. We denote by OPT an optimal adaptive solution for the covering problem on f. Let NA denote the nonadaptive strategy given by algorithm SPARCA. Note that NA probes items in the order given by the list L and stops when either condition (i) or (ii) occurs. We consider the expected cost of this strategy and relate it to the cost of OPT. The high-level approach is similar to that for the independent case, but the details are quite different.</p><p>We refer to the cumulative cost incurred until any point in a solution as time. We say that OPT is in phase i in the time interval [2 i , 2 i+1 ) for i &#8805; 0. We say that NA is in phase i in the time interval</p><p>We use phase 0 to refer to the interval [1, &#946;). We set &#946; : 16 &#948; log (Q=&#948;); this choice becomes clear in the proof. For any phase i &#8805; 0, we use</p><p>&#8226; u i : probability that NA goes beyond phase i, that is, costs at least &#946; &#8226; 2 i .</p><p>&#8226; u * i : probability that OPT goes beyond phase i -1, that is, costs at least 2 i .</p><p>Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1168</head><p>Operations <ref type="bibr">Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS Downloaded from informs.org by [141.211.4.224</ref>] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p><p>Because all costs are integers, u * 0 1. For ease of notation, we also use OPT and NA to denote the random cost incurred by OPT and NA, respectively. The main result here is that E</p><p>As in the independent case, it suffices to show the following key lemma.</p><p>Lemma 7. For any phase i &#8805; 1, we have u i &#8804; u i-1 4 + 2u * i : Using this lemma, we can immediately prove Theorem 6: this part of the analysis is identical to the one for Theorem 4 (using Lemma 1). We provide the complete proof of Lemma 7 in Online Appendix B.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Tight Approximation Using More Rounds</head><p>We now show that a better (and tight) approximation is achievable if we use 2r (instead of r) adaptive rounds. The main idea is to use the following variant of the partial covering problem, called scenario submodular partial cover (SSPC ). The input is the same as scenario submodular cover: items X , scenarios M with |M| s, and submodular function f with maximal value Q. Given parameters &#948;, &#949; &#8712; [0, 1], the goal in SSPC is to probe some items R that realize to set R &#8838; U such that either (i) the number of compatible scenarios is less than &#948;s or (ii) the function value f (R) &gt; Q(1 -&#949;). Unlike the partial version studied previously, we do not require f to be fully covered in case (ii). Note that setting &#949; 1 Q , we recover the previous partial covering problem, so SSPC is more general.</p><p>Corollary 3. There is a nonadaptive algorithm for SSPC with cost O 1 &#948; ln 1 &#948; + ln 1 &#949; times the optimal adaptive cost for the (full) submodular cover.</p><p>The algorithm and proof are nearly identical to SPARCA. See Online Appendix B for the details.</p><p>The following result shows how SSPC can be used iteratively to obtain a 2r-round algorithm.</p><p>Theorem 8. There is a 2r-round adaptive algorithm for scenario submodular cover with cost O s 1=r log (sQ) times the optimal adaptive cost, where s is the number of scenarios.</p><p>Setting r log s, we achieve a tight O(log (sQ)) approximation in O(log s) rounds. Combined with the conversion to set-based rounds (Theorem 10), this proves Corollary 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Applications</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Stochastic Set Cover</head><p>The stochastic set cover problem is a special case of stochastic submodular cover. The input is a universe E of d objects and a collection {X 1 , : : : , X m } of m items. Each item X i has a cost c i &#8712; R + and corresponds to a random subset of objects (with a known explicit distribution). Different items are independent of each other.</p><p>The goal is to select a set of items such that the realized subsets cover E and the expected cost is minimized. This problem was first studied by <ref type="bibr">Goemans and Vondr&#225;k (2006)</ref>, in which it is shown that the adaptivity gap lies between &#937;(d) and O(d 2 ). The correct adaptivity gap for stochastic set cover was posed as an open question by <ref type="bibr">Goemans and Vondr&#225;k (2006)</ref>. Subsequently, <ref type="bibr">Agarwal et al. (2019)</ref> made significant progress by showing that the adaptivity gap is O(d log d &#8226; log (mc max )). However, as a function of the natural parameter d (number of objects), the best adaptivity gap remained O(d 2 ) because the number of stochastic sets m and maximum cost c max may be arbitrarily larger than d.</p><p>As a corollary of Theorem 1, we obtain an O(d log d) adaptivity gap that nearly matches the &#937;(d) lower bound. In fact, for any r &#8805; 1, we obtain an r-round adaptive algorithm that costs at most O(d 1=r log d) times the fully adaptive cost. This nearly matches the &#937; 1 r 3 d 1=r r-round adaptivity gap shown in <ref type="bibr">Agarwal et al. (2019)</ref>. We note that, when r log d, we obtain an O(log d)-approximation algorithm with a logarithmic number of adaptive rounds; this approximation ratio is the best possible even for deterministic set cover.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Sensor Placement with Unreliable Sensors</head><p>In the sensor placement problem, we are concerned with deploying a collection of sensors for visual surveillance or to acquire information on air quality, temperature, humidity, etc. One approach to model this problem (suitable for visual surveillance) is to assume that each sensor has a particular sensing region and to minimize the number of sensors required to cover some target region. This corresponds to the art gallery problem (see Gonz&#225;lez-Banos 2001), which, in turn, is a special case of set cover. In the setting with unreliable sensors that we consider, each sensor may fail independently with some probability (in which case it does not cover its region), and the goal is to minimize the expected number of sensors so as to cover the target region. This can be modeled as an instance of stochastic set cover discussed in Section 4.1.</p><p>An alternative approach (suitable for information acquisition) is to model sensor readings as Gaussian processes <ref type="bibr">(Deshpande et al. 2004)</ref>, in which the goal is to place the minimum number of sensors so as to achieve a target level of information gain. Formally, let [m] denote the set of locations and Z i be a random variable representing the information at location</p><p>denote the information at all locations. A sensor at location i makes observation Y i Z i + i , where i is an independent (random) noise term. For any subset A &#8838; [m], let Y A {Y i : i &#8712; A} denote the random observations at the locations in A. <ref type="bibr">Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022</ref> </p><p>to make predictions on the information at any location i. The information gain of the system if we place sensors at locations A &#8838; [m] is defined as <ref type="bibr">Krause and Guestrin (2005)</ref> show that the function g(A) is monotone and submodular in A assuming that the observations Y A are conditionally independent given Z; see, corollary 4 in their paper. The conditional independence assumption is satisfied in our setting because the noise terms i are independent. Given a target Q on the information gain, the goal in the deterministic problem is to find sensor locations A &#8838; [m] so that g(A) &#8805; Q.</p><p>In the stochastic setting (with unreliable sensors), each sensor i &#8712; [m] is active independently with some probability p i (and fails otherwise). The goal is to place sensors (possibly adapting based on the active/failure outcomes) so that the information gain from the active sensors is at least Q. This can be modeled as an instance of stochastic submodular cover as follows.</p><p>The items correspond to the m locations. For each sensor at location i &#8712; [m], we define two elements T i and F i corresponding to active and failure outcomes, respectively. The ground set</p><p>For each i &#8712; [m], random variable X i T i with probability p i and X i F i otherwise. Define function f :</p><p>Observe that f is monotone and submodular because g is monotone and submodular <ref type="bibr">(over [m]</ref>). Furthermore, let M be a large enough integer so that scaling f by a factor of M makes it integer-valued. Theorem 1 can be applied to the scaled function f with target Q M &#8226; Q. Then, we obtain an r-round adaptive algorithm of cost at most O Q 1=r log Q times an optimal fully adaptive algorithm for the unreliable sensor placement problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Optimal Decision Tree</head><p>The optimal decision tree problem captures problems from a variety of fields, such as learning theory, medical diagnosis, pattern recognition, and Boolean logic; see the surveys <ref type="bibr">Moret (1982)</ref> and <ref type="bibr">Murthy (1997)</ref>. In an instance of optimal decision tree (ODT) we are given s hypotheses with probabilities {p i } s i 1 . An unknown random hypothesis y * &#8712; [s] is drawn from this distribution. There is a collection of m tests, and test e costs c e &#8712; R + and returns a positive result if y * lies in some subset T e &#8838; [s] and a negative result if y * &#8713; T e . The goal is to identify y * using tests of minimum expected cost.</p><p>We can transform any ODT instance into an instance of scenario submodular cover. We associate scenarios with hypotheses, and the distribution D is given by probabilities {p i } s i 1 . For each test e, we have an item X e of cost c e . The ground set U {e + , e -: e test}, which corresponds to all possible (individual) test results. Item X e realizes to e + (e -) if test e is positive (,negative). If y * is the realized scenario, then the item realizations correspond to the test outcomes for hypothesis y * . For each test e, define subsets T(e -) T e and T(e + ) [s]\T e . The submodular function is f (S) min |&#8746; v&#8712;S T(v)| , { s -1} for S &#8838; U. Note that f(S) is the number of incompatible hypotheses given the tests results S. Moreover, Q s -1. Applying Theorem 2, for any r &#8805; 1, we obtain an r-round adaptive algorithm that costs at most O(rs 1=r log s) times the fully adaptive optimum. Our lower bound for scenario submodual cover (Theorem 3) also implies an &#937; 1 r log s s 1=r bound on the r-roundadaptivity gap for ODT. So, for any constant r &#8805; 1, our r-round-adaptivity gap is tight up to an O(log 2 s) factor. Moreover, using Corollary 2, we obtain an O(log s) round algorithm of cost O(log s) times the adaptive optimum. As shown in <ref type="bibr">Chakaravarthy et al. (2011)</ref>, O(log s) is the best possible approximation ratio even for fully adaptive algorithms. 4.3.1. Application to Medical Diagnosis. Recall our motivating application in medical diagnosis in which we know s possible conditions from which a patient may suffer (along with the priors of their occurrence), and our goal is to adaptively perform tests to identify the correct condition as quickly (and cheaply) as possible. This problem, known as automated diagnosis, can be directly cast as the optimal decision tree problem. See, for example, <ref type="bibr">Azar and El-Metwally (2013)</ref> for applications of decision tree methods to diagnostic problems. See also <ref type="bibr">Bellala et al. (2012)</ref> for an application of ODT in emergency response. Here, a first responder observes the symptoms of a victim of chemical exposure in order to identify the toxic chemical.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.2.">Application to Active</head><p>Learning. In a typical (binary) classification problem, there is an X set of data points, each of which is associated with a + orlabel. There is also a set H of hypotheses, in which each hypothesis provides a +=labeling of the data points. It is assumed that the true classifier/hypothesis h * belongs to H. In the average-case setting that we consider, there is also a Bayesian prior p H (&#8226;) for the true hypothesis, that is, Pr [h * h] p H (h) for all h &#8712; H. The learner needs to identify the true hypothesis h * . In active learning, the learner can query the label of any point e &#8712; X by incurring some cost c e (which corresponds to some expert labeling e). The goal is to identify h * by <ref type="bibr">Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover 1170</ref><ref type="bibr">Operations Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS Downloaded from informs.org by [141.211.4.224</ref>] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p><p>querying points (possibly adaptively) at the minimum expected cost. See, for example, <ref type="bibr">Dasgupta (2004</ref><ref type="bibr">), Guillory and Bilmes (2009</ref><ref type="bibr">), and Cicalese et al. (2014)</ref> for more details on this approach. This is exactly an instance of an optimal decision tree, in which the data points correspond to tests. Applying Theorem 2, for any r &#8805; 1, we obtain an r-round adaptive algorithm that costs at most O(r|H | 1=r log (|H |)) times the fully adaptive optimum.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Shared Filter Evaluation</head><p>This problem is introduced by <ref type="bibr">Munagala et al. (2007)</ref> in the context of executing multiple queries on a data set with shared (Boolean) filters. There are n independent "filters" X 1 , : : : , X n , and each filter i has cost c i and evaluates to true with probability p i (and false otherwise). There are also m conjunctive queries, in which each query j &#8712; [m] is the "and" of some subset Q i &#8838; [n] of the filters. So query j is true iff X i true for all i &#8712; Q j . The goal is to evaluate all queries at the minimum expected cost. This can be modeled as an instance of stochastic submodular cover. The items correspond to filters. The ground set U {T i , F i } n i 1 , where T i (F i ) corresponds to filter i evaluating to true (false). The submodular function is</p><p>Note that the term for query Q j is |Q j | iff the query's value is determined: it is false when a single false filter is seen and it's true when all its filters are true. The maximal value of the function is Q m j 1 |Q j | &#8804; mn. Using Theorem 1, for any r &#8805; 1, we obtain an r-round adaptive algorithm with cost at most O((mn) 1=r &#8226; log (mn)) times the optimal adaptive cost.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Correlated Knapsack Cover</head><p>There are n items, each of cost c i and random (integer) reward X i . The rewards may be correlated and are given by a joint distribution D with s scenarios. The exact reward of an item is only known when it is probed. Given a target Q, the goal is to probe some items so that the total realized reward is at least Q, and we minimize the expected cost. This is a special case of scenario submodular cover. The ground set U {(i, v) : i &#8712; [n], 0 &#8804; v &#8804; Q}, where element (i, v) represents item X i realizing to v. Any realization of value at least Q is treated as equal to Q: this is because the target is itself Q. For each element e (i, v) &#8712; U, let a e v denote its value. Then, the submodular function is f (S) min e&#8712;S a e , Q { }for S &#8838; U. By Theorem 1, we obtain an r-round adaptive algorithm with cost at most O(s 1=r (log s + r log Q)) times the optimal adaptive cost.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6.">Stochastic Score Classification</head><p>The stochastic score classification (SSClass) problem introduced by <ref type="bibr">Gkenosis et al. (2018)</ref> models applications such as assessing disease risk levels of patients and giving quality ratings to products. Formally, there are n binary random variables X 1 , : : : , X n , which are used to compute a score r(X) n i 1 a i X i , where a i &#8712; Z + for all i. The realization X i &#8712; {0, 1} of variable X i can be determined by performing a query of cost c i &#8712; R + . Additionally, there are B + 1 integers &#945; 1 &lt; &#8943; &lt; &#945; B+1 that partition the possible scores into classes. Realization X of X is in class j &#8712; [B] iff &#945; j &#8804; r(X) &#8804; &#945; j+1 -1. We can view the &#945; j values as "cutoffs" for the classes. The goal is to determine the correct class by querying variables at minimum expected cost. Note that it is not necessary to query all variables to determine the class. <ref type="bibr">Gkenosis et al. (2018)</ref> show that any instance of SSClass can be converted into an instance of stochastic submodular cover as follows. The ground set U {(i, 0), (i, 1) : i &#8712; [n]}, corresponding to the possible realizations of the variables. Consider any index j &#8712; {2, : : : B}. Recall the cutoff &#945; j and let &#946; j : n i 1 a i&#945; j + 1. Define two submodular functions: R 1 j (S) : min &#945; j , (i, 1)&#8712;S a i and R 0 j (S) : min &#946; j , (i, 0)&#8712;S a i , &#8704;S &#8838; U:</p><p>Observe that R 1 j (S) &#945; j (i.e., R 1 j is covered) iff the realizations in S imply that the score is at least &#945; j . Similarly, R 0 j (S) &#946; j iff the realizations in S imply that the score is at most &#945; j+1 -1. We combine these two functions using the "OR" construction of submodular functions to get</p><p>Note that g j is covered (i.e., has value &#945; j &#946; j ) if and only if either R 1 j or R 0 j is covered. Moreover, g j is monotone and submodular. Finally, we combine all these B functions using the "AND" construction to obtain f (S) : B j 2 g j (S). Note that f is monotone and submodular, and it is covered if and only if each of the g j 's is covered, which implies that the class is correctly identified. Moreover, the maximal value of f is Q</p><p>Using Theorems 1 and 2, we obtain r-round adaptive algorithms for independent and correlated distributions with approximation ratios O(W 3=r log W) and O(rs 1=r log (sW)), respectively (relative to the fully adaptive optimum). Recall that s is the number of scenarios for correlated distributions. Very recently, Ghuge et al. (2022) obtained a nonadaptive (i.e., one round) <ref type="bibr">Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS</ref> algorithm that achieves an O(1) approximation for stochastic score classification with independent distributions; however, their result is not applicable for correlated distributions.</p><p>We can also extend our result to the d-dimensional score classification problem, in which one needs to determine the classes of d different score functions r 1 , &#8943; r d . For each score r k , we define a submodular function f k as earlier and define a combined function f (S) : d k 1 f k (S). Now, the maximal value Q &#8804; dW 3 . So we obtain r-round adaptivity gaps of O(d 1=r W 3=r log (dW)) and O(rs 1=r log (dsW)) for the independent and correlated cases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Computational Results</head><p>We provide a summary of computational results of our r-round adaptive algorithms for the stochastic set cover and optimal decision tree problems. We conducted all experiments using Python 3.8 and Gurobi 8.1 with a 2.3 Ghz Intel Core i5 processor and 16 GB 2133 MHz LPDDR3 memory.</p><p>5.1. Stochastic Set Cover 5.1.1. Instances. We use the Epinions network (<ref type="url">http://  snap.stanford.edu/</ref>) and a Facebook messages data set described in <ref type="bibr">Rossi and Ahmed (2015)</ref> to generate instances of the stochastic set cover problem. The Epinions network consists of 75,879 nodes and 508,837 directed edges. We compute the subgraph induced by the top 1,000 nodes with the highest out-degree (this subgraph has 57,092 directed edges) and use this to generate the stochastic set cover instance. The Facebook messages data set consists of 1,266 nodes and 6,451 directed edges. Given an underlying directed graph, we generate an instance of the stochastic set cover problem as follows. We associate the ground set U with the set of nodes of the underlying graph. We associate an item X u with each node u. Let &#915;(u) denote the outneighbors of u. We sample a subset of &#915;(u) using the binomial distribution with p 0.1 for 500 times. Let S &#8838; &#915;(u) be sampled &#945; S times: then, X u realizes to S &#8746; {u} with probability &#945; S =500. This ensures that X u always covers u. We set f to be the coverage function. We interpret the realization of a node as the set of neighbors it influences, and so f computes the total number of people that are influenced. We set Q &#948;n, where n represents the number of nodes in the underlying network. We use &#948; 0:5 for the Epinions network. However, because the Facebook messages network is sparse, we set &#948; 0:25 in the second instance. All costs are one.</p><p>5.1.2. Results. We test our r-round adaptive algorithm on the two kinds of instances described. We vary the number r of rounds over integers in the interval [1, log n], where n &#8776; 1, 000. We compare with our fully adaptive algorithm, which adapts after every probe: in each step, this algorithm probes the item that maximizes the score (2) with S &#8709;. We note that this also corresponds to the adaptive algorithm from <ref type="bibr">Im et al. (2016)</ref>. To compute an estimate of the expected cost of any algorithm, we sample item realizations 20 times independently and take the average cost incurred. We also find an estimate of an information-theoretic lower bound by sampling item realizations 20 times and taking the average cost of an integer linear program (solved using the Gurobi solver) that computes the optimal off-line cost to cover Q elements for a given realization. Note that no adaptive policy can do better than this lower bound. In fact, the gap between this information-theoretic lower bound and an optimal adaptive solution may be as large as &#937;(Q) on worst case instances. We observe that, in both cases, the expected cost of our solution after only a few rounds of adaptivity is within 50% of the information-theoretic lower bound. Moreover, in six to seven rounds of adaptivity, we notice a decrease of &#8776; 8% in the expected cost, and these solutions are nearly as good as fully adaptive solutions. We plot this trend in Figure <ref type="figure">2</ref>. Notes. (a) Epinions network. (b) Facebook messages network. Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover 1172</p><p>Operations <ref type="bibr">Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS Downloaded from informs.org by [141.211.4.224</ref>] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p><p>Finally, note that the increase in expected cost with rounds of adaptivity (see Figure <ref type="figure">2</ref>(a)) can be attributed to the probabilistic nature of our algorithm (and the experimental setup). We also notice this in the next batch of experiments.</p><p>5.2. Optimal Decision Tree 5.2.1. Instances. We use a real-world data set called WISER (<ref type="url">http://wiser.nlm.nih.gov/</ref>) for our experiments. The WISER data set describes symptoms that one may suffer from after being exposed to certain chemicals. It contains data corresponding to 415 chemicals (scenarios for ODT) and 79 symptoms (elements with binary outcomes). Given a patient exhibiting certain symptoms, the goal is to identify the chemical to which the patient has been exposed (by testing as few symptoms as possible). This data set is used for evaluating algorithms for similar problems in other papers <ref type="bibr">(Bhavnani et al. 2007;</ref><ref type="bibr">Bellala et al. 2011</ref><ref type="bibr">Bellala et al. , 2012;;</ref><ref type="bibr">Navidi et al. 2020)</ref>. For each symptom-chemical pair, the data specifies whether someone exposed to the chemical exhibits the given symptom. However, the WISER data has "unknown" entries for some pairs. In order to obtain instances for ODT from this, we generate 10 different data sets by assigning random binary values to the unknown entries. Then, we remove all identical scenarios to ensure that the ODT instance is feasible. We use the uniform probability distribution for the scenarios. Given these 10 data sets, we consider two cases. The first case (called WISER -U) has all tests with unit costs. In the second case, we generate costs randomly for each test from {1, 4, 7, 10} according to the probabilities [0:1, 0:2, 0:4, 0:3]; for example, with probability 0.4, a test is assigned cost 7. Varying cost captures the setting in which tests may have different costs, and we may not want to schedule an expensive test without sufficient information. We refer to this case as WISER -R.</p><p>We also test our algorithm on synthetic data that we generate as follows. We set s 10,000 and m 100. For each y &#8712; [s], we randomly generate a binary sequence of outcomes that correspond to how y reacts to the tests. We do this in two ways: for test e, we set y &#8712; T e with probability p &#8712; {0:2, 0:5}. If a sequence of outcomes is repeated, we discard the scenario to ensure feasibility of the ODT instance. We assign equal probability to each Notes. (a) WISER -U. (b) WISER -R. Notes. (a) SYN -U -0:2. (b) -U -0:5.</p><p>Ghuge, <ref type="bibr">Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover Operations Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS</ref> scenario. We generate instances using (i) unit costs or (ii) random costs from {1, 4, 7, 10} with respective probabilities [0:1, 0:2, 0:4, 0:3]. Thus, we generate four types of instances with synthetic data. We refer to the instance generated with p 0.2 and unit costs as SYN -U -0:2. Other instances are named similarly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2.">Results</head><p>. We test our r-round adaptive algorithm on all of the aforementioned data sets. We vary r over integers in the interval [1, log s]. Again, we compare with our fully adaptive algorithm, which adapts after every probe: in each step, this algorithm probes the item that maximizes the score (17) in which S &#8709; and Z consists of a single part with all scenarios. We also implemented the fully adaptive "generalized binary search" algorithm from <ref type="bibr">Dasgupta (2004)</ref> as an algorithmic benchmark: on the instances considered, this algorithm performed nearly identically to our fully adaptive algorithm, so we exclude it from the plots. We compare with information-theoretic lower bounds obtained as follows. For instances with unit costs (WISER -U, SYN -U -0:2 and SYN -U -0:5) this lower bound corresponds to the entropy that is log 2 (s), where s is the number of scenarios. Recall that all instances have uniform probabilities across scenarios. For instances with nonuniform costs (WISER -R, SYN -R -0:2 and SYN -R -0:5), the entropy is no longer a valid lower bound. Instead, we use an integer programming formulation to compute the optimal cost to identify a given scenario (see Online Appendix F for details) and then average over all scenarios. We note that there are instances in which this informationtheoretic lower bound is smaller than the optimal adaptive cost by an O(s) factor.</p><p>For the WISER data sets, we compute averages using all scenarios. Figure <ref type="figure">3</ref>(a) plots our algorithm's expected cost as a function of r for WISER -U. We observe that our algorithm gets very close to the information-theoretic lower bound in only three rounds of adaptivity. Figure <ref type="figure">3</ref>(b) plots our algorithm's costs for WISER -R. Here, we observe a sharp decrease in costs within four rounds of adaptivity, after which our algorithm's cost is within &#8776; 50% of the information-theoretic lower bound. Note that we only plot results on our first data set for WISER -U and WISER -R. We include plots for all 10 WISER data sets in Online Appendix G.</p><p>For the synthetic data, we compute averages by sampling scenarios over 100 trials (because s 10,000, computing expectation over all s would be very slow). We plot the results in Figures <ref type="figure">4</ref> and <ref type="figure">5</ref>. We observe that, with six rounds of adaptivity, our algorithm's cost nearly matches that of the fully adaptive algorithm. Notes. (a) -R -0:2. (b) SYN -R -0:5. Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover 1174</p><p>Operations <ref type="bibr">Research, 2024</ref><ref type="bibr">, vol. 72, no. 3, pp. 1156</ref><ref type="bibr">-1176</ref><ref type="bibr">, &#169; 2022 INFORMS Downloaded from informs.org by [141.211.4.224</ref>] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Downloaded from informs.org by [141.211.4.224] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p>OperationsResearch, 2024Research,  , vol. 72, no. 3, pp. 1156Research,  -1176Research,  , &#169; 2022   INFORMS Downloaded from informs.org by [141.211.4.224] on 07 July 2024, at 09:44 . For personal use only, all rights reserved.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_2"><p>Ghuge, Gupta, and Nagarajan: Adaptivity in Stochastic Submodular Cover   </p></note>
		</body>
		</text>
</TEI>
