<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Uniform Loss Algorithms for Online Stochastic Decision-Making With Applications to Bin Packing</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>06/08/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10201014</idno>
					<idno type="doi">10.1145/3393691.3394224</idno>
					<title level='j'>SIGMETRICS '20: Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Siddhartha Banerjee</author><author>Daniel Freund</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We consider a general class of finite-horizon online decision-making problems, where in each period a controller is presented a stochastic arrival and must choose an action from a set of permissible actions, and the final objective depends only on the aggregate type-action counts. Such a framework encapsulates many online stochastic variants of common optimization problems including bin packing, generalized assignment, and network revenue management. In such settings, we study a natural model-predictive control algorithm that, in each period, acts greedily based on an updated certainty-equivalent optimization problem. We introduce a simple, yet general, condition under which this algorithm obtains uniform additive loss (independent of the horizon) compared to an optimal solution with full knowledge of arrivals. Our condition is fulfilled by the above-mentioned problems, as well as by more general settings involving piece-wise linear objectives and offline index policies, including an airline overbooking problem.
]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>&#8226; Exogenous Randomness: System dynamics are subject to exogenous stochastic fluctuations, i.e., random arrivals and events which are conditionally independent of the system state.</p><p>&#8226; Limited Distributional Knowledge: The controller has limited distributional information about arrivals (e.g., noisy mean estimates); these are obtained, for example, from historical traces. These features are present in many problems of practical interest, across various fields. For now, the following canonical examples serve to illustrate some settings of interest:</p><p>&#8226; Resource allocation and generalized assignment: A finite supply of resources is allocated to arriving demand, with varying requirements and valuations, so as to maximize overall rewards.</p><p>&#8226; Online bin packing: Cloud-computing jobs with different requirements arrive over a finite horizon, and the controller needs to load them onto existing VMs with sufficient free resources, or start up new VMs, while minimizing operating costs.</p><p>&#8226; Online routing in batch-processing systems: Partitioning arriving compute jobs into batches in batch-processing systems, shipments among trucks in logistics systems, and online financial transactions among payment-processing firms with different contract structures.</p><p>In all the above settings, given distributional knowledge of arrivals, the underlying control problems can be formulated as Markov Decision Processes (MDPs), and the corresponding optimal solutions computed via dynamic programming. However, due to their combinatorial nature, these problems suffer from the 'curse of dimensionality', which typically makes computing optimal policies intractable. 
Furthermore, such intractable optimal policies do not provide easy benchmarks for evaluating the performance of simpler heuristics.</p><p>An alternate control paradigm in such settings is that of model-predictive control, wherein the controller makes decisions based on some offline benchmark. One natural class of benchmarks consists of the so-called prophet benchmarks, i.e., hindsight-optimal policies with full information of arrivals. Controls based on such benchmarks are often simple and easy to interpret; moreover, the benchmark also provides a natural way of computing performance bounds for such policies. However, most policies are known to incur an additive loss compared to the prophet benchmark<ref type="foot">1</ref> which grows with the horizon length T; in particular, for the problems we study, existing algorithms typically incur either constant-factor multiplicative losses (i.e., &#920;(T) additive losses) or &#920;(&#8730;T) additive losses. Significantly, however, such performance guarantees often coincide with minimax bounds for these settings, and thus hold even under adversarial arrivals. This leaves open the question of whether these guarantees are too pessimistic, and whether better performance is achievable in more structured settings, in particular given distributional assumptions on the arrivals (and knowledge of the corresponding distribution).</p><p>In light of the above, our aim in this work is to characterize conditions under which we can obtain algorithms that have a uniform additive loss (i.e., independent of T) compared to a hindsight-optimal policy. Such a guarantee is particularly strong, as it immediately implies a small additive loss against the optimal policy, even though the latter is infeasible to compute in the settings we consider. 
We build on a recent line of work, started by <ref type="bibr">[1]</ref> and <ref type="bibr">[4]</ref>, that provides uniform-loss algorithms for online multidimensional knapsack and matching problems, and we extend it to obtain a simple but far-reaching condition for obtaining uniform-loss algorithms for a wide class of online decision-making problems. From a theoretical viewpoint, our approach helps characterize how performance guarantees for such policies depend on a combination of regularity properties of the stochastic arrival process and the underlying geometry of the control problem. Significantly, our theoretical insights lead to new algorithms for several widely-studied problems, including the generalized assignment and bin-packing problems; our algorithms are simple, and yet perform order-wise better, both in theory and in experiments, than existing algorithms.</p></div>
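<div xmlns="http://www.tei-c.org/ns/1.0"><p><![CDATA[To make the model-predictive control paradigm concrete, the following is a minimal sketch (ours, for illustration only; it is not the paper's OCOP algorithm, and the toy single-resource setting and all names are assumptions) of a controller that re-solves a certainty-equivalent problem in every period: it allocates the remaining capacity to types in decreasing reward order using mean estimates of the remaining arrivals, and accepts the current arrival only if its type receives positive mass in that solution.

```python
import random

def ce_accept(reward, rewards, probs, capacity, periods_left):
    """Solve the certainty-equivalent relaxation for the remaining
    horizon (single resource, fractional allocation, so a greedy sort
    is exact) and accept the arrival iff its type gets positive mass.
    Assumes distinct rewards for simplicity."""
    cap = float(capacity)
    alloc = {}
    for r, p in sorted(zip(rewards, probs), reverse=True):
        take = min(cap, p * periods_left)  # expected future demand of type r
        alloc[r] = take
        cap -= take
    return alloc.get(reward, 0.0) > 1e-9

def run(rewards, probs, capacity, T, seed=0):
    """Simulate T periods of certainty-equivalent re-solving."""
    rng = random.Random(seed)
    total, cap = 0.0, capacity
    for t in range(T):
        j = rng.choices(range(len(rewards)), weights=probs)[0]
        if cap >= 1 and ce_accept(rewards[j], rewards, probs, cap, T - t):
            total += rewards[j]
            cap -= 1
    return total
```

Re-solving in every period is what lets such a policy adapt to realized randomness; the paper's contribution is identifying conditions under which this kind of policy loses only a constant against the hindsight optimum.]]></p></div>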
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Our Contribution</head><p>Our main contributions fall into two categories. First, we propose a unified algorithmic framework for a wide class of finite-horizon online control problems, and provide conditions under which it yields a uniform loss guarantee: an additive loss, compared to a hindsight (or clairvoyant) optimal solution, that is independent of the horizon length. Next, we apply the framework to several widely-studied problems, and show how it leads to algorithms with stronger guarantees than the state of the art in these settings.</p><p>Framework, Algorithm and Performance Guarantees. Stated informally, we obtain uniform loss guarantees for the following kind of problem: over a horizon of T periods, the controller observes a random type arriving in each period, and needs to take a type-specific action in response. We assume that we have some limited distributional information about the type arrivals (e.g., a single sample of the arrivals over the full time horizon). For the random arrivals we require neither independence nor identical distributions, but instead impose two mild regularity conditions. Finally, our terminal objective depends only on the type-action counts, i.e., the aggregate number of times each action was taken over the entire horizon.</p><p>For such settings, we provide a unified algorithm which we call Online Clairvoyant Optimum Pursuit (OCOP). In our main result we prove that, under weak assumptions on the regularity of the arrival processes and the geometry of the objective function, OCOP obtains an additive loss guarantee (compared to the hindsight-optimal solution with exact knowledge of arrivals) that is independent of T. 
Our conditions, in particular, are fulfilled by iid arrival processes and piece-wise linear objective functions subject to a linear polytope, which can be used to encode various combinatorial problems.</p><p>[Table <ref type="table">1</ref>: Improvement of OCOP over state-of-the-art guarantees for known distributions. Columns: Regime, Distribution-oblivious, Known Distribution, OCOP; the OCOP column reads O(1) in every regime.]</p><p>Applications. We demonstrate how our general algorithm and conditions translate to specific problems. In particular, we show that our conditions for uniform loss hold in two general settings: (i) where the underlying objective is piecewise-linear with a fixed number of pieces, and (ii) where the objective function admits a class of offline index policies, i.e., policies based on computing an index for each type-action pair that is independent of the other types, and then sorting the types based on these indices. The most striking application of our results is to online stochastic bin packing. In this well-studied problem we are able to reduce the suboptimality gap from &#920;(&#8730;T) to &#920;(1), while generalizing beyond the assumption of iid arrivals. Further, our results work with just the first moment of the distribution, i.e., though our approach is not distribution-oblivious, it also does not require full distributional information.</p><p>Traditionally, the bin packing literature has measured its objective as waste, i.e., the amount of empty bin space across all bins opened by an algorithm. It has long been known that, for discrete item-size distributions with iid arrivals, the waste of an optimal packing (i.e., the waste the clairvoyant would incur) scales as either &#920;(T), &#920;(&#8730;T), or &#920;(1) over a long time horizon; these regimes are referred to as linear waste (LW), perfectly packable (PP), and bounded waste (BW). 
Notably, even in the LW regime the additional waste of an algorithm relative to the optimal packing can scale as &#920;(1); this is exactly the guarantee we provide, and it holds regardless of regime. In contrast, state-of-the-art algorithms in the literature had so far guaranteed &#920;(1) loss bounds only for the bounded-waste regime. We summarize the known results in Table <ref type="table">1</ref>, highlighting the difference between distribution-oblivious algorithms and ones that require knowledge of the arrival distributions (such as ours). Full paper. A full version of this paper can be found at <ref type="url">https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3479189</ref>.</p></div>			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The additive loss vis-&#224;-vis the hindsight optimum is classically referred to as the regret of a policy; in particular, we focus on the expected sample-path regret. This term is, however, interpreted differently in different communities, and in light of this, we use additive loss throughout this work.</p></note>
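<div xmlns="http://www.tei-c.org/ns/1.0"><p><![CDATA[To illustrate the waste objective, here is a small self-contained simulation (ours, for illustration only; First Fit is a classical distribution-oblivious heuristic, not the algorithm of this paper): it packs iid items with First Fit and compares the resulting waste to a crude hindsight lower bound.

```python
import math
import random

def first_fit(items, bin_size=1.0):
    """Online First Fit: place each item in the first open bin with
    enough room, else open a new bin. Returns the list of bin loads."""
    bins = []
    for x in items:
        for i, load in enumerate(bins):
            if load + x <= bin_size + 1e-12:
                bins[i] = load + x
                break
        else:
            bins.append(x)
    return bins

def waste(bins, bin_size=1.0):
    """Total empty space across all opened bins."""
    return sum(bin_size - load for load in bins)

rng = random.Random(0)
T = 10000
# Toy discrete item-size distribution, drawn iid.
items = [rng.choice([0.3, 0.4]) for _ in range(T)]
online_waste = waste(first_fit(items))
# Hindsight lower bound: any packing opens at least ceil(total size)
# unit bins, so its waste is at least that capacity minus the total size.
hindsight_lb = math.ceil(sum(items)) - sum(items)
print(online_waste, hindsight_lb)
```

The gap between the two quantities is the additive loss discussed above; the guarantee of this paper is that, with partial distributional knowledge, this gap can be kept O(1) regardless of whether the optimal waste itself is in the LW, PP, or BW regime.]]></p></div>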
		</body>
		</text>
</TEI>
