Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available July 1, 2025
-
We devise an online learning algorithm -- titled Switching via Monotone Adapted Regret Traces (SMART) -- that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader (FTL) policy and the worst case guarantee of any other input policy. We show that the regret of the SMART policy on any input sequence is within a multiplicative factor e/(e−1)≈1.58 of the smaller of: 1) the regret obtained by FTL on the sequence, and 2) the upper bound on regret guaranteed by the given worst-case policy. This implies a strictly stronger guarantee than typical `best-of-both-worlds' bounds as the guarantee holds for every input sequence regardless of how it is generated. SMART is simple to implement as it begins by playing FTL and switches at most once during the time horizon to the worst-case algorithm. Our approach and results follow from an operational reduction of instance optimal online learning to competitive analysis for the ski-rental problem. We complement our competitive ratio upper bounds with a fundamental lower bound showing that over all input sequences, no algorithm can get better than a 1.43-fraction of the minimum regret achieved by FTL and the minimax-optimal policy. We also present a modification of SMART that combines FTL with a ``small-loss" algorithm to achieve instance optimality between the regret of FTL and the small loss regret bound.more » « lessFree, publicly-accessible full text available July 1, 2025
-
We design online algorithms for fair allocation of public goods to a set of N agents over a sequence of T rounds and focus on improving their performance using predictions. In the basic model, a public good arrives in each round, and every agent reveals their value for it upon arrival. The algorithm must irrevocably decide the investment in this good without exceeding a total budget of B across all rounds. The algorithm can utilize (potentially noisy) predictions of each agent’s total value for all remaining goods. The algorithm's performance is measured using a proportional fairness objective, which informally demands that every group of agents be rewarded proportional to its size and the cohesiveness of its preferences. We show that no algorithm can achieve better than Θ(T/B) proportional fairness without predictions. With reasonably accurate predictions, the situation improves significantly, and Θ(log(T/B)) proportional fairness is achieved. We also extend our results to a general setting wherein a batch of L public goods arrive in each round and O(log(min(N,L)T/B)) proportional fairness is achieved. Our exact bounds are parameterized as a function of the prediction error, with performance degrading gracefully with increasing errors.
-
Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes are exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL). Our HL algorithms achieve data efficiency by leveraging a key insight: having samples of the exogenous variables, past decisions can be revisited in hindsight to infer counterfactual consequences that can accelerate policy improvements. We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem – allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.more » « less
-
We consider a practically motivated variant of the canonical online fair allocation problem: a decision-maker has a budget of resources to allocate over a fixed number of rounds. Each round sees a random number of arrivals, and the decision-maker must commit to an allocation for these individuals before moving on to the next round. In contrast to prior work, we consider a setting in which resources are perishable and individuals' utilities are potentially non-linear (e.g., goods exhibit complementarities). The goal is to construct a sequence of allocations that is envy-free and efficient. We design an algorithm that takes as input (i) a prediction of the perishing order, and (ii) a desired bound on envy. Given the remaining budget in each period, the algorithm uses forecasts of future demand and perishing to adaptively choose one of two carefully constructed guardrail quantities. We characterize conditions under which our algorithm achieves the optimal envy-efficiency Pareto frontier. We moreover demonstrate its strong numerical performance using data from a partnering food bank.more » « less
-
Tauman_Kalai, Yael (Ed.)Consider an agent exploring an unknown graph in search of some goal state. As it walks around the graph, it learns the nodes and their neighbors. The agent only knows where the goal state is when it reaches it. How do we reach this goal while moving only a small distance? This problem seems hopeless, even on trees of bounded degree, unless we give the agent some help. This setting with "help" often arises in exploring large search spaces (e.g., huge game trees) where we assume access to some score/quality function for each node, which we use to guide us towards the goal. In our case, we assume the help comes in the form of distance predictions: each node v provides a prediction f(v) of its distance to the goal vertex. Naturally if these predictions are correct, we can reach the goal along a shortest path. What if the predictions are unreliable and some of them are erroneous? Can we get an algorithm whose performance relates to the error of the predictions? In this work, we consider the problem on trees and give deterministic algorithms whose total movement cost is only O(OPT + Δ ⋅ ERR), where OPT is the distance from the start to the goal vertex, Δ the maximum degree, and the ERR is the total number of vertices whose predictions are erroneous. We show this guarantee is optimal. We then consider a "planning" version of the problem where the graph and predictions are known at the beginning, so the agent can use this global information to devise a search strategy of low cost. For this planning version, we go beyond trees and give an algorithms which gets good performance on (weighted) graphs with bounded doubling dimension.more » « less