We consider the online linear optimization problem with movement costs, a variant of online learning in which the learner must not only respond to cost vectors c_t with points x_t in order to maintain low regret, but is also penalized for movement by an additional cost. Classically, simple algorithms that obtain the optimal sqrt(T) regret already are very stable and do not incur a significant movement cost. However, recent work has shown that when the learning algorithm is provided with weak "hint" vectors that have a positive correlation with the costs, the regret can be significantly improved to log(T). In this work, we study the stability of such algorithms, and provide matching upper and lower bounds showing that incorporating movement costs results in intricate tradeoffs logarithmic and sqrt(T) regret.
more »
« less
Logarithmic Regret from Sublinear Hints
We consider the online linear optimization problem, where at every step the algorithm plays a point x_t in the unit ball, and suffers loss for some cost vector c_t that is then revealed to the algorithm. Recent work showed that if an algorithm receives a "hint" h_t that has non-trivial correlation with c_t before it plays x_t, then it can achieve a logarithmic regret guarantee, improving on the classical sqrt(T) bound. In this work, we study the question of whether an algorithm really requires a hint at every time step. Somewhat surprisingly, we show that an algorithm can obtain logarithmic regret with just O(sqrt(T)) hints under a natural query model. We give two applications of our result, to the well-studied setting of optimistic regret bounds and to the problem of online learning with abstention.
more »
« less
- Award ID(s):
- 2047288
- PAR ID:
- 10337201
- Date Published:
- Journal Name:
- Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We consider the problem of finding the maximally influential node in random networks where each node influences every other node with constant yet unknown probability. We develop an online algorithm that learns the relative influences of the nodes. It relaxes the assumption in the existing literature that a central observer can monitor the influence spread globally. The proposed algorithm delegates the online updates to the nodes on the network; hence requires only local observations at the nodes. We show that using an explore-then-commit learning strategy, the cumulative regret accumulated by the algorithm over horizon T approaches O(T2/3) for a network with a large number of nodes. Additionally, we show that, for fixed T, the worst case-regret grows linearly with the number n of nodes in the graph. Numerical experiments illustrate this linear dependence for Chung-Lu models. The experiments also demonstrate that ε-greedy learning strategies can achieve similar performance to the explore-then-commit strategy on Chung-Lu models.more » « less
-
This paper considers online convex optimization over a complicated constraint set, which typically consists of multiple functional constraints and a set constraint. The conventional online projection algorithm (Zinkevich, 2003) can be difficult to implement due to the potentially high computation complexity of the projection operation. In this paper, we relax the functional constraints by allowing them to be violated at each round but still requiring them to be satised in the long term. This type of relaxed online convex optimization (with long term constraints) was first considered in Mahdavi et al. (2012). That prior work proposes an algorithm to achieve O(sqrt(T)) regret and O(T^(3/4)) constraint violations for general problems and another algorithm to achieve an O(T^(2/3)) bound for both regret and constraint violations when the constraint set can be described by a nite number of linear constraints. A recent extension in Jenatton et al. (2016) can achieve O(T^(max(theta, 1-theta)) regret and O(T^(1-theta/2)) constraint violations where theta in (0,1). The current paper proposes a new simple algorithm that yields improved performance in comparison to prior works. The new algorithm achieves an O(sqrt(T)) regret bound with O(1) constraint violations.more » « less
-
We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. Our motivation is the recent body of work on using such “hints” towards improving regret bounds for OLO problems in the full-information setting. Unlike in the full-information OLO setting, with bandit feedback, we first show that one cannot improve the standard regret bounds of O(\sqrt{T}) by using hints, even if they are always well-correlated with the cost vector. In contrast, if the algorithm is empowered to issue queries and if all the responses are correct, then we show O(\log(T)) regret is achievable. We then show how to make this result more robust — when some of the query responses can be adversarial — by using a little feedback on the quality of the responses.more » « less
-
In the problem of online portfolio selection as formulated by Cover (1991), the trader repeatedly distributes her capital over d assets in each of T>1 rounds, with the goal of maximizing the total return. Cover proposed an algorithm, termed Universal Portfolios, that performs nearly as well as the best (in hindsight) static assignment of a portfolio, with an O(dlog(T)) regret in terms of the logarithmic return. Without imposing any restrictions on the market this guarantee is known to be worst-case optimal, and no other algorithm attaining it has been discovered so far. Unfortunately, Cover's algorithm crucially relies on computing certain d-dimensional integral which must be approximated in any implementation; this results in a prohibitive O(d^4(T+d)^14) per-round runtime for the fastest known implementation due to Kalai and Vempala (2002). We propose an algorithm for online portfolio selection that admits essentially the same regret guarantee as Universal Portfolios -- up to a constant factor and replacement of log(T) with log(T+d) -- yet has a drastically reduced runtime of O(d^(T+d)) per round. The selected portfolio minimizes the current logarithmic loss regularized by the log-determinant of its Hessian -- equivalently, the hybrid logarithmic-volumetric barrier of the polytope specified by the asset return vectors. As such, our work reveals surprising connections of online portfolio selection with two classical topics in optimization theory: cutting-plane and interior-point algorithms.more » « less