skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Second Order Path Variationals in Non-Stationary Online Learning
We consider the problem of universal dynamic regret minimization under exp-concave and smooth losses. We show that appropriately designed Strongly Adaptive algorithms achieve a dynamic regret of $$\tilde O(d^2 n^{1/5} [\mathcal{TV}_1(w_{1:n})]^{2/5} \vee d^2)$$, where $$n$$ is the time horizon and $$\mathcal{TV}_1(w_{1:n})$$ a path variational based on second order differences of the comparator sequence. Such a path variational naturally encodes comparator sequences that are piece-wise linear – a powerful family that tracks a variety of non-stationarity patterns in practice (Kim et al., 2009). The aforementioned dynamic regret is shown to be optimal modulo dimension dependencies and poly-logarithmic factors of $$n$$. To the best of our knowledge, this path variational has not been studied in the non-stochastic online learning literature before. Our proof techniques rely on analysing the KKT conditions of the offline oracle and requires several non-trivial generalizations of the ideas in Baby and Wang (2021) where the latter work only implies an $$\tilde{O}(n^{1/3})$$ regret for the current problem.  more » « less
Award ID(s):
2007117
PAR ID:
10466935
Author(s) / Creator(s):
;
Editor(s):
Ruiz, Francisco and
Publisher / Repository:
PMLR
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
206
ISSN:
2640-3498
Page Range / eLocation ID:
9024--9075
Subject(s) / Keyword(s):
Online learning nonstationarity dynamic regret total variation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We consider the framework of non-stationary stochastic optimization (Besbes et al., 2015) with squared error losses and noisy gradient feedback where the dynamic regret of an online learner against a time varying comparator sequence is studied. Motivated from the theory of non-parametric regression, we introduce a new variational constraint that enforces the comparator sequence to belong to a discrete k^{th} order Total Variation ball of radius C_n. This variational constraint models comparators that have piece-wise polynomial structure which has many relevant practical applications (Tibshirani, 2014). By establishing connections to the theory of wavelet based non-parametric regression, we design a polynomial time algorithm that achieves the nearly optimal dynamic regret of ~O(n^{1/(2k+3)} C_n^{2/(2k+3)}). The proposed policy is adaptive to the unknown radius C_n. Further, we show that the same policy is minimax optimal for several other non-parametric families of interest. 
    more » « less
  2. We study the framework of universal dynamic regret minimization with strongly convex losses. We answer an open problem in Baby and Wang 2021 by showing that in a proper learning setup, Strongly Adaptive algorithms can achieve the near optimal dynamic regret of 𝑂̃ (𝑑1/3𝑛1/3TV[𝑒1:𝑛]2/3βˆ¨π‘‘) against any comparator sequence 𝑒1,…,𝑒𝑛 simultaneously, where 𝑛 is the time horizon and TV[𝑒1:𝑛] is the Total Variation of comparator. These results are facilitated by exploiting a number of new structures imposed by the KKT conditions that were not considered in Baby and Wang 2021 which also lead to other improvements over their results such as: (a) handling non-smooth losses and (b) improving the dimension dependence on regret. Further, we also derive near optimal dynamic regret rates for the special case of proper online learning with exp-concave losses and an 𝐿∞ constrained decision set. 
    more » « less
  3. We investigate the behavior of higher-form symmetries at variousquantum phase transitions. We consider discrete 1-form symmetries, whichcan be either part of the generalized concept β€œcategorical symmetry”(labelled as \tilde{Z}_N^{(1)} Z Μƒ N ( 1 ) )introduced recently, or an explicit Z_N^{(1)} Z N ( 1 ) 1-form symmetry. We demonstrate that for many quantum phase transitionsinvolving a Z_N^{(1)} Z N ( 1 ) or \tilde{Z}_N^{(1)} Z Μƒ N ( 1 ) symmetry, the following expectation value \langle \left( O_\mathcal{C}\right)^2 \rangle ⟨ ( O π’ž ) 2 ⟩ takes the form \langle \left( \log O_\mathcal{C} \right)^2 \rangle \sim - \frac{A}{\epsilon} P + b \log P ⟨ ( log O π’ž ) 2 ⟩ ∼ βˆ’ A Ο΅ P + b log P , where O_\mathcal{C} O π’ž is an operator defined associated with loop \mathcal{C} π’ž (or its interior \mathcal{A} π’œ ),which reduces to the Wilson loop operator for cases with an explicit Z_N^{(1)} Z N ( 1 ) 1-form symmetry. P P is the perimeter of \mathcal{C} π’ž ,and the b \log P b log P term arises from the sharp corners of the loop \mathcal{C} π’ž ,which is consistent with recent numerics on a particular example. b b is a universal microscopic-independent number, which in (2+1)d ( 2 + 1 ) d is related to the universal conductivity at the quantum phasetransition. b b can be computed exactly for certain transitions using the dualitiesbetween (2+1)d ( 2 + 1 ) d conformal field theories developed in recent years. We also compute the"strange correlator" of O_\mathcal{C} O π’ž : S_{\mathcal{C}} = \langle 0 | O_\mathcal{C} | 1 \rangle / \langle 0 | 1 \rangle S π’ž = ⟨ 0 | O π’ž | 1 ⟩ / ⟨ 0 | 1 ⟩ where |0\rangle | 0 ⟩ and |1\rangle | 1 ⟩ are many-body states with different topological nature. 
    more » « less
  4. We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $$\frac{1}{2}$$-regret upper bound of $$\Tilde{\mathcal{O}}(n T^{\frac{2}{3}})$$ for horizon $$T$$ and number of arms $$n$$. We also show in experiments that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings. 
    more » « less
  5. Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret bounds that account for the inherent uncertainty in pairwise comparisons between the dueling arms. Intuitively, greater uncertainty suggests a higher level of difficulty in the problem. To bridge this gap, this paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM). We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $$\tilde O\big(d\sqrt{\sum_{t=1}^T\sigma_t^2} + d\big)$$, where $$\sigma_t$$ is the variance of the pairwise comparison in round $$t$$, $$d$$ is the dimension of the context vectors, and $$T$$ is the time horizon. Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $$\tilde O(d)$$ regret. We perform empirical experiments on synthetic data to confirm the advantage of our method over previous variance-agnostic algorithms. 
    more » « less