Abstract: In a recent work, Baladi and Demers constructed a measure of maximal entropy for finite horizon dispersing billiard maps and proved that it is unique, mixing and moreover Bernoulli. We show that this measure enjoys natural probabilistic properties for Hölder continuous observables, such as at least polynomial decay of correlations and the Central Limit Theorem. The results of Baladi and Demers are subject to a condition of sparse recurrence to singularities. We use a similar and slightly stronger condition, and it has a direct effect on our rate of decay of correlations. For billiard tables with bounded complexity (a property conjectured to be generic), we show that the sparse recurrence condition is always satisfied and the correlations decay at a super-polynomial rate.
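The two statistical properties mentioned above can be stated in their standard form, recalled here for orientation with $$T$$ denoting the billiard map and $$\mu$$ the measure of maximal entropy (notation assumed, not taken from the abstract). Polynomial decay of correlations for Hölder observables $$f, g$$ means
$$\Big| \int f\,(g\circ T^n)\,d\mu - \int f\,d\mu \int g\,d\mu \Big| \le C_{f,g}\, n^{-a}$$
for some $$a>0$$, while the Central Limit Theorem asserts that $$n^{-1/2}\sum_{k=0}^{n-1}\big(f\circ T^k - \int f\,d\mu\big)$$ converges in distribution to a normal law $$\mathcal{N}(0,\sigma_f^2)$$.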
On mix-norms and the rate of decay of correlations
Abstract: Two quantitative notions of mixing are the decay of correlations and the decay of a mix-norm (a negative Sobolev norm), and the intensity of mixing can be measured by the rates of decay of these quantities. From duality, correlations are uniformly dominated by a mix-norm; but can they decay asymptotically faster than the mix-norm? We answer this question by constructing an observable with correlation that comes arbitrarily close to achieving the decay rate of the mix-norm. Therefore the mix-norm is the sharpest rate of decay of correlations in both the uniform sense and the asymptotic sense. Moreover, there exists an observable with correlation that decays at the same rate as the mix-norm if and only if the rate of decay of the mix-norm is achieved by its projection onto low-frequency Fourier modes. In this case, the function being mixed is called q-recurrent; otherwise it is q-transient. We use this classification to study several examples and raise questions for future investigations.
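As a point of reference, the duality statement "correlations are uniformly dominated by a mix-norm" can be sketched as follows; the notation is assumed for illustration, with a mean-zero scalar $$f(\cdot,t)$$ being mixed and an observable $$g$$. The correlation is $$\langle f(\cdot,t), g\rangle = \int f(x,t)\,g(x)\,dx$$, the mix-norm is the negative Sobolev norm $$\|f(\cdot,t)\|_{H^{-q}}$$ for some $$q>0$$, and duality between $$H^{-q}$$ and $$H^{q}$$ gives
$$|\langle f(\cdot,t), g\rangle| \le \|f(\cdot,t)\|_{H^{-q}}\, \|g\|_{H^{q}},$$
so the correlation with every observable of bounded $$H^q$$-norm decays at least as fast as the mix-norm, uniformly.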
- Award ID(s): 1813003
- PAR ID: 10373959
- Date Published:
- Journal Name: Nonlinearity
- Volume: 34
- Issue: 6
- ISSN: 0951-7715
- Page Range / eLocation ID: 3762–3782
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Eds.): Average reward reinforcement learning (RL) provides a suitable framework for capturing the objective (i.e., long-run average reward) for continuing tasks, where there is often no natural way to identify a discount factor. However, existing average reward RL algorithms with sample complexity guarantees are not feasible, as they take as input the (unknown) mixing time of the Markov decision process (MDP). In this paper, we make initial progress towards addressing this open problem. We design a feasible average-reward $$Q$$-learning framework that requires no knowledge of any problem parameter as input. Our framework is based on discounted $$Q$$-learning, while we dynamically adapt the discount factor (and hence the effective horizon) to progressively approximate the average reward. In the synchronous setting, we solve three tasks: (i) learn a policy that is $$\epsilon$$-close to optimal, (ii) estimate the optimal average reward with $$\epsilon$$-accuracy, and (iii) estimate the bias function (similar to the $$Q$$-function in the discounted case) with $$\epsilon$$-accuracy. We show that with carefully designed adaptation schemes, (i) can be achieved with $$\tilde{O}(\frac{SA t_{\mathrm{mix}}^{8}}{\epsilon^{8}})$$ samples, (ii) with $$\tilde{O}(\frac{SA t_{\mathrm{mix}}^5}{\epsilon^5})$$ samples, and (iii) with $$\tilde{O}(\frac{SA B}{\epsilon^9})$$ samples, where $$t_\mathrm{mix}$$ is the mixing time and $$B > 0$$ is an MDP-dependent constant. To our knowledge, we provide the first finite-sample guarantees that are polynomial in $$S, A, t_{\mathrm{mix}}, \epsilon$$ for a feasible variant of $$Q$$-learning. That said, the sample complexity bounds have tremendous room for improvement, which we leave for the community's best minds. Preliminary simulations verify that our framework is effective without prior knowledge of parameters as input. (A schematic code sketch of such an adaptive-discount scheme appears after this list.)
- We consider the problem of determining the manifold $$n$$-widths of Sobolev and Besov spaces with error measured in the $$L_p$$-norm. The manifold widths control how efficiently these spaces can be approximated by general non-linear parametric methods with the restriction that the parameter selection and parameterization maps must be continuous. Existing upper and lower bounds only match when the Sobolev or Besov smoothness index $$q$$ satisfies $$q\leq p$$ or $$1 \leq p \leq 2$$. We close this gap and obtain sharp lower bounds for all $$1 \leq p,q \leq \infty$$ for which a compact embedding holds. A key part of our analysis is to determine the exact value of the manifold widths of finite-dimensional $$\ell^M_q$$-balls in the $$\ell_p$$-norm when $$p\leq q$$. Although this result is not new, we provide a new proof and apply it to lower bounding the manifold widths of Sobolev and Besov spaces. Our results show that the Bernstein widths, which are typically used to lower bound the manifold widths, decay asymptotically faster than the manifold widths in many cases. (The definition of manifold widths is recalled after this list.)
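For the average-reward $$Q$$-learning item above, the following is a minimal, hypothetical sketch of the general idea of discounted $$Q$$-learning with a discount factor that drifts toward 1; the schedule, step size, and oracle name are illustrative assumptions, not the algorithm analyzed in that paper.

```python
import numpy as np

def adaptive_discount_q_learning(sample_transition, S, A, T,
                                 gamma_schedule=lambda t: 1.0 - (t + 2) ** -0.5):
    """Synchronous tabular Q-learning with a growing discount factor.

    sample_transition(s, a) -> (next_state, reward) is an assumed
    generative-model oracle; both schedules below are illustrative only.
    """
    Q = np.zeros((S, A))
    for t in range(T):
        gamma = gamma_schedule(t)      # effective horizon 1/(1 - gamma) grows with t
        alpha = (t + 1) ** -0.6        # polynomial step size (assumption)
        for s in range(S):
            for a in range(A):
                s_next, r = sample_transition(s, a)
                target = r + gamma * Q[s_next].max()
                Q[s, a] += alpha * (target - Q[s, a])
    # Heuristic estimate of the average reward: (1 - gamma) * V approximates
    # the gain when gamma is close to 1.
    gamma = gamma_schedule(T - 1)
    rho_hat = (1.0 - gamma) * Q.max(axis=1).mean()
    return Q, rho_hat
```

For the manifold-widths item, the quantity in question is commonly defined as follows (recalled here for orientation, with $$K$$ the unit ball of the Sobolev or Besov space and $$X = L_p$$; notation assumed):
$$\delta_n(K)_X = \inf_{a,\,M} \;\sup_{f\in K}\; \|f - M(a(f))\|_X,$$
where the infimum runs over continuous maps $$a: K \to \mathbb{R}^n$$ (parameter selection) and $$M: \mathbb{R}^n \to X$$ (parameterization), matching the continuity restriction described in the abstract.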