Title: Cross-Validatory Model Selection for Bayesian Autoregressions with Exogenous Regressors
Bayesian cross-validation (CV) is a popular method for predictive model assessment that is simple to implement and broadly applicable. A wide range of CV schemes is available for time series applications, including generic leave-one-out (LOO) and K-fold methods, as well as specialized approaches intended to deal with serial dependence such as leave-future-out (LFO), h-block, and hv-block. Existing large-sample results show that both specialized and generic methods are applicable to models of serially dependent data. However, large-sample consistency results overlook the impact of sampling variability on accuracy in finite samples. Moreover, the accuracy of a CV scheme depends on many aspects of the procedure. We show that poor design choices can lead to elevated rates of adverse selection. In this paper, we consider the problem of identifying the regression component of an important class of models of data with serial dependence, autoregressions of order p with q exogenous regressors (ARX(p,q)), under the logarithmic scoring rule. We show that when serial dependence is present, scores computed using the joint (multivariate) density have lower variance and better model selection accuracy than the popular pointwise estimator. In addition, we present a detailed case study of the special case of ARX models with fixed autoregressive structure and variance. For this class, we derive the finite-sample distribution of the CV estimators and the model selection statistic. We conclude with recommendations for practitioners.
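For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of what an ARX(p,q) model and the two log-score estimators might look like. The notation (coefficients \phi_i and \beta_j, fold index k, and the symbols \widehat{S}_{\mathrm{joint}} and \widehat{S}_{\mathrm{pw}}) is assumed here for illustration and is not taken from the paper; p(\cdot \mid y_{\mathrm{train}_k}) stands for the posterior predictive density given the training data of fold k.

```latex
% Illustrative notation only; symbols are assumptions, not the paper's own.
\begin{align}
  % ARX(p,q): p autoregressive lags plus q exogenous regressors
  y_t &= \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \beta_j \, x_{t,j} + \varepsilon_t,
      \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2), \\
  % Joint score: one multivariate predictive density per held-out block
  \widehat{S}_{\mathrm{joint}} &= \sum_{k=1}^{K} \log p\!\left(y_{\mathrm{test}_k} \mid y_{\mathrm{train}_k}\right), \\
  % Pointwise score: sum of univariate predictive densities over held-out points
  \widehat{S}_{\mathrm{pw}} &= \sum_{k=1}^{K} \sum_{t \in \mathrm{test}_k} \log p\!\left(y_t \mid y_{\mathrm{train}_k}\right).
\end{align}
```

The two estimators coincide when each test block holds a single observation. They differ when blocks hold several serially dependent observations, which is the setting the abstract addresses.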
Award ID(s):
1921523
PAR ID:
10571658
Editor(s):
Steel, Mark
Publisher / Repository:
Project Euclid
Journal Name:
Bayesian Analysis
ISSN:
1936-0975
Subject(s) / Keyword(s):
Cross-validation; Model comparison; Serial dependence; Uncertainty
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Brute force cross-validation (CV) is a method for predictive assessment and model selection that is general and applicable to a wide range of Bayesian models. Naive or ‘brute force’ CV approaches are often too computationally costly for interactive modeling workflows, especially when inference relies on Markov chain Monte Carlo (MCMC). We propose overcoming this limitation using massively parallel MCMC. Using accelerator hardware such as graphics processing units, our approach can be about as fast (in wall clock time) as a single full-data model fit. Parallel CV is flexible because it can easily exploit a wide range of data partitioning schemes, such as those designed for non-exchangeable data. It can also accommodate a range of scoring rules. We propose MCMC diagnostics, including a summary of MCMC mixing based on the popular potential scale reduction factor (R-hat) and MCMC effective sample size (ESS) measures. We also describe a method for determining whether an R-hat diagnostic indicates approximate stationarity of the chains, which may be of more general interest for applications beyond parallel CV. Finally, we show that parallel CV and its diagnostics can be implemented with online algorithms, allowing parallel CV to scale up to very large blocking designs on memory-constrained computing accelerators.
  2. AutoRegressive eXogenous (ARX) models form one of the most important model classes in control theory, econometrics, and statistics, but they are yet to be understood in terms of their finite sample identification analysis. The technical challenges come from the strong statistical dependency not only between data samples at different time instances but also between elements within each individual sample. In this work, for ARX models with potentially unknown orders, we study how the ordinary least squares (OLS) estimator performs in terms of identifying model parameters from data collected from either a single length-T trajectory or N i.i.d. trajectories. Our main results show that as long as the orders of the model are chosen optimistically, i.e., we are learning an over-parameterized model compared to the ground truth ARX, the OLS will converge with the optimal rate O(1/√T) (or O(1/√N)) to the true (low-order) ARX parameters. This occurs without the aid of any regularization, and is thus referred to as self-regularization. Our results imply that the oracle knowledge of the true orders and usage of regularizers are not necessary in learning ARX models: over-parameterization is all you need.
  3. This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian stochastic approximation (SA) algorithm under a contraction operator with respect to an arbitrary norm. The main novelty lies in the construction of a valid Lyapunov function called the generalized Moreau envelope. The smoothness and an approximation property of the generalized Moreau envelope enable us to derive a one-step Lyapunov drift inequality, which is the key to establishing the finite-sample bounds. Our SA result has wide applications, especially in the context of reinforcement learning (RL). Specifically, we show that a large class of value-based RL algorithms can be modeled in the exact form of our Markovian SA algorithm. Therefore, our SA results immediately imply finite-sample guarantees for popular RL algorithms such as n-step temporal difference (TD) learning, TD(𝜆), off-policy V-trace, and Q-learning. As byproducts, by analyzing the convergence bounds of n-step TD and TD(𝜆), we provide theoretical insight into the problem about the efficiency of bootstrapping. Moreover, our finite-sample bounds of off-policy V-trace explicitly capture the tradeoff between the variance of the stochastic iterates and the bias in the limit.
  4. Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.)
    Average reward reinforcement learning (RL) provides a suitable framework for capturing the objective (i.e. long-run average reward) for continuing tasks, where there is often no natural way to identify a discount factor. However, existing average reward RL algorithms with sample complexity guarantees are not feasible, as they take as input the (unknown) mixing time of the Markov decision process (MDP). In this paper, we make initial progress towards addressing this open problem. We design a feasible average-reward $$Q$$-learning framework that requires no knowledge of any problem parameter as input. Our framework is based on discounted $$Q$$-learning, while we dynamically adapt the discount factor (and hence the effective horizon) to progressively approximate the average reward. In the synchronous setting, we solve three tasks: (i) learn a policy that is $$\epsilon$$-close to optimal, (ii) estimate the optimal average reward with $$\epsilon$$-accuracy, and (iii) estimate the bias function (similar to the $$Q$$-function in the discounted case) with $$\epsilon$$-accuracy. We show that with carefully designed adaptation schemes, (i) can be achieved with $$\tilde{O}(\frac{SA t_{\mathrm{mix}}^{8}}{\epsilon^{8}})$$ samples, (ii) with $$\tilde{O}(\frac{SA t_{\mathrm{mix}}^5}{\epsilon^5})$$ samples, and (iii) with $$\tilde{O}(\frac{SA B}{\epsilon^9})$$ samples, where $$t_\mathrm{mix}$$ is the mixing time, and $$B > 0$$ is an MDP-dependent constant. To our knowledge, we provide the first finite-sample guarantees that are polynomial in $$S, A, t_{\mathrm{mix}}, \epsilon$$ for a feasible variant of $$Q$$-learning. That said, the sample complexity bounds have tremendous room for improvement, which we leave for the community’s best minds. Preliminary simulations verify that our framework is effective without prior knowledge of parameters as input.
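The over-parameterization claim in the ARX identification abstract above (OLS fitted with orders larger than the true ones) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' experiment: the helper names `simulate_arx` and `fit_arx_ols`, the true coefficients, the Gaussian input and noise, and the sample size are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_arx(T, a, b, noise_std=1.0, seed=0):
    """Simulate y[t] = a*y[t-1] + b*u[t-1] + noise (hypothetical first-order ARX)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=T)  # exogenous input, assumed i.i.d. Gaussian
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a * y[t - 1] + b * u[t - 1] + noise_std * rng.normal()
    return y, u


def fit_arx_ols(y, u, p, q):
    """Fit y[t] on p lags of y and q lags of u by ordinary least squares."""
    m = max(p, q)
    # Each row holds [y[t-1], ..., y[t-p], u[t-1], ..., u[t-q]].
    rows = [
        np.concatenate([y[t - p:t][::-1], u[t - q:t][::-1]])
        for t in range(m, len(y))
    ]
    X = np.asarray(rows)
    theta, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return theta[:p], theta[p:]


if __name__ == "__main__":
    y, u = simulate_arx(T=5000, a=0.5, b=1.0)
    # Deliberately over-parameterized: three lags of each, though one is true.
    ar_coef, ex_coef = fit_arx_ols(y, u, p=3, q=3)
    print("AR coefficients:       ", np.round(ar_coef, 3))
    print("Exogenous coefficients:", np.round(ex_coef, 3))
```

In a run like this one would expect the first coefficient of each group to land near the true values and the surplus lags to land near zero, which is the qualitative behavior the abstract describes. The simulation says nothing about the convergence rate itself.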