

Title: A Self-Normalized Approach to Confidence Interval Construction in Time Series
Summary

We propose a new method to construct confidence intervals for quantities associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of any user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed by using recursive estimates. Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement and can be readily used by practitioners. Monte Carlo simulations are conducted to compare the finite-sample performance of the new method with that of the normal approximation and the block bootstrap approach.
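To give a flavour of the self-normalization idea, the following sketch constructs an interval for the mean of a stationary series using recursive sample means as the normalizer, so that no bandwidth or block length is chosen. This is a minimal illustration, not the paper's full procedure: the critical value `crit` must come from published tables of the nonstandard limiting distribution, and the AR(1) demo data are made up.

```python
import numpy as np

def self_normalized_mean_ci(x, crit):
    """Self-normalized confidence interval for the mean of a stationary series.

    The normalizer is built from the recursive estimates xbar_1, ..., xbar_n,
    so no smoothing parameter enters.  `crit` is the quantile (for the desired
    level) of the limiting distribution W(1)^2 / int_0^1 (W(r) - r W(1))^2 dr,
    looked up from published tables and treated here as an input.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    rec_means = np.cumsum(x) / np.arange(1, n + 1)   # recursive sample means
    xbar = rec_means[-1]
    t = np.arange(1, n + 1)
    # self-normalizer: n^{-2} * sum_t t^2 (xbar_t - xbar_n)^2
    w = np.sum((t * (rec_means - xbar)) ** 2) / n**2
    half = np.sqrt(crit * w / n)
    return xbar - half, xbar + half

# demo on a simulated AR(1) series (illustrative data)
rng = np.random.default_rng(0)
x = np.empty(500)
x[0] = rng.normal()
for t in range(1, 500):
    x[t] = 0.5 * x[t - 1] + rng.normal()
lo, hi = self_normalized_mean_ci(x, crit=45.0)  # placeholder critical value
```

The interval is wider than a normal-approximation interval with a consistent variance estimator, which is the price paid for avoiding any user-chosen smoothing parameter.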

 
PAR ID: 10401385
Author(s) / Creator(s):
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume: 72
Issue: 3
ISSN: 1369-7412
Page Range / eLocation ID: p. 343-366
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    We propose a new procedure for inference on optimal treatment regimes in the model-free setting, which does not require specifying an outcome regression model. Existing model-free estimators for optimal treatment regimes are usually not suitable for the purpose of inference, because they either have nonstandard asymptotic distributions or do not necessarily guarantee consistent estimation of the parameter indexing the Bayes rule, owing to the use of a surrogate loss. We first study a smoothed robust estimator that directly targets the parameter corresponding to the Bayes decision rule for estimating optimal treatment regimes. This estimator is shown to have an asymptotically normal distribution. Furthermore, we verify that a resampling procedure provides asymptotically accurate inference for both the parameter indexing the optimal treatment regime and the optimal value function. A new algorithm is developed to calculate the proposed estimator with substantially improved speed and stability. Numerical results demonstrate the satisfactory performance of the new methods.

     
  2. Summary

    Decentralized wastewater treatment facilities monitor many features that are complexly related. The ability to detect the onset of a fault, and to accurately identify the variables that have shifted because of the fault, is vital to maintaining proper system operation and high-quality produced water. Various multivariate methods have been proposed to perform fault detection and isolation, but these methods require the data to be independent and identically distributed when the process is in control, and most require a distributional assumption. We propose a distribution-free retrospective change-point-detection method for autocorrelated and non-stationary multivariate processes. We detrend the data by using observations from an in-control time period to account for expected changes due to external or user-controlled factors. Next, we apply the fused lasso, which penalizes differences in consecutive observations, to detect faults and to identify shifted variables. To account for autocorrelation, the regularization parameter is chosen by using an estimated effective sample size in the extended Bayesian information criterion. We demonstrate the performance of our method compared with a competitor in simulation. Finally, we apply our method to wastewater treatment facility data with a known fault, and the variables identified by our proposed method are consistent with the operators' diagnosis of the fault's cause.
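The fused-lasso step can be sketched for a single monitored variable with a generic ADMM solver for the one-dimensional total-variation problem min_b ½||y − b||² + λ||Db||₁, where D is the first-difference matrix. This is an illustrative solver on made-up data, not the authors' implementation, which also detrends and chooses λ by an extended BIC with an effective sample size.

```python
import numpy as np

def fused_lasso_1d(y, lam, rho=1.0, n_iter=300):
    """ADMM for min_b 0.5*||y - b||^2 + lam*||D b||_1 (1-D fused lasso)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)        # (n-1) x n first-difference matrix
    A = np.eye(n) + rho * D.T @ D         # fixed system matrix for the b-update
    z = np.zeros(n - 1)
    u = np.zeros(n - 1)
    b = y.copy()
    for _ in range(n_iter):
        b = np.linalg.solve(A, y + rho * D.T @ (z - u))
        Db = D @ b
        a = Db + u
        z = np.sign(a) * np.maximum(np.abs(a) - lam / rho, 0.0)  # soft-threshold
        u = u + Db - z
    return b

# piecewise-constant toy signal with one shift at t = 50, plus noise
rng = np.random.default_rng(1)
y = np.concatenate([np.zeros(50), 5.0 * np.ones(50)]) + 0.3 * rng.normal(size=100)
beta = fused_lasso_1d(y, lam=2.0)
```

Because the penalty acts on consecutive differences, the fitted signal is piecewise constant, and the surviving nonzero differences mark the candidate fault times.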

     
  3. We develop differentially private hypothesis testing methods for the small-sample regime. Given a sample from a categorical distribution p over some domain Σ, an explicitly described distribution q over Σ, a privacy parameter ε, an accuracy parameter α, and requirements βI and βII for the type I and type II errors of our test, the goal is to distinguish between p=q and dTV(p,q)≥α. We provide theoretical bounds on the sample size required so that our method both satisfies (ε,0)-differential privacy and guarantees type I and type II errors of at most βI and βII. We show that differential privacy may come for free in some parameter regimes, and we always beat the sample complexity resulting from running the χ2-test with noisy counts, or standard approaches such as repetition for endowing non-private χ2-style statistics with differential privacy guarantees.  We experimentally compare the sample complexity of our method to that of recently proposed methods for private hypothesis testing. 
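The noisy-counts baseline that the paper improves upon can be sketched as follows. The Laplace scale follows from the histogram's L1 sensitivity of 2 (changing one sample moves two counts by one each); the concrete data and ε are illustrative assumptions, and no calibrated rejection threshold is computed here.

```python
import numpy as np

def noisy_chi2_statistic(counts, q, eps, rng):
    """eps-DP chi-square-style statistic for testing p = q.

    Adds Laplace(2/eps) noise to each count: a histogram has L1
    sensitivity 2, so the noisy counts are (eps, 0)-differentially
    private by the Laplace mechanism, and the statistic inherits
    privacy by post-processing.
    """
    counts = np.asarray(counts, dtype=float)
    q = np.asarray(q, dtype=float)
    n = counts.sum()
    noisy = counts + rng.laplace(scale=2.0 / eps, size=counts.size)
    expected = n * q                      # expected counts under q
    return float(np.sum((noisy - expected) ** 2 / expected))

rng = np.random.default_rng(2)
q = np.full(4, 0.25)                      # toy reference distribution
stat_null = noisy_chi2_statistic([2500, 2500, 2500, 2500], q, eps=1.0, rng=rng)
stat_far = noisy_chi2_statistic([10000, 0, 0, 0], q, eps=1.0, rng=rng)
```

A full test would reject p=q when the statistic exceeds a threshold chosen to meet the βI/βII requirements; the point of the paper is that this noisy-counts construction is sample-inefficient compared with its proposed test.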
  4. Abstract

    Motivated by applications in text mining and discrete distribution inference, we test for equality of probability mass functions of K groups of high-dimensional multinomial distributions. Special cases of this problem include global testing for topic models, two-sample testing in authorship attribution, and closeness testing for discrete distributions. A test statistic, which is shown to have an asymptotic standard normal distribution under the null hypothesis, is proposed. This parameter-free limiting null distribution holds true without requiring identical multinomial parameters within each group or equal group sizes. The optimal detection boundary for this testing problem is established, and the proposed test is shown to achieve this optimal detection boundary across the entire parameter space of interest. The proposed method is demonstrated in simulation studies and applied to analyse two real-world datasets to examine, respectively, variation among customer reviews of Amazon movies and the diversity of statistical paper abstracts.
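For intuition, the classical chi-square test of homogeneity, which this line of work extends to the high-dimensional multinomial regime, can be computed directly from a K × m table of counts. The solver and toy word counts below are illustrative assumptions, not the paper's statistic.

```python
import numpy as np

def chi2_homogeneity(counts):
    """Classical chi-square statistic for equality of K discrete
    distributions, given a K x m table of observed counts."""
    counts = np.asarray(counts, dtype=float)
    row = counts.sum(axis=1, keepdims=True)       # group totals
    col = counts.sum(axis=0, keepdims=True)       # category totals
    expected = row * col / counts.sum()           # counts under homogeneity
    return float(np.sum((counts - expected) ** 2 / expected))

# toy word counts for two groups over a 3-word vocabulary
stat_same = chi2_homogeneity([[50, 30, 20], [48, 33, 19]])   # similar groups
stat_diff = chi2_homogeneity([[80, 10, 10], [10, 80, 10]])   # different groups
```

With a fixed, small vocabulary this statistic is asymptotically chi-squared with (K−1)(m−1) degrees of freedom; the paper's contribution is a statistic whose standard normal null limit survives when the vocabulary is high-dimensional and group sizes and multinomial parameters are unequal.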

     
  5. Abstract

    We propose a model-based clustering method for high-dimensional longitudinal data via regularization in this paper. This study was motivated by the Trial of Activity in Adolescent Girls (TAAG), which aimed to examine multilevel factors related to the change of physical activity by following up a cohort of 783 girls over 10 years from adolescence to early adulthood. Our goal is to identify the intrinsic grouping of subjects with similar patterns of physical activity trajectories and the most relevant predictors within each group. The previous analyses conducted clustering and variable selection in two steps, whereas our new method can perform the tasks simultaneously. Within each cluster, a linear mixed-effects model (LMM) is fitted with a doubly penalized likelihood to induce sparsity for parameter estimation and effect selection. The large-sample joint properties are established, allowing the dimensions of both fixed and random effects to increase at an exponential rate in the sample size, with a general class of penalty functions. Assuming subjects are drawn from a Gaussian mixture distribution, model effects and cluster labels are estimated via a coordinate descent algorithm nested inside the Expectation-Maximization (EM) algorithm. The Bayesian information criterion (BIC) is used to determine the optimal number of clusters and the values of tuning parameters. Our numerical studies show that the new method has satisfactory performance and is able to accommodate complex data with multilevel and/or longitudinal effects.
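The EM-plus-BIC workflow (though not the penalized linear mixed-effects model itself) can be illustrated with a plain one-dimensional Gaussian mixture. The quantile-based initialization, iteration count, variance floor, and toy data are all assumptions made for this sketch.

```python
import numpy as np

def gmm1d_bic(x, k, n_iter=100):
    """Fit a 1-D Gaussian mixture with k components by EM; return its BIC."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means
    sig = np.full(k, x.std() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities from stabilized log-densities
        logd = (np.log(pi) - np.log(sig) - 0.5 * np.log(2 * np.pi)
                - 0.5 * ((x[:, None] - mu) / sig) ** 2)
        m = logd.max(axis=1, keepdims=True)
        r = np.exp(logd - m)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted means, standard deviations, mixing proportions
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sig = np.maximum(sig, 1e-3)                 # guard against collapse
        pi = nk / n
    # final log-likelihood and BIC (k means, k sds, k-1 free weights)
    logd = (np.log(pi) - np.log(sig) - 0.5 * np.log(2 * np.pi)
            - 0.5 * ((x[:, None] - mu) / sig) ** 2)
    m = logd.max(axis=1, keepdims=True)
    loglik = float(np.sum(m.squeeze() + np.log(np.exp(logd - m).sum(axis=1))))
    return (3 * k - 1) * np.log(n) - 2 * loglik

# two well-separated toy clusters; BIC is compared across candidate k
rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(6, 1, 200)])
bics = {k: gmm1d_bic(x, k) for k in (1, 2, 3)}
best_k = min(bics, key=bics.get)
```

Minimizing BIC over k trades off fit against the log(n)-scaled parameter count, which is the same principle the paper uses to choose both the number of clusters and the tuning parameters of its doubly penalized likelihood.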

     