We consider the problem of constructing asymptotically valid confidence intervals for the change point in a high-dimensional covariance shift setting. A novel estimator for the change point parameter is developed, and its asymptotic distribution under high-dimensional scaling obtained. We establish that the proposed estimator exhibits a sharp O_p(ψ^{-2}) rate of convergence, wherein ψ represents the jump size between model parameters before and after the change point. Further, the form of the asymptotic distributions under both a vanishing and a non-vanishing regime of the jump size are characterized. In the former case, it corresponds to the argmax of an asymmetric Brownian motion, while in the latter case to the argmax of an asymmetric random walk. We then obtain the relationship between these distributions, which allows construction of regime (vanishing vs non-vanishing) adaptive confidence intervals. Easy-to-implement algorithms for the proposed methodology are developed and their performance illustrated on synthetic and real data sets.
This content will become publicly available on May 1, 2026
Inference for change points in high-dimensional mean shift models
We consider the problem of constructing confidence intervals for the locations of change points in a high-dimensional mean shift model. We develop a locally refitted least squares estimator and obtain component-wise and simultaneous rates of estimation of change points. The simultaneous rate is the sharpest available by at least a factor of log p, while the component-wise one is optimal. These results enable the existence of limiting distributions for the locations of the change points. Subsequently, component-wise distributions are characterized under both vanishing and non-vanishing jump size regimes, while joint distributions of change point estimates are characterized under the latter regime, which also yields asymptotic independence of these estimates. We provide the relationship between these distributions, which allows construction of regime-adaptive confidence intervals. All results are established under a high-dimensional scaling, in the presence of a diverging number of change points. They are illustrated on synthetic data and on sensor measurements from smartphones for activity recognition.
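As a rough illustration of the least squares idea behind estimators of this kind, the sketch below locates a single change point in a multivariate mean shift by minimizing the two-segment residual sum of squares. It is a simplification under stated assumptions: one change point, no sparsity or regularization, and no local refitting, so it is not the locally refitted estimator described above. The function name `ls_change_point` and the simulated data are hypothetical.

```python
import numpy as np


def ls_change_point(x):
    """Return tau in 1..n-1 minimizing the two-segment residual sum of squares.

    x has shape (n, p); rows 0..tau-1 form the first segment.
    Illustrative only: assumes exactly one change point.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    csum = np.cumsum(x, axis=0)          # prefix sums over time
    taus = np.arange(1, n)               # candidate split points
    left_sum = csum[:-1]                 # sum of the first tau rows
    right_sum = csum[-1] - left_sum      # sum of the remaining n - tau rows
    # Minimizing the RSS is equivalent to maximizing this between-segment term.
    gain = (left_sum ** 2).sum(axis=1) / taus \
        + (right_sum ** 2).sum(axis=1) / (n - taus)
    return int(taus[np.argmax(gain)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)       # simulated data, for illustration only
    n, p, tau0 = 200, 50, 120
    x = rng.normal(size=(n, p))
    x[tau0:, :5] += 1.0                  # mean shift in the first 5 coordinates
    print("estimated change point:", ls_change_point(x))
```

The prefix sums make every candidate split cost O(p), so the whole search is O(np). Confidence intervals would additionally require the limiting distributions discussed in the abstract, which this sketch does not attempt.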
- Award ID(s): 2348640
- PAR ID: 10610983
- Publisher / Repository: https://www3.stat.sinica.edu.tw/statistica/J35n2/j35n223/j35n223.html
- Date Published:
- Journal Name: Statistica Sinica
- ISSN: 1017-0405
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Gridded datasets occur in several domains. These datasets comprise (un)structured grid points, where each grid point is characterized by XY(Z) coordinates in a spatial referencing system. The data available at individual grid points are high-dimensional, encapsulating multiple variables of interest. This study has two thrusts. The first targets supporting effective management of voluminous gridded datasets while reconciling challenges relating to colocation and dispersion. The second thrust is to support sliding (temporal) window queries over the gridded dataset. Such queries involve sliding a temporal window over the data to identify spatial locations and chronological time points where the specified predicate evaluates to true. Our methodology includes support for a space-efficient data structure for organizing information within the data, query decomposition based on dyadic intervals, support for temporal anchoring, query transformations, and effective evaluation of query predicates. Our empirical benchmarks are conducted on representative voluminous high-dimensional datasets such as gridMET (historical meteorological data) and MACA (future climate datasets based on the RCP 8.5 greenhouse gas trajectory). In our benchmarks, our system can handle throughputs of over 3000 multi-predicate sliding window queries per second.
-
We offer a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce element-wise and spectrum-wise truncation operators, as well as their M-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness that is characterized by the breakdown property, we focus on the tail robustness, which is evidenced by the connection between nonasymptotic deviation and confidence level. The key observation is that the estimators need to adapt to the sample size, dimensionality of the data and the noise level to achieve an optimal tradeoff between bias and robustness. Furthermore, to facilitate their practical use, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods.
-
Scaling regions—intervals on a graph where the dependent variable depends linearly on the independent variable—abound in dynamical systems, notably in calculations of invariants like the correlation dimension or a Lyapunov exponent. In these applications, scaling regions are generally selected by hand, a process that is subjective and often challenging due to noise, algorithmic effects, and confirmation bias. In this paper, we propose an automated technique for extracting and characterizing such regions. Starting with a two-dimensional plot—e.g., the values of the correlation integral, calculated using the Grassberger–Procaccia algorithm over a range of scales—we create an ensemble of intervals by considering all possible combinations of end points, generating a distribution of slopes from least squares fits weighted by the length of the fitting line and the inverse square of the fit error. The mode of this distribution gives an estimate of the slope of the scaling region (if it exists). The end points of the intervals that correspond to the mode provide an estimate for the extent of that region. When there is no scaling region, the distributions will be wide and the resulting error estimates for the slope will be large. We demonstrate this method for computations of dimension and Lyapunov exponent for several dynamical systems and show that it can be useful in selecting values for the parameters in time-delay reconstructions.
-
Rosen, D (Ed.)
This paper proposes a new test for a change point in the mean of high-dimensional data based on the spatial sign and self-normalization. The test is easy to implement with no tuning parameters, robust to heavy-tailedness and theoretically justified with both fixed-n and sequential asymptotics under both the null and alternatives, where n is the sample size. We demonstrate that the fixed-n asymptotics provide a better approximation to the finite sample distribution and thus should be preferred in both testing and testing-based estimation. To estimate the number and locations when multiple change-points are present, we propose to combine the p-value under the fixed-n asymptotics with the seeded binary segmentation (SBS) algorithm. Through numerical experiments, we show that the spatial sign based procedures are robust with respect to heavy-tailedness and strong coordinate-wise dependence, whereas their non-robust counterparts proposed in Wang et al. (2022) [28] appear to under-perform. A real data example is also provided to illustrate the robustness and broad applicability of the proposed test and its corresponding estimation algorithm.
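To make the role of the spatial sign concrete, the sketch below projects each observation onto the unit sphere and then locates a single change point with a standard weighted CUSUM statistic on the transformed data. This is an assumption-laden illustration, not the self-normalized test from the abstract: it produces no p-value, uses no self-normalization, and does not implement seeded binary segmentation. The function names `spatial_sign` and `sign_cusum_location` are hypothetical.

```python
import numpy as np


def spatial_sign(x):
    """Map each row to x_i / ||x_i||; all-zero rows stay at zero."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return np.divide(x, norms, out=np.zeros_like(x), where=norms > 0)


def sign_cusum_location(x):
    """Return the split k in 1..n-1 maximizing a weighted CUSUM of spatial signs.

    Illustrative only: assumes at most one change point.
    """
    s = spatial_sign(x)
    n = s.shape[0]
    csum = np.cumsum(s, axis=0)
    k = np.arange(1, n)
    # Partial sum of the first k signs minus its share of the overall sum.
    cusum = csum[:-1] - (k / n)[:, None] * csum[-1]
    weights = np.sqrt(n / (k * (n - k)))
    stat = weights * np.linalg.norm(cusum, axis=1)
    return int(k[np.argmax(stat)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)           # simulated heavy-tailed data
    n, p, k0 = 200, 20, 80
    x = rng.standard_t(df=2, size=(n, p))    # t distribution with 2 degrees of freedom
    x[k0:] += 1.5                            # mean shift after observation k0
    print("estimated change point:", sign_cusum_location(x))
```

Because every transformed observation has norm at most one, a single extreme row cannot dominate the statistic, which is the intuition behind the robustness to heavy tails mentioned above.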
