We consider the problem of detecting a change in mean in a sequence of high-dimensional Gaussian vectors. The change in mean may be occurring simultaneously in an unknown subset components. We propose a hypothesis test to detect the presence of a change-point and establish the detection boundary in different regimes under the assumption that the dimension tends to infinity and the length of the sequence grows with the dimension. A remarkable feature of the proposed test is that it does not require any knowledge of the subset of components in which the change in mean is occurring and yet automatically adapts to yield optimal rates of convergence over a wide range of statistical regimes.
more »
« less
High Dimensional Change Point Inference: Recent Developments and Extensions
Change point analysis aims to detect structural changes in a data sequence. It has always been an active research area since it was introduced in the 1950s. In modern statistical applications, however, high-throughput data with increasing dimensions are ubiquitous in fields ranging from economics, finance to genetics and engineering. For those problems, the earlier works are typically no longer applicable. As a result, the problem of testing a change point for high dimensional data sequences has been an important yet challenging task. In this paper, we first focus on models for at most one change point, and review recent state-of-art techniques for change point testing of high dimensional mean vectors and compare their theoretical properties. Based on that, we provide a survey of some extensions to general high dimensional parameters beyond mean vectors as well as strategies for testing multiple change points in high dimensions. Finally, we discuss some open problems for possible future research directions.
more »
« less
- Award ID(s):
- 2100729
- PAR ID:
- 10345675
- Date Published:
- Journal Name:
- Journal of Multivariate Analysis
- Volume:
- 188
- ISSN:
- 0047-259X
- Page Range / eLocation ID:
- 104833
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Cumulative sum (CUSUM) statistics are widely used in the change point inference and identification. For the problem of testing for existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data dependent and it has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that with a boundary removal parameter the bootstrap CUSUM test enjoys the uniform validity in size under the null and it achieves the minimax separation rate under the sparse alternatives when the dimension p can be larger than the sample size n. Once a change point is detected, we estimate the change point location by maximising the ℓ∞-norm of the generalised CUSUM statistics at two different weighting scales corresponding to covariance stationary and non-stationary CUSUM statistics. For both estimators, we derive their rates of convergence and show that dimension impacts the rates only through logarithmic factors, which implies that consistency of the CUSUM estimators is possible when p is much larger than n. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm to dynamically adjust the change point detection rule and recursively estimate their locations. We derive its rate of convergence under suitable signal separation and strength conditions. The results derived in this paper are non-asymptotic and we provide extensive simulation studies to assess the finite sample performance. The empirical evidence shows an encouraging agreement with our theoretical results.more » « less
-
Binary decision diagrams (BDDs) have been a huge success story in hardware and software verification and are increasingly applied to a wide range of combinatorial problems. While BDDs can encode boolean-valued functions of boolean-valued variables, many BDD variants have been proposed, not just to improve their efficiency, but to manage multivalued domains (a straightforward extension), multivalued ranges (using several competitive alternatives), and two-dimensional data (relations and matrices instead of sets or vectors). Orthogonally to these extensions, much effort has been spent on variable order heuristics, an essential aspect that can affect memory and time requirements by up to an exponential factor. We survey some of these exciting results and discuss some fruitful research directions for further work. Index Terms—Binary decision diagrams, canonicity, discrete function encoding, variable order heuristics, Markov chainsmore » « less
-
We study the problem of estimating at a central server the mean of a set of vectors distributed across several nodes (one vector per node). When the vectors are high-dimensional, the communication cost of sending entire vectors may be prohibitive, and it may be imperative for them to use sparsification techniques. While most existing work on sparsified mean estimation is agnostic to the characteristics of the data vectors, in many practical applications such as federated learning, there may be spatial correlations (similarities in the vectors sent by different nodes) or temporal correlations (similarities in the data sent by a single node over different iterations of the algorithm) in the data vectors. We leverage these correlations by simply modifying the decoding method used by the server to estimate the mean. We provide an analysis of the resulting estimation error as well as experiments for PCA, K-Means and Logistic Regression, which show that our estimators consistently outperform more sophisticated and expensive sparsification methods.more » « less
-
A Backward Procedure for Change-point Detection with Application to Copy Number Variation Detection.Yao, Fang (Ed.)Change-point detection regains much attention recently for analyzing array or sequencing data for copy number variation (CNV) detection. In such applications, the true signals are typically very short and buried in the long data sequence, which makes it challenging to identify the variations efficiently and accurately. In this article, we propose a new change-point detection method, a backward procedure, which is not only fast and simple enough to exploit high-dimensional data but also performs very well for detecting short signals. Although motivated by CNV detection, the backward procedure is generally applicable to assorted change-point problems that arise in a variety of scientific applications. It is illustrated by both simulated and real CNV data that the backward detection has clear advantages over other competing methods, especially when the true signal is short.more » « less
An official website of the United States government

