NSF PAR Search | NSF Public Access Repository

Finite Sample Change Point Inference and Identification for High-Dimensional Mean Vectors

https://doi.org/10.1111/rssb.12406

Yu, Mengjia; Chen, Xiaohui (December 2020, Journal of the Royal Statistical Society Series B: Statistical Methodology)

Abstract: Cumulative sum (CUSUM) statistics are widely used in change point inference and identification. For the problem of testing for the existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data dependent and has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that, with a boundary removal parameter, the bootstrap CUSUM test enjoys uniform validity in size under the null and achieves the minimax separation rate under sparse alternatives when the dimension p can be larger than the sample size n. Once a change point is detected, we estimate its location by maximising the ℓ∞-norm of the generalised CUSUM statistics at two different weighting scales, corresponding to covariance stationary and non-stationary CUSUM statistics. For both estimators, we derive their rates of convergence and show that the dimension affects the rates only through logarithmic factors, which implies that consistency of the CUSUM estimators is possible when p is much larger than n. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm that dynamically adjusts the change point detection rule and recursively estimates the change point locations. We derive its rate of convergence under suitable signal separation and strength conditions. The results derived in this paper are non-asymptotic, and we provide extensive simulation studies to assess the finite sample performance. The empirical evidence shows an encouraging agreement with our theoretical results.
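The test described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default values of `alpha` and `n_boot`, and the exact form of the bootstrap statistic (multiplier weights applied to segment-centered observations at each candidate split) are assumptions made for illustration. The parameter `s0` plays the role of the boundary removal parameter, and the returned split is the maximiser of the ℓ∞-norm of the CUSUM vectors.

```python
import numpy as np


def sup_norm_stat(Z, s0):
    """Max l-infinity norm over splits s0..n-s0; returns (value, split)."""
    n = Z.shape[0] + 1                      # row s-1 of Z belongs to split s
    norms = np.abs(Z[s0 - 1 : n - s0]).max(axis=1)
    return float(norms.max()), int(norms.argmax()) + s0


def bootstrap_cusum_test(X, s0, alpha=0.05, n_boot=200, seed=0):
    """Return (statistic, bootstrap critical value, estimated split)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    s = np.arange(1, n)[:, None]            # candidate splits 1..n-1
    csum = np.cumsum(X, axis=0)
    left_mean = csum[:-1] / s               # mean of rows 0..s-1
    right_mean = (csum[-1] - csum[:-1]) / (n - s)
    scale = np.sqrt(s * (n - s) / n)        # standard CUSUM weighting
    stat, split = sup_norm_stat(scale * (left_mean - right_mean), s0)

    boot = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal(n)          # Gaussian multipliers
        ce = np.cumsum(e)[:, None]
        ceX = np.cumsum(e[:, None] * X, axis=0)
        # Weighted sums of observations centered by their own segment mean
        # (an assumed form of the bootstrap CUSUM, chosen for illustration).
        left = (ceX[:-1] - ce[:-1] * left_mean) / s
        right = ((ceX[-1] - ceX[:-1])
                 - (ce[-1] - ce[:-1]) * right_mean) / (n - s)
        boot[b], _ = sup_norm_stat(scale * (left - right), s0)

    crit = float(np.quantile(boot, 1 - alpha))
    return stat, crit, split
```

Under these assumptions, a change point is declared when the statistic exceeds the critical value, and `split` then serves as the location estimate.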
Stratified incomplete local simplex tests for curvature of nonparametric multiple regression

https://doi.org/10.3150/22-BEJ1459

Song, Yanglei; Chen, Xiaohui; Kato, Kengo (February 2023, Bernoulli)

Full Text Available
Robust Inference for Partially Observed Functional Response Data

https://doi.org/10.5705/ss.202020.0358

Park, Yeonjoo; Chen, Xiaohui; Simpson, Douglas (January 2023, Statistica Sinica)

Full Text Available
Sketch-and-Lift: Scalable Subsampled Semidefinite Program for K-means Clustering

Zhuang, Yubo; Chen, Xiaohui; Yang, Yun (March 2022, International Conference on Artificial Intelligence and Statistics)

Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite their high accuracy, semidefinite programs are often too slow in practice and scale poorly to large (or even moderate) datasets. In this paper, we introduce a linear-time algorithm for approximating an SDP-relaxed K-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. It is shown that the SL approach enjoys an exact recovery threshold similar to that of the K-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive, with enhanced theoretical properties, when the cluster sizes are unbalanced. Our simulation experiments demonstrate that the statistical accuracy of the proposed method outperforms state-of-the-art fast clustering algorithms without sacrificing too much computational efficiency, and is comparable to the original K-means SDP with substantially reduced runtime.
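The two-stage structure described in this abstract (cluster a subsample, then lift to all points by nearest-centroid rounding) can be sketched as follows. This is a rough illustration only: the SDP solve on the subsample is replaced here by plain Lloyd iterations with a farthest-point initialization, so the sketch shows the subsample-then-lift pattern rather than the paper's method. The names `sketch_and_lift` and `nearest_centroid` and the parameters `m`, `n_iter`, and `seed` are hypothetical.

```python
import numpy as np


def nearest_centroid(X, centers):
    """Label each row of X with the index of its closest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def sketch_and_lift(X, k, m, n_iter=20, seed=0):
    """Cluster a size-m subsample, then lift labels to all n points."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    S = X[idx]                                   # the "sketch"

    # Farthest-point initialization on the subsample.
    centers = [S[0]]
    for _ in range(1, k):
        d = np.min([((S - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(S[int(d.argmax())])
    centers = np.array(centers)

    # Stand-in for the SDP solve: Lloyd iterations on the subsample only.
    for _ in range(n_iter):
        labels = nearest_centroid(S, centers)
        for j in range(k):
            if np.any(labels == j):              # keep old center if empty
                centers[j] = S[labels == j].mean(axis=0)

    # "Lift": one nearest-centroid pass over all points, linear in n.
    return nearest_centroid(X, centers), centers
```

In this sketch the expensive step touches only the `m` subsampled points, while the full dataset is visited once in the final rounding pass.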
Full Text Available
A robust bootstrap change point test for high-dimensional location parameter

https://doi.org/10.1214/21-EJS1915

Yu, Mengjia; Chen, Xiaohui (January 2022, Electronic Journal of Statistics)

Full Text Available
Mean-field nonparametric estimation of interacting particle systems

Yao, Rentian; Chen, Xiaohui; Yang, Yun (January 2022, Proceedings of Thirty Fifth Conference on Learning Theory)

Full Text Available
Cutoff for Exact Recovery of Gaussian Mixture Models

https://doi.org/10.1109/TIT.2021.3063155

Chen, Xiaohui; Yang, Yun (June 2021, IEEE Transactions on Information Theory)
Full Text Available
Hanson–Wright inequality in Hilbert spaces with application to K-means clustering for non-Euclidean data

https://doi.org/10.3150/20-BEJ1251

Chen, Xiaohui; Yang, Yun (February 2021, Bernoulli)
Full Text Available
Maximum likelihood estimation of potential energy in interacting particle systems from single-trajectory data

https://doi.org/10.1214/21-ECP416

Chen, Xiaohui (January 2021, Electronic Communications in Probability)

Full Text Available
Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications

https://doi.org/10.1007/s00440-019-00936-y

Chen, Xiaohui; Kato, Kengo (April 2020, Probability Theory and Related Fields)
Full Text Available
