Title: Homogeneity tests of covariance matrices with high-dimensional longitudinal data
Summary: This paper deals with the detection and identification of changepoints among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for changepoint detection, and its asymptotic distribution is established. If a changepoint is detected, an estimate of the location is provided. The rate of convergence of the estimator is shown to depend on the data dimension, sample size, and signal-to-noise ratio. Binary segmentation is used to estimate the locations of possibly multiple changepoints, and the corresponding estimator is shown to be consistent under mild conditions. Simulation studies provide the empirical size and power of the proposed test and the accuracy of the changepoint estimator. An application to a time-course microarray dataset identifies gene sets with significant gene interaction changes over time.
Award ID(s):
1820702
PAR ID:
10146724
Journal Name:
Biometrika
Volume:
106
Issue:
3
ISSN:
0006-3444
Page Range / eLocation ID:
619 to 634
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We propose the multiple changepoint isolation (MCI) method for detecting multiple changes in the mean and covariance of a functional process. We first introduce a pair of projections to represent the variability “between” and “within” the functional observations. We then present an augmented fused lasso procedure to split the projections into multiple regions robustly. These regions act to isolate each changepoint away from the others so that the powerful univariate CUSUM statistic can be applied region‐wise to identify the changepoints. Simulations show that our method accurately detects the number and locations of changepoints under many different scenarios. These include light and heavy tailed data, data with symmetric and skewed distributions, sparsely and densely sampled changepoints, and mean and covariance changes. We show that our method outperforms a recent multiple functional changepoint detector and several univariate changepoint detectors applied to our proposed projections. We also show that MCI is more robust than existing approaches and scales linearly with sample size. Finally, we demonstrate our method on a large time series of water vapor mixing ratio profiles from atmospheric emitted radiance interferometer measurements. 
  2. In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints. This rate is seen to be suboptimal (compared to the minimax rate) by only a factor of log log n. Our proof technique is centered around a novel construction that we call a lower interpolant. We extend our results to misspecified models and exponential family distributions. We also describe the implications of our error analysis for the approximate screening of changepoints.
  3. This paper explores the use of changepoint detection (CPD) for an improved time-localization of forced oscillations (FOs) in measured power system data. In order for the autoregressive moving average plus sinusoids (ARMA+S) class of electromechanical mode meters to successfully estimate modal frequency and damping from data that contains a FO, accurate estimates of where the FO exists in the time series are needed. Compared to the existing correlation-based method, the proposed CPD method is based upon a maximum likelihood estimator (MLE) for the detection of an unknown number of changes in signal mean to unknown levels at unknown times. Using the pruned exact linear time (PELT) dynamic programming algorithm along with a novel refinement technique, the proposed approach is shown to provide a dramatic improvement in FO start/stop time estimation accuracy while being robust to intermittent FOs. These findings were supported through simulations with the minniWECC model.
  4. Large-scale software exhibits periods of increased defect discovery when blocks of less thoroughly tested code are introduced into an existing codebase. For example, the mission systems schedule of software-intensive government acquisition programs includes multiple overlapping software blocks associated with various capabilities. Software reliability researchers have proposed changepoint models to characterize periods of increased defect discovery. However, these models attempt to identify the location of these changepoints after testing has been performed, which is counter-intuitive because conscious decisions such as testing new functionality drive software changepoints. Existing changepoint models are therefore difficult to employ in a predictive manner. To overcome this limitation, this paper proposes a covariate software defect discovery model capable of explaining changepoints in terms of common software testing activities and metrics such as software size estimation, code coverage, and defect density. The proposed and past changepoint models are compared with respect to their predictive accuracy and computational efficiency. Our results indicate that the proposed approach is more computationally efficient and enables accurate prediction of the time needed to achieve a desired defect discovery intensity or mean time to failure despite the occurrence of changepoints during software testing.
  5. This paper proposes an iterative method of estimating power system forced oscillation (FO) amplitude, frequency, phase, and start/stop times from measured data. It combines three algorithms with favorable asymptotic statistical properties: a periodogram-based iterative frequency estimator, a Discrete-Time Fourier Transform (DTFT)-based method of estimating amplitude and phase, and a changepoint detection (CPD) method for estimating the FO start and stop samples. Each of these has been shown in the literature to be an approximate maximum likelihood estimator (MLE), meaning that for large enough sample size or signal-to-noise ratio (SNR), it can be unbiased and reach the Cramér-Rao Lower Bound in variance. The proposed method is shown through Monte Carlo simulations of a low-order model of the Western Electricity Coordinating Council (WECC) power system to achieve statistical efficiency for low SNR values. The proposed method is validated with data measured from the January 11, 2019 US Eastern Interconnection (EI) FO event. It is shown to accurately extract the FO parameters and remove electromechanical mode meter bias, even with a time-varying FO amplitude.
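
Several of the abstracts above mention CUSUM statistics and binary segmentation for locating multiple changepoints. As a rough illustration only, the following Python sketch applies a standardized CUSUM statistic recursively to a univariate series with changes in mean. It is not the covariance test statistic of the Biometrika paper, nor the MCI or PELT procedures; the function names, the `threshold` value, the `min_len` guard, and the unit-noise-variance assumption are all hypothetical choices made for this example.

```python
import numpy as np

def cusum_split(x):
    """Return (k, stat): the split index maximizing the standardized CUSUM
    statistic for a single mean change, and the value of that maximum.
    Assumes (hypothetically) unit noise variance, so no rescaling is done."""
    n = len(x)
    total = x.sum()
    partial = np.cumsum(x)[:-1]      # partial sums S_1, ..., S_{n-1}
    k = np.arange(1, n)              # candidate split points
    # |S_k - (k/n) S_n| / sqrt(k (n - k) / n)
    stat = np.abs(partial - k * total / n) / np.sqrt(k * (n - k) / n)
    j = int(np.argmax(stat))
    return j + 1, float(stat[j])

def binary_segmentation(x, threshold, min_len=5, offset=0):
    """Recursively split x wherever the CUSUM maximum exceeds `threshold`.
    Returns a sorted list of estimated changepoint indices: a value c means
    the mean changes between x[c-1] and x[c] in the original series."""
    x = np.asarray(x, dtype=float)
    if len(x) < 2 * min_len:
        return []                    # segment too short to split
    k, stat = cusum_split(x)
    if stat < threshold or k < min_len or len(x) - k < min_len:
        return []                    # no significant change in this segment
    left = binary_segmentation(x[:k], threshold, min_len, offset)
    right = binary_segmentation(x[k:], threshold, min_len, offset + k)
    return left + [offset + k] + right

# Noise-free piecewise-constant signal with mean changes at indices 50 and 100.
signal = np.concatenate([np.zeros(50), 5.0 * np.ones(50), np.zeros(50)])
print(binary_segmentation(signal, threshold=3.0))  # -> [50, 100]
```

In practice the threshold would be calibrated from the null distribution of the test statistic (or by a penalty, as in PELT) rather than fixed by hand, and noisy data would need a variance estimate for the standardization.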