Abstract We introduce a novel geometry‐oriented methodology, based on the emerging tools of topological data analysis, into the change‐point detection framework. The key rationale is that change points are likely to be associated with changes in geometry behind the data‐generating process. While the applications of topological data analysis to change‐point detection are potentially very broad, in this paper, we primarily focus on integrating topological concepts with the existing nonparametric methods for change‐point detection. In particular, the proposed new geometry‐oriented approach aims to enhance detection accuracy of distributional regime shift locations. Our simulation studies suggest that integration of topological data analysis with some existing algorithms for change‐point detection leads to consistently more accurate detection results. We illustrate our new methodology in application to the two closely related environmental time series data sets—ice phenology of the Lake Baikal and the North Atlantic Oscillation indices, in a research query for a possible association between their estimated regime shift locations. 
                        more » 
                        « less   
                    
                            
                            An encoding approach for stable change point detection
                        
                    
    
            Abstract Without imposing prior distributional knowledge underlying multivariate time series of interest, we propose a nonparametric change-point detection approach to estimate the number of change points and their locations along the temporal axis. We develop a structural subsampling procedure such that the observations are encoded into multiple sequences of Bernoulli variables. A maximum likelihood approach in conjunction with a newly developed searching algorithm is implemented to detect change points on each Bernoulli process separately. Then, aggregation statistics are proposed to collectively synthesize change-point results from all individual univariate time series into consistent and stable location estimations. We also study a weighting strategy to measure the degree of relevance for different subsampled groups. Simulation studies are conducted and shown that the proposed change-point methodology for multivariate time series has favorable performance comparing with currently available state-of-the-art nonparametric methods under various settings with different degrees of complexity. Real data analyses are finally performed on categorical, ordinal, and continuous time series taken from fields of genetics, climate, and finance. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1934568
- PAR ID:
- 10555969
- Publisher / Repository:
- Springer Nature
- Date Published:
- Journal Name:
- Machine Learning
- Volume:
- 113
- Issue:
- 7
- ISSN:
- 0885-6125
- Page Range / eLocation ID:
- 4133 to 4163
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Information from frequency bands in biomedical time series provides useful summaries of the observed signal. Many existing methods consider summaries of the time series obtained over a few well-known, pre-defined frequency bands of interest. However, there is a dearth of data-driven methods for identifying frequency bands that optimally summarize frequency-domain information in the time series. A new method to identify partition points in the frequency space of a multivariate locally stationary time series is proposed. These partition points signify changes across frequencies in the time-varying behavior of the signal and provide frequency band summary measures that best preserve nonstationary dynamics of the observed series. An $$L_2$$-norm based discrepancy measure that finds differences in the time-varying spectral density matrix is constructed, and its asymptotic properties are derived. New nonparametric bootstrap tests are also provided to identify significant frequency partition points and to identify components and cross-components of the spectral matrix exhibiting changes over frequencies. Finite-sample performance of the proposed method is illustrated via simulations. The proposed method is used to develop optimal frequency band summary measures for characterizing time-varying behavior in resting-state electroencephalography time series, as well as identifying components and cross-components associated with each frequency partition point.more » « less
- 
            Change point detection is widely used for finding transitions between states of data generation within a time series. Methods for change point detection currently assume this transition is instantaneous and therefore focus on finding a single point of data to classify as a change point. However, this assumption is flawed because many time series actually display short periods of transitions between different states of data generation. Previous work has shown Bayesian Online Change Point Detection (BOCPD) to be the most effective method for change point detection on a wide range of different time series. This paper explores adapting the change point detection algorithms to detect abrupt changes over short periods of time. We design a segment-based mechanism to examine a window of data points within a time series, rather than a single data point, to determine if the window captures abrupt change. We test our segment-based Bayesian change detection algorithm on 36 different time series and compare it to the original BOCPD algorithm. Our results show that, for some of these 36 time series, the segment-based approach for detecting abrupt changes can much more accurately identify change points based on standard metrics.more » « less
- 
            Abstract We consider the problem of estimating multiple change points for a functional data process. There are numerous examples in science and finance in which the process of interest may be subject to some sudden changes in the mean. The process data that are not in a close vicinity of any change point can be analysed by the usual nonparametric smoothing methods. However, the data close to change points and contain the most pertinent information of structural breaks need to be handled with special care. This paper considers a half-kernel approach that addresses the inference of the total number, locations and jump sizes of the changes. Convergence rates and asymptotic distributional results for the proposed procedures are thoroughly investigated. Simulations are conducted to examine the performance of the approach, and a number of real data sets are analysed to provide an illustration.more » « less
- 
            Abstract Most current algorithms for multivariate time series classification tend to overlook the correlations between time series of different variables. In this research, we propose a framework that leverages Eigen-entropy along with a cumulative moving window to derive time series signatures to support the classification task. These signatures are enumerations of correlations among different time series considering the temporal nature of the dataset. To manage dataset’s dynamic nature, we employ preprocessing with dense multi scale entropy. Consequently, the proposed framework, Eigen-entropy-based Time Series Signatures, captures correlations among multivariate time series without losing its temporal and dynamic aspects. The efficacy of our algorithm is assessed using six binary datasets sourced from the University of East Anglia, in addition to a publicly available gait dataset and an institutional sepsis dataset from the Mayo Clinic. We use recall as the evaluation metric to compare our approach against baseline algorithms, including dependent dynamic time warping with 1 nearest neighbor and multivariate multi-scale permutation entropy. Our method demonstrates superior performance in terms of recall for seven out of the eight datasets.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    