skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Harnessing the power of topological data analysis to detect change points
Abstract We introduce a novel geometry‐oriented methodology, based on the emerging tools of topological data analysis, into the change‐point detection framework. The key rationale is that change points are likely to be associated with changes in geometry behind the data‐generating process. While the applications of topological data analysis to change‐point detection are potentially very broad, in this paper, we primarily focus on integrating topological concepts with the existing nonparametric methods for change‐point detection. In particular, the proposed new geometry‐oriented approach aims to enhance detection accuracy of distributional regime shift locations. Our simulation studies suggest that integration of topological data analysis with some existing algorithms for change‐point detection leads to consistently more accurate detection results. We illustrate our new methodology in application to the two closely related environmental time series data sets—ice phenology of the Lake Baikal and the North Atlantic Oscillation indices, in a research query for a possible association between their estimated regime shift locations.  more » « less
Award ID(s):
1925346 1824716 1736368
PAR ID:
10127861
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Environmetrics
Volume:
31
Issue:
1
ISSN:
1180-4009
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent proliferation of cryptocurrencies that allow for pseudo-anonymous transactions has resulted in a spike of various e-crime activities and, particularly, cryptocurrency payments in hacking attacks demanding ransom by encrypting sensitive user data. Currently, most hackers use Bitcoin for payments, and existing ransomware detection tools depend only on a couple of heuristics and/or tedious data gathering steps. By capitalizing on the recent advances in Topological Data Analysis, we propose a novel efficient and tractable framework to automatically predict new ransomware transactions in a ransomware family, given only limited records of past transactions. Moreover, our new methodology exhibits high utility to detect emergence of new ransomware families, that is, detecting ransomware with no past records of transactions. 
    more » « less
  2. Abstract Change‐point detection studies the problem of detecting the changes in the underlying distribution of the data stream as soon as possible after the change happens. Modern large‐scale, high‐dimensional, and complex streaming data call for computationally (memory) efficient sequential change‐point detection algorithms that are also statistically powerful. This gives rise to a computation versus statistical power trade‐off, an aspect less emphasized in the past in classic literature. This tutorial takes this new perspective and reviews several sequential change‐point detection procedures, ranging from classic sequential change‐point detection algorithms to more recent non‐parametric procedures that consider computation, memory efficiency, and model robustness in the algorithm design. Our survey also contains classic performance analysis, which provides useful techniques for analyzing new procedures. This article is categorized under:Statistical Models > Time Series ModelsAlgorithms and Computational Methods > AlgorithmsData: Types and Structure > Time Series, Stochastic Processes, and Functional Data 
    more » « less
  3. Abstract Without imposing prior distributional knowledge underlying multivariate time series of interest, we propose a nonparametric change-point detection approach to estimate the number of change points and their locations along the temporal axis. We develop a structural subsampling procedure such that the observations are encoded into multiple sequences of Bernoulli variables. A maximum likelihood approach in conjunction with a newly developed searching algorithm is implemented to detect change points on each Bernoulli process separately. Then, aggregation statistics are proposed to collectively synthesize change-point results from all individual univariate time series into consistent and stable location estimations. We also study a weighting strategy to measure the degree of relevance for different subsampled groups. Simulation studies are conducted and shown that the proposed change-point methodology for multivariate time series has favorable performance comparing with currently available state-of-the-art nonparametric methods under various settings with different degrees of complexity. Real data analyses are finally performed on categorical, ordinal, and continuous time series taken from fields of genetics, climate, and finance. 
    more » « less
  4. We consider the problem of constructing confidence intervals for the locations of change points in a high-dimensional mean shift model. We develop a locally refitted least squares estimator and obtain component-wise and simultaneous rates of estimation of change points. The simultaneous rate is the sharpest available by at least a factor of log p, while the component-wise one is optimal. These results enable existence of limiting distributions for the locations of the change points. Subsequently, component-wise distributions are characterized under both vanishing and non-vanishing jump size regimes, while joint distributions of change point estimates are characterized under the latter regime, which also yields asymptotic independence of these estimates. We provide the relationship between these distributions, which allows construction of regime adaptive confidence intervals. All results are established under a high dimensional scaling, in the presence of diverging number of change points. They are illustrated on synthetic data and on sensor measurements from smartphones for activity recognition. 
    more » « less
  5. M. Ranzato; A. Beygelzimer; Y. Dauphin; P.S. Liang; J. Wortman Vaughan (Ed.)
    The null space of the k-th order Laplacian Lk, known as the {\em k-th homology vector space}, encodes the non-trivial topology of a manifold or a network. Understanding the structure of the homology embedding can thus disclose geometric or topological information from the data. The study of the null space embedding of the graph Laplacian L0 has spurred new research and applications, such as spectral clustering algorithms with theoretical guarantees and estimators of the Stochastic Block Model. In this work, we investigate the geometry of the k-th homology embedding and focus on cases reminiscent of spectral clustering. Namely, we analyze the {\em connected sum} of manifolds as a perturbation to the direct sum of their homology embeddings. We propose an algorithm to factorize the homology embedding into subspaces corresponding to a manifold's simplest topological components. The proposed framework is applied to the {\em shortest homologous loop detection} problem, a problem known to be NP-hard in general. Our spectral loop detection algorithm scales better than existing methods and is effective on diverse data such as point clouds and images. 
    more » « less