Title: Fast and Scalable Algorithm for Detection of Structural Breaks in Big VAR Models
Many real time series datasets exhibit structural changes over time. A popular model for capturing their temporal dependence is the vector autoregression (VAR), which can accommodate structural changes through time-evolving transition matrices. The problem then becomes to estimate both the (unknown) number of structural break points and the VAR model parameters. An additional challenge emerges in the presence of very large datasets, namely how to accomplish these two objectives in a computationally efficient manner. In this article, we propose a novel procedure that leverages a block segmentation scheme (BSS), which reduces the number of model parameters to be estimated through a regularized least-squares criterion. Specifically, BSS examines appropriately defined blocks of the available data, which, when combined with a fused lasso-based estimation criterion, leads to significant computational gains without compromising statistical accuracy in identifying the number and location of the structural breaks. This procedure is further coupled with new local and exhaustive search steps to consistently estimate the number and relative location of the break points. The procedure is scalable to big high-dimensional time series datasets with a computational complexity that can achieve O(n), where n is the length of the time series (sample size), compared to an exhaustive procedure that requires steps. Extensive numerical work on synthetic data supports the theoretical findings and illustrates the attractive properties of the procedure. Finally, an application to a neuroscience dataset demonstrates its usefulness in practice. Supplementary files for this article are available online.
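The idea of examining blocks of data rather than every time point can be illustrated with a deliberately simplified sketch. This is not the BSS procedure described in the abstract: it replaces the fused lasso criterion with ordinary least squares fitted separately on each block, skips the local and exhaustive search steps, and assumes a VAR(1) model with a single break. All function names, the block size, and the simulated matrices are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_piecewise_var1(n, break_point, a_before, a_after, seed=0):
    """Simulate a VAR(1) series whose transition matrix switches at break_point."""
    rng = np.random.default_rng(seed)
    d = a_before.shape[0]
    y = np.zeros((n, d))
    for t in range(1, n):
        a = a_before if t < break_point else a_after
        y[t] = a @ y[t - 1] + rng.standard_normal(d)
    return y


def blockwise_var1_estimates(y, block_size):
    """Fit a VAR(1) transition matrix by least squares on each consecutive block.

    Block k covers time points 1 + k * block_size up to (k + 1) * block_size.
    """
    n_blocks = (y.shape[0] - 1) // block_size
    estimates = []
    for k in range(n_blocks):
        lo = 1 + k * block_size
        hi = lo + block_size
        x_prev = y[lo - 1:hi - 1]  # predictors y_{t-1}
        x_curr = y[lo:hi]          # responses y_t
        coef, *_ = np.linalg.lstsq(x_prev, x_curr, rcond=None)
        estimates.append(coef.T)   # transpose so that y_t ~ A @ y_{t-1}
    return np.stack(estimates)


def largest_jump_block(estimates):
    """Return the index of the block whose estimate differs most from the previous block."""
    jumps = np.linalg.norm(np.diff(estimates, axis=0), axis=(1, 2))
    return int(np.argmax(jumps)) + 1, jumps


if __name__ == "__main__":
    a1 = 0.8 * np.eye(2)    # hypothetical regime before the break
    a2 = -0.8 * np.eye(2)   # hypothetical regime after the break
    series = simulate_piecewise_var1(2001, 1001, a1, a2)
    est = blockwise_var1_estimates(series, block_size=100)
    block, _ = largest_jump_block(est)
    print("candidate break near t =", 1 + block * 100)
```

Each block requires one small least-squares fit, so the total work grows linearly with the number of blocks. The sketch only localizes a break to the resolution of a block; refining the location within a block and deciding how many breaks exist are exactly the parts that the abstract attributes to the fused lasso criterion and the subsequent search steps.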
Award ID(s):
1830175 2124507
PAR ID:
10312308
Author(s) / Creator(s):
; ;
Publisher / Repository:
Taylor and Francis
Date Published:
Journal Name:
Journal of Computational and Graphical Statistics
ISSN:
1061-8600
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Without imposing prior distributional knowledge underlying multivariate time series of interest, we propose a nonparametric change-point detection approach to estimate the number of change points and their locations along the temporal axis. We develop a structural subsampling procedure such that the observations are encoded into multiple sequences of Bernoulli variables. A maximum likelihood approach in conjunction with a newly developed searching algorithm is implemented to detect change points on each Bernoulli process separately. Then, aggregation statistics are proposed to collectively synthesize change-point results from all individual univariate time series into consistent and stable location estimates. We also study a weighting strategy to measure the degree of relevance of different subsampled groups. Simulation studies show that the proposed change-point methodology for multivariate time series performs favorably compared with currently available state-of-the-art nonparametric methods under various settings with different degrees of complexity. Finally, real data analyses are performed on categorical, ordinal, and continuous time series taken from the fields of genetics, climate, and finance.
  2. A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR equation that captures lead-lag correlations amongst a set of observed variables X and latent factors F, and a calibration equation that relates another set of observed variables Y with F and X. The latter equation is used to estimate the factors that are subsequently used in estimating the parameters of the VAR system. The FAVAR model has become popular in applied economic research, since it can summarize a large number of variables of interest as a few factors through the calibration equation and subsequently examine their influence on core variables of primary interest through the VAR equation. However, there is increasing need for examining lead-lag relationships between a large number of time series, while incorporating information from another high-dimensional set of variables. Hence, in this paper we investigate the FAVAR model under high-dimensional scaling. We introduce an appropriate identification constraint for the model parameters, which when incorporated into the formulated optimization problem yields estimates with good statistical properties. Further, we address a number of technical challenges introduced by the fact that estimates of the VAR system model parameters are based on estimated rather than directly observed quantities. The performance of the proposed estimators is evaluated on synthetic data. Further, the model is applied to commodity prices and reveals interesting and interpretable relationships between the prices and the factors extracted from a set of global macroeconomic indicators. 
  3.
    A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR equation that captures lead-lag correlations amongst a set of observed variables X and latent factors F, and a calibration equation that relates another set of observed variables Y with F and X. The latter equation is used to estimate the factors that are subsequently used in estimating the parameters of the VAR system. The FAVAR model has become popular in applied economic research, since it can summarize a large number of variables of interest as a few factors through the calibration equation and subsequently examine their influence on core variables of primary interest through the VAR equation. However, there is increasing need for examining lead-lag relationships between a large number of time series, while incorporating information from another high-dimensional set of variables. Hence, in this paper we investigate the FAVAR model under high-dimensional scaling. We introduce an appropriate identification constraint for the model parameters, which when incorporated into the formulated optimization problem yields estimates with good statistical properties. Further, we address a number of technical challenges introduced by the fact that estimates of the VAR system model parameters are based on estimated rather than directly observed quantities. The performance of the proposed estimators is evaluated on synthetic data. Further, the model is applied to commodity prices and reveals interesting and interpretable relationships between the prices and the factors extracted from a set of global macroeconomic indicators. 
  4. This paper presents a generative statistical model for analyzing time series of planar shapes. Using elastic shape analysis, we separate object kinematics (rigid motions and speed variability) from morphological evolution, representing the latter through transported velocity fields (TVFs). A principal component analysis (PCA) based dimensionality reduction of the TVF representation provides a finite-dimensional Euclidean framework, enabling traditional time-series analysis. We then fit a vector auto-regressive (VAR) model to the TVF-PCA time series, capturing the statistical dynamics of shape evolution. To characterize morphological changes, we use VAR model parameters for model comparison, synthesis, and sequence classification. Leveraging these parameters, along with machine learning classifiers, we achieve high classification accuracy. Extensive experiments on cell motility data validate our approach, demonstrating its effectiveness in modeling and classifying migrating cells based on morphological evolution, which marks a novel contribution to the field.
  5. Statistical analysis of shape evolution during cell migration is important for gaining insights into biological processes. This paper develops a time-series model for the temporal evolution of cellular shapes during cell motility. It uses elastic shape analysis to represent and analyze shapes of cell boundaries (as planar closed curves), thus separating cell shape changes from cell kinematics. Specifically, it uses the Transported Square-Root Velocity Field (TSRVF) to map non-Euclidean shape sequences into a Euclidean time series. It then uses PCA to reduce the Euclidean dimension and imposes a Vector Auto-Regression (VAR) model on the resulting low-dimensional time series. Finally, it presents results from VAR-based statistical analysis: estimation of model parameters and diagnostics, synthesis of new shape sequences, and prediction of future shapes given past shapes.