This content will become publicly available on August 11, 2026

Title: VISTA-SSM: Varying and irregular sampling time-series analysis via state-space models.
We introduce VISTA, a clustering approach for multivariate and irregularly sampled time series based on a parametric state space mixture model. VISTA is specifically designed for the unsupervised identification of groups in datasets originating from healthcare and psychology, where such sampling issues are commonplace. Our approach adapts linear Gaussian state space models (LGSSMs) to provide a flexible parametric framework for fitting a wide range of time series dynamics. The clustering approach itself is based on the assumption that the population can be represented as a mixture of a fixed number of LGSSMs. VISTA’s model formulation allows for an explicit derivation of the log-likelihood function, from which we develop an expectation-maximization scheme for fitting model parameters to the observed data samples. Our algorithmic implementation is designed to handle populations of multivariate time series that can exhibit large changes in sampling rate as well as irregular sampling. We evaluate the versatility and accuracy of our approach on simulated and real-world datasets, including demographic trends, wearable sensor data, epidemiological time series, and ecological momentary assessments. Our results indicate that VISTA outperforms most comparable standard time series clustering methods. We provide an open-source implementation of VISTA in Python.
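The core computation the abstract describes is the log-likelihood of each series under each LGSSM, which the expectation-maximization scheme turns into cluster responsibilities. Below is a minimal NumPy/SciPy sketch of that E-step; it assumes regularly sampled series for brevity (VISTA itself adapts the transitions to irregular time gaps), and all function names are illustrative rather than the package's actual API.

```python
import numpy as np
from scipy.stats import multivariate_normal

def lgssm_loglik(y, A, C, Q, R, mu0, P0):
    """Kalman-filter log-likelihood of a series y (T x d_obs) under one LGSSM."""
    m, P = mu0, P0
    ll = 0.0
    for t in range(y.shape[0]):
        if t > 0:                                   # predict step
            m, P = A @ m, A @ P @ A.T + Q
        S = C @ P @ C.T + R                         # innovation covariance
        ll += multivariate_normal.logpdf(y[t], mean=C @ m, cov=S)
        K = P @ C.T @ np.linalg.inv(S)              # Kalman gain
        m = m + K @ (y[t] - C @ m)                  # update step
        P = (np.eye(len(m)) - K @ C) @ P
    return ll

def responsibilities(series_list, components, weights):
    """E-step: posterior probability that each series belongs to each LGSSM."""
    logr = np.array([[np.log(w) + lgssm_loglik(y, *theta)
                      for theta, w in zip(components, weights)]
                     for y in series_list])
    logr -= logr.max(axis=1, keepdims=True)         # stabilize the softmax
    r = np.exp(logr)
    return r / r.sum(axis=1, keepdims=True)
```

In the M-step, these responsibilities weight per-cluster re-estimates of the LGSSM parameters (typically via Kalman smoothing), and the two steps alternate until the log-likelihood converges.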
Award ID(s):
2402555
PAR ID:
10645712
Publisher / Repository:
American Psychological Association
Date Published:
Journal Name:
Psychological Methods
ISSN:
1082-989X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With the booming of online service systems, anomaly detection on multivariate time series, such as a combination of CPU utilization, average response time, and requests per second, is important for system reliability. Although a variety of learning-based approaches have been designed for this purpose, our empirical study shows that these approaches suffer from long initialization times while they accumulate sufficient training data. In this paper, we introduce the Compressed Sensing technique to multivariate time series anomaly detection for rapid initialization. To build a jump-starting anomaly detector, we propose an approach named JumpStarter. Based on domain-specific insights, we design a shape-based clustering algorithm as well as an outlier-resistant sampling algorithm for JumpStarter. With real-world multivariate time series datasets collected from two Internet companies, our results show that JumpStarter achieves an average F1 score of 94.12%, significantly outperforming state-of-the-art anomaly detection algorithms, with a much shorter initialization time of twenty minutes. We have applied JumpStarter in online service systems and gained useful lessons in real-world scenarios.
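To give a flavor of the compressed-sensing idea, a window can be sensed at a few time points, reconstructed against a sparse basis, and scored by its deviation from that reconstruction. The sketch below shows only this generic technique, not JumpStarter's actual shape-based clustering or outlier-resistant sampling algorithms, and all names are hypothetical.

```python
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

def cs_reconstruct(window, sample_idx, n_coefs=10):
    """Recover a 1-D window from a few samples via OMP over a DCT basis."""
    n = len(window)
    basis = idct(np.eye(n), axis=0, norm="ortho")      # columns are DCT atoms
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_coefs, fit_intercept=False)
    omp.fit(basis[sample_idx], window[sample_idx])     # sense only sampled rows
    return basis @ omp.coef_

def anomaly_score(window, sample_idx):
    """Pointwise deviation from the sparse reconstruction."""
    return np.abs(window - cs_reconstruct(window, sample_idx))
```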
  2. We study the problem of learning a mixture model of non-parametric product distributions. The problem of learning a mixture model is that of finding the component distributions along with the mixing weights using observed samples generated from the mixture. The problem is well-studied in the parametric setting, i.e., when the component distributions are members of a parametric family, such as Gaussian distributions. In this work, we focus on multivariate mixtures of non-parametric product distributions and propose a two-stage approach which recovers the component distributions of the mixture under a smoothness condition. Our approach builds upon the identifiability properties of the canonical polyadic (low-rank) decomposition of tensors, in tandem with Fourier and Shannon-Nyquist sampling staples from signal processing. We demonstrate the effectiveness of the approach on synthetic and real datasets.
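To make the tensor step concrete: discretizing three variables and CP-decomposing their empirical joint distribution recovers, up to scaling, the per-component marginals, which is the identifiability property the approach builds on. The sketch below is illustrative only; it uses tensorly's non-negative CP solver (assuming a recent tensorly where the result unpacks as weights and factors) and plain histogram counts as a stand-in for the paper's smoothed, Fourier-based estimators.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

def component_marginals(x1, x2, x3, n_components, n_bins=20):
    """Recover binned per-component marginals of three variables from the
    CP decomposition of their discretized joint distribution."""
    joint, _ = np.histogramdd(np.stack([x1, x2, x3], axis=1), bins=n_bins)
    joint /= joint.sum()                                # empirical joint pmf
    weights, factors = non_negative_parafac(tl.tensor(joint), rank=n_components)
    # Column k of factors[m] is proportional to the marginal pmf of
    # variable m under mixture component k; normalize columns to pmfs.
    return [f / f.sum(axis=0, keepdims=True) for f in factors]
```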
  3. We propose a statistical framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori-unknown number of sub-groups, each comprising time series with similar dynamics. Our motivation comes from neuroscience, where an important problem is to identify, within a large assembly of neurons, sub-groups that respond similarly to a stimulus or contingency. In the neural setting, conditioned on cluster membership and the parameters governing the dynamics, time series within a cluster are assumed independent and generated according to a nonlinear binomial state-space model. We derive a Metropolis-within-Gibbs algorithm for full Bayesian inference that alternates between sampling of cluster membership and sampling of parameters of interest. The Metropolis step is a particle marginal Metropolis-Hastings (PMMH) iteration, which requires an unbiased, low-variance estimate of the likelihood function of a nonlinear state-space model. We leverage recent results on controlled sequential Monte Carlo to estimate likelihood functions more efficiently than the bootstrap particle filter. We apply the framework to time series acquired from the prefrontal cortex of mice in an experiment designed to characterize the neural underpinnings of fear.
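The Metropolis step needs an unbiased likelihood estimate; the bootstrap particle filter below shows the baseline that the paper's controlled SMC improves on. The binomial observation model mirrors the neural setting, while the AR(1) latent transition and all names are illustrative placeholders, not the paper's exact model.

```python
import numpy as np
from scipy.stats import binom

def bpf_loglik(y, n_trials, a, sigma, n_particles=500, seed=None):
    """Bootstrap-particle-filter estimate of log p(y | a, sigma) for
    y_t ~ Binomial(n_trials, logistic(x_t)), x_t = a*x_{t-1} + sigma*eps_t."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, size=n_particles)        # particles from the prior
    ll = 0.0
    for t in range(len(y)):
        p = 1.0 / (1.0 + np.exp(-x))                    # logistic observation link
        w = binom.pmf(y[t], n_trials, p)                # importance weights
        ll += np.log(w.mean() + 1e-300)                 # unbiased likelihood factor
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = a * x[idx] + sigma * rng.normal(size=n_particles)  # resample and move
    return ll
```

Within a PMMH iteration, a parameter proposal is then accepted with probability min(1, exp(ll_new - ll_old) times the prior and proposal ratios), and the unbiasedness of the estimate keeps the chain targeting the exact posterior.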
  4. Multivariate time series anomaly detection has become an active area of research in recent years, with Deep Learning models outperforming previous approaches on benchmark datasets. Among reconstruction-based models, most previous work has focused on Variational Autoencoders and Generative Adversarial Networks. This work presents DGHL, a new family of generative models for time series anomaly detection, trained by maximizing the observed likelihood through posterior sampling and alternating back-propagation. A top-down Convolution Network maps a novel hierarchical latent space to time series windows, exploiting temporal dynamics to encode information efficiently. Despite relying on posterior sampling, it is computationally more efficient than current approaches, with up to 10x shorter training times than RNN-based models. Our method outperformed current state-of-the-art models on four popular benchmark datasets. Finally, DGHL is robust to variable features between entities and accurate even with large proportions of missing values, settings of increasing relevance with the advent of IoT. We demonstrate the superior robustness of DGHL with occlusion experiments that are novel in this literature. Our code is available at https://github.com/cchallu/dghl.
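Training by "posterior sampling and alternating back-propagation" alternates Langevin updates of the latents with gradient updates of the generator. Below is a minimal PyTorch sketch of the latent step only, with a generic generator g standing in for DGHL's hierarchical convolutional network; step sizes and the Gaussian likelihood are illustrative assumptions.

```python
import torch

def langevin_posterior(g, y, z0, n_steps=20, step=0.1, sigma=1.0):
    """Short-run Langevin sampling of z ~ p(z | y) for a generator g(z) -> y_hat."""
    z = z0.clone().requires_grad_(True)
    for _ in range(n_steps):
        # log p(y, z) up to constants: Gaussian likelihood plus N(0, I) prior.
        logp = -((y - g(z)) ** 2).sum() / (2 * sigma ** 2) - (z ** 2).sum() / 2
        (grad,) = torch.autograd.grad(logp, z)
        with torch.no_grad():
            z = z + 0.5 * step ** 2 * grad + step * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()
```

At test time, the anomaly score for a window y is then its reconstruction error under the sampled latent, e.g. ((y - g(z)) ** 2).mean().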
  5. The foundation of empirical dynamic modeling (EDM) is representing time-series data as the trajectory of a dynamic system in a multidimensional state space, rather than as a collection of traces of individual variables changing through time. Takens’s theorem provides a rigorous basis for adopting this state-space view of time-series data even from just a single time series, but there is considerable additional value in building out a state space with explicit covariates. Multivariate EDM case studies to date, however, generally build up understanding from univariate to multivariate analyses and rely on lag-coordinate embeddings for critical steps along the path of analysis. Here, we propose an alternative set of steps for multivariate EDM analysis when the traditional roadmap is not practicable. The general approach borrows the idea of random data projection from compressed sensing, with additional justification given within the framework of Takens’s theorem. We then detail algorithms that implement this alternative method and validate them through application to simulated model data. The model demonstrations are constructed to show that this approach can extend EDM from time-series trajectories to what are effectively realizations of the underlying vector field, i.e., datasets that measure change over time with very short formal time series but are otherwise “big” in the number of variables and samples.
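The random-projection idea can be made concrete: replace the lag-coordinate embedding with a random linear map of the many covariates, then forecast with the usual simplex-style nearest-neighbor step. The sketch below follows that outline under stated assumptions; the projection scale and exponential weighting are illustrative choices, not the paper's exact algorithms.

```python
import numpy as np

def embed_by_projection(X, dim, seed=None):
    """Map T x p multivariate observations to a dim-dimensional state space."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(X.shape[1], dim)) / np.sqrt(dim)   # random projection
    return X @ P

def simplex_forecast(states, target, t, k=4, horizon=1):
    """Predict target[t + horizon] from the k nearest neighbors of states[t]."""
    lib = np.arange(len(states) - horizon)          # points with known futures
    lib = lib[lib != t]                             # exclude the query point
    d = np.linalg.norm(states[lib] - states[t], axis=1)
    order = np.argsort(d)[:k]
    w = np.exp(-d[order] / (d[order][0] + 1e-12))   # simplex-style weights
    return np.sum(w * target[lib[order] + horizon]) / w.sum()
```

The appeal in the "short time series, many variables" regime is that the projection uses a single snapshot's full covariate vector, so neighbors can come from many short trajectories rather than one long lag-embedded series.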