

Title: Robust Multi-Variate Temporal Features of Multi-Variate Time Series

Many applications generate and/or consume multi-variate temporal data, and experts often lack the means to adequately and systematically search for and interpret multi-variate observations. In this article, we first observe that multi-variate time series often carry localized multi-variate temporal features that are robust against noise. We then argue that these multi-variate temporal features can be extracted by simultaneously considering, at multiple scales, temporal characteristics of the time series along with external knowledge, including variate relationships that are known a priori. Relying on these observations, we develop data models and algorithms to detect robust multi-variate temporal (RMT) features that can be indexed for efficient and accurate retrieval and can be used for supporting data exploration and analysis tasks. Experiments confirm that the proposed RMT algorithm is highly effective and efficient in identifying robust multi-scale temporal features of multi-variate time series.

 
Award ID(s):
1633381
NSF-PAR ID:
10482649
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
ACM Transactions on Multimedia Computing, Communications, and Applications
Volume:
14
Issue:
1
ISSN:
1551-6857
Page Range / eLocation ID:
1 to 24
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Real-world applications often involve irregular time series, for which the time intervals between successive observations are non-uniform. Irregularity across multiple features in a multi-variate time series further results in a different subset of features being observed at any given time (i.e., asynchronicity). Existing pre-training schemes for time series, however, often assume regularity and make no special treatment of irregularity. We argue that such irregularity offers insight into domain properties of the data (for example, the frequency of hospital visits may signal a patient's health condition) that can guide representation learning. In this work, we propose PrimeNet to learn a self-supervised representation for irregular multivariate time series. Specifically, we design time-sensitive contrastive learning and data reconstruction tasks to pre-train a model. Irregular time series exhibit considerable variation in sampling density over time. Hence, our triplet generation strategy follows the density of the original data points, preserving its native irregularity. Moreover, the variation in sampling density over time makes data reconstruction difficult for different regions. Therefore, we design a data masking technique that always masks a constant time duration to accommodate reconstruction for regions of different sampling density. We learn with these tasks using unlabeled data to build a pre-trained model and fine-tune on a downstream task with limited labeled data, in contrast with existing fully supervised approaches for irregular time series, which require large amounts of labeled data. Experimental results show that PrimeNet significantly outperforms state-of-the-art methods on naturally irregular and asynchronous data from healthcare and IoT applications for several downstream tasks, including classification, interpolation, and regression.
  2. Summary

    Marine microbes often show a high degree of physiological or ecological diversity below the species level. This microdiversity raises questions about the processes that drive diversification and permit coexistence of diverse yet closely related marine microbes, especially given the theoretical efficiency of competitive exclusion. Here, we provide insight with an 8‐year time series of diversity within Synechococcus, a widespread and important marine picophytoplankter. The population of Synechococcus on the Northeast U.S. Shelf is comprised of six main types, each of which displays a distinct and consistent seasonal pattern. With compositional data analysis, we show that these patterns can be reproduced with a simple model that couples differential responses to temperature and light with the seasonal cycle of the physical environment. These observations support the hypothesis that temporal variability in environmental factors can maintain microdiversity in marine microbial populations. We also identify how seasonal diversity patterns directly determine overarching Synechococcus population abundance features.

     
  3.

    Real-world spatio-temporal data is often incomplete or inaccurate due to various data loading delays. For example, a location-disease-time tensor of case counts can have multiple delayed updates of recent temporal slices for some locations or diseases. Recovering such missing or noisy (under-reported) elements of the input tensor can be viewed as a generalized tensor completion problem. Existing tensor completion methods usually assume that i) missing elements are randomly distributed and ii) noise for each tensor element is i.i.d. zero-mean. Both assumptions can be violated for spatio-temporal tensor data. We often observe multiple versions of the input tensor with different under-reporting noise levels. The amount of noise can be time- or location-dependent as more updates are progressively introduced to the tensor. We model such dynamic data as a multi-version tensor with an extra tensor mode capturing the data updates. We propose a low-rank tensor model to predict the updates over time. We demonstrate that our method can accurately predict the ground-truth values of many real-world tensors. We obtain up to 27.2% lower root mean-squared-error compared to the best baseline method. Finally, we extend our method to track the tensor data over time, leading to significant computational savings.

     
  4.
    Abstract. Satellite remote sensing provides a global view to processes on Earth that has unique benefits compared to making measurements on the ground, such as global coverage and enormous data volume. The typical downsides are spatial and temporal gaps and potentially low data quality. Meaningful statistical inference from such data requires overcoming these problems and developing efficient and robust computational tools. We design and implement a computationally efficient multi-scale Gaussian process (GP) software package, satGP, geared towards remote sensing applications. The software is able to handle problems of enormous sizes and to compute marginals and sample from the random field conditioning on at least hundreds of millions of observations. This is achieved by optimizing the computation by, e.g., randomization and splitting the problem into parallel local subproblems which aggressively discard uninformative data. We describe the mean function of the Gaussian process by approximating marginals of a Markov random field (MRF). Variability around the mean is modeled with a multi-scale covariance kernel, which consists of Matérn, exponential, and periodic components. We also demonstrate how winds can be used to inform covariances locally. The covariance kernel parameters are learned by calculating an approximate marginal maximum likelihood estimate, and the validity of both the multi-scale approach and the method used to learn the kernel parameters is verified in synthetic experiments. We apply these techniques to a moderate size ozone data set produced by an atmospheric chemistry model and to the very large number of observations retrieved from the Orbiting Carbon Observatory 2 (OCO-2) satellite. The satGP software is released under an open-source license.
  5. Madarshahian, Ramin ; Hemez, Francois (Ed.)
    Validation of state observers for high-rate structural health monitoring requires the testing of state observers on a large library of pre-recorded signals, both uni- and multi-variate. However, experimental testing of high-value structures can be cost- and time-prohibitive. While finite element modeling can generate additional datasets, it lacks the fidelity to reproduce the non-stationarities present in the signal, particularly at the higher end of the digitized signal's frequency band. In this preliminary work, generative adversarial networks are investigated for the synthesis of uni- and multi-variate acceleration signals for an electronics package under shock. Generative adversarial networks are a class of deep learning approaches that learn to generate new data that is statistically similar, but not identical, to the original data, thereby augmenting data diversity and balance. This chapter presents a methodology for synthesizing statistically indistinguishable time-series data for a structure under shock. Results show that generative adversarial networks are capable of producing material reminiscent of that obtained through experimental testing. The generated data is compared statistically to experimental data, and the accuracy, diversity, and limitations of the method are discussed.