


Title: Matrix Profile XVIII: Time Series Mining in the Face of Fast Moving Streams using a Learned Approximate Matrix Profile
Award ID(s):
1763795
NSF-PAR ID:
10170617
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
International Conference on Data Mining
Page Range / eLocation ID:
936 to 945
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Discovery and classification of motifs (repeated patterns) and discords (anomalies) in time series is fundamental to many scientific fields. These and related problems have effectively been solved for offline analysis of time series. However, the existing approaches are computationally intensive and do not lend themselves to streaming time series, such as those produced by IoT sensors, where the sampling rate imposes real-time constraints on computation and there is a strong desire to locate computation as close as possible to the sensor. One promising solution is to use low-cost machine learning models to provide approximate answers to these problems. For example, prior work has trained models to predict the similarity of the most recently sampled window of data points to the time series used for training. This work addresses a more challenging problem: predicting not only the “strength” of the match but also the relative location in the representative time series where the strongest matching subsequences occur. We evaluate our approach on two real-world datasets and demonstrate speedups of up to about 30x compared to exact computations, with predictive accuracy of up to 87.95%, depending on the granularity of the prediction.
  2. Abstract

    Template matching has proven to be an effective method for seismic event detection, but it is biased toward identifying events similar to previously known events and is thus ineffective at discovering events with non‐matching waveforms (e.g., those dissimilar to existing catalog events). In principle, this limitation can be overcome by cross‐correlating every segment (possible template) of a seismogram with every other segment to identify all similar event pairs, but doing so has previously been considered computationally infeasible for long time series. Here we describe a method called the ‘Matrix Profile’ (MP), a “correlate everything with everything” calculation that can be computed efficiently and scalably. The MP returns the maximum value of the correlation coefficient of every sub‐window of continuous data with every other sub‐window, as well as the location of the best‐correlated sub‐window. We show how MP methods can obtain valuable results when applied to months and years of continuous seismic data in both local and global case studies. We find that the MP can identify many new events in Parkfield, California seismicity that are not contained in existing event catalogs, and that it can efficiently find clusters of similar earthquakes in global seismic data. Whether used by itself or as a starting point for subsequent template matching calculations, the MP is likely to provide a useful new tool for seismology research.

     
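The first related abstract describes models that, for each newly sampled window of a stream, predict both how strongly it matches a representative time series and where the best match occurs. As a rough illustration of the exact quantities such a model might be trained to approximate, the brute-force Python sketch below computes the z-normalized Euclidean distance from the latest window to its nearest neighbor in a reference series, plus that neighbor's location coarsened into equal-width regions. The distance measure, the equal-width binning used to stand in for prediction "granularity", and all function names (`znorm`, `nearest_match`, `location_bin`, `stream_labels`) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; a constant window maps to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def nearest_match(reference, query):
    """Exact nearest neighbor of `query` among all windows of `reference`.

    Returns (distance, start_index) under z-normalized Euclidean distance.
    Brute force: O(len(reference) * len(query)) work per call.
    """
    m = len(query)
    q = znorm(query)
    best_d, best_i = np.inf, -1
    for i in range(len(reference) - m + 1):
        d = np.linalg.norm(znorm(reference[i:i + m]) - q)
        if d < best_d:  # strict "<" keeps the earliest of any tied matches
            best_d, best_i = d, i
    return best_d, best_i

def location_bin(index, n_windows, n_bins):
    """Coarsen a match start index into one of n_bins equal-width regions (assumed scheme)."""
    return min(index * n_bins // n_windows, n_bins - 1)

def stream_labels(reference, stream, m, n_bins):
    """For every complete length-m window of `stream`, return (strength, region).

    Strength is the nearest-neighbor distance (lower means a stronger match);
    region is the binned location of that neighbor in `reference`.
    """
    n_windows = len(reference) - m + 1
    labels = []
    for t in range(m, len(stream) + 1):
        d, i = nearest_match(reference, stream[t - m:t])
        labels.append((d, location_bin(i, n_windows, n_bins)))
    return labels

# Usage: a toy random-walk reference and a noisy copy of part of it as the "stream".
rng = np.random.default_rng(0)
reference = rng.standard_normal(500).cumsum()
stream = reference[200:260] + 0.05 * rng.standard_normal(60)
for strength, region in stream_labels(reference, stream, m=50, n_bins=4)[:3]:
    print(round(float(strength), 3), region)
```

Every new sample triggers a full scan of the reference, so the per-sample cost grows with reference length times window length. That is the kind of work a learned approximation would aim to replace when the sampling rate leaves little time for computation.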
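The seismology abstract defines the Matrix Profile as, for every sub-window of a series, the maximum correlation coefficient with any other sub-window, together with the location of that best match. A brute-force Python sketch of that definition follows. It materializes the full pairwise correlation matrix, so it only suits short series, whereas the scalable algorithms the abstract refers to avoid doing so. The exclusion zone of `m // 2` samples (which stops a window from matching itself or heavily overlapping neighbors) and the name `correlation_matrix_profile` are assumptions for illustration. For z-normalized windows of length m, the Pearson correlation ρ and the z-normalized Euclidean distance d satisfy d = sqrt(2m(1 - ρ)), so the same profile can be expressed in either form.

```python
import numpy as np

def correlation_matrix_profile(x, m, exclusion=None):
    """Brute-force matrix profile in correlation form.

    Returns (best_corr, best_idx): for each length-m window of x, the highest
    Pearson correlation with any non-trivial other window and where it occurs.
    """
    x = np.asarray(x, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, m)  # shape (n, m)
    n = windows.shape[0]
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True)
    sd[sd == 0] = np.inf  # constant windows z-normalize to all zeros
    z = (windows - mu) / sd
    corr = z @ z.T / m  # Pearson correlation for every pair of windows
    if exclusion is None:
        exclusion = m // 2  # assumed exclusion-zone half-width
    idx = np.arange(n)
    trivial = np.abs(idx[:, None] - idx[None, :]) <= exclusion
    corr[trivial] = -np.inf  # ignore self-matches and heavy overlaps
    best_idx = corr.argmax(axis=1)
    best_corr = corr[idx, best_idx]
    return best_corr, best_idx

# Usage: plant the same shape twice in noise, then read off motif and discord.
rng = np.random.default_rng(1)
x = rng.standard_normal(300)
pattern = 5 * np.sin(np.linspace(0, 2 * np.pi, 40))
x[50:90] += pattern
x[200:240] += pattern
best_corr, best_idx = correlation_matrix_profile(x, 40)
motif = int(best_corr.argmax())    # one window of the most similar pair
discord = int(best_corr.argmin())  # window least like any other window
print(motif, int(best_idx[motif]), discord)
```

The highest profile value marks a motif (a repeated pattern, analogous to a cluster of similar earthquakes), and the lowest marks a discord (a window unlike anything else in the series).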