Monitoring systems have hundreds or thousands of distributed sensors gathering and transmitting real-time streaming data. The early detection of events in these systems, such as an earthquake in a seismic monitoring system, is the basis for essential tasks such as warning generation. To detect such events, it is usual to compute pairwise correlations across the disparate signals generated by the sensors. Since the data sources (e.g., sensors) are spatially separated, it is essential to consider the lagged correlation between the signals. In addition, many applications need to process a specific band of frequencies depending on the type of event, which demands a filtering step before the correlations are computed. Because data are generated at high speed by a large number of sensors, filtering and lagged cross-correlation must be efficient enough to provide real-time responses without data loss. This article proposes a technique named FilCorr that efficiently computes both operations in a single step. We achieve an order of magnitude speedup by maintaining frequency transforms over sliding windows. Our method is exact, devoid of sensitive parameters, and easily parallelizable. Besides our algorithm, we also provide a publicly available real-time system named Seisviz that employs FilCorr in its core mechanism…
Free, publicly accessible full text available October 31, 2023.
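The idea of computing filtering and lagged cross-correlation together can be illustrated with a standard frequency-domain identity: for a real, symmetric band-pass mask, the circular cross-correlation of two filtered signals equals the inverse transform of the masked cross-spectrum. The sketch below is only an illustration of that identity, not the FilCorr algorithm itself (it does not maintain transforms over sliding windows). The function names, the ideal 0/1 mask, the naive O(n^2) transform, and the synthetic signals are all assumptions made for the example.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform, sufficient for a small demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def band_mask(n, lo, hi):
    """Ideal band-pass mask (assumed filter): keeps bins lo..hi and their mirrors."""
    return [1.0 if lo <= min(k, n - k) <= hi else 0.0 for k in range(n)]

def filtered_lagged_corr(x, y, lo, hi):
    """Unnormalized circular cross-correlation of the band-passed signals.

    Returns r with r[lag] = sum_t xf[t] * yf[(t + lag) % n], where xf and yf
    are x and y after the ideal band-pass filter. Filtering and correlation
    happen in one pass over the spectra: the mask multiplies the cross-spectrum.
    """
    n = len(x)
    X, Y = dft(x), dft(y)
    mask = band_mask(n, lo, hi)
    cross = [mask[k] * X[k].conjugate() * Y[k] for k in range(n)]
    return [v.real for v in idft(cross)]

# Synthetic example: a strong slow component (bin 1) plus two components of
# interest (bins 3 and 4); y is x delayed by 5 samples (circularly).
n = 64
x = [5 * math.sin(2 * math.pi * t / n)
     + math.sin(2 * math.pi * 3 * t / n)
     + math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
y = [x[(t - 5) % n] for t in range(n)]

r = filtered_lagged_corr(x, y, lo=3, hi=4)
best_lag = max(range(n), key=lambda lag: r[lag])
print(best_lag)
```

On this synthetic input the strongest filtered correlation falls at the 5-sample delay that was planted. A practical system would use a fast transform and a realistic filter response in place of the naive transform and ideal mask assumed here.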
This paper introduces a new pattern mining task that considers aligning or joining a set of time series based on an arbitrary number of subsequences (i.e., patterns) with arbitrary lengths. Joining multiple time series along common patterns can be pivotal in clustering and summarizing large time series datasets. An exact algorithm to join hundreds of time series based on multi-length patterns is impractical due to the high computational costs. This paper proposes a fast algorithm named MultiPAL to join multiple time series at interactive speed to summarize large time series datasets. The algorithm exploits Matrix Profiles of the individual time series to enable a greedy search over possible joins. The algorithm is orders of magnitude faster than the exact solution and can utilize hundreds of Matrix Profiles. We evaluate our algorithm for sequential mining on data from various real-world domains, including power management and bioacoustics monitoring.
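For readers unfamiliar with the Matrix Profile mentioned above, the following brute-force sketch shows what it stores: for every subsequence of a fixed length, the z-normalized Euclidean distance to its nearest non-overlapping neighbor in the same series, plus the position of that neighbor. This is not MultiPAL and not an efficient Matrix Profile algorithm; the function names, the exclusion-zone width, and the toy series are assumptions made for illustration.

```python
import math

def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / len(seq))
    if sd == 0:
        return [0.0] * len(seq)
    return [(v - mu) / sd for v in seq]

def matrix_profile(ts, m):
    """Brute-force self-join Matrix Profile for subsequence length m.

    Returns (profile, index): profile[i] is the distance from subsequence i
    to its nearest neighbor, index[i] is where that neighbor starts.
    """
    n = len(ts) - m + 1
    subs = [znorm(ts[i:i + m]) for i in range(n)]
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) < excl:
                continue  # overlapping neighbors are trivial matches
            d = math.dist(subs[i], subs[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index

# Toy series (assumed data): the same pattern is planted at positions 2 and 12.
pattern = [0.0, 3.0, 1.0, 4.0, 0.0]
ts = [1.0, 7.0] + pattern + [2.0, 9.0, 4.0, 6.0, 3.0] + pattern + [8.0, 5.0, 1.5]

profile, index = matrix_profile(ts, m=5)
motif_start = min(range(len(profile)), key=lambda i: profile[i])
print(motif_start, index[motif_start])
```

Low values in the profile mark repeated patterns, which is what makes the structure a natural starting point for searching over candidate joins between series.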
Stream mining considers the online arrival of examples at high speed and the possibility of changes in their descriptive features or class definitions compared with past knowledge (i.e., concept drifts). The fast detection of drifts is essential to keep the predictive model updated and stable in changing environments. For many applications, such as those related to smart sensors, the high number of features is an additional challenge in terms of memory and time for stream processing. This paper presents an unsupervised and model-independent concept drift detector suitable for high-speed and high-dimensional data streams. We propose a straightforward two-dimensional data representation that allows faster processing of datasets with a large number of examples and dimensions. We developed an adaptive drift detector on this visual representation that is efficient for fast streams with thousands of features and is as accurate as existing costly methods that perform various statistical tests on each feature individually. Our method achieves better performance, measured by execution time and accuracy in classification problems, for different types of drifts. The experimental evaluation on synthetic and real data demonstrates the method’s versatility in several domains, including entomology, medicine, and transportation systems.
Changes in the distribution of streaming data (i.e., concept drifts) constitute a central issue in online data mining. The main reason is that these changes render stream learning models outdated, reducing their predictive performance over time. A common approach adopted by real-time adaptive systems to deal with concept drifts is to employ detectors that indicate the best time for updates. However, an unrealistic assumption of most detectors is that the labels become available immediately after the data arrive. In this paper, we introduce an unsupervised and model-independent concept drift detector suitable for high-speed and high-dimensional data streams in realistic scenarios where labels are scarce. We propose a straightforward two-dimensional representation of the data aimed at faster processing for detection. We develop a simple adaptive drift detector on this visual representation that is efficient for fast streams with thousands of features and is as accurate as existing costly methods that perform various statistical tests. Our method achieves better performance, measured by execution time and accuracy in classification problems, for different types of drifts, including abrupt, oscillating, and incremental. Experimental evaluation demonstrates the versatility of the method in several domains, including astronomy, entomology, public health, political science, and medical science.