Changes in the distribution of streaming data (i.e., concept drifts) constitute a central issue in online data mining. The main reason is that these changes render stream learning models outdated, reducing their predictive performance over time. A common approach adopted by real-time adaptive systems to deal with concept drifts is to employ detectors that indicate the best time for updates. However, most detectors unrealistically assume that labels become available immediately after the data arrive. In this paper, we introduce an unsupervised and model-independent concept drift detector suitable for high-speed and high-dimensional data streams in realistic scenarios where labels are scarce. We propose a straightforward two-dimensional representation of the data aimed at faster processing for detection. We develop a simple adaptive drift detector on this visual representation that is efficient for fast streams with thousands of features and is as accurate as existing costly methods that perform various statistical tests. Our method achieves better performance, measured by execution time and accuracy, in classification problems for different types of drifts, including abrupt, oscillating, and incremental. Experimental evaluation demonstrates the versatility of the method in several domains, including astronomy, entomology, public health, political science, and medical science.
Efficient unsupervised drift detector for fast and high-dimensional data streams
Stream mining considers the online arrival of examples at high speed and the possibility of changes in their descriptive features or class definitions compared with past knowledge (i.e., concept drifts). The fast detection of drifts is essential to keep the predictive model updated and stable in changing environments. For many applications, such as those related to smart sensors, the high number of features is an additional challenge in terms of memory and time for stream processing. This paper presents an unsupervised and model-independent concept drift detector suitable for high-speed and high-dimensional data streams. We propose a straightforward two-dimensional data representation that allows faster processing of datasets with a large number of examples and dimensions. We develop an adaptive drift detector on this visual representation that is efficient for fast streams with thousands of features and is as accurate as existing costly methods that perform various statistical tests considering each feature individually. Our method achieves better performance, measured by execution time and accuracy, in classification problems for different types of drifts. The experimental evaluation, considering synthetic and real data, demonstrates the method’s versatility in several domains, including entomology, medicine, and transportation systems.
- Award ID(s): 1757207
- Publication Date:
- NSF-PAR ID: 10227751
- Journal Name: Knowledge and Information Systems
- ISSN: 0219-1377
- Sponsoring Org: National Science Foundation