skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: TWIN-ADAPT: Continuous Learning for Digital Twin-Enabled Online Anomaly Classification in IoT-Driven Smart Labs
In the rapidly evolving landscape of scientific semiconductor laboratories (commonly known as, cleanrooms), integrated with Internet of Things (IoT) technology and Cyber-Physical Systems (CPSs), several factors including operational changes, sensor aging, software updates and the introduction of new processes or equipment can lead to dynamic and non-stationary data distributions in evolving data streams. This phenomenon, known as concept drift, poses a substantial challenge for traditional data-driven digital twin static machine learning (ML) models for anomaly detection and classification. Subsequently, the drift in normal and anomalous data distributions over time causes the model performance to decay, resulting in high false alarm rates and missed anomalies. To address this issue, we present TWIN-ADAPT, a continuous learning model within a digital twin framework designed to dynamically update and optimize its anomaly classification algorithm in response to changing data conditions. This model is evaluated against state-of-the-art concept drift adaptation models and tested under simulated drift scenarios using diverse noise distributions to mimic real-world distribution shift in anomalies. TWIN-ADAPT is applied to three critical CPS datasets of Smart Manufacturing Labs (also known as “Cleanrooms”): Fumehood, Lithography Unit and Vacuum Pump. The evaluation results demonstrate that TWIN-ADAPT’s continual learning model for optimized and adaptive anomaly classification achieves a high accuracy and F1 score of 96.97% and 0.97, respectively, on the Fumehood CPS dataset, showing an average performance improvement of 0.57% over the offline model. For the Lithography and Vacuum Pump datasets, TWIN-ADAPT achieves an average accuracy of 69.26% and 71.92%, respectively, with performance improvements of 75.60% and 10.42% over the offline model. These significant improvements highlight the efficacy of TWIN-ADAPT’s adaptive capabilities. Additionally, TWIN-ADAPT shows a very competitive performance when compared with other benchmark drift adaptation algorithms. This performance demonstrates TWIN-ADAPT’s robustness across different modalities and datasets, confirming its suitability for any IoT-driven CPS framework managing diverse data distributions in real time streams. Its adaptability and effectiveness make it a versatile tool for dynamic industrial settings.  more » « less
Award ID(s):
2126246
PAR ID:
10541917
Author(s) / Creator(s):
; ; ;
Corporate Creator(s):
Editor(s):
Lu, Xin; Wang, Wei; Wu, Dehao; Li, Xiaoxia
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Future Internet
Edition / Version:
1
Volume:
16
Issue:
7
ISSN:
1999-5903
Page Range / eLocation ID:
239-274
Subject(s) / Keyword(s):
internet of things (IoT) cyber-physical systems (CPSs) concept drift digital twins online anomaly classification scientific laboratories cleanrooms
Format(s):
Medium: X Size: 7MB Other: pdf
Size(s):
7MB
Sponsoring Org:
National Science Foundation
More Like this
  1. Sensory IoT (Internet of Things) networks are widely applied and studied in recent years and have demonstrated their unique benefits in various areas. In this paper, we bring the sensor network to an application scenario that has rarely been studied - the academic cleanrooms. We design SENSELET++, a low-cost IoT sensing platform that can collect, manage and analyze a large amount of sensory data from heterogeneous sensors. Furthermore, we design a novel hybrid anomaly detection framework which can detect both time-critical and complex non-critical anomalies. We validate SENSELET++ through the deployment of the sensing platform in a lithography cleanroom. Our results show the scalability, flexibility, and reliability properties of the system design. Also, using real-world sensory data collected by SENSELET++, our system can analyze data streams in real-time and detect shape and trend anomalies with a 91% true positive rate. 
    more » « less
  2. With recent technological advances in sensor nodes, IoT enabled applications have great potential in many domains. However, sensing data may be inaccurate due to not only faults or failures in the sensor and network but also the limited resources and transmission capability available in sensor nodes. In this paper, we first model streams of IoT data as a handful of sampled data in the transformed domain while assuming the information attained by those sampled data reveal different sparsity profiles between normal and abnormal. We then present a novel approach called AD2 (Anomaly Detection using Approximated Data) that applies a transformation on the original data, samples top k-dominant components, and detects data anomalies based on the disparity in k values. To demonstrate the effectiveness of AD2 , we use IoT datasets (temperature, humidity, and CO) collected from real-world wireless sensor nodes. Our experimental evaluation demonstrates that AD2 can approximate and successfully detect 64%-94% of anomalies using only 1.9% of the original data and minimize false positive rates, which would otherwise require the entire dataset to achieve the same level of accuracy. 
    more » « less
  3. null (Ed.)
    Edge devices with attentive sensors enable various intelligent services by exploring streams of sensor data. However, anomalies, which are inevitable due to faults or failures in the sensor and network, can result in incorrect or unwanted operational decisions. While promptly ensuring the accuracy of IoT data is critical, lack of labels for live sensor data and limited storage resources necessitates efficient and reliable detection of anomalies at edge nodes. Motivated by the existence of unique sparsity profiles that express original signals as a combination of a few coefficients between normal and abnormal sensing periods, we propose a novel anomaly detection approach, called ADSP (Anomaly Detection with Sparsity Profile). The key idea is to apply a transformation on the raw data, identify top-K dominant components that represent normal data behaviors, and detect data anomalies based on the disparity from K values approximating the periods of normal data in an unsupervised manner. Our evaluation using a set of synthetic datasets demonstrates that ADSP can achieve 92%–100% of detection accuracy. To validate our anomaly detection approach on real-world cases, we label potential anomalies using a range of error boundary conditions using sensors exhibiting a straight line in Q-Q plot and strong Pearson correlation and conduct a controlled comparison of the detection accuracy. Our experimental evaluation using real-world datasets demonstrates that ADSP can detect 83%– 92% of anomalies using only 1.7% of the original data, which is comparable to the accuracy achieved by using the entire datasets. 
    more » « less
  4. Stream clustering is an important data mining technique to capture the evolving patterns in real-time data streams. Today’s data streams, e.g., IoT events and Web clicks, are usually high-speed and contain dynamically-changing patterns. Existing stream clustering algorithms usually follow an online-offline paradigm with a one-record-at-a-time update model, which was designed for running in a single machine. These stream clustering algorithms, with this sequential update model, cannot be efficiently parallelized and fail to deliver the required high throughput for stream clustering. In this paper, we present DistStream, a distributed framework that can effectively scale out online-offline stream clustering algorithms. To parallelize these algorithms for high throughput, we develop a mini-batch update model with efficient parallelization approaches. To maintain high clustering quality, DistStream’s mini-batch update model preserves the update order in all the computation steps during parallel execution, which can reflect the recent changes for dynamically-changing streaming data. We implement DistStream atop Spark Streaming, as well as four representative stream clustering algorithms based on DistStream. Our evaluation on three real-world datasets shows that DistStream-based stream clustering algorithms can achieve sublinear throughput gain and comparable (99%) clustering quality with their single-machine counterparts. 
    more » « less
  5. Online Anomaly Detection (OAD) is critical for identifying rare yet important data points in large, dynamic, and complex data streams. A key challenge lies in achieving accurate and consistent detection of anomalies while maintaining computational and memory efficiency. Conventional OAD approaches, which depend on distributional deviations and static thresholds, struggle with model update delays and catastrophic forgetting, leading to missed detections and high false positive rates. To address these limitations, we propose a novel Streaming Anomaly Detection (SAD) method, grounded in a sparse active online learning framework. Our approach uniquely integrates ℓ1,2-norm sparse online learning with CUR decomposition-based active learning, enabling simultaneous fast feature selection and dynamic instance selection. The efficient CUR decomposition further supports real-time residual analysis for anomaly scoring, eliminating the need for manual threshold settings about temporal data distributions. Extensive experiments on diverse streaming datasets demonstrate SAD's superiority, achieving a 14.06% reduction in detection error rates compared to five state-of-the-art competitors. 
    more » « less