Title: AD2: Improving Quality of IoT Data through Compressive Anomaly Detection
With recent technological advances in sensor nodes, IoT-enabled applications have great potential in many domains. However, sensing data may be inaccurate due not only to faults or failures in the sensor and network but also to the limited resources and transmission capability available in sensor nodes. In this paper, we first model streams of IoT data as a handful of sampled data in the transformed domain, assuming that the information carried by those sampled data reveals different sparsity profiles for normal and abnormal data. We then present a novel approach called AD2 (Anomaly Detection using Approximated Data) that applies a transformation to the original data, samples the top k-dominant components, and detects data anomalies based on the disparity in k values. To demonstrate the effectiveness of AD2, we use IoT datasets (temperature, humidity, and CO) collected from real-world wireless sensor nodes. Our experimental evaluation demonstrates that AD2 can approximate the data and successfully detect 64%-94% of anomalies using only 1.9% of the original data while minimizing false positive rates, a level of accuracy that would otherwise require the entire dataset.
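The three steps named in the abstract (transform, keep the top k-dominant components, compare k values) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which transform is used, how k is chosen, or what decision rule is applied. The sketch assumes a real FFT per fixed-size window, takes k as the smallest number of components holding 95% of the one-sided spectral energy, and flags windows whose k differs from the median k by more than a tolerance. The names `dominant_k` and `flag_windows` and all parameter values are hypothetical.

```python
import numpy as np


def dominant_k(window, energy=0.95):
    """Smallest number of frequency components that hold `energy` of the
    window's one-sided spectral energy (assumed definition of k)."""
    window = np.asarray(window, dtype=float)
    # Assumption: a real FFT on the mean-removed window stands in for
    # the unspecified transformation.
    coeffs = np.fft.rfft(window - window.mean())
    power = np.sort(np.abs(coeffs) ** 2)[::-1]  # largest components first
    total = power.sum()
    if total == 0.0:
        return 0  # constant window: no non-DC energy at all
    cumulative = np.cumsum(power) / total
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, power.size)


def flag_windows(signal, window_size=64, energy=0.95, tolerance=2):
    """Split `signal` into windows, compute k per window, and flag windows
    whose k deviates from the median k by more than `tolerance`
    (assumed decision rule)."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size // window_size
    windows = signal[: n * window_size].reshape(n, window_size)
    ks = np.array([dominant_k(w, energy) for w in windows])
    baseline = int(np.median(ks))
    return ks, np.abs(ks - baseline) > tolerance


# Synthetic demo: a clean periodic signal with noise injected into window 5.
rng = np.random.default_rng(0)
t = np.arange(640)
signal = np.sin(2 * np.pi * 4 * t / 64)
signal[320:384] += rng.normal(scale=2.0, size=64)

ks, flags = flag_windows(signal)
print(np.flatnonzero(flags))  # indices of the flagged windows
```

In this synthetic case the clean windows need a single component while the noisy window needs many, so only the noisy window should be flagged. Real sensor data would need a transform, energy threshold, and tolerance tuned to the signal.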
Award ID(s):
1751143
NSF-PAR ID:
10136594
Journal Name:
IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
1662 to 1668
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Edge devices with attentive sensors enable various intelligent services by exploring streams of sensor data. However, anomalies, which are inevitable due to faults or failures in the sensor and network, can result in incorrect or unwanted operational decisions. While promptly ensuring the accuracy of IoT data is critical, the lack of labels for live sensor data and limited storage resources necessitate efficient and reliable detection of anomalies at edge nodes. Motivated by the existence of unique sparsity profiles, which express original signals as a combination of a few coefficients and differ between normal and abnormal sensing periods, we propose a novel anomaly detection approach called ADSP (Anomaly Detection with Sparsity Profile). The key idea is to apply a transformation to the raw data, identify the top-K dominant components that represent normal data behaviors, and detect data anomalies, in an unsupervised manner, based on the disparity from the K values that approximate periods of normal data. Our evaluation using a set of synthetic datasets demonstrates that ADSP can achieve 92%–100% detection accuracy. To validate our anomaly detection approach on real-world cases, we label potential anomalies using a range of error boundary conditions, using sensors that exhibit a straight line in a Q-Q plot and strong Pearson correlation, and conduct a controlled comparison of the detection accuracy. Our experimental evaluation using real-world datasets demonstrates that ADSP can detect 83%–92% of anomalies using only 1.7% of the original data, which is comparable to the accuracy achieved by using the entire datasets.
  2. There is an increasing demand for performing machine learning tasks, such as human activity recognition (HAR), on emerging ultra-low-power internet of things (IoT) platforms. Recent works show substantial efficiency boosts from performing inference tasks directly on the IoT nodes rather than merely transmitting raw sensor data. However, the computation and power demands of deep neural network (DNN) based inference pose significant challenges when executed on the nodes of an energy-harvesting wireless sensor network (EH-WSN). Moreover, managing inferences requiring responses from multiple energy-harvesting nodes imposes challenges at the system level in addition to the constraints at each node. This paper presents a novel scheduling policy along with an adaptive ensemble learner to efficiently perform HAR on a distributed energy-harvesting body area network. Our proposed policy, Origin, strategically ensures efficient and accurate individual inference execution at each sensor node by using a novel activity-aware scheduling approach. It also leverages the continuous nature of human activity when coordinating and aggregating results from all the sensor nodes to improve final classification accuracy. Further, Origin proposes an adaptive ensemble learner to personalize the optimizations based on each individual user. Experimental results using two different HAR datasets show Origin, while running on harvested energy, to be at least 2.5% more accurate than a classical battery-powered, energy-aware HAR classifier continuously operating at the same average power.
  3. The exponential growth of IoT end devices creates the necessity for cost-effective solutions to further increase the capacity of IEEE 802.15.4g-based wireless sensor networks (WSNs). For this reason, the authors present a wireless sensor network concentrator (WSNC) that integrates multiple collocated collectors, each of them hosting an independent WSN on a unique frequency channel. A load balancing algorithm is implemented at the WSNC to uniformly distribute the aggregated sensor nodes across the available collectors. The WSNC is implemented using a BeagleBone board acting as the Network Concentrator (NC), whereas the collectors and sensor nodes realizing the WSNs are built using TI CC13X0 LaunchPads. The system is assessed using a testbed consisting of one NC with up to four collocated collectors and fifty sensor nodes. The performance evaluation is carried out under race conditions in the WSNs to emulate highly dense networks with different network sizes and channel gaps. The experimental results show that the multicollector system with load balancing proportionally scales the capacity of the network, increases the packet delivery ratio, and reduces the energy consumption of the IoT end devices.
  4. We focus on sensor networks that are deployed in challenging environments, wherein sensors do not always have connected paths to a base station, and propose a new data resilience problem. We refer to it as DRE2: data resiliency in extreme environments. As there are no connected paths between sensors and the base station, the goal of DRE2 is to maximize data resilience by preserving the overflow data inside the network for the maximum amount of time, considering that sensor nodes have limited storage capacity and unreplenishable battery power. We propose a quadratic programming-based algorithm to solve DRE2 optimally. As quadratic programming is NP-hard and thus not scalable, we design two time-efficient heuristics based on different network metrics. We show via extensive experiments that all algorithms can achieve high data resilience, while the minimum cost flow-based one is the most energy-efficient. Our algorithms tolerate node failures and network partitions caused by energy depletion of sensor nodes. Underlying our algorithms are flow networks that generalize the edge capacity constraint well-accepted in traditional network flow theory.
  5. Sensory IoT (Internet of Things) networks have been widely applied and studied in recent years and have demonstrated their unique benefits in various areas. In this paper, we bring the sensor network to an application scenario that has rarely been studied: academic cleanrooms. We design SENSELET++, a low-cost IoT sensing platform that can collect, manage, and analyze a large amount of sensory data from heterogeneous sensors. Furthermore, we design a novel hybrid anomaly detection framework that can detect both time-critical and complex non-critical anomalies. We validate SENSELET++ through the deployment of the sensing platform in a lithography cleanroom. Our results show the scalability, flexibility, and reliability of the system design. Also, using real-world sensory data collected by SENSELET++, our system can analyze data streams in real time and detect shape and trend anomalies with a 91% true positive rate.