skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on January 31, 2026

Title: Sensor-Aware Data Imputation for Time-Series Machine Learning on Low-Power Wearable Devices
Wearable devices that have low-power sensors, processors, and communication capabilities are gaining wide adoption in several health applications. The machine learning algorithms on these devices assume that data from all sensors are available during runtime. However, data from one or more sensors may be unavailable due to energy or communication challenges. This loss of sensor data can result in accuracy degradation of the application. Prior approaches to handle missing data, such as generative models or training multiple classifiers for each combination of missing sensors are not suitable for low-energy wearable devices due to their high overhead at runtime. In contrast to prior approaches, we present an energy-efficient approach, referred to as Sensor-Aware iMputation (SAM), to accurately impute missing data at runtime and recover application accuracy. SAM first uses unsupervised clustering to obtain clusters of similar sensor data patterns. Next, it learns inter-relationship between clusters to obtain imputation patterns for each combination of clusters using a principled sensor-aware search algorithm. Using sensor data for clustering before choosing imputation patterns ensures that the imputation isawareof sensor data observations. Experiments on seven diverse wearable sensor-based time-series datasets demonstrate that SAM is able to maintain accuracy within 5% of the baseline with no missing data when one sensor is missing. We also compare SAM against generative adversarial imputation networks (GAIN), transformers, and k-nearest neighbor methods. Results show that SAM outperforms all three approaches on average by more than 25% when two sensors are missing with negligible overhead compared to the baseline.  more » « less
Award ID(s):
2238257
PAR ID:
10599925
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Association for Computing Machinery
Date Published:
Journal Name:
ACM Transactions on Design Automation of Electronic Systems
Volume:
30
Issue:
1
ISSN:
1084-4309
Page Range / eLocation ID:
1 to 27
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Human activity recognition (HAR) is an important component in a number of health applications, including rehabilitation, Parkinson’s disease, daily activity monitoring, and fitness monitoring. State-of-the-art HAR approaches use multiple sensors on the body to accurately identify activities at runtime. These approaches typically assume that data from all sensors are available for runtime activity recognition. However, data from one or more sensors may be unavailable due to malfunction, energy constraints, or communication challenges between the sensors. Missing data can lead to significant degradation in the accuracy, thus affecting quality of service to users. A common approach for handling missing data is to train classifiers or sensor data recovery algorithms for each combination of missing sensors. However, this results in significant memory and energy overhead on resource-constrained wearable devices. In strong contrast to prior approaches, this paper presents a clustering-based approach (CIM) to impute missing data at runtime. We first define a set of possible clusters and representative data patterns for each sensor in HAR. Then, we create and store a mapping between clusters across sensors. At runtime, when data from a sensor are missing, we utilize the stored mapping table to obtain most likely cluster for the missing sensor. The representative window for the identified cluster is then used as imputation to perform activity classification. We also provide a method to obtain imputation-aware activity prediction sets to handle uncertainty in data when using imputation. Experiments on three HAR datasets show that CIM achieves accuracy within 10% of a baseline without missing data for one missing sensor when providing single activity labels. The accuracy gap drops to less than 1% with imputation-aware classification. Measurements on a low-power processor show that CIM achieves close to 100% energy savings compared to state-of-the-art generative approaches. 
    more » « less
  2. Wearable devices are being increasingly used in high-impact health applications including vital sign monitoring, rehabilitation, and movement disorders. Wearable health monitoring can aid in the United Nations social development goal of healthy lives by enabling early warning, risk reduction, and management of health risks. Health tasks on wearable devices employ multiple sensors to collect relevant parameters of user’s health and make decisions using machine learning (ML) algorithms. The ML algorithms assume that data from all sensors are available for the health monitoring tasks. However, the applications may encounter missing or incomplete data due to user error, energy limitations, or sensor malfunction. Missing data results in significant loss of accuracy and quality of service. This paper presents a novel Classifier-Aware iMputation (CAM) approach to impute missing data such that classifier accuracy for health tasks is not affected. Specifically, CAM employs unsupervised clustering followed by a principled search algorithm to uncover imputation patterns that maintain high accuracy. Evaluations on seven diverse health tasks show that CAM achieves accuracy within 5% of the baseline with no missing data when one sensor is missing. CAM also achieves significantly higher accuracy compared to generative approaches with negligible energy overhead, making it suitable for wide range of wearable applications. 
    more » « less
  3. Human activity recognition (HAR) and, more broadly, activities of daily life recognition using wearable devices have the potential to transform a number of applications, including mobile healthcare, smart homes, and fitness monitoring. Recent approaches for HAR use multiple sensors on various locations on the body to achieve higher accuracy for complex activities. While multiple sensors increase the accuracy, they are also susceptible to reliability issues when one or more sensors are unable to provide data to the application due to sensor malfunction, user error, or energy limitations. Training multiple activity classifiers that use a subset of sensors is not desirable, since it may lead to reduced accuracy for applications. To handle these limitations, we propose a novel generative approach that recovers the missing data of sensors using data available from other sensors. The recovered data are then used to seamlessly classify activities. Experiments using three publicly available activity datasets show that with data missing from one sensor, the proposed approach achieves accuracy that is within 10% of the accuracy with no missing data. Moreover, implementation on a wearable device prototype shows that the proposed approach takes about 1.5 ms for recovering data in the w-HAR dataset, which results in an energy consumption of 606 μJ. The low-energy consumption ensures that SensorGAN is suitable for effectively recovering data in tinyML applications on energy-constrained devices. 
    more » « less
  4. Imputing missing data is a critical task in data-driven intelligent transportation systems. During recent decades there has been a considerable investment in developing various types of sensors and smart systems, including stationary devices (e.g., loop detectors) and floating vehicles equipped with global positioning system (GPS) trackers to collect large-scale traffic data. However, collected data may not include observations from all road segments in a traffic network for different reasons, including sensor failure, transmission error, and because GPS-equipped vehicles may not always travel through all road segments. The first step toward developing real-time traffic monitoring and disruption prediction models is to estimate missing values through a systematic data imputation process. Many of the existing data imputation methods are based on matrix completion techniques that utilize the inherent spatiotemporal characteristics of traffic data. However, these methods may not fully capture the clustered structure of the data. This paper addresses this issue by developing a novel data imputation method using PARATUCK2 decomposition. The proposed method captures both spatial and temporal information of traffic data and constructs a low-dimensional and clustered representation of traffic patterns. The identified spatiotemporal clusters are used to recover network traffic profiles and estimate missing values. The proposed method is implemented using traffic data in the road network of Manhattan in New York City. The performance of the proposed method is evaluated in comparison with two state-of-the-art benchmark methods. The outcomes indicate that the proposed method outperforms the existing state-of-the-art imputation methods in complex and large-scale traffic networks. 
    more » « less
  5. Battery-free sensing devices harvest energy from their surrounding environment to perform sensing, computation, and communication. This enables previously impossible applications in the Internet-of-Things. A core challenge for these devices is maintaining usefulness despite erratic, random or irregular energy availability; which causes inconsistent execution, loss of service and power failures. Adapting execution (degrading or upgrading) seems promising as a way to stave off power failures, meet deadlines, or increase throughput. However, because of constrained resources and limited local information, it is a challenge to decide when would be the best time to adapt, and how exactly to adapt execution. In this paper, we systematically explore the fundamental mechanisms of energy-aware adaptation, and propose heuristic adaptation as a method for modulating the performance of tasks to enable higher sensor coverage, completion rates, or throughput, depending on the application. We build a task based adaptive runtime system for intermittently powered sensors embodying this concept. We complement this runtime with a user facing simulator that enables programmers to conceptualize the tradeoffs they make when choosing what tasks to adapt, and how, relative to real world energy harvesting environment traces. While we target battery-free, intermittently powered sensors, we see general application to all energy harvesting devices. We explore heuristic adaptation with varied energy harvesting modalities and diverse applications: machine learning, activity recognition, and greenhouse monitoring, and find that the adaptive version of our ML app performs up to 46% more classifications with only a 5% drop in accuracy; the activity recognition app captures 76% more classifications with only nominal down-sampling; and find that heuristic adaptation leads to higher throughput versus non-adaptive in all cases. 
    more » « less