skip to main content

This content will become publicly available on June 4, 2024

Title: ERSAM: Neural Architecture Search for Energy-Efficient and Real-Time Social Ambiance Measurement
Social ambiance describes the context in which social interactions happen, and can be measured using speech audio by counting the number of concurrent speakers. This measurement has enabled various mental health tracking and human-centric IoT applications. While on-device Socal Ambiance Measure (SAM) is highly desirable to ensure user privacy and thus facilitate wide adoption of the aforementioned applications, the required computational complexity of state-of-the-art deep neural networks (DNNs) powered SAM solutions stands at odds with the often constrained resources on mobile devices. Furthermore, only limited labeled data is available or practical when it comes to SAM under clinical settings due to various privacy constraints and the required human effort, further challenging the achievable accuracy of on-device SAM solutions. To this end, we propose a dedicated neural architecture search framework for Energy-efficient and Real-time SAM (ERSAM). Specifically, our ERSAM framework can automatically search for DNNs that push forward the achievable accuracy vs. hardware efficiency frontier of mobile SAM solutions. For example, ERSAM-delivered DNNs only consume 40 mW • 12 h energy and 0.05 seconds processing latency for a 5 seconds audio segment on a Pixel 3 phone, while only achieving an error rate of 14.3% on a social ambiance dataset generated by LibriSpeech. We can expect that our ERSAM framework can pave the way for ubiquitous on-device SAM solutions which are in growing demand.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Date Published:
Journal Name:
2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range / eLocation ID:
1 to 5
Medium: X
Rhodes Island, Greece
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep neural networks (DNNs) have gained considerable attention in various real-world applications due to the strong performance on representation learning. However, a DNN needs to be trained many epochs for pursuing a higher inference accuracy, which requires storing sequential versions of DNNs and releasing the updated versions to users. As a result, large amounts of storage and network resources are required, which significantly hamper DNN utilization on resource-constrained platforms (e.g., IoT, mobile phone). In this paper, we present a novel delta compression framework called Delta-DNN, which can efficiently compress the float-point numbers in DNNs by exploiting the floats similarity existing in DNNs during training. Specifically, (1) we observe the high similarity of float-point numbers between the neighboring versions of a neural network in training; (2) inspired by delta compression technique, we only record the delta (i.e., the differences) between two neighboring versions, instead of storing the full new version for DNNs; (3) we use the error-bounded lossy compression to compress the delta data for a high compression ratio, where the error bound is strictly assessed by an acceptable loss of DNNs’ inference accuracy; (4) we evaluate Delta-DNN’s performance on two scenarios, including reducing the transmission of releasing DNNs over network and saving the storage space occupied by multiple versions of DNNs. According to experimental results on six popular DNNs, DeltaDNN achieves the compression ratio 2x~10x higher than state-ofthe-art methods, while without sacrificing inference accuracy and changing the neural network structure. 
    more » « less
  2. Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs). As the performance requirements of ML applications grow continuously, the hardware accelerators start playing a central role in DNN design. This trend makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric to quantify the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us to do training-free NAS within one second and build an accuracy predictor by training as few as 25 samples out of a vast search space with more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process, while considering the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to the state-of-the-art NAS approaches, our proposed hierarchical SHGO-based algorithm enables more than four orders of magnitude speedup (specifically, the execution time of the proposed algorithm is about 0.1 seconds). Finally, our experimental evaluations show that FLASH is easily transferable to different hardware architectures, thus enabling us to do NAS on a Raspberry Pi-3B processor in less than 3 seconds. 
    more » « less
  3. Eye tracking has become an essential human-machine interaction modality for providing immersive experience in numerous virtual and augmented reality (VR/AR) applications desiring high throughput (e.g., 240 FPS), small-form, and enhanced visual privacy. However, existing eye tracking systems are still limited by their: (1) large form-factor largely due to the adopted bulky lens-based cameras; (2) high communication cost required between the camera and backend processor; and (3) potentially concerned low visual privacy, thus prohibiting their more extensive applications. To this end, we propose, develop, and validate a lensless FlatCambased eye tracking algorithm and accelerator co-design framework dubbed EyeCoD to enable eye tracking systems with a much reduced form-factor and boosted system efficiency without sacrificing the tracking accuracy, paving the way for next-generation eye tracking solutions. On the system level, we advocate the use of lensless FlatCams instead of lens-based cameras to facilitate the small form-factor need in mobile eye tracking systems, which also leaves rooms for a dedicated sensing-processor co-design to reduce the required camera-processor communication latency. On the algorithm level, EyeCoD integrates a predict-then-focus pipeline that first predicts the region-of-interest (ROI) via segmentation and then only focuses on the ROI parts to estimate gaze directions, greatly reducing redundant computations and data movements. On the hardware level, we further develop a dedicated accelerator that (1) integrates a novel workload orchestration between the aforementioned segmentation and gaze estimation models, (2) leverages intra-channel reuse opportunities for depth-wise layers, (3) utilizes input feature-wise partition to save activation memory size, and (4) develops a sequential-write-parallel-read input buffer to alleviate the bandwidth requirement for the activation global buffer. On-silicon measurement and extensive experiments validate that our EyeCoD consistently reduces both the communication and computation costs, leading to an overall system speedup of 10.95×, 3.21×, and 12.85× over general computing platforms including CPUs and GPUs, and a prior-art eye tracking processor called CIS-GEP, respectively, while maintaining the tracking accuracy. Codes are available at 
    more » « less
  4. null (Ed.)
    We propose Zygarde --- which is an energy- and accuracy-aware soft real-time task scheduling framework for batteryless systems that flexibly execute deep learning tasks1 that are suitable for running on microcontrollers. The sporadic nature of harvested energy, resource constraints of the embedded platform, and the computational demand of deep neural networks (DNNs) pose a unique and challenging real-time scheduling problem for which no solutions have been proposed in the literature. We empirically study the problem and model the energy harvesting pattern as well as the trade-off between the accuracy and execution of a DNN. We develop an imprecise computing-based scheduling algorithm that improves the timeliness of DNN tasks on intermittently powered systems. We evaluate Zygarde using four standard datasets as well as by deploying it in six real-life applications involving audio and camera sensor systems. Results show that Zygarde decreases the execution time by up to 26% and schedules 9% -- 34% more tasks with up to 21% higher inference accuracy, compared to traditional schedulers such as the earliest deadline first (EDF). 
    more » « less
  5. By 2018, it is no secret to the global networking community: Internet of Things (IoT) devices, usually controlled by IoT applications and applets, have dominated human lives. It has been shown that popular applet platforms (including If This Then That (IFTTT)) are susceptible to attacks that try to exfiltrate private photos, leak user location, etc. As new attacks might show up very frequently, tracking them fast and in an efficient and scalable manner is a daunting task due to the limited (e.g., memory, energy) resources at the IoT/mobile device and the large network size. Towards that direction, in this paper we propose a decentralized Dynamic Information Flow Tracking (DDIFT) framework that overcomes these challenges, better adapts to the IoT context, and further, is able to illuminate IoT applet attacks. In doing so, we leverage the synergy between: (i) a dynamic information flow tracking module that considers the application of tags with different types along with provenance information and runs in the mobile device at a fast timescale, (ii) a forensics analysis module running in the cloud at a slow timescale, (iii) distributed optimization to optimize various functionalities of the above modules as well as their interaction. We show that our framework is able to detect IoT applet attacks with higher accuracy (on average 81% improvement for different URL upload attack scenarios) and decreases resource wastage (on average 71% less memory usage under different integrity attack scenarios) compared to traditional DIFT, opening new horizons for IoT privacy and security. 
    more » « less