Title: VeriCompress: A Tool to Streamline the Synthesis of Verified Robust Compressed Neural Networks from Scratch
AI's widespread integration has led to neural network (NN) deployment on edge and similar resource-limited platforms in safety-critical scenarios. Yet NNs' fragility raises concerns about reliable inference, and constrained platforms demand compact networks. This study introduces VeriCompress, a tool that automates the search for and training of compressed models with robustness guarantees. These models are well suited for safety-critical applications and adhere to predefined architecture and size limitations, making them deployable on resource-restricted platforms. The method trains models 2-3 times faster than state-of-the-art approaches, surpassing them by average accuracy and robustness gains of 15.1 and 9.8 percentage points, respectively. When deployed on a resource-restricted generic platform, these models require 5-8 times less memory and 2-4 times less inference time than models used in the verified-robustness literature. Our comprehensive evaluation across various model architectures and datasets, including MNIST, CIFAR, SVHN, and a relevant pedestrian detection dataset, showcases VeriCompress's capacity to identify compressed verified robust models with reduced computation overhead compared to current standards. This underscores its potential as a valuable tool for end users, such as developers of safety-critical applications on edge or Internet of Things platforms, empowering them to create suitable models for safety-critical, resource-constrained platforms in their respective domains.
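To make the underlying idea concrete: verified-robust training typically propagates interval bounds through the network and optimizes a worst-case loss. The sketch below shows interval bound propagation (IBP) through a single linear layer in PyTorch. It is a generic illustration of the verified-training machinery, not the VeriCompress implementation, and all function names are ours.

```python
# Illustrative sketch only (not the VeriCompress implementation): interval
# bound propagation (IBP), a standard primitive for training networks whose
# robustness can be formally verified.
import torch
import torch.nn.functional as F

def ibp_linear(lower, upper, weight, bias):
    """Propagate the input interval [lower, upper] through y = x @ W^T + b."""
    center = (upper + lower) / 2
    radius = (upper - lower) / 2
    out_center = center @ weight.t() + bias
    out_radius = radius @ weight.t().abs()   # |W| expands the radius
    return out_center - out_radius, out_center + out_radius

def robust_loss(lower_logits, upper_logits, target):
    """Cross-entropy on the worst-case logits inside the output interval."""
    onehot = F.one_hot(target, lower_logits.size(-1)).bool()
    worst = torch.where(onehot, lower_logits, upper_logits)
    return F.cross_entropy(worst, target)
```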
Award ID(s):
2124285 2526174
PAR ID:
10560319
Author(s) / Creator(s):
; ;
Publisher / Repository:
AAAI
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
38
Issue:
21
ISSN:
2159-5399
Page Range / eLocation ID:
22899 to 22905
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep learning methods have been very successful at radio frequency fingerprinting tasks, predicting the identity of transmitting devices with high accuracy. We study radio frequency fingerprinting deployments on resource-constrained edge devices. We use structured pruning to jointly train and sparsify neural networks tailored to edge hardware implementations. We compress convolutional layers by a factor of 27.2× while incurring a negligible decrease in prediction accuracy (less than 1%). We demonstrate the efficacy of our approach on multiple edge hardware platforms, including a Samsung Galaxy S10 phone and a Xilinx ZCU104 FPGA. Our method yields significant inference speedups, 11.5× on the FPGA and 3× on the smartphone, as well as high efficiency: the FPGA processing time is 17× smaller than on a V100 GPU. To the best of our knowledge, we are the first to explore compressing networks for radio frequency fingerprinting; as such, our experiments can be seen as a means of characterizing the informational capacity associated with this specific learning task.
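As an illustration of the pruning primitive involved (though not the authors' joint train-and-sparsify procedure or their 27.2× recipe), structured filter pruning of convolutional layers can be sketched in PyTorch as follows:

```python
# A minimal sketch of structured pruning of convolutional layers in PyTorch,
# in the spirit of (but not identical to) the approach described above.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        # Remove 50% of filters (dim=0 = output channels) by L2 norm.
        prune.ln_structured(module, name="weight", amount=0.5, n=2, dim=0)
        prune.remove(module, "weight")  # make the sparsity permanent
```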
  2. Since emerging edge applications such as Internet of Things (IoT) analytics and augmented reality have tight latency constraints, hardware AI accelerators have been recently proposed to speed up deep neural network (DNN) inference run by these applications. Resource-constrained edge servers and accelerators tend to be multiplexed across multiple IoT applications, introducing the potential for performance interference between latency-sensitive workloads. In this article, we design analytic models to capture the performance of DNN inference workloads on shared edge accelerators, such as GPU and edgeTPU, under different multiplexing and concurrency behaviors. After validating our models using extensive experiments, we use them to design various cluster resource management algorithms to intelligently manage multiple applications on edge accelerators while respecting their latency constraints. We implement a prototype of our system in Kubernetes and show that our system can host 2.3× more DNN applications in heterogeneous multi-tenant edge clusters with no latency violations when compared to traditional knapsack hosting algorithms. 
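A highly simplified, hypothetical sketch of this kind of latency-aware admission logic is shown below; `predict_latency` stands in for the article's analytic interference models, and all data structures are illustrative:

```python
# Hypothetical sketch: admit a DNN app onto a shared edge accelerator only
# if a performance model predicts no latency-SLO violation for any tenant.

def predict_latency(app, n_cotenants):
    # Placeholder for the article's analytic interference models.
    return app["base_latency_ms"] * (1 + 0.2 * n_cotenants)

def place(apps, accelerators):
    placement = {acc: [] for acc in accelerators}
    for app in sorted(apps, key=lambda a: a["slo_ms"]):  # tightest SLO first
        for acc, tenants in placement.items():
            trial = tenants + [app]
            if all(predict_latency(t, len(trial) - 1) <= t["slo_ms"]
                   for t in trial):
                placement[acc] = trial
                break
    return placement

apps = [{"name": "ar", "base_latency_ms": 10, "slo_ms": 30},
        {"name": "iot", "base_latency_ms": 5, "slo_ms": 20}]
print(place(apps, ["gpu0", "edgetpu0"]))
```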
  3. Deploying monocular depth estimation on resource-constrained edge devices is a significant challenge, particularly when attempting to perform both training and inference concurrently. Current lightweight, self-supervised approaches typically rely on complex frameworks that are hard to implement and deploy in real-world settings. To address this gap, we introduce the first framework for Lightweight Training and Inference (LITI), which combines ready-to-deploy models with streamlined code and fully functional, parallel training and inference pipelines. Our experiments demonstrate various models deployed for inference, training, or both, using input from a real-time RGB camera sensor. Our framework thus enables training and inference on resource-constrained edge devices for complex applications such as depth estimation.
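A minimal, hypothetical sketch of such a parallel training-and-inference pipeline, with placeholder arithmetic standing in for the actual depth model and self-supervised loss, might look like this:

```python
# Hypothetical structural sketch of concurrent training and inference sharing
# one model on-device; placeholder arithmetic stands in for the real network.
import threading, queue, time, random

frame_queue = queue.Queue(maxsize=4)     # stands in for an RGB camera feed
model_lock = threading.Lock()
weights = [0.0]                          # stands in for model parameters

def inference_loop():
    while True:
        frame = frame_queue.get()
        with model_lock:                 # read a consistent set of weights
            depth = weights[0] * frame   # placeholder "depth estimate"

def training_loop():
    while True:
        frame = frame_queue.get()
        with model_lock:                 # update weights atomically
            weights[0] += 0.01 * frame   # placeholder "gradient step"

threading.Thread(target=inference_loop, daemon=True).start()
threading.Thread(target=training_loop, daemon=True).start()
for _ in range(100):                     # fake camera producing frames
    frame_queue.put(random.random())
time.sleep(0.1)
```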
  4. Natural-language interaction between passengers and autonomous vehicles is essential for trust, safety, and user experience, but deploying Large Language Models (LLMs) on automotive edge platforms is constrained by compute, memory, energy, and privacy. We present Pi-talk, an edge-only system that enables real-time passenger–vehicle dialogue using a Small Language Model (SLM) running entirely on embedded hardware. Pi-talk performs multimodal fusion of onboard camera, ultrasonic distance, and navigation context via a lightweight encoder–adapter module that aligns modalities into compact semantic tokens for a pre-trained SLM. The SLM produces context-aware explanations of driving decisions, route options, and situational updates without cloud connectivity. Safety is enforced through a real-time safety envelope that gates responses and actions using distance thresholds and timing constraints. We further adapter-tune the SLM (on-device or offline) and deploy it with INT8 quantization and an Open Neural Network Exchange (ONNX) runtime to achieve efficient batch = 1 inference on Raspberry Pi–class hardware. We evaluate task quality (evaluation loss), end-to-end latency, CPU utilization, and memory footprint, and include ablations contrasting unimodal vs. fused inputs. Results show that Pi-talk sustains few-second, edge-only inference while meeting stringent resource and latency limits and maintaining the safety envelope required for autonomous operation. To our knowledge, Pi-talk is among the first edge-only, multimodal passenger–vehicle dialogue systems that both fine-tune and run a small language model entirely on Raspberry Pi–class, CPU-only hardware while enforcing an explicit runtime safety envelope.
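A hedged sketch of the described deployment path, dynamic INT8 quantization followed by batch = 1 CPU inference with ONNX Runtime behind a distance-based safety gate, is given below; the file names, input name, and threshold are illustrative, not from the paper:

```python
# Illustrative sketch: INT8 quantization + CPU-only ONNX Runtime inference
# gated by a simple distance-based safety check. File names, the "input_ids"
# input name, and the 0.5 m threshold are assumptions for this example.
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic("slm_fp32.onnx", "slm_int8.onnx",
                 weight_type=QuantType.QInt8)   # INT8 weights

session = ort.InferenceSession("slm_int8.onnx",
                               providers=["CPUExecutionProvider"])

def respond(tokens, obstacle_distance_m, min_safe_distance_m=0.5):
    # Safety envelope: suppress free-form dialogue when too close to an
    # obstacle; the real system also enforces timing constraints.
    if obstacle_distance_m < min_safe_distance_m:
        return "OBSTACLE_ALERT"
    (logits,) = session.run(None, {"input_ids": tokens})  # batch = 1
    return logits
```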
  5. The ever-increasing size of deep neural network (DNN) models once implied that they were limited to cloud data centers for runtime inference. Nonetheless, the recent plethora of DNN model compression techniques has overcome this limit, making it a reality that DNN-based inference can run on numerous resource-constrained edge devices, including mobile phones, drones, robots, medical devices, wearables, and Internet of Things devices, among many others. Naturally, edge devices are highly heterogeneous in terms of hardware specification and usage scenarios. On the other hand, compressed DNN models are so diverse that they exhibit different tradeoffs in a multi-dimensional space, and no single model can achieve optimality in terms of all important metrics such as accuracy, latency, and energy consumption. Consequently, how to automatically select a compressed DNN model for an edge device to run inference with optimal quality of experience (QoE) arises as a new challenge. The state-of-the-art approaches either choose a common model for all or most devices, which is optimal for a small fraction of edge devices at best, or apply device-specific DNN model compression, which is not scalable. In this paper, by leveraging the predictive power of machine learning and keeping end users in the loop, we envision an automated device-level DNN model selection engine for QoE-optimal edge inference. To concretize our vision, we formulate the DNN model selection problem as a contextual multi-armed bandit framework, where features of edge devices and DNN models are contexts and pre-trained DNN models are arms selected online based on the history of actions and users' QoE feedback. We develop an efficient online learning algorithm to balance exploration and exploitation. Our preliminary simulation results validate our algorithm and highlight the potential of machine learning for automating DNN model selection to achieve QoE-optimal edge inference.
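As a concrete reference point for this formulation (a generic textbook LinUCB, not the paper's algorithm), a contextual bandit over pre-trained compressed models could be sketched as:

```python
# Generic LinUCB sketch of the contextual-bandit formulation: arms are
# pre-trained compressed DNN models, the context is device/model features,
# and the reward is the user's QoE feedback. Not the paper's algorithm.
import numpy as np

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]   # per-arm Gram matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Mean reward estimate plus an exploration bonus.
            scores.append(theta @ context +
                          self.alpha * np.sqrt(context @ A_inv @ context))
        return int(np.argmax(scores))

    def update(self, arm, context, reward):             # reward = QoE feedback
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```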