Many IoT applications have increasingly adopted machine learning (ML) techniques, such as classification and detection, to enhance automation and decision-making. With advances in hardware accelerators such as Nvidia’s Jetson embedded GPUs, the computational capabilities of end devices, particularly for ML inference workloads, have improved significantly in recent years. These advances open opportunities for distributing computation across the edge network, improving resource utilization and reducing request latency. Previous research has demonstrated promising results in collaborative inference, where processing units in the edge network, such as end devices and edge servers, jointly execute an inference request to minimize latency. This paper explores approaches for implementing collaborative inference on a single model in resource-constrained edge networks, including on-device, device-edge, and edge-edge collaboration. We present preliminary results from proof-of-concept experiments to support each case. We discuss dynamic factors that can affect the performance of these inference execution strategies, such as network variability, thermal constraints, and workload fluctuations. Finally, we outline potential directions for future research.
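To make the device-edge case concrete, the sketch below partitions a toy model at an assumed split point, runs the first half on the device, and ships the serialized activation to the edge half. The model, split point, and transport are illustrative assumptions, not the configuration evaluated in the paper.

```python
# Minimal sketch of device-edge split inference: the device runs the first
# SPLIT layers locally and ships the intermediate activation to an edge
# server, which finishes the forward pass. Model, split point, and
# transport are illustrative assumptions only.
import io
import torch
import torch.nn as nn

model = nn.Sequential(             # stand-in for a real CNN backbone
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
)
SPLIT = 4                          # layers [0, SPLIT) run on the device
device_part, edge_part = model[:SPLIT], model[SPLIT:]

def device_stage(x: torch.Tensor) -> bytes:
    """Run the on-device half and serialize the activation for the network."""
    with torch.no_grad():
        activation = device_part(x)
    buf = io.BytesIO()
    torch.save(activation, buf)    # in practice: compress + send over a socket
    return buf.getvalue()

def edge_stage(payload: bytes) -> torch.Tensor:
    """Deserialize the activation on the edge server and finish inference."""
    activation = torch.load(io.BytesIO(payload))
    with torch.no_grad():
        return edge_part(activation)

logits = edge_stage(device_stage(torch.randn(1, 3, 32, 32)))
print(logits.shape)                # torch.Size([1, 10])
```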
Enhancing Resilience in Distributed ML Inference Pipelines for Edge Computing
As edge computing and sensing devices continue to proliferate, distributed machine learning (ML) inference pipelines are becoming popular for enabling low-latency, real-time decision-making at scale. However, the geographically dispersed and often resource-constrained nature of edge devices makes them susceptible to various failures, such as hardware malfunctions, network disruptions, and device overloading. These edge failures can significantly affect the performance and availability of inference pipelines and the sensing-to-decision-making loops they enable. In addition, the complexity of task dependencies amplifies the difficulty of maintaining performant and reliable ML operations. To address these challenges and minimize the impact of edge failures on inference pipelines, this paper presents several fault-tolerant approaches, including sensing redundancy, structural resilience, failover replication, and pipeline reconfiguration. For each approach, we explain the key techniques and highlight their effectiveness and tradeoffs. Finally, we discuss the challenges associated with these approaches and outline future directions.
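Of the approaches surveyed, failover replication lends itself to a compact sketch: a router tracks heartbeats from a primary replica and redirects inference traffic to a hot backup once the primary goes silent. The timeout, node names, and API below are illustrative assumptions rather than the paper's design.

```python
# Minimal sketch of failover replication for an inference stage: a monitor
# tracks heartbeats from a primary replica and fails over to a hot backup
# once the primary misses its deadline. Timeout and node names are
# illustrative assumptions, not settings from the paper.
import time

HEARTBEAT_TIMEOUT = 2.0            # seconds without a heartbeat => failed

class FailoverRouter:
    def __init__(self, primary: str, backup: str):
        now = time.monotonic()
        self.last_seen = {primary: now, backup: now}
        self.active = primary
        self.backup = backup

    def heartbeat(self, node: str) -> None:
        """Record a liveness signal from a replica."""
        self.last_seen[node] = time.monotonic()

    def route(self) -> str:
        """Return the replica that should serve the next inference request."""
        silent_for = time.monotonic() - self.last_seen[self.active]
        if silent_for > HEARTBEAT_TIMEOUT and self.active != self.backup:
            self.active = self.backup   # fail over to the hot standby
        return self.active

router = FailoverRouter(primary="edge-a", backup="edge-b")
router.heartbeat("edge-b")
time.sleep(0.1)                    # "edge-a" would stop heartbeating on a crash
print(router.route())              # still "edge-a" (within the timeout)
```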
- Award ID(s):
- 2325956
- PAR ID:
- 10591318
- Publisher / Repository:
- IEEE
- Date Published:
- ISBN:
- 979-8-3503-7423-0
- Page Range / eLocation ID:
- 1 to 6
- Format(s):
- Medium: X
- Location:
- Washington, DC, USA
- Sponsoring Org:
- National Science Foundation
More Like this
- There is an increasing emphasis on securing deep learning (DL) inference pipelines for mobile and IoT applications with privacy-sensitive data. Prior works have shown that privacy-sensitive data can be secured throughout deep learning inference on cloud-offloaded models through trusted execution environments (TEEs) such as Intel SGX. However, prior solutions do not address the fundamental challenges of securing resource-intensive inference tasks on low-power, low-memory devices (e.g., mobile and IoT devices) while achieving high performance. To tackle these challenges, we propose SecDeep, a low-power DL inference framework demonstrating that both security and performance of deep learning inference on edge devices are well within our reach. Leveraging TEEs with limited resources, SecDeep guarantees full confidentiality for input and intermediate data, as well as the integrity of the deep learning model and framework. By enabling and securing neural accelerators, SecDeep is the first of its kind to provide trusted and performant DL model inference on IoT and mobile devices. We implement and validate SecDeep by interfacing the ARM NN DL framework with ARM TrustZone. Our evaluation shows that we can securely run inference tasks 16× to 172× faster than non-accelerated approaches by leveraging edge-available accelerators.
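As a rough, framework-agnostic illustration of the confidentiality and integrity guarantees described above (and emphatically not SecDeep's ARM TrustZone implementation), one can authenticated-encrypt an intermediate activation before it leaves a trusted component:

```python
# Rough illustration only: authenticated encryption of an intermediate
# activation before it leaves a trusted component, in the spirit of the
# confidentiality/integrity guarantees described above. Plain Python with
# the `cryptography` package, NOT SecDeep's TrustZone code.
import io
import torch
from cryptography.fernet import Fernet   # AES-CBC + HMAC (authenticated)

key = Fernet.generate_key()               # would live inside the TEE
tee_cipher = Fernet(key)

def seal(activation: torch.Tensor) -> bytes:
    """Encrypt+MAC an activation so untrusted code can only relay it."""
    buf = io.BytesIO()
    torch.save(activation, buf)
    return tee_cipher.encrypt(buf.getvalue())

def unseal(token: bytes) -> torch.Tensor:
    """Verify integrity and decrypt; raises InvalidToken if tampered with."""
    return torch.load(io.BytesIO(tee_cipher.decrypt(token)))

x = torch.randn(1, 64)
assert torch.equal(unseal(seal(x)), x)    # round-trip preserves the data
```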
- With the explosion in Big Data, it is often forgotten that much of the data nowadays is generated at the edge. Specifically, a major source of data is users' endpoint devices like phones, smart watches, etc., that are connected to the internet, also known as the Internet of Things (IoT). This "edge of data" faces several new challenges related to hardware constraints, privacy-aware learning, and distributed learning (both training and inference). So what systems and machine learning algorithms can we use to generate or exploit data at the edge? Can network science help us solve machine learning (ML) problems? Can IoT devices help people who live with some form of disability, and many others, benefit from health monitoring? In this tutorial, we introduce the network science and ML techniques relevant to edge computing, discuss systems for ML (e.g., model compression, quantization, HW/SW co-design) and ML for systems design (e.g., run-time resource optimization, power management for training and inference on edge devices), and illustrate their impact on concrete IoT applications.
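Among the systems-for-ML techniques the tutorial lists, post-training quantization is the quickest to demonstrate. The sketch below applies PyTorch's dynamic quantization to a toy model; the model and layer choice are assumptions for illustration, not material from the tutorial itself.

```python
# Sketch of one "systems for ML" technique from the list above:
# post-training dynamic quantization, which stores weights in int8 and
# shrinks the model for edge deployment. Toy model and sizes are
# illustrative assumptions.
import os
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # quantize only Linear layers
)

def disk_size(m: nn.Module, path: str = "tmp.pt") -> int:
    """Serialize the model's weights and report their on-disk footprint."""
    torch.save(m.state_dict(), path)
    size = os.path.getsize(path)
    os.remove(path)
    return size

print(disk_size(model), "bytes fp32 vs", disk_size(quantized), "bytes int8")
```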
- Deploying monocular depth estimation on resource-constrained edge devices is a significant challenge, particularly when attempting to perform both training and inference concurrently. Current lightweight, self-supervised approaches typically rely on complex frameworks that are hard to implement and deploy in real-world settings. To address this gap, we introduce the first framework for Lightweight Training and Inference (LITI) that combines ready-to-deploy models with streamlined code and fully functional, parallel training and inference pipelines. Our experiments show various models being deployed for inference, training, or both, leveraging inputs from a real-time RGB camera sensor. Thus, our framework enables training and inference on resource-constrained edge devices for complex applications such as depth estimation.
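A minimal sketch of the parallel training-and-inference loop such a framework enables might pair two threads around a shared model and a lock; the model, the fake camera input, and the step counts below are assumptions for illustration, not the LITI codebase.

```python
# Minimal sketch of concurrent on-device training and inference: one thread
# takes gradient steps while another serves predictions, serialized by a
# lock. Model, fake "camera" input, and step counts are illustrative
# assumptions, not code from LITI.
import threading
import torch
import torch.nn as nn

model = nn.Linear(8, 1)                    # stand-in for a depth network
opt = torch.optim.SGD(model.parameters(), lr=0.01)
lock = threading.Lock()

def train_loop(steps: int = 100) -> None:
    for _ in range(steps):
        x, y = torch.randn(16, 8), torch.randn(16, 1)
        with lock:                         # keep weights consistent
            opt.zero_grad()
            nn.functional.mse_loss(model(x), y).backward()
            opt.step()

def infer_loop(frames: int = 100) -> None:
    for _ in range(frames):
        frame = torch.randn(1, 8)          # stand-in for a camera frame
        with lock, torch.no_grad():
            _ = model(frame)               # serve a prediction

threads = [threading.Thread(target=train_loop),
           threading.Thread(target=infer_loop)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("trained and served concurrently")
```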
- Decision forests, including random forests, XGBoost, and LightGBM, dominate machine learning tasks over tabular data. Recently, several frameworks were developed for decision forest inference, such as ONNX, TreeLite from Amazon, TensorFlow Decision Forests from Google, HummingBird from Microsoft, Nvidia FIL, and lleaves. While these frameworks are fully optimized for inference computations, they are all decoupled from databases and general data management frameworks, which leads to cross-system performance overheads. We first provided a DICT model to understand the performance gaps between decoupled and in-database inference. We further identified that for in-database inference, in addition to the popular UDF-centric representation that encapsulates the ML model into one user-defined function (UDF), there also exists a relation-centric representation that breaks down decision forest inference into several fine-grained SQL operations. The relation-centric representation can achieve significantly better performance for large models. We optimized both implementations and conducted a comprehensive benchmark to compare these two implementations to the aforementioned decoupled inference pipelines and existing in-database inference pipelines such as Spark SQL and PostgresML. The evaluation results validated the DICT model and demonstrated the superior performance of our in-database inference design compared to the baselines.
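The relation-centric idea, expressing tree traversal as set-oriented SQL instead of a per-row UDF call, can be illustrated by compiling a single hand-made decision tree into a CASE expression; the tree, column names, and table below are invented for illustration:

```python
# Sketch of the relation-centric representation: compile a (tiny, hand-made)
# decision tree into a SQL CASE expression that scans a table in one query
# instead of invoking a per-row UDF. Tree, columns, and table are invented.
def tree_to_sql(node) -> str:
    """node: ('leaf', value) or (feature_column, threshold, left, right)."""
    if node[0] == "leaf":
        return str(node[1])
    feature, threshold, left, right = node
    return (f"CASE WHEN {feature} <= {threshold} "
            f"THEN {tree_to_sql(left)} ELSE {tree_to_sql(right)} END")

tree = ("age", 30,
        ("income", 50000, ("leaf", 0), ("leaf", 1)),
        ("leaf", 1))

print(f"SELECT id, {tree_to_sql(tree)} AS prediction FROM customers;")
# SELECT id, CASE WHEN age <= 30 THEN CASE WHEN income <= 50000 THEN 0
# ELSE 1 END ELSE 1 END AS prediction FROM customers;
```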