This content will become publicly available on June 29, 2026

Title: ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems
Real-time systems are widely deployed in areas such as autonomous vehicles, where safety is the key requirement. However, on FPGA platforms, most prior accelerator frameworks do not address schedulability in such real-time safety-critical systems, leaving deadlines unmet, which can lead to catastrophic system failures. To address this, we propose the ART framework, a hardware-software co-design approach that transforms baseline accelerators into “real-time guaranteed” accelerators. On the software side, ART performs schedulability analysis and preemption point placement, optimizing task scheduling to meet deadlines and enhance throughput. On the hardware side, ART integrates the Global Earliest Deadline First (GEDF) scheduling algorithm, implements preemption, and performs source-code transformation to turn baseline HLS-based accelerators into designs for real-time systems that can save and resume tasks. ART also includes integration, debugging, and testing tools for full-system implementation. We demonstrate the ART methodology on two popular kinds of accelerator models and evaluate it on the AMD Versal VCK190 platform, where ART meets schedulability requirements that the baseline accelerators fail. ART is lightweight, using less than 0.5% of the platform's resources. With about 100 lines of user input, ART generates about 2.5k lines of accelerator code, making it a push-button solution.
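As a rough illustration of the GEDF policy that ART integrates, the sketch below shows the core dispatch rule in Python: among all ready jobs, those with the earliest absolute deadlines occupy the available accelerator units, and the rest wait or are preempted. The job names and deadlines are hypothetical, and ART itself realizes this logic in hardware alongside save/resume preemption support, not in software as shown here.

def gedf_dispatch(ready_jobs, num_units):
    """Global Earliest Deadline First: the ready jobs with the earliest
    absolute deadlines run on the available units; the rest wait or are
    preempted at the next preemption point."""
    by_deadline = sorted(ready_jobs, key=lambda job: job["deadline"])
    return by_deadline[:num_units], by_deadline[num_units:]

# Hypothetical example: three ready jobs, two accelerator units.
jobs = [
    {"name": "detector", "deadline": 18.0},
    {"name": "tracker",  "deadline": 12.0},
    {"name": "planner",  "deadline": 25.0},
]
running, waiting = gedf_dispatch(jobs, num_units=2)
print([j["name"] for j in running])   # ['tracker', 'detector']
print([j["name"] for j in waiting])   # ['planner']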
Award ID(s):
2324864 2328972 2217003 2213701 2536952 2103604
PAR ID:
10612758
Author(s) / Creator(s):
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400714962
Page Range / eLocation ID:
442 to 449
Subject(s) / Keyword(s):
Real-time System; Safety-critical System; Hardware Accelerator
Format(s):
Medium: X
Location:
New Orleans LA USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Pellizzoni, Rodolfo (Ed.)
    Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach as well as the default Nvidia GPU driver scheduling that follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms. 
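To make the segment-boundary preemption idea concrete, here is a minimal Python sketch of a priority check performed between GPU segments: the running task yields the device only when a higher-priority request is pending. The class, names, and priorities are hypothetical; GCAPS itself implements this at the device-driver level through one-line macros, not as shown here.

class SegmentBoundaryScheduler:
    """Toy model of priority-driven preemption at GPU segment boundaries."""
    def __init__(self):
        self.pending = []   # (priority, task); lower value = higher priority

    def request(self, priority, task):
        self.pending.append((priority, task))
        self.pending.sort()

    def segment_boundary(self, running_priority):
        """Invoked between GPU segments of the running task: hand over the
        device if a higher-priority request is waiting; otherwise continue."""
        if self.pending and self.pending[0][0] < running_priority:
            return self.pending.pop(0)
        return None

sched = SegmentBoundaryScheduler()
sched.request(priority=1, task="emergency_stop_inference")
print(sched.segment_boundary(running_priority=5))   # (1, 'emergency_stop_inference')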
  2. Data movement latency when using on-chip accelerators in emerging heterogeneous architectures is a serious performance bottleneck. While hardware/software mechanisms such as peer-to-peer DMA between producer/consumer accelerators allow bypassing main memory and significantly reduce main memory contention, schedulers in both the hardware and software domains remain oblivious to their presence. Instead, most contemporary schedulers tend to be deadline-driven, with improved utilization and/or throughput serving as secondary or co-primary goals. This lack of focus on data communication will only worsen execution times as accelerator latencies decrease. In this paper, we present RELIEF (RElaxing Least-laxIty to Enable Forwarding), an online least laxity-driven accelerator scheduling policy that relieves memory pressure in accelerator-rich architectures via data movement-aware scheduling. RELIEF leverages laxity (time margin to a deadline) to opportunistically utilize available hardware data forwarding mechanisms while minimizing quality-of-service (QoS) degradation and unfairness. RELIEF achieves up to 50% more forwards compared to state-of-the-art policies, reducing main memory traffic and energy consumption by up to 32% and 18%, respectively. At the same time, RELIEF meets 14% more task deadlines on average and reduces worst-case deadline violation by 14%, highlighting QoS and fairness improvements.
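The laxity notion this policy builds on is easy to state. The sketch below (hypothetical numbers, and a deliberate simplification of RELIEF's actual policy) shows laxity and a relaxed test that lets a task tolerate a short forwarding delay as long as its slack stays non-negative.

def laxity(deadline, now, remaining_exec):
    """Time margin before the task can no longer meet its deadline."""
    return deadline - now - remaining_exec

def can_wait_for_forwarding(deadline, now, remaining_exec, forwarding_delay):
    """Relax strict least-laxity ordering: tolerate a short extra delay to
    receive data directly from a producer accelerator (bypassing DRAM),
    but only if the task still keeps non-negative laxity afterwards."""
    return laxity(deadline, now, remaining_exec + forwarding_delay) >= 0

print(laxity(deadline=30.0, now=10.0, remaining_exec=12.0))              # 8.0
print(can_wait_for_forwarding(30.0, 10.0, 12.0, forwarding_delay=5.0))   # True
print(can_wait_for_forwarding(30.0, 10.0, 12.0, forwarding_delay=10.0))  # False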
  3. High-performance kernel libraries are critical to exploiting accelerators and specialized instructions in many applications. Because compilers are difficult to extend to support diverse and rapidly-evolving hardware targets, and automatic optimization is often insufficient to guarantee state-of-the-art performance, these libraries are commonly still coded and optimized by hand, at great expense, in low-level C and assembly. To better support development of high-performance libraries for specialized hardware, we propose a new programming language, Exo, based on the principle of exocompilation: externalizing target-specific code generation support and optimization policies to user-level code. Exo allows custom hardware instructions, specialized memories, and accelerator configuration state to be defined in user libraries. It builds on the idea of user scheduling to externalize hardware mapping and optimization decisions. Schedules are defined as composable rewrites within the language, and we develop a set of effect analyses which guarantee program equivalence and memory safety through these transformations. We show that Exo enables rapid development of state-of-the-art matrix-matrix multiply and convolutional neural network kernels, for both an embedded neural accelerator and x86 with AVX-512 extensions, in a few dozen lines of code each. 
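The idea of schedules as composable rewrites can be illustrated with a toy transformation pipeline. The sketch below is not Exo code and does not use Exo's IR or API; it only mimics the flavor of user-level scheduling by composing two hand-written rewrites (a split and a reorder) over a tiny loop-nest description.

def split(loops, name, factor, outer, inner):
    """Rewrite loop `name` of extent N into nested loops of extents N // factor and factor."""
    out = []
    for loop_name, extent in loops:
        if loop_name == name:
            assert extent % factor == 0, "split must divide the extent evenly"
            out += [(outer, extent // factor), (inner, factor)]
        else:
            out.append((loop_name, extent))
    return out

def reorder(loops, first, second):
    """Swap the positions of two named loops in the nest."""
    names = [n for n, _ in loops]
    i, j = names.index(first), names.index(second)
    out = list(loops)
    out[i], out[j] = out[j], out[i]
    return out

# A "schedule" is just a composition of rewrites applied to the naive loop nest.
loops = [("i", 128), ("j", 64)]
loops = split(loops, "j", factor=16, outer="jo", inner="ji")
loops = reorder(loops, "i", "jo")
print(loops)   # [('jo', 4), ('i', 128), ('ji', 16)]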
  4. This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without changing the underlying accelerator device drivers. The paper also presents a theoretical analysis that upper-bounds the worst-case response time of safety-critical callback chains that require accelerator access, and demonstrates that complex robotic systems with extensive accelerator usage can achieve up to a 91% reduction in the end-to-end response time of their critical callback chains when integrated with PAAM.
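A stripped-down sketch of the resource-server idea follows: callbacks submit accelerator requests to a central arbiter, which grants the device strictly by priority. The class, request names, and priorities are hypothetical simplifications; PAAM itself is a standalone ROS 2 executor with per-chain control and response-time analysis, which this sketch does not attempt to model.

import heapq

class AcceleratorServer:
    def __init__(self):
        self._queue = []   # (priority, arrival_order, request); lower value = more urgent
        self._order = 0

    def submit(self, priority, request):
        """Callbacks submit accelerator work instead of touching the device directly."""
        heapq.heappush(self._queue, (priority, self._order, request))
        self._order += 1

    def dispatch(self):
        """The server grants the device to the highest-priority pending request."""
        if self._queue:
            _, _, request = heapq.heappop(self._queue)
            return request
        return None

server = AcceleratorServer()
server.submit(priority=3, request="mapping_inference")
server.submit(priority=1, request="obstacle_detection")   # safety-critical chain
print(server.dispatch())   # 'obstacle_detection' is served first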
  5. The reliability and security of safety-critical real-time systems are of utmost importance because the failure of these systems could incur severe consequences (e.g., loss of lives or failure of a mission). Such properties require strong isolation between components and they rely on enforcement mechanisms provided by the underlying operating system (OS) kernel. In addition to spatial isolation, which is commonly provided by OS kernels to various extents, it also requires temporal isolation, that is, properties on the schedule of one component (e.g., schedulability) are independent of behaviors of other components. The strict isolation between components relies critically on algorithmic properties of the concrete implementation of the scheduler, such as timely provision of time slots, obliviousness to preemption, etc. However, existing work either only reasons about an abstract model of the scheduler, or proves properties of the scheduler implementation that are not rich enough to establish the isolation between different components. In this paper, we present a novel compositional framework for reasoning about algorithmic properties of the concrete implementation of preemptive schedulers. In particular, we use the virtual timeline, a variant of the supply bound function used in real-time scheduling analysis, to specify and reason about the scheduling of each component in isolation. We show that the properties proved on this abstraction carry down to the generated assembly code of the OS kernel. Using this framework, we successfully verify a real-time OS kernel, which extends mCertiKOS, a single-processor non-preemptive kernel, with user-level preemption, a verified timer interrupt handler, and a verified real-time scheduler. We prove that in the absence of microarchitectural-level timing channels, this new kernel enjoys temporal and spatial isolation on top of the functional correctness guarantee. All the proofs are implemented in the Coq proof assistant.
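As background for the supply-bound-function idea that the virtual timeline abstraction builds on, the sketch below computes the classic linear lower bound on supply under a periodic resource model (a budget of processor time granted every fixed period). This is standard real-time analysis with hypothetical parameters, not the paper's exact formulation.

def linear_sbf(t, period, budget):
    """Lower bound on the processor time a component is guaranteed in any
    window of length t when the scheduler provides `budget` units of time
    every `period`. Worst case: a blackout of 2 * (period - budget) first."""
    blackout = 2.0 * (period - budget)
    return max(0.0, (budget / period) * (t - blackout))

# Hypothetical example: a component granted 2 ms of CPU every 5 ms.
for t in (3.0, 6.0, 11.0, 20.0):
    print(t, linear_sbf(t, period=5.0, budget=2.0))
# 3.0 -> 0.0, 6.0 -> 0.0, 11.0 -> 2.0, 20.0 -> 5.6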