Deep neural network (DNN) models are increasingly deployed in real-time, safety-critical systems such as autonomous vehicles, driving the need for specialized AI accelerators. However, most existing accelerators support only non-preemptive execution or limited preemptive scheduling at the coarse granularity of DNN layers. This restriction leads to frequent priority inversion due to the scarcity of preemption points, resulting in unpredictable execution behavior and, ultimately, system failure. To address these limitations and improve the real-time performance of AI accelerators, we propose DERCA, a novel accelerator architecture that supports fine-grained, intra-layer flexible preemptive scheduling with cycle-level determinism. DERCA incorporates an on-chip Earliest Deadline First (EDF) scheduler to reduce both scheduling latency and variance, along with a customized dataflow design that enables intralayer preemption points (PPs) while minimizing the overhead associated with preemption. Leveraging the limited preemptive task model, we perform a comprehensive predictability analysis of DERCA, enabling formal schedulability analysis and optimized placement of preemption points within the constraints of limited preemptive scheduling. We implement DERCA on the AMD ACAP VCK190 reconfigurable platform. Experimental results show that DERCA outperforms state-of-the-art designs using non-preemptive and layer-wise preemptive dataflows, with less than 5 % overhead in worst-case execution time (WCET) and only 6% additional resource utilization. DERCA is open-sourced on GitHub: https://github.com/arc-research-lab/DERCA
more »
« less
This content will become publicly available on June 29, 2026
ART: Customizing Accelerators for DNN-Enabled Real-Time Safety-Critical Systems
Real-time systems are widely applied in different areas like autonomous vehicles, where safety is the key metric. However, on the FPGA platform, most of the prior accelerator frameworks omit discussing the schedulability in such real-time safety-critical systems, leaving deadlines unmet, which can lead to catastrophic system failures. To address this, we propose the ART framework, a hardware-software co-design approach that transforms baseline accelerators into “real-time guaranteed" accelerators. On the software side, ART performs schedulability analysis and preemption point placement, optimizing task scheduling to meet deadlines and enhance throughput. On the hardware side, ART integrates the Global Earliest Deadline First (GEDF) scheduling algorithm, implements preemption, and conducts source code transformation to transform baseline HLS-based accelerators into designs targeted for real-time systems capable of saving and resuming tasks. ART also includes integration, debugging, and testing tools for full-system implementation. We demonstrate the methodology of ART on two kinds of popular accelerator models and evaluate on AMD Versal VCK190 platform, where ART meets schedulability requirements that baseline accelerators fail. ART is lightweight, utilizing <0.5% resources. With about 100 lines of user input, ART generates about 2.5k lines of accelerator code, making it a push-button solution.
more »
« less
- PAR ID:
- 10612758
- Publisher / Repository:
- ACM
- Date Published:
- Page Range / eLocation ID:
- 442 to 449
- Subject(s) / Keyword(s):
- Real-time System Safety-critical System Hardware Accelerator
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Pellizzoni, Rodolfo (Ed.)Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach as well as the default Nvidia GPU driver scheduling that follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.more » « less
-
Data movement latency when using on-chip accelerators in emerging heterogeneous architectures is a serious performance bottleneck. While hardware/software mechanisms such as peer-to-peer DMA between producer/consumer accelerators allow bypassing main memory and significantly reduce main memory contention, schedulers in both the hardware and software domains remain oblivious to their presence. Instead, most contemporary schedulers tend to be deadline-driven, with improved utilization and/or throughput serving as secondary or co-primary goals. This lack of focus on data communication will only worsen execution times as accelerator latencies reduce. In this paper, we present RELIEF (RElaxing Least-laxIty to Enable Forwarding), an online least laxity-driven accelerator scheduling policy that relieves memory pressure in accelerator-rich architectures via data movement-aware scheduling. RELIEF leverages laxity (time margin to a deadline) to opportunistically utilize available hardware data forwarding mechanisms while minimizing quality-of-service (QoS) degradation and unfairness. RELIEF achieves up to 50% more forwards compared to state-of- the-art policies, reducing main memory traffic and energy consumption by up to 32% and 18%, respectively. At the same time, RELIEF meets 14% more task deadlines on average and reduces worst-case deadline violation by 14%, highlighting QoS and fairness improvements.more » « less
-
High-performance kernel libraries are critical to exploiting accelerators and specialized instructions in many applications. Because compilers are difficult to extend to support diverse and rapidly-evolving hardware targets, and automatic optimization is often insufficient to guarantee state-of-the-art performance, these libraries are commonly still coded and optimized by hand, at great expense, in low-level C and assembly. To better support development of high-performance libraries for specialized hardware, we propose a new programming language, Exo, based on the principle of exocompilation: externalizing target-specific code generation support and optimization policies to user-level code. Exo allows custom hardware instructions, specialized memories, and accelerator configuration state to be defined in user libraries. It builds on the idea of user scheduling to externalize hardware mapping and optimization decisions. Schedules are defined as composable rewrites within the language, and we develop a set of effect analyses which guarantee program equivalence and memory safety through these transformations. We show that Exo enables rapid development of state-of-the-art matrix-matrix multiply and convolutional neural network kernels, for both an embedded neural accelerator and x86 with AVX-512 extensions, in a few dozen lines of code each.more » « less
-
This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without making changes to underlying accelerator device drivers. The paper shows that PAAM also offers a theoretical analysis that can upper bound the worst-case response time of safety-critical callback chains that necessitate accelerator access. This paper also demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91% reduction in end-to-end response time of their critical callback chains.more » « less
An official website of the United States government
