The fixed preemption point (FPP) model has been studied as an alternative to fully preemptive and non-preemptive models, as restricting preemptions to specific, predictable locations within a task’s execution can simplify overhead analysis without disallowing preemptions entirely. Prior work has produced response-time analyses for global Earliest Deadline First (G-EDF) scheduling under the FPP model. However, scheduling decisions based solely on task deadlines may be too coarsegrained and may not lead to the lowest response times. In this paper, we propose global FPP EDF-like (G-FPP-EL) scheduling, which assigns a priority point in time for each non-preemptive region of a task. We adapt compliant-vector analysis (CVA) to our model and present general response-time bounds for G-FPPEL schedulers. We then demonstrate that it is possible to design G-FPP-EL schedulers acheiving response-time bounds optimal under CVA and argue that such schedulers should replace global FPP EDF.
more »
« less
This content will become publicly available on December 2, 2026
DERCA: DetERministic Cycle-Level Accelerator on Reconfigurable Platforms in DNN-Enabled Real-Time Safety-Critical Systems
Deep neural network (DNN) models are increasingly deployed in real-time, safety-critical systems such as autonomous vehicles, driving the need for specialized AI accelerators. However, most existing accelerators support only non-preemptive execution or limited preemptive scheduling at the coarse granularity of DNN layers. This restriction leads to frequent priority inversion due to the scarcity of preemption points, resulting in unpredictable execution behavior and, ultimately, system failure. To address these limitations and improve the real-time performance of AI accelerators, we propose DERCA, a novel accelerator architecture that supports fine-grained, intra-layer flexible preemptive scheduling with cycle-level determinism. DERCA incorporates an on-chip Earliest Deadline First (EDF) scheduler to reduce both scheduling latency and variance, along with a customized dataflow design that enables intralayer preemption points (PPs) while minimizing the overhead associated with preemption. Leveraging the limited preemptive task model, we perform a comprehensive predictability analysis of DERCA, enabling formal schedulability analysis and optimized placement of preemption points within the constraints of limited preemptive scheduling. We implement DERCA on the AMD ACAP VCK190 reconfigurable platform. Experimental results show that DERCA outperforms state-of-the-art designs using non-preemptive and layer-wise preemptive dataflows, with less than 5 % overhead in worst-case execution time (WCET) and only 6% additional resource utilization. DERCA is open-sourced on GitHub: https://github.com/arc-research-lab/DERCA
more »
« less
- PAR ID:
- 10657467
- Publisher / Repository:
- IEEE
- Date Published:
- ISSN:
- 2576-3172
- ISBN:
- 979-8-3315-9642-2
- Page Range / eLocation ID:
- 392 to 405
- Format(s):
- Medium: X
- Location:
- Boston, MA, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Deep neural networks (DNNs) are gaining popularity in a wide range of domains, ranging from speech and video recognition to healthcare. With this increased adoption comes the pressing need for securing DNN execution environments on CPUs, GPUs, and ASICs. While there are active research efforts in supporting a trusted execution environment (TEE) on CPUs, the exploration in supporting TEEs on accelerators is limited, with only a few solutions available. A key limitation along this line of work is that these secure DNN accelerators narrowly consider a few specific architectures. The design choices and the associated cost for securing these architectures do not transfer to other diverse architectures. This paper strives to address this limitation by developing a design space exploration tool for supporting TEEs on diverse DNN accelerators. We target secure DNN accelerators equipped with cryptographic engines where the cryptographic operations are closely coupled with the data movement in the accelerators. These operations significantly complicate the scheduling for DNN accelerators, as the scheduling needs to account for the extra on-chip computation and off-chip memory accesses introduced by these cryptographic operations, and even needs to account for potential interactions across DNN layers. We tackle these challenges in our tool, called SecureLoop, by introducing a scheduling search engine with the following attributes: 1) considers the cryptographic overhead associated with every offchip data access, 2) uses an efficient modular arithmetic technique to compute the optimal authentication block assignment for each individual layer, and 3) uses a simulated annealing algorithm to perform cross-layer optimizations. Compared to the conventional schedulers, our tool finds the schedule for secure DNN designs with up to 33.2% speedup and 50.2% improvement of energy-delay product.more » « less
-
Pellizzoni, Rodolfo (Ed.)Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach as well as the default Nvidia GPU driver scheduling that follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.more » « less
-
Brandenburg, Björn B (Ed.)Safety-critical embedded systems such as autonomous vehicles typically have only very limited computational capabilities on board that must be carefully managed to provide required enhanced functionalities. As these systems become more complex and inter-connected, some parts may need to be secured to prevent unauthorized access, or isolated to ensure correctness. We propose the multi-phase secure (MPS) task model as a natural extension of the widely used sporadic task model for modeling both the timing and the security (and isolation) requirements for such systems. Under MPS, task phases reflect execution using different security mechanisms which each have associated execution time costs for startup and teardown. We develop corresponding limited-preemption EDF scheduling algorithms and associated pseudo-polynomial schedulability tests for constrained-deadline MPS tasks. In doing so, we provide a correction to a long-standing schedulability condition for EDF under limited-preemption. Evaluation shows that the proposed tests are efficient to compute for bounded utilizations. We empirically demonstrate that the MPS model successfully schedules more task sets compared to non-preemptive approaches.more » « less
-
The problem of selecting "effective preemption points" in a program --- points in the code at which to permit preemption --- in order to minimize overall running time is considered. Prior solutions that have been proposed for this problem are based on workload models in which worst-case known upper bounds are assumed for the duration needed to perform preemptions at particular points in the code, and of the time needed to non-preemptively execute the code between preemption points. Since these solutions are based on worst-case assumptions, they tend to select effective preemption points in a conservative manner; consequently the overall execution time of the program may be needlessly large under most typical run-time circumstances. We consider a more general workload model in which "typical" values, as well as upper bounds, are assumed to be known for the preemption durations and the non-preemptive code-execution durations; given such information, we derive algorithms for the optimal placement of preemption points in a manner that minimizes the typical overall running time (while continuing to guarantee, if needed, upper bounds on the worst-case over-all running time). Both off-line solutions (in which all preemption points are selected prior to run-time) and on-line solutions (where the selection of some of the preemption points is made during run-time and therefore can exploit knowledge of the actual durations of prior preemptions and of the executions of already executed pieces of code) are presented and proved optimal.more » « less
An official website of the United States government
