Title: Efficient Deterministic Federated Scheduling for Parallel Real-Time Tasks
Federated scheduling is a generalization of partitioned scheduling for parallel tasks on multiprocessors and has been shown to be a competitive scheduling approach. However, federated scheduling may waste resources because it dedicates processors exclusively to each parallel task. In this work we introduce a novel algorithm for scheduling parallel tasks that require more than one processor to meet their deadlines (i.e., heavy tasks). The proposed algorithm computes a deterministic schedule for each heavy task based on its internal graph structure. It efficiently exploits the processors allocated to each task and thus reduces the number of processors the task requires. Experimental evaluation shows that our new federated scheduling algorithm significantly outperforms other state-of-the-art federated-based scheduling approaches, including semi-federated scheduling and reservation-based federated scheduling, which were developed to tackle resource waste in federated scheduling, as well as a stretching algorithm that also uses the tasks' graph structures.
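To make the resource-waste argument concrete, the sketch below contrasts two ways of sizing a heavy task's processor allotment. It is an illustration, not the authors' algorithm. The first is the classic federated rule n = ceil((C - L) / (D - L)), where C is the task's total work, L its critical-path length, and D its deadline. The second is the smallest core count for which a simple offline list schedule of the same DAG finishes by D. The priority rule (largest bottom level first), the fork-join example, and every function name are assumptions made for this sketch. List schedules are subject to timing anomalies when nodes finish early, so a table computed this way would need to be replayed as a fixed, time-triggered schedule for its guarantee to hold.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class DagTask:
    wcet: dict[str, float]        # node -> worst-case execution time
    edges: list[tuple[str, str]]  # (u, v): u must finish before v starts
    deadline: float               # relative deadline D

    def successors(self) -> dict[str, list[str]]:
        succ = {v: [] for v in self.wcet}
        for u, v in self.edges:
            succ[u].append(v)
        return succ

    def bottom_levels(self) -> dict[str, float]:
        """Longest path from each node to a sink, including the node itself."""
        succ = self.successors()
        memo: dict[str, float] = {}

        def level(u: str) -> float:
            if u not in memo:
                memo[u] = self.wcet[u] + max((level(v) for v in succ[u]), default=0.0)
            return memo[u]

        return {v: level(v) for v in self.wcet}


def federated_cores(task: DagTask) -> int:
    """Classic federated allocation n = ceil((C - L) / (D - L)) for a heavy task."""
    work = sum(task.wcet.values())             # C
    span = max(task.bottom_levels().values())  # L
    if work <= task.deadline:
        raise ValueError("light task (C <= D): not handled here")
    if span >= task.deadline:
        raise ValueError("infeasible: critical path L >= deadline D")
    return math.ceil((work - span) / (task.deadline - span))


def list_schedule_makespan(task: DagTask, m: int) -> float:
    """Simulate a non-preemptive, work-conserving list schedule on m cores.
    Ready nodes start in order of decreasing bottom level (ties broken by name)."""
    succ, prio = task.successors(), task.bottom_levels()
    waiting = {v: 0 for v in task.wcet}  # unfinished predecessors per node
    for _, v in task.edges:
        waiting[v] += 1
    ready = [v for v, k in waiting.items() if k == 0]
    running: list[tuple[float, str]] = []  # min-heap of (finish time, node)
    now, free, finished = 0.0, m, 0
    while finished < len(task.wcet):
        ready.sort(key=lambda v: (-prio[v], v))
        while free > 0 and ready:  # never idle a core while work is ready
            v = ready.pop(0)
            heapq.heappush(running, (now + task.wcet[v], v))
            free -= 1
        now = running[0][0]  # jump to the next completion
        while running and running[0][0] <= now:  # retire every node finishing now
            _, u = heapq.heappop(running)
            free += 1
            finished += 1
            for v in succ[u]:
                waiting[v] -= 1
                if waiting[v] == 0:
                    ready.append(v)
    return now


def min_cores_list(task: DagTask) -> int:
    """Smallest m whose offline list schedule meets D. Graham's bound
    (makespan <= L + (C - L) / m) guarantees m = federated_cores(task) works."""
    n_classic = federated_cores(task)
    for m in range(1, n_classic + 1):
        if list_schedule_makespan(task, m) <= task.deadline:
            return m
    return n_classic


# Hypothetical fork-join task: s -> {a..f} -> t, deadline 8.
mid = "abcdef"
fork_join = DagTask(
    wcet={"s": 1.0, "t": 1.0, **{v: 2.0 for v in mid}},
    edges=[("s", v) for v in mid] + [(v, "t") for v in mid],
    deadline=8.0,
)
print(federated_cores(fork_join), min_cores_list(fork_join))  # 3 2
```

Here C = 14 and L = 4, so the classic rule reserves 3 cores, while the list schedule already meets D = 8 on 2 cores. That gap is the kind of waste a graph-aware, deterministic schedule can recover.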
Award ID(s):
1814739
NSF-PAR ID:
10289282
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2020 IEEE 26th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Both energy efficiency and real-time performance are critical requirements in many embedded systems applications, such as self-driving cars, robotic systems, disaster response, and security/safety control. These systems involve a myriad of real-time tasks, each of which may itself be a parallel task that can use multiple computing units at the same time. Driven by the increasing demand for parallel tasks, multi-core embedded processors are inevitably evolving into many-core processors. Existing work on real-time parallel tasks has mostly focused on real-time scheduling without addressing energy consumption. In this paper, we address hard real-time scheduling of parallel tasks while minimizing their CPU energy consumption on multicore embedded systems. Each task is represented as a directed acyclic graph (DAG), with nodes indicating different threads of execution and edges indicating their dependencies. Our technique determines the execution speeds of the nodes of the DAGs so as to minimize overall energy consumption while meeting all task deadlines. It incorporates a frequency optimization engine and the dynamic voltage and frequency scaling (DVFS) scheme into the classical real-time scheduling policies (both federated and global) and makes them energy-aware. The contributions of this paper thus include the first energy-aware online federated scheduling and the first energy-aware global scheduling of DAGs. Evaluation using a synthetic workload in simulation shows that our energy-aware real-time scheduling policies can achieve up to 68% energy savings compared to classical (energy-unaware) policies. We have also performed a proof-of-concept system evaluation on physical hardware that demonstrates the energy efficiency of our proposed approach.
  2. Multiprocessor scheduling of hard real-time tasks modeled by directed acyclic graphs (DAGs) exploits the inherent parallelism presented by the model. For DAG tasks, a node represents a request to execute an object on one of the available processors. In one DAG task, there may be multiple execution requests for one object, each represented by a distinct node. These distinct execution requests offer an opportunity to reduce their combined cache overhead through coordinated scheduling of objects as threads within a parallel task. The goal of this work is to realize this opportunity by incorporating the cache-aware BUNDLE-scheduling algorithm into federated scheduling of sporadic DAG task sets. This is the first work to incorporate instruction cache sharing into federated scheduling. The result is a modification of the DAG model named the DAG with objects and threads (DAG-OT). Under the DAG-OT model, descriptions of nodes explicitly include their underlying executable object and number of threads. When possible, nodes assigned the same executable object are collapsed into a single node, and their threads are joined when BUNDLE-scheduled. Compared to the DAG model, the DAG-OT model with cache-aware scheduling reduces the number of cores allocated to individual tasks by approximately 20 percent in the synthetic evaluation and by up to 50 percent on a novel parallel computing platform implementation. By reducing the number of allocated cores, the DAG-OT model is able to schedule a subset of previously infeasible task sets.
  3. With the technology trend toward hardware and workload consolidation in embedded systems and the rapid development of edge computing, there has been increasing interest in supporting parallel real-time tasks to better utilize multi-core platforms while meeting stringent real-time constraints. For parallel real-time tasks, the federated scheduling paradigm, which assigns each parallel task a set of dedicated cores, achieves good theoretical bounds by ensuring exclusive use of processing resources to reduce interference. However, because cores share the last-level cache and memory bandwidth, in practice tasks may still interfere with each other despite executing on dedicated cores. Such interference due to concurrent accesses can be even more severe on embedded platforms or edge servers, where computing power and cache/memory space are limited. To tackle this issue, in this work we present a holistic resource allocation framework for parallel real-time tasks under federated scheduling. Under our proposed framework, in addition to dedicated cores, each parallel task is also assigned dedicated cache and memory bandwidth resources. Further, we propose a holistic resource allocation algorithm that balances the allocation across the different resources to achieve good schedulability. Additionally, we provide a full implementation of our framework by extending the federated scheduling system with Intel’s Cache Allocation Technology and MemGuard. Finally, we demonstrate the practicality of our proposed framework via extensive numerical evaluations and empirical experiments using real benchmark programs.
  4. In a CPU-GPU heterogeneous computing system, balancing the load across the different types of processors during computation is a problem. In addition, matching multiple tasks to the appropriate processor cores is an urgent problem to be solved. In this paper, we propose a task scheduling strategy for high-performance CPU-GPU heterogeneous computing platforms to solve these problems. For the single-task model, we propose a load-aware task scheduling strategy for the CPU-GPU heterogeneous computing platform. This strategy measures the computing power of the CPU and GPU for processing the specified tasks and allocates computing tasks to the CPU and GPU according to the measured ratio. The tasks are stored in a bidirectional queue to reduce the additional overhead introduced by scheduling. For the multi-task model, we propose a task scheduling strategy based on a genetic algorithm for the CPU-GPU heterogeneous computing platform. This strategy aims to improve the overall operating efficiency of the system and accurately binds different types of tasks to the heterogeneous processing cores that should execute them. Our experimental results show that the scheduling strategy can improve both the efficiency of parallel computing and overall system performance.
  5. We consider the problem of orchestrating the execution of workflow applications structured as Directed Acyclic Graphs (DAGs) on parallel computing platforms that are subject to fail-stop failures. The objective is to minimize the expected overall execution time, or makespan. A solution to this problem consists of a schedule of the workflow tasks on the available processors and a decision about which application data to checkpoint to stable storage, so as to mitigate the impact of processor failures. For general DAGs this problem is hopelessly intractable. In fact, even computing the expected makespan of a given solution is a difficult problem. To address this challenge, we consider a restricted class of graphs, Minimal Series-Parallel Graphs (M-SPGs). It turns out that many real-world workflow applications are naturally structured as M-SPGs. For this class of graphs, we propose a recursive list-scheduling algorithm that exploits the M-SPG structure to assign sub-graphs to individual processors and uses dynamic programming to decide which tasks in these sub-graphs should be checkpointed. Furthermore, the expected makespan of the solution produced by this algorithm can be computed efficiently, using a first-order approximation of task weights and existing evaluation algorithms for 2-state probabilistic DAGs. We assess the performance of our algorithm for production workflow configurations, comparing it to (i) an approach in which all application data is checkpointed, which corresponds to the standard way in which most production workflows are executed today; and (ii) an approach in which no application data is checkpointed. Our results demonstrate that our algorithm strikes a good compromise between these two approaches, leading to lower checkpointing overhead than the former and better resilience to failure than the latter.
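The first related record chooses per-node execution speeds so that DAG tasks meet their deadlines at minimum energy. Its frequency optimization engine is not described in the abstract, so the sketch below substitutes a simpler heuristic to show the shape of the problem. Each node runs at speed (longest path through the node) / D, with 1.0 being the maximum frequency. Every node on a path P then runs at speed at least len(P) / D, so P finishes within D. This holds provided each ready node has its own core (for example, enough dedicated cores under federated scheduling). Energy follows the common cubic power model, with dynamic energy proportional to speed² × work and static power ignored. The DAG, the deadline, the `s_min` floor, and all names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Dag:
    work: dict[str, float]        # node -> execution time at maximum speed (1.0)
    edges: list[tuple[str, str]]  # (u, v): u must finish before v starts


def longest_paths(dag, weight):
    """Return (top, bottom): longest path ending just before each node and
    longest path starting at each node (inclusive), using weight(node)."""
    succ = {v: [] for v in dag.work}
    pred = {v: [] for v in dag.work}
    for u, v in dag.edges:
        succ[u].append(v)
        pred[v].append(u)
    top, bottom = {}, {}

    def t(u):
        if u not in top:
            top[u] = max((t(p) + weight(p) for p in pred[u]), default=0.0)
        return top[u]

    def b(u):
        if u not in bottom:
            bottom[u] = weight(u) + max((b(s) for s in succ[u]), default=0.0)
        return bottom[u]

    for v in dag.work:
        t(v)
        b(v)
    return top, bottom


def path_based_speeds(dag, deadline, s_min=0.1):
    """Speed of v = (longest path through v) / D, raised to at least s_min."""
    top, bottom = longest_paths(dag, lambda v: dag.work[v])
    speeds = {}
    for v in dag.work:
        s = (top[v] + bottom[v]) / deadline
        if s > 1.0:
            raise ValueError("some path is longer than D even at full speed")
        speeds[v] = max(s, s_min)  # raising a speed never hurts the deadline
    return speeds


def energy(dag, speeds):
    # Dynamic energy per node ~ speed^2 * work (power ~ f^3, constant = 1).
    return sum(speeds[v] ** 2 * w for v, w in dag.work.items())


def finish_time(dag, speeds):
    # Makespan when every ready node starts immediately on its own core.
    _, bottom = longest_paths(dag, lambda v: dag.work[v] / speeds[v])
    return max(bottom.values())


g = Dag(work={"s": 1.0, "a": 4.0, "b": 2.0, "t": 1.0},
        edges=[("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")])
D = 12.0
span = max(longest_paths(g, lambda v: g.work[v])[1].values())  # 6.0
uniform = {v: span / D for v in g.work}                         # every node at 0.5
per_node = path_based_speeds(g, D)
print(energy(g, {v: 1.0 for v in g.work}), energy(g, uniform), round(energy(g, per_node), 3))
# 8.0 2.0 1.722
```

Uniform stretching halves every speed. The path-based rule also slows the off-critical node `b` to one third, and both choices still finish by D = 12.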
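The DAG-OT record collapses nodes that share an executable object "when possible", but the abstract does not state the exact condition. The sketch below assumes, hypothetically, that two nodes may be merged when they run the same object and have identical predecessor and successor sets. Under that rule, merging cannot change any precedence constraint. The merged node's thread count is the sum of its members' counts. BUNDLE scheduling itself and any WCET change due to instruction-cache sharing are not modeled, and `Node` and `collapse` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    obj: str      # executable object (code) the node runs
    threads: int  # number of threads of that object


def collapse(nodes: list[Node], edges: list[tuple[str, str]]):
    """Merge nodes that run the same object and have identical predecessor
    and successor sets (an assumed merge rule); thread counts are summed."""
    preds = {n.name: set() for n in nodes}
    succs = {n.name: set() for n in nodes}
    for u, v in edges:
        succs[u].add(v)
        preds[v].add(u)
    groups: dict[tuple, list[Node]] = {}
    for n in nodes:
        key = (n.obj, frozenset(preds[n.name]), frozenset(succs[n.name]))
        groups.setdefault(key, []).append(n)
    merged, rename = [], {}
    for (obj, _, _), members in groups.items():
        name = "+".join(m.name for m in members)
        merged.append(Node(name, obj, sum(m.threads for m in members)))
        for m in members:
            rename[m.name] = name
    # A node and its own successor never share a key, so no self-loops appear.
    new_edges = sorted({(rename[u], rename[v]) for u, v in edges})
    return merged, new_edges


nodes = [Node("src", "init", 1), Node("x1", "fft", 1), Node("x2", "fft", 1),
         Node("x3", "fft", 2), Node("sink", "reduce", 1), Node("y", "fft", 1)]
edges = [("src", "x1"), ("src", "x2"), ("src", "x3"),
         ("x1", "sink"), ("x2", "sink"), ("x3", "sink"), ("sink", "y")]
merged, new_edges = collapse(nodes, edges)
print([(n.name, n.threads) for n in merged])
# [('src', 1), ('x1+x2+x3', 4), ('sink', 1), ('y', 1)]
```

Node `y` also runs `fft` but sits at a different point in the graph, so under this rule it stays separate.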
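The third record gives each task dedicated cache through Intel's Cache Allocation Technology (CAT). CAT expresses an allocation as a capacity bitmask (CBM) over the cache ways, and on Intel hardware the set bits of a CBM must be contiguous. The sketch below assumes the per-task way counts have already been chosen, since the paper's holistic allocation algorithm is not reproduced. It packs those counts into non-overlapping contiguous masks and formats each one as a Linux resctrl `schemata` line for a single L3 domain. Memory-bandwidth regulation through MemGuard is not shown. Task names and way counts are hypothetical, and real platforms also impose a minimum mask width, which resctrl reports as `min_cbm_bits`.

```python
def contiguous_masks(ways: dict[str, int], total_ways: int) -> dict[str, int]:
    """Give each task a non-overlapping run of adjacent ways, lowest bits first."""
    if any(n < 1 for n in ways.values()):
        raise ValueError("every task needs at least one way")
    if sum(ways.values()) > total_ways:
        raise ValueError("requested more ways than the cache has")
    masks, offset = {}, 0
    for task, n in ways.items():
        masks[task] = ((1 << n) - 1) << offset  # n consecutive 1-bits at offset
        offset += n
    return masks


def is_contiguous(mask: int) -> bool:
    # Drop trailing zeros; the rest must look like 0b0...01...1.
    if mask == 0:
        return False
    while mask & 1 == 0:
        mask >>= 1
    return mask & (mask + 1) == 0


def schemata_line(mask: int, cache_id: int = 0) -> str:
    # resctrl schemata syntax for one L3 cache domain, e.g. "L3:0=f"
    return f"L3:{cache_id}={mask:x}"


masks = contiguous_masks({"t1": 4, "t2": 2, "t3": 5}, total_ways=11)
for task, mask in masks.items():
    print(task, schemata_line(mask))
# t1 L3:0=f
# t2 L3:0=30
# t3 L3:0=7c0
```

Packing the masks from the low bits upward keeps every task's ways adjacent and disjoint, so the tasks do not evict each other's lines in the shared cache.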
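The fourth record allocates work to the CPU and GPU in proportion to their measured computing power and keeps tasks in a bidirectional queue. One plausible reading, assumed here rather than taken from the paper, is that the CPU takes tasks from one end of a deque and the GPU from the other, so the two devices never contend for the same end. The sketch replaces runtime measurement with fixed, hypothetical throughputs (items per time unit) and simulates both a static proportional split and the deque-based consumption. The genetic-algorithm strategy for the multi-task model is not covered.

```python
from collections import deque


def static_split(n_items: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Split n_items in proportion to measured throughput (items per time unit)."""
    cpu_share = round(n_items * cpu_rate / (cpu_rate + gpu_rate))
    return cpu_share, n_items - cpu_share


def run_from_both_ends(items, cpu_rate: float, gpu_rate: float):
    """Simulate the CPU popping from the left and the GPU from the right of one
    deque; whichever device is free first takes the next item from its end."""
    queue = deque(items)
    t_cpu = t_gpu = 0.0
    taken = {"cpu": [], "gpu": []}
    while queue:
        if t_cpu <= t_gpu:
            taken["cpu"].append(queue.popleft())
            t_cpu += 1.0 / cpu_rate
        else:
            taken["gpu"].append(queue.pop())
            t_gpu += 1.0 / gpu_rate
    return taken, max(t_cpu, t_gpu)


# Hypothetical rates: the GPU processes this kind of item 4x faster than the CPU.
print(static_split(100, cpu_rate=1.0, gpu_rate=4.0))   # (20, 80)
taken, makespan = run_from_both_ends(range(100), cpu_rate=1.0, gpu_rate=4.0)
print(len(taken["cpu"]), len(taken["gpu"]), makespan)  # 20 80 20.0
```

With accurate rates, the dynamic deque run converges to the same 20/80 split as the static rule. It also absorbs rate mismatches at runtime, because each device simply keeps pulling from its own end.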
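The fifth record assigns M-SPG sub-graphs to individual processors and uses dynamic programming to decide which tasks to checkpoint. Once a sub-graph runs on one processor, its tasks execute as a chain. A classic way to place checkpoints on a chain is a dynamic program over the position of the last checkpoint. The sketch below makes several assumptions. Fail-stop failures are exponentially distributed with rate `lam`. Running work W followed by a checkpoint of cost C, with recovery cost R and downtime D, takes expected time E = e^{λR}(1/λ + D)(e^{λ(W+C)} - 1). The last task is always checkpointed. This illustrates the technique and is not the authors' algorithm. In particular, it omits the first-order approximation and the 2-state probabilistic DAG evaluation that the abstract mentions.

```python
import math


def expected_segment_time(work, ckpt, recovery, downtime, lam):
    """Expected time to run `work` and then a checkpoint of cost `ckpt`,
    restarting from the previous checkpoint (cost `recovery`) after each
    failure; failures arrive as a Poisson process with rate `lam`."""
    return math.exp(lam * recovery) * (1.0 / lam + downtime) * math.expm1(lam * (work + ckpt))


def optimal_chain_checkpoints(w, c, r, downtime, lam):
    """DP over a chain: w[k] is the work of task k, c[k]/r[k] the cost to write/read
    a checkpoint after task k. Returns (expected time, sorted checkpointed tasks)."""
    n = len(w)
    prefix = [0.0]
    for x in w:
        prefix.append(prefix[-1] + x)
    best = [0.0] + [math.inf] * n  # best[j]: tasks 0..j-1 done and checkpointed
    last = [0] * (n + 1)           # start index of the final segment for best[j]
    for j in range(1, n + 1):
        for i in range(j):         # final segment = tasks i..j-1, checkpoint after j-1
            rec = r[i - 1] if i > 0 else 0.0  # nothing to recover before task 0
            cost = best[i] + expected_segment_time(
                prefix[j] - prefix[i], c[j - 1], rec, downtime, lam)
            if cost < best[j]:
                best[j], last[j] = cost, i
    checkpoints, j = [], n
    while j > 0:
        checkpoints.append(j - 1)
        j = last[j]
    return best[n], sorted(checkpoints)


# Rare failures: one final checkpoint suffices. Frequent failures: checkpoint every task.
print(optimal_chain_checkpoints([10.0] * 3, [1.0] * 3, [1.0] * 3, 0.0, 1e-9)[1])  # [2]
print(optimal_chain_checkpoints([10.0] * 5, [1.0] * 5, [1.0] * 5, 0.0, 0.1)[1])   # [0, 1, 2, 3, 4]
```

The two example calls mirror the two baselines in the abstract, no checkpoints and checkpoint-everything. The DP picks between them, or anything in between, according to the failure rate.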