This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without making changes to underlying accelerator device drivers. The paper shows that PAAM also offers a theoretical analysis that can upper bound the worst-case response time of safety-critical callback chains that necessitate accelerator access. This paper also demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91% reduction in end-to-end response time of their critical callback chains.
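The arbitration idea described above, a single server that decides which pending accelerator request runs next based on priority, can be illustrated with a small sketch. This is not the PAAM implementation: the class name `AcceleratorServer`, the methods `submit` and `serve_next`, the integer priorities, and the assumption of one non-preemptive accelerator are all hypothetical choices made for illustration.

```python
import heapq
import itertools


class AcceleratorServer:
    """Toy priority arbiter for one accelerator (hypothetical, not PAAM's API)."""

    def __init__(self):
        self._pending = []                # min-heap of pending requests
        self._order = itertools.count()   # arrival counter, used as tie-breaker

    def submit(self, priority, name, work):
        # Assumption: a larger number means a more critical chain.
        # The priority is negated because heapq pops the smallest item first;
        # the counter keeps first-in, first-out order among equal priorities.
        heapq.heappush(self._pending, (-priority, next(self._order), name, work))

    def serve_next(self):
        """Run the highest-priority pending request; return (name, result)."""
        if not self._pending:
            return None
        _, _, name, work = heapq.heappop(self._pending)
        return name, work()


server = AcceleratorServer()
server.submit(1, "logging", lambda: "log done")
server.submit(10, "braking", lambda: "brake done")
first = server.serve_next()    # the request from the more critical chain
second = server.serve_next()
```

In this sketch the request submitted later is served first because it carries the higher priority, which is the behavior a priority-driven arbiter is meant to provide.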
PiCAS: New Design of Priority-Driven Chain-Aware Scheduling for ROS2
In ROS (Robot Operating System), most applications in time- and safety-critical domains are constructed as callback chains with data dependencies. Due to shortcomings in its real-time support, ROS does not provide strong timing guarantees, which may lead to disastrous results. Although ROS2 claims to enhance real-time capability, ensuring predictable end-to-end chain latency remains a challenging problem. In this paper, we propose a new priority-driven chain-aware scheduler for the ROS2 framework and present an end-to-end latency analysis for the proposed scheduler. With our scheduler, callbacks are prioritized based on the given timing requirements of the corresponding chains so that the end-to-end latency of critical chains can be improved with a predictable bound. The proposed scheduling design includes priority assignment and resource allocation considering all ROS2 scheduling-related abstractions, e.g., callbacks, nodes, and executors. To the best of our knowledge, this is the first work to address the inherent limitations of ROS2 in end-to-end latency by proposing a new scheduler design. We have implemented our scheduler in ROS2 running on an NVIDIA Xavier NX and conducted case studies and schedulability experiments. The results show that the proposed scheduler yields a substantial improvement in end-to-end latency over the default ROS2 scheduler and the latest work in real-world scenarios.
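As a rough illustration of chain-aware prioritization, the sketch below gives every callback of a more critical chain a higher priority than any callback of a less critical chain. It is not the PiCAS algorithm: the `Chain` type, the `criticality` field, the within-chain ordering rule, and the assumption that each callback belongs to exactly one chain are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    name: str
    criticality: int    # hypothetical: a larger value means a more critical chain
    callbacks: tuple    # callback names in data-flow order


def assign_priorities(chains):
    """Map each callback name to an integer priority (larger runs first)."""
    priorities = {}
    next_priority = 1
    # Visit the least critical chain first so that more critical chains
    # end up with strictly larger priority values.
    for chain in sorted(chains, key=lambda c: c.criticality):
        # Assumption for this sketch: within a chain, callbacks closer to
        # the end of the chain outrank earlier ones.
        for callback in chain.callbacks:
            priorities[callback] = next_priority
            next_priority += 1
    return priorities


def pick_next(ready, priorities):
    """Choose the ready callback with the highest assigned priority."""
    return max(ready, key=priorities.__getitem__)


chains = [
    Chain("logging", 1, ("log_sub", "log_write")),
    Chain("braking", 2, ("lidar", "detect", "brake")),
]
priorities = assign_priorities(chains)
chosen = pick_next({"log_write", "lidar"}, priorities)
```

Here `lidar` is chosen over `log_write` even though it is the first callback of its chain, because its chain is the more critical one.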
- Award ID(s): 1943265
- PAR ID: 10276465
- Date Published:
- Journal Name: 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS)
- Page Range / eLocation ID: 251 to 263
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Lingua Franca is a programming paradigm that eases the development of distributed cyber-physical systems and ensures determinism. These systems are subject to stringent timing constraints, generally expressed as task deadlines, and meeting them requires real-time scheduling. This work presents a layered scheduling strategy for Lingua Franca that enhances real-time performance and builds upon any priority-based operating system thread scheduler. Application designers need to specify only the application-specific deadlines, and the Lingua Franca runtime automatically converts them into appropriate priority values for the OS scheduler to obtain earliest-deadline-first scheduling.
- Networks in many safety-critical systems, such as avionics, automotive, and industrial plants, have strict end-to-end delay requirements that must be met for correct system operation. Existing software-defined real-time networks do not support the data plane programmability provided by recent protocol-independent switch architectures such as P4. Our research enables time-aware flow forwarding in P4-enabled software-defined time-critical networks. In this paper, we introduce time-aware flow scheduling for P4-enabled SDN architectures. We study two scheduling policies: the first prioritizes flows based on slack (i.e., how much time is left to reach the destination), and the second uses finish time as a priority metric, determined from each flow's data rate requirements. Both approaches were implemented and tested in the P4 software stack. We find that the slack-based forwarding scheme performs better at meeting real-time requirements. Our publicly released scheduler implementations will assist network engineers in adapting programmable switches to safety-critical applications that demand precise timing guarantees.
- The increasingly popular Robot Operating System (ROS) framework allows building robotic systems by integrating newly developed and/or reused modules, where the modules can use different versions of the framework (e.g., ROS1 or ROS2) and different programming languages (e.g., C++ or Python). The majority of such robotic systems' work happens in callbacks. The framework provides various elements for initializing callbacks and for setting up their execution. It is the responsibility of developers to compose callbacks and their execution setup elements, which can lead to inconsistencies in the setup of callback execution when developers have incomplete knowledge of the semantics of these elements in various versions of the framework. Some of these inconsistencies do not throw errors at runtime, making them difficult for developers to detect. We propose a static approach to detecting such inconsistencies by extracting a static view of the composition of a robotic system's callbacks and their execution setup, and then checking it against composition conventions based on the elements' semantics. We evaluate our ROSCallBaX prototype on a dataset created from posts on developer forums and publicly available ROS projects. The evaluation results show that our approach can detect real inconsistencies.
- Reducing tail latency has become a crucial issue for optimizing the performance of online cloud services and distributed applications. In distributed applications, there are many causes of high end-to-end tail latency, including operating system delays, request reordering due to fan-out/fan-in, and network congestion. Although recent research has focused on reducing tail latency for individual application components, such as by replicating requests and scheduling, in this paper we argue for a holistic approach to reducing the end-to-end tail latency across application components. We propose TailClipper, a distributed scheduler that tags each arriving request with an arrival timestamp and propagates it across the microservices' call chain. TailClipper then uses arrival timestamps to implement an oldest-request-first scheduler that combines global first-come, first-served scheduling with a limited form of processor sharing to reduce end-to-end tail latency. In doing so, TailClipper can counter the performance degradation caused by request reordering in multi-tiered and microservices-based applications. We implement TailClipper as a userspace Linux scheduler and evaluate it using cloud workload traces and a real-world microservices application. Compared to state-of-the-art schedulers, our experiments reveal that TailClipper improves the 99th percentile response time by up to 81%, while also improving the mean response time and the system throughput by up to 54% and 29%, respectively, under high loads.
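The oldest-request-first idea in the last abstract above can be sketched as a queue ordered by the timestamp a request received when it first entered the system, rather than by when it reached the current service. This is a minimal illustration, not TailClipper's code: the class `OldestRequestFirstQueue` and its methods are hypothetical, and the limited processor sharing the abstract mentions is left out.

```python
import heapq
import itertools


class OldestRequestFirstQueue:
    """Per-service queue ordered by original arrival time (hypothetical sketch)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker for equal timestamps

    def push(self, arrival_ts, payload):
        # arrival_ts is assumed to be stamped once at the system entry point
        # and carried unchanged along the call chain.
        heapq.heappush(self._heap, (arrival_ts, next(self._seq), payload))

    def pop(self):
        """Return (arrival_ts, payload) for the globally oldest queued request."""
        arrival_ts, _, payload = heapq.heappop(self._heap)
        return arrival_ts, payload

    def __len__(self):
        return len(self._heap)


queue = OldestRequestFirstQueue()
# Requests reach this service out of order, e.g. after fan-out/fan-in.
queue.push(5.0, "request-b")
queue.push(2.0, "request-a")
queue.push(9.0, "request-c")
served = [queue.pop()[1] for _ in range(len(queue))]
```

Because ordering uses the propagated timestamp, a request that was delayed upstream is served ahead of younger requests that happened to reach this service earlier.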

