Scheduler side-channels can leak critical information in real-time systems, thus posing serious threats to many safety-critical applications. The main culprit is the inherent determinism in the runtime timing behavior of such systems, e.g., the (expected) periodic behavior of critical tasks. In this paper, we introduce the notion of "schedule indistinguishability/", inspired by work in differential privacy, that introduces diversity into the schedules of such systems while offering analyzable security guarantees. We achieve this by adding a sufficiently large (controlled) noise to the task schedules in order to break their deterministic execution patterns. An "epsilon-Scheduler" then implements schedule indistinguishability in real-time Linux. We evaluate our system using two real applications: (a) an autonomous rover running on a real hardware platform (Raspberry Pi) and (b) a video streaming application that sends data across large geographic distances. Our results show that the epsilon-Scheduler offers better protection against scheduler side-channel attacks in real-time systems while still maintaining good performance and quality-of-service(QoS) requirements.
more »
« less
A Fast Work-Efficient SSSP Algorithm for GPUs
This paper presents a new Single Source Shortest Path (SSSP) algorithm for GPUs. Our key advancement is an improved work scheduler, which is central to the performance of SSSP algorithms. Previous GPU solutions for SSSP use simple work schedulers that can be implemented efficiently on GPUs but that produce low quality schedules. Such solutions yield poor work efficiency and can underutilize the hardware due to a lack of parallelism. Our solution introduces a more sophisticated work scheduler---based on a novel highly parallel approximate priority queue---that produces high quality schedules while being efficiently implementable on GPUs. To evaluate our solution, we use 226 graph inputs from the Lonestar 4.0 benchmark suite and the SuiteSparse Matrix Collection, and we find that our solution outperforms the previous state-of-the-art solution by an average of 2.9×, showing that an efficient work scheduling mechanism can be implemented on GPUs without sacrificing schedule quality. While this paper focuses on the SSSP problem, it has broader implications for the use of GPUs, illustrating that seemingly ill-suited data structures, such as priority queues, can be efficiently implemented for GPUs if we use the proper software structure.
more »
« less
- Award ID(s):
- 1823546
- PAR ID:
- 10334425
- Date Published:
- Journal Name:
- Annual Symposium on Principles and Practice of Parallel Programming
- Page Range / eLocation ID:
- 133 to 146
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)In ROS (Robot Operating System), most applications in time- and safety-critical domain are constructed in the form of callback chains with data dependencies. Due to the shortcomings in its real-time support, ROS does not provide a strong timing guarantee and may lead to disastrous results. Although ROS2 claims to enhance the real-time capability, ensuring predictable end-to-end chain latency still remains a challenging problem. In this paper, we propose a new priority-driven chain-aware scheduler for the ROS2 framework and present end-to-end latency analysis for the proposed scheduler. With our scheduler, callbacks are prioritized based on the given timing requirements of the corresponding chains so that the end-to-end latency of critical chains can be improved with a predictable bound. The proposed scheduling design includes priority assignment and resource allocation considering all ROS2 scheduling-related abstractions, e.g., callbacks, nodes, and executors. To the best of our knowledge, this is the first work to address the inherent limitations of ROS2 in end-to-end latency by proposing a new scheduler design. We have implemented our scheduler in ROS2 running on NVIDIA Xavier NX. We have conducted case studies and schedulability experiments. The results show that the proposed scheduler yields a substantial improvement in end-to-end latency over the default ROS2 scheduler and the latest work in real-world scenarios.more » « less
-
SparseAuto: An Auto-scheduler for Sparse Tensor Computations using Recursive Loop Nest RestructuringAutomated code generation and performance enhancements for sparse tensor algebra have become essential in many real-world applications, such as quantum computing, physical simulations, computational chemistry, and machine learning. General sparse tensor algebra compilers are not always versatile enough to generate asymptotically optimal code for sparse tensor contractions. This paper shows how to generate asymptotically better schedules for complex sparse tensor expressions using kernel fission and fusion. We present generalized loop restructuring transformations to reduce asymptotic time complexity and memory footprint. Furthermore, we present an auto-scheduler that uses a partially ordered set (poset)-based cost model that uses both time and auxiliary memory complexities to prune the search space of schedules. In addition, we highlight the use of Satisfiability Module Theory (SMT) solvers in sparse auto-schedulers to approximate the Pareto frontier of better schedules to the smallest number of possible schedules, with user-defined constraints available at compile-time. Finally, we show that our auto-scheduler can select better-performing schedules and generate code for them. Our results show that the auto-scheduler provided schedules achieve orders-of-magnitude speedup compared to the code generated by the Tensor Algebra Compiler (TACO) for several computations on different real-world tensors.more » « less
-
If a parallel program has determinacy race(s), different schedules can result in memory accesses that observe different values --- various race-detection tools have been designed to find such bugs. A key component of race detectors is an algorithm for series-parallel (SP) maintenance, which identifies whether two accesses are logically parallel. This paper describes an asymptotically optimal algorithm, called WSP-Order, for performing SP maintenance in programs with fork-join (or nested) parallelism. Given a fork-join program with T1 work and T∞ span, WSP-Order executes it while also maintaining SP relationships in O(T1/P + T∞) time on P processors, which is asymptotically optimal. At the heart of WSP-Order is a work-stealing scheduler designed specifically for SP maintenance. We also implemented C-RACER, a race-detector based on WSP-Order within the Cilk Plus runtime system, and evaluated its performance on five benchmarks. Empirical results demonstrate that when run sequentially, it performs almost as well as previous best sequential race detectors. More importantly, when run in parallel, it achieves almost as much speedup as the original program without race-detection.more » « less
-
IoT devices used in various applications, such as monitoring agricultural soil moisture, or urban air quality assessment, are typically battery-operated and energy-constrained. We develop a lightweight and distributed cooperative sensing scheme that provides energy-efficient sensing of an area by reducing spatio-temporal overlaps in the coverage using a multi-sensor IoT network. Our “Sensing Together” solution includes two algorithms: Distributed Task Adaptation (DTA) and Distributed Block Scheduler (DBS), which coordinate the sensing operations of the IoT network through information shared using a distributed “token passing” protocol. DTA adapts the sensing rates from their “raw” values (optimized for each IoT device independently) to minimize spatial redundancy in coverage, while ensuring that a desired coverage threshold is met at all points in the covered area. DBS then schedules task execution times across all IoT devices in a distributed manner to minimize temporal overlap. On-device evaluation shows a small token size and execution times of less than 0.6s on average while simulations show average energy savings of 5% per IoT device under various weather conditions. Moreover, when devices had more significant coverage overlaps, energy savings exceeded 30% thanks to cooperative sensing. In simulations of larger networks, energy savings range on average between 3.34% and 38.53%, depending on weather conditions. Our solutions consistently demonstrate near-optimal performance under various scenarios, showcasing their capability to efficiently reduce temporal overlap during sensing task scheduling.more » « less
An official website of the United States government

