NSF PAR Search | NSF Public Access Repository

Theory-Guided Adaptive Scheduling for ROS 2

Enright, Daniel; Sobhani, Hoora; Kim, Hyoseung (November 2025, The 33rd International Conference on Real-Time Networks and Systems (RTNS))

This paper presents Latency Management Executor (LaME), a theory-guided adaptive scheduling framework that enhances real-time performance in ROS 2 through dynamic resource allocation and hybrid priority-driven scheduling. LaME introduces the concept of threadclasses to dynamically adjust system configurations, ensuring response-time guarantees for real-time chains while maintaining starvation freedom for best-effort chains. By implementing adaptive resource allocation and continuous runtime monitoring, LaME provides robust response times even under fluctuating workloads and resource constraints. We implement our framework for the Autoware reference system and perform our evaluation on an Nvidia Jetson platform. Our results demonstrate that LaME successfully adapts to changing resource availability and workload surges, and effectively balances real-time guarantees with overall system throughput.
Free, publicly-accessible full text available November 5, 2026
Modeling and Scheduling of Fusion Patterns in Autonomous Driving Systems

Sobhani, Hoora; Kim, Hyoseung (November 2025, The 33rd International Conference on Real-Time Networks and Systems (RTNS))

In Autonomous Driving Systems (ADS), Directed Acyclic Graphs (DAGs) are widely used to model complex data dependencies and inter-task communication. However, existing DAG scheduling approaches oversimplify data fusion tasks by assuming fixed triggering mechanisms, failing to capture the diverse fusion patterns found in real-world ADS software stacks. In this paper, we propose a systematic framework for analyzing various fusion patterns and their performance implications in ADS. Our framework models three distinct fusion task types: timer-triggered, wait-for-all, and immediate fusion, which comprehensively represent real-world fusion behaviors. Our Integer Linear Programming (ILP)-based approach enables the optimization of multiple real-time performance metrics, including reaction time, time disparity, age of information, and response time, while generating deterministic offline schedules directly applicable to real platforms. Evaluation using real-world ADS case studies, a Raspberry Pi implementation, and randomly generated DAGs demonstrates that our framework handles diverse fusion patterns beyond the scope of existing work, and achieves substantial performance improvements in comparable scenarios.
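The three fusion task types named in the abstract can be illustrated with a toy model. The sketch below is not the authors' framework or their ILP formulation; it only assumes plausible trigger semantics: a timer-triggered task fires on a periodic tick, a wait-for-all task fires once every input has delivered a new sample, and an immediate task fires on any arrival. The class `FusionTask`, its methods, and the `time_disparity` helper (taken here as the spread between the oldest and newest fused timestamps) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FusionTask:
    """Toy fusion node (hypothetical); mode is 'timer', 'wait_for_all', or 'immediate'."""
    inputs: List[str]
    mode: str
    latest: Dict[str, float] = field(default_factory=dict)  # newest sample timestamp per input
    fresh: Dict[str, bool] = field(default_factory=dict)    # unread sample since last firing?

    def _fire(self, now: float) -> Optional[Tuple[float, Dict[str, float]]]:
        # Assumption: a fusion job needs at least one sample from every input.
        if any(name not in self.latest for name in self.inputs):
            return None
        self.fresh = {name: False for name in self.inputs}
        return now, dict(self.latest)

    def on_message(self, name: str, stamp: float, now: float):
        """Record a sample; fire depending on the trigger mode."""
        self.latest[name] = stamp
        self.fresh[name] = True
        if self.mode == "immediate":
            return self._fire(now)
        if self.mode == "wait_for_all" and all(self.fresh.get(n, False) for n in self.inputs):
            return self._fire(now)
        return None  # timer-triggered tasks only buffer the sample

    def on_timer(self, now: float):
        """Periodic tick; only timer-triggered tasks fire here."""
        return self._fire(now) if self.mode == "timer" else None


def time_disparity(stamps: Dict[str, float]) -> float:
    """Assumed definition: gap between newest and oldest fused sample."""
    return max(stamps.values()) - min(stamps.values())
```

In this toy model, a wait-for-all task fed two lidar samples followed by one camera sample fires only on the camera arrival and fuses the newer lidar sample, whereas an immediate task would also have fired on the second lidar arrival if a camera sample were already buffered.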
Free, publicly-accessible full text available November 5, 2026
ECLIP: Energy-efficient and Practical Co-Location of ML Inference on Spatially Partitioned GPUs

Quach, Ryan; Wang, Yidi; Jahanshahi, Ali; Wong, Daniel; Kim, Hyoseung (August 2025, IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED))

As AI inference becomes mainstream, research has begun to focus on reducing the energy consumption of inference servers. Inference kernels commonly underutilize a GPU’s compute resources and waste power from idling components. To improve utilization and energy efficiency, multiple models can co-locate and share the GPU. However, typical GPU spatial partitioning techniques often experience significant overheads when reconfiguring spatial partitions, which can waste additional energy through repartitioning overheads or non-optimal partition configurations. In this paper, we present ECLIP, a framework to enable low-overhead, energy-efficient, kernel-wise resource partitioning between co-located inference kernels. ECLIP minimizes repartitioning overheads by pre-allocating pools of compute unit (CU) masked streams and assigns optimal CU allocations to groups of kernels through our resource allocation optimizer. Overall, ECLIP achieves an average 13% improvement in throughput and 25% improvement in energy efficiency.
Free, publicly-accessible full text available August 6, 2026
BOXR: Body and head motion Optimization framework for eXtended Reality

https://doi.org/10.1109/RTSS62706.2024.00016

Zhang, Ziliang; Li, Zexin; Kim, Hyoseung; Liu, Cong (December 2024, IEEE)

Full Text Available
OpenSense: An Open-World Sensing Framework for Incremental Learning and Dynamic Sensor Scheduling on Embedded Edge Devices

https://doi.org/10.1109/JIOT.2024.3385016

Bukhari, Abdulrahman; Hosseinimotlagh, Seyedmehdi; Kim, Hyoseung (August 2024, IEEE Internet of Things Journal)

Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are being challenged by the complexities of adapting to unforeseen conditions in an open-world environment, mainly due to the intensive computational and energy demands exceeding the capabilities of edge devices. In this article, we propose OpenSense, an open-world time-series sensing framework for making inferences from time-series sensor data and achieving incremental learning on an embedded edge device with limited resources. The proposed framework performs two essential tasks, inference and incremental learning, eliminating the necessity for powerful cloud servers. In addition, to secure enough time for incremental learning and reduce energy consumption, we need to schedule sensing activities without missing any events in the environment. Therefore, we propose two dynamic sensor scheduling techniques: 1) a class-level period assignment scheduler that finds an appropriate sensing period for each inferred class and 2) a Q-learning-based scheduler that dynamically determines the sensing interval for each classification moment by learning the patterns of event classes. With this framework, we discuss the design choices made to ensure satisfactory learning performance and efficient resource usage. Experimental results demonstrate the ability of the system to incrementally adapt to unforeseen conditions and to efficiently schedule sensing on a resource-constrained device.
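The Q-learning-based scheduler mentioned in the abstract can be pictured with a standard tabular Q-learning loop. The sketch below is not OpenSense's implementation: the state (the last inferred event class), the candidate intervals in `INTERVALS_S`, the `reward` function, and the `IntervalScheduler` class are all assumptions made for illustration. It only shows the generic technique of learning which sensing interval to pick after each classification.

```python
import random
from collections import defaultdict

# Hypothetical candidate sensing intervals in seconds (not from the paper).
INTERVALS_S = [1, 2, 5, 10]


class IntervalScheduler:
    """Tabular Q-learning over (event class, sensing interval) pairs."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # One row of Q-values per event class, one column per interval.
        self.q = defaultdict(lambda: [0.0] * len(INTERVALS_S))

    def choose(self, event_class):
        """Epsilon-greedy choice of an interval index for the current class."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(INTERVALS_S))
        row = self.q[event_class]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, action, reward_value, next_state):
        """Standard Q-learning update toward reward + gamma * max Q(next)."""
        td_target = reward_value + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


def reward(interval_s, missed_event):
    """Assumed reward: longer sleeps save energy, missed events are penalized."""
    return -10.0 if missed_event else float(interval_s)
```

A caller would alternate `choose`, sleep for `INTERVALS_S[action]`, classify the next sample, and then call `update` with the observed reward; the penalty for missed events is what discourages overly long intervals for classes that change quickly.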
Full Text Available
Exploring Partitioned and Semi-partitioned Callback Scheduling on ROS 2 Multi-threaded Executors

Sobhani, Hoora; Enright, Daniel; Deshpande, Tejas Milind; Kim, Hyoseung (July 2024, ECRTS)

Recent studies aimed at enhancing the analyzability and real-time performance of ROS 2 have placed insufficient emphasis on the importance of different scheduling options, including global, partitioned, and semi-partitioned approaches, particularly when multiple CPU cores are involved. In this work, we enable partitioned and semi-partitioned scheduling for ROS 2 multi-threaded executors and discuss the opportunities and potential issues associated with them.
Full Text Available
PAAM: A Framework for Coordinated and Priority-Driven Accelerator Management in ROS 2

https://doi.org/10.1109/RTAS61025.2024.00015

Enright, Daniel; Xiang, Yecheng; Choi, Hyunjong; Kim, Hyoseung (May 2024, IEEE)

This paper proposes a Priority-driven Accelerator Access Management (PAAM) framework for multi-process robotic applications built on top of the Robot Operating System (ROS) 2 middleware platform. The framework addresses the issue of predictable execution of time- and safety-critical callback chains that require hardware accelerators such as GPUs and TPUs. PAAM provides a standalone ROS executor that acts as an accelerator resource server, arbitrating accelerator access requests from all other callbacks at the application layer. This approach enables coordinated and priority-driven accelerator access management in multi-process robotic systems. The framework design is directly applicable to all types of accelerators and enables granular control over how specific chains access accelerators, making it possible to achieve predictable real-time support for accelerators used by safety-critical callback chains without making changes to underlying accelerator device drivers. PAAM also offers a theoretical analysis that upper-bounds the worst-case response time of safety-critical callback chains that require accelerator access. The paper also demonstrates that complex robotic systems with extensive accelerator usage that are integrated with PAAM may achieve up to a 91% reduction in end-to-end response time of their critical callback chains.
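The idea of a resource server that arbitrates accelerator requests by priority can be sketched with a plain priority queue. This is not PAAM's executor and uses no ROS 2 API: `AcceleratorServer`, `submit`, and `run_pending` are hypothetical names, requests are ordinary Python callables, and execution is non-preemptive. The sketch assumes higher numbers mean higher priority and that equal-priority requests are served in arrival order.

```python
import heapq
import itertools


class AcceleratorServer:
    """Toy single-accelerator arbiter: highest priority first, FIFO among ties."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival counter used as a tie-breaker

    def submit(self, priority, chain, work):
        """Queue an accelerator request on behalf of a callback chain."""
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), chain, work))

    def run_pending(self):
        """Run queued requests one at a time; return the order chains were served."""
        served = []
        while self._heap:
            _, _, chain, work = heapq.heappop(self._heap)
            work()  # stands in for launching the accelerator job
            served.append(chain)
        return served
```

Centralizing requests in one server is what lets the application layer impose an ordering without touching the device driver, which is the property the abstract emphasizes.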
Full Text Available
GCAPS: GPU Context-Aware Preemptive Priority-Based Scheduling for Real-Time Tasks

https://doi.org/10.4230/LIPIcs.ECRTS.2024.14

Wang, Yidi; Liu, Cong; Wong, Daniel; Kim, Hyoseung (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Pellizzoni, Rodolfo (Ed.)
Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much research has been conducted in the real-time research community, several limitations persist, including the absence or limited availability of GPU-level preemption, extended blocking times, and/or the need for extensive modifications to program code. In this paper, we propose GCAPS, a GPU Context-Aware Preemptive Scheduling approach for real-time GPU tasks. Our approach exerts control over GPU context scheduling at the device driver level and enables preemption of GPU execution based on task priorities by simply adding one-line macros to GPU segment boundaries. In addition, we provide a comprehensive response time analysis of GPU-using tasks for both our proposed approach and the default Nvidia GPU driver scheduling, which follows a work-conserving round-robin policy. Through empirical evaluations and case studies, we demonstrate the effectiveness of the proposed approaches in improving taskset schedulability and response time. The results highlight significant improvements over prior work as well as the default scheduling approach, with up to 40% higher schedulability, while also achieving predictable worst-case behavior on Nvidia Jetson embedded platforms.
Full Text Available
Poster Abstract: Learning-based Sensor Scheduling for Event Classification on Embedded Edge Devices

https://doi.org/10.1145/3576842.3589176

Bukhari, Abdulrahman; Kim, Hyoseung (May 2023, IoTDI '23: Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation)

Full Text Available
Timing Analysis and Priority-driven Enhancements of ROS 2 Multi-threaded Executors

https://doi.org/10.1109/RTAS58335.2023.00016

Sobhani, Hoora; Choi, Hyunjong; Kim, Hyoseung (May 2023, IEEE 29th Real-Time and Embedded Technology and Applications Symposium (RTAS))

Full Text Available
