NSF PAR Search | NSF Public Access Repository

X-Stream: Accelerating streaming segments on MPSoCs for real-time applications

https://doi.org/10.1016/j.sysarc.2023.102857

Tabish, Rohan; Pellizzoni, Rodolfo; Mancuso, Renato; Gracioli, Giovani; Mirosanlou, Reza; Caccamo, Marco (May 2023, Journal of Systems Architecture)

We are witnessing a race to meet the ever-growing computation requirements of emerging AI applications that provide perception and control in autonomous vehicles such as self-driving cars and UAVs. To remain competitive, vendors are packing more processing units (CPUs, programmable logic, GPUs, and hardware accelerators) into next-generation multiprocessor systems-on-a-chip (MPSoCs). As a result, modern embedded platforms are achieving new heights in peak computational capacity. Unfortunately, the inevitable collateral increase in complexity is a major obstacle to the development of correct-by-design, safety-critical real-time applications. Due to the ever-growing gap between fast-paced hardware evolution and the comparatively slower evolution of real-time operating systems (RTOS), there is a need for real-time-oriented full-platform management frameworks to complement traditional RTOS designs. In this work, we propose one such framework, X-Stream, for the definition, synthesis, and analysis of real-time workloads targeting state-of-the-art accelerator-augmented embedded platforms. X-Stream is designed around two cardinal principles. First, computation and data movements are orchestrated to achieve predictability by design. For this purpose, iterative computation over large data chunks is divided into consecutive segments. These segments are then streamed using the three-phase execution model (load, execute, and unload). Second, the framework is workflow-centric: system designers specify their workflow, and the code needed for workflow orchestration is generated automatically. In addition to automating the deployment of user-defined hardware-accelerated workloads, X-Stream supports the deployment of some computation segments on traditional CPUs. Finally, X-Stream allows the definition of real-time partitions. Each partition groups applications that belong to the same criticality level and share the same set of hardware resources, with support for preemptive priority-driven scheduling. Conversely, freedom from interference for applications deployed in different partitions is guaranteed by design. We provide a full-system implementation that includes RTOS integration and showcase the proposed X-Stream framework on a Xilinx UltraScale+ platform, focusing on a matrix-multiplication and addition kernel use case.
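The segment-streaming idea in this abstract can be illustrated with a minimal Python sketch. It is not the X-Stream implementation or its API: the function names `split_into_segments` and `run_three_phase`, the list-based "local buffer", and the per-element `kernel` are all hypothetical stand-ins chosen for illustration. The sketch only shows a large input being cut into fixed-size segments, each of which goes through a load, execute, and unload phase in turn.

```python
from typing import Callable, List, Sequence


def split_into_segments(data: Sequence[float], segment_size: int) -> List[List[float]]:
    """Cut a large data chunk into consecutive fixed-size segments.

    The last segment may be shorter when len(data) is not a multiple
    of segment_size. (Hypothetical helper, not part of X-Stream.)
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [list(data[i:i + segment_size]) for i in range(0, len(data), segment_size)]


def run_three_phase(
    data: Sequence[float],
    segment_size: int,
    kernel: Callable[[float], float],
) -> List[float]:
    """Process data segment by segment using load, execute, and unload phases."""
    result: List[float] = []
    for segment in split_into_segments(data, segment_size):
        # Load phase: copy the segment into a private local buffer
        # (standing in for an on-chip memory close to the compute unit).
        local_in = list(segment)
        # Execute phase: compute using only the local buffer, so this
        # phase makes no accesses to the shared input or output.
        local_out = [kernel(x) for x in local_in]
        # Unload phase: copy the results back to the shared output.
        result.extend(local_out)
    return result


if __name__ == "__main__":
    print(run_three_phase([1.0, 2.0, 3.0, 4.0, 5.0], 2, lambda x: 2 * x))
```

In this sketch the phases run strictly one after another. On a real platform the load and unload phases would typically be carried out by a data mover while another segment executes, which is a detail the sketch deliberately leaves out.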
Full Text Available
Lazy Load Scheduling for Mixed-criticality Applications in Heterogeneous MPSoCs

https://doi.org/10.1145/3587694

Kloda, Tomasz; Gracioli, Giovani; Tabish, Rohan; Mirosanlou, Reza; Mancuso, Renato; Pellizzoni, Rodolfo; Caccamo, Marco (May 2023, ACM Transactions on Embedded Computing Systems)

Newly emerging multiprocessor system-on-a-chip (MPSoC) platforms combine hard processing cores with programmable logic (PL) for high-performance computing applications. In this article, we take a deep look into these commercially available heterogeneous platforms and show how to design mixed-criticality applications such that different processing components can be isolated to avoid contention on shared resources such as the last-level cache and main memory. Our approach involves software/hardware co-design to achieve isolation between the different criticality domains. At the hardware level, we use a scratchpad memory (SPM) with dedicated interfaces inside the PL to avoid conflicts in the main memory. At the software level, we employ a hypervisor to support cache coloring so that conflicts at the shared L2 cache can be avoided. To move tasks in and out of the SPM, we rely on a DMA engine and propose a new CPU-DMA co-scheduling policy, called Lazy Load, for which we also derive the response time analysis. The results of a case study on image processing demonstrate that contention on the shared memory subsystem can be avoided when running with our proposed architecture. Moreover, comprehensive schedulability evaluations show that the newly proposed Lazy Load policy outperforms existing CPU-DMA scheduling approaches and is effective in mitigating main memory interference in our proposed architecture.
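As background for the cache-coloring step mentioned in this abstract, the following Python sketch shows the standard page-color arithmetic for a physically indexed, set-associative cache. It is not taken from the article or its hypervisor: the function names are hypothetical, and the cache geometry in the usage example (1 MiB, 16 ways, 4 KiB pages) is an assumed illustrative configuration, not a figure stated in the source.

```python
def num_colors(cache_size: int, ways: int, page_size: int) -> int:
    """Number of page colors for a physically indexed, set-associative cache.

    One cache way covers cache_size // ways bytes; every page-sized slice
    of a way is one color. (Hypothetical helper for illustration.)
    """
    if cache_size <= 0 or ways <= 0 or page_size <= 0:
        raise ValueError("all parameters must be positive")
    way_size = cache_size // ways
    if way_size < page_size or way_size % page_size != 0:
        raise ValueError("way size must be a positive multiple of the page size")
    return way_size // page_size


def page_color(phys_addr: int, cache_size: int, ways: int, page_size: int) -> int:
    """Color of the physical page containing phys_addr.

    Pages with different colors map to disjoint groups of cache sets,
    so giving each domain its own colors keeps them from evicting
    each other's lines in the shared cache.
    """
    page_frame = phys_addr // page_size
    return page_frame % num_colors(cache_size, ways, page_size)


if __name__ == "__main__":
    # Assumed geometry: 1 MiB cache, 16 ways, 4 KiB pages.
    size, ways, page = 1 << 20, 16, 4096
    print(num_colors(size, ways, page))
    for addr in (0x0000, 0x1000, 0x1F000, 0x10000):
        print(hex(addr), page_color(addr, size, ways, page))
```

A hypervisor or OS that enforces coloring would restrict each partition's physical page allocations to its assigned colors; the sketch only covers the address-to-color mapping, not the allocator.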
Full Text Available
Designing Mixed Criticality Applications on Modern Heterogeneous MPSoC Platforms

https://doi.org/10.4230/LIPIcs.ECRTS.2019.27

Gracioli, Giovani; Tabish, Rohan; Mancuso, Renato; Mirosanlou, Reza; Pellizzoni, Rodolfo; Caccamo, Marco (July 2019, Euromicro Conference on Real-Time Systems (ECRTS 2019))

Full Text Available
CHIPS-AHOy: a predictable holistic cyber-physical hypervisor for MPSoCs

https://doi.org/10.1145/3229631.3229642

Mück, Tiago; Fröhlich, Antonio A.; Gracioli, Giovani; Rahmani, Amir M.; Reis, João Gabriel; Dutt, Nikil (July 2018, Proceedings of SAMOS XVIII: International Conference on Embedded Computer Systems: Architectures, MOdeling, and Simulation, 2018)

Full Text Available
