skip to main content


Search for: All records

Creators/Authors contains: "Caccamo, Marco"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    The ever-increasing demand for high performance in the time-critical, low-power embedded domain drives the adoption of powerful but unpredictable, heterogeneous Systems-on-Chip. On these platforms, the main source of unpredictability—the shared memory subsystem—has been widely studied, and several approaches to mitigate undesired effects have been proposed over the years. Among them, performance-counter-based regulation methods have proved particularly successful. Unfortunately, such regulation methods require precise knowledge of each task’s memory consumption and cannot be extended to isolate mixed-criticality tasks running on the same core as the regulation budget is shared. Moreover, the desirable combination of these methodologies with well-known time-isolation techniques—such as server-based reservations—is still an uncharted territory and lacks a precise characterization of possible benefits and limitations. Recognizing the importance of such consolidation for designing predictable real-time systems, we introduce MCTI (Mixed-Criticality Task-based Isolation) as a first initial step in this direction. MCTI is a hardware/software co-design architecture that aims to improve both CPU and memory isolations among tasks with different criticalities even when they share the same CPU. In order to ascertain the correct behavior and distill the benefits of MCTI, we implemented and tested the proposed prototype architecture on a widely available off-the-shelf platform. The evaluation of our prototype shows that (1) MCTI helps shield critical tasks from concurrent non-critical tasks sharing the same memory budget, with only a limited increase in response time being observed, and (2) critical tasks running under memory stress exhibit an average response time close to that achieved when running without memory stress.

     
    more » « less
  2. Abstract

    In today’s multiprocessor systems-on-a-chip, the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time analysis. Memory regulation via throttling is one of the most practical techniques to mitigate interference. Traditional regulation schemes rely on a combination of timer and performance counter interrupts to be delivered and processed on the same cores running real-time workload. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism fromoutside the coresthat monitors performance counters for the application core’s activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows more complex composition of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and conduct an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102, NXP i.MX8M and NXP S32G2 platforms using the San Diego Vision Benchmark Suite.

     
    more » « less
  3. Deep reinforcement learning (DRL) has demonstrated impressive success in solving complex control tasks by synthesizing control policies from data. However, the safety and stability of applying DRL to safety-critical systems remain a primary concern and challenging problem. To address the problem, we propose the Phy-DRL: a novel physics-model regulated deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: a physics-model-regulated reward and residual control, which integrates physics-model-based control and data-driven control. The concurrent designs enable the Phy-DRL to mathematically provable safety and stability guarantees. Finally, the effectiveness of the Phy-DRL is validated by an inverted pendulum system. Additionally, the experimental results demonstrate that the Phy-DRL features remarkably accelerated training and enlarged reward. 
    more » « less
  4. Nowadays, AI-based techniques, such as deep neural networks (DNNs), are widely deployed in autonomous systems for complex mission requirements (e.g., motion planning in robotics). However, DNNs-based controllers are typically very complex, and it is very hard to formally verify their correctness, potentially causing severe risks for safety-critical autonomous systems. In this paper, we propose a construction scheme for a so-called Safe-visor architecture to sandbox DNNs-based controllers. Particularly, we consider the construction under a stochastic game framework to provide a system-level safety guarantee which is robust to noises and disturbances. A supervisor is built to check the control inputs provided by a DNNs-based controller and decide whether to accept them. Meanwhile, a safety advisor is running in parallel to provide fallback control inputs in case the DNN-based controller is rejected. We demonstrate the proposed approaches on a quadrotor employing an unverified DNNs-based controller.

     
    more » « less
  5. In today’s multiprocessor systems-on-a-chip (MPSoC), the shared memory subsystem is a known source of temporal interference. The problem causes logically independent cores to affect each other’s performance, leading to pessimistic worst-case execution time (WCET) analysis. One of the most practical techniques to mitigate interference is memory regulation via throttling. Traditional regulation schemes rely on a combination of timer and performance counter interrupts to be delivered and processed on the same cores running real-time workload. Unfortunately, to prevent excessive overhead, regulation can only be enforced at a millisecond-scale granularity. In this work, we present a novel regulation mechanism from outside the cores that monitors performance counters for the application core’s activity in main memory at a microsecond scale. The approach is fully transparent to the applications on the cores, and can be implemented using widely available on-chip debug facilities. The presented mechanism also allows a more complex composition of metrics to enact load-aware regulation. For instance, it allows redistributing unused bandwidth between cores while keeping the overall memory bandwidth of all cores below a given threshold. We implement our approach on a host of embedded platforms and carry out an in-depth evaluation on the Xilinx Zynq UltraScale+ ZCU102 platform using the SD-VBS. 
    more » « less
  6. Abstract

    Advances in deep learning have revolutionized cyber‐physical applications, including the development of autonomous vehicles. However, real‐world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of deep neural networks (DNNs) in safety‐critical tasks, particularly perception. The inherent unverifiability of DNNs poses a key challenge in ensuring their safe and reliable operation. In this work, we propose perception simplex ( ), a fault‐tolerant application architecture designed for obstacle detection and collision avoidance. We analyse an existing LiDAR‐based classical obstacle detection algorithm to establish strict bounds on its capabilities and limitations. Such analysis and verification have not been possible for deep learning‐based perception systems yet. By employing verifiable obstacle detection algorithms, identifies obstacle existence detection faults in the output of unverifiable DNN‐based object detectors. When faults with potential collision risks are detected, appropriate corrective actions are initiated. Through extensive analysis and software‐in‐the‐loop simulations, we demonstrate that provides deterministic fault tolerance against obstacle existence detection faults, establishing a robust safety guarantee.

     
    more » « less
  7. Newly emerging multiprocessor system-on-a-chip (MPSoC) platforms provide hard processing cores with programmable logic (PL) for high-performance computing applications. In this article, we take a deep look into these commercially available heterogeneous platforms and show how to design mixed-criticality applications such that different processing components can be isolated to avoid contention on the shared resources such as last-level cache and main memory. Our approach involves software/hardware co-design to achieve isolation between the different criticality domains. At the hardware level, we use a scratchpad memory (SPM) with dedicated interfaces inside the PL to avoid conflicts in the main memory. At the software level, we employ a hypervisor to support cache-coloring such that conflicts at the shared L2 cache can be avoided. In order to move the tasks in/out of the SPM memory, we rely on a DMA engine and propose a new CPU-DMA co-scheduling policy, called Lazy Load, for which we also derive the response time analysis. The results of a case study on image processing demonstrate that the contention on the shared memory subsystem can be avoided when running with our proposed architecture. Moreover, comprehensive schedulability evaluations show that the newly proposed Lazy Load policy outperforms the existing CPU-DMA scheduling approaches and is effective in mitigating the main memory interference in our proposed architecture. 
    more » « less
  8. We are witnessing a race to meet the ever-growing computation requirements of emerging AI applications to provide perception and control in autonomous vehicles — e.g., self-driving cars and UAVs. To remain competitive, vendors are packing more processing units (CPUs, programmable logic, GPUs, and hardware accelerators) into next-generation multiprocessor systems-on-a-chip (MPSoC). As a result, modern embedded platforms are achieving new heights in peak computational capacity. Unfortunately, however, the collateral and inevitable increase in complexity represents a major obstacle for the development of correct-by-design safety-critical real-time applications. Due to the ever-growing gap between fast-paced hardware evolution and comparatively slower evolution of real-time operating systems (RTOS), there is a need for real-time oriented full-platform management frameworks to complement traditional RTOS designs. In this work, we propose one such framework, namely the X-Stream framework, for the definition, synthesis, and analysis of real-time workloads targeting state-of-the-art accelerator-augmented embedded platforms. Our X-Stream framework is designed around two cardinal principles. First, computation and data movements are orchestrated to achieve predictability by design. For this purpose, iterative computation over large data chunks is divided into subsequent segments. These segments are then streamed leveraging the three-phase execution model (load, execute and unload). Second, the framework is workflow-centric: system designers can specify their workflow and the necessary code for workflow orchestration is automatically generated. In addition to automating the deployment of user-defined hardware-accelerated workloads, X-Stream supports the deployment of some computation segments on traditional CPUs. Finally, X-Stream allows the definition of real-time partitions. Each partition groups applications belonging to the same criticality level and that share the same set of hardware resources, with support for preemptive priority-driven scheduling. Conversely, freedom from interference for applications deployed in different partitions is guaranteed by design. We provide a full-system implementation that includes RTOS integration and showcase the proposed X-Stream framework on a Xilinx Ultrascale+ platform by focusing on a matrix-multiplication and addition kernel use-case. 
    more » « less
  9. Timing correctness is crucial in a multi-criticality real-time system, such as an autonomous driving system. It has been recently shown that these systems can be vulnerable to timing inference attacks, mainly due to their predictable behavioral patterns. Existing solutions like schedule randomization cannot protect against such attacks, often limited by the system’s real-time nature. This article presents “ SchedGuard++ ”: a temporal protection framework for Linux-based real-time systems that protects against posterior schedule-based attacks by preventing untrusted tasks from executing during specific time intervals. SchedGuard++ supports multi-core platforms and is implemented using Linux containers and a customized Linux kernel real-time scheduler. We provide schedulability analysis assuming the Logical Execution Time (LET) paradigm, which enforces I/O predictability. The proposed response time analysis takes into account the interference from trusted and untrusted tasks and the impact of the protection mechanism. We demonstrate the effectiveness of our system using a realistic radio-controlled rover platform. Not only is “ SchedGuard++ ” able to protect against the posterior schedule-based attacks, but it also ensures that the real-time tasks/containers meet their temporal requirements. 
    more » « less