Prompted by the ever-growing demand for high-performance System-on-Chip (SoC) and the plateauing of CPU frequencies, the SoC design landscape is shifting. In a quest to offer programmable specialization, the adoption of tightly-coupled FPGAs co-located with traditional compute clusters has been embraced by major vendors. This CPU+FPGA architectural paradigm opens the door to novel hardware/software co-design opportunities. The key principle is that CPU-originated memory traffic can be re-routed through the FPGA for analysis and management purposes. Albeit promising, the side-effect of this approach is that time-critical operations—such as cache-line refills—are fulfilled by moving data over slower interconnects meant for I/O traffic. In this article, we introduce a novel principle named Cache Coherence Backstabbing to precisely tackle these shortcomings. The technique leverages the ability to include the FGPA in the same coherence domain as the core processing elements. Importantly, this enables Coherence-Aided Elective and Seamless Alternative Routing (CAESAR), i.e., seamless inspection and routing of memory transactions, especially cache-line refills, through the FPGA. CAESAR allows the definition of new memory programming paradigms. We discuss the intrinsic potentials of the approach and evaluate it with a full-stack prototype implementation on a commercial platform. Our experiments show an improvement of up to 29% in read bandwidth, 23% in latency, and 13% in pragmatic workloads over the state of the art. Furthermore, we showcase the first in-coherence-domain runtime profiler design as a use-case of the CAESAR approach.
more »
« less
Software-Shaped Platforms
This paper outlines the vision for a new type of software-shaped platforms, or SOSH platforms for short, that can be implemented in commercial CPU+FPGA platforms. At the core of the SOSH paradigm is the idea of exposing direct control over the flow of data exchanged between hardware components in embedded Systemon-Chips (SoC). Data flow manipulation primitives are synthesized in reprogrammable hardware and interposed between central processors, memory modules, and I/O devices. A new layer of system software is then introduced to leverage such primitives and to achieve fine-grained control and introspection over the interaction of SoC resources. By turning memory and I/O data flows into manageable entities, a new degree of internal awareness can be achieved in complex systems. We first review recent works that are well aligned with the concept of data flow manipulation primitives that can be deployed in SOSH platforms. Next, we outline future research avenues concerning the use of the SOSH paradigm for workload profiling and prediction, to implement advanced memory models, and to perform security threat identification and mitigation.
more »
« less
- Award ID(s):
- 2008799
- PAR ID:
- 10482050
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- Cyber-Physical Systems and Internet of Things Week, 2nd Workshop on Real-time Intelligent Edge Computing (RAGE'23)
- ISBN:
- 9798400700491
- Page Range / eLocation ID:
- 185 to 191
- Format(s):
- Medium: X
- Location:
- San Antonio, TX, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We present SERBERUS, the first comprehensive mitigation for hardening constant-time (CT) code against Spectre attacks (involving the PHT, BTB, RSB, STL, and/or PSF speculation primitives) on existing hardware. SERBERUS is based on three insights. First, some hardware control-flow integrity (CFI) protections restrict transient control-flow to the extent that it may be comprehensively considered by software analyses. Second, conformance to the accepted CT code discipline permits two code patterns that are unsafe in the post-Spectre era. Third, once these code patterns are addressed, all Spectre leakage of secrets in CT programs can be attributed to one of four classes of taint primitives—instructions that can transiently assign a secret value to a publicly-typed register. We evaluate SERBERUS on cryptographic primitives in the OPENSSL, LIBSODIUM, and HACL* libraries. SERBERUS introduces 21.3% runtime overhead on average, compared to 24.9% for the next closest state-of-the-art software mitigation, which is less secure.more » « less
-
With the proliferation of safety-critical real-time systems in our daily life, it is imperative that their security is protected to guarantee their functionalities. To this end, one of the most powerful modern security primitives is the enforcement of data flow integrity. However, the run-time overhead can be prohibitive for real-time cyber-physical systems. On the other hand, due to strong safety requirements on such real-time cyber-physical systems, platforms are often designed with enough reservation such that the system remains real-time even if it is experiencing the worst-case execution time. We conducted a measurement study on eight popular CPS systems and found the worst-case execution time is often at least five times the average run time. In this paper, we propose opportunistic data flow integrity, OP-DFI, that takes advantage of the system reservation to enforce data flow integrity to the CPS software. To avoid impacting the real-time property, OP-DFI tackles the challenge of slack estimation and run-time policy swapping to take advantage of the extra time in the system opportunistically. To ensure the security protection remains coherent, OP-DFI leverages in-line reference monitors and hardware-assisted features to perform dynamic fine-grained sandboxing. We evaluated OP-DFI on eight real-time CPS. With a worst-case execution time overhead of 2.7%, OP-DFI effectively performs DFI checking on 95.5% of all memory operations and 99.3% of safety-critical control-related memory operations on average.more » « less
-
Quantum computing employs some quantum phenomena to process information. It has been hailed as the future of computing but it is plagued by serious hurdles when it comes to its practical realization. MemComputing is a new paradigm that instead employs non-quantum dynamical systems and exploits time non-locality (memory) to compute. It can be efficiently emulated in software and its path towards hardware is more straightforward. I will discuss some analogies between these two computing paradigms, and the major differences that set them apart.more » « less
-
null (Ed.)Sequential consistency (SC) is the most intuitive memory consistency model and the easiest for programmers and hardware designers to reason about. However, the strict memory ordering restrictions imposed by SC make it less attractive from a performance standpoint. Additionally, prior high-performance SC implementations required complex hardware structures to support speculation and recovery. In this article, we introduce the lockstep SC consistency model (LSC), a new memory model based on SC but carefully defined to accommodate the data parallel lockstep execution paradigm of GPUs. We also describe an efficient LSC implementation for an APU system-on-chip (SoC) and show that our implementation performs close to the baseline relaxed model. Evaluation of our implementation shows that the geometric mean performance cost for lockstep SC is just 0.76% for GPU execution and 6.11% for the entire APU SoC compared to a baseline with a weaker memory consistency model. Adoption of LSC in future APU and SoC designs will reduce the burden on programmers trying to write correct parallel programs, while also simplifying the implementation and verification of systems with heterogeneous processing elements and complex memory hierarchies. 1more » « less