Prompted by the ever-growing demand for high-performance System-on-Chip (SoC) and the plateauing of CPU frequencies, the SoC design landscape is shifting. In a quest to offer programmable specialization, the adoption of tightly-coupled FPGAs co-located with traditional compute clusters has been embraced by major vendors. This CPU+FPGA architectural paradigm opens the door to novel hardware/software co-design opportunities. The key principle is that CPU-originated memory traffic can be re-routed through the FPGA for analysis and management purposes. Albeit promising, the side-effect of this approach is that time-critical operations—such as cache-line refills—are fulfilled by moving data over slower interconnects meant for I/O traffic. In this article, we introduce a novel principle named Cache Coherence Backstabbing to precisely tackle these shortcomings. The technique leverages the ability to include the FGPA in the same coherence domain as the core processing elements. Importantly, this enables Coherence-Aided Elective and Seamless Alternative Routing (CAESAR), i.e., seamless inspection and routing of memory transactions, especially cache-line refills, through the FPGA. CAESAR allows the definition of new memory programming paradigms. We discuss the intrinsic potentials of the approach and evaluate it with a full-stack prototype implementation on a commercial platform. Our experiments show an improvement of up to 29% in read bandwidth, 23% in latency, and 13% in pragmatic workloads over the state of the art. Furthermore, we showcase the first in-coherence-domain runtime profiler design as a use-case of the CAESAR approach.
more »
« less
Software-Shaped Platforms
This paper outlines the vision for a new type of software-shaped platforms, or SOSH platforms for short, that can be implemented in commercial CPU+FPGA platforms. At the core of the SOSH paradigm is the idea of exposing direct control over the flow of data exchanged between hardware components in embedded Systemon-Chips (SoC). Data flow manipulation primitives are synthesized in reprogrammable hardware and interposed between central processors, memory modules, and I/O devices. A new layer of system software is then introduced to leverage such primitives and to achieve fine-grained control and introspection over the interaction of SoC resources. By turning memory and I/O data flows into manageable entities, a new degree of internal awareness can be achieved in complex systems. We first review recent works that are well aligned with the concept of data flow manipulation primitives that can be deployed in SOSH platforms. Next, we outline future research avenues concerning the use of the SOSH paradigm for workload profiling and prediction, to implement advanced memory models, and to perform security threat identification and mitigation.
more »
« less
- Award ID(s):
- 2008799
- PAR ID:
- 10482050
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- Cyber-Physical Systems and Internet of Things Week, 2nd Workshop on Real-time Intelligent Edge Computing (RAGE'23)
- ISBN:
- 9798400700491
- Page Range / eLocation ID:
- 185 to 191
- Format(s):
- Medium: X
- Location:
- San Antonio, TX, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We present SERBERUS, the first comprehensive mitigation for hardening constant-time (CT) code against Spectre attacks (involving the PHT, BTB, RSB, STL, and/or PSF speculation primitives) on existing hardware. SERBERUS is based on three insights. First, some hardware control-flow integrity (CFI) protections restrict transient control-flow to the extent that it may be comprehensively considered by software analyses. Second, conformance to the accepted CT code discipline permits two code patterns that are unsafe in the post-Spectre era. Third, once these code patterns are addressed, all Spectre leakage of secrets in CT programs can be attributed to one of four classes of taint primitives—instructions that can transiently assign a secret value to a publicly-typed register. We evaluate SERBERUS on cryptographic primitives in the OPENSSL, LIBSODIUM, and HACL* libraries. SERBERUS introduces 21.3% runtime overhead on average, compared to 24.9% for the next closest state-of-the-art software mitigation, which is less secure.more » « less
-
With the proliferation of safety-critical real-time systems in our daily life, it is imperative that their security is protected to guarantee their functionalities. To this end, one of the most powerful modern security primitives is the enforcement of data flow integrity. However, the run-time overhead can be prohibitive for real-time cyber-physical systems. On the other hand, due to strong safety requirements on such real-time cyber-physical systems, platforms are often designed with enough reservation such that the system remains real-time even if it is experiencing the worst-case execution time. We conducted a measurement study on eight popular CPS systems and found the worst-case execution time is often at least five times the average run time. In this paper, we propose opportunistic data flow integrity, OP-DFI, that takes advantage of the system reservation to enforce data flow integrity to the CPS software. To avoid impacting the real-time property, OP-DFI tackles the challenge of slack estimation and run-time policy swapping to take advantage of the extra time in the system opportunistically. To ensure the security protection remains coherent, OP-DFI leverages in-line reference monitors and hardware-assisted features to perform dynamic fine-grained sandboxing. We evaluated OP-DFI on eight real-time CPS. With a worst-case execution time overhead of 2.7%, OP-DFI effectively performs DFI checking on 95.5% of all memory operations and 99.3% of safety-critical control-related memory operations on average.more » « less
-
Quantum computing employs some quantum phenomena to process information. It has been hailed as the future of computing but it is plagued by serious hurdles when it comes to its practical realization. MemComputing is a new paradigm that instead employs non-quantum dynamical systems and exploits time non-locality (memory) to compute. It can be efficiently emulated in software and its path towards hardware is more straightforward. I will discuss some analogies between these two computing paradigms, and the major differences that set them apart.more » « less
-
The prevalence of scientific workflows with high computational demands calls for their execution on various distributed computing platforms, including large-scale leadership-class high-performance computing (HPC) clusters. To handle the deployment, monitoring, and optimization of workflow executions, many workflow systems have been developed over the past decade. There is a need for workflow benchmarks that can be used to evaluate the performance of workflow systems on current and future software stacks and hardware platforms. We present a generator of realistic workflow benchmark specifications that can be translated into benchmark code to be executed with current workflow systems. Our approach generates workflow tasks with arbitrary performance characteristics (CPU, memory, and I/O usage) and with realistic task dependency structures based on those seen in production workflows. We present experimental results that show that our approach generates benchmarks that are representative of production workflows, and conduct a case study to demonstrate the use and usefulness of our generated benchmarks to evaluate the performance of workflow systems under different configuration scenarios.more » « less
An official website of the United States government

