NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Compiler-Driven FPGA Virtualization with SYNERGY

https://doi.org/10.1145/3613903

Landgraf, Joshua; Yang, Tiffany; Lin, Will; Rossbach, Christopher J; Schkufza, Eric (August 2024, Communications of the ACM)

FPGAs are increasingly common in modern applications, and cloud providers now support on-demand FPGA acceleration in datacenters. Applications in datacenters run on virtual infrastructure, where consolidation, multi-tenancy, and workload migration enable economies of scale that are fundamental to the provider's business. However, a general strategy for virtualizing FPGAs has yet to emerge. While manufacturers struggle with hardware-based approaches, we propose a compiler/runtime-based solution called Synergy. We show a compiler transformation for Verilog programs that produces code able to yield control to software atsub-clock-tickgranularity according to the semantics of the original program. Synergy uses this property to efficiently support core virtualization primitives: suspend and resume, program migration, and spatial/temporal multiplexing, on hardware which is availabletoday.We use Synergy to virtualize FPGA workloads across a cluster of Intel SoCs and Xilinx FPGAs on Amazon F1. The workloads require no modification, run within 3--4x of unvirtualized performance, and incur a modest increase in FPGA fabric usage.
more » « less
Full Text Available
Reconfigurable Virtual Memory for FPGA-Driven I/O

https://doi.org/10.1145/3582016.3582048

Landgraf, Joshua; Giordano, Matthew; Yoon, Esther; Rossbach, Christopher J. (March 2023, ASPLOS 2023)

Full Text Available
Towards a Machine Learning-Assisted Kernel with LAKE

https://doi.org/10.1145/3575693.3575697

Fingler, Henrique; Tarte, Isha; Yu, Hangchen; Szekely, Ariel; Hu, Bodun; Akella, Aditya; Rossbach, Christopher J. (January 2023, ASPLOS 2023)

Full Text Available
AvA: Accelerated Virtualization of Accelerators

Yu, Hangchen and (January 2020, Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems)

Applications are migrating en masse to the cloud, while ac- celerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore’s Law. These trends are in conflict: cloud ap- plicationsrunonvirtualplatforms,butexistingvirtualization techniques have not provided production-ready solutions for accelerators. As a result, cloud providers expose accel- erators by dedicating physical devices to individual guests. Multi-tenancy and consolidation are lost as a consequence. We present AvA, which addresses limitations of existing virtualization techniques with automated construction of hypervisor-managed virtual accelerator stacks. AvA com- bines a DSL for describing APIs and sharing policies, device- agnostic runtime components, and a compiler to generate accelerator-specific components such as guest libraries and API servers. AvA uses Hypervisor Interposed Remote Acceleration (HIRA),anewtechniquetoenablehypervisor- enforcement of sharing policies from the specification. We use AvA to virtualize nine accelerators and eleven framework APIs, including six for which no virtualization support has been previously explored. AvA provides near- native performance and can enforce sharing policies that are not possible with current techniques, with orders of magnitude less developer effort than required for hand-built virtualization support.
more » « less
Full Text Available
Just-In-Time Compilation for Verilog: A New Technique for Improving the FPGA Programming Experience

https://doi.org/10.1145/3297858.3304010

Schkufza, Eric; Wei, Michael; Rossbach, Christopher J. (January 2019, Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems)

FPGAs offer compelling acceleration opportunities for modern applications. However compilation for FPGAs is painfully slow, potentially requiring hours or longer. We approach this problem with a solution from the software domain: the use of a JIT. Code is executed immediately in a software simulator, and compilation is performed in the background. When finished, the code is moved into hardware, and from the user's perspective it simply gets faster. We have embodied these ideas in Cascade: the first JIT compiler for Verilog. Cascade reduces the time between initiating compilation and running code to less than a second, and enables generic printf debugging from hardware. Cascade preserves program performance to within 3× in a debugging environment, and has minimal effect on a finalized design. Crucially, these properties hold even for programs that perform side effects on connected IO devices. A user study demonstrates the value to experts and non-experts alike: Cascade encourages more frequent compilation, and reduces the time to produce working hardware designs.
more » « less
Full Text Available
Automatic Virtualization of Accelerators

https://doi.org/https://dl.acm.org/doi/10.1145/3317550.3321423

Yu, Hangchen and (January 2019, Proceedings of the Workshop on Hot Topics in Operating Systems)

Applications are migrating en masse to the cloud, while accelerators such as GPUs, TPUs, and FPGAs proliferate in the wake of Moore's Law. These technological trends are incompatible. Cloud applications run on virtual platforms, but traditional I/O virtualization techniques have not provided production-ready solutions for accelerators. As a result, cloud providers expose accelerators by using pass-through techniques which dedicate physical devices to individual guests. The multi-tenancy that drives their business is lost as a consequence. This paper proposes automatic generation of virtual accelerator stacks to address the fundamental tradeoffs between virtualization properties and techniques for accelerators. AvA (Automatic Virtualization of Accelerators) re-purposes a para-virtual I/O stack design based on API remoting to present virtual accelerator APIs to guest VMs. Conventional wisdom is that API remoting sacrifices interposition and compatibility. AvA forwards invocations over hypervisor-managed transport to recover interposition. AvA compensates for lost compatibility by automatically generating guest libraries, drivers, hypervisor-level schedulers, and API servers. AvA supports pluggable transport layers, allowing VMs to use disaggregated accelerators. With AvA, a single developer could virtualize a core subset of OpenCL at near-native performance in just a few days.
more » « less
Full Text Available
Sharing, Protection, and Compatibility for Reconfigurable Fabric with Amorphos

Khawaja, Ahmed and (January 2018, Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation)

Cloud providers such as Amazon and Microsoft have begun to support on-demand FPGA acceleration in the cloud, and hardware vendors will support FPGAs in future processors. At the same time, technology advancements such as 3D stacking, through-silicon vias (TSVs), and FinFETs have greatly increased FPGA density. The massive parallelism of current FPGAs can support not only extremely large applications, but multiple applications simultaneously as well. System support for FPGAs, however, is in its infancy. Unlike software, where resource configurations are limited to simple dimensions of compute, memory, and I/O, FPGAs provide a multi-dimensional sea of resources known as the FPGA fabric: logic cells, floating point units, memories, and I/O can all be wired together, leading to spatial constraints on FPGA resources. Current stacks either support only a single application or statically partition the FPGA fabric into fixed-size slots. These designs cannot efficiently support diverse workloads: the size of the largest slot places an artificial limit on application size, and oversized slots result in wasted FPGA resources and reduced concurrency. This paper presents AMORPHOS, which encapsulates user FPGA logic in morphable tasks, or Morphlets. Morphlets provide isolation and protection across mutually distrustful protection domains, extending the guarantees of software processes. Morphlets can morph, dynamically altering their deployed form based on resource requirements and availability. To build Morphlets, developers provide a parameterized hardware design that interfaces with AMORPHOS, along with a mesh, which specifies external resource requirements. AMORPHOS explores the parameter space, generating deployable Morphlets of varying size and resource requirements. AMORPHOS multiplexes Morphlets on the FPGA in both space and time to maximize FPGA utilization. We implement AMORPHOS on Amazon F1 [1] and Microsoft Catapult [92]. We show that protected sharing and dynamic scalability support on workloads such as DNN inference and blockchain mining improves aggregate throughput up to 4× and 23× on Catapult and F1 respectively.
more » « less
Full Text Available

Search for: All records