NSF PAR Search | NSF Public Access Repository

Disaggregated GPU Acceleration for Serverless Applications

https://doi.org/10.1145/3606557.3606560

Fingler, Henrique; Zhu, Zhiting; Yoon, Esther; Jia, Zhipeng; Witchel, Emmett; Rossbach, Christopher J. (June 2023, ACM SIGOPS Operating Systems Review)

Serverless platforms have been attracting applications from traditional platforms because infrastructure management responsibilities are shifted from users to providers. Many applications well-suited to serverless environments could leverage GPU acceleration to enhance their performance. Unfortunately, current serverless platforms do not expose GPUs to serverless applications.
Full Text Available

Reconfigurable Virtual Memory for FPGA-Driven I/O

https://doi.org/10.1145/3582016.3582048

Landgraf, Joshua; Giordano, Matthew; Yoon, Esther; Rossbach, Christopher J. (March 2023, ASPLOS 2023)

Full Text Available

Towards a Machine Learning-Assisted Kernel with LAKE

https://doi.org/10.1145/3575693.3575697

Fingler, Henrique; Tarte, Isha; Yu, Hangchen; Szekely, Ariel; Hu, Bodun; Akella, Aditya; Rossbach, Christopher J. (January 2023, ASPLOS 2023)

Full Text Available

Parla: a Python orchestration system for heterogeneous architectures

https://doi.org/10.1109/SC41404.2022.00056

Lee, Hochan; Ruys, William; Henriksen, Ian; Peters, Arthur; Yan, Yineng; Stephens, Sean; You, Bozhi; Fingler, Henrique; Burtscher, Martin; Gligoric, Milos; et al. (November 2022, SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis)

Python's ease of use and rich collection of numeric libraries make it an excellent choice for rapidly developing scientific applications. However, composing these libraries to take advantage of complex heterogeneous nodes is still difficult. To simplify writing multi-device code, we created Parla, a heterogeneous task-based programming framework that fully supports Python's scientific programming stack. Parla's API is based on Python decorators and allows users to wrap code in Parla tasks for parallel execution. Parla arrays enable automatic movement of data between devices. The Parla runtime handles resource-aware mapping, scheduling, and execution of tasks. Compared to other Python tasking systems, Parla is unique in its parallelization of tasks within a single process, its GPU context and resource-aware runtime, and its design around gradual adoption to provide easy migration of and integration into existing Python applications. We show that Parla can achieve performance competitive with hand-optimized code while improving ease of development.
Full Text Available

DGSF: Disaggregated GPUs for Serverless Functions

https://doi.org/10.1109/IPDPS53621.2022.00077

Fingler, Henrique; Zhu, Zhiting; Yoon, Esther; Jia, Zhipeng; Witchel, Emmett (April 2022, IEEE International Parallel and Distributed Processing Symposium)

Ease of use and transparent access to elastic resources have attracted many applications away from traditional platforms toward serverless functions. Many of these applications, such as machine learning, could benefit significantly from GPU acceleration. Unfortunately, GPUs remain inaccessible from serverless functions in modern production settings. We present DGSF, a platform that transparently enables serverless functions to use GPUs through general purpose APIs such as CUDA. DGSF solves provisioning and utilization challenges with disaggregation, serving the needs of a potentially large number of functions through virtual GPUs backed by a small pool of physical GPUs on dedicated servers. Disaggregation allows the provider to decouple GPU provisioning from other resources, and enables significant benefits through consolidation. We describe how DGSF solves GPU disaggregation challenges including supporting API transparency, hiding the latency of communication with remote GPUs, and load-balancing access to heavily shared GPUs. Evaluation of our prototype on six workloads shows that DGSF’s API remoting optimizations can improve the runtime of a function by up to 50% relative to unoptimized DGSF. Such optimizations, which aggressively remove GPU runtime and object management latency from the critical path, can enable functions running over DGSF to have a lower end-to-end time than when running on a GPU natively. By enabling GPU sharing, DGSF can reduce function queueing latency by up to 53%. We use DGSF to augment AWS Lambda with GPU support, showing similar benefits.
Full Text Available

Compiler-driven FPGA virtualization with SYNERGY

https://doi.org/10.1145/3445814.3446755

Landgraf, Joshua; Yang, Tiffany; Lin, Will; Rossbach, Christopher J.; Schkufza, Eric (April 2021, Architectural Support for Programming Languages and Operating Systems)
Full Text Available

Design, implementation, and application of GPU-based Java bytecode interpreters

https://doi.org/10.1145/3360603

Celik, Ahmet; Nie, Pengyu; Rossbach, Christopher J.; Gligoric, Milos (October 2019, Proceedings of the ACM on Programming Languages)

Full Text Available
