Search for: All records
Total Resources: 2
Author / Contributor
- Ahmed, Adnan Shakeel (1)
- Beedkar, Kaustubh (1)
- Hu, Y Charlie (1)
- Jindal, Abhilash (1)
- Kong, Z Jonny (1)
- Xu, Qiang (1)
-
With the rapid innovation of GPUs, heterogeneous GPU clusters in both public clouds and on-premise data centers have become increasingly commonplace. In this paper, we demonstrate how pipeline parallelism, a technique well-studied for throughput-oriented deep learning model training, can be used effectively for serving latency-bound model inference, e.g., in video analytics systems, on heterogeneous GPU clusters. Our work exploits the synergy between diversity in model layers and diversity in GPU architectures, which results in comparable inference latency for many layers when running on low-class and high-class GPUs. We explore how this overlooked capability of low-class GPUs can be exploited using pipeline parallelism and present a novel inference serving system, PPipe, that employs pool-based pipeline parallelism via an MILP-based control plane and a data plane that performs resource reservation-based adaptive batching. Evaluation results on diverse workloads (18 CNN models) show that PPipe achieves 41.1%–65.5% higher utilization of low-class GPUs while maintaining high utilization of high-class GPUs, leading to 32.2%–75.1% higher serving throughput compared to various baselines.
Free, publicly-accessible full text available July 9, 2026
-
Ahmed, Adnan Shakeel; Jindal, Abhilash; Beedkar, Kaustubh (IEEE)
We present POPPER, a dataflow system for building Machine Learning (ML) workflows. A novel aspect of POPPER is its built-in support for in-flight error handling, which is crucial in developing effective ML workflows. POPPER provides a convenient API that allows users to create and execute complex workflows comprising traditional data processing operations (such as map, filter, and join) and user-defined error handlers. The latter enable in-flight detection and correction of errors introduced by ML models in the workflows. Inside POPPER, we model the workflow as a reactive dataflow, a directed cyclic graph, to achieve efficient execution through pipeline parallelization. We demonstrate the in-flight error-handling capabilities of POPPER, for which we have built a graphical interface, allowing users to specify workflows, visualize and interact with their reactive dataflows, and delve into the internals of POPPER.
Free, publicly-accessible full text available May 19, 2026
