NSF PAR Search | NSF Public Access Repository

Contiguitas: The Pursuit of Physical Memory Contiguity in Datacenters

https://doi.org/10.1145/3579371.3589079

Zhao, Kaiyang; Xue, Kaiwen; Wang, Ziqi; Schatzberg, Dan; Yang, Leon; Manousis, Antonis; Weiner, Johannes; Van Riel, Rik; Sharma, Bikash; Tang, Chunqiang; et al. (June 2023, International Symposium on Computer Architecture)

Full Text Available
TMO: transparent memory offloading in datacenters

https://doi.org/10.1145/3503222.3507731

Weiner, Johannes; Agarwal, Niket; Schatzberg, Dan; Yang, Leon; Wang, Hao; Sanouillet, Blaise; Sharma, Bikash; Heo, Tejun; Jain, Mayank; Tang, Chunqiang; et al. (February 2022, Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, published 28 Feb 2022)

Full Text Available
Kube-Knots: Resource Harvesting through Dynamic Container Orchestration in GPU-based Datacenters

https://doi.org/10.1109/cluster.2019.8891040

Thinakaran, Prashanth; Gunasekaran, Jashwant Raj; Sharma, Bikash; Kandemir, Mahmut Taylan; Das, Chita R. (September 2019, IEEE International Conference on Cluster Computing (CLUSTER))

Compute heterogeneity is increasingly gaining prominence in modern datacenters due to the addition of accelerators like GPUs and FPGAs. We observe that datacenter schedulers are agnostic of these emerging accelerators, especially their resource utilization footprints, and thus not well equipped to dynamically provision them based on application needs. We observe that state-of-the-art datacenter schedulers fail to provide fine-grained resource guarantees for latency-sensitive tasks that are GPU-bound. Specifically for GPUs, this results in resource fragmentation and interference, leading to poor utilization of allocated GPU resources. Furthermore, GPUs exhibit highly linear energy efficiency with respect to utilization, and hence proactive management of these resources is essential to keep operational costs low while ensuring end-to-end Quality of Service (QoS) for user-facing queries. Towards addressing the GPU orchestration problem, we build Knots, a GPU-aware resource orchestration layer, and integrate it with the Kubernetes container orchestrator to build Kube-Knots. Kube-Knots can dynamically harvest spare compute cycles through dynamic container orchestration, enabling co-location of latency-critical and batch workloads while improving overall resource utilization. We design and evaluate two GPU-based scheduling techniques to schedule datacenter-scale workloads through Kube-Knots on a ten-node GPU cluster. Our proposed Correlation Based Prediction (CBP) and Peak Prediction (PP) schemes together improve both average and 99th percentile cluster-wide GPU utilization by up to 80% in the case of HPC workloads. In addition, CBP+PP improves the average job completion times (JCT) of deep learning workloads by up to 36% when compared to state-of-the-art schedulers. This leads to 33% cluster-wide energy savings on average for three different workloads compared to state-of-the-art GPU-agnostic schedulers. Further, the proposed PP scheduler guarantees the end-to-end QoS for latency-critical queries by reducing QoS violations by up to 53% when compared to state-of-the-art GPU schedulers.
Full Text Available
Getting more performance with polymorphism from emerging memory technologies

https://doi.org/10.1145/3319647.3325826

Narayanan, Iyswarya; Ganesan, Aishwarya; Badam, Anirudh; Govindan, Sriram; Sharma, Bikash; Sivasubramaniam, Anand (May 2019, 12th ACM International Conference on Systems and Storage)

Full Text Available
The Curious Case of Container Orchestration and Scheduling in GPU-based Datacenters

https://doi.org/10.1145/3267809.3275466

Thinakaran, Prashanth; Raj, Jashwant; Sharma, Bikash; Kandemir, Mahmut T.; Das, Chita R. (January 2018, ACM Symposium on Cloud Computing)

Full Text Available
