-
Free, publicly-accessible full text available March 30, 2026
-
Miao, Xupeng; Oliaro, Gabriele; Zhang, Zhihao; Cheng, Xinhao; Wang, Zeyu; Zhang, Zhengxin; Wong, Rae Ying Yee; Zhu, Alan; Yang, Lijie; Shi, Xiaoxiang; et al. (ACM)
-
Subramanya, Suhas Jayaram; Arfeen, Daiyaan; Lin, Shouxu; Qiao, Aurick; Jia, Zhihao; Ganger, Gregory R. (SOSP)
The Sia scheduler efficiently assigns heterogeneous deep learning (DL) cluster resources to elastic resource-adaptive jobs. Although some recent schedulers address one aspect or another (e.g., heterogeneity or resource-adaptivity), none addresses all, and most scale poorly to large clusters and/or heavy workloads even without the full complexity of the combined scheduling problem. Sia introduces a new scheduling formulation that can scale to the search-space sizes and intentionally match jobs and their configurations to GPU types and counts, while adapting to changes in cluster load and job mix over time. Sia also introduces a low-profiling-overhead approach to bootstrapping (for each new job) throughput models used to evaluate possible resource assignments, and it is the first cluster scheduler to support elastic scaling of hybrid parallel jobs. Extensive evaluations show that Sia outperforms state-of-the-art schedulers. For example, even on relatively small 44- to 64-GPU clusters with a mix of three GPU types, Sia reduces average job completion time (JCT) by 30–93%, 99th percentile JCT and makespan by 28–95%, and GPU hours used by 12–55% for workloads derived from 3 real-world environments. Additional experiments demonstrate that Sia scales to at least 2000-GPU clusters, provides improved fairness, and is not over-sensitive to scheduler parameter settings.