NSF PAR Search | NSF Public Access Repository


Online VM Service Selection with Spot Cores for Dynamic Workloads

https://doi.org/10.1109/Cloud-Summit61220.2024.00016

Alfares, Nader; Kesidis, G; Urgaonkar, B; Baarzi, Ata Fatahi; Jain, Aman (June 2024, IEEE)

Full Text Available
QUIDAM: A Framework for Quantization-Aware DNN Accelerator and Model Co-Exploration

https://doi.org/10.1145/3555807

Inci, Ahmet; Virupaksha, Siri Garudanagiri; Jain, Aman; Chin, Ting-Wu; Thallam, Venkata Vivek; Ding, Ruizhou; Marculescu, Diana (September 2022, ACM Transactions on Embedded Computing Systems)

As the machine learning and systems communities strive to achieve higher energy efficiency through custom deep neural network (DNN) accelerators, varied precision or quantization levels, and model compression techniques, there is a need for design space exploration frameworks that incorporate quantization-aware processing elements into the accelerator design space while having accurate and fast power, performance, and area models. In this work, we present QUIDAM, a highly parameterized quantization-aware DNN accelerator and model co-exploration framework. Our framework can facilitate future research on design space exploration of DNN accelerators for various design choices such as bit precision, processing element type, scratchpad sizes of processing elements, global buffer size, number of total processing elements, and DNN configurations. Our results show that different bit precisions and processing element types lead to significant differences in terms of performance per area and energy. Specifically, our framework identifies a wide range of design points where performance per area and energy vary by more than 5× and 35×, respectively. With the proposed framework, we show that lightweight processing elements achieve on-par accuracy results and up to 5.7× more performance per area and energy improvement when compared to the best INT16-based implementation. Finally, due to the efficiency of the pre-characterized power, performance, and area models, QUIDAM can speed up the design exploration process by 3-4 orders of magnitude as it removes the need for expensive synthesis and characterization of each design.
Full Text Available
SplitServe: Efficiently Splitting Apache Spark Jobs Across FaaS and IaaS

https://doi.org/10.1145/3423211.3425695

Jain, Aman; Baarzi, Ata F.; Kesidis, George; Urgaonkar, Bhuvan; Alfares, Nader; Kandemir, Mahmut (December 2020, Middleware'20)
Full Text Available
Heterogeneous MacroTasking (HeMT) for Parallel Processing in the Cloud

https://doi.org/10.1145/3429885.3429962

Shan, Yuquan; Kesidis, George; Jain, Aman; Urgaonkar, Bhuvan; Khamse-Ashari, Jalal; Lambadaris, Ioannis (December 2020, Workshop on Containers)
Full Text Available
Scheduling Distributed Resources in Heterogeneous Private Clouds

https://doi.org/10.1109/MASCOTS.2018.00018

Kesidis, George; Shan, Yuquan; Jain, Aman; Urgaonkar, Bhuvan; Khamse-Ashari, Jalal; Lambadaris, Ioannis (September 2018, IEEE MASCOTS)

We first consider the static problem of allocating resources to (i.e., scheduling) multiple distributed application frameworks, possibly with different priorities and server preferences, in a private cloud with heterogeneous servers. Several fair scheduling mechanisms have been proposed for this purpose. We extend prior results on max-min fair (MMF) and proportional fair (PF) scheduling to this constrained multiresource and multiserver case for generic fair scheduling criteria. The task efficiencies (a metric related to proportional fairness) of max-min fair allocations found by progressive filling are compared by illustrative examples. In the second part of this paper, we consider the online problem (with framework churn) by implementing variants of these schedulers in Apache Mesos using progressive filling to dynamically approximate max-min fair allocations. We evaluate the implemented schedulers in terms of overall execution time of realistic distributed Spark workloads. Our experiments show that resource efficiency is improved and execution times are reduced when the scheduler is “server specific” or when it leverages characterized required resources of the workloads (when known).
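
The progressive filling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's scheduler: it assumes a single divisible resource on one server and frameworks with fixed demands, whereas the paper treats the constrained multiresource, multiserver case. The function name `max_min_fair` and the framework names are hypothetical.

```python
def max_min_fair(capacity, demands):
    """Progressive filling for one divisible resource (simplified assumption).

    capacity: total amount of the resource.
    demands:  dict mapping a framework name to its maximum demand.
    Returns a dict mapping each framework to its max-min fair allocation.
    """
    eps = 1e-12
    alloc = {name: 0.0 for name in demands}
    # Frameworks that still want more than they currently hold.
    active = {name for name, d in demands.items() if d > eps}
    remaining = float(capacity)

    while active and remaining > eps:
        # Raise all active frameworks at the same rate until either the
        # resource runs out or the framework with the smallest gap is satisfied.
        equal_share = remaining / len(active)
        smallest_gap = min(demands[n] - alloc[n] for n in active)
        increment = min(equal_share, smallest_gap)

        for n in active:
            alloc[n] += increment
        remaining -= increment * len(active)

        # Satisfied frameworks stop filling; the rest continue.
        active = {n for n in active if demands[n] - alloc[n] > eps}

    return alloc


if __name__ == "__main__":
    # Hypothetical frameworks and demands, for illustration only.
    print(max_min_fair(10, {"a": 2, "b": 4, "c": 10}))
```

In this toy case, framework "a" is satisfied first, after which the leftover capacity is split evenly between the two frameworks that still have unmet demand.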
Full Text Available
