NSF PAR Search | NSF Public Access Repository

Elastic Execution of Multi-Tenant DNNs on Heterogeneous Edge MPSoCs

https://doi.org/10.1109/SEC62691.2024.00029

Heidari, Soroush; Ghasemi, Mehdi; Kim, Young Geun; Wu, Carole-Jean; Vrudhula, Sarma (December 2024, IEEE)

Free, publicly-accessible full text available December 4, 2025
GreenScale: Carbon Optimization for Edge Computing

https://doi.org/10.1109/JIOT.2025.3555153

Son, Yonglak; Gupta, Udit; McCrabb, Andrew; Kim, Young Geun; Bertacco, Valeria; Brooks, David; Wu, Carole-Jean (January 2025, IEEE Internet of Things Journal)

Free, publicly-accessible full text available January 1, 2026
Evolving to Find Optimizations Humans Miss: Using Evolutionary Computation to Improve GPU Code for Bioinformatics Applications

https://doi.org/10.1145/3703920

Liou, Jhe-Yu; Awan, Muaaz; Leyba, Kirtus; Šulc, Petr; Hofmeyr, Steven; Wu, Carole-Jean; Forrest, Stephanie (December 2024, ACM Transactions on Evolutionary Learning and Optimization)

GPUs are used in many settings to accelerate large-scale scientific computation, including simulation, computational biology, and molecular dynamics. However, optimizing code to run efficiently on GPUs requires developers to have both a detailed understanding of the application logic and significant knowledge of parallel programming and GPU architectures. This paper shows that an automated GPU program optimization tool, GEVO, can leverage evolutionary computation to find code edits that reduce the runtime of three important applications (multiple sequence alignment, agent-based simulation, and molecular dynamics) by 28.9%, 29%, and 17.8%, respectively. The paper presents an in-depth analysis of the discovered optimizations, revealing that (1) several of the most important optimizations involve significant epistasis, (2) the primary sources of improvement are application-specific, and (3) many of the optimizations generalize across GPU architectures. In general, the discovered optimizations are not straightforward even for a human GPU expert, showcasing the potential of automated program optimization tools both to reduce the optimization burden for human domain experts and to provide new insights for GPU experts.
Free, publicly-accessible full text available December 31, 2025
Toward Efficient Inference for Mixture of Experts

Huang, Haiyang; Ardalani, Newsha; Sun, Anna; Ke, Liu; Lee, Hsien-Hsin S.; Bhosale, Shruti; Wu, Carole-Jean; Lee, Benjamin (December 2024, Proceedings Neural Information Processing Systems (NeurIPS))

Free, publicly-accessible full text available December 1, 2025
FedGPO: Heterogeneity-Aware Global Parameter Optimization for Efficient Federated Learning

https://doi.org/10.1109/IISWC55918.2022.00020

Kim, Young Geun; Wu, Carole-Jean (November 2022, 2022 IEEE International Symposium on Workload Characterization (IISWC))

Full Text Available
CAMDNN: Content-Aware Mapping of a Network of Deep Neural Networks on Edge MPSoCs

https://doi.org/10.1109/TC.2022.3207137

Heidari, Soroush; Ghasemi, Mehdi; Kim, Young Geun; Wu, Carole-Jean; Vrudhula, Sarma (December 2022, IEEE Transactions on Computers)

Full Text Available
Understanding the Power of Evolutionary Computation for GPU Code Optimization

https://doi.org/10.1109/IISWC55918.2022.00025

Liou, Jhe-Yu; Awan, Muaaz; Hofmeyr, Steven; Forrest, Stephanie; Wu, Carole-Jean (November 2022, IEEE International Symposium on Workload Characterization (IISWC))

Full Text Available
EdgeWise: Energy-Efficient CNN Computation on Edge Devices under Stochastic Communication Delays

https://doi.org/10.1145/3530908

Ghasemi, Mehdi; Rakhmatov, Daler; Wu, Carole-Jean; Vrudhula, Sarma (April 2022, ACM Transactions on Embedded Computing Systems)

This paper presents a framework to enable the energy-efficient execution of convolutional neural networks (CNNs) on edge devices. The framework consists of a pair of edge devices connected via a wireless network: a performance- and energy-constrained device D, which is the first recipient of the data, and an energy-unconstrained device N, which acts as an accelerator for D. Device D decides on the fly how to distribute the workload, with the objective of minimizing its own energy consumption while accounting for the inherent uncertainty in network delay and the overheads of data transfer. These challenges are addressed by adopting the data-driven modeling framework of Markov decision processes (MDPs), whereby D consults an optimal policy in O(1) time to make layer-by-layer assignment decisions. As a special case, a linear-time dynamic programming algorithm is also presented for finding the optimal layer assignment all at once, under the assumption that the network delay is constant throughout the execution of the application. The proposed framework is demonstrated on a platform consisting of a Raspberry Pi 3 as D and an NVIDIA Jetson TX2 as N. Average energy-consumption improvements of 31% and 23% are achieved compared with the alternatives of executing the CNNs entirely on D and entirely on N, respectively. Two state-of-the-art methods were also implemented and compared with the proposed methods.
Full Text Available
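The linear-time special case mentioned in the EdgeWise abstract (constant network delay, whole assignment computed up front) can be illustrated with a two-state dynamic program over the CNN layers. This is a minimal sketch under our own assumptions, not the authors' formulation. It assumes that each layer runs entirely on D or on N. It assumes that D's energy is the sum of its own compute energy, an idle-wait energy while N computes, and per-byte transmit/receive energy for activations crossing the link. It models constant delay as constant per-byte costs, and it assumes the input starts on D and the final output must return to D. The names `Layer`, `assign_layers`, `e_local`, `e_wait`, `e_tx`, and `e_rx` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    e_local: float   # hypothetical: energy D spends computing this layer itself (J)
    e_wait: float    # hypothetical: energy D spends idling while N computes it (J)
    out_bytes: int   # size of this layer's output activation (bytes)


def assign_layers(layers: List[Layer], input_bytes: int,
                  e_tx: float, e_rx: float) -> Tuple[float, List[str]]:
    """Two-state DP over layers in O(len(layers)) time.

    e_tx / e_rx are hypothetical per-byte energies on D for sending to /
    receiving from N, held constant (the constant-network-delay assumption).
    Returns (minimum energy on D, per-layer assignment 'D' or 'N').
    """
    inf = float("inf")
    cost = {"D": 0.0, "N": inf}   # the input data starts on D
    prev_bytes = input_bytes
    back = []                      # back[i][loc]: where layer i-1 ran on the best path
    for layer in layers:
        send = prev_bytes * e_tx   # D -> N transfer of the previous output
        recv = prev_bytes * e_rx   # N -> D transfer of the previous output
        stay_d, switch_to_d = cost["D"], cost["N"] + recv
        stay_n, switch_to_n = cost["N"], cost["D"] + send
        # For each location of the current layer, keep the cheaper predecessor.
        step = {
            "D": ("D", stay_d) if stay_d <= switch_to_d else ("N", switch_to_d),
            "N": ("N", stay_n) if stay_n <= switch_to_n else ("D", switch_to_n),
        }
        back.append({loc: src for loc, (src, _) in step.items()})
        cost = {"D": step["D"][1] + layer.e_local,
                "N": step["N"][1] + layer.e_wait}
        prev_bytes = layer.out_bytes

    # The final result must end up back on D.
    end_d = cost["D"]
    end_n = cost["N"] + prev_bytes * e_rx
    loc, best = ("D", end_d) if end_d <= end_n else ("N", end_n)

    # Walk the back-pointers to recover the per-layer assignment.
    assignment = []
    for i in range(len(layers) - 1, -1, -1):
        assignment.append(loc)
        loc = back[i][loc]
    assignment.reverse()
    return best, assignment
```

For example, with two layers `Layer(5.0, 0.5, 100)` and `Layer(8.0, 0.6, 10)`, a 1000-byte input, `e_tx=0.01`, and `e_rx=0.005`, the sketch keeps the first layer on D and offloads the second to N, for a total of about 6.65 J on D. Keeping only two states per layer is what makes the pass linear. The stochastic-delay MDP described in the abstract would instead precompute a policy table indexed by layer and observed delay, so that each runtime decision becomes a constant-time lookup.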
AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning

https://doi.org/10.1145/3466752.3480129

Kim, Young Geun; Wu, Carole-Jean (October 2021, MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture)

Full Text Available
Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

Ke, Liu; Gupta, Udit; Hempstead, Mark; Wu, Carole-Jean; Lee, Hsien-Hsin S.; Zhang, Xuan (April 2022, 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Full Text Available
