Residue Number Systems (RNS) are well suited to integer addition/multiplication-intensive applications. The complexity of Artificial Intelligence (AI) models has grown enormously in recent years, and from a computer system's perspective, training these large-scale models within an acceptable time and energy budget has become a major concern. Matrix multiplication, a dominant subroutine in many prevailing AI models, is exactly such an addition/multiplication-intensive kernel. However, matrix multiplication in machine learning training typically operates on real numbers, so the benefits RNS offers integer applications cannot be directly transferred to AI training. State-of-the-art RNS real-number encodings, both floating-point and fixed-point, have shortcomings and can be further improved. To translate the native benefits of RNS into efficient large-scale AI training, we propose a low-cost, high-accuracy RNS fixed-point representation: Single RNS Logical Partition (S-RNS-Logic-P) representation with Scaling Down Postprocessing Multiplication (SD-Post-Mul). Moreover, we work out the implementation details of two other RNS fixed-point methods: Double RNS Concatenation (D-RNS-Concat) and S-RNS-Logic-P representation with Scaling Down Preprocessing Multiplication (SD-Pre-Mul), and we design the architectures of all three fixed-point multipliers. In empirical experiments, our S-RNS-Logic-P representation with SD-Post-Mul achieves lower latency and energy overhead while maintaining good accuracy. Furthermore, the method extends naturally to the Redundant Residue Number System (RRNS), raising efficiency in error-tolerant domains, such as improving the error correction efficiency of quantum computing.
Free, publicly-accessible full text available May 14, 2025.
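The carry-free property that makes RNS attractive for addition/multiplication-heavy kernels can be sketched in a few lines: an integer is split into residues modulo a pairwise-coprime moduli set, channels are operated on independently, and the result is recovered with the Chinese Remainder Theorem. The moduli set below is illustrative, not the one used in the paper.

```python
from math import prod

# Hypothetical pairwise-coprime moduli set; any such set works.
MODULI = (251, 253, 255, 256)
M = prod(MODULI)  # dynamic range: results are exact modulo M

def encode(x):
    """Split an integer into one residue per modulus."""
    return tuple(x % m for m in MODULI)

def add(a, b):
    """Carry-free addition: each residue channel is independent."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def mul(a, b):
    """Carry-free multiplication, likewise channel-local."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def decode(r):
    """Chinese Remainder Theorem reconstruction."""
    x = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        x += ri * Mi * pow(Mi, -1, m)  # modular inverse of Mi mod m
    return x % M

x, y = 1234, 5678
product = decode(mul(encode(x), encode(y)))  # equals x * y while x*y < M
```

Because each channel is narrow and independent, the per-channel multipliers are small and need no carry propagation across channels; encoding real numbers (the paper's fixed-point representations) is what makes this machinery usable for training.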
-
In recent years, computing has been moving rapidly from the centralized cloud to various edges. For instance, electric vehicles (EVs), one of the next-generation computing platforms, have grown in popularity as a sustainable alternative to conventional vehicles. Compared with traditional vehicles, EVs have many unique advantages, such as less environmental pollution, high energy utilization efficiency, simple structure, and convenient maintenance. Meanwhile, the field currently faces many challenges, including short cruising range, long charging time, inadequate supporting facilities, and cybersecurity risks. Nevertheless, electric vehicles are still developing as a future industry, the number of users keeps growing, and governments and companies around the world continuously invest in EV-related supply chains. Treating the EV as an emerging and important computing platform, we comprehensively study electric vehicular systems and state-of-the-art EV-related technologies. Specifically, this paper outlines electric vehicles' history, major hardware and software architecture and components, current state-of-the-art technologies, and anticipated future developments to reduce drawbacks and difficulties.
Free, publicly-accessible full text available April 18, 2025.
-
In recent years, electric vehicles (EVs) have emerged as a sustainable alternative to conventional automobiles. Distinguished by their environmental friendliness, superior performance, reduced noise, and low maintenance requirements, EVs offer numerous advantages over traditional vehicles. The integration of electric vehicles with cloud computing has heralded a transformative shift in the automotive industry. However, as EVs become increasingly interconnected with the internet, various devices, and infrastructure, they become susceptible to cyberattacks. These attacks pose a significant risk to the safety, privacy, and functionality of both the vehicles and the broader transportation infrastructure. In this paper, we delve into electric vehicles and their connectivity to the cloud. We scrutinize the attack vectors EVs are vulnerable to and the consequent impact on vehicle operations. Moreover, we outline both general and specific strategies for thwarting these cyberattacks. Additionally, we anticipate future developments aimed at enhancing EV performance and reducing security risks.
Free, publicly-accessible full text available April 18, 2025.
-
The success of ChatGPT is reshaping the landscape of the entire IT industry. The large language model (LLM) powering ChatGPT is experiencing rapid development, marked by enhanced features, improved accuracy, and reduced latency. Due to the execution overhead of LLMs, prevailing commercial LLM products typically manage user queries on remote servers. However, the escalating volume of user queries and the growing complexity of LLMs have turned servers into bottlenecks, compromising quality of service (QoS). To address this challenge, a potential solution is to shift LLM inference services to edge devices, a strategy currently being explored by industry leaders such as Apple, Google, Qualcomm, and Samsung. Beyond alleviating the computational strain on servers and enhancing system scalability, deploying LLMs at the edge offers additional advantages, including real-time responses even in the absence of network connectivity and improved privacy protection for customized or personal LLMs. This article delves into the challenges and potential bottlenecks currently hindering the effective deployment of LLMs on edge devices. By deploying the LLaMa-2 7B model with INT4 quantization on diverse edge devices and systematically analyzing the experimental results, we identify insufficient memory and/or computing resources on traditional edge devices as the primary obstacles. Based on our observations and empirical analysis, we further provide insights and design guidance for the next generation of edge devices and systems from both hardware and software directions.
Free, publicly-accessible full text available April 18, 2025.
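The INT4 quantization mentioned above compresses each weight to a 4-bit integer plus a shared scale, roughly quartering memory traffic versus FP16. The sketch below shows the basic symmetric per-tensor scheme under stated assumptions; it is not the exact algorithm used by any particular runtime, and the function names are illustrative.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor quantization to 4-bit integer codes.

    Returns codes in [-8, 7] plus the scale needed to dequantize.
    A sketch of the idea behind INT4 weight compression only.
    """
    scale = max(float(np.max(np.abs(w))) / 7.0, 1e-12)  # largest weight -> +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover a float approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, 0.9], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)  # close to w, at 4 bits per weight
```

Production schemes typically quantize per group of 32-128 weights rather than per tensor to limit the rounding error, but the memory-versus-accuracy trade-off is the same.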
-
In recent years, integrating distributed energy resources has emerged as a pervasive trend in competitive energy markets. The idea of virtual power plants (VPPs) has gained traction among researchers and startups, offering a solution that addresses diverse social, economic, and environmental requirements. A VPP comprises interconnected distributed energy resources collaborating to optimize operations and participate in energy markets. However, existing VPPs confront numerous challenges, including the unpredictability of renewable energy sources, the intricacies and fluctuations of energy markets, and concerns over insecure communication and data transmission. This article comprehensively reviews the concept, historical development, evolution, and components of VPPs. It delves into the issues and challenges encountered by current VPPs. Furthermore, the article explores the potential of artificial intelligence (AI) to mitigate these challenges, investigating how AI can enhance the performance, efficiency, and sustainability of future smart VPPs.
Free, publicly-accessible full text available April 18, 2025.
-
Recently, with the advent of the Internet of Everything and 5G networks, the amount of data generated by edge scenarios such as autonomous vehicles, smart industry, 4K/8K video, virtual reality (VR), and augmented reality (AR) has exploded. These trends impose real-time, hardware-dependence, low-power, and security requirements on edge facilities and have rapidly popularized edge computing. Meanwhile, artificial intelligence (AI) workloads have dramatically shifted the computing paradigm from cloud services to mobile applications. Unlike in the cloud or on mobile platforms, where AI is widely deployed and well studied, AI workload performance and resource impact on edge devices are not yet well understood; an in-depth analysis and comparison of their advantages, limitations, performance, and resource consumption in an edge environment is still lacking. In this paper, we perform a comprehensive study of representative AI workloads on edge platforms. We first survey modern edge hardware and popular AI workloads. We then quantitatively evaluate three categories (classification, image-to-image, and segmentation) of the most popular and widely used AI applications in realistic edge environments based on the Raspberry Pi, Nvidia TX2, and other devices. We find that the interaction between hardware and neural network models incurs non-negligible impact and overhead on AI workloads at the edge. Our experiments show that performance variation and differences in resource footprint limit the suitability of certain workloads and algorithms for edge platforms, and that users need to select the appropriate workload, model, and algorithm based on the requirements and characteristics of their edge environment.
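The two metrics the evaluation above revolves around, per-inference latency and resource footprint, can be captured with a small harness like the following. This is a generic sketch, not the paper's measurement methodology; the function names are illustrative, and on Linux `ru_maxrss` is reported in kilobytes (on macOS, in bytes).

```python
import time
import resource

def benchmark(run_inference, warmup=3, iters=20):
    """Measure mean latency and peak memory of an inference callable.

    Warmup runs are discarded so one-time costs (model load, JIT,
    cache fill) do not skew the steady-state latency figure.
    """
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    latency_ms = (time.perf_counter() - start) / iters * 1000.0
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return latency_ms, peak_rss

# Usage: pass any zero-argument inference call, e.g. a bound model forward.
lat_ms, rss = benchmark(lambda: sum(range(100_000)))
```

On a constrained board, running the same harness across models is what exposes the performance variation and footprint differences the paper reports.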
-
During the past few years, serverless computing has changed the paradigm of application development and deployment in the cloud and at the edge due to its unique advantages, including easy administration, automatic scaling, and built-in fault tolerance. Nevertheless, serverless computing also faces challenges such as long latency caused by cold starts. In this paper, we present an in-depth performance analysis of cold starts in serverless frameworks and propose HotC, a container-based runtime management framework that leverages lightweight containers to mitigate cold starts and improve the network performance of serverless applications. HotC maintains a live container runtime pool, analyzes the user input or configuration file, and provides an available runtime for immediate reuse. To precisely predict requests and efficiently manage hot containers, we design an adaptive live container control algorithm combining an exponential smoothing model with a Markov chain method. Our evaluation results show that HotC introduces negligible overhead and efficiently improves the performance of various applications with different network traffic patterns on both cloud servers and edge devices.
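The exponential smoothing half of the prediction idea can be sketched briefly: smooth the recent request rate to forecast the next interval, then size the warm-container pool to that forecast plus headroom. The function names, smoothing factor, and headroom below are illustrative assumptions, not HotC's actual parameters, and the Markov-chain component for modeling request-state transitions is omitted.

```python
def smooth_forecast(history, alpha=0.5):
    """Single exponential smoothing: forecast next-interval load.

    alpha near 1 tracks recent changes quickly; alpha near 0
    averages over a longer window.
    """
    s = history[0]
    for x in history[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def pool_size(history, headroom=1.2):
    """Warm-container target: the forecast plus a safety margin,
    so a small burst above the forecast still avoids a cold start."""
    return max(1, round(smooth_forecast(history) * headroom))

requests_per_interval = [4, 6, 5, 8, 7]
target = pool_size(requests_per_interval)  # containers to keep warm
```

Keeping the pool slightly above the forecast trades a little idle memory for avoiding cold-start latency on the requests the forecast misses, which is the core cost model behind warm-pool management.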