The recent proliferation of large language models (LLMs) has led to divergent narratives about their environmental impacts. Some studies highlight the substantial carbon footprint of training and using LLMs, while others argue that LLMs can offer more sustainable alternatives to current practices. We reconcile these narratives by presenting a comparative assessment of the environmental impact of LLMs vs. human labor, examining their relative efficiency across energy consumption, carbon emissions, water usage, and cost. Our findings reveal that, while LLMs have substantial environmental impacts, their relative impacts can be dramatically lower than those of human labor in the U.S. for the same output, with human-to-LLM ratios ranging from 40 to 150 for a typical LLM (Llama-3-70B) and from 1200 to 4400 for a lightweight LLM (Gemma-2B-it). While the human-to-LLM ratios are smaller relative to human labor in India, these ratios still range between 3.4 and 16 for a typical LLM and between 130 and 1100 for a lightweight LLM. Despite the potential benefits of switching from humans to LLMs, economic factors may cause widespread adoption to produce a new combination of human- and LLM-driven work rather than a simple substitution. Moreover, the growing size of LLMs may substantially increase their energy consumption and lower the human-to-LLM ratios, highlighting the need for further research to ensure the sustainability and efficiency of LLMs.
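As a minimal illustration of the ratio computation underlying these findings, the sketch below divides a human footprint per unit of output by the corresponding LLM footprint; the numbers shown are placeholders for demonstration, not the study's measured values.

```python
# Illustrative human-to-LLM ratio computation; the footprint values below are
# placeholders, not the paper's measured figures.

def human_to_llm_ratio(human_footprint: float, llm_footprint: float) -> float:
    """Ratio of human to LLM environmental footprint for the same output."""
    return human_footprint / llm_footprint

# Hypothetical carbon footprints (kg CO2e per page of writing).
human_carbon_per_page = 0.5    # placeholder
llm_carbon_per_page = 0.005    # placeholder
print(human_to_llm_ratio(human_carbon_per_page, llm_carbon_per_page))  # -> 100.0
```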
-
Public models offer predictions for a variety of downstream tasks and have played a crucial role in various AI applications, demonstrating their proficiency in accurate prediction. However, an exclusive emphasis on prediction accuracy may not align with the diverse end objectives of downstream agents. Recognizing the public model's predictions as a service, we advocate for integrating the objectives of downstream agents into the optimization process. Concretely, to address performance disparities and foster fairness among heterogeneous agents in training, we propose a novel Equitable Objective. This objective, coupled with a policy gradient algorithm, trains the public model to produce a more equitable and uniform performance distribution across downstream agents, each with their own concerns. Both theoretical analysis and empirical case studies demonstrate the effectiveness of our method in advancing performance equity across diverse downstream agents that use the public model for their decision-making.
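As a rough sketch of what an equity-oriented training objective could look like, the snippet below penalizes the spread of per-agent losses in addition to their mean; the paper's actual Equitable Objective and policy gradient formulation may differ, and `equity_weight` is a hypothetical knob.

```python
import torch

def equitable_loss(per_agent_losses: torch.Tensor,
                   equity_weight: float = 1.0) -> torch.Tensor:
    """per_agent_losses: shape (num_agents,), one training loss per agent.

    Penalizing the variance pushes the performance distribution across agents
    toward uniformity; this is one possible proxy, not the paper's objective.
    """
    mean_loss = per_agent_losses.mean()
    spread = per_agent_losses.var(unbiased=False)
    return mean_loss + equity_weight * spread
```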
-
The enormous growth of AI computing has led to a surging demand for electricity. To stem the resulting energy cost and environmental impact, this paper explores opportunities enabled by increasing hardware heterogeneity and introduces the concept of Geographical Server Relocation (GSR). Specifically, GSR physically balances the available AI servers across geographically distributed data centers subject to the AI computing demand and power capacity constraints in each location. The key idea of GSR is to relocate older and less energy-efficient servers to regions with more renewables, better water efficiency, and/or lower electricity prices. Our case study demonstrates that, even with modest relocation flexibility, GSR can substantially reduce the total operational environmental footprint and operating cost of AI computing. We conclude the paper by discussing major challenges of GSR, including service migration, software management, and algorithms.
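A minimal, greedy illustration of the GSR idea follows: the most power-hungry (least efficient) servers are placed in the regions with the cleanest power first, subject to per-region capacity. This is only a sketch under simplified assumptions, not the paper's formulation.

```python
# Hedged sketch of Geographical Server Relocation (GSR): greedily relocate the
# least energy-efficient servers to the lowest-carbon regions with capacity.

def greedy_gsr(servers, regions):
    """servers: list of (server_id, power_kw);
    regions: list of (region_id, carbon_intensity, capacity_kw).
    Returns a placement dict {server_id: region_id}."""
    # Handle the most power-hungry servers first ...
    servers = sorted(servers, key=lambda s: s[1], reverse=True)
    # ... and prefer regions with the lowest carbon intensity.
    regions = sorted(regions, key=lambda r: r[1])
    remaining = {region_id: capacity for region_id, _, capacity in regions}
    placement = {}
    for server_id, power in servers:
        for region_id, _, _ in regions:
            if remaining[region_id] >= power:
                placement[server_id] = region_id
                remaining[region_id] -= power
                break
    return placement
```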
-
Fueled by the soaring popularity of foundation models, the accelerated growth of artificial intelligence (AI) models' enormous environmental footprint has come under increased scrutiny. While many approaches have been proposed to make AI more energy-efficient and environmentally friendly, environmental inequity, i.e., the fact that AI's environmental footprint can be disproportionately higher in certain regions than in others, has emerged, raising social-ecological justice concerns. This paper takes a first step toward addressing AI's environmental inequity by fairly balancing its regional environmental impact. Concretely, we focus on the carbon and water footprints of AI model inference and propose equity-aware geographical load balancing (eGLB), which explicitly minimizes AI's highest environmental cost across all regions. The consideration of environmental equity creates substantial algorithmic challenges, as the optimal GLB decisions require complete offline information that is lacking in practice. To address these challenges, we introduce auxiliary variables and optimize GLB decisions online based on dual mirror descent. In addition to analyzing the performance of eGLB theoretically, we run trace-based empirical simulations considering a set of geographically distributed data centers that serve inference requests for a large language model. The results demonstrate that existing GLB approaches may amplify environmental inequity, while eGLB can significantly reduce the regional disparity in carbon and water footprints.
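The sketch below illustrates, in simplified form, how an online dual-weighted routing rule with mirror-descent-style updates could steer load away from regions that accumulate large footprints; it is not the paper's eGLB algorithm and carries none of its guarantees.

```python
import math

def route_and_update(weights, footprints, step=0.1):
    """weights: dict region -> dual weight (initialize uniformly, e.g. 1.0);
    footprints: dict region -> marginal carbon/water cost of serving this
    request there. Returns the chosen region and mutates weights in place."""
    # Route to the region with the lowest weighted footprint.
    choice = min(footprints, key=lambda r: weights[r] * footprints[r])
    # Exponentiated-gradient (mirror descent style) update: regions that keep
    # accumulating footprint become less attractive over time.
    weights[choice] *= math.exp(step * footprints[choice])
    total = sum(weights.values())
    for r in weights:            # renormalize the weights
        weights[r] /= total
    return choice
```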
-
Deep neural network (DNN) models, despite their impressive performance, are vulnerable to exploitation by attackers who attempt to transfer them to other tasks for their own benefit. Current defense strategies mainly address this vulnerability at the model parameter level, leaving the potential of architecture-level defense largely unexplored. This paper, for the first time, addresses the issue of model protection by reducing transferability at the architecture level. Specifically, we present a novel neural architecture search (NAS)-enabled algorithm that employs zero-cost proxies and evolutionary search to explore model architectures with low transferability. Our method, named ArchLock, aims to achieve high performance on the source task while degrading performance on potential target tasks, i.e., locking the transferability of a DNN model. To achieve efficient cross-task search without accurately knowing the training data owned by attackers, we utilize zero-cost proxies to speed up architecture evaluation and simulate potential target task embeddings to assist cross-task search with a binary performance predictor. Extensive experiments on NAS-Bench-201 and TransNAS-Bench-101 demonstrate that ArchLock reduces transferability by up to 30% and 50%, respectively, with negligible performance degradation on source tasks (<2%). The code is available at https://github.com/Tongzhou0101/ArchLock.
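The snippet below sketches the kind of fitness function and evolutionary loop the abstract describes: reward architectures that score well on the source task and poorly on simulated target tasks. `zero_cost_proxy` and `predict_target_score` are placeholder callables standing in for the paper's proxies and binary performance predictor, not its actual components.

```python
import random

def fitness(arch, zero_cost_proxy, predict_target_score, target_embeddings, lam=1.0):
    """High source-task proxy score, low (worst-case) predicted transfer score."""
    source_score = zero_cost_proxy(arch)
    transfer_score = max(predict_target_score(arch, e) for e in target_embeddings)
    return source_score - lam * transfer_score

def evolve(population, mutate, fitness_fn, generations=50):
    """Simple evolutionary search: keep the fitter half, refill by mutation."""
    for _ in range(generations):
        population.sort(key=fitness_fn, reverse=True)
        survivors = population[: len(population) // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(len(population) - len(survivors))]
    return max(population, key=fitness_fn)
```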
-
Freshwater scarcity is a global problem that requires collective efforts across all industry sectors. Nevertheless, a lack of access to operational water footprint data bars many applications from exploring optimization opportunities hidden within temporal and spatial variations. To break this barrier for research in water sustainability, we build a dataset of operational direct water usage in cooling systems and indirect water embedded in electricity generation. Our dataset consists of the hourly water efficiency of major U.S. cities and states from 2019 to 2023. We also offer cooling system models that capture the impact of weather on water efficiency. We present a preliminary analysis of our dataset and discuss three potential applications that can benefit from it. Our dataset is publicly available at the Open Science Framework (OSF).
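As an illustration of how a cooling-system model might map weather to water efficiency, the sketch below estimates a water usage effectiveness (WUE) value from wet-bulb temperature; the linear form and coefficients are assumptions for demonstration, not the released models.

```python
# Hedged sketch: direct cooling water use (WUE, liters per kWh of server
# energy) modeled as a simple function of wet-bulb temperature. The functional
# form and coefficients are placeholders, not the dataset's cooling models.

def estimated_wue(wet_bulb_c: float, base_wue: float = 0.5,
                  slope: float = 0.1) -> float:
    """Evaporative cooling tends to consume more water under hotter, more
    humid conditions (higher wet-bulb temperature)."""
    return max(0.0, base_wue + slope * (wet_bulb_c - 15.0))

# Example: rough direct water use for a 1 MWh workload (illustrative only).
print(estimated_wue(wet_bulb_c=25.0) * 1000)  # liters
```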
-
Brain-computer interfaces (BCIs) are typically designed to be lightweight and responsive in real time to provide users with timely feedback. Classical feature engineering is computationally efficient but has low accuracy, whereas recent deep neural networks (DNNs) improve accuracy but are computationally expensive and incur high latency. As a promising alternative, the low-dimensional computing (LDC) classifier based on vector symbolic architecture (VSA) achieves a small model size yet higher accuracy than classical feature engineering methods. However, its accuracy still lags behind that of modern DNNs, making it challenging to process complex brain signals. Knowledge distillation is a popular method for improving the accuracy of a small model. However, maintaining a constant level of distillation between the teacher and student models may not be the best approach for a growing student across its progressive learning stages. In this work, we propose a simple scheduled knowledge distillation method based on curriculum data ordering that enables the student to gradually build knowledge from the teacher model, controlled by an alpha scheduler. Meanwhile, we employ LDC/VSA as the student model to enhance on-device inference efficiency for tiny BCI devices that demand low latency. The empirical results demonstrate that our approach achieves a better tradeoff between accuracy and hardware efficiency than other methods.
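A minimal sketch of a scheduled distillation loss in this spirit is shown below: an alpha scheduler gradually shifts weight from the hard-label loss to the teacher's soft targets over training. The exact schedule, temperature, and curriculum ordering used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def alpha_schedule(step: int, total_steps: int) -> float:
    """Linearly ramp the distillation weight from 0 to 1 over training
    (a placeholder schedule; the paper's scheduler may differ)."""
    return min(1.0, step / max(1, total_steps))

def scheduled_kd_loss(student_logits, teacher_logits, labels, alpha, T=4.0):
    """Blend hard-label cross-entropy with temperature-scaled KL distillation."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    return (1 - alpha) * ce + alpha * kd
```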
-
This paper studies online resource allocation with replenishable budgets, where budgets can be replenished on top of the initial budget and an agent sequentially makes online allocation decisions without violating the available budget constraint at each round. We propose a novel online algorithm, called OACP (Opportunistic Allocation with Conservative Pricing), that conservatively adjusts dual variables while opportunistically utilizing available resources. OACP achieves a bounded asymptotic competitive ratio in adversarial settings as the number of decision rounds T grows large. Importantly, the asymptotic competitive ratio of OACP is optimal in the absence of additional assumptions on budget replenishment. To further improve the competitive ratio, we make the mild assumption that the budget is replenished every T* ≥ 1 decision rounds and propose OACP+ to dynamically adjust the total budget assignment for online allocation. Next, we move beyond the worst case and propose LA-OACP (Learning-Augmented OACP/OACP+), a novel learning-augmented algorithm for online allocation with replenishable budgets. We prove that LA-OACP can improve the average utility compared to OACP/OACP+ when the ML predictor is properly trained, while still offering worst-case utility guarantees when the ML predictions are arbitrarily wrong. Finally, we run simulation studies of sustainable AI inference powered by renewables, validating our analysis and demonstrating the empirical benefits of LA-OACP.
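The sketch below illustrates the general dual-price pattern that conservative-pricing allocation builds on: allocate only when the reward justifies the priced resource cost, and raise the price as the budget is consumed. It is an illustration of the idea under simplified assumptions, not the OACP/OACP+ algorithm or its guarantees.

```python
# Hedged sketch of dual-price online allocation (not the paper's OACP rule).

def allocate(reward: float, cost: float, budget: float, price: float):
    """Return (allocate?, new_budget). Allocate only if the reward beats the
    priced cost and the remaining budget covers the resource cost."""
    if reward >= price * cost and cost <= budget:
        return True, budget - cost
    return False, budget

def update_price(price: float, budget_used_frac: float, step: float = 0.05):
    """Conservative pricing: raise the dual price as more budget is used;
    a replenishment event would lower budget_used_frac and relax the price."""
    return price * (1.0 + step * budget_used_frac)
```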