Title: Reconciling the contrasting narratives on the environmental impact of large language models
Abstract: The recent proliferation of large language models (LLMs) has led to divergent narratives about their environmental impacts. Some studies highlight the substantial carbon footprint of training and using LLMs, while others argue that LLMs can offer more sustainable alternatives to current practices. We reconcile these narratives by presenting a comparative assessment of the environmental impact of LLMs vs. human labor, examining their relative efficiency across energy consumption, carbon emissions, water usage, and cost. Our findings reveal that, while LLMs have substantial environmental impacts, their relative impacts can be dramatically lower than those of human labor in the U.S. for the same output, with human-to-LLM ratios ranging from 40 to 150 for a typical LLM (Llama-3-70B) and from 1200 to 4400 for a lightweight LLM (Gemma-2B-it). While the human-to-LLM ratios are smaller with regard to human labor in India, these ratios still range between 3.4 and 16 for a typical LLM and between 130 and 1100 for a lightweight LLM. Despite the potential benefit of switching from humans to LLMs, economic factors may cause widespread adoption to lead to a new combination of human and LLM-driven work rather than a simple substitution. Moreover, the growing size of LLMs may substantially increase their energy consumption and lower the human-to-LLM ratios, highlighting the need for further research to ensure the sustainability and efficiency of LLMs.
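To make the comparison concrete, here is a minimal sketch of the ratio arithmetic the abstract describes. The per-page energy figures are invented placeholders, not the paper's measured values; they are chosen only so the resulting ratios fall inside the reported ranges.

```python
# Sketch of the human-to-LLM ratio comparison described in the abstract.
# All per-page figures below are illustrative assumptions, NOT the
# paper's measured values.

def human_to_llm_ratio(human_impact_per_page: float,
                       llm_impact_per_page: float) -> float:
    """Ratio of human impact to LLM impact for the same written output."""
    return human_impact_per_page / llm_impact_per_page

# Hypothetical energy figures (Wh per page of text), for illustration only.
human_energy_us = 1000.0  # assumed: pro-rated US energy footprint for ~1h of writing
llama3_energy   = 10.0    # assumed: inference energy for one page from a 70B model
gemma2b_energy  = 0.5     # assumed: inference energy for one page from a 2B model

print(human_to_llm_ratio(human_energy_us, llama3_energy))   # 100, inside the 40-150 band
print(human_to_llm_ratio(human_energy_us, gemma2b_energy))  # 2000, inside the 1200-4400 band
```

The same division applies unchanged to carbon, water, and cost once per-page figures for each resource are substituted.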
Award ID(s): 2121572, 2324916
PAR ID: 10553329
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Nature Publishing Group
Date Published:
Journal Name: Scientific Reports
Volume: 14
Issue: 1
ISSN: 2045-2322
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Large language models (LLMs) have achieved high accuracy in diverse NLP and computer vision tasks due to self-attention mechanisms relying on GEMM and GEMV operations. However, scaling LLMs poses significant computational and energy challenges, particularly for traditional von Neumann architectures (CPUs/GPUs), which incur high latency and energy consumption from frequent data movement. These issues are even more pronounced in energy-constrained edge environments. While DRAM-based near-memory architectures offer improved energy efficiency and throughput, their processing elements are limited by strict area, power, and timing constraints. This work introduces CIDAN-3D, a novel Processing-in-Memory (PIM) architecture tailored for LLMs. It features an ultra-low-power Neuron Processing Element (NPE) with high compute density (#Operations/Area), enabling efficient in-situ execution of LLM operations by leveraging high parallelism within DRAM. CIDAN-3D reduces data movement, improves locality, and achieves substantial gains in performance and energy efficiency, showing up to 1.3X higher throughput and 21.9X better energy efficiency for smaller models, and 3X throughput and 7X energy improvement for large decoder-only models compared to prior near-memory designs. As a result, CIDAN-3D offers a scalable, energy-efficient platform for LLM-driven Gen-AI applications.
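A toy illustration of the GEMV-heavy workload this abstract refers to: during autoregressive decoding, the query is a single vector, so each attention step reduces to matrix-vector products against the cached keys and values. This is a sketch with assumed toy dimensions, not CIDAN-3D's implementation.

```python
import numpy as np

# Why LLM inference stresses GEMM/GEMV: at decode time the query is one
# vector, so attention against the KV cache is a pair of GEMV operations.
# All dimensions here are assumed toy values.
d, seq_len = 64, 128                      # assumed head dim and context length
rng = np.random.default_rng(0)

q = rng.standard_normal(d)                # one decode-step query (vector)
K = rng.standard_normal((seq_len, d))     # cached keys
V = rng.standard_normal((seq_len, d))     # cached values

scores = K @ q / np.sqrt(d)               # GEMV: (seq_len, d) x (d,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                  # softmax over past positions
out = V.T @ weights                       # GEMV: (d, seq_len) x (seq_len,)
print(out.shape)                          # (64,)
```

Every such step streams K and V through the memory hierarchy, which is the data movement a PIM design aims to execute in place instead.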
  2. When reading narratives, human readers rely on their Theory of Mind (ToM) to infer not only what the characters know from their utterances, but also whether characters are likely to share common ground. As in human conversation, such decisions are not infallible but probabilistic, based on the evidence available in the narrative. By responding on a scale (rather than Yes/No), humans can indicate commitment to their inferences about what characters know (ToM). We use two prompting approaches to explore (i) how well LLM judgments align with human judgments, and (ii) how well LLMs infer the author’s intent from utterances intended to project knowledge in narratives. 
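As a hypothetical illustration of the alignment measurement in (i), one could compare graded human and LLM knowledge judgments with a rank correlation; the ratings below are invented for the example and are not the paper's data.

```python
from scipy.stats import spearmanr

# Hypothetical ratings on a 1-7 scale of how likely a character is to know
# a piece of information, one pair per narrative item (invented values).
human_ratings = [6, 2, 5, 7, 3, 4, 1, 6, 5, 2]
llm_ratings   = [7, 3, 5, 6, 2, 5, 1, 6, 4, 3]

# Rank correlation as one way to quantify human-LLM alignment on a scale.
rho, p = spearmanr(human_ratings, llm_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```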
  3. Objective: Extracting social determinants of health (SDoHs) from medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. Here, we introduce SDoH-GPT, a novel framework leveraging few-shot learning large language models (LLMs) to automate the extraction of SDoH from unstructured text, aiming to improve both efficiency and generalizability. Materials and Methods: SDoH-GPT is a framework comprising few-shot learning LLM methods that extract SDoH from medical notes, and XGBoost classifiers that continue to classify SDoH using the annotations generated by those LLM methods as training datasets. This combination leverages the strength of LLMs as few-shot learners and the efficiency of XGBoost once the training dataset is sufficient, so SDoH-GPT can extract SDoH without relying on extensive medical annotations or costly human intervention. Results: Our approach achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, measured by Cohen's kappa of up to 0.92. The combination of LLM and XGBoost ensures high accuracy and computational efficiency while consistently maintaining AUROC scores above 0.90. Discussion: This study verified SDoH-GPT on three datasets and highlights the potential of leveraging LLM and XGBoost to revolutionize medical note classification, demonstrating its capability to achieve highly accurate classifications with significantly reduced time and cost. Conclusion: The key contribution of this study is the integration of LLM with XGBoost, which enables cost-effective, high-quality annotation of SDoH. This research sets the stage for SDoH extraction to become more accessible, scalable, and impactful in driving future healthcare solutions.
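A minimal sketch of the two-stage design this abstract describes, assuming synthetic features and a toy stand-in for the few-shot LLM labeler; `llm_few_shot_label` and all data here are hypothetical, not the paper's code.

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import cohen_kappa_score, roc_auc_score
from sklearn.model_selection import train_test_split

def llm_few_shot_label(note_features: np.ndarray) -> int:
    """Hypothetical stand-in for a few-shot LLM call that tags one SDoH."""
    return int(note_features.sum() > 0)   # toy rule in place of the LLM

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))                     # stand-in note features
y_llm = np.array([llm_few_shot_label(x) for x in X])   # LLM-generated labels

# Stage 2: the LLM annotations become the training set for XGBoost.
X_tr, X_te, y_tr, y_te = train_test_split(X, y_llm, test_size=0.2, random_state=0)
clf = xgb.XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
clf.fit(X_tr, y_tr)

pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]
print("kappa:", cohen_kappa_score(y_te, pred))
print("AUROC:", roc_auc_score(y_te, proba))
```

The design choice is that the expensive LLM runs once to produce labels, after which the cheap classifier handles volume.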
  4. Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, a framework for measuring and aligning the uncertainty of LLM-based planners so that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model fine-tuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models.
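A minimal split-conformal sketch of the idea, assuming synthetic option scores in place of the LLM next-token likelihoods the paper derives them from: calibrate a threshold on held-out tasks, keep every candidate plan whose score clears it, and ask a human whenever more than one plan survives.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1                                   # target 90% coverage level

# Calibration data (synthetic): per-task scores over candidate options,
# plus the index of the correct option for each task.
n_cal, n_opts = 200, 4
cal_scores = rng.dirichlet(np.ones(n_opts), size=n_cal)   # softmax-like scores
cal_true = rng.integers(0, n_opts, size=n_cal)

# Nonconformity = 1 - score of the true option; threshold is the
# finite-sample-corrected quantile ("higher" avoids under-coverage).
nonconf = 1.0 - cal_scores[np.arange(n_cal), cal_true]
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(nonconf, q_level, method="higher")

# Test time: the prediction set is every option scoring above 1 - qhat.
test_scores = rng.dirichlet(np.ones(n_opts))
pred_set = np.where(test_scores >= 1 - qhat)[0]
if len(pred_set) == 1:
    print("confident, execute option", pred_set[0])
else:
    print("ambiguous, ask for help; candidates:", pred_set)
```

The coverage guarantee says the true option lands in the set with probability at least 1 - alpha, so asking for help exactly when the set is not a singleton bounds the failure rate.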
  5. Recent innovation in large language models (LLMs) and their myriad use cases have rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and other enterprises plan to substantially grow their datacenter capacity to support these new workloads. A key bottleneck resource in datacenters is power, which LLMs are quickly saturating due to their rapidly increasing model sizes. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations, and identify the differences between the training and inference power consumption patterns. Based on our analysis, we claim that the average and peak power utilization in LLM inference clusters should not be very high. Our deductions align with data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment makes it challenging to build a reliable and robust power management framework. We leverage the insights from our characterization to identify opportunities for better power management. As a detailed use case, we propose a new framework called POLCA, which enables power oversubscription in LLM inference clouds. POLCA is robust, reliable, and readily deployable. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in existing clusters with minimal performance loss.
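A back-of-the-envelope sketch of the oversubscription arithmetic behind that claim; the power and utilization figures are assumptions chosen only to illustrate how a roughly 30% gain can arise, not POLCA's production measurements.

```python
# Provisioning against measured peak draw instead of nameplate power
# lets more servers share the same power budget. All figures assumed.

provisioned_kw = 1000.0   # assumed cluster power budget
server_rated_kw = 10.0    # assumed per-server rated (nameplate) power
peak_utilization = 0.76   # assumed observed peak draw / rated power

servers_conservative = int(provisioned_kw / server_rated_kw)   # 100 servers
servers_oversubscribed = int(provisioned_kw /
                             (server_rated_kw * peak_utilization))

extra = servers_oversubscribed / servers_conservative - 1
print(f"{servers_oversubscribed} servers, {extra:.0%} more")   # 131 servers, 31% more
```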