Generative AI, exemplified by ChatGPT, DALL-E 2, and Stable Diffusion, is an exciting new class of applications consuming growing quantities of computing. We study the compute, energy, and carbon impacts of generative AI inference. Using ChatGPT as an exemplar, we create a workload model and compare request-direction approaches (Local, Balance, CarbonMin), assessing their power use and carbon impacts. Our workload model shows that for ChatGPT-like services, inference dominates emissions, producing in one year 25x the carbon emissions of training GPT-3. The workload model characterizes user experience, and experiments show that carbon emissions-aware algorithms (CarbonMin) can both maintain user experience and reduce carbon emissions dramatically (35%). We also consider a future scenario (2035 workload and power grids) and show that CarbonMin can reduce emissions by 56%. In both cases, the key is intelligent direction of requests to locations with low-carbon power. Combined with hardware technology advances, CarbonMin can hold the emissions increase to only 20% over 2022 levels for a 55x greater workload. Finally, we consider datacenter headroom to increase the effectiveness of shifting; with headroom, CarbonMin reduces 2035 emissions by 71%.
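The abstract does not give CarbonMin's pseudocode; the following is a minimal greedy sketch of carbon-aware request direction in its spirit, under our own assumptions: among datacenters that meet a latency bound and have spare capacity, send the request to the one whose grid currently has the lowest carbon intensity. All region names, latencies, and intensity values are illustrative.

```python
# Hypothetical sketch of carbon-aware request direction (not the paper's actual
# CarbonMin algorithm): pick the feasible region with the cleanest power.

def direct_request(regions, latency_bound_ms):
    """Return the feasible region with the lowest carbon intensity (gCO2/kWh)."""
    feasible = [r for r in regions
                if r["latency_ms"] <= latency_bound_ms and r["load"] < r["capacity"]]
    if not feasible:
        # Fall back to the lowest-latency region if nothing meets the bound.
        return min(regions, key=lambda r: r["latency_ms"])
    return min(feasible, key=lambda r: r["carbon_intensity"])

# Illustrative regions (values assumed, not from the paper).
regions = [
    {"name": "us-east", "latency_ms": 20, "carbon_intensity": 450, "load": 10, "capacity": 100},
    {"name": "nordics", "latency_ms": 90, "carbon_intensity": 40,  "load": 50, "capacity": 100},
    {"name": "us-west", "latency_ms": 60, "carbon_intensity": 250, "load": 80, "capacity": 100},
]

chosen = direct_request(regions, latency_bound_ms=100)
print(chosen["name"])  # nordics
```

The latency bound is what preserves user experience while the carbon-intensity objective shifts load toward low-carbon grids.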
This content will become publicly available on June 30, 2026
Slower is Greener: Acceptance of Eco-feedback Interventions on Carbon Heavy Internet Services
The carbon emissions of modern information and communication technologies (ICT) present a significant environmental challenge: they account for approximately 4% of global greenhouse gases, on par with the aviation industry. Modern internet services incur high carbon emissions because of the significant infrastructure required to operate them under the strict service requirements users expect. One opportunity to reduce emissions is to relax those strict requirements by leveraging eco-feedback. In this study, we explore the carbon-reduction impact of allowing longer internet service response times based on user preferences and feedback. Across four services (Amazon, Google, ChatGPT, social media), our study reveals opportunities to relax latency requirements based on user feedback; this feedback is application-specific, with ChatGPT having the most favorable eco-feedback tradeoff. Further system studies suggest that leveraging the relaxed latency can bring down the carbon footprint of an average service request by 93.1%.
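One intuition for why latency slack reduces emissions is that a deferrable request can wait for a cleaner-grid window instead of executing immediately. The toy model below is our own illustration, not the paper's system: per-slot carbon intensities and the request's energy use are assumed values.

```python
# Toy model (our assumption, not the paper's): a request with latency slack may
# start in any of the first slack_slots+1 time slots and picks the cleanest one.

def request_carbon(intensity_trace, energy_kwh, slack_slots):
    """Carbon (gCO2) for a request allowed to defer up to slack_slots slots."""
    window = intensity_trace[:slack_slots + 1]
    return energy_kwh * min(window)

trace = [500, 480, 120, 110, 450]  # gCO2/kWh per time slot (illustrative)
strict = request_carbon(trace, energy_kwh=0.002, slack_slots=0)   # must run now
relaxed = request_carbon(trace, energy_kwh=0.002, slack_slots=3)  # may defer
print(f"{strict:.2f} -> {relaxed:.2f} gCO2")  # 1.00 -> 0.22 gCO2
```

In this toy trace, three slots of slack cut the request's footprint by roughly 4.5x; the paper's 93.1% figure comes from its own system studies, not this model.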
- Award ID(s): 2324860
- PAR ID: 10585287
- Publisher / Repository: ACM Journal on Computing and Sustainable Societies
- Date Published:
- Journal Name: ACM Journal on Computing and Sustainable Societies
- Volume: 3
- Issue: 2
- ISSN: 2834-5533
- Page Range / eLocation ID: 1 to 21
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- While 5G offers fast access networks and a high-performance data plane, the control plane in the 5G core (5GC) still presents challenges due to inefficiencies in handling control plane operations (including session establishment, handovers, and idle-to-active state transitions) of 5G User Equipment (UE). The Service-based Interface (SBI) used for communication between 5G control plane functions introduces substantial overheads that impact latency. Typical 5GCs are supported in the cloud on containers, to support the disaggregated Control and User Plane Separation (CUPS) framework of 3GPP. L25GC is a state-of-the-art 5G control plane design utilizing shared memory processing to reduce control plane latency. However, L25GC has limitations in supporting multiple user sessions and has programming language incompatibilities with 5GC implementations, e.g., free5GC, using modern languages such as GoLang. To address these challenges, we develop L25GC+, a significant enhancement to L25GC. L25GC+ re-designs the shared-memory-based networking stack to support synchronous I/O between control plane functions. L25GC+ distinguishes different user sessions and maintains strict 3GPP compliance. L25GC+ also offers seamless integration with existing 5GC microservice implementations through equivalent SBI APIs, reducing code refactoring and porting efforts. By leveraging shared memory I/O and overcoming L25GC's limitations, L25GC+ provides an improved solution to optimize the 5G control plane, enhancing latency, scalability, and overall user experience. We demonstrate the improved performance of L25GC+ on a 5G testbed with commercial base stations and multiple UEs.
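The core idea of replacing the HTTP-based SBI with synchronous shared-memory I/O can be illustrated with a toy that is far simpler than L25GC+'s actual stack: two control plane functions exchange a session request through an in-memory queue, with blocking reads providing the synchronous semantics. The NF roles and message fields here are our assumptions.

```python
# Toy illustration (not L25GC+'s implementation): an AMF-like caller and an
# SMF-like callee exchange a session request via shared in-memory queues,
# avoiding per-call HTTP/SBI serialization overhead.
import threading
import queue

req_q, resp_q = queue.Queue(), queue.Queue()

def smf():
    """Hypothetical SMF: consume one session request, return a response."""
    msg = req_q.get()  # blocks until the caller writes -> synchronous I/O
    resp_q.put({"session_id": msg["ue_id"] * 10, "status": "created"})

t = threading.Thread(target=smf)
t.start()
req_q.put({"ue_id": 7, "op": "create_session"})  # AMF-side request
resp = resp_q.get()                              # blocks until the SMF responds
t.join()
print(resp)  # {'session_id': 70, 'status': 'created'}
```

The blocking `get` calls give the request/response pairing that an SBI call provides, without HTTP framing; the real system additionally has to keep per-UE session state separate, one of the L25GC limitations the paper addresses.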
- Edge data centers are an appealing place for telecommunication providers to offer in-network processing such as VPN services, security monitoring, and 5G. Placing these network services closer to users can reduce latency and core network bandwidth, but the deployment of network functions at the edge poses several important challenges. Edge data centers have limited resource capacity, yet network functions are resource intensive with strict performance requirements. Replicating services at the edge is needed to meet demand, but balancing the load across multiple servers can be challenging due to diverse service costs, server and flow heterogeneity, and dynamic workload conditions. In this paper, we design and implement a model-based load balancer EdgeBalance for edge network data planes. EdgeBalance predicts the CPU demand of incoming traffic and adaptively distributes flows to servers to keep them evenly balanced. We overcome several challenges specific to network processing at the edge to improve throughput and latency over static load balancing and monitoring-based approaches.
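The abstract does not spell out EdgeBalance's prediction model, so the following is a minimal sketch under our own assumptions: each flow's CPU demand is predicted from its network function type and packet rate, and the flow is then assigned to the server with the lowest predicted load.

```python
# Minimal sketch (our assumptions, not EdgeBalance's actual model) of
# model-based load balancing: predict CPU demand per flow, then assign the
# flow to the least-loaded server.

# Hypothetical per-packet CPU cost by network function (cycles/packet).
CPU_COST = {"vpn": 1200, "ids": 3000, "firewall": 400}

def assign_flow(servers, flow):
    demand = CPU_COST[flow["nf"]] * flow["pps"]     # predicted cycles/sec
    target = min(servers, key=lambda s: s["load"])  # least predicted load
    target["load"] += demand
    return target["name"]

servers = [{"name": "s1", "load": 0}, {"name": "s2", "load": 0}]
flows = [{"nf": "ids", "pps": 1000},
         {"nf": "vpn", "pps": 1000},
         {"nf": "firewall", "pps": 1000}]
assigned = [assign_flow(servers, f) for f in flows]
print(assigned)  # ['s1', 's2', 's2']
```

Note how the heavy IDS flow lands alone on `s1` while the two cheaper flows share `s2`; a rate-oblivious round-robin balancer would instead split flows evenly by count and leave the servers unevenly loaded in cycles.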
- The proliferation of innovative mobile services such as augmented reality, networked gaming, and autonomous driving has spurred a growing need for low-latency access to computing resources that cannot be met solely by existing centralized cloud systems. Mobile Edge Computing (MEC) is expected to be an effective solution to meet the demand for low-latency services by enabling the execution of computing tasks at the network periphery, in proximity to end-users. While a number of recent studies have addressed the problem of determining the execution of service tasks and the routing of user requests to corresponding edge servers, the focus has primarily been on the efficient utilization of computing resources, neglecting the fact that non-trivial amounts of data need to be stored to enable service execution, and that many emerging services exhibit asymmetric bandwidth requirements. To fill this gap, we study the joint optimization of service placement and request routing in MEC-enabled multi-cell networks with multidimensional (storage-computation-communication) constraints. We show that this problem generalizes several problems in the literature and propose an algorithm that achieves close-to-optimal performance using randomized rounding. Evaluation results demonstrate that our approach can effectively utilize the available resources to maximize the number of requests served by low-latency edge cloud servers.
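The rounding step of such an algorithm can be sketched generically: solve an LP relaxation to get fractional placements in [0, 1], then place each service on each node with probability equal to its fractional value. This is the standard randomized-rounding pattern, not the paper's exact algorithm; the services, nodes, and fractional values below are assumed.

```python
# Illustrative randomized-rounding step (generic technique, not the paper's
# exact algorithm): turn fractional LP placements into an integral placement.
import random

random.seed(0)  # deterministic for this example

def round_placement(frac):
    """frac: {service: {edge_node: x in [0,1]}} -> {service: {edge_node: bool}}."""
    return {s: {e: random.random() < x for e, x in nodes.items()}
            for s, nodes in frac.items()}

# Hypothetical fractional solution from an LP relaxation.
frac = {"ar_game": {"edge1": 0.9, "edge2": 0.1},
        "vpn":     {"edge1": 0.2, "edge2": 0.8}}
placed = round_placement(frac)
print(placed["ar_game"]["edge1"])
```

In expectation the rounded solution matches the LP's fractional resource usage; a full algorithm must additionally repair any storage, computation, or communication constraints the random draw violates, which is where the paper's analysis does the real work.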
- To scale the Internet of Things (IoT) beyond a single home or enterprise, we need an effective mechanism to manage the growth of data, facilitate resource discovery and name resolution, encourage data sharing, and foster cross-domain services. To address these needs, we propose a GlObaL Directory for Internet of Everything (GOLDIE). GOLDIE is a hierarchical location-based IoT directory architecture featuring diverse user-oriented modules and federated identity management. IoT-specific features include discoverability, aggregation and geospatial queries, and support for global access. We implement and evaluate the prototype on a Raspberry Pi and Intel mini servers. We show that a global implementation of GOLDIE could decrease service access latency by 87% compared to a centralized-server solution.
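The latency benefit of a hierarchical, location-based directory comes from resolving names at the most local tier first and escalating only on a miss. The tier names, device names, and addresses in this sketch are our assumptions, not GOLDIE's actual schema.

```python
# Hypothetical sketch of hierarchical, location-based name resolution in the
# spirit of GOLDIE (tiers, names, and addresses are illustrative assumptions).

DIRECTORY = {
    "home":   {"thermostat-42": "10.0.0.5"},
    "city":   {"air-sensor-7": "172.16.3.9"},
    "global": {"buoy-atlantic-1": "203.0.113.20"},
}

def resolve(name, tiers=("home", "city", "global")):
    """Resolve local-first; each escalation would add network round-trip cost."""
    for tier in tiers:
        if name in DIRECTORY[tier]:
            return tier, DIRECTORY[tier][name]
    return None

print(resolve("air-sensor-7"))  # ('city', '172.16.3.9')
```

Most lookups for nearby devices terminate at a nearby tier, which is the structural reason a hierarchical directory can beat a single centralized server on access latency.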