Title: Designing Cloud Servers for Lower Carbon
To mitigate climate change, we must reduce carbon emissions from hyperscale cloud computing. We find that cloud compute servers cause the majority of emissions in a general-purpose cloud. Thus, we motivate designing carbon-efficient compute server SKUs, or GreenSKUs, using recently-available low-carbon server components. To this end, we design and build three GreenSKUs using low-carbon components, such as energy-efficient CPUs, reused old DRAM via CXL, and reused old SSDs. We detail several challenges that limit GreenSKUs' carbon savings at scale and may prevent their adoption by cloud providers. To address these challenges, we develop a novel methodology and associated framework, GSF (GreenSKU Framework), that enables a cloud provider to systematically evaluate a GreenSKU's carbon savings at scale. We implement GSF within Microsoft Azure's production constraints to evaluate our three GreenSKUs' carbon savings. Using GSF, we show that our most carbon-efficient GreenSKU reduces emissions per core by 28% compared to currently-deployed cloud servers. When designing GreenSKUs to meet applications' performance requirements, we reduce emissions by 15%. When incorporating overall data center overheads, our GreenSKU reduces Azure's net cloud emissions by 8%.
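At a high level, comparing SKUs on carbon requires combining each server's embodied (manufacturing) carbon, amortized over its deployment lifetime, with its operational carbon from average power draw and grid intensity, normalized per core. The sketch below shows that accounting in its simplest form; the SKU parameters, grid intensity, and flat amortization model are illustrative assumptions, not values from GSF or Azure.

# Illustrative per-core carbon accounting for comparing server SKUs.
# All component values and the simple amortization model are assumptions
# for illustration; they are not taken from GSF or Azure data.

from dataclasses import dataclass

@dataclass
class SKU:
    name: str
    cores: int
    embodied_kgco2e: float   # carbon embodied in manufacturing the server
    avg_power_w: float       # average operational power draw
    lifetime_years: float    # deployment lifetime over which embodied carbon is amortized

def carbon_per_core_per_year(sku: SKU, grid_kgco2e_per_kwh: float = 0.4) -> float:
    """Amortized embodied + operational carbon, per core, per year (kgCO2e)."""
    embodied = sku.embodied_kgco2e / sku.lifetime_years
    operational = sku.avg_power_w / 1000 * 8760 * grid_kgco2e_per_kwh  # kWh/year * intensity
    return (embodied + operational) / sku.cores

baseline = SKU("baseline", cores=96, embodied_kgco2e=3000, avg_power_w=450, lifetime_years=6)
green = SKU("green-sku", cores=128, embodied_kgco2e=2600, avg_power_w=420, lifetime_years=6)

savings = 1 - carbon_per_core_per_year(green) / carbon_per_core_per_year(baseline)
print(f"per-core carbon reduction: {savings:.0%}")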
Award ID(s):
2340042
PAR ID:
10575026
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-2658-1
Page Range / eLocation ID:
452 to 470
Format(s):
Medium: X
Location:
Buenos Aires, Argentina
Sponsoring Org:
National Science Foundation
More Like this
  1. The growing demands for computational power in cloud computing have led to a significant increase in the deployment of high-performance servers. The growing power consumption of servers and the heat they produce are on track to outpace the capacity of conventional air cooling systems, necessitating more efficient cooling solutions such as liquid immersion cooling. The superior heat-exchange capability of immersion cooling eliminates the need for bulky heat sinks, fans, and airflow channels, while also unlocking the potential to go beyond conventional 2D blade servers to three-dimensional designs. In this work, we present a computational framework to explore designs of servers in three-dimensional space, specifically targeting the maximization of server density within immersion cooling tanks. Our tool is designed to handle a variety of physical and electrical server design constraints. We demonstrate that our optimized designs can reduce server volume by 25–52% compared to traditional flat server designs. This increased density reduces land usage as well as the amount of liquid used for immersion, with a significant reduction in the carbon emissions embodied in datacenter buildings. We further create physical prototypes to simulate dense server designs and perform real-world experiments in an immersion cooling tank, demonstrating that they operate at safe temperatures. This approach marks a critical step forward in sustainable and efficient datacenter management.
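As a toy illustration of the kind of layout comparison such a framework automates, the sketch below contrasts a flat, air-cooled chassis (whose height is dictated by heat sinks and airflow) with a vertically stacked arrangement that immersion cooling makes feasible. The component envelopes, chassis height, and coolant-gap clearance are invented for illustration; the actual framework searches placements under far richer physical and electrical constraints.

# Toy comparison of a conventional flat (air-cooled) layout against a stacked
# 3D layout enabled by immersion cooling. All dimensions are assumptions.

# (name, width_mm, depth_mm, height_mm) -- assumed component envelopes
COMPONENTS = [("board", 300, 200, 20), ("psu", 100, 200, 40), ("nic", 120, 80, 15)]

AIR_CHASSIS_HEIGHT_MM = 80   # assumed chassis height needed for heat sinks and airflow
LIQUID_GAP_MM = 5            # assumed clearance between stacked parts for coolant flow

def flat_volume(parts):
    """Flat layout: parts side by side inside a fixed-height air-cooled chassis."""
    width = sum(p[1] for p in parts)
    depth = max(p[2] for p in parts)
    return width * depth * AIR_CHASSIS_HEIGHT_MM

def stacked_volume(parts):
    """3D layout: parts stacked vertically with small coolant gaps, no fans or sinks."""
    width = max(p[1] for p in parts)
    depth = max(p[2] for p in parts)
    height = sum(p[3] for p in parts) + LIQUID_GAP_MM * (len(parts) - 1)
    return width * depth * height

flat, stacked = flat_volume(COMPONENTS), stacked_volume(COMPONENTS)
print(f"flat: {flat} mm^3, stacked: {stacked} mm^3, reduction: {1 - stacked / flat:.0%}")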
  2. Cloud platforms are increasing their emphasis on sustainability and reducing their operational carbon footprint. A common approach for reducing carbon emissions is to exploit the temporal flexibility inherent to many cloud workloads by executing them in periods with the greenest energy and suspending them at other times. Since such suspend-resume approaches can incur long delays in job completion times, we present a new approach that exploits the elasticity of batch workloads in the cloud to optimize their carbon emissions. Our approach is based on the notion of carbon scaling, similar to cloud autoscaling, where a job dynamically varies its server allocation based on fluctuations in the carbon cost of the grid's energy. We develop a greedy algorithm for minimizing a job's carbon emissions via carbon scaling that is based on the well-known problem of marginal resource allocation. We implement a CarbonScaler prototype in Kubernetes using its autoscaling capabilities and an analytic tool to guide the carbon-efficient deployment of batch applications in the cloud. We then evaluate CarbonScaler using real-world machine learning training and MPI jobs on a commercial cloud platform and show that it can yield i) 51% carbon savings over carbon-agnostic execution; ii) 37% over a state-of-the-art suspend-resume policy; and iii) 8% over the best static scaling policy.
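As a rough sketch of greedy marginal allocation in the spirit of carbon scaling, the code below adds servers one slot at a time wherever the marginal carbon per unit of additional useful work is lowest, until the job's work is covered within a fixed horizon. The carbon-intensity trace, the diminishing-returns scaling curve, the constant per-server power assumption, and the work units are all placeholders for illustration and are not taken from CarbonScaler.

# Greedy marginal allocation sketch: grow the allocation where the marginal
# carbon cost per unit of marginal work is smallest.

import heapq

INTENSITY = [300, 220, 180, 250, 400]   # assumed gCO2/kWh forecast per hour-long slot
WORK_NEEDED = 6.0                       # total useful work units the job must complete

def useful_work(servers: int) -> float:
    """Assumed diminishing-returns scaling curve (not the paper's model)."""
    return servers ** 0.8

def greedy_allocation():
    alloc = [0] * len(INTENSITY)        # servers allocated in each slot
    done = 0.0
    # priority queue of (marginal carbon per unit of marginal work, slot)
    heap = [(INTENSITY[t] / (useful_work(1) - useful_work(0)), t) for t in range(len(INTENSITY))]
    heapq.heapify(heap)
    while done < WORK_NEEDED:
        _, t = heapq.heappop(heap)
        gain = useful_work(alloc[t] + 1) - useful_work(alloc[t])
        done += gain
        alloc[t] += 1
        # re-insert this slot with the (higher) marginal cost of its next increment
        next_gain = useful_work(alloc[t] + 1) - useful_work(alloc[t])
        heapq.heappush(heap, (INTENSITY[t] / next_gain, t))
    return alloc

print(greedy_allocation())   # more servers land in the low-carbon slots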
  3. We present a new and practical framework for security verification of secure architectures. Specifically, we break the verification task into external verification and internal verification. External verification considers the external protocols, i.e., interactions between users, compute servers, network entities, etc. Meanwhile, internal verification considers the interactions between hardware and software components within each server. This verification framework is general-purpose and can be applied to a stand-alone server or a large-scale distributed system. We evaluate our verification method on the CloudMonatt and HyperWall architectures as examples.
  4. Cloud GPU servers have become the de facto way for deep learning practitioners to train complex models on large-scale datasets. However, it is challenging to determine the appropriate cluster configuration (e.g., server type and number) for different training workloads while balancing the trade-offs in training time, cost, and model accuracy. Adding to the complexity is the potential to reduce the monetary cost by using cheaper, but revocable, transient GPU servers. In this work, we analyze distributed training performance under diverse cluster configurations using CM-DARE, a cloud-based measurement and training framework. Our empirical datasets include measurements from three GPU types, six geographic regions, twenty convolutional neural networks, and thousands of Google Cloud servers. We also demonstrate the feasibility of predicting training speed and overhead using regression-based models. Finally, we discuss potential use cases of our performance modeling such as detecting and mitigating performance bottlenecks.
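As a minimal sketch of the kind of regression-based prediction the abstract mentions, the snippet below fits an ordinary linear model mapping a few placeholder features (GPU count, model size, batch size) to measured throughput and predicts the speed of an unseen configuration. The features, measurements, and model choice are illustrative; CM-DARE's actual models and datasets are not reproduced here.

# Toy regression-based throughput prediction; all numbers are made up.

import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed features per measurement: [GPU count, model FLOPs (G), batch size]
X = np.array([[1, 3.9, 32], [2, 3.9, 32], [4, 3.9, 32], [1, 7.8, 64], [2, 7.8, 64]])
y = np.array([210.0, 400.0, 760.0, 150.0, 280.0])   # images/sec (placeholder measurements)

model = LinearRegression().fit(X, y)
print("predicted throughput:", model.predict([[4, 7.8, 64]])[0])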
  5. We present Whisper, a system for privacy-preserving collection of aggregate statistics. Like prior systems, a Whisper deployment consists of a small set of non-colluding servers; these servers compute aggregate statistics over data from a large number of users without learning the data of any individual user. Whisper's main contribution is that its server-to-server communication cost and its server-side storage costs scale sublinearly with the total number of users. In particular, prior systems required the servers to exchange a few bits of information to verify the well-formedness of each client submission. In contrast, Whisper uses silently verifiable proofs, a new type of proof system on secret-shared data that allows the servers to verify an arbitrarily large batch of proofs by exchanging a single 128-bit string. This improvement comes with increased client-to-server communication, which, in cloud computing, is typically cheaper (or even free) than the cost of egress for server-to-server communication. To reduce server storage, Whisper approximates certain statistics using small-space sketching data structures. Applying randomized sketches in an environment with adversarial clients requires a careful and novel security analysis. In a deployment with two servers and 100,000 clients of which 1% are malicious, Whisper can improve server-to-server communication for vector sum by three orders of magnitude while each client's communication increases by only 10%. 
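To illustrate only the communication pattern that makes batch verification cheap, i.e. why one short exchanged value can stand in for per-client messages, the toy sketch below has each server fold its additively secret-shared per-client check values into a single field element; the batch passes if the two exchanged values recombine to zero. This does not reproduce Whisper's silently verifiable proofs or their security analysis (real soundness needs, among other things, random linear combinations so malicious clients cannot cancel each other out); the field choice and check values are assumptions.

# Toy sketch of batch verification over secret-shared check values.

import secrets

P = 2**127 - 1   # Mersenne prime; field choice is an assumption for the sketch

def share(value: int) -> tuple[int, int]:
    """Additive secret sharing of a check value between two servers."""
    a = secrets.randbelow(P)
    return a, (value - a) % P

# Honest clients produce check values of 0; a malicious client would not.
checks = [share(0) for _ in range(100_000)]

server_a_sum = sum(s[0] for s in checks) % P
server_b_sum = sum(s[1] for s in checks) % P

# The only server-to-server exchange for the whole batch: one ~128-bit value each.
print("batch verifies:", (server_a_sum + server_b_sum) % P == 0)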