skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.


Title: CFD Optimization of the Cooling of Yosemite Open Compute Server
Over the past few years, there has been an ever increasing rise in energy consumption by IT equipment in Data Centers. Thus, the need to minimize the environmental impact of Data Centers by optimizing energy consumption and material use is increasing. In 2011, the Open Compute Project was started which was aimed at sharing specifications and best practices with the community for highly energy efficient and economical data centers. The first Open Compute Server was the ‘ Freedom’ Server. It was a vanity free design and was completely custom designed using minimum number of components and was deployed in a data center in Prineville, Oregon. Within the first few months of operation, considerable amount of energy and cost savings were observed. Since then, progressive generations of Open Compute servers have been introduced. Initially, the servers used for compute purposes mainly had a 2 socket architecture. In 2015, the Yosemite Open Compute Server was introduced which was suited for higher compute capacity. Yosemite has a system on a chip architecture having four CPUs per sled providing a significant improvement in performance per watt over the previous generations. This study mainly focuses on air flow optimization in Yosemite platform to improve its overall cooling performance. Commercially available CFD tools have made it possible to do the thermal modeling of these servers and predict their efficiency. A detailed server model is generated using a CFD tool and its optimization has been done to improve the air flow characteristics in the server. Thermal model of the improved design is compared to the existing design to show the impact of air flow optimization on flow rates and flow speeds which in turn affects CPU die temperatures and cooling power consumption and thus, impacting the overall cooling performance of the Yosemite platform. Emphasis is given on effective utilization of fans in the server as compared to the original design and improving air flow characteristics inside the server via improved ducting.  more » « less
Award ID(s):
1738811
NSF-PAR ID:
10065914
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ASME 2017 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems
Page Range / eLocation ID:
V001T02A011
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, there have been phenomenal increases in Artificial Intelligence and Machine Learning that require data collection, mining and using data sets to teach computers certain things to learn, analyze image and speech recognition. Machine Learning tasks require a lot of computing power to carry out numerous calculations. Therefore, most servers are powered by Graphics Processing Units (GPUs) instead of traditional CPUs. GPUs provide more computational throughput per dollar spent than traditional CPUs. Open Compute Servers forum has introduced the state-of-the-art machine learning servers “Big Sur” recently. Big Sur unit consists of 4OU (OpenU) chassis housing eight NVidia Tesla M40 GPUs and two CPUs along with SSD storage and hot-swappable fans at the rear. Management of the airflow is a critical requirement in the implementation of air cooling for rack mount servers to ensure that all components, especially critical devices such as CPUs and GPUs, receive adequate flow as per requirement. In addition, component locations within the chassis play a vital role in the passage of airflow and affect the overall system resistance. In this paper, sizeable improvement in chassis ducting is targeted to counteract effects of air diffusion at the rear of air flow duct in “Big Sur” Open Compute machine learning server wherein GPUs are located directly downstream from CPUs. A CFD simulation of the detailed server model is performed with the objective of understanding the effect of air flow bypass on GPU die temperatures and fan power consumption. The cumulative effect was studied by simulations to see improvements in fan power consumption by the server. The reduction in acoustics noise levels caused by server fans is also discussed. 
    more » « less
  2. Data centers house a variety of compute, storage, network IT hardware where equipment reliability is of utmost importance. Heat generated by the IT equipment can substantially reduce its service life if Tjmax, maximum temperature that the microelectronic device tolerates to guarantee reliable operation, is exceeded. Hence, data center rooms are bound to maintain continuous conditioning of the cooling medium becoming large energy consumers. The objective of this work is to introduce and evaluate a new end-of-aisle cooling design which consists of three cooling configurations. The key objectives of close-coupled cooling are to enable a controlled cooling of the IT equipment, flexible as well as modular design, and containment of hot air exhaust from the cold air. The thermal performance of the proposed solution is evaluated using CFD modeling. A computational model of a small size data center room has been developed. Larger axial fans are selected and placed at rack-level which constitute the rack-fan wall design. The model consists of 10 electronic racks each dissipating a heat load of 8kw. The room is modeled to be hot aisle containment i.e. the hot air exhaust exiting for each row is contained and directed within a specific volume. Each rack has passive IT with no server fans and the servers are cooled by means of rack fan wall. The cold aisle is separated with hot aisle by means of banks of heat exchangers placed on the either sides of the aisle containment. Based on the placement of rack fans, the design is divided to three sub designs — case 1: passive heat exchangers with rack fan walls; case 2: active heat exchangers (HXs coupled with fans) with rack fan walls; case 3: active heat exchangers (hxs coupled with fans) with no rack fans. The cooling performance is calculated based on the thermal and flow parameters obtained for all three configurations. The computational data obtained has shown that the case 1 is used only for lower system resistance IT. However, case 2 and Case 3 can handle denser IT systems. Case 3 is the design that can consume lower fan energy as well as handle denser IT systems. The paper also discusses the cooling behavior of each type of design. 
    more » « less
  3. There are various designs for segregating hot and cold air in data centers such as cold aisle containment (CAC), hot aisle containment (HAC), and chimney exhaust rack. These containment systems have different characteristics and impose various conditions on the information technology equipment (ITE). One common issue in HAC systems is the pressure buildup inside the HAC (known as backpressure). Backpressure also can be present in CAC systems in case of airflow imbalances. Hot air recirculation, limited cooling airflow rate in servers, and reversed flow through ITE with weaker fan systems (e.g. network switches) are some known consequences of backpressure. Currently there is a lack of experimental data on the interdependency between overall performance of ITE and its internal design when a backpressure is imposed on ITE. In this paper, three commercial 2-rack unit (RU) servers with different internal designs from various generations and performance levels are tested and analyzed under various environmental conditions. Smoke tests and thermal imaging are implemented to study the airflow patterns inside the tested equipment. In addition, the impact leak of hot air into ITE on the fan speed and the power consumption of ITE is studied. Furthermore, the cause of the discrepancy between measured inlet temperatures by internal intelligent platform management interface (IPMI) and external sensors is investigated. It is found that arrangement of fans, segregation of space upstream and downstream of fans, leakage paths, location of sensors of baseboard management controller (BMC) and presence of backpressure can have a significant impact on ITE power and cooling efficiency. 
    more » « less
  4. The most common approach to air cooling of data centers involves the pressurization of the plenum beneath the raised floor and delivery of air flow to racks via perforated floor tiles. This cooling approach is thermodynamically inefficient due in large part to the pressure losses through the tiles. Furthermore, it is difficult to control flow at the aisle and rack level since the flow source is centralized rather than distributed. Distributed cooling systems are more closely coupled to the heat generating racks. In overhead cooling systems, one can distribute flow to distinct aisles by placing the air mover and water cooled heat exchanger directly above an aisle. Two arrangements are possible: (i.) placing the air mover and heat exchanger above the cold aisle and forcing downward flow of cooled air into the cold aisle (Overhead Downward Flow (ODF)), or (ii.) placing the air mover and heat exchanger above the hot aisle and forcing heated air upwards from the hot aisle through the water cooled heat exchanger (Overhead Upward Flow (OUF)). This study focuses on the steady and transient behavior of overhead cooling systems in both ODF and OUF configurations and compares their cooling effectiveness and energy efficiency. The flow and heat transfer inside the servers and heat exchangers are modeled using physics based approaches that result in differential equation based mathematical descriptions. These models are programmed in the MATLAB™ language and embedded within a CFD computational environment (using the commercial code FLUENT™) that computes the steady or instantaneous airflow distribution. The complete computational model is able to simulate the complete flow and thermal field in the airside, the instantaneous temperatures within and pressure drops through the servers, and the instantaneous temperatures within and pressure drops through the overhead cooling system. Instantaneous overall energy consumption (1st Law) and exergy destruction (2nd Law) were used to quantify overall energy efficiency and to identify inefficiencies within the two systems. The server cooling effectiveness, based on an effectiveness-NTU model for the servers, was used to assess the cooling effectiveness of the two overhead cooling approaches 
    more » « less
  5. In typical data centers, the servers and IT equipment are cooled by air and almost half of total IT power is dedicated to cooling. Hybrid cooling is a combined cooling technology with both air and water, where the main heat generating components are cooled by water or water-based coolants and rest of the components are cooled by air supplied by CRAC or CRAH. Retrofitting the air-cooled servers with cold plates and pumps has the advantage over thermal management of CPUs and other high heat generating components. In a typical 1U server, the CPUs were retrofitted with cold plates and the server tested with raised coolant inlet conditions. The study showed the server can operate with maximum utilization for CPUs, DIMMs, and PCH for inlet coolant temperature from 25–45 °C following the ASHRAE guidelines. The server was also tested for failure scenarios of the pumps and fans with reducing numbers of fans and pumps. To reduce cooling power consumption at the facility level and increase air-side economizer hours, the hybrid cooled server can be operated at raised inlet air temperatures. The trade-off in energy savings at the facility level due to raising the inlet air temperatures versus the possible increase in server fan power and component temperatures is investigated. A detailed CFD analysis with a minimum number of server fans can provide a way to find an operating range of inlet air temperature for a hybrid cooled server. Changes in the model are carried out in 6SigmaET for an individual server and compared to the experimental data to validate the model. The results from this study can be helpful in determining the room level operating set points for data centers housing hybrid cooled server racks. 
    more » « less