Abstract Direct Liquid Cooling (DLC) has emerged as a promising technology for thermal management of high-performance computing servers, enabling efficient heat dissipation and reliable operation. Thermal performance is governed by several factors, including the physical properties of the coolant and flow parameters such as coolant inlet temperature and flow rate. The design of the manifold that distributes coolant to the Information Technology Equipment (ITE) can significantly affect the overall performance of the computing system. This paper investigates the hydraulic characterization and design validation of a rack-level coolant distribution manifold, or rack manifold. To this end, a custom-built, high-power-density liquid-cooled ITE rack was assembled, and various cooling loops were plugged into the rack manifold to validate its thermal performance. The rack manifold distributes the coolant, pumped by a Coolant Distribution Unit (CDU), to each of these cooling loops. Pressure drop characteristics of the rack manifold were obtained for flow rates that effectively dissipate the heat loads from the ITE. Pressure drop is a critical parameter in the design of the coolant distribution manifold because it influences the flow rate and, ultimately, the thermal performance of the system. Measuring the pressure drop at various flow rates allows the optimum flow rate for efficient heat dissipation to be determined accurately. Furthermore, 1D flow network and CFD models of the rack-level coolant loop, including the rack manifold, were developed and validated against experimental test data. The validated models provide a useful tool for facility-level modeling of a liquid-cooled data center. The CFD models allow the fluid flow and heat transfer within the cooling system to be simulated accurately.
These models can help to design the coolant distribution manifold at the facility level. The results of this study demonstrate the importance of the design and development of the coolant distribution manifold to the thermal performance of a liquid-cooled data center. The study also highlights the usefulness of 1D flow network and CFD models for designing and validating liquid-cooled data center cooling systems. In conclusion, the hydraulic characterization and design validation of a rack-level coolant distribution manifold is critical to achieving efficient thermal management of high-performance computing servers. This study presents a comprehensive approach to hydraulic characterization of the coolant distribution manifold, which can significantly affect the overall thermal performance and reliability of the system.
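As a rough illustration of what a 1D flow network calculation involves, the sketch below splits a total rack flow among parallel cooling loops that share the same supply and return manifold. It is not the authors' model. It assumes each loop follows a simple quadratic loss law, ΔP = k·Q², and the function name `split_flow`, the loss coefficients, and the flow values are all hypothetical.

```python
import math


def split_flow(total_flow_lpm, loss_coeffs):
    """Split a total flow among parallel branches that share one pressure drop.

    Assumes each branch obeys dP = k * Q**2 (hypothetical loss law), with k in
    kPa/(lpm**2). Because every parallel branch sees the same dP, the flow in
    branch i is proportional to 1 / sqrt(k_i).
    """
    weights = [1.0 / math.sqrt(k) for k in loss_coeffs]
    total_weight = sum(weights)
    flows = [total_flow_lpm * w / total_weight for w in weights]
    # Common pressure drop, evaluated on the first branch.
    dp_kpa = loss_coeffs[0] * flows[0] ** 2
    return flows, dp_kpa


# Illustrative values only: three loops, one of them less restrictive.
coeffs = [4.0, 4.0, 1.0]
flows, dp = split_flow(6.0, coeffs)
print(flows, dp)
```

The less restrictive loop receives the larger share of the flow, which is the kind of maldistribution a manifold characterization is meant to quantify.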
A METHODOLOGY FOR THERMAL CHARACTERIZATION OF HIGH-POWER LIQUID-COOLED SERVERS
Effective cooling is crucial for high-power liquid-cooled servers to ensure optimal performance and reliability of components. Thermal characterization is necessary to ensure that the cooling system functions as intended, is energy efficient, and minimizes downtime. This study presents a methodology for thermal characterization of a high-power liquid-cooled server/TTV (the terms server and thermal test vehicle, TTV, are used interchangeably). The server layout includes multiple thermal test vehicle setups equipped with direct-to-chip cold plates, with two or more connected in series to form a TTV cooling loop. These cooling loops are connected in parallel to the supply and return plenums of the cooling loop manifold, which includes a chassis-level flow distribution manifold. To obtain accurate measurements, two identical server/TTV prototypes are instrumented with sensors for coolant flow rate and temperature measurements for every TTV cooling loop. Four ultrasonic flow sensors are installed in the flow verification server/TTV to measure the coolant flow rate to each TTV cooling loop. In the thermal verification server, thermistors are installed at the outlet of each GPU heater of each TTV cooling loop to log temperature measurements. The amount of heat captured by the coolant in each TTV cooling loop is subsequently estimated based on the flow rates determined from the flow verification server. This methodology enables precise characterization of the thermal performance of high-power liquid-cooled servers, ensuring optimal functionality, energy efficiency, and minimized downtime.
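The heat-capture estimate described above is, in its simplest form, a sensible-heat balance: heat = mass flow × specific heat × temperature rise. The sketch below shows that arithmetic only. The abstract does not state the coolant or its properties, so the water-like density and specific heat defaults, the function name `heat_captured_w`, and the example readings are assumptions for illustration.

```python
def heat_captured_w(flow_lpm, t_in_c, t_out_c, density=1000.0, cp=4180.0):
    """Estimate the heat picked up by the coolant in one cooling loop, in watts.

    flow_lpm: volumetric flow rate in litres per minute
    t_in_c, t_out_c: coolant temperatures before and after the heat source (deg C)
    density (kg/m^3) and cp (J/(kg*K)) are assumed, water-like values.
    """
    # 1 lpm = 1e-3 m^3 per 60 s, so divide by 60000 to get m^3/s.
    mass_flow_kg_s = density * flow_lpm / 60000.0
    return mass_flow_kg_s * cp * (t_out_c - t_in_c)


# Hypothetical readings: 1.5 lpm with a 10 K rise across the loop.
print(heat_captured_w(1.5, 30.0, 40.0))
```

With these assumed values the mass flow is 0.025 kg/s and the estimate is about 1045 W. In practice the flow rate would come from the flow verification server and the temperatures from the thermal verification server, as the abstract describes.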
- Award ID(s): 2209751
- PAR ID: 10537462
- Publisher / Repository: American Society of Mechanical Engineers
- Date Published:
- Journal Name: Heat Transfer Research
- Volume: 55
- Issue: 7
- ISSN: 1064-2285
- Page Range / eLocation ID: 39 to 56
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract In today's world, most data centers have multiple racks with numerous servers in each of them. The high amount of heat dissipation has become the largest server-level cooling problem for data centers. The higher the heat dissipation required, the higher the total energy required to run the data center. Although still the most widely used cooling methodology, air cooling has reached the limit of its cooling capability, especially for High-Performance Computing data centers. Liquid-cooled servers have several advantages over their air-cooled counterparts, chief among which are high thermal mass and lower maintenance. Nanofluids have been used in the past to improve the thermal efficiency of traditional dielectric coolants in the power electronics and automotive industries. Nanofluids have shown great promise in improving the convective heat transfer properties of coolants due to a proven increase in thermal conductivity and specific heat capacity. The present research investigates the thermal enhancement of de-ionized water-based dielectric coolant with copper nanoparticles for higher heat transfer from server cold plates. Detailed 3-D modeling of a commercial cold plate is completed, and the CFD analysis is done in the commercially available CFD code ANSYS CFX. The obtained results compare the improvement in heat transfer due to improved coolant properties with data available in the literature.
Abstract Demand is growing for dense, high-performing IT computing capacity to support artificial intelligence, deep learning, machine learning, autonomous cars, the Internet of Things, etc. This has led to unprecedented growth in transistor density for high-end CPUs and GPUs, with thermal design power (TDP) exceeding 700 watts for some existing NVIDIA GPUs. Cooling these high-TDP chips with air comes at the cost of a larger server form factor and server fan noise close to the permissible limit. Direct-to-chip cold plate-based liquid cooling is highly efficient and is becoming more reliable as the technology advances. Several components are used in liquid-cooled data centers to deploy cold plate-based direct-to-chip liquid cooling, such as cooling loops, rack manifolds, CDUs, row manifolds, quick disconnects, and flow control valves. Row manifolds distribute secondary coolant to the rack manifolds. Characterizing these row manifolds to understand their pressure drops and flow distribution is important for better data center design and energy efficiency. In this paper, a methodology is developed to characterize row manifolds. A water-based coolant, 25% propylene glycol, was used for the experiments, which were conducted at a coolant supply temperature of 21 °C. P-Q curves were generated for two six-port row manifolds, and the supply pressure and flow rate were measured at each port. The experimental results were validated using flow network modeling (FNM). The FNM technique uses the overall flow and thermal characteristics to represent the behavior of individual components.
Abstract Due to the increasing computational demand driven by artificial intelligence, machine learning, and the Internet of Things (IoT), there has been unprecedented growth in transistor density for high-end CPUs and GPUs. This growth has resulted in high thermal design power (TDP) and high heat flux, necessitating the adoption of advanced cooling technologies to minimize thermal resistance and optimize cooling efficiency. Among these technologies, direct-to-chip cold plate-based liquid cooling has emerged as a preferred choice in electronics cooling due to its efficiency and cost-effectiveness. In this context, different types of single-phase liquid coolants, such as propylene glycol (PG), ethylene glycol (EG), DI water, treated water, and nanofluids, are available on the market. These coolants, manufactured by different companies, incorporate various inhibitors and chemicals to enhance long-term performance, prevent biogrowth, and provide corrosion resistance. However, the additives used in these coolants can affect their thermal performance, even when the base coolant is the same. This paper aims to compare these coolant types and evaluate the performance of the same coolant from different vendors. The selection of coolants in this study is based on their performance, compatibility with wetted materials, reliability during extended operation, and environmental impact, following the guidelines set by ASHRAE. To conduct the experiments, a single cold plate-based benchtop setup was constructed, comprising a thermal test vehicle (TTV), pump, reservoir, flow sensor, pressure sensors, thermocouple, data acquisition units, and heat exchanger. Each coolant was tested using a dedicated cold plate, and thorough cleaning procedures were carried out before each experiment.
The experiments were conducted under consistent boundary conditions, with a TTV power of 1000 watts, coolant flow rates ranging from 0.5 lpm to 2 lpm, and supply coolant temperatures of 17°C, 25°C, 35°C, and 45°C, simulating warm water cooling. Thermal resistance (Rth) versus flow rate and pressure drop (ΔP) versus flow rate graphs were obtained for each coolant, and the impact of different supply coolant temperatures on pressure drop was characterized. The data collected from this study will be used to calculate the Total Cost of Ownership (TCO) in future research, providing insights into the impact of coolant selection at the data center level. Limited research is available on the reliability of coolants used in direct-to-chip liquid cooling, and there is currently no standardized methodology for testing their reliability. This study aims to fill this gap by focusing on the reliability of coolants, specifically propylene glycol at a concentration of 25%. To analyze the effectiveness of corrosion inhibitors in these coolants, the ASTM standard D1384 apparatus, typically used for testing engine coolant corrosion inhibitors on metal samples in controlled laboratory settings, was employed. The setup involved immersing samples of wetted materials (copper, solder-coated brass, brass, steel, cast iron, and cast aluminum) in separate jars containing inhibited propylene glycol solutions from different vendors. This test will determine the difference in reliability between the same inhibited solutions from different vendors.
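The abstract does not define how Rth was computed. A commonly used definition for cold plate testing is the temperature difference between the heated surface and the coolant inlet, divided by the dissipated power. The sketch below assumes that definition. The function name `thermal_resistance` and the case temperature are hypothetical, while the 1000 W power and 25°C supply temperature are taken from the test conditions listed above.

```python
def thermal_resistance(t_case_c, t_inlet_c, power_w):
    """Return thermal resistance in deg C per watt.

    Assumed definition: (case temperature - coolant inlet temperature) / power.
    """
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (t_case_c - t_inlet_c) / power_w


# Hypothetical case temperature of 60 deg C at 1000 W and 25 deg C supply.
rth = thermal_resistance(60.0, 25.0, 1000.0)
print(rth)
```

Repeating this calculation at each tested flow rate would give one point per flow rate on an Rth versus flow rate curve.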
Abstract Until recently, transistor density followed Moore's law, doubling every generation and resulting in increased power density. With the breakdown of Moore's law, computational performance gains were achieved by using multicore processors, leading to nonuniform power distribution and localized high temperatures that make thermal management even more challenging. Cold plate-based liquid cooling has proven to be one of the most efficient technologies for overcoming these thermal management issues. Traditional liquid-cooled data center deployments provide a constant flow rate to servers irrespective of the workload, leading to excessive consumption of coolant pumping power. Therefore, a further improvement in the efficiency of liquid cooling in data centers is possible. The present investigation proposes the implementation of dynamic cooling using an active flow control device to regulate the coolant flow rates at the server level. This device can aid in pumping power savings by controlling the flow rates based on server utilization. The flow control device design contains a V-cut ball valve connected to a microservo motor used for varying the valve angle. The valve position was varied by servomotor actuation to predetermined rotational angles to change the flow rate through the valve. The device operation was characterized by quantifying the flow rates and pressure drop across the device at different valve positions using both computational fluid dynamics and experiments. The proposed flow control device was able to vary the flow rate between 0.09 lpm and 4 lpm at different valve positions.

