Effective cooling is crucial for high-power liquid-cooled servers to ensure optimal performance and reliability of components. Thermal characterization is necessary to ensure that the cooling system functions as intended, is energy efficient, and minimizes downtime. In this study, a methodology for thermal characterization of a high-power liquid-cooled server/TTV (thermal test vehicle; the terms server and TTV are used interchangeably) is proposed. The server layout includes multiple thermal test vehicle setups equipped with direct-to-chip cold plates, with two or more connected in series to form a TTV cooling loop. These cooling loops are connected in parallel to the supply and return plenums of the cooling loop manifold, which includes a chassis-level flow distribution manifold. To obtain accurate measurements, two identical server/TTV prototypes are instrumented with sensors for coolant flow rate and temperature measurements for every TTV cooling loop. Four ultrasonic flow sensors are installed in the flow verification server/TTV to measure the coolant flow rate to each TTV cooling loop. In the thermal verification server, thermistors are installed at the outlet of each GPU heater of each TTV cooling loop to log temperature measurements. The amount of heat captured by the coolant in each TTV cooling loop is subsequently estimated based on the flow rates determined from the flow verification server. This methodology enables precise characterization of the thermal performance of high-power liquid-cooled servers, ensuring optimal functionality, energy efficiency, and minimized downtime.
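The heat-capture estimate described in the abstract above can be illustrated with a simple energy balance, Q = ρ · V̇ · cp · (T_out − T_in), applied between successive temperature readings along one series loop. The following Python sketch is not the authors' code: the function name, the water-like coolant properties, and the sample readings are all illustrative assumptions, and it assumes the flow rate measured on the flow verification unit applies unchanged to the thermal verification unit.

```python
def heat_captured_w(flow_lpm, temps_c, density_kg_m3=997.0, cp_j_per_kg_k=4180.0):
    """Return the heat (W) picked up across each heater in one series loop.

    temps_c lists coolant temperatures along the loop in flow order:
    the loop inlet first, then the outlet of each heater.
    The default properties are approximate values for water near room
    temperature (an assumption; substitute the actual coolant's values).
    """
    if flow_lpm <= 0:
        raise ValueError("flow rate must be positive")
    if len(temps_c) < 2:
        raise ValueError("need an inlet and at least one outlet temperature")

    # Convert litres per minute to kg/s: L/min -> m^3/s -> kg/s.
    mass_flow_kg_s = density_kg_m3 * flow_lpm / 1000.0 / 60.0

    # Energy balance between each pair of consecutive readings.
    return [
        mass_flow_kg_s * cp_j_per_kg_k * (t_out - t_in)
        for t_in, t_out in zip(temps_c, temps_c[1:])
    ]


if __name__ == "__main__":
    # Hypothetical readings: 1 L/min, inlet 30 C, heater outlets 35 C and 40 C.
    per_heater = heat_captured_w(1.0, [30.0, 35.0, 40.0])
    for index, q in enumerate(per_heater, start=1):
        print(f"heater {index}: {q:.1f} W")
    print(f"loop total: {sum(per_heater):.1f} W")
```

Summing the per-heater values gives the heat captured by the whole loop, which can then be compared with the electrical power supplied to the heaters.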
Thermal, Hydraulic and Reliability Analysis of Single-Phase Liquid Coolants for Direct-to-Chip Cold Plate Cooling in High-Performance Computing Systems
Abstract Due to the increasing computational demand driven by artificial intelligence, machine learning, and the Internet of Things (IoT), there has been an unprecedented growth in transistor density for high-end CPUs and GPUs. This growth has resulted in high thermal design power (TDP) and high heat flux, necessitating the adoption of advanced cooling technologies to minimize thermal resistance and optimize cooling efficiency. Among these technologies, direct-to-chip cold plate-based liquid cooling has emerged as a preferred choice in electronics cooling due to its efficiency and cost-effectiveness. In this context, different types of single-phase liquid coolants, such as propylene glycol (PG), ethylene glycol (EG), DI water, treated water, and nanofluids, have been utilized in the market. These coolants, manufactured by different companies, incorporate various inhibitors and chemicals to enhance long-term performance, prevent biogrowth, and provide corrosion resistance. However, the additives used in these coolants can impact their thermal performance, even when the base coolant is the same. This paper aims to compare these coolant types and evaluate the performance of the same coolant from different vendors. The selection of coolants in this study is based on their performance, compatibility with wetted materials, reliability during extended operation, and environmental impact, following the guidelines set by ASHRAE. To conduct the experiments, a single cold plate-based benchtop setup was constructed, utilizing a thermal test vehicle (TTV), pump, reservoir, flow sensor, pressure sensors, thermocouple, data acquisition units, and heat exchanger. Each coolant was tested using a dedicated cold plate, and thorough cleaning procedures were carried out before each experiment.
The experiments were conducted under consistent boundary conditions, with a TTV power of 1000 watts and varying coolant flow rates (ranging from 0.5 lpm to 2 lpm) and supply coolant temperatures (17°C, 25°C, 35°C, and 45°C), simulating warm water cooling. The thermal resistance (Rth) versus flow rate and pressure drop (ΔP) versus flow rate graphs were obtained for each coolant, and the impact of different supply coolant temperatures on pressure drop was characterized. The data collected from this study will be utilized to calculate the Total Cost of Ownership (TCO) in future research, providing insights into the impact of coolant selection at the data center level. There is limited research available on the reliability of coolants used in direct-to-chip liquid cooling, and there is currently no standardized methodology for testing their reliability. This study aims to fill this gap by focusing on the reliability of coolants, specifically propylene glycol at a concentration of 25%. To analyze the effectiveness of corrosion inhibitors in these coolants, the ASTM D1384 apparatus, typically used for testing engine coolant corrosion inhibitors on metal samples in controlled laboratory settings, was employed. The setup involved immersing samples of wetted materials (copper, solder-coated brass, brass, steel, cast iron, and cast aluminum) in separate jars containing inhibited propylene glycol solutions from different vendors. This test will determine the reliability difference between the same inhibited solutions from different vendors.
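The Rth versus flow rate curves mentioned above can be illustrated with a short Python sketch. The abstract does not state how Rth was defined, so the sketch assumes a common convention, Rth = (T_case − T_supply) / P, with the 1000 W TTV power from the text. The function names and all sample readings are hypothetical and are not data from the study.

```python
def thermal_resistance_k_per_w(t_case_c, t_supply_c, power_w):
    """Case-to-coolant thermal resistance in K/W (assumed definition)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return (t_case_c - t_supply_c) / power_w


def rth_curve(measurements, power_w=1000.0):
    """Build (flow_lpm, Rth) points sorted by flow rate.

    measurements is an iterable of (flow_lpm, t_case_c, t_supply_c) tuples.
    """
    return [
        (flow, thermal_resistance_k_per_w(t_case, t_supply, power_w))
        for flow, t_case, t_supply in sorted(measurements)
    ]


if __name__ == "__main__":
    # Hypothetical readings at a 25 C supply temperature.
    samples = [(2.0, 50.0, 25.0), (0.5, 65.0, 25.0), (1.0, 57.0, 25.0)]
    for flow, rth in rth_curve(samples):
        print(f"{flow:.1f} lpm: Rth = {rth:.3f} K/W")
```

Repeating the calculation for each coolant and supply temperature would give one curve per test condition, which is the form of comparison the abstract describes.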
- Award ID(s): 2209751
- PAR ID: 10537461
- Publisher / Repository: American Society of Mechanical Engineers
- Date Published:
- ISBN: 978-0-7918-8751-6
- Format(s): Medium: X
- Location: San Diego, California, USA
- Sponsoring Org: National Science Foundation
More Like this
-
In recent years, electronic packaging has evolved significantly to meet demands for higher performance, lower costs, and smaller designs. This shift has led to heterogeneous packaging, which integrates chips of varying stack heights and results in non-uniform heat flux and temperature distributions. These conditions pose substantial thermal management challenges, as they can create large temperature gradients, which increase thermal stress and potentially compromise chip reliability. This study explores single-phase liquid cooling for multi-chip modules (MCMs) through a comprehensive experimental and machine learning approach. It investigates the impact of chip spacing, height, fluid flow rate, fluid inlet location, and heat flux uniformity on chip temperature and the thermohydraulic performance of a commercial cold plate. Results show that increasing coolant flow from 1 LPM to 2 LPM decreased thermal resistance by 26 %, with heat losses remaining below 5 %. The left inlet configuration improved temperature uniformity compared to the right, though both yielded comparable thermal performance. Adjusting heater spacing impacted temperature distribution based on inlet position, and lowering one heater by 1 mm raised its temperatures by 15 °C due to increased thermal resistance from the thermal interface material. A transient test demonstrated the cold plate's quick response to power surges, with only a 1 °C spike above steady state. Complementing these findings, an Artificial Neural Network (ANN) model was developed with an architecture optimized for the specific challenges of this study. The ANN model was rigorously validated using an independent dataset, achieving highly accurate temperature predictions (R² = 0.99) within 2.5 % of experimental…
-
Abstract Demand is growing for dense, high-performing IT computing capacity to support artificial intelligence, deep learning, machine learning, autonomous cars, the Internet of Things, etc. This has led to an unprecedented growth in transistor density for high-end CPUs and GPUs, creating thermal design power (TDP) of more than 700 watts for some existing NVIDIA GPUs. Cooling these high-TDP chips with air comes at the cost of a larger server form factor and server fan noise close to the permissible limit. Direct-to-chip cold plate-based liquid cooling is highly efficient and is becoming more reliable as the technology advances. Several components are used in liquid-cooled data centers to deploy cold plate-based direct-to-chip liquid cooling, such as cooling loops, rack manifolds, CDUs, row manifolds, quick disconnects, flow control valves, etc. Row manifolds distribute secondary coolant to the rack manifolds. Characterizing these row manifolds to understand the pressure drops and flow distribution is important for better data center design and energy efficiency. In this paper, a methodology is developed to characterize the row manifolds. A water-based coolant, propylene glycol 25%, was used for the experiments, which were conducted at a coolant supply temperature of 21 °C. P-Q curves were generated for two six-port row manifolds, and the supply pressure and flow rate were measured at each port. The results obtained from the experiments were validated by a technique called flow network modeling (FNM). The FNM technique uses the overall flow and thermal characteristics to represent the behavior of individual components.
-
Miniaturization and high heat flux of power electronic devices have posed a colossal challenge for adequate thermal management. Conventional air-cooling solutions are inadequate for high-performance electronics. Liquid cooling is an alternative solution thanks to the higher specific heat and latent heat associated with the coolants. Liquid-cooled cold plates are typically manufactured by approaches such as skiving, forging, extrusion, and electrical discharge machining. When researchers face challenges in creating complex geometries in small spaces, 3D printing can be a solution. In this paper, a 3D-printed cold plate was designed and characterized with water coolant. The printed metal fin structures were strong enough to withstand pressure from the fluid flow even at high flow rates and small fin sizes. A copper block with a top surface area of 1 inch by 1 inch was used to mimic a computer chip. The experimental data matched well with a simulation model built using the commercial software 6SigmaET. Effects of geometry parameters and operating parameters were investigated. Fin diameter was varied from 0.3 mm to 0.5 mm and fin height was maintained at 2 mm. A special manifold was designed to maximize the surface contact area between coolant and metal surface and therefore minimize thermal resistance. The flow rate was varied from 0.75 L/min to 2 L/min and the coolant inlet temperature was varied from 25 °C to 48 °C. It was observed that for a coolant inlet temperature of 25 °C and an aluminum cold plate, the junction temperature was kept below 63.2 °C at an input power of 350 W, and the pressure drop did not exceed 23 kPa. Effects of the metal materials used in 3D printing on the thermal performance of the cold plate were also studied in detail.
-
Miniaturization of microelectronic components comes at the price of high heat flux density. By adopting liquid cooling, the rising demand of high heat flux devices can be met while the reliability of the microelectronic devices can also be improved to a greater extent. Liquid-cooled cold plates are largely replacing air-based heat sinks for electronics in data center applications, thanks to their large heat-carrying capacity. A bench-level study was carried out to characterize the thermohydraulic performance of two microchannel cold plates that use warm DI water for cooling Multi Chip Server Modules (MCM). A laboratory-built mock package housing mock dies and a heat spreader was employed to assess the thermal performance of two different cold plate designs at varying coolant flow rates and temperatures. The case temperatures measured at the heat spreader for varying flow rates and input power were essential in identifying the convective resistance. The flow performance was evaluated by measuring the pressure drop across the cold plate module at varying flow rates. The cold plate with the enhanced microchannel design yielded better results compared to a traditional parallel microchannel design. The study conducted at higher coolant temperatures yielded lower pressure drop values with no apparent change in the thermal behavior using different cold plates. The tests conducted after reversing the flow direction in the microchannels provide insight into the effect of neighboring dies on each other and reveal the importance of package-specific cold plate designs for top performance. The experimental results were validated using a numerical model, which was further optimized for improved geometric designs.

