

Title: Experimental Analysis of Chiller Cooling Failure in a Small Size Data Center Environment Using Wireless Instrumentation
Given the vital role of data center availability, and since the inlet temperature of IT equipment rises rapidly during a cooling system failure until it reaches a threshold beyond which the equipment throttles or shuts down due to overheating, it is especially important to understand such failures and their effects. This study presents an experimental investigation and analysis of a facility-level cooling system failure scenario in which the chilled water supply to the data center was interrupted. Quantitative instrumentation, including wireless temperature and pressure sensors, was used to measure the discrete air inlet temperatures and the pressure differential across the cold aisle enclosure, respectively. In addition, Intelligent Platform Management Interface (IPMI) data and cooling system data recorded during failure and recovery are reported. Furthermore, the IT equipment performance and response in open and contained environments were simulated and compared. Finally, an experiment-based analysis of the Ride Through Time (RTT) of servers during the chilled water interruption is presented. The results showed that for all three classes of servers tested during the cooling failure, cold aisle containment (CAC) kept the servers cooler for longer. The containment provided a barrier between the hot and cold air streams and caused a slight negative pressure to build up, which allowed the servers to draw cold air from the underfloor plenum. The results also show that the effect of CAC on IT equipment performance and response can vary, depending on the servers' airflow and generation and hence on the types of servers deployed in the cold aisle enclosure. Moreover, compared to the discrete sensors, the IPMI inlet temperature sensors underestimated the RTT by 42% and 12% for the CAC and open cases, respectively.
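The abstract compares RTT derived from IPMI inlet sensors against RTT derived from discrete sensors. A minimal sketch of that comparison is given below, assuming RTT is taken as the time from failure onset until a sensor's inlet temperature first reaches a chosen threshold, and assuming "underestimate by X%" means the shortfall relative to the discrete-sensor RTT. The function names, the 35 °C threshold, and the sample traces are all hypothetical illustrations, not the study's method or data.

```python
from typing import Optional, Sequence


def ride_through_time(
    times_s: Sequence[float],
    inlet_c: Sequence[float],
    threshold_c: float,
    failure_s: float = 0.0,
) -> Optional[float]:
    """Seconds from failure onset until the inlet temperature first reaches
    threshold_c, linearly interpolated between samples.

    Returns None if the threshold is never reached within the record.
    """
    if len(times_s) != len(inlet_c):
        raise ValueError("times_s and inlet_c must have the same length")
    prev_t: Optional[float] = None
    prev_temp: Optional[float] = None
    for t, temp in zip(times_s, inlet_c):
        if t < failure_s:
            continue  # ignore samples recorded before the cooling failure
        if temp >= threshold_c:
            if prev_t is None or prev_temp is None:
                # First post-failure sample is already at or above threshold.
                return t - failure_s
            # Interpolate between the last sample below threshold and this one
            # (prev_temp < threshold_c <= temp, so the denominator is positive).
            frac = (threshold_c - prev_temp) / (temp - prev_temp)
            return prev_t + frac * (t - prev_t) - failure_s
        prev_t, prev_temp = t, temp
    return None


def rtt_underestimate_pct(rtt_reference_s: float, rtt_other_s: float) -> float:
    """Percent by which rtt_other_s falls short of rtt_reference_s.

    Assumed definition: shortfall relative to the discrete-sensor RTT.
    """
    return 100.0 * (rtt_reference_s - rtt_other_s) / rtt_reference_s


if __name__ == "__main__":
    # Synthetic traces sampled once per minute (NOT data from the study).
    times = [60.0 * i for i in range(16)]
    discrete = [24.0 + i for i in range(16)]    # external sensor, deg C
    ipmi = [26.0 + 1.5 * i for i in range(16)]  # IPMI sensor reading high, deg C
    threshold = 35.0  # hypothetical throttling/shutdown threshold, deg C

    rtt_discrete = ride_through_time(times, discrete, threshold)
    rtt_ipmi = ride_through_time(times, ipmi, threshold)
    if rtt_discrete is not None and rtt_ipmi is not None:
        pct = rtt_underestimate_pct(rtt_discrete, rtt_ipmi)
        print(f"RTT discrete={rtt_discrete:.0f} s, IPMI={rtt_ipmi:.0f} s, "
              f"IPMI underestimates by {pct:.1f}%")
```

With these made-up traces the IPMI-based RTT is 360 s against 660 s for the discrete sensor, a shortfall of roughly 45%. Linear interpolation is used so that the result does not depend too strongly on the sampling interval; a real analysis would also need to handle sensor noise.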
Award ID(s):
1738793
NSF-PAR ID:
10094472
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ASME 2018 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems
Page Range / eLocation ID:
V001T02A006
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. During the lifespan of a data center, power outages and cooling blower failures are common occurrences. Given that data centers play a vital role in modern life, it is especially important to understand these failures and their effects. A previous study [16] showed that cold aisle containment might have a negative impact on IT equipment uptime during a blower failure. This new study further analyzes the impact of containment on IT equipment uptime during a CRAH blower failure. It also compares IT equipment performance with and without a pressure relief mechanism implemented in the containment system. The results show that the effect of implementing pressure relief in a containment solution on IT equipment performance and response can vary, depending on the servers' airflow and generation and hence on the types of servers deployed in the cold aisle enclosure. The results also show that, compared to the discrete sensors, the IPMI inlet temperature sensors underestimate the Ride Through Time (RTT) by 32%. This means that RTT calculations based on the IPMI inlet sensors, as they exist today in these servers, may be inaccurate due to variations in the sensor readings, as discussed in a previous study [26]. Additionally, it was shown that all Dell PowerEdge 2950 servers have similar IPMI inlet temperature readings, regardless of mounting location. As external system resistance increases during a cooling failure, these servers exhibit internal recirculation through their weaker power supply fans, which is reflected in high IPMI inlet temperature readings. For this server specifically, a pressure relief mechanism reduces the external resistance, thereby eliminating internal recirculation and resulting in lower IPMI inlet temperature readings, which in turn translate to a longer IPMI-based RTT.
However, the discrete sensors showed conflicting results: they recorded an increase in inlet temperature when pressure relief was introduced, thereby reducing the RTT. The CPU temperatures agreed with the discrete sensor data, indicating that containment helped increase the RTT of the servers during failure.
  2. There are various designs for segregating hot and cold air in data centers, such as cold aisle containment (CAC), hot aisle containment (HAC), and chimney exhaust racks. These containment systems have different characteristics and impose various conditions on the information technology equipment (ITE). One common issue in HAC systems is pressure buildup inside the HAC, known as backpressure. Backpressure can also occur in CAC systems in the case of airflow imbalances. Hot air recirculation, limited cooling airflow through servers, and reversed flow through ITE with weaker fan systems (e.g., network switches) are some known consequences of backpressure. Currently, there is a lack of experimental data on the interdependency between the overall performance of ITE and its internal design when backpressure is imposed on it. In this paper, three commercial 2-rack unit (RU) servers with different internal designs, from various generations and performance levels, are tested and analyzed under various environmental conditions. Smoke tests and thermal imaging are used to study the airflow patterns inside the tested equipment. In addition, the impact of hot air leaking into ITE on its fan speed and power consumption is studied. Furthermore, the cause of the discrepancy between inlet temperatures measured by the internal intelligent platform management interface (IPMI) sensors and by external sensors is investigated. It is found that the arrangement of fans, the segregation of space upstream and downstream of the fans, leakage paths, the location of the baseboard management controller (BMC) sensors, and the presence of backpressure can have a significant impact on ITE power and cooling efficiency.
  3. In recent years, various airflow containment systems have been deployed in data centers to improve cooling efficiency by minimizing the mixing of hot and cold air streams. The goal of this study is the experimental investigation of passive and active hot aisle containment (HAC) systems, including the dynamic interaction between HAC and information technology equipment (ITE). In addition, various provisioning levels of HAC are studied. In this study, a chimney exhaust rack (CER) is considered as the HAC system. The rack is populated by 22 commercial 2-RU servers and one network switch. Four scenarios with and without cold and hot aisle containment are investigated and compared. The transient pressure buildup inside the rack, the servers' fan speeds, inlet air temperatures (IAT), IT power consumption, and CPU temperatures are monitored and recorded. In addition, the IAT of selected servers is measured using external temperature sensors and compared with data available via the Intelligent Platform Management Interface (IPMI). To the best of the authors' knowledge, this is the first experimental study in which a HAC system is analyzed using commercial ITE in a white space. It is observed that the presence of backpressure can lead to falsely high IPMI IAT readings. Consequently, a cascading rise in the servers' fan speeds is observed, which increases the backpressure and worsens the situation. As a result, the thermal performance of the ITE and the power consumption of the rack are affected. Furthermore, it is shown that backpressure can affect the accuracy of common data center efficiency metrics.
  4. Most thermal management technologies concentrate on managing airflow to achieve the desired server inlet temperature (supply air operating set point), rather than on managing or improving the amount of cool air (CFM) that each computer rack (i.e., its IT servers) should receive to remove the heat produced. However, airflow is equally important for quantifying adequate cooling of IT equipment, and a uniform airflow distribution at the inlets of computer racks is more challenging to obtain. Therefore, one potential option for improving airflow distribution is to eliminate sources of non-uniformity, such as maldistribution of the under-floor plenum pressure field caused by vortices. Whereas numerous researchers have focused on the adverse effects of under-floor blockages, this study numerically investigates the positive impact of selectively placed obstructions (intentional air directors), referred to as partitions. The underfloor plenum pressure field, perforated tile airflow rates, and rack inlet temperatures with and without partitions are analyzed quantitatively and qualitatively using two Computational Fluid Dynamics (CFD) models built with the Future Facilities 6SigmaRoom CFD tool. First, a simple data center model was used to quantify the benefits of partitions for two different systems: Hot Aisle Containment (HAC) and an open configuration. Second, the investigation was expanded using a physics-based, experimentally validated CFD model of a medium-size data center with more complicated geometry to compare different types of proposed partitions. Results from both models showed that partition type I (partition height of 2/3 of the plenum depth, measured from the subfloor) eliminates vortices in the under-floor plenum and hence yields a more uniform pressure differential across the perforated tiles, which drives more uniform airflow rates.
In addition, the influence of the proposed partitions on rack inlet temperature was reported through a comparison of the open and hot aisle containment configurations. The results showed that the partitions have a minor effect on rack inlet temperature in the hot aisle containment system, although they significantly improve the tile flow rates. For the open system, on the other hand, the partitions improved both the tile airflow rates and the rack inlet temperatures, and hence eliminated hot spot formation at the computer rack inlets.
  5. The dynamic nature of today's data centers requires active monitoring and holistic management of all aspects of the facility, from the applications to the air conditioning. The most significant aspect of implementing a dynamic data center is the requirement to actively monitor and manage the infrastructure assets. It is vital to ensure that information technology (IT) equipment has access to sufficient air (is adequately provisioned) at a proper temperature to assure optimal and continuous operation. Hot air recirculation, elevated fan speeds, and hot spots are known consequences of an under-provisioned cold aisle. On the other hand, over-provisioning a cold aisle can lead to significant energy loss due to bypass of cooling air and leakage. In addition, load balancers may vary the number of active servers in an aisle in response to short- or long-term IT load changes. This demonstrates the need for an active airflow management scheme that can respond to airflow demand in different aisles of a data center. In this study, remotely controllable air dampers are implemented to regulate airflow delivery to a cold aisle containment (CAC) during workload changes in a data center. The energy saving opportunities are investigated and practical considerations are discussed.
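Item 4 above attributes more uniform tile airflow to a more uniform pressure differential across the perforated tiles. The sketch below illustrates that link with the common first-order orifice relation Q = Cd · A_open · sqrt(2 ΔP / ρ), under which tile flow scales with the square root of the pressure differential. This is a swapped-in textbook approximation, not the CFD model used in the study, and the tile size, open-area fraction, discharge coefficient, and pressure fields are all hypothetical values chosen for illustration.

```python
import math
import statistics
from typing import List, Sequence

# Hypothetical parameters (NOT from the study): a 0.6 m x 0.6 m tile with
# 25% open area, an assumed discharge coefficient, and air near 20 deg C.
RHO_AIR = 1.2            # kg/m^3
TILE_AREA = 0.6 * 0.6    # m^2
OPEN_FRACTION = 0.25
DISCHARGE_COEFF = 0.6    # assumed; real tiles require calibration


def tile_flow_m3s(
    dp_pa: float,
    cd: float = DISCHARGE_COEFF,
    open_area_m2: float = TILE_AREA * OPEN_FRACTION,
    rho: float = RHO_AIR,
) -> float:
    """Volumetric flow through one perforated tile, Q = Cd*A*sqrt(2*dP/rho).

    A negative dP (room pressure above plenum pressure) yields a negative
    flow, i.e. air drawn back into the plenum under this simple model.
    """
    magnitude = cd * open_area_m2 * math.sqrt(2.0 * abs(dp_pa) / rho)
    return math.copysign(magnitude, dp_pa)


def flow_uniformity_cv(dp_field_pa: Sequence[float]) -> float:
    """Coefficient of variation (population std / mean) of tile flows.

    0.0 means perfectly uniform; raises ZeroDivisionError if mean flow is 0.
    """
    flows: List[float] = [tile_flow_m3s(dp) for dp in dp_field_pa]
    return statistics.pstdev(flows) / statistics.fmean(flows)


if __name__ == "__main__":
    nonuniform = [5.0, 10.0, 20.0, 25.0]  # made-up plenum pressure field, Pa
    uniform = [15.0, 15.0, 15.0, 15.0]    # same mean, evenly distributed
    print(f"CV non-uniform field: {flow_uniformity_cv(nonuniform):.3f}")
    print(f"CV uniform field:     {flow_uniformity_cv(uniform):.3f}")
```

Because flow grows only with the square root of ΔP, a fourfold pressure difference between two tiles still produces a twofold flow difference. The uniform field gives a coefficient of variation of zero, which is the qualitative effect the partitions are reported to move toward.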