Title: Experimental Analysis of Chiller Cooling Failure in a Small Size Data Center Environment Using Wireless Instrumentation
Data center availability plays a vital role, and during a cooling system failure the inlet temperature of the IT equipment increases rapidly until it reaches a threshold value, after which the equipment starts throttling or shuts down because of overheating. Hence, it is especially important to understand failures and their effects. This study presents an experimental investigation and analysis of a facility-level cooling system failure scenario in which a chilled water interruption was introduced to the data center. Quantitative instrumentation tools, including wireless temperature and pressure sensors, were used to measure the discrete air inlet temperature and the pressure differential through the cold aisle enclosure, respectively. In addition, Intelligent Platform Management Interface (IPMI) and cooling system data during failure and recovery are reported. Furthermore, the IT equipment performance and response for open and contained environments were simulated and compared. Finally, an experiment-based analysis of the Ride Through Time (RTT) of servers during chilled water interruption of the cooling infrastructure is presented. The results showed that for all three classes of servers tested during the cooling failure, cold aisle containment (CAC) helped keep the servers cooler for longer. The containment provided a barrier between the hot and cold air streams and caused a slight negative pressure to build up, which allowed the servers to pull cold air from the underfloor plenum. In addition, the results show that the effect of CAC on IT equipment performance and response can vary, depending on the servers' airflow and generation, and hence on the types of servers deployed in the cold aisle enclosure. Moreover, compared to the discrete sensors, the IPMI inlet temperature sensors underestimated the RTT by 42% and 12% for the CAC and open cases, respectively.
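The RTT comparison in the abstract can be illustrated with a short sketch. It assumes RTT is the elapsed time from the start of the failure until the inlet temperature first reaches a threshold. The abstract states neither the exact definition nor the threshold, so the function names, the threshold, and the sample readings below are hypothetical.

```python
from typing import Optional, Sequence


def ride_through_time(times_s: Sequence[float],
                      temps_c: Sequence[float],
                      threshold_c: float) -> Optional[float]:
    """Elapsed time from the first sample until the temperature first
    reaches threshold_c (assumed RTT definition, not taken from the paper).

    Uses linear interpolation between the two samples that bracket the
    crossing. Returns None if the threshold is never reached.
    """
    t0 = times_s[0]
    for i, (t, temp) in enumerate(zip(times_s, temps_c)):
        if temp >= threshold_c:
            if i == 0:
                return 0.0
            t_prev, temp_prev = times_s[i - 1], temps_c[i - 1]
            # temp_prev < threshold_c <= temp, so the denominator is positive.
            frac = (threshold_c - temp_prev) / (temp - temp_prev)
            return t_prev + frac * (t - t_prev) - t0
    return None


def underestimation_pct(rtt_reference: float, rtt_estimate: float) -> float:
    """Percentage by which rtt_estimate falls short of rtt_reference."""
    return 100.0 * (rtt_reference - rtt_estimate) / rtt_reference


# Hypothetical readings (not measured data): seconds since the chilled
# water interruption and inlet temperatures in degrees Celsius.
times = [0, 60, 120, 180, 240]
discrete = [20, 24, 28, 32, 36]   # external discrete sensor
ipmi = [24, 28, 32, 36, 40]       # IPMI inlet sensor, reading warmer

rtt_discrete = ride_through_time(times, discrete, threshold_c=30)
rtt_ipmi = ride_through_time(times, ipmi, threshold_c=30)
print(rtt_discrete, rtt_ipmi, underestimation_pct(rtt_discrete, rtt_ipmi))
```

Under this definition, a sensor that reads warmer than the air actually entering the server crosses the threshold sooner and therefore reports a shorter RTT.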
Award ID(s):
1738793
PAR ID:
10094472
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ASME 2018 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems
Page Range / eLocation ID:
V001T02A006
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. During the lifespan of a data center, power outages and blower cooling failures are common occurrences. Given that data centers have a vital role in modern life, it is especially important to understand these failures and their effects. A previous study [16] showed that cold aisle containment might have a negative impact on IT equipment uptime during a blower failure. This new study further analyzed the impact of containment on IT equipment uptime during a CRAH blower failure. It also compared the IT equipment performance both with and without a pressure relief mechanism implemented in the containment system. The results show that the effect of implementing pressure relief in a containment solution on IT equipment performance and response can vary, depending on the servers' airflow and generation, and hence on the types of servers deployed in the cold aisle enclosure. The results also showed that, compared to the discrete sensors, the IPMI inlet temperature sensors underestimate the Ride Through Time (RTT) by 32%. This means that RTT calculations based on the IPMI inlet sensors, as they exist today in these servers, may be inaccurate due to variations in the sensor readings, as discussed in a previous study [26]. Additionally, it was shown that all Dell PowerEdge 2950 servers have a similar IPMI inlet temperature reading, regardless of mounting location. As external system resistance increases during cooling failure, the servers exhibit internal recirculation through their weaker power supply fans, which is reflected in the high IPMI inlet temperature readings. For this server specifically, a pressure relief mechanism reduces the external resistance, thereby eliminating internal recirculation and resulting in lower IPMI inlet temperature readings. This in turn translates to a lower RTT. However, pressure relief produced conflicting results: the discrete sensors showed an increase in inlet temperature when pressure relief was introduced, thereby reducing the RTT. The CPU temperatures conformed with the discrete sensor data, indicating that containment helped increase the RTT of the servers during failure.
  2. There are various designs for segregating hot and cold air in data centers, such as cold aisle containment (CAC), hot aisle containment (HAC), and the chimney exhaust rack. These containment systems have different characteristics and impose various conditions on the information technology equipment (ITE). One common issue in HAC systems is the pressure buildup inside the HAC (known as backpressure). Backpressure can also be present in CAC systems in case of airflow imbalances. Hot air recirculation, a limited cooling airflow rate in servers, and reversed flow through ITE with weaker fan systems (e.g. network switches) are some known consequences of backpressure. Currently there is a lack of experimental data on the interdependency between the overall performance of ITE and its internal design when backpressure is imposed on the ITE. In this paper, three commercial 2-rack unit (RU) servers with different internal designs from various generations and performance levels are tested and analyzed under various environmental conditions. Smoke tests and thermal imaging are used to study the airflow patterns inside the tested equipment. In addition, the impact of hot air leakage into ITE on the fan speed and power consumption of the ITE is studied. Furthermore, the cause of the discrepancy between inlet temperatures measured by the internal Intelligent Platform Management Interface (IPMI) and by external sensors is investigated. It is found that the arrangement of fans, the segregation of space upstream and downstream of the fans, leakage paths, the location of the baseboard management controller (BMC) sensors, and the presence of backpressure can have a significant impact on ITE power and cooling efficiency.
  3. In recent years, various airflow containment systems have been deployed in data centers to improve cooling efficiency by minimizing the mixing of hot and cold air streams. The goal of this study is the experimental investigation of passive and active hot aisle containment (HAC) systems. The dynamic interaction between HAC and information technology equipment (ITE) is also investigated. In addition, various provisioning levels of HAC are studied. In this study, a chimney exhaust rack (CER) is considered as the HAC system. The rack is populated by 22 commercial 2-RU servers and one network switch. Four scenarios with and without the presence of cold and hot aisle containments are investigated and compared. The transient pressure build-up inside the rack, the servers' fan speed, inlet air temperatures (IAT), IT power consumption, and CPU temperatures are monitored, and operating data are recorded. In addition, the IAT of selected servers is measured using external temperature sensors and compared with data available via the Intelligent Platform Management Interface (IPMI). To the best of the authors' knowledge, this is the first experimental study in which a HAC system is analyzed using commercial ITE in a white space. It is observed that the presence of backpressure can lead to a false high IPMI IAT reading. Consequently, a cascade rise in the servers' fan speed is observed, which increases the backpressure and worsens the situation. As a result, the thermal performance of the ITE and the power consumption of the rack are affected. Furthermore, it is shown that backpressure can affect the accuracy of common data center efficiency metrics.
  4. The dynamic nature of today’s data centers requires active monitoring and holistic management of all aspects of the facility, from the applications to the air conditioning. The most significant aspect of implementing a dynamic data center is the requirement to actively monitor and manage the infrastructure assets. It is vital to ensure that information technology (IT) equipment has access to sufficient air (provisioned) at a proper temperature to assure its optimal and continuous operation. Hot air recirculation, elevated fan speed, and hot spots are known consequences of an under-provisioned cold aisle. On the other hand, over-provisioning a cold aisle can lead to a significant loss of energy due to bypass of cooling air and leakage. In addition, the number of active servers in an aisle may be varied by load balancers in response to short-term or long-term IT load changes. This demonstrates the need for an active airflow management scheme that can respond to airflow demand in different aisles of a data center. In this study, remotely controllable air dampers are implemented to regulate airflow delivery to a cold aisle containment (CAC) during workload changes in a data center. The energy saving opportunities are investigated and practical considerations are discussed.
  5. The operation of today’s data centers increasingly relies on environmental data collection and analysis to operate the cooling infrastructure as efficiently as possible and to maintain the reliability of IT equipment. This in turn emphasizes the importance of the quality of the data collected and their relevance to the overall operation of the data center. This study presents an experimentally based analysis and comparison of two different approaches to environmental data collection: one using a discrete sensor network, and another using data available from installed IT equipment through the Intelligent Platform Management Interface (IPMI). The comparison considers the quality and relevance of the data collected and investigates their effect on key performance and operational metrics. The results showed a large variation in the server inlet temperatures reported through IPMI. The discrete sensor measurements, on the other hand, were much more reliable, with minimal variation in server inlet temperatures inside the cold aisle. These results highlight the potential difficulty in using IPMI inlet temperature data to evaluate the thermal environment inside the contained cold aisle. The study also examines how common industry methods for cooling efficiency management and control can be affected by the data collection approach. The results showed that using preheated IPMI inlet temperature data can lead to unnecessarily low cooling set points, which in turn reduces the potential cooling energy savings. In one case, using discrete sensor data for control provided 20% more energy savings than using IPMI inlet temperature data.