Power outages and blower cooling failures are common occurrences over the lifespan of a data center. Given the vital role that data centers play in modern life, it is especially important to understand these failures and their effects. A previous study [16] showed that cold aisle containment might have a negative impact on IT equipment uptime during a blower failure. This study further analyzes the impact of containment on IT equipment uptime during a CRAH blower failure. It also compares IT equipment performance with and without a pressure relief mechanism implemented in the containment system. The results show that the effect of pressure relief in a containment solution on IT equipment performance and response can vary, depending on the servers' airflow and generation, and hence on the types of servers deployed in the cold aisle enclosure. The results also show that, compared to the discrete sensors, the IPMI inlet temperature sensors underestimate the Ride Through Time (RTT) by 32%. This means that RTT calculations based on the IPMI inlet sensors, as they exist today in these servers, may be inaccurate due to variations in the sensor readings, as discussed in a previous study [26]. Additionally, it was shown that all Dell PowerEdge 2950 servers have a similar IPMI inlet temperature reading, regardless of mounting location. As external system resistance increases during a cooling failure, the servers exhibit internal recirculation through their weaker power supply fans, which is reflected in the high IPMI inlet temperature readings. For this server specifically, a pressure relief mechanism reduces the external resistance, thereby eliminating internal recirculation and resulting in lower IPMI inlet temperature readings. This in turn translates to a lower RTT. However, pressure relief showed conflicting results: the discrete sensors showed an increase in inlet temperature when pressure relief was introduced, thereby reducing the RTT. The CPU temperatures agreed with the discrete sensor data, indicating that containment helped increase the RTT of the servers during failure.
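To make the RTT comparison above concrete, the following is a minimal sketch of how a ride-through time could be derived from an inlet temperature time series and how the relative underestimation between two sensor sources could be expressed. It is not the authors' procedure. The function names, the 32 °C threshold, the 60 s sampling interval, and all sample values are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def ride_through_time(times_s: Sequence[float],
                      temps_c: Sequence[float],
                      threshold_c: float) -> Optional[float]:
    """Seconds from the first sample until the temperature first reaches
    threshold_c, using linear interpolation between samples.
    Returns None if the threshold is never reached."""
    for i, (t, temp) in enumerate(zip(times_s, temps_c)):
        if temp >= threshold_c:
            if i == 0:
                return 0.0
            # The previous sample is below the threshold, so temp > temp0.
            t0, temp0 = times_s[i - 1], temps_c[i - 1]
            frac = (threshold_c - temp0) / (temp - temp0)
            return (t0 + frac * (t - t0)) - times_s[0]
    return None


def underestimation_pct(rtt_reference_s: float, rtt_other_s: float) -> float:
    """Percentage by which rtt_other_s falls short of rtt_reference_s."""
    return 100.0 * (rtt_reference_s - rtt_other_s) / rtt_reference_s


# Hypothetical readings sampled every 60 s after a cooling failure.
times = [0, 60, 120, 180, 240]
discrete_temps = [20, 24, 28, 32, 36]  # external (discrete) inlet sensor, assumed
ipmi_temps = [23, 29, 35, 41, 47]      # preheated IPMI inlet reading, assumed
THRESHOLD_C = 32.0                     # assumed limit, not taken from the study

rtt_discrete = ride_through_time(times, discrete_temps, THRESHOLD_C)
rtt_ipmi = ride_through_time(times, ipmi_temps, THRESHOLD_C)
shortfall = underestimation_pct(rtt_discrete, rtt_ipmi)
print(rtt_discrete, rtt_ipmi, shortfall)
```

With these made-up numbers the IPMI series crosses the threshold earlier because its readings start higher and rise faster, so the RTT derived from it is shorter than the one derived from the discrete sensor. This is the direction of the discrepancy the abstract reports, though the magnitudes here are arbitrary.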
Comparison and Evaluation of Different Monitoring Methods in a Data Center Environment
The operation of today’s data centers increasingly relies on environmental data collection and analysis to run the cooling infrastructure as efficiently as possible and to maintain the reliability of IT equipment. This in turn emphasizes the importance of the quality of the data collected and their relevance to the overall operation of the data center. This study presents an experimentally based analysis and comparison of two approaches to environmental data collection: one using a discrete sensor network, and another using data available from installed IT equipment through the Intelligent Platform Management Interface (IPMI). The comparison considers the quality and relevance of the data collected and investigates their effect on key performance and operational metrics. The results showed a large variation in the server inlet temperatures reported through IPMI. The discrete sensor measurements, on the other hand, were much more reliable, with minimal variation in server inlet temperatures inside the cold aisle. These results highlight the potential difficulty of using IPMI inlet temperature data to evaluate the thermal environment inside the contained cold aisle. The study also examines how common industry methods for cooling efficiency management and control can be affected by the data collection approach. Results showed that using preheated IPMI inlet temperature data can lead to unnecessarily low cooling set points, which in turn reduces the potential cooling energy savings. In one case, using discrete sensor data for control provided 20% more energy savings than using IPMI inlet temperature data.
- Award ID(s):
- 1738793
- PAR ID:
- 10058022
- Date Published:
- Journal Name:
- ASME 2017 International Technical Conference and Exhibition on Packaging and Integration of Electronic and Photonic Microsystems collocated with the ASME 2017 Conference on Information Storage and Processing Systems
- Page Range / eLocation ID:
- V001T02A019
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Given the vital role of data center availability, and because the inlet temperature of IT equipment rises rapidly during a cooling system failure until it reaches a threshold at which the equipment throttles or shuts down from overheating, it is especially important to understand failures and their effects. This study presents an experimental investigation and analysis of a facility-level cooling system failure scenario in which a chilled water interruption was introduced to the data center. Quantitative instrumentation, including wireless temperature and pressure sensors, was used to measure the discrete air inlet temperature and the pressure differential through the cold aisle enclosure, respectively. In addition, Intelligent Platform Management Interface (IPMI) and cooling system data during failure and recovery are reported. Furthermore, the IT equipment performance and response in open and contained environments were simulated and compared. Finally, an experiment-based analysis of the Ride Through Time (RTT) of servers during chilled water interruption of the cooling infrastructure is presented. The results showed that, for all three classes of servers tested during the cooling failure, cold aisle containment (CAC) helped keep the servers cooler for longer. The containment provided a barrier between the hot and cold air streams and caused slight negative pressure to build up, which allowed the servers to pull cold air from the underfloor plenum. In addition, the results show that the effect of CAC in containment solutions on IT equipment performance and response can vary, depending on the servers' airflow and generation, and hence on the types of servers deployed in the cold aisle enclosure. Moreover, it was shown that, compared to the discrete sensors, the IPMI inlet temperature sensors underestimate the RTT by 42% and 12% for the CAC and open cases, respectively.
-
In recent years, various airflow containment systems have been deployed in data centers to improve cooling efficiency by minimizing the mixing of hot and cold air streams. The goal of this study is the experimental investigation of passive and active hot aisle containment (HAC) systems. The dynamic interaction between HAC and information technology equipment (ITE) is also investigated. In addition, various provisioning levels of HAC are studied. In this study, a chimney exhaust rack (CER) is considered as the HAC system. The rack is populated by 22 commercial 2-RU servers and one network switch. Four scenarios with and without cold and hot aisle containment are investigated and compared. The transient pressure build-up inside the rack, the servers' fan speed, inlet air temperatures (IAT), IT power consumption, and CPU temperatures are monitored, and operating data are recorded. In addition, the IAT of selected servers is measured using external temperature sensors and compared with data available via the Intelligent Platform Management Interface (IPMI). To the best of the authors' knowledge, this is the first experimental study in which a HAC system is analyzed using commercial ITE in a white space. It is observed that the presence of backpressure can lead to a falsely high IPMI IAT reading. Consequently, a cascading rise in the servers' fan speed is observed, which increases the backpressure and worsens the situation. As a result, the thermal performance of the ITE and the power consumption of the rack are affected. Furthermore, it is shown that backpressure can affect the accuracy of common data center efficiency metrics.
-
There are various designs for segregating hot and cold air in data centers, such as cold aisle containment (CAC), hot aisle containment (HAC), and chimney exhaust racks. These containment systems have different characteristics and impose various conditions on the information technology equipment (ITE). One common issue in HAC systems is pressure buildup inside the HAC (known as backpressure). Backpressure can also be present in CAC systems in case of airflow imbalances. Hot air recirculation, limited cooling airflow rate in servers, and reversed flow through ITE with weaker fan systems (e.g. network switches) are some known consequences of backpressure. Currently there is a lack of experimental data on the interdependency between the overall performance of ITE and its internal design when backpressure is imposed on the ITE. In this paper, three commercial 2-rack unit (RU) servers with different internal designs, from various generations and performance levels, are tested and analyzed under various environmental conditions. Smoke tests and thermal imaging are used to study the airflow patterns inside the tested equipment. In addition, the impact of hot air leakage into the ITE on its fan speed and power consumption is studied. Furthermore, the cause of the discrepancy between inlet temperatures measured by the internal Intelligent Platform Management Interface (IPMI) and by external sensors is investigated. It is found that the arrangement of fans, the segregation of space upstream and downstream of the fans, leakage paths, the location of the baseboard management controller (BMC) sensors, and the presence of backpressure can have a significant impact on ITE power and cooling efficiency.
-
In typical data centers, the servers and IT equipment are cooled by air, and almost half of total IT power is dedicated to cooling. Hybrid cooling is a combined cooling technology using both air and water, in which the main heat-generating components are cooled by water or water-based coolants and the rest of the components are cooled by air supplied by a CRAC or CRAH. Retrofitting air-cooled servers with cold plates and pumps offers an advantage in the thermal management of CPUs and other high-heat-generating components. In a typical 1U server, the CPUs were retrofitted with cold plates and the server was tested with raised coolant inlet conditions. The study showed that the server can operate at maximum utilization for CPUs, DIMMs, and PCH for inlet coolant temperatures from 25–45 °C, following the ASHRAE guidelines. The server was also tested for pump and fan failure scenarios with reduced numbers of fans and pumps. To reduce cooling power consumption at the facility level and increase air-side economizer hours, the hybrid cooled server can be operated at raised inlet air temperatures. The trade-off between energy savings at the facility level from raising the inlet air temperature and the possible increase in server fan power and component temperatures is investigated. A detailed CFD analysis with a minimum number of server fans can provide a way to find an operating range of inlet air temperature for a hybrid cooled server. Changes to the model are carried out in 6SigmaET for an individual server and compared to the experimental data to validate the model. The results from this study can be helpful in determining the room-level operating set points for data centers housing hybrid cooled server racks.

