skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Measuring Network Latency Variation Impacts to High Performance Computing Application Performance
In this paper, we study the impacts of latency variation versus latency mean on application runtime, library performance, and packet delivery. Our contributions include the design and implementation of a network latency injector that is suitable for most QLogic and Mellanox InfiniBand cards. We fit statistical distributions of latency mean and variation to varying levels of network contention for a range of parallel application workloads. We use the statistical distributions to characterize the latency variation impacts to application degradation. The level of application degradation caused by variation in network latency depends on application characteristics, and can be significant. Observed degradation varies from no degradation for applications without communicating processes to 3.5 times slower for communication-intensive parallel applications. We support our results with statistical analysis of our experimental observations. For communication-intensive high performance computing applications, we show statistically significant evidence that changes in performance are more highly correlated with changes of variation in network latency than with changes of mean network latency alone.  more » « less
Award ID(s):
1642542 1405767 1725573 1633608
PAR ID:
10060492
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ICPE '18 Proceedings of the 2018 ACM/SPEC International Conference on Performance Engineering
Page Range / eLocation ID:
68 to 79
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent work has shown that lightweight virtualization like Docker containers can be used in HPC to package applications with their runtime environments. In many respects, applications in containers perform similarly to native applications. Other work has shown that containers can have adverse effects on the latency variation of communications with the enclosed application. This latency variation may have an impact on the performance of some HPC workloads, especially those dependent on synchronization between processes. In this work, we measure the latency characteristics of messages between Docker containers, and then compare those measurements to the performance of real-world applications. Our specific goals are to: measure the changes in mean and variation of latency with Docker containers, study how this affects the synchronization time of MPI processes, and measure the impact of these factors on real­world applications such as the NAS Parallel Benchmark (NPB). 
    more » « less
  2. null (Ed.)
    Performance variation deriving from hardware and software sources is common in modern scientific and data-intensive computing systems, and synchronization in parallel and distributed programs often exacerbates their impacts at scale. The decentralized and emergent effects of such variation are, unfortunately, also difficult to systematically measure, analyze, and predict; modeling assumptions which are stringent enough to make analysis tractable frequently cannot be guaranteed at meaningful application scales, and longitudinal methods at such scales can require the capture and manipulation of impractically large amounts of data. This paper describes a new, scalable, and statistically robust approach for effective modeling, measurement, and analysis of large-scale performance variation in HPC systems. Our approach avoids the need to reason about complex distributions of runtimes among large numbers of individual application processes by focusing instead on the maximum length of distributed workload intervals. We describe this approach and its implementation in MPI which makes it applicable to a diverse set of HPC workloads. We also present evaluations of these techniques for quantifying and predicting performance variation carried out on large-scale computing systems, and discuss the strengths and limitations of the underlying modeling assumptions. 
    more » « less
  3. Heterogeneous chiplets have been proposed for accelerating high-performance computing tasks. Integrated inside one package, CPU and GPU chiplets can share a common interconnection network that can be implemented through the interposer. However, CPU and GPU applications have very different traffic patterns in general. Without effective management of the network resource, some chiplets can suffer significant performance degradation because the network bandwidth is taken away by communication-intensive applications. Therefore, techniques need to be developed to effectively manage the shared network resources. In a chiplet-based system, resource management needs to not only react in real-time but also be cost-efficient. In this work, we propose a reconfigurable network architecture, leveraging Kalman Filter to make accurate predictions on network resources needed by the applications and then adaptively change the resource allocation. Using our design, the network bandwidth can be fairly allocated to avoid starvation or performance degradation. Our evaluation results show that the proposed reconfigurable interconnection network can dynamically react to the changes in traffic demand of the chiplets and improve the system performance with low cost and design complexity. 
    more » « less
  4. Dragonfly class of networks are considered as promising interconnects for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource sharing design. Event-driven network simulators are indispensable tools for navigating complex system design. In this study, we quantitatively evaluate a variety of application communication interactions on a 3,456-node Dragonfly+ system by using the CODES toolkit. This study looks at the impact of communication interference from a user’s perspective. Specifically, for a given application submitted by a user, we examine how this application will behave with the existing workload running in the system under different job placement policies. Our simulation study considers hundreds of experiment configurations including four target applications with representative communication patterns under a variety of network traffic conditions. Our study shows that intra-job interference can cause severe performance degradation for communication-intensive applications. Inter-job interference can generally be reduced for applications with one-toone or one-to-many communication patterns through job isolation. Application with one-to-all communication pattern is resilient to network interference. 
    more » « less
  5. Traditional power reduction techniques such as DVFS or RAPL are challenging to use with web services because they significantly affect the services’ latency and throughput. Previous work sug- gested the use of controllers based on control theory or machine learning to reduce performance degradation under constrained power. However, generating these controllers is challenging as ev- ery web service applications running in a data center requires a power-performance model and a fine-tuned controller. In this paper, we present DDPC, a system for autonomic data-driven controller generation for power-latency management. DDPC automates the process of designing and deploying controllers for dynamic power allocation to manage the power-performance trade-offs for latency- sensitive web applications such as a social network. For each application, DDPC uses system identification techniques to learn an adaptive power-performance model that captures the application’s power-latency trade-offs which is then used to generate and deploy a Proportional-Integral (PI) power controller with gain-scheduling to dynamically manage the power allocation to the server running application using RAPL. We evaluate DDPC with two realistic latency-sensitive web applications under varying load scenarios. Our results show that DDPC is capable of autonomically generating and deploying controllers within a few minutes reducing the ac- tive power allocation of a web-server by more than 50% compared to state-of-the-art techniques while maintaining the latency well below the target of the application. 
    more » « less