skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Interactive Code Adaptation Tool for Modernizing Applications for Intel Knights Landing Processors
The process of code adaptation to take advantage of the latest innovations in a supercomputing platform begins with learning about the details of the platform's underlying hardware. It can be challenging for many users to spend time and effort in developing an understanding of the innovative features in a supercomputing platform - such as deep memory hierarchies - and to harness their maximum possible performance by manually modernizing their applications. To mitigate the aforementioned challenge, we are developing an Interactive Code Adaptation Tool (ICAT). In its current form, ICAT can assist the users in modifying, compiling, and optimally running their applications on the latest HPC platforms that are equipped with the Intel Knights Landing (KNL) processors. ICAT detects a given application's characteristics such as memory usage pattern, type of memory allocation, and execution time. Depending upon the application's characteristics, it advises the user on optimal ways to take advantage of the KNL processor and its memory-hierarchy.  more » « less
Award ID(s):
1642396
PAR ID:
10036417
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the Practice and Experience in Advanced Research Computing 2017 (PEARC17) on Sustainability, Success and Impact
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Python's ease of use and rich collection of numeric libraries make it an excellent choice for rapidly developing scientific applications. However, composing these libraries to take advantage of complex heterogeneous nodes is still difficult. To simplify writing multi-device code, we created Parla, a heterogeneous task-based programming framework that fully supports Python's scientific programming stack. Parla's API is based on Python decorators and allows users to wrap code in Parla tasks for parallel execution. Parla arrays enable automatic movement of data between devices. The Parla runtime handles resource-aware mapping, scheduling, and execution of tasks. Compared to other Python tasking systems, Parla is unique in its parallelization of tasks within a single process, its GPU context and resource-aware runtime, and its design around gradual adoption to provide easy migration of and integration into existing Python applications. We show that Parla can achieve performance competitive with hand-optimized code while improving ease of development. 
    more » « less
  2. The Intel Knight Landing (KNL) manycore chip includes 3D-stacked memory named MCDRAM, also known as High Bandwidth Memory (HBM) for parallel applications that needs to scale to high thread count. In this paper, we provide a quantitative study of the KNL for HPC proxy applications including Lulesh, HPCG, AMG, and Hotspot when using DDR4 and MCDRAM. The results indicate that HBM significantly improves the performance of memory intensive applications for as many as three times better than DDR4 in HPCG, and Lulesh and HPCG for as many as 40% and 200%. For the selected compute intensive applications, the performance advantage of MCDRAM over DDR4 varies from 2% to 28%. We also observed that the cross-points, where MCDRAM starts outperforming DDR4, are around 8 to 16 threads. 
    more » « less
  3. null (Ed.)
    The global pandemic of COVID-19 has turned the spotlight on video conferencing applications like never before. In this critical time, applications such as Zoom have experienced a surge in its user base jump over the 300 million daily mark (ZoomBlog, 2020). The increase in use has led malicious actors to exploit the application, and in many cases perform Zoom Bombings. Therefore forensically examining Zoom is inevitable. Our work details the primary disk, network, and memory forensic analysis of the Zoom video conferencing application. Results demonstrate it is possible to find users' critical information in plain text and/or encrypted/encoded, such as chat messages, names, email addresses, passwords, and much more through network captures, forensic imaging of digital devices, and memory forensics. Furthermore we elaborate on interesting anti-forensics techniques employed by the Zoom application when contacts are deleted from the Zoom application's contact list. 
    more » « less
  4. Abstract The Generic Mapping Tools (GMT) software is ubiquitous in the Earth and ocean sciences. As a cross‐platform tool producing high‐quality maps and figures, it is used by tens of thousands of scientists around the world. The basic syntax of GMT scripts has evolved very slowly since the 1990s, despite the fact that GMT is generally perceived to have a steep learning curve with many pitfalls for beginners and experienced users alike. Reducing these pitfalls means changing the interface, which would break compatibility with thousands of existing scripts. With the latest GMT version 6, we solve this conundrum by introducing a new “modern mode” to complement the interface used in previous versions, which GMT 6 now calls “classic mode.” GMT 6 defaults to classic mode and thus is a recommended upgrade for all GMT 5 users. Nonetheless, new users should take advantage of modern mode to make shorter scripts, quickly access commonly used global data sets, and take full advantage of the new tools to draw subplots, place insets, and create animations. 
    more » « less
  5. To deliver scalable performance to large-scale scientific and data analytic applications, HPC cluster architectures adopt the distributed-memory model. The performance and scalability of parallel applications on such systems are limited by the communication cost across compute nodes. Therefore, projecting the minimum communication cost and maximum scalability of the user applications plays a critical role in assessing the benefits of porting these applications to HPC clusters as well as developing efficient distributed-memory implementations. Unfortunately, this task is extremely challenging for end users, as it requires comprehensive knowledge of the target application and hardware architecture and demands significant effort and time for manual system analysis. To streamline the process of porting user applications to HPC clusters, this paper presents CommAnalyzer, an automated framework for estimating the communication cost on distributed-memory models from sequential code. CommAnalyzer uses novel dynamic program analyses and graph algorithms to capture the inherent flow of program values (information) in sequential code to estimate the communication when this code is ported to HPC clusters. Therefore, CommAnalyzer makes it possible to project the efficiency/scalability upper-bound (i.e., Roofline) of the effective distributed-memory implementation before even developing one. The experiments with real-world, regular and irregular HPC applications demonstrate the utility of CommAnalyzer in estimating the minimum communication of sequential applications on HPC clusters. In addition, the optimized MPI+X implementations achieve more than 92% of the efficiency upper-bound across the different workloads. 
    more » « less