Title: Usability and Performance Improvements in Hatchet
Performance analysis is critical for pinpointing bottlenecks in parallel applications. Several profilers exist to instrument parallel programs on HPC systems and gather performance data. Hatchet is an open-source Python library that can read the profiling output of several tools and enables the user to perform a variety of programmatic analyses on hierarchical performance profiles. In this paper, we augment Hatchet to support new features: a query language for representing call path patterns that can be used to filter a calling context tree, visualization support for displaying and interacting with performance profiles, and new operations for performing analyses on multiple datasets. Additionally, we present performance optimizations in Hatchet’s HPCToolkit reader and the unify operation to enable scalable analysis of large datasets.
Award ID(s):
1656958
PAR ID:
10285529
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
2020 IEEE/ACM International Workshop on HPC User Support Tools (HUST) and Workshop on Programming and Performance Visualization Tools (ProTools)
Page Range / eLocation ID:
49 to 58
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We describe our approach to augmenting the BEAGLE library for high-performance statistical phylogenetic inference to support concurrent computation of independent partial likelihoods arrays. Our solution involves identifying independent likelihood estimates in analyses of partitioned datasets and in proposed tree topologies, and configuring concurrent computation of these likelihoods via the CUDA and OpenCL frameworks. We evaluate the effect of each increase in concurrency on throughput performance for our partial likelihoods kernel for a four-state nucleotide substitution model on a variety of parallel computing hardware, such as NVIDIA and AMD GPUs, and Intel multicore CPUs, observing up to 16-fold speedups over our previous implementation. Finally, we evaluate the effect of these gains on a domain application program, MrBayes. For a partitioned nucleotide-model analysis we observe an average speedup for the overall run time of 2.1-fold over our previous parallel implementation, and 10-fold over the native MrBayes with SSE.
  2. Genomic datasets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some of these genomic data analysis problems require large-scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high-end parallel systems today and place different requirements on programming support, software libraries and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or ‘motifs’ that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’. 
  3. Many graph problems can be solved using ordered parallel graph algorithms that achieve significant speedup over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a domain-specific language for writing graph applications, to simplify writing high-performance parallel ordered graph algorithms. The extension enables vertices to be processed in a dynamic order while hiding low-level implementation details from the user. We extend the compiler with new program analyses, transformations, and code generation to produce fast implementations of ordered parallel graph algorithms. We also introduce bucket fusion, a new performance optimization that fuses together different rounds of ordered algorithms to reduce synchronization overhead, resulting in 1.2x to 3x speedup over the fastest existing ordered algorithm implementations on road networks with large diameters. With the extension, GraphIt achieves up to 3x speedup on six ordered graph algorithms over state-of-the-art frameworks and hand-optimized implementations (Julienne, Galois, and GAPBS) that support ordered algorithms.
  4. Analyzing “large p small n” data is becoming increasingly important in a wide range of application fields. As a projection pursuit index, the Penalized Discriminant Analysis (PDA) index, built upon the Linear Discriminant Analysis (LDA) index, is devised in Lee and Cook (2010) to classify high-dimensional data with promising results. Yet, there is little information available about its performance compared with the popular Support Vector Machine (SVM). This paper conducts extensive numerical studies to compare the performance of the PDA index with the LDA index and SVM, demonstrating that the PDA index is robust to outliers and able to handle high-dimensional datasets with extremely small sample sizes, few important variables, and multiple classes. Analyses of several motivating real-world datasets reveal the practical advantages and limitations of individual methods, suggesting that the PDA index provides a useful alternative tool for classifying complex high-dimensional data. These new insights, along with the hands-on implementation of the PDA index functions in the R package classPP, help statisticians and data scientists make effective use of both sets of classification tools.
  5. In the last several decades, public interest in electric vehicles (EVs) and research initiatives for smart AC and DC microgrids have increased substantially. Although EVs can benefit their users, they also present new demand and new business models for a changing power grid. Some of the challenges include stochastic demand profiles from EVs, unplanned load growth from rapid EV adoption, and potential frequency (harmonics) and voltage disturbances due to uncoordinated charging. To properly account for any of these problems, an accurate and validated model for EV distributions in a power grid must be established. This model (or several models) may then be used for economic and technical analyses. This paper supplies insight into the role that EVs play in affecting critical loads in a system, and develops a theoretical model to further support a hardware-in-the-loop (HIL) real-time simulation for modelling and analysis of a distribution feeder with distributed energy resources (DERs) and EVs, based on existing compiled data.