

Title: Multi-Level Analysis of Compiler-Induced Variability and Performance Tradeoffs
Successful HPC software applications are long-lived. When ported across machines and their compilers, these applications often produce different numerical results, many of which are unacceptable. Such variability is also a concern when optimizing the code more aggressively to gain performance. Efficient tools that help locate the program units (files and functions) within which most of the variability occurs are badly needed, both to plan for code ports and to root-cause errors due to variability when they happen in the field. In this work, we offer an enhanced version of the open-source testing framework FLiT to serve these roles. Key new features of FLiT include a suite of bisection algorithms that help locate the root causes of variability. Another added feature allows an analysis of the tradeoffs between performance and the degree of variability. Our new contributions also include a collection of case studies. Results on the MFEM finite-element library include variability/performance tradeoffs and the identification of a hitherto unknown, abnormal level of result variability even under mild compiler optimizations. Results from studying the Laghos proxy application include identifying significantly divergent floating-point results and successfully root-causing them down to the problematic function in as few as 14 program executions. Finally, in an evaluation of 4,376 controlled injections of floating-point perturbations into the LULESH proxy application, we show that the FLiT framework achieves 100% precision and recall in discovering the file and function locations of the injections, all within an average of only 15 program executions.
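To make the bisection idea concrete, the following is a minimal, self-contained sketch of how a file-level search might proceed; it is illustrative only and not FLiT's implementation. The `diverges` predicate stands in for a real rebuild-and-run comparison against a trusted baseline build, the file names are made up, and the sketch assumes a single culprit file (a real search must also cope with multiple interacting culprits).

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for one rebuild-and-run test: a real implementation
// would recompile only `suspects` under the aggressive optimization flags,
// run the program, and compare its numerical output against a trusted
// baseline build. A hard-coded culprit keeps the sketch self-contained.
static const std::string kCulprit = "laghos_assembly.cpp";

bool diverges(const std::vector<std::string>& suspects) {
    return std::find(suspects.begin(), suspects.end(), kCulprit) !=
           suspects.end();
}

// Bisect the suspect set: each iteration costs one rebuild-and-run on half
// of the remaining suspects, keeping whichever half still reproduces the
// divergence. A single culprit is isolated in O(log n) executions.
std::string bisect(std::vector<std::string> suspects) {
    while (suspects.size() > 1) {
        std::size_t mid = suspects.size() / 2;
        std::vector<std::string> lower(suspects.begin(),
                                       suspects.begin() + mid);
        if (diverges(lower))
            suspects = lower;  // culprit is in the tested half
        else
            suspects.erase(suspects.begin(), suspects.begin() + mid);
    }
    return suspects.front();
}

int main() {
    std::vector<std::string> files = {
        "mesh.cpp", "fem.cpp", "laghos_assembly.cpp", "solver.cpp",
        "linalg.cpp", "io.cpp", "timer.cpp", "main.cpp"};
    std::cout << "culprit file: " << bisect(files) << "\n";
}
```

Each loop iteration costs one program build and execution, which is how a culprit among many files can be found in on the order of a dozen runs, consistent with the low execution counts reported above.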
Award ID(s):
1817073, 1704715
NSF-PAR ID:
10101376
Journal Name:
HPDC '19 Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing
Page Range / eLocation ID:
61 to 72
Sponsoring Org:
National Science Foundation
More Like this
  1. Scientific applications, especially legacy applications, contain a wealth of scientific knowledge. As hardware changes, applications need to be ported to new architectures and extended to include scientific advances. As a result, it is common to encounter problems like performance bottlenecks and dead code. A visual representation of the dataflow can help performance experts identify and debug such problems. The Computation API of the sparse polyhedral framework (SPF) provides a single entry point for tools to generate and manipulate polyhedral dataflow graphs and to transform applications. However, viewing graphs generated for scientific applications faces several barriers: the graphs are large, and manipulating their layout to respect execution order is difficult. This paper presents a case study that uses the Computation API to represent a scientific application, GeoAc, in the SPF. The generated polyhedral dataflow graphs were explored for optimization opportunities, and their usability limitations were addressed with several graph simplifications.
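As a rough illustration of what such a graph contains, the sketch below builds a toy dataflow graph from statements annotated with iteration domains and emits it as Graphviz DOT for viewing. All types and names here (`Stmt`, `emitDot`, the domain strings) are assumptions made for illustration; this is not the actual SPF Computation API.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a polyhedral dataflow graph node:
// a statement with an iteration domain (kept as an ISL-style string) plus
// the data spaces it reads and writes. Illustrative only.
struct Stmt {
    std::string name;
    std::string domain;               // e.g. "{[i] : 0 <= i < N}"
    std::vector<std::string> reads;   // data spaces read
    std::vector<std::string> writes;  // data spaces written
};

// Emit producer -> consumer edges as Graphviz DOT: an edge Sp -> Sc exists
// whenever Sc reads a data space that Sp writes earlier in program order.
void emitDot(const std::vector<Stmt>& s) {
    std::cout << "digraph dataflow {\n";
    for (const auto& st : s)
        std::cout << "  " << st.name << " [label=\"" << st.name << "\\n"
                  << st.domain << "\"];\n";
    for (std::size_t p = 0; p < s.size(); ++p)
        for (std::size_t c = p + 1; c < s.size(); ++c)
            for (const auto& w : s[p].writes)
                for (const auto& r : s[c].reads)
                    if (w == r)
                        std::cout << "  " << s[p].name << " -> " << s[c].name
                                  << " [label=\"" << w << "\"];\n";
    std::cout << "}\n";
}

int main() {
    // Three statements of a toy kernel: S0 produces B from A, S1 produces C
    // from B, and S2 consumes both B and C.
    std::vector<Stmt> stmts = {
        {"S0", "{[i] : 0 <= i < N}", {"A"}, {"B"}},
        {"S1", "{[i] : 0 <= i < N}", {"B"}, {"C"}},
        {"S2", "{[i] : 0 <= i < N}", {"B", "C"}, {"D"}}};
    emitDot(stmts);  // pipe the output into `dot -Tpdf` to render the graph
}
```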
  2. As scaling of conventional memory devices has stalled, many high-end computing systems have begun to incorporate alternative memory technologies to meet performance goals. Since these technologies present distinct advantages and tradeoffs compared to conventional DDR* SDRAM, such as higher bandwidth with lower capacity or vice versa, they are typically packaged alongside conventional SDRAM in a heterogeneous memory architecture. To utilize the different types of memory efficiently, new data management strategies are needed to match application usage to the best available memory technology. However, current proposals for managing heterogeneous memories are limited, because they either (1) do not consider high-level application behavior when assigning data to different types of memory or (2) require separate program execution (with a representative input) to collect information about how the application uses memory resources. This work presents a new data management toolset to address the limitations of existing approaches for managing complex memories. It extends the application runtime layer with automated monitoring and management routines that assign application data to the best tier of memory based on previous usage, without any need for source code modification or a separate profiling run. It evaluates this approach on a state-of-the-art server platform with both conventional DDR4 SDRAM and non-volatile Intel Optane DC memory, using both memory-intensive high-performance computing (HPC) applications and standard benchmarks. Overall, the results show that this approach improves program performance significantly compared to a standard unguided approach across a variety of workloads and system configurations. The HPC applications exhibit the largest benefits, with speedups ranging from 1.4× to 7× in the best cases. Additionally, we show that this approach achieves similar performance to a comparable offline profiling-based approach after a short startup period, without requiring separate program execution or offline analysis steps.
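The core placement decision can be sketched as a greedy knapsack over allocation sites ranked by observed access density. This is a deliberately simplified illustration with assumed names (`Site`, `assignTiers`); the paper's toolset works at the runtime layer without source changes or a separate profiling run.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sketch of usage-guided placement for a two-tier memory
// system (fast, small DRAM vs. slow, large NVM). Allocation sites are
// ranked by how hot they were in the previous interval and packed into
// the fast tier until its capacity runs out.
struct Site {
    std::string name;      // allocation site, e.g. "function:line"
    std::size_t bytes;     // bytes allocated at this site
    std::size_t accesses;  // accesses observed in the last interval
};

enum class Tier { DRAM, NVM };

std::vector<Tier> assignTiers(const std::vector<Site>& sites,
                              std::size_t dramBytes) {
    // Rank sites by access density (accesses per byte), hottest first.
    std::vector<std::size_t> order(sites.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return (double)sites[a].accesses / sites[a].bytes >
               (double)sites[b].accesses / sites[b].bytes;
    });
    std::vector<Tier> tier(sites.size(), Tier::NVM);
    std::size_t used = 0;
    for (std::size_t i : order)
        if (used + sites[i].bytes <= dramBytes) {  // greedy knapsack fill
            tier[i] = Tier::DRAM;
            used += sites[i].bytes;
        }
    return tier;
}

int main() {
    std::vector<Site> sites = {{"grid:alloc", 1u << 30, 50'000'000},
                               {"halo:alloc", 64u << 20, 40'000'000},
                               {"log:alloc", 256u << 20, 1'000}};
    std::vector<Tier> t = assignTiers(sites, std::size_t{1} << 30);  // 1 GiB DRAM
    for (std::size_t i = 0; i < sites.size(); ++i)
        std::cout << sites[i].name << " -> "
                  << (t[i] == Tier::DRAM ? "DRAM" : "NVM") << "\n";
}
```

In the full system this decision would be re-evaluated periodically from live usage data, which is what allows it to catch up to an offline profile after a short startup period.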
  3. With growing access to versatile robotics, it is beneficial for end users to be able to teach robots tasks without needing to code a control policy. One possibility is to teach the robot through successful task executions. However, near-optimal demonstrations of a task can be difficult to provide, and even successful demonstrations can fail to capture task aspects key to robust skill replication. Here, we propose a learning from demonstration (LfD) approach that enables learning of robust task definitions without the need for near-optimal demonstrations. We present a novel algorithmic framework for learning task specifications based on the ergodic metric, a measure of information content in motion. Moreover, we make use of negative demonstrations, demonstrations of what not to do, and show that they can help compensate for imperfect demonstrations, reduce the number of demonstrations needed, and highlight crucial task elements, improving robot performance. In a proof-of-concept example of cart-pole inversion, we show that negative demonstrations alone can be sufficient to successfully learn and recreate a skill. Through a human-subject study with 24 participants, we show that consistently more information about a task can be captured from combined positive and negative (posneg) demonstrations than from the same amount of positive demonstrations alone. Finally, we demonstrate our learning approach on simulated tasks of target reaching and table cleaning with a 7-DoF Franka arm. Our results point towards a future with robust, data-efficient LfD for novice users.
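For intuition about the metric itself, the sketch below evaluates an ergodic metric for two toy 1-D trajectories against a uniform target distribution, assuming a cosine Fourier basis and weights that decay with frequency. These modeling choices are illustrative assumptions, not the paper's exact formulation; the trajectory that sweeps the domain scores near zero (more ergodic), while the one that loiters in one spot scores high.

```cpp
#include <cstddef>
#include <cmath>
#include <iostream>
#include <vector>

const double kPi = 3.14159265358979323846;

// Fourier coefficients of the trajectory's time-averaged (empirical)
// distribution, using the cosine basis F_k(x) = cos(k*pi*x) on [0, 1].
std::vector<double> trajCoeffs(const std::vector<double>& traj, int K) {
    std::vector<double> c(K, 0.0);
    for (int k = 0; k < K; ++k) {
        for (double x : traj) c[k] += std::cos(k * kPi * x);
        c[k] /= static_cast<double>(traj.size());
    }
    return c;
}

// Ergodic metric: weighted squared distance between the trajectory's
// coefficients c_k and the target distribution's coefficients phi_k.
double ergodicMetric(const std::vector<double>& c,
                     const std::vector<double>& phi) {
    double eps = 0.0;
    for (std::size_t k = 0; k < c.size(); ++k) {
        double lambda = 1.0 / (1.0 + double(k) * double(k));  // high frequencies count less
        eps += lambda * (c[k] - phi[k]) * (c[k] - phi[k]);
    }
    return eps;
}

int main() {
    const int K = 10;
    std::vector<double> phi(K, 0.0);
    phi[0] = 1.0;  // coefficients of the uniform target on [0, 1]
    std::vector<double> sweep;  // visits the whole domain evenly
    for (int i = 0; i < 100; ++i) sweep.push_back(i / 99.0);
    std::vector<double> loiter(100, 0.2);  // stays near x = 0.2
    std::cout << "sweep  metric: " << ergodicMetric(trajCoeffs(sweep, K), phi) << "\n";
    std::cout << "loiter metric: " << ergodicMetric(trajCoeffs(loiter, K), phi) << "\n";
}
```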
  4. Approximation is a technique that optimizes the balance between application outcome quality and its resource usage. Trading quality for performance has been investigated for single-application scenarios, but not for environments where multiple approximate applications may run concurrently on the same machine, interfering with each other by sharing machine resources. Applying existing single-application techniques to this multi-programming environment may lead to configuration space size explosion, or result in poor overall application quality outcomes. Our new RAPID-M system is the first cross-application configuration management framework. It reduces the problem size by clustering configurations of individual applications into local "similarity buckets". The global cross-application configuration selection is based on these local bucket spaces. RAPID-M dynamically assigns buckets to applications such that overall quality is maximized while respecting individual application cost budgets. Once a bucket is assigned, reconfigurations within it may be performed locally with minimal impact on global selections. Experimental results using six configurable applications show that even large configuration spaces of complex applications can be clustered into a small number of buckets, resulting in search-space size reductions of up to 9 orders of magnitude for our six applications. RAPID-M constructs performance cost models with an average prediction error of ≤3%. For our application execution traces, RAPID-M dynamically selects configurations that lower the budget violation rate by 33.9%, with an average budget-exceeding rate of 6.6% as compared to other possible approaches. RAPID-M successfully finishes 22.75% more executions, which translates to a 1.52× global output quality increase under high system loads. The overhead of RAPID-M is ≤1% of application execution times.
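A toy version of the bucketing idea appears below: quantize each configuration's predicted (quality, cost) pair into coarse grid cells so that a large configuration space collapses into a few buckets over which a global manager can reason. The quantization rule and the field names are assumptions for illustration, not RAPID-M's clustering.

```cpp
#include <cmath>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Toy sketch: cluster one application's configurations into local
// "similarity buckets" by quantizing their predicted (quality, cost)
// behavior onto a coarse grid. The point is that a global manager can
// then reason over a handful of buckets instead of the full space.
struct Config {
    std::string knobs;  // e.g. "threads=8,prec=lo"
    double quality;     // predicted output quality in [0, 1]
    double cost;        // predicted resource cost, normalized to [0, 1]
};

// Bucket key: the grid cell containing (quality, cost).
std::string bucketOf(const Config& c, double step) {
    int q = static_cast<int>(std::floor(c.quality / step));
    int r = static_cast<int>(std::floor(c.cost / step));
    return "q" + std::to_string(q) + "_c" + std::to_string(r);
}

int main() {
    std::vector<Config> space = {
        {"threads=2,prec=hi", 0.98, 0.90}, {"threads=4,prec=hi", 0.97, 0.55},
        {"threads=4,prec=lo", 0.81, 0.30}, {"threads=8,prec=lo", 0.80, 0.22},
        {"threads=8,prec=hi", 0.96, 0.35}};
    std::map<std::string, std::vector<std::string>> buckets;
    for (const Config& c : space) buckets[bucketOf(c, 0.25)].push_back(c.knobs);
    // Reconfiguring within a bucket is a local decision; only switching
    // buckets needs to involve the global cross-application selector.
    for (const auto& kv : buckets) {
        std::cout << kv.first << ":";
        for (const std::string& m : kv.second) std::cout << " [" << m << "]";
        std::cout << "\n";
    }
}
```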
  5. A recent S-STEM award has allowed the engineering program in a rural, liberal arts institution to offer a need-based scholarship program for its students. The engineering program has a number of veteran, underrepresented minority, transfer, and nontraditional students, and many students are also first-generation college students. The institution and engineering program matriculate a number of under-served populations, students whose needs may not be well understood in the typical engineering education literature. The scholarship program and its associated mentoring and activities will assist workforce development and will also incorporate a number of research avenues to better understand and serve the student population in this unique setting. To apply for the program, students must fill out an application with four 250-500-word essay responses relating to their academic progress, perceived barriers to degree completion, and how this award would help them complete their degree. This initial study seeks to analyze the student applications to explore which students are applying for the new scholarship program and which students are successful in their applications. Responses to the application questions will be analyzed to develop an archetypical applicant, an archetypical successful applicant, and an archetypical unsuccessful applicant. Similar to the IDEO method of creating a specific client to design for, these profiles will not encompass all possible student responses; not all students who would be grouped with an archetype would see themselves in it. Rather, these archetypes will help us define students that we can use as a model when developing programs for the scholarship recipients and the student body as a whole. These profiles will be presented and used to generate an understanding of which students are likely to apply and which students may be missing out on this opportunity. At this time, the applications are not yet due and the analysis has not yet begun. Initial interest in the grant has been strong, and we anticipate at least thirty applications for the nineteen available grants. Results presented will include the three student profiles as well as a report of the perceived barriers to graduation as reported by applicants in their application materials.