Successful HPC software applications are long-lived. When ported across machines and their compilers, these applications often produce different numerical results, many of which are unacceptable. Such variability is also a concern while optimizing the code more aggressively to gain performance. Efficient tools that help locate the program units (files and functions) within which most of the variability occurs are badly needed, both to plan for code ports and to root-cause errors due to variability when they happen in the field. In this work, we offer an enhanced version of the open-source testing framework FLiT to serve these roles. Key new features of FLiT include a suite of bisection algorithms that help locate the root causes of variability. Another added feature allows an analysis of the tradeoffs between performance and the degree of variability. Our new contributions also include a collection of case studies. Results on the MFEM finite-element library include variability/performance tradeoffs and the identification of a (hitherto unknown) abnormal level of result variability even under mild compiler optimizations. Results from studying the Laghos proxy application include identifying significantly divergent floating-point results and successfully root-causing the variability down to the problematic function in as few as 14 program executions. Finally, in an evaluation of 4,376 controlled injections of floating-point perturbations on the LULESH proxy application, we showed that the FLiT framework has 100% precision and recall in discovering the file and function locations of the injections, all within an average of only 15 program executions.
FLoAT: Framework for Workflow Analysis and Transformation
New abstractions and frameworks are born when one creates hard-coded solutions to important tasks, regardless of whether they scale or result in software that can be meaningfully released. This paper describes our experience creating such a lightweight framework out of a previous tool effort, FLiT, for detecting compiler-induced numerical variability. The resulting framework, FLOAT, has already helped us better understand and fix performance bugs in FLiT. We describe our design of FLOAT and the ways in which we anticipate it enabling the adoption and re-purposing of FLiT, though the list is likely not exhaustive. We also express our views on the appropriate scope of such an approach, especially given that variations of compilation, linking, and execution abound, and specializing in that domain may be advantageous in the long term as opposed to investing in an overly generalized paradigm.
- Award ID(s): 1918497
- PAR ID: 10294469
- Date Published:
- Journal Name: Correctness 2021: Fifth International Workshop on Software Correctness for HPC Applications
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Non-volatile random access memory (NVRAM) offers byte-addressable persistence at speeds comparable to DRAM. However, with caches remaining volatile, automatic cache evictions can reorder updates to memory, potentially leaving persistent memory in an inconsistent state upon a system crash. Flush and fence instructions can be used to force ordering among updates, but are expensive. This has motivated significant work studying how to write correct and efficient persistent programs for NVRAM. In this paper, we present FliT, a C++ library that facilitates writing efficient persistent code. Using the library's default mode makes any linearizable data structure durable with minimal changes to the code. FliT avoids many redundant flush instructions by using a novel algorithm to track dirty cache lines. It also allows for extra optimizations, but achieves good performance even in its default setting. To describe the FliT library's capabilities and guarantees, we define a persistent programming interface, called the P-V Interface, which FliT implements. The P-V Interface captures the expected behavior of code in which some instructions' effects are persisted and some are not. We show that the interface captures the desired semantics of many practical algorithms in the literature. We apply the FliT library to four different persistent data structures, and show that across several workloads, persistence implementations, and data structure sizes, the FliT library always improves operation throughput, by at least 2.1X over a naive implementation in all but one workload.
The lack of continuous spatial and temporal sampling of hydrographic measurements in large parts of the Arctic Ocean remains a major obstacle for quantifying mean state and variability of the Arctic Ocean circulation. This shortcoming motivates an assessment of the utility of Argo-type floats, the challenges of deploying such floats due to the presence of sea ice, and the implications of extended times of no surfacing on hydrographic inferences. Within the framework of an Arctic coupled ocean–sea ice state estimate that is constrained to available satellite and in situ observations, we establish metrics for quantifying the usefulness of such floats. The likelihood of float surfacing strongly correlates with the annual sea ice minimum cover. Within the float lifetime of 4–5 years, surfacing frequency ranges from 10–100 days in seasonally sea ice–covered regions to 1–3 years in multiyear sea ice–covered regions. The longer the float drifts under ice without surfacing, the larger the uncertainty in its position, which translates into larger uncertainties in hydrographic measurements. Below the mixed layer, especially in the western Arctic, normalized errors remain below 1, suggesting that measurements along a path whose only known positions are the beginning and end points can help constrain numerical models and reduce hydrographic uncertainties. The error assessment presented is a first step in the development of quantitative methods for guiding the design of observing networks. These results can and should be used to inform a float network design with suggested locations of float deployment and associated expected hydrographic uncertainties.
Methods commonly used to estimate net primary production (NPP) from satellite observations are now being applied to biogeochemical (BGC) profiling float observations. Insights can be gained from regional differences in float and satellite NPP estimates that reveal gaps in our understanding and guide future NPP model development. We use 7 years of BGC profiling float data from the Northeast Pacific Ocean to quantify discrepancies between float and satellite NPP estimates and decompose them into contributions associated with the platform sensing method and depth resolution of observations. We find small, systematic seasonal discrepancies in the depth‐integrated NPP (iNPP) but much larger (>±100%) discrepancies in depth‐resolved NPP. Annual iNPP estimates from the two platforms are significantly, positively correlated, suggesting that they similarly track interannual variability in the study region. Using the long‐term satellite iNPP record, we identify elevated annual iNPP during two recent marine heatwaves and gain insights about ecosystem functionality.
Profiles of oxygen measurements from Argo profiling floats now vastly outnumber shipboard profiles. To correct for drift, float oxygen data are often initially adjusted to deployment casts, ship‐based climatologies, or, recently, measurements of atmospheric oxygen for in situ calibration. Air calibration enables accurate measurements in the upper ocean but may not provide similar accuracy at depth. Using a quality controlled shipboard data set, we find that the entire Argo oxygen data set is offset relative to shipboard measurements (float minus ship) at pressures of 1,450–2,000 db by a median of −1.9 μmol kg⁻¹ (mean ± SD of −1.9 ± 3.9, 95% confidence interval around the mean of {−2.2, −1.6}) and air‐calibrated floats are offset by −2.7 μmol kg⁻¹ (−3.0 ± 3.4, CI95% {−3.7, −2.4}). The difference between float and shipboard oxygen is likely due to offsets in the float oxygen data and not oxygen changes at depth or biases in the shipboard data set. In addition to complicating the calculation of long‐term ocean oxygen changes, these float oxygen offsets impact the adjustment of float nitrate and pH measurements, therefore biasing important derived quantities such as the partial pressure of CO2 (pCO2) and dissolved inorganic carbon. Correcting floats with air‐calibrated oxygen sensors for the float‐ship oxygen offsets alters float pH by a median of 3.0 mpH (3.1 ± 3.7) and float‐derived surface pCO2 by −3.2 μatm (−3.2 ± 3.9). This adjustment to float pCO2 represents half, or more, of the bias in float‐derived pCO2 reported in studies comparing float pCO2 to shipboard pCO2 measurements.