

Title: Posits and the state of numerical representations in the age of exascale and edge computing
Abstract

Growing constraints on memory utilization, power consumption, and I/O throughput have increasingly become limiting factors to the advancement of high performance computing (HPC) and edge computing applications. IEEE‐754 floating‐point types have been the de facto standard for floating‐point number systems for decades, but the drawbacks of this numerical representation leave much to be desired. Alternative representations are gaining traction, both in HPC and machine learning environments. Posits have recently been proposed as a drop‐in replacement for the IEEE‐754 floating‐point representation. We survey the state‐of‐the‐art and state‐of‐the‐practice in the development and use of posits in edge computing and HPC. The current literature supports posits as a promising alternative to traditional floating‐point systems, both as a stand‐alone replacement and in a mixed‐precision environment. Development and standardization of the posit type is ongoing, and much research remains to explore the application of posits in different domains, how to best implement them in hardware, and where they fit with other numerical representations.
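As a rough illustration of the tapered-accuracy encoding the abstract refers to, the sketch below decodes a posit bit pattern in Python. The field layout (sign, variable-length regime, exponent, fraction) and the NaR encoding follow the published posit format, with es = 2 as in the 2022 Posit Standard, but the helper itself is our own illustration, not code from the surveyed work.

```python
def decode_posit(bits: int, n: int = 16, es: int = 2) -> float:
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):           # 100...0 encodes NaR (Not a Real)
        return float("nan")
    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0.0:                     # negative posits: two's-complement first
        bits = (-bits) & mask
    # Regime: a run of identical bits after the sign bit; a run of r ones
    # gives k = r - 1, a run of r zeros gives k = -r.
    rbit = (bits >> (n - 2)) & 1
    run, i = 0, n - 2
    while i >= 0 and ((bits >> i) & 1) == rbit:
        run += 1
        i -= 1
    k = run - 1 if rbit else -run
    i -= 1                             # skip the regime terminator bit
    # Exponent bits, zero-padded on the right if truncated.
    e = 0
    for _ in range(es):
        e <<= 1
        if i >= 0:
            e |= (bits >> i) & 1
            i -= 1
    # Remaining bits form the fraction with an implicit leading 1.
    fbits = max(i + 1, 0)
    frac = bits & ((1 << fbits) - 1)
    significand = 1.0 + frac / (1 << fbits)
    return sign * 2.0 ** (k * (1 << es) + e) * significand
```

For 16-bit posits with es = 2 this maps 0x4000 to 1.0 and 0x7FFF to the maximum positive value 2^56; the long regime run is what trades fraction bits for dynamic range at the extremes.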

 
Award ID(s):
1910197 1943114
NSF-PAR ID:
10367705
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Software: Practice and Experience
Volume:
52
Issue:
2
ISSN:
0038-0644
Page Range / eLocation ID:
p. 619-635
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Lossy compression techniques are ubiquitous in many fields, including imagery and video; however, their adoption in the computational fluid dynamics community has not advanced to the same extent. In this work, the lossy compression of high-fidelity direct numerical simulation (DNS) data is evaluated to assess the impact on various parameters of engineering interest. A Mach 2.5, spatially developing turbulent boundary layer (SDTBL) at a moderately high Reynolds number has been selected as the subject of the study. The ZFP compression scheme was chosen as the core driving algorithm for this study, as it was carefully crafted for scientific floating-point data. The resilience of spectral quantities as well as two-point correlations is highlighted. At the same time, we noted that point-wise values calculated in the physical domain were prone to quantization errors at high compression ratios. Further, we have also presented the impact on higher-order statistics. In summary, we have demonstrated that high-fidelity results are within reach while achieving 1.45x to 9.82x reductions in required storage over single-precision, IEEE 754-compliant data values.
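ZFP itself is not reproduced here, but the point-wise quantization effect this abstract describes can be mimicked in a few lines of Python by zeroing low-order mantissa bits (a much cruder transform than ZFP's; the signal and the bit budget below are made up for illustration):

```python
import math
import struct

def truncate_mantissa(x: float, keep: int) -> float:
    """Zero out all but `keep` leading mantissa bits of a double -- a crude
    stand-in for the quantization a lossy compressor such as ZFP applies."""
    (u,) = struct.unpack("<Q", struct.pack("<d", x))
    drop = 52 - keep
    u &= ~((1 << drop) - 1)
    return struct.unpack("<d", struct.pack("<Q", u))[0]

# A smooth signal standing in for one line of a DNS field.
data = [math.sin(0.01 * i) for i in range(1000)]
lossy = [truncate_mantissa(x, 16) for x in data]

# Aggregate statistics survive almost unchanged, while point-wise values
# carry quantization error bounded by the dropped mantissa bits.
mean_err = abs(sum(lossy) / len(lossy) - sum(data) / len(data))
max_rel = max(abs(a - b) / abs(a) for a, b in zip(data, lossy) if a != 0.0)
```

Keeping 16 mantissa bits bounds the point-wise relative error by about 2^-16, while the mean of the signal barely moves, mirroring the paper's observation that statistics are far more resilient than point-wise values.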
  2. The Fourier domain acceleration search (FDAS) is an effective technique for detecting faint binary pulsars in large radio astronomy data sets. This paper quantifies the sensitivity impact of reducing numerical precision in the graphics processing unit (GPU)-accelerated FDAS pipeline of the AstroAccelerate (AA) software package. The prior implementation used IEEE-754 single-precision in the entire binary pulsar detection pipeline, spending a large fraction of the runtime computing GPU-accelerated fast Fourier transforms. AA has been modified to use bfloat16 (and IEEE-754 double-precision to provide a “gold standard” comparison) within the Fourier domain convolution section of the FDAS routine. Approximately 20,000 synthetic pulsar filterbank files representing binary pulsars were generated using SIGPROC with a range of physical parameters. They have been processed using bfloat16, single-precision, and double-precision convolutions. All bfloat16 peaks are within 3% of the predicted signal-to-noise ratio of their corresponding single-precision peaks. Of 14,971 “bright” single-precision fundamental peaks above a power of 44.982 (our experimentally measured highest noise value), 14,602 (97.53%) have a peak in the same acceleration and frequency bin in the bfloat16 output plane, while in the remaining 369 the nearest peak is located in the adjacent acceleration bin. There is no bin drift measured between the single- and double-precision results. The bfloat16 version of FDAS achieves a speedup of approximately 1.6× compared to single-precision. A comparison between AA and the PRESTO software package is presented using observations collected with the GMRT of PSR J1544+4937, a 2.16 ms black widow pulsar in a 2.8 hr compact orbit.
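For readers unfamiliar with bfloat16, the conversion from binary32 is small enough to sketch. The helpers below are our own illustration, not AstroAccelerate code, and they gloss over the double-rounding subtlety of going through a binary32 intermediate:

```python
import struct

def float_to_bfloat16_bits(x: float) -> int:
    """Round a float to the nearest bfloat16 bit pattern (ties to even).

    bfloat16 keeps binary32's 8-bit exponent but only 7 fraction bits,
    so conversion amounts to rounding away the low 16 bits of the
    binary32 encoding (reached here via a binary32 intermediate).
    """
    (u,) = struct.unpack("<I", struct.pack("<f", x))
    rounding = 0x7FFF + ((u >> 16) & 1)   # round-to-nearest-even bias
    return ((u + rounding) >> 16) & 0xFFFF

def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a Python float (exact)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x
```

Because bfloat16 retains the full binary32 exponent range, the pipeline keeps its dynamic range and gives up only fraction bits, which is why the sensitivity loss reported above is so small.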

     
  3. Posit is a recently proposed alternative to the floating-point (FP) representation. It provides tapered accuracy: given a fixed number of bits, the posit representation can provide better precision than FP for some numbers, which has generated significant interest in numerous domains. Being a representation with tapered accuracy, however, it can introduce high rounding errors for numbers outside this high-accuracy “golden zone.” Programmers currently lack tools to detect and debug errors while programming with posits. This paper presents PositDebug, a compile-time instrumentation that performs shadow execution with high-precision values to detect various errors in computation using posits. To assist the programmer in debugging the reported error, PositDebug also provides directed acyclic graphs of instructions that are likely responsible for the error. A contribution of this paper is the design of the per-memory-location metadata for shadow execution that enables productive debugging of errors in long-running programs. We have used PositDebug to detect and debug errors in various numerical applications written using posits. To demonstrate that these ideas are applicable even to FP programs, we have built a shadow execution framework for FP programs that is an order of magnitude faster than Herbgrind.
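The shadow-execution idea can be illustrated independently of posits: run each operation in a low-precision type while carrying a high-precision "shadow" of the same computation, and flag operations whose relative error explodes. The sketch below uses binary32 as the low-precision stand-in (PositDebug itself shadows posit computations with high-precision values); the helper names are ours:

```python
import struct

def low(x: float) -> float:
    """Round to binary32 -- a stand-in for the lower-precision type
    a shadow-execution tool like PositDebug would monitor."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def shadow_sub(a: float, b: float):
    """Subtract in low precision while a double-precision shadow runs
    alongside, reporting the relative error of the low-precision path."""
    lo = low(low(a) - low(b))
    hi = a - b                      # shadow value kept in double precision
    rel = abs(lo - hi) / abs(hi) if hi != 0.0 else abs(lo)
    return lo, hi, rel

# Catastrophic cancellation: the operands are nearly equal, so rounding
# them to low precision swamps the tiny true difference.
_, _, rel = shadow_sub(1.0000001, 1.0)
```

Here the low-precision subtraction is off by roughly 19% relative to the shadow, exactly the kind of silent error such instrumentation is built to surface.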
  4. Given the importance of floating point (FP) performance in numerous domains, several new variants of FP and its alternatives have been proposed (e.g., Bfloat16, TensorFloat32, and posits). These representations do not have correctly rounded math libraries. Further, the use of existing FP libraries for these new representations can produce incorrect results. This paper proposes a novel approach for generating polynomial approximations that can be used to implement correctly rounded math libraries. Existing methods generate polynomials that approximate the real value of an elementary function f(x) and produce wrong results due to approximation errors and rounding errors in the implementation. In contrast, our approach generates polynomials that approximate the correctly rounded value of f(x) (i.e., the value of f(x) rounded to the target representation). It provides more margin to identify efficient polynomials that produce correctly rounded results for all inputs. We frame the problem of generating efficient polynomials that produce correctly rounded results as a linear programming problem. Using our approach, we have developed correctly rounded, yet faster, implementations of elementary functions for multiple target representations.
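The key idea, approximating the correctly rounded value rather than the real value, can be shown with a toy target representation. The fixed-point format, sample points, and candidate polynomial below are illustrative choices of ours, not the paper's actual setup, and no linear program is solved here; the sketch only shows the interval of freedom each sample contributes:

```python
import math

def round_fx(x: float, fbits: int = 8) -> float:
    """Round to a toy fixed-point target with fbits fraction bits
    (a stand-in for a small representation such as bfloat16)."""
    scale = 1 << fbits
    return round(x * scale) / scale

def rounding_interval(x: float, f, fbits: int = 8):
    """All reals in [lo, hi] round to the correctly rounded value of f(x),
    so a polynomial only has to land inside an interval, not hit a point.
    (Tie-breaking at the endpoints is glossed over in this sketch.)"""
    r = round_fx(f(x), fbits)
    half = 0.5 / (1 << fbits)
    return r - half, r + half

# A degree-3 Taylor polynomial of exp stays inside every interval on a
# small domain, even though it never matches exp(x) exactly.
poly = lambda t: 1.0 + t + t * t / 2.0 + t ** 3 / 6.0
xs = [0.0, 1 / 32, 1 / 16, 3 / 32, 1 / 8]
ok = all(lo <= poly(x) <= hi
         for x in xs
         for lo, hi in [rounding_interval(x, math.exp)])
```

Each constraint `lo <= p(x) <= hi` is linear in the polynomial's coefficients, which is what lets the paper cast the search for efficient coefficients as a linear programming problem.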
  5. Floating-point types are notorious for their intricate representation. The effective use of mixed precision, i.e., using different precisions in different computations, is critical to achieving a good balance between accuracy and performance. Unfortunately, reasoning about mixed precision is difficult even for numerical experts. Techniques have been proposed to systematically search over floating-point variables and/or program instructions to find a faster, mixed-precision version of a given program. These techniques, however, are characterized by their black-box nature and face scalability limitations due to the large search space. In this paper, we exploit the community structure of floating-point variables to devise a scalable hierarchical search for precision tuning. Specifically, we perform dependence analysis and edge profiling to create a weighted dependence graph that represents a network of floating-point variables. We then formulate hierarchy construction on the network as a community detection problem and present a hierarchical search algorithm that iteratively lowers precision community by community. We implement our algorithm in the tool HiFPTuner and show that it exhibits higher search efficiency than the state of the art for 75.9% of the experiments, taking 59.6% less search time on average. Moreover, HiFPTuner finds more profitable configurations for 51.7% of the experiments, with one configuration known to be as good as the global optimum found through exhaustive search.
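The community-by-community search can be caricatured in a few lines. The toy kernel, the grouping, and the tolerance below are invented for illustration and stand in for HiFPTuner's profiling-derived communities; the point is that precision is lowered per group and a change is kept only if an error budget holds:

```python
import struct

def f32(x: float) -> float:
    """Stand-in 'lowered precision' assignment: round a double to binary32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def run(prec):
    """A toy kernel; prec maps each variable name to its rounding function."""
    a = prec["a"](1.0 / 3.0)
    b = prec["b"](2.0 / 7.0)
    c = prec["c"](a * b)
    return prec["d"](a + b + c)

# Assumed community structure: {a, b} are tightly coupled, as are {c, d}.
communities = [["a", "b"], ["c", "d"]]
prec = {v: float for grp in communities for v in grp}  # all double to start
baseline = run(prec)

# Lower one community at a time; keep the change only if the result stays
# within the error budget, instead of searching each variable independently.
tol = 1e-6
for grp in communities:
    trial = {**prec, **{v: f32 for v in grp}}
    if abs(run(trial) - baseline) <= tol * abs(baseline):
        prec = trial
lowered = sorted(v for v, fn in prec.items() if fn is f32)
```

Grouping variables shrinks the search from exponential in the number of variables to a walk over a much smaller hierarchy, which is the scalability gain the abstract describes.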