This paper presents a novel method for generating a single polynomial approximation that produces correctly rounded results for all inputs of an elementary function for multiple representations. The generated polynomial approximation has the nice property that the first few lower degree terms produce correctly rounded results for specific representations of smaller bitwidths, which we call progressive performance. To generate such progressive polynomial approximations, we approximate the correctly rounded result and formulate the computation of correctly rounded polynomial approximations as a linear program similar to our prior work on the RLIBM project. To enable the use of resulting polynomial approximations in mainstream libraries, we want to avoid piecewise polynomials with large lookup tables. We observe that the problem of computing polynomial approximations for elementary functions is a linear programming problem in low dimensions, i.e., with a small number of unknowns. We design a fast randomized algorithm for computing polynomial approximations with progressive performance. Our method produces correct and fast polynomials that require a small amount of storage. A few polynomial approximations from our prototype have already been incorporated into LLVMâ€™s math library.
more »
« less
An approach to generate correctly rounded math libraries for new floating point variants
Given the importance of floating point (FP) performance in numerous domains, several new variants of FP and its alternatives have been proposed (e.g., Bfloat16, TensorFloat32, and posits). These representations do not have correctly rounded math libraries. Further, the use of existing FP libraries for these new representations can produce incorrect results. This paper proposes a novel approach for generating polynomial approximations that can be used to implement correctly rounded math libraries. Existing methods generate polynomials that approximate the real value of an elementary function đť‘“ (đť‘Ą) and produce wrong results due to approximation errors and rounding errors in the implementation. In contrast, our approach generates polynomials that approximate the correctly rounded value of đť‘“ (đť‘Ą) (i.e., the value of đť‘“ (đť‘Ą) rounded to the target representation). It provides more margin to identify efficient polynomials that produce correctly rounded results for all inputs. We frame the problem of generating efficient polynomials that produce correctly rounded results as a linear programming problem. Using our approach, we have developed correctly rounded, yet faster, implementations of elementary functions for multiple target representations.
more »
« less
 NSFPAR ID:
 10273626
 Date Published:
 Journal Name:
 Proceedings of the ACM on Programming Languages
 Volume:
 5
 Issue:
 POPL
 ISSN:
 24751421
 Page Range / eLocation ID:
 1 to 30
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


null (Ed.)This paper proposes a set of techniques to develop correctly rounded math libraries for 32bit float and posit types. It enhances our RLIBM approach that frames the problem of generating correctly rounded libraries as a linear programming problem in the context of 16bit types to scale to 32bit types. Specifically, this paper proposes new algorithms to (1) generate polynomials that produce correctly rounded outputs for all inputs using counterexample guided polynomial generation, (2) generate efficient piecewise polynomials with bitpattern based domain splitting, and (3) deduce the amount of freedom available to produce correct results when range reduction involves multiple elementary functions. The resultant math library for the 32bit float type is faster than stateoftheart math libraries while producing the correct output for all inputs. We have also developed a set of correctly rounded elementary functions for 32bit posits.more » « less

This paper proposes fast polynomial evaluation methods for correctly rounded elementary functions generated using our RLibm approach. The resulting functions produce correct results for all inputs with multiple representations and rounding modes. Given an oracle, the RLibm approach approximates the correctly rounded result rather than the real value of an elementary function. A key observation is that there is an interval of real values around the correctly rounded result such that any real value in it rounds to the correct result. This interval is the maximum freedom available to RLibmâ€™s polynomial generation procedure. Subsequently, the problem of generating correctly rounded elementary functions using these intervals can be structured as a linear programming problem. Our prior work on the RLibm approach uses Hornerâ€™s method for polynomial evaluation. This paper explores polynomial evaluation techniques such as Knuthâ€™s coefficient adaptation procedure, parallel execution of operations using Estrinâ€™s procedure, and the use of fused multiplyadd operations in the context of the RLibm approach. If we take the polynomial generated by the RLibm approach and subsequently perform polynomial evaluation optimizations, it results in incorrect results due to rounding errors during polynomial evaluation. Hence, we propose to integrate the fast polynomial evaluation procedure in the RLibmâ€™s polynomial generation process. Our new polynomial evaluation procedure that combines parallel execution with fused multiplyadd operations outperforms the Hornerâ€™s method used by RLibmâ€™s correctly rounded functions. We show the resulting polynomials for 32bit float are not only correct but also faster than prior functions in RLibm by 24%more » « less

Mainstream math libraries for floating point (FP) do not produce correctly rounded results for all inputs. In contrast, CRLIBM and RLIBM provide correctly rounded implementations for a specific FP representation with one rounding mode. Using such libraries for a representation with a new rounding mode or with different precision will result in wrong results due to double rounding. This paper proposes a novel method to generate a single polynomial approximation that produces correctly rounded results for all inputs for multiple rounding modes and multiple precision configurations. To generate a correctly rounded library for n bits, our key idea is to generate a polynomial approximation for a representation with n +2bits using the roundtoodd mode. We prove that the resulting polynomial approximation will produce correctly rounded results for all five rounding modes in the standard and for multiple representations with k bits such that  E  +1 < k â‰¤ n , where  E  is the number of exponent bits in the representation. Similar to our prior work in the RLIBM project, we approximate the correctly rounded result when we generate the library with n +2bits using the roundtoodd mode. We also generate polynomial approximations by structuring it as a linear programming problem but propose enhancements to polynomial generation to handle the roundtoodd mode. Our prototype is the first 32bit float library that produces correctly rounded results with all rounding modes in the IEEE standard for all inputs with a single polynomial approximation. It also produces correctly rounded results for any FP configuration ranging from 10bits to 32bits while also being faster than mainstream libraries.more » « less

This paper proposes, EFTSanitizer, a fast shadow execution framework for detecting and debugging numerical errors during late stages of testing especially for longrunning applications. Any shadow execution framework needs an oracle to compare against the floating point (FP) execution. This paper makes a case for using error free transformations, which is a sequence of operations to compute the error of a primitive operation with existing hardware supported FP operations, as an oracle for shadow execution. Although the error of a single correctly rounded FP operation is bounded, the accumulation of errors across operations can result in exceptions, slow convergences, and even crashes. To ease the job of debugging such errors, EFTSanitizer provides a directed acyclic graph (DAG) that highlights the propagation of errors, which results in exceptions or crashes. Unlike prior work, DAGs produced by EFTSanitizer include operations that span various function calls while keeping the memory usage bounded. To enable the use of such shadow execution tools with longrunning applications, EFTSanitizer also supports starting the shadow execution at an arbitrary point in the dynamic execution, which we call selective shadow execution. EFTSanitizer is an order of magnitude faster than prior stateofart shadow execution tools such as FPSanitizer and Herbgrind. We have discovered new numerical errors and debugged them using EFTSanitizer.more » « less