NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

An SMT Formalization of Mixed-Precision Matrix Multiplication: Modeling Three Generations of Tensor Cores

Valpey, Benjamin; Li, Xinyi; Pai, Sreepathi; Gopalakrishnan, Ganesh (June 2025, NASA Formal Methods)
Titolo, Laura (Ed.)
Many recent computational accelerators provide non-standard (e.g., reduced precision) arithmetic operations to enhance performance for floating-point matrix multiplication. Unfortunately, the properties of these accelerators are not widely understood and lack sufficient descriptions of their behavior. This makes it difficult for tool builders beyond the original vendor to target or simulate the hardware correctly, or for algorithm designers to be confident in their code. To address these gaps, prior studies have probed the behavior of these units with manually crafted tests. Such tests are cumbersome to design, and adapting them as the accelerators evolve requires repeated manual effort. We present a formal model for the tensor cores of NVIDIA’s Volta, Turing, and Ampere GPUs. We identify specific properties—rounding mode, precision, and accumulation order—that drive these cores’ behavior. We formalize these properties and then use the formalization to automatically generate discriminating inputs that illustrate differences among machines. Our results confirm many of the findings of previous tensor core studies, but also identify subtle disagreements. In particular, NVIDIA’s machines do not, as previously reported, use round-to-zero for accumulation, and their 5-term accumulator requires 3 extra carry-out bits for full accuracy. Using our formal model, we analyze two existing algorithms that use half-precision tensor cores to accelerate single-precision multiplication with error correction. Our analysis reveals that the newer algorithm, designed to be more accurate than the first, is actually less accurate for certain inputs.
more » « less
Free, publicly-accessible full text available June 12, 2026
Mixing Condition Numbers and Oracles for Accurate Floating-point Debugging

Kulkarni, Bhargav; Panchekha, Pavel (May 2025, IEEE ARITH)
Melquiond, Guillaume; Tang, Ping_Tak_Peter (Ed.)
Recent advances have made numeric debugging tools much faster by using double-double oracles, and numeric analysis tools much more accurate by using condition numbers. But these techniques have downsides: double-double oracles have correlated error so miss floating-point errors while condition numbers cannot cleanly handle over- and underflow. We combine both techniques to avoid these downsides. Our combination, EXPLANIFLOAT, computes condition numbers using double- double arithmetic, which avoids correlated errors. To handle over- and underflow, it introduces a separate logarithmic oracle. As a result, EXPLANIFLOAT achieves a precision of 80.0% and a recall of 96.1% on a collection of 546 difficult numeric benchmarks: more accurate than double-double oracles yet dramatically faster than arbitrary-precision condition number computations.
more » « less
Free, publicly-accessible full text available May 6, 2026
Rigorous Error Analysis for Logarithmic Number Systems

Nguyen, Thanh_Son; Solovyev, Alexey; Arnold, Mark G; Gopalakrishnan, Ganesh (May 2025, IEEE ARITH)
Melquiond, Guillaume; Tang, Ping_Tak_Peter (Ed.)
Theorem proving demonstrates promising potential for verifying problems beyond the capabilities of SMT-solver-based verification tools. We explore and showcase the capability of Lean, an increasingly popular theorem-proving tool, in deriving the error bounds of table-based Logarithmic Number Systems (LNS). LNS reduces the number of bits needed to represent a high dynamic range of real numbers with finite precision and efficiently performs multiplication and division. However, in LNS, addition and subtraction become non-linear functions that must be approximated—typically using precomputed look-up tables. We provide the first rigorous analysis of LNS that covers first-order Taylor approximation, cotransformation techniques inspired by European Logarithmic Microprocessor, and the errors introduced by fixed-point arithmetic involved in LNS implementations. By analyzing all error sources and deriving symbolic error bounds for each, then accumulating these to obtain the final error bound, we prove the correctness of these bounds using Lean and its Mathlib library. We empirically validate our analysis using an exhaustive Python implementation, demonstrating that our analytical interpolation bounds are tight, and our analytical cotransformation bounds overestimate between one and two bits.
more » « less
Free, publicly-accessible full text available May 5, 2026
Target-Aware Implementation of Real Expressions

https://doi.org/10.1145/3669940.3707277

Saiki, Brett; Brough, Jackson; Regehr, Jonas; Ponce, Jesus; Pradeep, Varun; Akhileshwaran, Aditya; Tatlock, Zachary; Panchekha, Pavel (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026

Search for: All records