skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.

Title: Augmented Arithmetic Operations Proposed for IEEE-754 2018
Algorithms for extending arithmetic precision through compensated summation or arithmetics like double-double rely on operations commonly called twoSum and twoProduct. The current draft of the IEEE 754 standard specifies these operations under the names augmentedAddition and augmentedMultiplication. These operations were included after three decades of experience because of a motivating new use: bitwise reproducible arithmetic. Standardizing the operations provides a hardware acceleration target that can provide at least a 33 % speed improvements in reproducible dot product, placing reproducible dot product almost within a factor of two of common dot product. This paper provides history and motivation for standardizing these operations. We also define the operations, explain the rationale for all the specific choices, and provide parameterized test cases for new boundary behaviors.  more » « less
Award ID(s):
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE 25th Symposium on Computer Arithmetic (ARITH)
Page Range / eLocation ID:
45 to 52
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Prior work indicates that children have an untrained ability to approximately calculate using their approximate number system (ANS). For example, children can mentally double or halve a large array of discrete objects. Here, we asked whether children can per-form a true multiplication operation, flexibly attending to both the multiplier and multiplicand, prior to formal multiplication instruc-tion. We presented 5- to 8-year-olds with nonsymbolic multipli-cands (dot arrays) or symbolic multiplicands (Arabic numerals) ranging from 2 to 12 and with nonsymbolic multipliers ranging from 2 to 8. Children compared each imagined product with a vis-ible comparison quantity. Children performed with above-chance accuracy on both nonsymbolic and symbolic approximate multipli-cation, and their performance was dependent on the ratio between the imagined product and the comparison target. Children who could not solve any single-digit symbolic multiplication equations (e.g., 2  3) on a basic math test were nevertheless successful on both our approximate multiplication tasks, indicating that children have an intuitive sense of multiplication that emerges independent of formal instruction about symbolic multiplication. Nonsymbolic multiplication performance mediated the relation between chil-dren’s Weber fraction and symbolic math abilities, suggesting a pathway by which the ANS contributes to children’s emerging symbolic math competence. These findings may inform future educational interventions that allow children to use their basic arithmetic intuition as a scaffold to facilitate symbolic math learning. 
    more » « less
  2. While gliomas have become the most common cancerous brain tumors, manual diagnoses from 3D MRIs are time-consuming and possibly inconsistent when conducted by different radiotherapists, which leads to the pressing demand for automatic segmentation of brain tumors. State-of-the-art approaches employ FCNs to automatically segment the MRI scans. In particular, 3D U-Net has achieved notable performance and motivated a series of subsequent works. However, their significant size and heavy computation have impeded their actual deployment. Although there exists a body of literature on the compression of CNNs using low-precision representations, they either focus on storage reduction without computational improvement or cause severe performance degradation. In this article, we propose a CNN training algorithm that approximates weights and activations using non-negative integers along with trained affine mapping functions. Moreover, our approach allows the dot-product operations to be performed in an integer-arithmetic manner and defers the floating-point decoding and encoding phases until the end of layers. Experimental results on BraTS 2018 show that our trained affine mapping approach achieves near full-precision dice accuracy with 8-bit weights and activations. In addition, we achieve a dice accuracy within 0.005 and 0.01 of the full-precision counterparts when using 4-bit and 2-bit precisions, respectively. 
    more » « less
  3. In-memory-computing (IMC) SRAM architecture has gained significant attention as it achieves high energy efficiency for computing a convolutional neural network (CNN) model [1]. Recent works investigated the use of analog-mixed-signal (AMS) hardware for high area and energy efficiency [2], [3]. However, AMS hardware output is well known to be susceptible to process, voltage, and temperature (PVT) variations, limiting the computing precision and ultimately the inference accuracy of a CNN. We reconfirmed, through the simulation of a capacitor-based IMC SRAM macro that computes a 256D binary dot product, that the AMS computing hardware has a significant root-mean-square error (RMSE) of 22.5% across the worst-case voltage, temperature (Fig. 16.1.1 top left) and 3-sigma process variations (Fig. 16.1.1 top right). On the other hand, we can implement an IMC SRAM macro using robust digital logic [4], which can virtually eliminate the variability issue (Fig. 16.1.1 top). However, digital circuits require more devices than AMS counterparts (e.g., 28 transistors for a mirror full adder [FA]). As a result, a recent digital IMC SRAM shows a lower area efficiency of 6368F2/b (22nm, 4b/4b weight/activation) [5] than the AMS counterpart (1170F2/b, 65nm, 1b/1b) [3]. In light of this, we aim to adopt approximate arithmetic hardware to improve area and power efficiency and present two digital IMC macros (DIMC) with different levels of approximation (Fig. 16.1.1 bottom left). Also, we propose an approximation-aware training algorithm and a number format to minimize inference accuracy degradation induced by approximate hardware (Fig. 16.1.1 bottom right). We prototyped a 28nm test chip: for a 1b/1b CNN model for CIFAR-10 and across 0.5-to-1.1V supply, the DIMC with double-approximate hardware (DIMC-D) achieves 2569F2/b, 932-2219TOPS/W, 475-20032GOPS, and 86.96% accuracy, while for a 4b/1b CNN model, the DIMC with the single-approximate hardware (DIMC-S) achieves 3814F2/b, 458-990TOPS/W 
    more » « less
  4. null (Ed.)
    Abstract Background Significant progress has been made in advancing and standardizing tools for human genomic and biomedical research. Yet, the field of next-generation sequencing (NGS) analysis for microorganisms (including multiple pathogens) remains fragmented, lacks accessible and reusable tools, is hindered by local computational resource limitations, and does not offer widely accepted standards. One such “problem areas” is the analysis of Transposon Insertion Sequencing (TIS) data. TIS allows probing of almost the entire genome of a microorganism by introducing random insertions of transposon-derived constructs. The impact of the insertions on the survival and growth under specific conditions provides precise information about genes affecting specific phenotypic characteristics. A wide array of tools has been developed to analyze TIS data. Among the variety of options available, it is often difficult to identify which one can provide a reliable and reproducible analysis. Results Here we sought to understand the challenges and propose reliable practices for the analysis of TIS experiments. Using data from two recent TIS studies, we have developed a series of workflows that include multiple tools for data de-multiplexing, promoter sequence identification, transposon flank alignment, and read count repartition across the genome. Particular attention was paid to quality control procedures, such as determining the optimal tool parameters for the analysis and removal of contamination. Conclusions Our work provides an assessment of the currently available tools for TIS data analysis. It offers ready to use workflows that can be invoked by anyone in the world using our public Galaxy platform ( ). To lower the entry barriers, we have also developed interactive tutorials explaining details of TIS data analysis procedures at . 
    more » « less
  5. Abstract

    Reliable studies of the long-term dynamics of planetary systems require numerical integrators that are accurate and fast. The challenge is often formidable because the chaotic nature of many systems requires relative numerical error bounds at or close to machine precision (∼10−16, double-precision arithmetic); otherwise, numerical chaos may dominate over physical chaos. Currently, the speed/accuracy demands are usually only met by symplectic integrators. For example, the most up-to-date long-term astronomical solutions for the solar system in the past (widely used in, e.g., astrochronology and high-precision geological dating) have been obtained using symplectic integrators. However, the source codes of these integrators are unavailable. Here I present the symplectic integratororbitN(lean version 1.0) with the primary goal of generating accurate and reproducible long-term orbital solutions for near-Keplerian planetary systems (here the solar system) with a dominant massM0. Among other features,orbitN-1.0includesM0’s quadrupole moment, a lunar contribution, and post-Newtonian corrections (1PN) due toM0(fast symplectic implementation). To reduce numerical round-off errors, Kahan compensated summation was implemented. I useorbitNto provide insight into the effect of various processes on the long-term chaos in the solar system. Notably, 1PN corrections have the opposite effect on chaoticity/stability on a 100 Myr versus Gyr timescale. For the current application,orbitNis about as fast as or faster (factor 1.15–2.6) than comparable integrators, depending on hardware.1

    The orbitN source code (C) is available at

    more » « less