

Title: Low-Precision Arithmetic for Fast Gaussian Processes
Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for Gaussian processes (GPs), largely because GPs require sophisticated linear algebra routines that are unstable in low-precision. We study the different failure modes that can occur when training GPs in half precision. To circumvent these failure modes, we propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning. Our approach significantly improves the numerical stability and practical performance of conjugate gradients in low-precision over a wide range of settings, enabling GPs to train on 1.8 million data points in 10 hours on a single GPU, without any sparse approximations.
Award ID(s): 1922658
NSF-PAR ID: 10350918
Journal Name: UAI 2022
Format(s): Medium: X
Sponsoring Org: National Science Foundation
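
The abstract's recipe combines three stabilizers for half-precision conjugate gradients: re-orthogonalization of residuals, mixed precision, and preconditioning. Below is a minimal sketch of those ingredients, assuming a symmetric positive-definite kernel matrix exposed as a float16 matrix-vector product; the names and structure are illustrative, not the authors' actual (GPyTorch-based) implementation.

```python
# Sketch: preconditioned CG with residual re-orthogonalization, where only the
# expensive O(n^2) matvec runs in float16 and all scalars/iterates stay float32.
import torch

def cg_mixed_precision(matvec, b, precond=lambda r: r, max_iters=100, tol=1e-4):
    """Solve A x = b. `matvec` is assumed to apply A in float16;
    inner products and iterates are kept in float32."""
    x = torch.zeros_like(b, dtype=torch.float32)
    r = b.to(torch.float32)            # initial residual (x0 = 0)
    z = precond(r)
    p = z.clone()
    R = [r / r.norm()]                 # unit residuals kept for re-orthogonalization
    rz = r @ z
    for _ in range(max_iters):
        Ap = matvec(p.to(torch.float16)).to(torch.float32)  # fp16 matvec only
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        # Gram-Schmidt: remove components along all previous residuals, which
        # counters the loss of orthogonality that half precision accelerates.
        for q in R:
            r = r - (q @ r) * q
        if r.norm() < tol:
            break
        R.append(r / r.norm())
        z = precond(r)
        rz_new = r @ z
        beta = rz_new / rz
        p = z + beta * p
        rz = rz_new
    return x
```

Keeping iterates and inner products in float32 while only the matrix-vector product runs in float16 is where the memory and speed savings come from; the re-orthogonalization store grows with the iteration count, which is one reason preconditioning (fewer iterations) matters in this regime.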
More Like this
  1. Gliomas have become the most common cancerous brain tumors, but manual diagnosis from 3D MRIs is time-consuming and can be inconsistent across radiotherapists, creating a pressing demand for automatic segmentation of brain tumors. State-of-the-art approaches employ FCNs to automatically segment MRI scans. In particular, 3D U-Net has achieved notable performance and motivated a series of subsequent works. However, their significant size and heavy computation have impeded their practical deployment. Although there is a body of literature on compressing CNNs with low-precision representations, existing methods either focus on storage reduction without computational improvement or cause severe performance degradation. In this article, we propose a CNN training algorithm that approximates weights and activations using non-negative integers along with trained affine mapping functions. Moreover, our approach allows the dot-product operations to be performed in an integer-arithmetic manner and defers the floating-point decoding and encoding phases until the end of layers. Experimental results on BraTS 2018 show that our trained affine mapping approach achieves near full-precision Dice accuracy with 8-bit weights and activations. In addition, we achieve Dice accuracy within 0.005 and 0.01 of the full-precision counterparts when using 4-bit and 2-bit precisions, respectively.
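
A toy sketch of the affine-mapping idea the abstract describes, with hypothetical helper names (this is not the paper's code): quantize to non-negative integers via q = round(x/s) + z, accumulate the dot product entirely in integer arithmetic, and defer the floating-point decode to a single correction at the end of the layer.

```python
# Illustrative affine quantization and integer-only dot product.
import numpy as np

def affine_quantize(x, num_bits=8):
    # Map floats to non-negative integers: x ~ scale * (q - zero_point).
    qmax = 2 ** num_bits - 1
    scale = (x.max() - x.min()) / qmax
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.int32)
    return q, scale, zero_point

def int_dot_decode(qw, qa, sw, zw, sa, za):
    # Integer accumulation; affine offsets are corrected afterwards, so the
    # floating-point multiply happens once per layer, not per element.
    n = qw.size
    acc = int(qw.astype(np.int64) @ qa.astype(np.int64))
    corr = acc - zw * qa.sum() - za * qw.sum() + n * zw * za
    return sw * sa * corr

w, a = np.random.randn(256), np.random.rand(256)
qw, sw, zw = affine_quantize(w)
qa, sa, za = affine_quantize(a)
print(float(w @ a), int_dot_decode(qw, qa, sw, zw, sa, za))  # approximately equal
```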
  2. The connectivity of modern vehicles allows a large amount of sensor data to be monitored and analyzed during normal operation, and in recent years there has been growing interest in using this data for predictive maintenance. In this paper, a multi-label transfer learning approach is proposed: 14 different pretrained convolutional neural networks are retrained with engine simulation data to predict the failure conditions of a selected set of engine components. The retrained classifier networks are designed so that concurrent failure modes of the exhaust gas recirculation, compressor, intercooler, and fuel injectors of a four-cylinder diesel engine can be identified. Time-series simulation data of various failure conditions, including performance degradation, are generated to retrain the classifier networks to predict which components are failing at any given time. Test results show strong overall classification performance, with normalized mean average precision between 0.6 and 0.65 for most of the retrained networks. To the best of the authors' knowledge, this work represents the first attempt to characterize such time-series data using a multi-label deep learning approach.
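
The general recipe this abstract describes can be sketched as follows (illustrative, not the paper's exact architecture or data pipeline): take a pretrained torchvision CNN, replace its classifier head with a multi-label output, and train with binary cross-entropy so several failure modes can be predicted concurrently. The rendering of time-series windows as image-like tensors is an assumption here, stubbed with a dummy batch.

```python
# Multi-label transfer learning head on a pretrained CNN.
import torch
import torch.nn as nn
from torchvision import models

NUM_FAILURE_MODES = 4  # e.g., EGR, compressor, intercooler, fuel injectors

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_FAILURE_MODES)  # multi-label head

criterion = nn.BCEWithLogitsLoss()  # one independent binary decision per component
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy batch standing in for time-series windows rendered as images.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 2, (8, NUM_FAILURE_MODES)).float()  # multi-hot failure labels

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

The key departure from ordinary transfer learning is the loss: sigmoid-per-class with binary cross-entropy, rather than softmax, lets multiple components be flagged as failing at the same time.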
  3. Geopolymers (GPs) are emerging, low‐density ceramic materials that are simple to manufacture, with high elastic modulus and strength but low toughness. Fiber reinforcements have been used to achieve varied ductile behaviors, but little is known about adding GPs to polymeric frame structures. Drawing inspiration from the nanostructure of bone, this paper investigates an interpenetrating, co‐continuous composite consisting of a GP as the stiff but brittle phase and a 3D‐printed polymer (PA12 White) as the soft, deformable phase. The composite's mechanical properties and failure modes were studied experimentally using uniaxial compression and four‐point bending tests. The co‐continuous network constrained brittle cracking within the GP and reduced strain localization in the polymer. For the specific geometries examined, the composite had higher strength (56.11 ± 2.12 MPa) and elastic modulus (6.08 ± 1.37 GPa) than the 3D‐printed polymer, and higher toughness (5.98 ± 0.24 MJ/mm3) than the GP. A study of shape effects showed that cubic structures had higher elastic modulus and strength than rectangular prism structures, at the expense of lower toughness. A study of scale effects indicated that increasing the number of periodic unit cells while maintaining consistent bulk dimensions increased strength and toughness, without statistically significant changes in elastic modulus. This paper thus presents an experimental realization of a novel, bio‐inspired, interpenetrating GP–polymer composite design offering improved strength and toughness, and provides insight into shape and size effects on the mechanical properties of this new composite.
  4. In-memory-computing (IMC) SRAM architectures have gained significant attention because they achieve high energy efficiency when computing convolutional neural network (CNN) models [1]. Recent works investigated analog-mixed-signal (AMS) hardware for high area and energy efficiency [2], [3]. However, AMS hardware output is well known to be susceptible to process, voltage, and temperature (PVT) variations, limiting the computing precision and ultimately the inference accuracy of a CNN. We reconfirmed, through simulation of a capacitor-based IMC SRAM macro computing a 256D binary dot product, that AMS computing hardware has a significant root-mean-square error (RMSE) of 22.5% across the worst-case voltage and temperature (Fig. 16.1.1 top left) and 3-sigma process variations (Fig. 16.1.1 top right). On the other hand, an IMC SRAM macro can be implemented with robust digital logic [4], which virtually eliminates the variability issue (Fig. 16.1.1 top). However, digital circuits require more devices than their AMS counterparts (e.g., 28 transistors for a mirror full adder [FA]). As a result, a recent digital IMC SRAM shows lower area efficiency (6368F2/b, 22nm, 4b/4b weight/activation) [5] than the AMS counterpart (1170F2/b, 65nm, 1b/1b) [3]. In light of this, we adopt approximate arithmetic hardware to improve area and power efficiency and present two digital IMC macros (DIMC) with different levels of approximation (Fig. 16.1.1 bottom left). We also propose an approximation-aware training algorithm and a number format to minimize the inference accuracy degradation induced by approximate hardware (Fig. 16.1.1 bottom right). We prototyped a 28nm test chip: for a 1b/1b CNN model on CIFAR-10 across a 0.5-to-1.1V supply, the DIMC with double-approximate hardware (DIMC-D) achieves 2569F2/b, 932-2219TOPS/W, 475-20032GOPS, and 86.96% accuracy, while for a 4b/1b CNN model, the DIMC with single-approximate hardware (DIMC-S) achieves 3814F2/b and 458-990TOPS/W…
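
A toy numerical illustration of the analog-versus-digital trade-off this abstract describes, under the simplifying assumption that the quoted 22.5% RMSE can be modeled as additive Gaussian noise on the analog dot-product output relative to its full-scale magnitude (real PVT effects are more structured than this):

```python
# Exact (digital-style) vs. noisy (AMS-style) 256D binary dot products.
import numpy as np

rng = np.random.default_rng(0)
D, trials, rmse = 256, 10000, 0.225
w = rng.choice([-1, 1], size=(trials, D))
a = rng.choice([-1, 1], size=(trials, D))

exact = np.einsum('td,td->t', w, a)        # digital IMC: bit-exact result
# One plausible reading of the 22.5% figure: noise std = rmse * full-scale D.
noisy = exact + rng.normal(0, rmse * D, size=trials)

print("mean |error| in dot-product units:", np.abs(noisy - exact).mean())
```

Even under this crude model, the error is large relative to the output range, which is consistent with the abstract's motivation for moving to robust (if larger) digital logic and recovering area through approximate arithmetic.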
  5. Deformable Convolutional Networks (DCNs) have been proposed as a powerful tool for boosting the representation power of Convolutional Neural Networks (CNNs) in computer vision tasks via adaptive sampling of the input feature map. Much like vision transformers, DCNs use a more flexible inductive bias than standard CNNs and have been shown to improve the performance of particular models. For example, drop-in DCN layers were shown to increase the AP score of Mask R-CNN by 10.6 points while introducing only 1% additional parameters and FLOPs, improving on the state-of-the-art model at the time of publication. However, despite evidence that more DCN layers placed earlier in the network can further improve performance, this trend has not continued with further scaling of deformations in CNNs, unlike for vision transformers. Benchmarking experiments show that a realistically sized DCN layer (64H×64W, 64 in/out channels) incurs a 4× slowdown on a GPU platform, discouraging more ubiquitous use of deformations in CNNs. These slowdowns are caused by the irregular, input-dependent access patterns of the bilinear interpolation operator, which has disproportionately low arithmetic intensity (AI) compared to the rest of the DCN. To address this slowdown and enable the expanded use of DCNs in CNNs, we propose DefT, a series of workload-aware optimizations for DCN kernels. DefT identifies performance bottlenecks in DCNs and fuses specific operators observed to limit DCN AI. Our approach also uses statistical information about DCN workloads to adapt workload tiling to the DCN layer dimensions, minimizing costly out-of-bounds input accesses. Experimental results show that DefT mitigates up to half of the DCN slowdown relative to the current state-of-the-art PyTorch implementation, translating to a layerwise speedup of up to 134% and a 46% reduction in normalized training time on a fully DCN-enabled ResNet model.
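
The bilinear-interpolation gather that this abstract identifies as the bottleneck can be sketched directly (illustrative PyTorch, not DefT's fused kernels): each output value taps four input pixels at data-dependent locations, so the operator does few FLOPs per irregular memory access, which is exactly the low arithmetic intensity described above.

```python
# Bilinear sampling at data-dependent coordinates, as in DCN sampling.
import torch

def bilinear_sample(feat, y, x):
    """feat: (H, W) tensor; y, x: float sampling coordinates (same shape)."""
    H, W = feat.shape
    y0 = y.floor().long().clamp(0, H - 2)
    x0 = x.floor().long().clamp(0, W - 2)
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0.float(), x - x0.float()
    # Four irregular, input-dependent gathers per output value:
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bot = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bot

feat = torch.randn(64, 64)
ys = torch.rand(16) * 63   # offset-shifted sampling locations from the network
xs = torch.rand(16) * 63
print(bilinear_sample(feat, ys, xs).shape)  # 16 sampled values
```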