- Award ID(s):
- 1929300
- PAR ID:
- 10169579
- Date Published:
- Journal Name:
- Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure and adaptively decide the number of low-precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an 'effect of mixed precision' but without physical mixed-precision implementations (which require specialized hardware accelerators). Empirically, our method can be implemented by common operands, bringing almost no memory and computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and that it can be generalized to more challenging tasks like PASCAL VOC object detection.
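The greedy idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the symmetric rounding grid, the least-squares coefficient, and the stopping rule on weight-space error (the abstract uses the error of the layer output instead) are all assumptions made for illustration.

```python
import math

def l2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def quantize_once(residual, bits):
    """Round the residual onto a symmetric low-bit integer grid (assumed scheme).

    Returns the integer vector q and the least-squares coefficient a
    that minimizes ||residual - a * q||.
    """
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(x) for x in residual)
    if peak == 0.0:
        return [0] * len(residual), 0.0
    scale = peak / qmax
    q = [max(-qmax, min(qmax, round(x / scale))) for x in residual]
    a = sum(x * v for x, v in zip(residual, q)) / sum(v * v for v in q)
    return q, a

def multipoint_quantize(w, bits=2, tol=1e-2, max_points=4):
    """Greedily approximate w by sum_i a_i * q_i with low-bit vectors q_i.

    Points are added until the residual norm drops below tol (hypothetical
    criterion) or max_points is reached, so harder vectors get more points.
    """
    residual = list(w)
    points = []
    for _ in range(max_points):
        if l2(residual) <= tol:
            break
        q, a = quantize_once(residual, bits)
        if a == 0.0:
            break
        points.append((a, q))
        residual = [x - a * v for x, v in zip(residual, q)]
    return points, residual

w = [0.9, -0.35, 0.1, 0.02]
points, residual = multipoint_quantize(w, bits=2, max_points=3)
print(len(points), l2(residual))
```

Because each coefficient is the least-squares fit to the current residual, every added point cannot increase the approximation error, which is what makes an adaptive per-vector point count workable.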
-
With the recent demand for deploying neural network models on mobile and edge devices, it is desirable to improve the model's generalizability on unseen testing data, as well as to enhance the model's robustness under fixed-point quantization for efficient deployment. Minimizing the training loss, however, provides few guarantees on the generalization and quantization performance. In this work, we fulfill the need to improve generalization and quantization performance simultaneously by theoretically unifying them under the framework of improving the model's robustness against bounded weight perturbation and minimizing the eigenvalues of the Hessian matrix with respect to model weights. We therefore propose HERO, a Hessian-enhanced robust optimization method, to minimize the Hessian eigenvalues through a gradient-based training process, simultaneously improving the generalization and quantization performance. HERO enables up to a 3.8% gain on test accuracy, up to 30% higher accuracy under 80% training label perturbation, and the best post-training quantization accuracy across a wide range of precision, including a >10% accuracy improvement over SGD-trained models for common model architectures on various datasets.
-
Increasing ocean temperatures have widespread consequences for coral reefs, one of which is coral bleaching. We analyzed a global network of associations between coral species and Symbiodiniaceae for resistance to temperature stress and robustness to perturbations. Null networks were created by changing either the physiological parameters of the nodes or the structures of the networks. We developed a bleaching model in which each link (association) is given a weight based on temperature thresholds for specific host–symbiont pairs, and links are removed as temperature increases. Resistance to temperature stress was determined from the response of the networks to the bleaching model. Ecological robustness, defined by how much perturbation is needed to decrease the number of nodes by 50%, was determined for multiple removal models that considered traits of the hosts, symbionts, and their associations. Network resistance to bleaching and robustness to perturbations differed from the null networks and varied across spatial scales, supporting that thermal tolerances, local association patterns, and environment play an important role in network persistence. Networks were more robust to attacks on associations than to attacks on species. Although the global network was fairly robust to random link removals, robustness decreased by about 20% when links were removed according to the bleaching model. Specific environmental attacks, in the form of increasing temperatures, destabilize the global network of coral species and Symbiodiniaceae. On a global scale, the network was more robust to removals of links with susceptible Symbiodiniaceae than it was to removals of links with susceptible hosts. Thus, the symbionts convey more stability to the symbiosis than the hosts when the system is under an environmental attack. However, our results also provide evidence that the environment of the networks affects robustness to link perturbations.
Our work shows that ecological resistance and robustness can be assessed through network analysis that considers specific biological traits and functional weaknesses. The global network of associations between corals and Symbiodiniaceae and its distribution of thermal tolerances are non‐random, and the evolution of this architecture has led to higher sensitivity to environmental perturbations.
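The robustness definition in this abstract (how much perturbation is needed to reduce the number of nodes by 50%) can be made concrete with a small sketch. The toy network, the threshold values, and the function name below are hypothetical, and the sketch assumes a node is lost once all of its links are removed; the study's actual data and removal models are not reproduced here.

```python
def robustness(links, removal_order):
    """Fraction of links removed before at most half of the nodes still have a link.

    links: list of (host, symbiont) pairs.
    removal_order: indices into links, in the order the links are removed.
    """
    degree = {}
    for host, symbiont in links:
        for node in (("host", host), ("symbiont", symbiont)):
            degree[node] = degree.get(node, 0) + 1
    total_nodes = len(degree)
    alive = total_nodes
    for removed, idx in enumerate(removal_order, start=1):
        host, symbiont = links[idx]
        for node in (("host", host), ("symbiont", symbiont)):
            degree[node] -= 1
            if degree[node] == 0:
                alive -= 1  # assumed: a node with no remaining links is lost
        if alive <= total_nodes / 2:
            return removed / len(links)
    return 1.0

# Hypothetical host-symbiont links with made-up thermal thresholds (deg C).
links = [("A", "s1"), ("A", "s2"), ("B", "s1"), ("C", "s3")]
thresholds = [30.0, 31.0, 29.0, 28.0]

# Bleaching-style removal: links with the lowest threshold are removed first.
bleaching_order = sorted(range(len(links)), key=lambda i: thresholds[i])
print(robustness(links, bleaching_order))
```

Comparing the value for a threshold-ordered removal against the value for other orderings of the same links is the kind of comparison the abstract describes between bleaching-driven and random link removal.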
-
Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and are widely used in many computer vision tasks. However, their enormous model size and high computing complexity prohibit wide deployment on resource-limited embedded systems, such as FPGA and mGPU. As the two most widely adopted model compression techniques, weight pruning and quantization compress a DNN model by introducing weight sparsity (i.e., forcing some weights to zero) and by quantizing weights into limited bit-width values, respectively. Although there are works attempting to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking FPGA as the test computing platform and Processing Elements (PE) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification that takes the architecture of the PE into account. In addition, we integrate it with an optimized weight ternarization approach which quantizes weights into ternary values ({-1,0,+1}), thus converting the dominant convolution operations in the DNN from multiplication-and-accumulation (MAC) to addition-only, as well as compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. Then, we investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiment shows that the fusion of our proposed techniques can achieve the best state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation of ResNet-18 on the ImageNet dataset.
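To illustrate why ternary weights turn multiply-accumulate into addition-only arithmetic, here is a minimal sketch. It is not the optimized ternarization or the WPC technique from the abstract: the threshold rule (0.7 times the mean absolute weight) is a commonly used heuristic assumed here, and the function names are hypothetical.

```python
def ternarize(weights, delta_factor=0.7):
    """Map weights to codes in {-1, 0, +1} plus one shared scale alpha.

    The threshold delta = delta_factor * mean(|w|) is an assumed heuristic,
    not the rule used in the paper.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_factor * mean_abs
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha

def ternary_dot(codes, alpha, x):
    """Dot product using only additions and subtractions inside the loop."""
    acc = 0.0
    for c, xi in zip(codes, x):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
    return alpha * acc  # a single multiplication by the shared scale

weights = [0.8, -0.05, -0.6, 0.1]
codes, alpha = ternarize(weights)
print(codes, ternary_dot(codes, alpha, [1.0, 2.0, 3.0, 4.0]))

# Storage ratio from the abstract: 32-bit floats replaced by 2-bit codes.
print(32 // 2)
```

The zero codes skip work entirely, which is where ternarization and pruning overlap; the 16× figure in the abstract follows directly from replacing 32-bit values with 2-bit codes.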
-
The high computation and memory storage demands of large deep neural network (DNN) models pose intensive challenges to the conventional Von-Neumann architecture, incurring substantial data movements in the memory hierarchy. The memristor crossbar array has emerged as a promising solution to mitigate these challenges and enable low-power acceleration of DNNs. Memristor-based weight pruning and weight quantization have been separately investigated and proven effective in reducing area and power consumption compared to the original DNN model. However, there has been no systematic investigation of memristor-based neuromorphic computing (NC) systems considering both weight pruning and weight quantization. In this paper, we propose a unified and systematic memristor-based framework considering both structured weight pruning and weight quantization by incorporating the alternating direction method of multipliers (ADMM) into DNN training. We consider hardware constraints such as crossbar block pruning, conductance range, and mismatch between weight values and real devices, to achieve high accuracy, low power, and a small area footprint. Our framework consists of three main steps, i.e., memristor-based ADMM regularized optimization, masked mapping, and retraining. Experimental results show that our proposed framework achieves a 29.81× (20.88×) weight compression ratio, with 98.38% (96.96%) and 98.29% (97.47%) power and area reduction on the VGG-16 (ResNet-18) network, with only 0.5% (0.76%) accuracy loss compared to the original DNN models. We share our models at anonymous link http://bit.ly/2Jp5LHJ .