Unstructured neural network pruning is an effective technique that can significantly reduce theoretical model size, computation demand and energy consumption of large neural networks without compromising accuracy. However, a number of fundamental questions about pruning are not answered yet. For example, do the pruned neural networks contain the same representations as the original network? Is pruning a compression or evolution process? Does pruning only work on trained neural networks? What is the role and value of the uncovered sparsity structure? In this paper, we strive to answer these questions by analyzing three unstructured pruning methods (magnitude based pruning, post-pruning re-initialization, and random sparse initialization). We conduct extensive experiments using the Singular Vector Canonical Correlation Analysis (SVCCA) tool to study and contrast layer representations of pruned and original ResNet, VGG, and ConvNet models. We have several interesting observations: 1) Pruned neural network models evolve to substantially different representations while still maintaining similar accuracy. 2) Initialized sparse models can achieve reasonably good accuracy compared to well engineered pruning methods. 3) Sparsity structures discovered by pruning models are not inherently important or useful.
more »
« less
Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough
Despite the great success of deep learning, recent works show that large deep neural networks are often highly redundant and can be significantly reduced in size. However, the theoretical question of how much we can prune a neural network given a specified tolerance of accuracy drop is still open. This paper provides one answer to this question by proposing a greedy optimization based pruning method. The proposed method has the guarantee that the discrepancy between the pruned network and the original network decays with exponentially fast rate w.r.t. the size of the pruned network, under weak assumptions that apply for most practical settings. Empirically, our method improves prior arts on pruning various network architectures including ResNet, MobilenetV2/V3 on ImageNet.
more »
« less
- PAR ID:
- 10276253
- Date Published:
- Journal Name:
- Conference on Neural Information Processing Systems (NeurIPS)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
A binary neural network (BNN) is a compact form of neural network. Both the weights and activations in BNNs can be binary values, which leads to a significant reduction in both parameter size and computational complexity compared to their full-precision counterparts. Such reductions can directly translate into reduced memory footprint and computation cost in hardware, making BNNs highly suitable for a wide range of hardware accelerators. However, it is unclear whether and how a BNN can be further pruned for ultimate compactness. As both 0s and 1s are non-trivial in BNNs, it is not proper to adopt any existing pruning method of full- precision networks that interprets 0s as trivial. In this paper, we present a pruning method tailored to BNNs and illustrate that BNNs can be further pruned by using weight flipping frequency as an indicator of sensitivity to accuracy. The experiments performed on the binary versions of a 9- layer Network-in-Network (NIN) and the AlexNet with the CIFAR-10 dataset show that the proposed BNN-pruning method can achieve 20-40% reduction in binary operations with 0.5-1.0% accuracy drop, which leads to a 15-40% run- time speedup on a TitanX GPU.more » « less
-
In this article, we present a low-energy inference method for convolutional neural networks in image classification applications. The lower energy consumption is achieved by using a highly pruned (lower-energy) network if the resulting network can provide a correct output. More specifically, the proposed inference method makes use of two pruned neural networks (NNs), namely mildly and aggressively pruned networks, which are both designed offline. In the system, a third NN makes use of the input data for the online selection of the appropriate pruned network. The third network, for its feature extraction, employs the same convolutional layers as those of the aggressively pruned NN, thereby reducing the overhead of the online management. There is some accuracy loss induced by the proposed method where, for a given level of accuracy, the energy gain of the proposed method is considerably larger than the case of employing any one pruning level. The proposed method is independent of both the pruning method and the network architecture. The efficacy of the proposed inference method is assessed on Eyeriss hardware accelerator platform for some of the state-of-the-art NN architectures. Our studies show that this method may provide, on average, 70% energy reduction compared to the original NN at the cost of about 3% accuracy loss on the CIFAR-10 dataset.more » « less
-
Motivated by both theory and practice, we study how random pruning of the weights affects a neural network's neural tangent kernel (NTK). In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version. The equivalence is established under two cases. The first main result studies the infinite-width asymptotic. It is shown that given a pruning probability, for fully-connected neural networks with the weights randomly pruned at the initialization, as the width of each layer grows to infinity sequentially, the NTK of the pruned neural network converges to the limiting NTK of the original network with some extra scaling. If the network weights are rescaled appropriately after pruning, this extra scaling can be removed. The second main result considers the finite-width case. It is shown that to ensure the NTK's closeness to the limit, the dependence of width on the sparsity parameter is asymptotically linear, as the NTK's gap to its limit goes down to zero. Moreover, if the pruning probability is set to zero (i.e., no pruning), the bound on the required width matches the bound for fully-connected neural networks in previous works up to logarithmic factors. The proof of this result requires developing a novel analysis of a network structure which we called mask-induced pseudo-networks. Experiments are provided to evaluate our results.more » « less
-
Side-channel attacks leverage correlations between power consumption and intermediate encryption results to infer encryption keys. Recent studies show that deep learning offers promising results in the context of side-channel attacks. However, neural networks utilized in deep-learning side-channel attacks are complex with a substantial number of parameters and consume significant memory. As a result, it is challenging to perform deep-learning side-channel attacks on resource-constrained devices. In this paper, we propose a framework, TinyPower, which leverages pruning to reduce the number of neural network parameters for side-channel attacks. Pruned neural networks obtained from our framework can successfully run side-channel attacks with significantly fewer parameters and less memory. Specifically, we focus on structured pruning over filters of Convolutional Neural Networks (CNNs). We demonstrate the effectiveness of structured pruning over power and EM traces of AES-128 running on microcontrollers (AVR XMEGA and ARM STM32) and FPGAs (Xilinx Artix-7). Our experimental results show that we can achieve a reduction rate of 98.8% (e.g., reducing the number of parameters from 53.1 million to 0.59 million) on a CNN and still recover keys on XMEGA. For STM32 and Artix-7, we achieve a reduction rate of 92.9% and 87.3% on a CNN respectively. We also demonstrate that our pruned CNNs can effectively perform the attack phase of side-channel attacks on a Raspberry Pi 4 with less than 2.5 millisecond inference time per trace and less than 41 MB memory usage per CNN.more » « less
An official website of the United States government

