Title: A statistical approach for neural network pruning with application to internet of things
Abstract

Pruning shows great potential for compressing and accelerating deep neural networks by eliminating redundant parameters. As more terminal chips for internet of things (IoT) devices integrate AI accelerators, structured pruning is gaining popularity in edge computing research. Unlike filter pruning and group-wise pruning, stripe-wise pruning (SWP) prunes at the level of the stripes in each filter. The existing SWP method introduces a filter skeleton (FS) for each stripe, sets an absolute threshold on the FS values, and removes the stripes whose FS values do not meet the threshold. Starting from an investigation of stripe-wise convolution, we use the statistical properties of the weights on each stripe to learn the relative importance of the stripes in a filter and remove those with low importance. Relative to existing results, our pruned VGG-16 achieves a fourfold reduction in parameters with only a 0.4% decrease in accuracy. Results from comprehensive experiments on IoT devices are also presented.
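
The stripe-level idea can be illustrated with a minimal NumPy sketch. The abstract does not say which statistic is used, so the score below (the L1 norm of a stripe divided by the mean L1 norm of the stripes in the same filter) is only an assumed stand-in, and the names `stripe_importance`, `prune_stripes`, and `ratio` are hypothetical. The point it shows is that each stripe is judged relative to the other stripes in its own filter instead of against one absolute threshold.

```python
import numpy as np

def stripe_importance(weights):
    """Score every stripe of a convolutional layer.

    weights has shape (N, C, K, K): filters, input channels, kernel height, kernel width.
    A stripe is the length-C vector weights[n, :, i, j].
    ASSUMPTION: the score is the stripe's L1 norm divided by the mean L1 norm
    of the stripes in the same filter; the paper's statistic may differ.
    """
    norms = np.abs(weights).sum(axis=1)                        # shape (N, K, K)
    mean_per_filter = norms.mean(axis=(1, 2), keepdims=True)   # shape (N, 1, 1)
    return norms / (mean_per_filter + 1e-12)                   # guard against all-zero filters

def prune_stripes(weights, ratio=0.5):
    """Zero out stripes whose relative score is below `ratio` (hypothetical parameter).

    Returns the pruned weights and a boolean keep-mask of shape (N, K, K).
    """
    keep = stripe_importance(weights) >= ratio
    pruned = weights * keep[:, None, :, :]   # broadcast the mask over the channel axis
    return pruned, keep

# Two 3x3 filters with 4 input channels; one stripe of filter 0 is made very weak.
w = np.ones((2, 4, 3, 3))
w[0, :, 0, 0] = 0.01
pruned, keep = prune_stripes(w, ratio=0.5)
print(keep[0])   # only position (0, 0) of filter 0 is dropped
```

Because the mask is computed per filter, a filter whose weights are uniformly small keeps its stripes, whereas an absolute threshold could remove all of them.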

 
NSF-PAR ID:
10416329
Author(s) / Creator(s):
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
EURASIP Journal on Wireless Communications and Networking
Volume:
2023
Issue:
1
ISSN:
1687-1499
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and are widely used in many computer vision tasks. However, their enormous model size and high computing complexity prohibit wide deployment on resource-limited embedded systems, such as FPGAs and mGPUs. The two most widely adopted model compression techniques, weight pruning and quantization, compress a DNN model by introducing weight sparsity (i.e., forcing some weights to zero) and by quantizing weights into limited bit-width values, respectively. Although some works attempt to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking an FPGA as the test computing platform and Processing Elements (PEs) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification that takes the PE architecture into account. In addition, we integrate it with an optimized weight ternarization approach that quantizes weights into ternary values ({-1,0,+1}), thus converting the dominant convolution operations in the DNN from multiplication-and-accumulation (MAC) to addition-only, as well as compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. Then, we investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiments show that the fusion of our proposed techniques achieves a state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation for ResNet-18 on the ImageNet dataset.
  2. Fu, Feng (Ed.)
    When two streams of pedestrians cross at an angle, striped patterns spontaneously emerge as a result of local pedestrian interactions. This clear case of self-organized pattern formation remains to be elucidated. In counterflows, with a crossing angle of 180°, alternating lanes of traffic are commonly observed moving in opposite directions, whereas in crossing flows at an angle of 90°, diagonal stripes have been reported. Naka (1977) hypothesized that stripe orientation is perpendicular to the bisector of the crossing angle. However, studies of crossing flows at acute and obtuse angles remain underdeveloped. We tested the bisector hypothesis in experiments on small groups (18-19 participants each) crossing at seven angles (30° intervals), and analyzed the geometric properties of stripes. We present two novel computational methods for analyzing striped patterns in pedestrian data: (i) an edge-cutting algorithm, which detects the dynamic formation of stripes and allows us to measure local properties of individual stripes; and (ii) a pattern-matching technique, based on the Gabor function, which allows us to estimate global properties (orientation and wavelength) of the striped pattern at a time T. We find an invariant property: stripes in the two groups are parallel and perpendicular to the bisector at all crossing angles. In contrast, other properties depend on the crossing angle: stripe spacing (wavelength), stripe size (number of pedestrians per stripe), and crossing time all decrease as the crossing angle increases from 30° to 180°, whereas the number of stripes increases with crossing angle. We also observe that the width of individual stripes is dynamically squeezed as the two groups cross each other. The findings thus support the bisector hypothesis at a wide range of crossing angles, although the theoretical reasons for this invariant remain unclear. The present results provide empirical constraints on theoretical studies and computational models of crossing flows.
  3. Directed self-assembly of block copolymers (BCPs) enables nanofabrication at sub-10 nm dimensions, beyond the resolution of conventional lithography. However, directing the position, orientation, and long-range lateral order of BCP domains to produce technologically-useful patterns is a challenge. Here, we present a promising approach to direct assembly using spatial boundaries between planar, low-resolution regions on a surface with different composition. Pairs of boundaries are formed at the edges of isolated stripes on a background substrate. Vertical lamellae nucleate at and are pinned by chemical contrast at each stripe/substrate boundary, align parallel to boundaries, selectively propagate from boundaries into stripe interiors (whereas horizontal lamellae form on the background), and register to wide stripes to multiply the feature density. Ordered BCP line arrays with half-pitch of 6.4 nm are demonstrated on stripes >80 nm wide. Boundary-directed epitaxy provides an attractive path towards assembling, creating, and lithographically defining materials on sub-10 nm scales.

     
  4. The reliability, availability, and serviceability (RAS) of data in erasure-coded data centers are highly affected by the data repair induced by node failures. Compared to the recovery phase of data repair, which is widely studied and well optimized, the failure identification phase is less investigated. Moreover, in a traditional failure identification scheme, all chunks share the same identification time threshold, thus losing opportunities to further improve the RAS. To solve this problem, we propose RAFI, a novel risk-aware failure identification scheme. In RAFI, chunk failures in stripes experiencing different numbers of failed chunks are identified using different time thresholds. For chunks in a high-risk stripe (a stripe with many failed chunks), a shorter identification time is adopted, thus improving the overall data reliability and availability. For chunks in a low-risk stripe (one with only a few failed chunks), a longer identification time is adopted, thus reducing the repair network traffic. Therefore, all aspects of the RAS can be improved simultaneously. We use both simulations and a prototype implementation to evaluate RAFI. Results collected from extensive simulations demonstrate the effectiveness and efficiency of RAFI in improving the RAS. We implement a prototype on HDFS to verify the correctness and evaluate the computational cost of RAFI.
  5. There have been many recent attempts to extend the successes of convolutional neural networks (CNNs) from 2-dimensional (2D) image classification to 3-dimensional (3D) video recognition by exploring 3D CNNs. Considering the growth of the mobile and Internet of Things (IoT) markets, it is essential to investigate the deployment of 3D CNNs on edge devices. Previous works have implemented standard 3D CNNs (C3D) on hardware platforms; however, they have not exploited model compression to accelerate inference. This work proposes a hardware-aware pruning approach that can fully adapt to the loop tiling technique of FPGA design and is applied to a novel 3D network called R(2+1)D. Leveraging the powerful ADMM, the proposed pruning method simultaneously achieves high accuracy and significant acceleration of computation on FPGA. With layer-wise pruning rates up to 10× and negligible accuracy loss, the pruned model is implemented on a Xilinx ZCU102 FPGA board, where it achieves 2.6× speedup compared with the unpruned version, and 2.3× speedup and 2.3× power efficiency improvement compared with a state-of-the-art FPGA implementation of C3D.
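
Item 1 in the list above describes ternarizing weights to {-1,0,+1}, which turns multiply-accumulate into addition only and shrinks 32-bit weights to 2 bits. The sketch below illustrates those two claims in NumPy. The threshold rule (a fixed fraction of the mean absolute weight) is an assumption borrowed from common ternary-weight practice, not the optimized approach of that work, and the names `ternarize`, `ternary_dot`, and `delta_scale` are hypothetical.

```python
import numpy as np

def ternarize(weights, delta_scale=0.7):
    """Map real-valued weights to {-1, 0, +1}.

    ASSUMPTION: weights with magnitude at or below
    delta = delta_scale * mean(|w|) become 0; the rest keep only their sign.
    """
    delta = delta_scale * np.abs(weights).mean()
    return (np.sign(weights) * (np.abs(weights) > delta)).astype(np.int8)

def ternary_dot(x, t):
    """Dot product with ternary weights using additions and subtractions only."""
    return x[t == 1].sum() - x[t == -1].sum()

w = np.array([0.9, -0.8, 0.05, -0.02])
t = ternarize(w)                  # the two small weights are zeroed
x = np.array([1.0, 2.0, 3.0, 4.0])
y = ternary_dot(x, t)             # same value as np.dot(x, t), without multiplications

# Storage: 32-bit floats replaced by 2-bit codes.
compression = 32 / 2
```

The factor of 16 matches the "at least 16 times" figure quoted in item 1; any additional saving in that work would come from the structured pruning it is combined with.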