In recent years, Convolutional Neural Networks (CNNs) have shown superior capability in visual learning tasks. While accuracy-wise CNNs provide unprecedented performance, they are also known to be computationally intensive and energy demanding for modern computer systems. In this paper, we propose Virtual Pooling (ViP), a model-level approach to improve speed and energy consumption of CNN-based image classification and object detection tasks, with a provable error bound. We show the efficacy of ViP through experiments on four CNN models, three representative datasets, both desktop and mobile platforms, and two visual learning tasks, i.e., image classification and object detection. For example, ViP delivers 2.1x speedup with less than 1.5% accuracy degradation in ImageNet classification on VGG16, and 1.8x speedup with 0.025 mAP degradation in PASCAL VOC object detection with Faster-RCNN. ViP also reduces mobile GPU and CPU energy consumption by up to 55% and 70%, respectively. As a complementary method to existing acceleration approaches, ViP achieves 1.9x speedup on ThiNet leading to a combined speedup of 5.23x on VGG16. Furthermore, ViP provides a knob for machine learning practitioners to generate a set of CNN models with varying trade-offs between system speed/energy consumption and accuracy to better accommodate the requirements of their tasks. Code is available at https://github.com/cmu-enyac/VirtualPooling. 
                        more » 
                        « less   
                    
                            
                            Task-Driven convolutional recurrent models of the visual system.
                        
                    
    
            Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas. Here we explored the role of recurrence in improving classification performance. We found that standard forms of recurrence (vanilla RNNs and LSTMs) do not perform well within deep CNNs on the ImageNet task. In contrast, novel cells that incorporated two structural features, bypassing and gating, were able to boost task accuracy substantially. We extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. Moreover, these task-optimized ConvRNNs matched the dynamics of neural activity in the primate visual system better than feedforward networks, suggesting a role for the brain's recurrent connections in performing difficult visual behaviors. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1703161
- PAR ID:
- 10082848
- Date Published:
- Journal Name:
- Advances in Neural Information Processing Systems 2018
- Page Range / eLocation ID:
- 5295-5306
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            3D Convolutional Neural Networks (3D-CNN) have been used for object recognition based on the voxelized shape of an object. However, interpreting the decision making process of these 3D-CNNs is still an infeasible task. In this paper, we present a unique 3D-CNN based Gradient-weighted Class Activation Mapping method (3D-GradCAM) for visual explanations of the distinct local geometric features of interest within an object. To enable efficient learning of 3D geometries, we augment the voxel data with surface normals of the object boundary. We then train a 3D-CNN with this augmented data and identify the local features critical for decision-making using 3D GradCAM. An application of this feature identification framework is to recognize difficult-to-manufacture drilled hole features in a complex CAD geometry. The framework can be extended to identify difficult-to-manufacture features at multiple spatial scales leading to a real-time design for manufacturability decision support system.more » « less
- 
            We present a recurrent network for 3D reconstruction of neurons that sequentially generates binary masks for every object in an image with spatio-temporal consistency. Our network models consistency in two parts: (i) local, which allows exploring non-occluding and temporally-adjacent object relationships with bi-directional recurrence. (ii) non-local, which allows exploring long-range object relationships in the temporal domain with skip connections. Our proposed network is end-to-end trainable from an input image to a sequence of object masks, and, compared to methods relying on object boundaries, its output does not require post-processing. We evaluate our method on three benchmarks for neuron segmentation and achieved state-of-the-art performance on the SNEMI3D challenge.more » « less
- 
            null (Ed.)In biological brains, recurrent connections play a crucial role in cortical computation, modulation of network dynamics, and communication. However, in recurrent spiking neural networks (SNNs), recurrence is mostly constructed by random connections. How excitatory and inhibitory recurrent connections affect network responses and what kinds of connectivity benefit learning performance is still obscure. In this work, we propose a novel recurrent structure called the Laterally-Inhibited Self-Recurrent Unit (LISR), which consists of one excitatory neuron with a self-recurrent connection wired together with an inhibitory neuron through excitatory and inhibitory synapses. The self-recurrent connection of the excitatory neuron mitigates the information loss caused by the firing-and-resetting mechanism and maintains the long-term neuronal memory. The lateral inhibition from the inhibitory neuron to the corresponding excitatory neuron, on the one hand, adjusts the firing activity of the latter. On the other hand, it plays as a forget gate to clear the memory of the excitatory neuron. Based on speech and image datasets commonly used in neuromorphic computing, RSNNs based on the proposed LISR improve performance significantly by up to 9.26% over feedforward SNNs trained by a state-of-the-art backpropagation method with similar computational costs.more » « less
- 
            Abstract Recent analyses have found waves of neural activity traveling across entire visual cortical areas in awake animals. These traveling waves modulate the excitability of local networks and perceptual sensitivity. The general computational role of these spatiotemporal patterns in the visual system, however, remains unclear. Here, we hypothesize that traveling waves endow the visual system with the capacity to predict complex and naturalistic inputs. We present a network model whose connections can be rapidly and efficiently trained to predict individual natural movies. After training, a few input frames from a movie trigger complex wave patterns that drive accurate predictions many frames into the future solely from the network’s connections. When the recurrent connections that drive waves are randomly shuffled, both traveling waves and the ability to predict are eliminated. These results suggest traveling waves may play an essential computational role in the visual system by embedding continuous spatiotemporal structures over spatial maps.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    