Title: Network Compression via Cooperative Architecture Search and Distillation
Neural Architecture Search (NAS) and its variants have recently become competitive in many computer vision tasks. In this paper, we develop a Cooperative Architecture Search and Distillation (CASD) method for network compression. Compared with prior art, our method achieves better performance in ResNet-164 pruning for image classification on CIFAR-10 and CIFAR-100, and it shows promise for extension to other tasks.
Award ID(s):
1854434 1952644
NSF-PAR ID:
10335367
Journal Name:
Proceedings IEEE International Conference on Artificial Intelligence for Industries
ISSN:
2770-4718
Page Range / eLocation ID:
42-43
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Most existing neural architecture search (NAS) benchmarks and algorithms prioritize well-studied tasks, e.g., image classification on CIFAR or ImageNet. This makes the performance of NAS approaches in more diverse areas poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite to evaluate methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a diverse array of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern CNN-based search methods while possibly being far afield from its original development domain. To speed up and reduce the cost of NAS research, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space. Experimentally, we show the need for more robust NAS evaluation of the kind NAS-Bench-360 enables by showing that several modern NAS procedures perform inconsistently across the ten tasks, with many catastrophically poor results. We also demonstrate how NAS-Bench-360 and its associated precomputed results will enable future scientific discoveries by testing whether several recent hypotheses promoted in the NAS literature hold on diverse tasks. NAS-Bench-360 is hosted at https://nb360.ml.cmu.edu.
  2. Learning from one's mistakes is an effective human learning technique where the learners focus more on the topics where mistakes were made, so as to deepen their understanding. In this paper, we investigate whether this human learning strategy can be applied in machine learning. We propose a novel machine learning method called Learning From Mistakes (LFM), wherein the learner improves its ability to learn by focusing more on the mistakes during revision. We formulate LFM as a three-stage optimization problem: 1) learner learns; 2) learner re-learns, focusing on the mistakes; and 3) learner validates its learning. We develop an efficient algorithm to solve the LFM problem. We apply the LFM framework to neural architecture search on CIFAR-10, CIFAR-100, and ImageNet. Experimental results strongly demonstrate the effectiveness of our model.
  3. Many researchers have shown that transformers perform as well as convolutional neural networks in many computer vision tasks. Meanwhile, the large computational costs of their attention modules hinder further studies and applications on edge devices. Some pruning methods have been developed to construct efficient vision transformers, but most of them have considered image classification tasks only. Inspired by these results, we propose SiDT, a method for pruning vision transformer backbones on more complicated vision tasks like object detection, based on the search of transformer dimensions. Experiments on CIFAR-100 and COCO datasets show that the backbones with 20% or 40% dimensions/parameters pruned can have similar or even better performance than the unpruned models. Moreover, we have also provided the complexity analysis and comparisons with the previous pruning methods.
  4. The concept of stimulus feature tuning is fundamental to neuroscience. Cortical neurons acquire their feature-tuning properties by learning from experience and using proxy signs of tentative features’ potential usefulness that come from the spatial and/or temporal context in which these features occur. According to this idea, local but ultimately behaviorally useful features should be the ones that are predictably related to other such features either preceding them in time or taking place side-by-side with them. Inspired by this idea, in this paper, deep neural networks are combined with Canonical Correlation Analysis (CCA) for feature extraction and the power of the features is demonstrated using unsupervised cross-modal prediction tasks. CCA is a multi-view feature extraction method that finds correlated features across multiple datasets (usually referred to as views or modalities). CCA finds linear transformations of each view such that the extracted principal components, or features, have a maximal mutual correlation. CCA is a linear method, and the features are computed by a weighted sum of each view's variables. Once the weights are learned, CCA can be applied to new examples and used for cross-modal prediction by inferring the target-view features of an example from its given variables in a source (query) view. To test the proposed method, it was applied to the unstructured CIFAR-100 dataset of 60,000 images categorized into 100 classes, which are further grouped into 20 superclasses and used to demonstrate the mining of image-tag correlations. CCA was performed on the outputs of three pre-trained CNNs: AlexNet, ResNet, and VGG. Taking advantage of the mutually correlated features extracted with CCA, a search for nearest neighbors was performed in the canonical subspace common to both the query and the target views to retrieve the most matching examples in the target view, which successfully predicted the superclass membership of the tested views without any supervised training.
  5. Tuning hyperparameters is a crucial but arduous part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investigate the problem of federated hyperparameter tuning. We first identify key challenges and show how standard approaches may be adapted to form baselines for the federated setting. Then, by making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx, to accelerate federated hyperparameter tuning that is applicable to widely used federated optimization methods such as FedAvg and recent variants. Theoretically, we show that a FedEx variant correctly tunes the on-device learning rate in the setting of online convex optimization across devices. Empirically, we show that FedEx can outperform natural baselines for federated hyperparameter tuning by several percentage points on the Shakespeare, FEMNIST, and CIFAR-10 benchmarks, obtaining higher accuracy using the same training budget.
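
Item 4 above describes linear CCA followed by a nearest-neighbor search in the shared canonical subspace. The following is only a minimal sketch of that general idea, not the authors' pipeline: it assumes NumPy, uses small synthetic paired views in place of CNN features and image tags, and the names fit_cca, retrieve, and the reg ridge term are hypothetical choices made for illustration.

```python
import numpy as np


def fit_cca(X, Y, n_components, reg=1e-6):
    """Fit linear CCA on paired views X (n, p) and Y (n, q).

    Hypothetical helper: whitens each view with a Cholesky factor of its
    (ridge-regularized) covariance, then takes an SVD of the whitened
    cross-covariance. Returns means, projection weights, and correlations.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Sxx)  # Sxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Syy)  # Syy = Ly @ Ly.T
    # Whitened cross-covariance: T = Lx^{-1} Sxy Ly^{-T}
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map singular vectors back to weights on the original variables.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return mx, my, Wx, Wy, s[:n_components]


def retrieve(query_x, gallery_y, model, k=1):
    """Return indices of the k nearest target-view items for each query."""
    mx, my, Wx, Wy, _ = model
    zq = (query_x - mx) @ Wx    # queries in the canonical subspace
    zg = (gallery_y - my) @ Wy  # target-view gallery in the same subspace
    d2 = ((zq[:, None, :] - zg[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]


# Synthetic demo: two noisy views generated from a shared 2-D latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(400, 2))
A = np.array([[1.0, 0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0, 0.0]])
B = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
view_x = latent @ A + 0.1 * rng.normal(size=(400, 5))
view_y = latent @ B + 0.1 * rng.normal(size=(400, 4))

model = fit_cca(view_x, view_y, n_components=2)
print("canonical correlations:", model[4])
print("nearest target-view indices:", retrieve(view_x[:3], view_y, model, k=3))
```

In a setting like the one item 4 describes, the labels attached to the retrieved target-view neighbors could then be used to guess the superclass of a query; that voting step is left out of this sketch.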