skip to main content


Search for: All records

Award ID contains: 1717657

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. In the past decade, Deep Neural Networks (DNNs), e.g., Convolutional Neural Networks, achieved human-level performance in vision tasks such as object classification and detection. However, DNNs are known to be computationally expensive and thus hard to be deployed in real-time and edge applications. Many previous works have focused on DNN model compression to obtain smaller parameter sizes and consequently, less computational cost. Such methods, however, often introduce noticeable accuracy degradation. In this work, we optimize a state-of-the-art DNN-based video detection framework—Deep Feature Flow (DFF) from the cloud end using three proposed ideas. First, we propose Asynchronous DFF (ADFF) to asynchronously execute the neural networks. Second, we propose a Video-based Dynamic Scheduling (VDS) method that decides the detection frequency based on the magnitude of movement between video frames. Last, we propose Spatial Sparsity Inference, which only performs the inference on part of the video frame and thus reduces the computation cost. According to our experimental results, ADFF can reduce the bottleneck latency from 89 to 19 ms. VDS increases the detection accuracy by 0.6% mAP without increasing computation cost. And SSI further saves 0.2 ms with a 0.6% mAP degradation of detection accuracy. 
    more » « less
  2. null (Ed.)
    The invention of Transformer model structure boosts the performance of Neural Machine Translation (NMT) tasks to an unprecedented level. Many previous works have been done to make the Transformer model more execution-friendly on resource-constrained platforms. These researches can be categorized into three key fields: Model Pruning, Transfer Learning, and Efficient Transformer Variants. The family of model pruning methods are popular for their simplicity in practice and promising compression rate and have achieved great success in the field of convolution neural networks (CNNs) for many vision tasks. Nonetheless, previous Transformer pruning works did not perform a thorough model analysis and evaluation on each Transformer component on off-the-shelf mobile devices. In this work, we analyze and prune transformer models at the line-wise granularity and also implement our pruning method on real mobile platforms. We explore the properties of all Transformer components as well as their sparsity features, which are leveraged to guide Transformer model pruning. We name our whole Transformer analysis and pruning pipeline as TPrune. In TPrune, we first propose Block-wise Structured Sparsity Learning (BSSL) to analyze Transformer model property. Then, based on the characters derived from BSSL, we apply Structured Hoyer Square (SHS) to derive the final pruned models. Comparing with the state-of-the-art Transformer pruning methods, TPrune is able to achieve a higher model compression rate with less performance degradation. Experimental results show that our pruned models achieve 1.16×–1.92× speedup on mobile devices with 0%–8% BLEU score degradation compared with the original Transformer model. 
    more » « less
  3. null (Ed.)
  4. As the model size of deep neural networks (DNNs) grows for better performance, the increase in computational cost associated with training and testing makes it extremely difficulty to deploy DNNs on end/edge devices with limited resources while also satisfying the response time requirement. To address this challenge, model compression which compresses model size and thus reduces computation cost is widely adopted in deep learning society. However, the practical impacts of hardware design are often ignored in these algorithm-level solutions, such as the increase of the random accesses to memory hierarchy and the constraints of memory capacity. On the other side, limited understanding about the computational needs at algorithm level may lead to unrealistic assumptions during the hardware designs. In this work, we will discuss this mismatch and provide how our approach addresses it through an interactive design practice across both software and hardware levels. 
    more » « less
  5. In recent years, machine learning research has largely shifted focus from the cloud to the edge. While the resulting algorithm- and hardware-level optimizations have enabled local execution for the majority of deep neural networks (DNNs) on edge devices, the sheer magnitude of DNNs associated with real-time video detection workloads has forced them to remain relegated to remote execution in the cloud. This problematic when combined with the strict latency requirements that are coupled with these workloads, and imposes a unique set of challenges not directly addressed in prior works. In this work, we design MobiEye, a cloud-based video detection system optimized for deployment in real-time mobile applications. MobiEye is able to achieve up to a 32% reduction in latency when compared to a conventional implementation of video detection system with only a marginal reduction in accuracy. 
    more » « less
  6. Deep Neural Networks (DNNs) have shown phenomenal success in a wide range of real-world applications. However, a concerning weakness of DNNs is that they are vulnerable to adversarial attacks. Although there exist methods to detect adversarial attacks, they often suffer constraints on specific attack types and provide limited information to downstream systems. We specifically note that existing adversarial detectors are often binary classifiers, which differentiate clean or adversarial examples. However, detection of adversarial examples is much more complicated than such a scenario. Our key insight is that the confidence probability of detecting an input sample as an adversarial example will be more useful for the system to properly take action to resist potential attacks. In this work, we propose an innovative method for fast confidence detection of adversarial attacks based on integrity of sensor pattern noise embedded in input examples. Experimental results show that our proposed method is capable of providing a confidence distribution model of most of popular adversarial attacks. Furthermore, our presented method can provide early attack warning with even the attack types based on different properties of the confidence distribution models. Since fast confidence detection is a computationally heavy task, we propose an FPGA-Based hardware architecture based on a series of optimization techniques, such as incremental multi-level quantization and etc. We realize our proposed method on an FPGA platform and achieve a high efficiency of 29.740 IPS/W with a power consumption of only 0.7626W. 
    more » « less
  7. In recent years, neuromorphic computing systems (NCS) have gained popularity in accelerating neural network computation because of their high energy efficiency. The known vulnerability of neural networks to adversarial attack, however, raises a severe security concern of NCS. In addition, there are certain application scenarios in which users have limited access to the NCS. In such scenarios, defense technologies that require changing the training methods of the NCS, e.g., adversarial training become impracticable. In this work, we propose AdverQuil – an efficient adversarial detection and alleviation technique for black-box NCS. AdverQuil can identify the adversarial strength of input examples and select the best strategy for NCS to respond to the attack, without changing structure/parameter of the original neural network or its training method. Experimental results show that on MNIST and CIFAR-10 datasets, AdverQuil achieves a high efficiency of 79.5 - 167K image/sec/watt. AdverQuil introduces less than 25% of hardware overhead, and can be combined with various adversarial alleviation techniques to provide a flexible trade-off between hardware cost, energy efficiency and classification accuracy. 
    more » « less
  8. Deep Neural Networks (DNNs) are pervasively applied in many artificial intelligence (AI) applications. The high performance of DNNs comes at the cost of larger size and higher compute complexity. Recent studies show that DNNs have much redundancy, such as the zero-value parameters and excessive numerical precision. To reduce computing complexity, many redundancy reduction techniques have been proposed, including pruning and data quantization. In this paper, we demonstrate our cooptimization of the DNN algorithm and hardware which exploits the model redundancy to accelerate DNNs. 
    more » « less
  9. Neural networks hold a critical domain in machine learning algorithms because of their self-adaptiveness and state-of-the-art performance. Before the testing (inference) phases in practical use, sophisticated training (learning) phases are required, calling for efficient training methods with higher accuracy and shorter converging time. Many existing studies focus on the training optimization on high-performance servers or computing clusters, e.g. GPU clusters. However, training neural networks on resource-constrained devices, e.g. mobile platforms, is an important research topic barely touched. In this paper, we implement AdaLearner–an adaptive distributed mobile learning system for neural networks that trains a single network with heterogenous mobile resources under the same local network in parallel. To exploit the potential of our system, we adapt neural networks training phase to mobile device-wise resources and fiercely decrease the transmission overhead for better system scalability. On three representative neural network structures trained from two image classification datasets, AdaLearner boosts the training phase significantly. For example, on LeNet, 1.75-3.37⇥ speedup is achieved when increasing the worker nodes from 2 to 8, thanks to the achieved high execution parallelism and excellent scalability. 
    more » « less
  10. Deep Neural Networks (DNNs) are pervasively used in a significant number of applications and platforms. To enhance the execution efficiency of large-scale DNNs, previous attempts focus mainly on client-server paradigms, relying on powerful external infrastructure, or model compression, with complicated pre-processing phases. Though effective, these methods overlook the optimization of DNNs on distributed mobile devices. In this work, we design and implement MeDNN, a local distributed mobile computing system with enhanced partitioning and deployment tailored for large-scale DNNs. In MeDNN, we first propose Greedy Two Dimensional Partition (GTDP), which can adaptively partition DNN models onto several mobile devices w.r.t. individual resource constraints. We also propose Structured Model Compact Deployment (SMCD), a mobile-friendly compression scheme which utilizes a structured sparsity pruning technique to further accelerate DNN execution. Experimental results show that, GTDP can accelerate the original DNN execution time by 1.86 – 2.44⇥ with 2 – 4 worker nodes. By utilizing SMCD, 26.5% of additional computing time and 14.2% of extra communication time are saved, on average, with negligible effect on the model accuracy. 
    more » « less