skip to main content

Title: Practical Accuracy Estimation for Efficient Deep Neural Network Testing
Deep neural network (DNN) has become increasingly popular and DNN testing is very critical to guarantee the correctness of DNN, i.e., the accuracy of DNN in this work. However, DNN testing suffers from a serious efficiency problem, i.e., it is costly to label each test input to know the DNN accuracy for the testing set, since labeling each test input involves multiple persons (even with domain-specific knowledge) in a manual way and the testing set is large-scale. To relieve this problem, we propose a novel and practical approach, called PACE (which is short for P ractical AC curacy E stimation), which selects a small set of test inputs that can precisely estimate the accuracy of the whole testing set. In this way, the labeling costs can be largely reduced by just labeling this small set of selected test inputs. Besides achieving a precise accuracy estimation, to make PACE more practical it is also required that it is interpretable, deterministic, and as efficient as possible. Therefore, PACE first incorporates clustering to interpretably divide test inputs with different testing capabilities (i.e., testing different functionalities of a DNN model) into different groups. Then, PACE utilizes the MMD-critic algorithm, a state-of-the-art example-based explanation algorithm, more » to select prototypes (i.e., the most representative test inputs) from each group, according to the group sizes, which can reduce the impact of noise due to clustering. Meanwhile, PACE also borrows the idea of adaptive random testing to select test inputs from the minority space (i.e., the test inputs that are not clustered into any group) to achieve great diversity under the required number of test inputs. The two parallel selection processes (i.e., selection from both groups and the minority space) compose the final small set of selected test inputs. We conducted an extensive study to evaluate the performance of PACE based on a comprehensive benchmark (i.e., 24 pairs of DNN models and testing sets) by considering different types of models (i.e., classification and regression models, high-accuracy and low-accuracy models, and CNN and RNN models) and different types of test inputs (i.e., original, mutated, and automatically generated test inputs). The results demonstrate that PACE is able to precisely estimate the accuracy of the whole testing set with only 1.181%∼2.302% deviations, on average, significantly outperforming the state-of-the-art approaches. « less
; ; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
ACM Transactions on Software Engineering and Methodology
Page Range or eLocation-ID:
1 to 35
Sponsoring Org:
National Science Foundation
More Like this
  1. The ever increasing size of deep neural network (DNN) models once implied that they were only limited to cloud data centers for runtime inference. Nonetheless, the recent plethora of DNN model compression techniques have successfully overcome this limit, turning into a reality that DNN-based inference can be run on numerous resource-constrained edge devices including mobile phones, drones, robots, medical devices, wearables, Internet of Things devices, among many others. Naturally, edge devices are highly heterogeneous in terms of hardware specification and usage scenarios. On the other hand, compressed DNN models are so diverse that they exhibit different tradeoffs in a multi-dimension space, and not a single model can achieve optimality in terms of all important metrics such as accuracy, latency and energy consumption. Consequently, how to automatically select a compressed DNN model for an edge device to run inference with optimal quality of experience (QoE) arises as a new challenge. The state-of-the-art approaches either choose a common model for all/most devices, which is optimal for a small fraction of edge devices at best, or apply device-specific DNN model compression, which is not scalable. In this paper, by leveraging the predictive power of machine learning and keeping end users in the loop,more »we envision an automated device-level DNN model selection engine for QoE-optimal edge inference. To concretize our vision, we formulate the DNN model selection problem into a contextual multi-armed bandit framework, where features of edge devices and DNN models are contexts and pre-trained DNN models are arms selected online based on the history of actions and users' QoE feedback. We develop an efficient online learning algorithm to balance exploration and exploitation. Our preliminary simulation results validate our algorithm and highlight the potential of machine learning for automating DNN model selection to achieve QoE-optimal edge inference.« less
  2. Abstract

    Nonlinear response history analysis (NLRHA) is generally considered to be a reliable and robust method to assess the seismic performance of buildings under strong ground motions. While NLRHA is fairly straightforward to evaluate individual structures for a select set of ground motions at a specific building site, it becomes less practical for performing large numbers of analyses to evaluate either (1) multiple models of alternative design realizations with a site‐specific set of ground motions, or (2) individual archetype building models at multiple sites with multiple sets of ground motions. In this regard, surrogate models offer an alternative to running repeated NLRHAs for variable design realizations or ground motions. In this paper, a recently developed surrogate modeling technique, called probabilistic learning on manifolds (PLoM), is presented to estimate structural seismic response. Essentially, the PLoM method provides an efficient stochastic model to develop mappings between random variables, which can then be used to efficiently estimate the structural responses for systems with variations in design/modeling parameters or ground motion characteristics. The PLoM algorithm is introduced and then used in two case studies of 12‐story buildings for estimating probability distributions of structural responses. The first example focuses on the mapping between variable designmore »parameters of a multidegree‐of‐freedom analysis model and its peak story drift and acceleration responses. The second example applies the PLoM technique to estimate structural responses for variations in site‐specific ground motion characteristics. In both examples, training data sets are generated for orthogonal input parameter grids, and test data sets are developed for input parameters with prescribed statistical distributions. Validation studies are performed to examine the accuracy and efficiency of the PLoM models. Overall, both examples show good agreement between the PLoM model estimates and verification data sets. Moreover, in contrast to other common surrogate modeling techniques, the PLoM model is able to preserve correlation structure between peak responses. Parametric studies are conducted to understand the influence of different PLoM tuning parameters on its prediction accuracy.

    « less
  3. In the medical sector, three-dimensional (3D) images are commonly used like computed tomography (CT) and magnetic resonance imaging (MRI). The 3D MRI is a non-invasive method of studying the soft-tissue structures in a knee joint for osteoarthritis studies. It can greatly improve the accuracy of segmenting structures such as cartilage, bone marrow lesion, and meniscus by identifying the bone structure first. U-net is a convolutional neural network that was originally designed to segment the biological images with limited training data. The input of the original U-net is a single 2D image and the output is a binary 2D image. In this study, we modified the U-net model to identify the knee bone structures using 3D MRI, which is a sequence of 2D slices. A fully automatic model has been proposed to detect and segment knee bones. The proposed model was trained, tested, and validated using 99 knee MRI cases where each case consists of 160 2D slices for a single knee scan. To evaluate the model’s performance, the similarity, dice coefficient (DICE), and area error metrics were calculated. Separate models were trained using different knee bone components including tibia, femur, patella, as well as a combined model for segmenting allmore »the knee bones. Using the whole MRI sequence (160 slices), the method was able to detect the beginning and ending bone slices first, and then segment the bone structures for all the slices in between. On the testing set, the detection model accomplished 98.79% accuracy and the segmentation model achieved DICE 96.94% and similarity 93.98%. The proposed method outperforms several state-of-the-art methods, i.e., it outperforms U-net by 3.68%, SegNet by 14.45%, and FCN-8 by 2.34%, in terms of DICE score using the same dataset.« less
  4. Alba, Mar (Ed.)
    Abstract Adaptive radiations are characterised by the diversification and ecological differentiation of species, and replicated cases of this process provide natural experiments for understanding the repeatability and pace of molecular evolution. During adaptive radiation, genes related to ecological specialisation may be subject to recurrent positive directional selection. However, it is not clear to what extent patterns of lineage-specific ecological specialisation (including phenotypic convergence) are correlated with shared signatures of molecular evolution. To test this, we sequenced whole exomes from a phylogenetically dispersed sample of 38 murine rodent species, a group characterised by multiple, nested adaptive radiations comprising extensive ecological and phenotypic diversity. We found that genes associated with immunity, reproduction, diet, digestion and taste have been subject to pervasive positive selection during the diversification of murine rodents. We also found a significant correlation between genome-wide positive selection and dietary specialisation, with a higher proportion of positively selected codon sites in derived dietary forms (i.e. carnivores and herbivores) than in ancestral forms (i.e. omnivores). Despite striking convergent evolution of skull morphology and dentition in two distantly related worm-eating specialists, we did not detect more genes with shared signatures of positive or relaxed selection than in a non-convergent species comparison. While amore »small number of the genes we detected can be incidentally linked to craniofacial morphology or diet, protein-coding regions are unlikely to be the primary genetic basis of this complex convergent phenotype. Our results suggest a link between positive selection and derived ecological phenotypes, and highlight specific genes and general functional categories that may have played an integral role in the extensive and rapid diversification of murine rodents.« less
  5. Abstract

    Deep neural networks (DNNs) are widely used to handle many difficult tasks, such as image classification and malware detection, and achieve outstanding performance. However, recent studies on adversarial examples, which have maliciously undetectable perturbations added to their original samples that are indistinguishable by human eyes but mislead the machine learning approaches, show that machine learning models are vulnerable to security attacks. Though various adversarial retraining techniques have been developed in the past few years, none of them is scalable. In this paper, we propose a new iterative adversarial retraining approach to robustify the model and to reduce the effectiveness of adversarial inputs on DNN models. The proposed method retrains the model with both Gaussian noise augmentation and adversarial generation techniques for better generalization. Furthermore, the ensemble model is utilized during the testing phase in order to increase the robust test accuracy. The results from our extensive experiments demonstrate that the proposed approach increases the robustness of the DNN model against various adversarial attacks, specifically, fast gradient sign attack, Carlini and Wagner (C&W) attack, Projected Gradient Descent (PGD) attack, and DeepFool attack. To be precise, the robust classifier obtained by our proposed approach can maintain a performance accuracy of 99%more »on average on the standard test set. Moreover, we empirically evaluate the runtime of two of the most effective adversarial attacks, i.e., C&W attack and BIM attack, to find that the C&W attack can utilize GPU for faster adversarial example generation than the BIM attack can. For this reason, we further develop a parallel implementation of the proposed approach. This parallel implementation makes the proposed approach scalable for large datasets and complex models.

    « less