
Title: Diverse Knowledge Distillation (DKD): A Solution for Improving The Robustness of Ensemble Models Against Adversarial Attacks
This paper proposes an ensemble learning model that is resistant to adversarial attacks. To build resilience, we introduce a training process in which each member learns a radically distinct latent space. Member models are added to the ensemble one at a time. Simultaneously, the loss function is regularized by a reverse knowledge distillation term, forcing each new member to learn different features and map to a latent space safely distanced from those of the existing members. We assessed the security and performance of the proposed solution on image classification tasks using the CIFAR10 and MNIST datasets and showed improvements in security and performance compared to state-of-the-art defense methods.
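A minimal pure-Python sketch of the diversity-regularized objective described in the abstract (the cosine-similarity penalty and the weight `alpha` are illustrative assumptions, not the paper's actual reverse-distillation term): the new member's loss combines its task loss with the average similarity between its latent representation and those of existing members, so minimizing the loss pushes the new latent space away from the others.

```python
import math

def cosine(u, v):
    # Cosine similarity between two latent vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def dkd_loss(task_loss, new_latent, existing_latents, alpha=0.5):
    """Diversity-regularized loss sketch: penalize similarity of the new
    member's latent vector to the latents of members already in the ensemble."""
    if not existing_latents:
        return task_loss  # first member trains on the task loss alone
    mean_sim = sum(cosine(new_latent, z) for z in existing_latents) / len(existing_latents)
    return task_loss + alpha * mean_sim
```

Gradient descent on this loss would simultaneously fit the task and repel the new member's latent space from the existing members', which is the intuition behind the diversity constraint.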
Award ID(s):
1718538 2146726
Journal Name:
22nd International Symposium on Quality Electronic Design (ISQED)
Page Range or eLocation-ID:
319 to 324
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents a novel zero-shot learning approach towards personalized speech enhancement through the use of a sparsely active ensemble model. Optimizing speech denoising systems towards a particular test-time speaker can improve performance and reduce run-time complexity. However, test-time model adaptation may be challenging if collecting data from the test-time speaker is not possible. To this end, we propose using an ensemble model wherein each specialist module denoises noisy utterances from a distinct partition of training set speakers. The gating module inexpensively estimates test-time speaker characteristics in the form of an embedding vector and selects the most appropriate specialist module for denoising the test signal. Grouping the training set speakers into non-overlapping semantically similar groups is non-trivial and ill-defined. To do this, we first train a Siamese network using noisy speech pairs to maximize or minimize the similarity of its output vectors depending on whether the utterances derive from the same speaker or not. Next, we perform k-means clustering on the latent space formed by the averaged embedding vectors per training set speaker. In this way, we designate speaker groups and train specialist modules optimized around partitions of the complete training set. Our experiments show that ensemble models made up of low-capacity specialists can outperform high-capacity generalist models with greater efficiency and improved adaptation towards unseen test-time speakers.
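A minimal sketch of the gating step described above, assuming the speaker-group centroids from the k-means step are already available (the nearest-centroid rule and all names here are illustrative simplifications of the learned gating module):

```python
def squared_distance(u, v):
    # Squared Euclidean distance between two embedding vectors
    return sum((a - b) ** 2 for a, b in zip(u, v))

def select_specialist(test_embedding, group_centroids):
    """Pick the specialist whose training-speaker cluster centroid is
    nearest to the test-time speaker's embedding vector."""
    return min(range(len(group_centroids)),
               key=lambda i: squared_distance(test_embedding, group_centroids[i]))
```

The chosen index routes the noisy test utterance to a single low-capacity specialist, which is what keeps the ensemble sparsely active at inference time.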
  2. Abstract

    Numerical weather prediction models and high-performance computing have significantly improved our ability to model near-surface variables, but their uncertainty quantification still remains a challenging task. Ensembles are usually produced to depict a series of possible future states of the atmosphere, as a means to quantify the prediction uncertainty, but this requires multiple instantiations of the model, leading to an increased computational cost. Weather analogs, alternatively, can be used to generate ensembles without repeated model runs. The analog ensemble (AnEn) is a technique to identify similar weather patterns for near-surface variables and quantify forecast uncertainty. Analogs are chosen based on a similarity metric that calculates the weighted multivariate Euclidean distance. However, identifying optimal weights for the similarity metric becomes a bottleneck because it involves performing a constrained exhaustive search. As a result, only a few predictors were selected and optimized in previous AnEn studies. A new machine learning similarity metric is proposed to improve the theoretical framework on how weather analogs are identified. First, a deep learning network is trained to generate latent features using all the temporal multivariate input predictors. Analogs are then selected in this latent space, rather than the original predictor space. The proposed method does not require prior predictor selection or an exhaustive search, thus presenting a significant computational benefit and scalability. It is tested for surface wind speed and solar irradiance forecasts in Pennsylvania from 2017 to 2019. Results show that the proposed method is capable of handling a large number of predictors, and it outperforms the original similarity metric in RMSE, bias, and CRPS. Since the data-driven transformation network is trained using the historical record, the proposed method has been found to be more flexible for searching through a longer record.

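As a sketch of the analog-selection step described above (the encoder producing the latent vectors is assumed to be already trained; all names are illustrative), finding analogs becomes a plain k-nearest-neighbor search in the learned latent space:

```python
def squared_distance(u, v):
    # Squared Euclidean distance between two latent feature vectors
    return sum((a - b) ** 2 for a, b in zip(u, v))

def find_analogs(query_latent, historical_latents, k=2):
    """Return indices of the k historical forecasts whose latent features
    are closest (Euclidean) to the current forecast's latent features."""
    ranked = sorted(range(len(historical_latents)),
                    key=lambda i: squared_distance(query_latent, historical_latents[i]))
    return ranked[:k]
```

Because the distance is unweighted in the latent space, the predictor-weight optimization that bottlenecked the original AnEn similarity metric is no longer needed.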
  3. Abstract

    Stochastic model error schemes, such as the stochastic perturbed parameterization tendencies (SPPT) and independent SPPT (iSPPT) schemes, have become an increasingly accepted method to represent model error associated with uncertain subgrid-scale processes in ensemble prediction systems (EPSs). While much of the current literature focuses on the effects of these schemes on forecast skill, this research examines the physical processes by which iSPPT perturbations to the microphysics parameterization scheme yield variability in ensemble rainfall forecasts. Members of three 120-member Weather Research and Forecasting (WRF) Model ensemble case studies, including two distinct heavy rain events over Taiwan and one over the northeastern United States, are ranked according to an area-averaged accumulated rainfall metric in order to highlight differences between high- and low-precipitation forecasts. In each case, high-precipitation members are characterized by a damping of the microphysics water vapor and temperature tendencies over the region of heaviest rainfall, while the opposite is true for low-precipitation members. Physically, the perturbations to microphysics tendencies have the greatest impact at the cloud level and act to modify precipitation efficiency. To this end, the damping of tendencies in high-precipitation forecasts suppresses both the loss of water vapor due to condensation and the corresponding latent heat release, leading to grid-scale supersaturation. Conversely, amplified tendencies in low-precipitation forecasts yield both drying and increased positive buoyancy within clouds.

  4. Abstract

    This study examines the benefit of using a dynamical ensemble for 48 hr deterministic and probabilistic predictions of near-surface fine particulate matter (PM2.5) over the contiguous United States (CONUS). Our ensemble design captures three key sources of uncertainties in PM2.5 modeling including meteorology, emissions, and secondary organic aerosol (SOA) formation. Twenty-four ensemble members were simulated using the Community Multiscale Air Quality (CMAQ) model during January, April, July, and October 2016. The raw ensemble mean performed better than most of the ensemble members but underestimated the observed PM2.5 over the CONUS, with the largest underestimation over the western CONUS owing to negative PM2.5 bias in nearly all the members. To improve the ensemble performance, we calibrated the raw ensemble using model output statistics (MOS) and variance deficit methods. The calibrated ensemble captured the diurnal and day-to-day variability in observed PM2.5 very well and exhibited almost zero mean bias. The mean bias in the calibrated ensemble was reduced by 90–100% in the western CONUS and by 40–100% in other parts of the CONUS, compared to the raw ensemble in all months. The corresponding reduction in root-mean-square error (RMSE) was 13–40%. The calibrated ensemble also showed 30% improvement in the RMSE and spread matching compared to the raw ensemble. We have also shown that a nine-member ensemble based on combinations of three meteorological and three anthropogenic emission scenarios can be used as a smaller subset of the full ensemble when sufficient computational resources are not available in the operational setting.

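A minimal sketch of the model output statistics (MOS) idea used for calibration above: fit a linear bias correction mapping the raw ensemble mean to observations. The single-predictor least-squares form below is an illustrative assumption; the study's actual MOS and variance deficit methods are more involved.

```python
def fit_mos(raw_means, observations):
    """Least-squares fit of obs ≈ a + b * (raw ensemble mean)."""
    n = len(raw_means)
    mean_x = sum(raw_means) / n
    mean_y = sum(observations) / n
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(raw_means, observations))
             / sum((x - mean_x) ** 2 for x in raw_means))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def calibrate(raw_mean, intercept, slope):
    # Apply the fitted correction to a new raw ensemble-mean forecast
    return intercept + slope * raw_mean
```

Fitting the correction on a historical verification period and applying it to new forecasts is what drives the near-zero mean bias reported for the calibrated ensemble.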
  5. Variational autoencoders are artificial neural networks with the capability to reduce highly dimensional sets of data to smaller dimensional, latent representations. In this work, these models are applied to molecular dynamics simulations of the self-assembly of coarse-grained peptides to obtain a single-valued order parameter for amyloid aggregation. This automatically learned order parameter is constructed by time-averaging the latent parameterizations of internal coordinate representations and compared to the nematic order parameter which is commonly used to study ordering of similar systems in the literature. It is found that the latent space value provides more tailored insight into the aggregation mechanism's details, correctly identifying fibril formation in instances where the nematic order parameter fails to do so. A means is provided by which the latent space value can be analyzed so that the major contributing internal coordinates are identified, allowing for a direct interpretation of the latent space order parameter in terms of the behavior of the system. The latent model is found to be an effective and convenient way of representing the data from the dynamic ensemble and provides a means of reducing the dimensionality of a system whose scale exceeds molecular systems so far considered with similar tools. This bypasses a need for researcher speculation on what elements of a system best contribute to summarizing major transitions, and suggests latent models are effective and insightful when applied to large systems with a diversity of complex behavior.
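A rough sketch of the order-parameter construction described above, assuming a trained encoder has already mapped each trajectory frame to a scalar latent value (the threshold decision rule and all names here are hypothetical; the paper derives its classification from comparison with the nematic order parameter):

```python
def latent_order_parameter(latent_trajectory):
    """Time-average a one-dimensional latent coordinate over the frames
    of a molecular dynamics trajectory to get a single-valued order parameter."""
    return sum(latent_trajectory) / len(latent_trajectory)

def is_fibril(order_value, threshold=0.5):
    # Hypothetical decision rule: flag the ensemble as fibrillar when the
    # time-averaged latent value exceeds a calibrated threshold.
    return order_value > threshold
```

The value itself is cheap to compute once the autoencoder is trained; the interpretability step in the paper then traces which internal coordinates contribute most to the latent value.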