Abstract Direct inverse analysis of faults in machinery systems such as gears from first principles is intrinsically difficult, owing to the multiple time and length scales involved in vibration modeling. As such, data-driven approaches have become the mainstream, with supervised training deemed particularly effective. Nevertheless, existing techniques often fall short in their ability to generalize from discrete data labels to the continuous spectrum of possible faults, a shortcoming further compounded by various uncertainties. This research proposes an interpretability-enhanced deep learning framework that incorporates Bayesian principles, effectively transforming convolutional neural networks (CNNs) into dynamic predictive models, significantly amplifying their generalizability, and offering more accessible insight into the model's reasoning process. Our approach is distinguished by a novel implementation of Bayesian inference that enables navigation of the probabilistic nuances of gear fault severities. By integrating variational inference into the deep learning architecture, we present a methodology that excels at leveraging limited data labels to reveal insights into both observed and unobserved fault conditions. This approach improves the model's capacity for uncertainty estimation and probabilistic generalization. Experimental validation on a lab-scale gear setup demonstrated the framework's superior performance, achieving nearly 100% accuracy in classifying known fault conditions, even in the presence of significant noise, and maintaining 96.15% accuracy when dealing with unseen fault severities. These results underscore the method's capability to discover implicit relations between known and unseen faults, facilitate extended fault diagnosis, and effectively manage large degrees of measurement uncertainty.
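The abstract does not specify the architecture, so the following is only a minimal sketch of the variational-inference ingredient it describes: a CNN whose convolution weights carry a factorized Gaussian posterior trained with an ELBO-style objective (Bayes-by-Backprop style). The names `VariationalConv1d` and `BayesianFaultNet`, all layer sizes, and the KL weighting are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalConv1d(nn.Module):
    """1-D convolution with a factorized Gaussian posterior over weights."""
    def __init__(self, in_ch, out_ch, kernel):
        super().__init__()
        self.mu = nn.Parameter(0.05 * torch.randn(out_ch, in_ch, kernel))
        self.rho = nn.Parameter(torch.full((out_ch, in_ch, kernel), -4.0))

    def forward(self, x):
        sigma = F.softplus(self.rho)                    # ensures sigma > 0
        w = self.mu + sigma * torch.randn_like(sigma)   # reparameterization trick
        # closed-form KL(q(w) || N(0, I)) for a factorized Gaussian posterior
        self.kl = (0.5 * (sigma ** 2 + self.mu ** 2 - 1.0) - torch.log(sigma)).sum()
        return F.conv1d(x, w)

class BayesianFaultNet(nn.Module):
    """Toy CNN classifier over 1-D vibration segments (sizes are illustrative)."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.conv = VariationalConv1d(1, 8, 16)
        self.head = nn.Linear(8, n_classes)

    def forward(self, x):
        h = F.relu(self.conv(x)).mean(dim=-1)           # global average pooling
        return self.head(h)

model = BayesianFaultNet()
x = torch.randn(32, 1, 256)                             # stand-in vibration segments
y = torch.randint(0, 4, (32,))
loss = F.cross_entropy(model(x), y) + model.conv.kl / 1e4  # ELBO-style objective
loss.backward()

# At test time, multiple stochastic forward passes give a predictive
# distribution whose spread reflects uncertainty, e.g. for fault severities
# that were never labeled during training.
with torch.no_grad():
    probs = torch.stack([model(x).softmax(-1) for _ in range(20)]).mean(0)
```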
ZERO-SHOT GENERALIZATION ACROSS ARCHITECTURES FOR VISUAL CLASSIFICATION
Generalization to unseen data is a key desideratum for deep networks, but its relation to classification accuracy is unclear. Using a minimalist vision dataset and a measure of generalizability, we show that popular networks, from deep convolutional networks (CNNs) to transformers, vary in their power to extrapolate to unseen classes both across layers and across architectures. Accuracy is not a good predictor of generalizability, and generalization varies non-monotonically with layer depth. Our code is available at github.com/dyballa/generalization
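The paper's actual generalizability measure is not reproduced here. As a rough stand-in, the sketch below probes the residual stages of a (randomly initialized, to stay self-contained) torchvision ResNet-18 with a leave-one-out nearest-neighbor score on stand-in "unseen-class" data; the function `nn_generalizability` and the probe itself are illustrative assumptions, not the paper's method.

```python
import torch
import torchvision.models as models

def nn_generalizability(feats, labels):
    """Leave-one-out nearest-neighbor accuracy: a crude proxy for how well a
    layer's representation separates classes the network was never trained on."""
    d = torch.cdist(feats, feats)
    d.fill_diagonal_(float("inf"))       # exclude each point as its own neighbor
    return (labels[d.argmin(dim=1)] == labels).float().mean().item()

net = models.resnet18(weights=None).eval()   # stand-in network, no download
x = torch.randn(64, 3, 224, 224)             # stand-in for unseen-class images
labels = torch.randint(0, 8, (64,))          # stand-in class labels

acts = {}
hooks = [m.register_forward_hook(lambda mod, inp, out, n=name: acts.update({n: out}))
         for name, m in net.named_children() if name.startswith("layer")]
with torch.no_grad():
    net(x)
for h in hooks:
    h.remove()

# As the paper reports, such scores can vary non-monotonically with depth.
for name, a in acts.items():
    print(name, nn_generalizability(a.flatten(1), labels))
```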
- Award ID(s): 1822650
- PAR ID: 10559196
- Publisher / Repository: https://openreview.net/forum?id=orYMrUv7eu
- Date Published:
- Format(s): Medium: X
- Location: Vienna
- Sponsoring Org: National Science Foundation
More Like this
-
With the recent demand for deploying neural network models on mobile and edge devices, it is desirable to improve a model's generalizability on unseen testing data, as well as enhance its robustness under fixed-point quantization for efficient deployment. Minimizing the training loss, however, provides few guarantees on generalization and quantization performance. In this work, we fulfill the need to improve generalization and quantization performance simultaneously by theoretically unifying them under the framework of improving the model's robustness against bounded weight perturbation and minimizing the eigenvalues of the Hessian matrix with respect to the model weights. We therefore propose HERO, a Hessian-enhanced robust optimization method, to minimize the Hessian eigenvalues through a gradient-based training process, simultaneously improving the generalization and quantization performance. HERO enables up to a 3.8% gain in test accuracy, up to 30% higher accuracy under 80% training label perturbation, and the best post-training quantization accuracy across a wide range of precisions, including a > 10% accuracy improvement over SGD-trained models for common model architectures on various datasets.
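HERO's full algorithm is not spelled out in this abstract; the sketch below only illustrates the ingredient it names, penalizing Hessian eigenvalues during training, via a differentiable top-eigenvalue estimate obtained by power iteration on Hessian-vector products. The toy model and the 0.01 penalty weight are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def top_hessian_eigenvalue(loss, params, iters=5):
    """Differentiable estimate of the largest Hessian eigenvalue: power
    iteration on Hessian-vector products, then the Rayleigh quotient v^T H v."""
    g = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        hv = torch.autograd.grad(g, params, grad_outputs=v, retain_graph=True)
        norm = torch.sqrt(sum((h * h).sum() for h in hv)) + 1e-12
        v = [(h / norm).detach() for h in hv]
    hv = torch.autograd.grad(g, params, grad_outputs=v, create_graph=True)
    return sum((h * u).sum() for h, u in zip(hv, v))

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))

for _ in range(3):
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    lam = top_hessian_eigenvalue(loss, list(model.parameters()))
    (loss + 0.01 * lam).backward()   # 0.01 is an assumed penalty weight
    opt.step()
```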
-
While recent years have witnessed a steady trend of applying Deep Learning (DL) to networking systems, most of the underlying Deep Neural Networks (DNNs) suffer two major limitations. First, they fail to generalize to topologies unseen during training. This lack of generalizability hampers the ability of the DNNs to make good decisions every time the topology of the networking system changes. Second, existing DNNs commonly operate as "blackboxes" that are difficult to interpret by network operators, and hinder their deployment in practice. In this paper, we propose to rely on a recently developed family of graph-based DNNs to address the aforementioned limitations. More specifically, we focus on a network congestion prediction application and apply Graph Attention (GAT) models to make congestion predictions per link using the graph topology and time series of link loads as inputs. Evaluations on three real backbone networks demonstrate the benefits of our proposed approach in terms of prediction accuracy, generalizability, and interpretability.
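The paper's exact GAT configuration and inputs are not given in this abstract. The sketch below implements a minimal single-head graph attention layer and applies it to a hypothetical line-graph view in which each node is a link and its features are recent load samples; all sizes and the adjacency construction are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Minimal single-head graph attention layer (Velickovic et al., 2018)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):
        z = self.W(h)                                        # (N, out_dim)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)     # attention logits
        alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=-1)
        return alpha @ z                                     # aggregate neighbors

# Hypothetical line graph: nodes are links, features are recent load samples,
# edges connect links that share a router.
n_links, window = 12, 8
loads = torch.randn(n_links, window)                  # stand-in load time series
adj = (torch.rand(n_links, n_links) < 0.3).float()
adj.fill_diagonal_(1.0)                               # each link attends to itself

l1, l2 = GATLayer(window, 16), GATLayer(16, 1)
congestion = torch.sigmoid(l2(F.elu(l1(loads, adj)), adj)).squeeze(-1)
print(congestion.shape)                               # per-link prediction: (12,)
```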
-
The capacity to generalize to future unseen data stands as one of the most crucial attributes of deep neural networks. Sharpness-Aware Minimization (SAM) aims to enhance generalizability by minimizing the worst-case loss using one-step gradient ascent as an approximation. However, as training progresses, the non-linearity of the loss landscape increases, rendering one-step gradient ascent less effective. On the other hand, multi-step gradient ascent incurs higher training cost. In this paper, we introduce a normalized Hessian trace to accurately measure the curvature of the loss landscape on both training and test sets. In particular, to counter excessive non-linearity of the loss landscape, we propose Curvature Regularized SAM (CR-SAM), integrating the normalized Hessian trace as a SAM regularizer. Additionally, we present an efficient way to compute the trace via finite differences with parallelism. Our theoretical analysis based on PAC-Bayes bounds establishes the regularizer's efficacy in reducing generalization error. Empirical evaluation on CIFAR and ImageNet datasets shows that CR-SAM consistently enhances classification performance for ResNet and Vision Transformer (ViT) models across various datasets. Our code is available at https://github.com/TrustAIoT/CR-SAM.
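CR-SAM's normalization and its integration of the trace into the training objective are not reproduced here. The hedged sketch below shows its two ingredients separately: a standard one-step-ascent SAM update, and a Hutchinson estimate of the Hessian trace computed via finite differences of gradients, in the spirit the abstract describes. The hyperparameters (`rho`, the probe step `h`, the probe count) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sam_step(model, opt, x, y, rho=0.05):
    """One SAM update: one-step gradient ascent to a worst-case neighbor,
    then descend using the gradient measured there."""
    loss = F.cross_entropy(model(x), y)
    grads = torch.autograd.grad(loss, model.parameters())
    scale = rho / (torch.sqrt(sum((g * g).sum() for g in grads)) + 1e-12)
    eps = [scale * g for g in grads]
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.add_(e)                                # ascend
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()          # gradient at perturbed point
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)                                # restore original weights
    opt.step()

def hutchinson_trace_fd(model, x, y, h=1e-3, n_samples=2):
    """Finite-difference Hutchinson estimate of tr(H):
    E_v[v . (g(w + h*v) - g(w)) / h] with Rademacher probes v."""
    params = list(model.parameters())
    def flat_grad():
        g = torch.autograd.grad(F.cross_entropy(model(x), y), params)
        return torch.cat([gi.flatten() for gi in g])
    g0, est = flat_grad(), 0.0
    for _ in range(n_samples):
        v = [torch.randint_like(p, 0, 2) * 2.0 - 1.0 for p in params]
        with torch.no_grad():
            for p, vi in zip(params, v):
                p.add_(h * vi)
        g1 = flat_grad()
        with torch.no_grad():
            for p, vi in zip(params, v):
                p.sub_(h * vi)
        est = est + torch.cat([vi.flatten() for vi in v]) @ (g1 - g0) / h
    return est / n_samples

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
sam_step(model, opt, x, y)
print("tr(H) estimate:", hutchinson_trace_fd(model, x, y).item())
```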
-
The prosperity of deep learning and automated machine learning (AutoML) is largely rooted in the development of novel neural networks -- but what defines and controls the "goodness" of networks in an architecture space? Test accuracy, a golden standard in AutoML, is closely related to three aspects: (1) expressivity (how complicated functions a network can approximate over the training data); (2) convergence (how fast the network can reach low training error under gradient descent); (3) generalization (whether a trained network can be generalized from the training data to unseen samples with low test error). However, most previous theory papers focus on fixed model structures, largely ignoring sophisticated networks used in practice. To facilitate the interpretation and understanding of the architecture design by AutoML, we target connecting a bigger picture: how does the architecture jointly impact its expressivity, convergence, and generalization? We demonstrate the "no free lunch" behavior in networks from an architecture space: given a fixed budget on the number of parameters, there does not exist a single architecture that is optimal in all three aspects. In other words, separately optimizing expressivity, convergence, and generalization will achieve different networks in the architecture space. Our analysis can explain a wide range of observations in AutoML. Experiments on popular benchmarks confirm our theoretical analysis. Our codes are attached in the supplement.