Title: NeuralDivergence: Exploring and Understanding Neural Networks by Comparing Activation Distributions
As deep neural networks are increasingly used to solve high-stakes problems, there is a pressing need to understand their internal decision mechanisms. Visualization has helped address this problem by assisting with the interpretation of complex deep neural networks. However, current tools often support only single data instances or visualize layers in isolation. We present NEURALDIVERGENCE, an interactive visualization system that uses activation distributions as a high-level summary of what a model has learned. NEURALDIVERGENCE enables users to interactively summarize and compare activation distributions across layers, classes, and instances (e.g., pairs of adversarially attacked and benign images), helping them gain a better understanding of neural network models.
Award ID(s):
1704701 1563816
PAR ID:
10095875
Author(s) / Creator(s):
Date Published:
Journal Name:
PacificVis 2019
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
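As a rough illustration of the comparison the abstract describes (not the NEURALDIVERGENCE implementation), the sketch below summarizes one neuron's activations on two groups of inputs as histograms and scores their difference with a Jensen–Shannon divergence; the data, bin choices, and function names are made up for the example.

```python
# Minimal sketch (not the NeuralDivergence implementation): summarize a neuron's
# activations as per-group histograms and compare them with a divergence score.
import numpy as np

def activation_histogram(acts, bins):
    """Normalized histogram of a 1-D array of activation values."""
    hist, _ = np.histogram(acts, bins=bins)
    hist = hist.astype(float) + 1e-12          # avoid zero bins
    return hist / hist.sum()

def jensen_shannon(p, q):
    """Symmetric divergence between two activation distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Synthetic stand-ins for one neuron's activations on two groups of inputs,
# e.g. benign vs. adversarially attacked images (hypothetical data).
rng = np.random.default_rng(0)
benign = rng.normal(loc=1.0, scale=0.5, size=5000)
attacked = rng.normal(loc=1.6, scale=0.8, size=5000)

bins = np.linspace(-2, 5, 50)                   # shared bin edges for both groups
score = jensen_shannon(activation_histogram(benign, bins),
                       activation_histogram(attacked, bins))
print(f"divergence between activation distributions: {score:.4f}")
```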
More Like this
  1. We explore the convergence of deep neural networks with the popular ReLU activation function as the depth of the networks tends to infinity. To this end, we introduce the notions of activation domains and activation matrices of a ReLU network. By replacing applications of the ReLU activation function with multiplications by activation matrices on activation domains, we obtain an explicit expression for the ReLU network. We then identify the convergence of ReLU networks with the convergence of a class of infinite products of matrices, and study sufficient and necessary conditions for the convergence of these infinite products. As a result, we establish necessary conditions for ReLU networks to converge: the sequence of weight matrices must converge to the identity matrix and the sequence of bias vectors must converge to zero as the depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient conditions, in terms of the weight matrices and bias vectors at the hidden layers, for pointwise convergence of deep ReLU networks. These results provide mathematical insight into the convergence of deep neural networks. Experiments are conducted to verify the results numerically and to illustrate their potential usefulness in the initialization of deep neural networks.
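To make the activation-domain and activation-matrix construction concrete, here is a small numerical sketch (an illustration, not code from the paper): on a fixed input, applying ReLU to a pre-activation equals multiplying it by a diagonal 0/1 activation matrix determined by the sign pattern of that pre-activation.

```python
# Sketch of the activation-matrix view of a ReLU layer (illustrative only):
# relu(W x + b) == D (W x + b), where D is a diagonal 0/1 matrix that depends
# on the sign pattern (the "activation domain") of the pre-activation.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))
b = rng.normal(size=4)
x = rng.normal(size=3)

z = W @ x + b                       # pre-activation
D = np.diag((z > 0).astype(float))  # activation matrix on this input's domain

assert np.allclose(relu(z), D @ z)  # ReLU acts as a linear map on this domain
print("activation matrix diagonal:", np.diag(D))
```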
  2. The nonlinearity of activation functions used in deep learning models is crucial to the success of predictive models. Several simple nonlinear functions, including the Rectified Linear Unit (ReLU) and Leaky ReLU (L-ReLU), are commonly used in neural networks to impose nonlinearity. In practice, these functions remarkably enhance model accuracy. However, there is limited insight into how the nonlinearity in neural networks affects their performance. Here, we investigate the performance of neural network models as a function of nonlinearity using the ReLU and L-ReLU activation functions in the context of different model architectures and data domains. We use entropy as a measure of randomness to quantify the effects of nonlinearity, across different architecture shapes, on the performance of neural networks. We show that ReLU is the better choice of activation function mostly when the network has a sufficient number of parameters. However, we found that image classification models with transfer learning seem to perform well with L-ReLU in the fully connected layers. We show that the entropy of hidden layer outputs in neural networks can fairly represent the fluctuations in information loss as a function of nonlinearity. Furthermore, we investigate the entropy profile of shallow neural networks as a way of representing their hidden layer dynamics.
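Not from the paper, but as a rough illustration of the entropy measurement described above, the sketch below bins one hidden layer's outputs under ReLU and Leaky ReLU (same random weights and inputs, both hypothetical) and compares their Shannon entropies.

```python
# Illustrative sketch (not the paper's setup): compare the entropy of one hidden
# layer's outputs under ReLU vs. Leaky ReLU for the same random weights/inputs.
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def entropy(values, bins=50):
    """Shannon entropy (in nats) of binned values, a rough randomness measure."""
    hist, _ = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 64))              # synthetic inputs
W = rng.normal(size=(64, 128)) / np.sqrt(64) # one random hidden layer
pre = X @ W

print("entropy with ReLU  :", entropy(relu(pre).ravel()))
print("entropy with L-ReLU:", entropy(leaky_relu(pre).ravel()))
```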
  3. While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are consistent for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that are consistent. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and neural tangent kernels, we provide explicit activation functions that can be used to construct networks that achieve consistency. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: 1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); 2) majority vote (model predictions are given by the label of the class with the greatest representation in the training set); or 3) singular kernel classifiers (a set of classifiers containing those that achieve consistency). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful. 
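For reference, the first two limiting classifiers in the taxonomy above are simple enough to write down directly; the snippet below is only an illustration of those baseline decision rules, not the paper's infinitely wide-and-deep constructions, and all data shown are made up.

```python
# Two of the limiting classifiers named above, written directly (illustration only).
import numpy as np
from collections import Counter

def one_nearest_neighbor(X_train, y_train, x):
    """Predict the label of the closest training example."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[int(np.argmin(dists))]

def majority_vote(y_train):
    """Predict the most common training label, regardless of the input."""
    return Counter(y_train.tolist()).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 1, 1])
x_test = np.array([0.8, 0.9])

print("1-NN prediction    :", one_nearest_neighbor(X_train, y_train, x_test))
print("majority-vote label:", majority_vote(y_train))
```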
  4. Inkjet-printed circuits on flexible substrates are rapidly emerging as a key technology in flexible electronics, driven by their minimal fabrication process, cost-effectiveness, and environmental sustainability. Recent advancements in inkjet-printed devices and circuits have broadened their applications in both sensing and computing. Building on this progress, this work develops a nonlinear computational element, coined mTanh, to serve as an activation function in neural networks. Activation functions are essential in neural networks as they introduce nonlinearity, enabling machine learning models to capture complex patterns. However, widely used functions such as Tanh and sigmoid often suffer from the vanishing gradient problem, limiting the depth of neural networks. To address this, alternative functions like ReLU and Leaky ReLU have been explored, yet these also introduce challenges such as the dying ReLU issue, bias shifting, and noise sensitivity. The proposed mTanh activation function effectively mitigates the vanishing gradient problem, allowing for the development of deeper neural network architectures without compromising training efficiency. This study demonstrates the feasibility of mTanh as an activation function by integrating it into an Echo State Network to predict the Mackey–Glass time series signal. The results show that mTanh performs comparably to Tanh, ReLU, and Leaky ReLU on this task. Additionally, the vanishing gradient resistance of the mTanh function was evaluated by implementing it in a deep multi-layer perceptron model for Fashion MNIST image classification. The study indicates that mTanh enables the addition of 3–5 extra layers compared to Tanh and sigmoid, while exhibiting vanishing gradient resistance similar to that of ReLU. These results highlight the potential of mTanh as a promising activation function for deep learning models, particularly in flexible electronics applications.
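Since the abstract does not specify the circuit-level form of mTanh, the sketch below only illustrates the vanishing-gradient behavior it is designed to resist: the norm of the input gradient of a random deep MLP, accumulated layer by layer via the chain rule, typically collapses much faster with tanh than with ReLU. The width, depth, and He-style weight scale (used for both activations for a like-for-like comparison) are arbitrary choices for the example.

```python
# Generic illustration of the vanishing-gradient effect discussed above
# (this is NOT the paper's mTanh, whose functional form is not given here).
import numpy as np

def input_gradient_norm(act, depth, width=64, seed=3):
    """Norm of d(sum of outputs)/d(input) for a random `depth`-layer MLP."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=width)
    J = np.eye(width)                        # Jacobian of activations w.r.t. the input
    for _ in range(depth):
        W = rng.normal(size=(width, width)) * np.sqrt(2.0 / width)
        pre = W @ x
        if act == "tanh":
            x, local = np.tanh(pre), 1.0 - np.tanh(pre) ** 2
        else:                                # "relu"
            x, local = np.maximum(pre, 0.0), (pre > 0).astype(float)
        J = local[:, None] * (W @ J)         # chain rule: diag(act'(pre)) @ W @ J
    return float(np.linalg.norm(J.T @ np.ones(width)))

for act in ("tanh", "relu"):
    print(act, [round(input_gradient_norm(act, d), 6) for d in (5, 15, 30)])
```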
  5. Deep neural networks such as GoogLeNet, ResNet, and BERT have achieved impressive performance in tasks such as image and text classification. To understand how such performance is achieved, we probe a trained deep neural network by studying neuron activations, i.e., combinations of neuron firings, at various layers of the network in response to a particular input. With a large number of inputs, we aim to obtain a global view of what neurons detect by studying their activations. In particular, we develop visualizations that show the shape of the activation space, the organizational principle behind neuron activations, and the relationships of these activations within a layer. Applying tools from topological data analysis, we present TopoAct, a visual exploration system to study topological summaries of activation vectors. We present exploration scenarios using TopoAct that provide valuable insights into learned representations of neural networks. We expect TopoAct to give a topological perspective that enriches the current toolbox of neural network analysis, and to provide a basis for network architecture diagnosis and data anomaly detection.
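TopoAct builds its topological summaries on top of activation vectors gathered at a layer over many inputs. The sketch below shows one common way to gather such vectors with a PyTorch forward hook; it is not necessarily TopoAct's own pipeline, and the tiny nn.Sequential model and layer index are placeholders standing in for networks like GoogLeNet or ResNet.

```python
# Minimal sketch (not TopoAct itself): collect activation vectors from a chosen
# layer over many inputs, the raw material for topological summaries.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(                 # toy stand-in for GoogLeNet/ResNet/BERT
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

captured = []                          # activation vectors, one tensor per batch

def hook(module, inputs, output):
    captured.append(output.detach())   # combination of neuron firings at this layer

handle = model[3].register_forward_hook(hook)   # watch the second ReLU layer

with torch.no_grad():
    for _ in range(5):                 # many inputs -> a global view of the layer
        model(torch.randn(128, 32))

handle.remove()
activations = torch.cat(captured)      # shape (640, 64): one vector per input
print(activations.shape)               # these vectors would feed a mapper/TDA step
```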