An Information-theoretic Visual Analysis Framework for Convolutional Neural Networks
Despite the great success of Convolutional Neural Networks (CNNs) in Computer Vision and Natural Language Processing, the working mechanism behind CNNs is still under extensive discussion and research. Driven by strong demand for a theoretical explanation of neural networks, some researchers use information theory to provide insight into the black-box model. However, to the best of our knowledge, employing information theory to quantitatively analyze and qualitatively visualize neural networks has not been extensively studied in the visualization community. In this paper, we combine information entropy and visualization techniques to shed light on how CNNs work. Specifically, we first introduce a data model to organize the data that can be extracted from CNN models. Then we propose two ways to calculate entropy under different circumstances. To provide a fundamental understanding of the basic building blocks of CNNs (e.g., convolutional layers, pooling layers, normalization layers) from an information-theoretic perspective, we develop a visual analysis system, CNNSlicer. CNNSlicer allows users to interactively explore how the amount of information changes inside the model. With case studies on widely used benchmark datasets (MNIST and CIFAR-10), we demonstrate the effectiveness of our system in opening the black box of CNNs.
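The abstract does not say how its two entropy calculations are defined, so the following is only a generic illustration of the underlying idea: a minimal sketch of a histogram-based Shannon entropy estimate over a list of activation values. The function name `shannon_entropy`, the bin count, and the value range are all assumptions made for this example and are not taken from the paper or from CNNSlicer.

```python
import math
from collections import Counter


def shannon_entropy(values, num_bins=8, lo=0.0, hi=1.0):
    """Estimate Shannon entropy (in bits) by binning values into a histogram.

    Hypothetical helper for illustration only; not the paper's method.
    """
    if not values:
        return 0.0
    width = (hi - lo) / num_bins
    counts = Counter()
    for v in values:
        # Clamp to [lo, hi]; the upper edge is assigned to the last bin.
        clamped = min(max(v, lo), hi)
        idx = min(int((clamped - lo) / width), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    # H = -sum(p * log2(p)) over the non-empty bins.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Values spread evenly over four bins carry more entropy than constant values.
spread = shannon_entropy([0.0, 0.3, 0.6, 0.9], num_bins=4)
constant = shannon_entropy([0.1, 0.1, 0.1, 0.1], num_bins=4)
print(spread, constant)
```

Applied to the outputs of successive layers, an estimate of this kind gives one rough way to compare how much variation survives each stage, although the result depends heavily on the binning chosen.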
- Award ID(s): 1955764
- PAR ID: 10328000
- Date Published:
- Journal Name: Smart Tools and Apps for Graphics (STAG)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In the past decade, deep neural networks, and specifically convolutional neural networks (CNNs), have become a primary tool in the field of biomedical image analysis, and are used intensively in other fields such as object or face recognition. CNNs have a clear advantage in their ability to provide superior performance without requiring a full understanding of the image elements that reflect the biomedical problem at hand, and without requiring algorithms designed specifically for that task. The availability of easy-to-use libraries and their non-parametric nature make CNNs the most common solution to problems that require automatic biomedical image analysis. But while CNNs have many advantages, they also have certain downsides. The features determined by CNNs are complex and unintuitive, and therefore CNNs often work as a “Black Box”. Additionally, CNNs learn from any piece of information in the pixel data that can provide a discriminative signal, making it more difficult to control what the CNN actually learns. Here we follow common practices to test whether CNNs can classify biomedical image datasets, but instead of using the entire image we use only parts of the images that do not have biomedical content. The experiments show that CNNs can provide high classification accuracy even when they are trained with datasets that do not contain any biomedical information, and that they can be systematically biased by irrelevant information in the image data. The presence of such consistent irrelevant data is difficult to identify, and can therefore lead to biased experimental results. Possible solutions to this downside of CNNs include control experiments, as well as other protective practices to validate the results and avoid biased conclusions based on CNN-generated annotations.
- Convolutional neural networks (CNNs) have been employed along with variational Monte Carlo methods for finding the ground state of quantum many-body spin systems with great success. However, it remains uncertain how CNNs, with a model complexity that scales at most linearly with the number of particles, solve the “curse of dimensionality” and efficiently represent wavefunctions in exponentially large Hilbert spaces. In this work, we use methodologies from information theory, group theory and machine learning to elucidate how CNNs capture the relevant physics of quantum systems. We connect CNNs to a class of restricted maximum entropy (MaxEnt) and entangled plaquette correlator product state (EP-CPS) models that approximate symmetry-constrained classical correlations between subsystems. For the final part of the puzzle, inspired by similar analyses for matrix product states and tensor networks, we show that the CNNs rely on the spectrum of each subsystem's entanglement Hamiltonians as captured by the size of the convolutional filter. Taken together, these allow CNNs to simulate exponential quantum wave functions using a model that scales at most linearly in system size, and they provide clues as to when CNNs might fail to simulate Hamiltonians. We incorporate our insights into a new training algorithm and demonstrate its improved efficiency, accuracy, and robustness. Finally, we use regression analysis to show how the CNN solutions can be used to identify the salient physical features of the system that are most relevant to an efficient approximation. Our integrated approach can be extended to similarly analyzing other neural network architectures and quantum spin systems. Published by the American Physical Society, 2025.
- Computer vision often uses highly accurate Convolutional Neural Networks (CNNs), but these deep learning models are associated with ever-increasing energy and computation requirements. Producing more energy-efficient CNNs often requires model training, which can be cost-prohibitive. We propose a novel, automated method to make a pretrained CNN more energy-efficient without re-training. Given a pretrained CNN, we insert a threshold layer that filters activations from the preceding layers to identify regions of the image that are irrelevant, i.e., that can be ignored by the following layers while maintaining accuracy. Our modified focused convolution operation reduces inference latency (by up to 25%) and energy costs (by up to 22%) on various popular pretrained CNNs, with little to no loss in accuracy.
- Convolutional Neural Networks (CNNs) are widely used in various domains, including image recognition, medical diagnosis and autonomous driving. Recent advances in dataflow-based CNN accelerators have enabled CNN inference on resource-constrained edge devices. These dataflow accelerators exploit the inherent data reuse of convolution layers to process CNN models efficiently. Concealing the architecture of CNN models is critical for privacy and security. This article evaluates memory-based side-channel information to recover CNN architectures from dataflow-based CNN inference accelerators. The proposed attack exploits spatial and temporal data reuse of the dataflow mapping on CNN accelerators, together with architectural hints, to recover the structure of CNN models. Experimental results demonstrate that our proposed side-channel attack can recover the structures of popular CNN models, namely, Lenet, Alexnet, VGGnet16, and YOLOv2.

