This content will become publicly available on June 27, 2024

Title: Experimental Observations of the Topology of Convolutional Neural Network Activations
Topological data analysis (TDA) is a branch of computational mathematics, bridging algebraic topology and data science, that provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture, resulting in high-dimensional, difficult-to-interpret internal representations of input data. As DNNs become more ubiquitous across multiple sectors of our society, there is increasing recognition that mathematical methods are needed to aid analysts, researchers, and practitioners in understanding and interpreting how these models' internal representations relate to the final classification. In this paper we apply cutting-edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification. We use two common TDA approaches to explore several methods for modeling hidden layer activations as high-dimensional point clouds, and provide experimental evidence that these point clouds capture valuable structural information about the model's process. First, we demonstrate that a distance metric based on persistent homology can be used to quantify meaningful differences between layers, and we discuss these distances in the broader context of existing representational similarity metrics for neural network interpretability. Second, we show that a mapper graph can provide semantic insight into how these models organize hierarchical class knowledge at each layer. These observations demonstrate that TDA is a useful tool to help deep learning practitioners unlock the hidden structures of their models.
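To make the idea of a persistent-homology-based distance between layers concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each layer's activations have already been arranged as a point cloud (rows are points), restricts attention to degree-0 homology, and uses the 1-Wasserstein distance between persistence diagrams as the metric. The function names `h0_death_times` and `h0_wasserstein` are hypothetical, and the random arrays are stand-ins for real activations.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def h0_death_times(points):
    """Finite H0 death times of the Vietoris-Rips filtration of a point cloud.

    Every H0 class is born at scale 0, and the finite death times equal the
    edge lengths of a minimum spanning tree (edges enter the filtration at
    their length). Assumes all points are distinct, because SciPy treats
    zero entries of a dense graph as missing edges.
    """
    dist = squareform(pdist(points))
    mst = minimum_spanning_tree(dist)
    return np.sort(mst.data)


def h0_wasserstein(deaths_a, deaths_b):
    """1-Wasserstein distance (L-infinity ground metric) between H0 diagrams.

    Diagram points are (0, d). A point is matched either to a point of the
    other diagram at cost |d - e| or to the diagonal at cost d / 2.
    """
    n, m = len(deaths_a), len(deaths_b)
    cost = np.zeros((n + m, n + m))
    cost[:n, :m] = np.abs(deaths_a[:, None] - deaths_b[None, :])
    cost[:n, m:] = deaths_a[:, None] / 2.0  # point of A -> diagonal
    cost[n:, :m] = deaths_b[None, :] / 2.0  # point of B -> diagonal
    rows, cols = linear_sum_assignment(cost)  # bottom-right block stays 0
    return float(cost[rows, cols].sum())


# Hypothetical stand-ins for two layers' activation point clouds.
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(50, 16))
layer_b = 2.0 * layer_a  # same shape of cloud, different scale

deaths_a = h0_death_times(layer_a)
deaths_b = h0_death_times(layer_b)
print("d(A, A) =", h0_wasserstein(deaths_a, deaths_a))
print("d(A, B) =", h0_wasserstein(deaths_a, deaths_b))
```

Degree-0 homology only tracks how connected components merge, so this sketch ignores loops and higher-dimensional features; a dedicated TDA library would be needed for those, and for large point clouds the dense distance matrix used here becomes the bottleneck.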
Award ID(s):
2134223 2205418
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Page Range / eLocation ID:
9470 to 9479
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code.

    Significance Statement

    Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones. Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way—unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.

  2. Borenstein, Elhanan (Ed.)
    Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences that are particularly informative for classification. As such, this novel approach can avoid the pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at (a Python package) and (a command line tool).
  3. IEEE Open Journal of the Computer Society (Ed.)
    While neural networks have generated increasing excitement for classification tasks such as those in natural language processing, their lack of interpretability remains a major obstacle to deploying them in certain high-stakes, human-centered applications. To address this issue, we propose a new approach for generating interpretable predictions by inferring a simple three-layer neural network with threshold activations, so that it can benefit from effective neural network training algorithms and, at the same time, produce human-understandable explanations for the results. In particular, the hidden layer neurons in the proposed model are trained with floating point weights and binary output activations. The output neuron is also trainable as a threshold logic function that implements a disjunctive operation, forming the logical-OR of the first-level threshold logic functions. This neural network can be trained using state-of-the-art training methods to achieve high prediction accuracy. An important feature of the proposed architecture is that only a simple greedy algorithm is required to provide a human-understandable explanation with each prediction. In comparison with other explainable decision models, our proposed approach achieves more accurate predictions on a broad set of tabular data classification datasets.
  4. We introduce a new general-purpose approach to deep learning on three-dimensional surfaces based on the insight that a simple diffusion layer is highly effective for spatial communication. The resulting networks are automatically robust to changes in resolution and sampling of a surface—a basic property that is crucial for practical applications. Our networks can be discretized on various geometric representations, such as triangle meshes or point clouds, and can even be trained on one representation and then applied to another. We optimize the spatial support of diffusion as a continuous network parameter ranging from purely local to totally global, removing the burden of manually choosing neighborhood sizes. The only other ingredients in the method are a multi-layer perceptron applied independently at each point and spatial gradient features to support directional filters. The resulting networks are simple, robust, and efficient. Here, we focus primarily on triangle mesh surfaces and demonstrate state-of-the-art results for a variety of tasks, including surface classification, segmentation, and non-rigid correspondence. 
  5. Abstract

    Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification (or regression) at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss. In this letter, we study the dynamics of such implicit nonlinear representation learning. We identify a new assumption and a novel condition, called the on-model structure assumption and the data architecture alignment condition, respectively. Under the on-model structure assumption, the data architecture alignment condition is shown to be sufficient for global convergence and necessary for global optimality. Moreover, our theory explains how and when increasing network size does and does not improve the training behaviors in the practical regime. Our results provide practical guidance for designing a model structure; for example, the on-model structure assumption can be used as a justification for using a particular model structure instead of others. As an application, we then derive a new training framework, which satisfies the data architecture alignment condition without assuming it by automatically modifying any given training algorithm dependent on data and architecture. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive (practical) test performances while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization with standard benchmark data sets, including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST, and SVHN.