skip to main content


Title: Topology Preserving Data Reduction for Computing Persistent Homology
An emerging method for data analysis is called Topological Data Analysis (TDA). TDA is based in the mathematical field of topology and examines the properties of spaces under continuous deformation. One of the key tools used for TDA is called persistent homology which considers the connectivity of points in a d-dimensional point cloud at different spatial resolutions to identify topological properties (holes, loops, and voids) in the space. Persistent homology then classifies the topological features by their persistence through the range of spatial connectivity. Unfortunately the memory and run-time complexity of computing persistent homology is exponential and current tools can only process a few thousand points in R3. Fortunately, the use of data reduction techniques enables persistent homology to be applied to much larger point clouds. Techniques to reduce the data range from random sampling of points to clustering the data and using the cluster centroids as the reduced data. While several data reduction approaches appear to preserve the large topological features present in the original point cloud, no systematic study comparing the efficacy of different data clustering techniques in preserving the persistent homology results has been performed. This paper explores the question of topology preserving data reductions and describes formally when and how topological features can be mischaracterized or lost by data reduction techniques. The paper also performs an experimental assessment of data reduction techniques and resilient effects on the persistent homology. In particular, data reduction by random selection is compared to cluster centroids extracted from different data clustering algorithms.  more » « less
Award ID(s):
1909096
NSF-PAR ID:
10350973
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
International Workshop on Big Data Reduction
Page Range / eLocation ID:
2681 to 2690
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Persistent homology is used for computing topological features of a space at different spatial resolutions. It is one of the main tools from computational topology that is applied to the problems of data analysis. Despite several attempts to reduce its complexity, persistent homology remains expensive in both time and space. These limits are such that the largest data sets to which the method can be applied have the number of points of the order of thousands in R^3. This paper explores a technique intended to reduce the number of data points while preserving the salient topological features of the data. The proposed technique enables the computation of persistent homology on a reduced version of the original input data without affecting significant components of the output. Since the run time of persistent homology is exponential in the number of data points, the proposed data reduction method facilitates the computation in a fraction of the time required for the original data. Moreover, the data reduction method can be combined with any existing technique that simplifies the computation of persistent homology. The data reduction is performed by creating small groups of \emph{similar} data points, called nano-clusters, and then replacing the points within each nano-cluster with its cluster center. The persistence homology of the reduced data differs from that of the original data by an amount bounded by the radius of the nano-clusters. The theoretical analysis is backed by experimental results showing that persistent homology is preserved by the proposed data reduction technique. 
    more » « less
  2. Persistent homology is a method of data analysis that is based in the mathematical field of topology. Unfortunately, the run-time and memory complexities associated with computing persistent homology inhibit general use for the analysis of big data. For example, the best tools currently available to compute persistent homology can process only a few thousand data points in R^3. Several studies have proposed using sampling or data reduction methods to attack this limit. While these approaches enable the computation of persistent homology on much larger data sets, the methods are approximate. Furthermore, while they largely preserve the results of large topological features, they generally miss reporting information about the small topological features that are present in the data set. While this abstraction is useful in many cases, there are data analysis needs where the smaller features are also significant (e.g., brain artery analysis). This paper explores a combination of data reduction and data partitioning to compute persistent homology on big data that enables the identification of both large and small topological features from the input data set. To reduce the approximation errors that typically accompany data reduction for persistent homology, the described method also includes a mechanism of ``upscaling'' the data circumscribing the large topological features that are computed from the sampled data. The designed experimental method provides significant results for improving the scale at which persistent homology can be performed 
    more » « less
  3. Over the last two decades, topological data analysis (TDA) has emerged as a very powerful data analytic approach that can deal with various data modalities of varying complexities. One of the most commonly used tools in TDA is persistent homology (PH), which can extract topological properties from data at various scales. The aim of this article is to introduce TDA concepts to a statistical audience and provide an approach to analyzing multivariate time series data. The application’s focus will be on multivariate brain signals and brain connectivity networks. Finally, this paper concludes with an overview of some open problems and potential application of TDA to modeling directionality in a brain network, as well as the casting of TDA in the context of mixed effect models to capture variations in the topological properties of data collected from multiple subjects.

     
    more » « less
  4. null (Ed.)
    Abstract In developmental biology as well as in other biological systems, emerging structure and organization can be captured using time-series data of protein locations. In analyzing this time-dependent data, it is a common challenge not only to determine whether topological features emerge, but also to identify the timing of their formation. For instance, in most cells, actin filaments interact with myosin motor proteins and organize into polymer networks and higher-order structures. Ring channels are examples of such structures that maintain constant diameters over time and play key roles in processes such as cell division, development, and wound healing. Given the limitations in studying interactions of actin with myosin in vivo, we generate time-series data of protein polymer interactions in cells using complex agent-based models. Since the data has a filamentous structure, we propose sampling along the actin filaments and analyzing the topological structure of the resulting point cloud at each time. Building on existing tools from persistent homology, we develop a topological data analysis (TDA) method that assesses effective ring generation in this dynamic data. This method connects topological features through time in a path that corresponds to emergence of organization in the data. In this work, we also propose methods for assessing whether the topological features of interest are significant and thus whether they contribute to the formation of an emerging hole (ring channel) in the simulated protein interactions. In particular, we use the MEDYAN simulation platform to show that this technique can distinguish between the actin cytoskeleton organization resulting from distinct motor protein binding parameters. 
    more » « less
  5. Topological data analysis (TDA) is a branch of computational mathematics, bridging algebraic topology and data science, that provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture resulting in high-dimensional, difficult to interpret internal representations of input data. As DNNs become more ubiquitous across multiple sectors of our society, there is increasing recognition that mathematical methods are needed to aid analysts, researchers, and practitioners in understanding and interpreting how these models' internal representations relate to the final classification. In this paper we apply cutting edge techniques from TDA with the goal of gaining insight towards interpretability of convolutional neural networks used for image classification. We use two common TDA approaches to explore several methods for modeling hidden layer activations as high-dimensional point clouds, and provide experimental evidence that these point clouds capture valuable structural information about the model's process. First, we demonstrate that a distance metric based on persistent homology can be used to quantify meaningful differences between layers and discuss these distances in the broader context of existing representational similarity metrics for neural network interpretability. Second, we show that a mapper graph can provide semantic insight as to how these models organize hierarchical class knowledge at each layer. These observations demonstrate that TDA is a useful tool to help deep learning practitioners unlock the hidden structures of their models. 
    more » « less