
Title: A Survey on the High Performance Computation of Persistent Homology
Persistent Homology is a computational method of data mining in the field of Topological Data Analysis. Large-scale data analysis with persistent homology is computationally expensive and memory intensive. The performance of persistent homology has been rigorously studied to optimize data encoding and intermediate data structures for high-performance computation. This paper provides an application-centric survey of the High-Performance Computation of Persistent Homology. Computational topology concepts are reviewed and detailed for a broad data science and engineering audience.
Journal Name:
IEEE Transactions on Knowledge and Data Engineering
Page Range / eLocation ID:
1 to 1
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Persistent homology is used for computing topological features of a space at different spatial resolutions. It is one of the main tools from computational topology that is applied to the problems of data analysis. Despite several attempts to reduce its complexity, persistent homology remains expensive in both time and space. These limits are such that the largest data sets to which the method can be applied contain on the order of thousands of points in R^3. This paper explores a technique intended to reduce the number of data points while preserving the salient topological features of the data. The proposed technique enables the computation of persistent homology on a reduced version of the original input data without affecting significant components of the output. Since the run time of persistent homology is exponential in the number of data points, the proposed data reduction method facilitates the computation in a fraction of the time required for the original data. Moreover, the data reduction method can be combined with any existing technique that simplifies the computation of persistent homology. The data reduction is performed by creating small groups of similar data points, called nano-clusters, and then replacing the points within each nano-cluster with its cluster center. The persistent homology of the reduced data differs from that of the original data by an amount bounded by the radius of the nano-clusters. The theoretical analysis is backed by experimental results showing that persistent homology is preserved by the proposed data reduction technique.
  2. Persistent Homology (PH) is a method of Topological Data Analysis that analyzes the topological structure of data to help data scientists infer relationships in the data and to assist in informed decision-making. A significant component in the computation of PH is the construction and use of a complex that represents the topological structure of the data. Some complex types are fast to construct but space inefficient, whereas others are costly to construct and space efficient. Unfortunately, existing complex types are not both fast to construct and compact. This paper works to increase the scope of PH to support the computation of low-dimensional homologies (H0–H10) in high-dimensional big data. In particular, this paper exploits the desirable properties of the Vietoris–Rips Complex (VR-Complex) and the Delaunay Complex in order to construct a sparsified complex. The VR-Complex uses a distance matrix to quickly generate a complex up to the desired homology dimension. In contrast, the Delaunay Complex works at the dimensionality of the data to generate a sparsified complex. While construction of the VR-Complex is fast, its size grows exponentially with the size and dimension of the data set; in contrast, the Delaunay Complex is significantly smaller for any given data dimension. However, its construction requires the computation of a Delaunay Triangulation, which has high computational complexity. As a result, it is difficult to construct a Delaunay Complex for data in dimensions d > 6 that contains more than a few hundred points. The techniques in this paper enable the computation of topology-preserving sparsification of k-Simplices (where k ≪ d) to quickly generate a reduced sparsified complex sufficient to compute homologies up to k-subspace, irrespective of the data dimensionality d.
  3. Persistent homology is perhaps the most popular and useful tool offered by topological data analysis, with point-cloud data being the most common setup. Its older cousin, the Euler characteristic curve (ECC), is less expressive but far easier to compute. It is particularly suitable for analyzing imaging data, and is commonly used in fields ranging from astrophysics to biomedical image analysis. These fields are embracing GPU computations to handle increasingly large datasets. We therefore propose an optimized GPU implementation of ECC computation for 2D and 3D grayscale images. The goal of this paper is twofold. First, we offer a practical tool, illustrating its performance with thorough experimentation, but also explain its inherent shortcomings. Second, this simple algorithm serves as a perfect backdrop for highlighting basic GPU programming techniques that make our implementation so efficient, and some common pitfalls we avoided. This is intended as a step towards a wider usage of GPU programming in computational geometry and topology software. We find this is particularly important as geometric and topological tools are used in conjunction with modern, GPU-accelerated machine learning frameworks.
  4. Persistent homology (PH) matrix reduction is an important tool for data analytics in many application areas. Because of its highly irregular execution patterns, it is challenging to achieve high efficiency in parallel processing for increasingly large data sets. In this paper, we introduce HYPHA, a HYbrid Persistent Homology matrix reduction Accelerator, to make parallel processing highly efficient on both GPU and multicore. The essential foundation of our algorithm design and implementation is the separation of SIMT and MIMD parallelisms in PH matrix reduction computation. With such a separation, we are able to perform massively parallel scanning operations on the GPU very quickly, which also collects rich information from an input boundary matrix for further parallel reduction operations on multicore with high efficiency. The HYPHA framework may provide general-purpose guidance for high performance computing on multiple hardware accelerators. To the best of our knowledge, HYPHA achieves the highest performance in PH matrix reduction execution. Our experiments show speedups of up to 116x against the standard PH algorithm. Compared to the state-of-the-art parallel PH software packages, such as PHAT and DIPHA, HYPHA outperforms their fastest PH matrix reduction algorithms by a factor of up to 2.3x.
  5. Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several ℓ1 minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: 1) optimization is effective in reducing the size of cycle representatives, though the extent of the reduction varies according to the dimension and distribution of the underlying data, 2) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis, in most data sets we consider, 3) the choice of linear solvers matters a lot to the computation time of optimizing cycles, 4) the computation time of solving an integer program is not significantly longer than the computation time of solving a linear program for most of the cycle representatives, using the Gurobi linear solver, 5) strikingly, whether requiring integer solutions or not, we almost always obtain a solution with the same cost and almost all solutions found have entries in {-1, 0, 1} and therefore, are also solutions to a restricted ℓ0 optimization problem, and 6) we obtain qualitatively different results for generators in Erdős-Rényi random clique complexes than in real-world and synthetic point cloud data.
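
Several of the abstracts above refer to PH matrix reduction and to "the standard PH algorithm" as a baseline. For orientation, here is a minimal sketch of the textbook column reduction of a boundary matrix over Z/2. It is an illustration only, not the implementation used by HYPHA, PHAT, DIPHA, or any paper listed here. The function name `reduce_boundary_matrix`, the sparse set-of-rows column representation, and the triangle example are assumptions made for this sketch.

```python
def reduce_boundary_matrix(columns):
    """Textbook column reduction over Z/2 (illustrative sketch).

    columns[j] holds the row indices of the nonzero entries of column j,
    with simplices indexed in filtration order. Returns (birth, death)
    index pairs in the order the death columns are processed.
    """
    reduced = [set(col) for col in columns]  # work on copies
    pivot_owner = {}                         # lowest row index -> owning column
    pairs = []
    for j, col in enumerate(reduced):
        # Add earlier columns (mod 2) until the lowest entry is unique.
        while col and max(col) in pivot_owner:
            col ^= reduced[pivot_owner[max(col)]]  # in-place symmetric difference
        if col:
            low = max(col)
            pivot_owner[low] = j
            pairs.append((low, j))
    return pairs


# Hypothetical example: a filled triangle.
# Vertices are 0, 1, 2; edges are 3=(0,1), 4=(1,2), 5=(0,2); the face is 6.
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
pairs = reduce_boundary_matrix(triangle)
print(pairs)
```

In this example, edges 3 and 4 each merge two components, edge 5 reduces to zero and so creates a loop, and the face 6 fills that loop. Vertex 0 stays unpaired, which corresponds to the one connected component that never dies.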