 Award ID(s):
 1440420
 NSFPAR ID:
 10193705
 Date Published:
 Journal Name:
 2018 IEEE International Conference on Big Data
 Page Range / eLocation ID:
 327 to 334
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

An emerging method for data analysis is called Topological Data Analysis (TDA). TDA is based in the mathematical field of topology and examines the properties of spaces under continuous deformation. One of the key tools used for TDA is called persistent homology which considers the connectivity of points in a ddimensional point cloud at different spatial resolutions to identify topological properties (holes, loops, and voids) in the space. Persistent homology then classifies the topological features by their persistence through the range of spatial connectivity. Unfortunately the memory and runtime complexity of computing persistent homology is exponential and current tools can only process a few thousand points in R3. Fortunately, the use of data reduction techniques enables persistent homology to be applied to much larger point clouds. Techniques to reduce the data range from random sampling of points to clustering the data and using the cluster centroids as the reduced data. While several data reduction approaches appear to preserve the large topological features present in the original point cloud, no systematic study comparing the efficacy of different data clustering techniques in preserving the persistent homology results has been performed. This paper explores the question of topology preserving data reductions and describes formally when and how topological features can be mischaracterized or lost by data reduction techniques. The paper also performs an experimental assessment of data reduction techniques and resilient effects on the persistent homology. In particular, data reduction by random selection is compared to cluster centroids extracted from different data clustering algorithms.more » « less

Persistent homology is a method of data analysis that is based in the mathematical field of topology. Unfortunately, the runtime and memory complexities associated with computing persistent homology inhibit general use for the analysis of big data. For example, the best tools currently available to compute persistent homology can process only a few thousand data points in R^3. Several studies have proposed using sampling or data reduction methods to attack this limit. While these approaches enable the computation of persistent homology on much larger data sets, the methods are approximate. Furthermore, while they largely preserve the results of large topological features, they generally miss reporting information about the small topological features that are present in the data set. While this abstraction is useful in many cases, there are data analysis needs where the smaller features are also significant (e.g., brain artery analysis). This paper explores a combination of data reduction and data partitioning to compute persistent homology on big data that enables the identification of both large and small topological features from the input data set. To reduce the approximation errors that typically accompany data reduction for persistent homology, the described method also includes a mechanism of ``upscaling'' the data circumscribing the large topological features that are computed from the sampled data. The designed experimental method provides significant results for improving the scale at which persistent homology can be performedmore » « less

Topological Data Analysis (TDA) is a data mining technique to characterize the topological features of data. Persistent Homology (PH) is an important tool of TDA that has been applied to a wide range of applications. However its time and space complexities motivates a need for new methods to compute the PH of highdimensional data. An important, and memory intensive, element in the computation of PH is the complex constructed from the input data. In general, PH tools use and focus on optimizing simplicial complexes; less frequently cubical complexes are also studied. This paper develops a method to construct polytopal complexes (or complexes constructed of any mix of convex polytopes) in any dimension Rn . In general, polytopal complexes are significantly smaller than simplicial or cubical complexes. This paper includes an experimental assessment of the impact that polytopal complexes have on memory complexity and output results of a PH computation.more » « less

Persistent Homology (PH) is a method of Topological Data Analysis that analyzes the topological structure of data to help data scientists infer relationships in the data to assist in informed decision making. A significant component in the computation of PH is the construction and use of a complex that represents the topological structure of the data. Some complex types are fast to construct but space inefficient whereas others are costly to construct and space efficient. Unfortunately, existing complex types are not both fast to construct and compact. This paper works to increase the scope of PH to support the computation of low dimensional homologies (H0 –H10 ) in highdimension, big data. In particular, this paper exploits the desirable properties of the Vietoris–Rips Complex (VRComplex) and the Delaunay Complex in order to construct a sparsified complex. The VRComplex uses a distance matrix to quickly generate a complex up to the desired homology dimension. In contrast, the Delaunay Complex works at the dimensionality of the data to generate a sparsified complex. While construction of the VRComplex is fast, its size grows exponentially by the size and dimension of the data set; in contrast, the Delaunay complex is significantly smaller for any given data dimension. However, its construction requires the computation of a Delaunay Triangulation that has high computational complexity. As a result, it is difficult to construct a Delaunay Complex for data in dimensions d > 6 that contains more than a few hundred points. The techniques in this paper enable the computation of topological preserving sparsification of kSimplices (where k ≪ d) to quickly generate a reduced sparsified complex sufficient to compute homologies up to ksubspace, irrespective of the data dimensionality d.more » « less

Abstract Matrix reduction is the standard procedure for computing the persistent homology of a filtered simplicial complex with
m simplices. Its output is a particular decomposition of the total boundary matrix, from which the persistence diagrams and generating cycles are derived. Persistence diagrams are known to vary continuously with respect to their input, motivating the study of their computation for timevarying filtered complexes. Computing persistence dynamically can be reduced to maintaining a valid decomposition under adjacent transpositions in the filtration order. Since there are such transpositions, this maintenance procedure exhibits limited scalability and is often too fine for many applications. We propose a coarser strategy for maintaining the decomposition over a 1parameter family of filtrations. By reduction to a particular longest common subsequence problem, we show that the minimal number of decomposition updates$$O(m^2)$$ $O\left({m}^{2}\right)$d can be found in time and$$O(m \log \log m)$$ $O(mloglogm)$O (m ) space, and that the corresponding sequence of permutations—which we call aschedule —can be constructed in time. We also show that, in expectation, the storage needed to employ this strategy is actually sublinear in$$O(d m \log m)$$ $O(dmlogm)$m . Exploiting this connection, we show experimentally that the decrease in operations to compute diagrams across a family of filtrations is proportional to the difference between the expected quadratic number of states and the proposed sublinear coarsening. Applications to video data, dynamic metric space data, and multiparameter persistence are also presented.