Title: Persistence Homology of Proximity Hyper-Graphs for Higher Dimensional Big Data
Persistent Homology (PH) is a method of Topological Data Analysis that analyzes the topological structure of data to help data scientists infer relationships in the data and make informed decisions. A significant component in the computation of PH is the construction and use of a complex that represents the topological structure of the data. Some complex types are fast to construct but space inefficient, whereas others are costly to construct but space efficient. Unfortunately, no existing complex type is both fast to construct and compact. This paper works to increase the scope of PH to support the computation of low-dimensional homologies (H0–H10) in high-dimensional big data. In particular, this paper exploits the desirable properties of the Vietoris–Rips Complex (VR-Complex) and the Delaunay Complex in order to construct a sparsified complex. The VR-Complex uses a distance matrix to quickly generate a complex up to the desired homology dimension. In contrast, the Delaunay Complex works at the dimensionality of the data to generate a sparsified complex. While construction of the VR-Complex is fast, its size grows exponentially with the size and dimension of the data set; the Delaunay Complex, by comparison, is significantly smaller for any given data dimension. However, its construction requires the computation of a Delaunay Triangulation, which has high computational complexity. As a result, it is difficult to construct a Delaunay Complex for data in dimensions d > 6 that contains more than a few hundred points. The techniques in this paper enable the computation of a topology-preserving sparsification of k-Simplices (where k ≪ d) to quickly generate a reduced sparsified complex sufficient to compute homologies up to k-subspace, irrespective of the data dimensionality d.
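To make the VR-Complex construction described in the abstract concrete, the following is a minimal brute-force sketch, not the paper's sparsification method. It assumes Euclidean distance, a single scale parameter `epsilon`, and a hypothetical function name `vietoris_rips`; none of these come from the record itself. A simplex is included whenever all of its vertices are pairwise within `epsilon` of each other, which is why only the distance matrix is needed.

```python
from itertools import combinations
import math


def vietoris_rips(points, epsilon, max_dim):
    """Return all simplices (as vertex-index tuples) of the Vietoris-Rips
    complex at scale epsilon, up to simplex dimension max_dim.

    Illustrative brute force only: it enumerates every candidate vertex
    subset, so the work grows combinatorially with the number of points.
    """
    n = len(points)
    # Pairwise Euclidean distance matrix; the construction needs nothing else.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Every point is a 0-simplex.
    simplices = [(i,) for i in range(n)]

    # A k-simplex has k + 1 vertices, so subset sizes run from 2 to max_dim + 1.
    for size in range(2, max_dim + 2):
        for candidate in combinations(range(n), size):
            # Keep the subset only if every pair of vertices is within epsilon.
            if all(dist[i][j] <= epsilon for i, j in combinations(candidate, 2)):
                simplices.append(candidate)
    return simplices


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    for eps in (1.0, 1.5):
        complex_ = vietoris_rips(square, eps, max_dim=2)
        print(eps, len(complex_))
```

Computing the homology group H_k requires simplices up to dimension k + 1, so `max_dim` would be set one higher than the largest homology dimension of interest. Even on this four-point example the simplex count rises quickly once `epsilon` admits the diagonals, which gives a small-scale picture of the growth the abstract describes.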
Award ID(s):
1909096
NSF-PAR ID:
10466297
Journal Name:
IEEE International Conference on Big Data
Page Range / eLocation ID:
65 to 74
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Topological Data Analysis (TDA) is a data mining technique to characterize the topological features of data. Persistent Homology (PH) is an important tool of TDA that has been applied to a wide range of applications. However, its time and space complexities motivate a need for new methods to compute the PH of high-dimensional data. An important, and memory-intensive, element in the computation of PH is the complex constructed from the input data. In general, PH tools use and focus on optimizing simplicial complexes; less frequently, cubical complexes are also studied. This paper develops a method to construct polytopal complexes (or complexes constructed of any mix of convex polytopes) in any dimension ℝ^n. In general, polytopal complexes are significantly smaller than simplicial or cubical complexes. This paper includes an experimental assessment of the impact that polytopal complexes have on memory complexity and output results of a PH computation.
  2. Persistent Homology (PH) is computationally expensive and is thus generally employed with strict limits on the (i) maximum connectivity distance and (ii) dimensions of homology groups to compute (unless working with trivially small data sets). As a result, most studies with PH only work with H0 and H1 homology groups. This paper examines the identification and isolation of regions of data sets where high dimensional topological features are suspected to be located. These regions are analyzed with PH to characterize the high dimensional homology groups contained in that region. Since only the region around a suspected topological feature is analyzed, it is possible to identify high dimension homologies piecewise and then assemble the results into a scalable characterization of the original data set. 
  3. Buchin, Kevin and (Ed.)
    We show how a filtration of Delaunay complexes can be used to approximate the persistence diagram of the distance to a point set in ℝ^d. Whereas the full Delaunay complex can be used to compute this persistence diagram exactly, it may have size O(n^⌈d/2⌉). In contrast, our construction uses only O(n) simplices. The central idea is to connect Delaunay complexes on progressively denser subsamples by considering the flips in an incremental construction as simplices in d+1 dimensions. This approach leads to a very simple and straightforward proof of correctness in geometric terms, because the final filtration is dual to a (d+1)-dimensional Voronoi construction similar to the standard Delaunay filtration. We also show how this complex can be efficiently constructed.
  4. In this paper, we introduce and study representation homology of topological spaces, which is a natural homological extension of representation varieties of fundamental groups. We give an elementary construction of representation homology parallel to the Loday–Pirashvili construction of higher Hochschild homology; in fact, we establish a direct geometric relation between the two theories by proving that the representation homology of the suspension of a (pointed connected) space is isomorphic to its higher Hochschild homology. We also construct some natural maps and spectral sequences relating representation homology to other homology theories associated with spaces (such as Pontryagin algebras, $\mathbb{S}^1$-equivariant homology of the free loop space, and stable homology of automorphism groups of f.g. free groups). We compute representation homology explicitly (in terms of known invariants) in a number of interesting cases, including spheres, suspensions, complex projective spaces, Riemann surfaces, and some 3-dimensional manifolds, such as link complements in $\mathbb{R}^3$ and the lens spaces $L(p,q)$. In the case of link complements, we identify the representation homology in terms of ordinary Hochschild homology, which gives a new algebraic invariant of links in $\mathbb{R}^3$.
  5. Tile self-assembly is a well-studied theoretical model of geometric computation based on nanoscale DNA-based molecular systems. Here, we study the two-handed tile self-assembly model, or 2HAM, at general temperatures, in contrast with prior study limited to small constant temperatures, leading to surprising results. We obtain constructions at larger (i.e., hotter) temperatures that disprove prior conjectures and break well-known bounds for low-temperature systems via new methods of temperature-encoded information. In particular, for all n ∈ ℕ, we assemble n×n squares using O(2^(log* n)) tile types, thus breaking the well-known information-theoretic lower bound of Rothemund and Winfree. Using this construction, we then show how to use the temperature to encode general shapes and construct them at scale with O(2^(log* K)) tiles, where K denotes the Kolmogorov complexity of the target shape. Following this, we refute a long-held conjecture by showing how to use temperature to construct n×O(1) rectangles using only O(log n / log log n) tile types. We also give two small systems to generate nanorulers of varying length based solely on varying the system temperature. These results constitute the first real demonstration of the power of high-temperature systems for tile assembly in the 2HAM. This leads to several directions for future exploration, which we discuss in the conclusion.