skip to main content


Title: A Domain-Oblivious Approach for Learning Concise Representations of Filtered Topological Spaces for Clustering
Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, computing these distances has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, which allows for fast computation of distances. This framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. The training of this model is domain-oblivious in that it can be computed purely from synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without the need for retraining the model. These binary codes, when compared using fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and we compare the quality and performance of our approach to the state-of-the-art. In addition, we show the scalability of our approach on a dataset with 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster with the potential of less memory usage, while retaining comparable or better quality comparisons.  more » « less
Award ID(s):
2046730 1664858 1664848 2136744
NSF-PAR ID:
10302557
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Transactions on Visualization and Computer Graphics
ISSN:
1077-2626
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Larochelle, H. ; Ranzato, M. ; Hadsell, R. ; Balcan, M.F. ; and Lin, H. (Ed.)
    Persistent homology has become an important tool for extracting geometric and topological features from data, whose multi-scale features are summarized in a persistence diagram. From a statistical perspective, however, persistence diagrams are very sensitive to perturbations in the input space. In this work, we develop a framework for constructing robust persistence diagrams from superlevel filtrations of robust density estimators constructed using reproducing kernels. Using an analogue of the influence function on the space of persistence diagrams, we establish the proposed framework to be less sensitive to outliers. The robust persistence diagrams are shown to be consistent estimators in the bottleneck distance, with the convergence rate controlled by the smoothness of the kernel — this, in turn, allows us to construct uniform confidence bands in the space of persistence diagrams. Finally, we demonstrate the superiority of the proposed approach on benchmark datasets. 
    more » « less
  2. Recently a new feature representation framework based on a topological tool called persistent homology (and its persistence diagram summary) has gained much momentum. A series of methods have been developed to map a persistence diagram to a vector representation so as to facilitate the downstream use of machine learning tools. In these approaches, the importance (weight) of different persistence features are usually pre-set. However often in practice, the choice of the weight-functions hould depend on the nature of the specific data at hand. It is thus highly desirable to learn a best weight-function (and thus metric for persistence diagrams) from labelled data. We study this problem and develop a new weighted kernel, called WKPI, for persistence summaries, as well as an optimization framework to learn the weight (and thus kernel). We apply the learned kernel to the challenging task of graph classification, and show that our WKPI-based classification framework obtains similar or (sometimes significantly) better results than the best results from a range of previous graph classification frameworks on benchmark datasets. 
    more » « less
  3. As the field of Topological Data Analysis continues to show success in theory and in applications, there has been increasing interest in using tools from this field with methods for machine learning. Using persistent homology, specifically persistence diagrams, as inputs to machine learning techniques requires some mathematical creativity. The space of persistence diagrams does not have the desirable properties for machine learning, thus methods such as kernel methods and vectorization methods have been developed. One such featurization of persistence diagrams by Perea, Munch and Khasawneh uses continuous, compactly supported functions, referred to as "template functions," which results in a stable vector representation of the persistence diagram. In this paper, we provide a method of adaptively partitioning persistence diagrams to improve these featurizations based on localized information in the diagrams. Additionally, we provide a framework to adaptively select parameters required for the template functions in order to best utilize the partitioning method. We present results for application to example data sets comparing classification results between template function featurizations with and without partitioning, in addition to other methods from the literature. 
    more » « less
  4. Chambers, Erin W. ; Gudmundsson, Joachim (Ed.)
    We first introduce the notion of meta-rank for a 2-parameter persistence module, an invariant that captures the information behind images of morphisms between 1D slices of the module. We then define the meta-diagram of a 2-parameter persistence module to be the Möbius inversion of the meta-rank, resulting in a function that takes values from signed 1-parameter persistence modules. We show that the meta-rank and meta-diagram contain information equivalent to the rank invariant and the signed barcode. This equivalence leads to computational benefits, as we introduce an algorithm for computing the meta-rank and meta-diagram of a 2-parameter module M indexed by a bifiltration of n simplices in O(n³) time. This implies an improvement upon the existing algorithm for computing the signed barcode, which has O(n⁴) time complexity. This also allows us to improve the existing upper bound on the number of rectangles in the rank decomposition of M from O(n⁴) to O(n³). In addition, we define notions of erosion distance between meta-ranks and between meta-diagrams, and show that under these distances, meta-ranks and meta-diagrams are stable with respect to the interleaving distance. Lastly, the meta-diagram can be visualized in an intuitive fashion as a persistence diagram of diagrams, which generalizes the well-understood persistence diagram in the 1-parameter setting. 
    more » « less
  5. We first introduce the notion of meta-rank for a 2-parameter persistence module, an invariant that captures the information behind images of morphisms between 1D slices of the module. We then define the meta-diagram of a 2-parameter persistence module to be the Möbius inversion of the meta-rank, resulting in a function that takes values from signed 1-parameter persistence modules. We show that the meta-rank and meta-diagram contain information equivalent to the rank invariant and the signed barcode. This equivalence leads to computational benefits, as we introduce an algorithm for computing the meta-rank and meta-diagram of a 2-parameter module M indexed by a bifiltration of n simplices in O(n^3) time. This implies an improvement upon the existing algorithm for computing the signed barcode, which has O(n^4) time complexity. This also allows us to improve the existing upper bound on the number of rectangles in the rank decomposition of M from O(n^4) to O(n^3). In addition, we define notions of erosion distance between meta-ranks and between meta-diagrams, and show that under these distances, meta-ranks and meta-diagrams are stable with respect to the interleaving distance. Lastly, the meta-diagram can be visualized in an intuitive fashion as a persistence diagram of diagrams, which generalizes the well-understood persistent diagram in the 1-parameter setting. 
    more » « less