skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.


Title: Efficient Approximation of Multiparameter Persistence Modules
Topological Data Analysis is a growing area of data science, which aims at computing and characterizing the geometry and topology of data sets, in order to produce useful descriptors for subsequent statistical and machine learning tasks. Its main computational tool is persistent homology, which amounts to track the topological changes in growing families of subsets of the data set itself, called filtrations, and encode them in an algebraic object, called persistence module. Even though algorithms and theoretical properties of modules are now well-known in the single-parameter case, that is, when there is only one filtration to study, much less is known in the multi-parameter case, where several filtrations are given at once. Though more complicated, the resulting persistence modules are usually richer and encode more information, making them better descriptors for data science. In this article, we present the first approximation scheme, which is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence, for computing and decomposing general multi-parameter persistence modules. Our algorithm has controlled complexity and running time, and works in arbitrary dimension, i.e., with an arbitrary number of filtrations. Moreover, when restricting to specific classes of multi-parameter persistence modules, namely the ones that can be decomposed into intervals, we establish theoretical results about the approximation error between our estimate and the true module in terms of interleaving distance. Finally, we present empirical evidence validating output quality and speed-up on several data sets.  more » « less
Award ID(s):
1912194
NSF-PAR ID:
10358770
Author(s) / Creator(s):
Date Published:
Journal Name:
ArXivorg
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    One-dimensional persistent homology is arguably the most important and heavily used computational tool in topological data analysis. Additional information can be extracted from datasets by studying multi-dimensional persistence modules and by utilizing cohomological ideas, e.g. the cohomological cup product. In this work, given a single parameter filtration, we investigate a certain 2-dimensional persistence module structure associated with persistent cohomology, where one parameter is the cup-length$$\ell \ge 0$$0and the other is the filtration parameter. This new persistence structure, called thepersistent cup module, is induced by the cohomological cup product and adapted to the persistence setting. Furthermore, we show that this persistence structure is stable. By fixing the cup-length parameter$$\ell $$, we obtain a 1-dimensional persistence module, called the persistent$$\ell $$-cup module, and again show it is stable in the interleaving distance sense, and study their associated generalized persistence diagrams. In addition, we consider a generalized notion of apersistent invariant, which extends both therank invariant(also referred to aspersistent Betti number), Puuska’s rank invariant induced by epi-mono-preserving invariants of abelian categories, and the recently-definedpersistent cup-length invariant, and we establish their stability. This generalized notion of persistent invariant also enables us to lift the Lyusternik-Schnirelmann (LS) category of topological spaces to a novel stable persistent invariant of filtrations, called thepersistent LS-category invariant.

     
    more » « less
  2. Characterizing the dynamics of time-evolving data within the framework of topological data analysis (TDA) has been attracting increasingly more attention. Popular instances of time-evolving data include flocking/swarming behaviors in animals and social networks in the human sphere. A natural mathematical model for such collective behaviors is a dynamic point cloud, or more generally a dynamic metric space (DMS). In this paper we extend the Rips filtration stability result for (static) metric spaces to the setting of DMSs. We do this by devising a certain three-parameter “spatiotemporal” filtration of a DMS. Applying the homology functor to this filtration gives rise to multidimensional persistence module derived from the DMS. We show that this multidimensional module enjoys stability under a suitable generalization of the Gromov–Hausdorff distance which permits metrization of the collection of all DMSs. On the other hand, it is recognized that, in general, comparing two multidimensional persistence modules leads to intractable computational problems. For the purpose of practical comparison of DMSs, we focus on both the rank invariant or the dimension function of the multidimensional persistence module that is derived from a DMS. We specifically propose to utilize a certain metric d for comparing these invariants: In our work this d is either (1) a certain generalization of the erosion distance by Patel, or (2) a specialized version of the well-known interleaving distance. In either case, the metric d can be computed in polynomial time. 
    more » « less
  3. Larochelle, H. ; Ranzato, M. ; Hadsell, R. ; Balcan, M.F. ; Lin, H. (Ed.)
    In the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks. 
    more » « less
  4. Abstract The $p$-tensor Ising model is a one-parameter discrete exponential family for modeling dependent binary data, where the sufficient statistic is a multi-linear form of degree $p \geqslant 2$. This is a natural generalization of the matrix Ising model that provides a convenient mathematical framework for capturing, not just pairwise, but higher-order dependencies in complex relational data. In this paper, we consider the problem of estimating the natural parameter of the $p$-tensor Ising model given a single sample from the distribution on $N$ nodes. Our estimate is based on the maximum pseudolikelihood (MPL) method, which provides a computationally efficient algorithm for estimating the parameter that avoids computing the intractable partition function. We derive general conditions under which the MPL estimate is $\sqrt N$-consistent, that is, it converges to the true parameter at rate $1/\sqrt N$. Our conditions are robust enough to handle a variety of commonly used tensor Ising models, including spin glass models with random interactions and models where the rate of estimation undergoes a phase transition. In particular, this includes results on $\sqrt N$-consistency of the MPL estimate in the well-known $p$-spin Sherrington–Kirkpatrick model, spin systems on general $p$-uniform hypergraphs and Ising models on the hypergraph stochastic block model (HSBM). In fact, for the HSBM we pin down the exact location of the phase transition threshold, which is determined by the positivity of a certain mean-field variational problem, such that above this threshold the MPL estimate is $\sqrt N$-consistent, whereas below the threshold no estimator is consistent. Finally, we derive the precise fluctuations of the MPL estimate in the special case of the $p$-tensor Curie–Weiss model, which is the Ising model on the complete $p$-uniform hypergraph. An interesting consequence of our results is that the MPL estimate in the Curie–Weiss model saturates the Cramer–Rao lower bound at all points above the estimation threshold, that is, the MPL estimate incurs no loss in asymptotic statistical efficiency in the estimability regime, even though it is obtained by minimizing only an approximation of the true likelihood function for computational tractability. 
    more » « less
  5. Persistent homology is used for computing topological features of a space at different spatial resolutions. It is one of the main tools from computational topology that is applied to the problems of data analysis. Despite several attempts to reduce its complexity, persistent homology remains expensive in both time and space. These limits are such that the largest data sets to which the method can be applied have the number of points of the order of thousands in R^3. This paper explores a technique intended to reduce the number of data points while preserving the salient topological features of the data. The proposed technique enables the computation of persistent homology on a reduced version of the original input data without affecting significant components of the output. Since the run time of persistent homology is exponential in the number of data points, the proposed data reduction method facilitates the computation in a fraction of the time required for the original data. Moreover, the data reduction method can be combined with any existing technique that simplifies the computation of persistent homology. The data reduction is performed by creating small groups of \emph{similar} data points, called nano-clusters, and then replacing the points within each nano-cluster with its cluster center. The persistence homology of the reduced data differs from that of the original data by an amount bounded by the radius of the nano-clusters. The theoretical analysis is backed by experimental results showing that persistent homology is preserved by the proposed data reduction technique. 
    more » « less