Onedimensional persistent homology is arguably the most important and heavily used computational tool in topological data analysis. Additional information can be extracted from datasets by studying multidimensional persistence modules and by utilizing cohomological ideas, e.g. the cohomological cup product. In this work, given a single parameter filtration, we investigate a certain 2dimensional persistence module structure associated with persistent cohomology, where one parameter is the cuplength
Efficient Approximation of Multiparameter Persistence Modules
Topological Data Analysis is a growing area of data science, which aims at computing and characterizing the geometry and topology of data sets, in order to produce useful descriptors for subsequent statistical and machine learning tasks. Its main computational tool is persistent homology, which amounts to track the topological changes in growing families of subsets of the data set itself, called filtrations, and encode them in an algebraic object, called persistence module. Even though algorithms and theoretical properties of modules are now wellknown in the singleparameter case, that is, when there is only one filtration to study, much less is known in the multiparameter case, where several filtrations are given at once. Though more complicated, the resulting persistence modules are usually richer and encode more information, making them better descriptors for data science. In this article, we present the first approximation scheme, which is based on fibered barcodes and exact matchings, two constructions that stem from the theory of singleparameter persistence, for computing and decomposing general multiparameter persistence modules. Our algorithm has controlled complexity and running time, and works in arbitrary dimension, i.e., with an arbitrary number of filtrations. Moreover, when restricting to specific classes of multiparameter persistence modules, namely the ones that can be decomposed into intervals, we establish theoretical results about the approximation error between our estimate and the true module in terms of interleaving distance. Finally, we present empirical evidence validating output quality and speedup on several data sets.
more »
« less
 Award ID(s):
 1912194
 NSFPAR ID:
 10358770
 Date Published:
 Journal Name:
 ArXivorg
 ISSN:
 23318422
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Abstract and the other is the filtration parameter. This new persistence structure, called the$$\ell \ge 0$$ $\ell \ge 0$persistent cup module , is induced by the cohomological cup product and adapted to the persistence setting. Furthermore, we show that this persistence structure is stable. By fixing the cuplength parameter , we obtain a 1dimensional persistence module, called the persistent$$\ell $$ $\ell $ cup module, and again show it is stable in the interleaving distance sense, and study their associated generalized persistence diagrams. In addition, we consider a generalized notion of a$$\ell $$ $\ell $persistent invariant , which extends both therank invariant (also referred to aspersistent Betti number ), Puuska’s rank invariant induced by epimonopreserving invariants of abelian categories, and the recentlydefinedpersistent cuplength invariant , and we establish their stability. This generalized notion of persistent invariant also enables us to lift the LyusternikSchnirelmann (LS) category of topological spaces to a novel stable persistent invariant of filtrations, called thepersistent LScategory invariant . 
Characterizing the dynamics of timeevolving data within the framework of topological data analysis (TDA) has been attracting increasingly more attention. Popular instances of timeevolving data include flocking/swarming behaviors in animals and social networks in the human sphere. A natural mathematical model for such collective behaviors is a dynamic point cloud, or more generally a dynamic metric space (DMS). In this paper we extend the Rips filtration stability result for (static) metric spaces to the setting of DMSs. We do this by devising a certain threeparameter “spatiotemporal” filtration of a DMS. Applying the homology functor to this filtration gives rise to multidimensional persistence module derived from the DMS. We show that this multidimensional module enjoys stability under a suitable generalization of the Gromov–Hausdorff distance which permits metrization of the collection of all DMSs. On the other hand, it is recognized that, in general, comparing two multidimensional persistence modules leads to intractable computational problems. For the purpose of practical comparison of DMSs, we focus on both the rank invariant or the dimension function of the multidimensional persistence module that is derived from a DMS. We specifically propose to utilize a certain metric d for comparing these invariants: In our work this d is either (1) a certain generalization of the erosion distance by Patel, or (2) a specialized version of the wellknown interleaving distance. In either case, the metric d can be computed in polynomial time.more » « less

Larochelle, H. ; Ranzato, M. ; Hadsell, R. ; Balcan, M.F. ; Lin, H. (Ed.)In the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the oneparameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks.more » « less

Abstract The $p$tensor Ising model is a oneparameter discrete exponential family for modeling dependent binary data, where the sufficient statistic is a multilinear form of degree $p \geqslant 2$. This is a natural generalization of the matrix Ising model that provides a convenient mathematical framework for capturing, not just pairwise, but higherorder dependencies in complex relational data. In this paper, we consider the problem of estimating the natural parameter of the $p$tensor Ising model given a single sample from the distribution on $N$ nodes. Our estimate is based on the maximum pseudolikelihood (MPL) method, which provides a computationally efficient algorithm for estimating the parameter that avoids computing the intractable partition function. We derive general conditions under which the MPL estimate is $\sqrt N$consistent, that is, it converges to the true parameter at rate $1/\sqrt N$. Our conditions are robust enough to handle a variety of commonly used tensor Ising models, including spin glass models with random interactions and models where the rate of estimation undergoes a phase transition. In particular, this includes results on $\sqrt N$consistency of the MPL estimate in the wellknown $p$spin Sherrington–Kirkpatrick model, spin systems on general $p$uniform hypergraphs and Ising models on the hypergraph stochastic block model (HSBM). In fact, for the HSBM we pin down the exact location of the phase transition threshold, which is determined by the positivity of a certain meanfield variational problem, such that above this threshold the MPL estimate is $\sqrt N$consistent, whereas below the threshold no estimator is consistent. Finally, we derive the precise fluctuations of the MPL estimate in the special case of the $p$tensor Curie–Weiss model, which is the Ising model on the complete $p$uniform hypergraph. An interesting consequence of our results is that the MPL estimate in the Curie–Weiss model saturates the Cramer–Rao lower bound at all points above the estimation threshold, that is, the MPL estimate incurs no loss in asymptotic statistical efficiency in the estimability regime, even though it is obtained by minimizing only an approximation of the true likelihood function for computational tractability.more » « less

Persistent homology is used for computing topological features of a space at different spatial resolutions. It is one of the main tools from computational topology that is applied to the problems of data analysis. Despite several attempts to reduce its complexity, persistent homology remains expensive in both time and space. These limits are such that the largest data sets to which the method can be applied have the number of points of the order of thousands in R^3. This paper explores a technique intended to reduce the number of data points while preserving the salient topological features of the data. The proposed technique enables the computation of persistent homology on a reduced version of the original input data without affecting significant components of the output. Since the run time of persistent homology is exponential in the number of data points, the proposed data reduction method facilitates the computation in a fraction of the time required for the original data. Moreover, the data reduction method can be combined with any existing technique that simplifies the computation of persistent homology. The data reduction is performed by creating small groups of \emph{similar} data points, called nanoclusters, and then replacing the points within each nanocluster with its cluster center. The persistence homology of the reduced data differs from that of the original data by an amount bounded by the radius of the nanoclusters. The theoretical analysis is backed by experimental results showing that persistent homology is preserved by the proposed data reduction technique.more » « less