Persistent Homology is a computational data mining method in the field of Topological Data Analysis. Large-scale data analysis with persistent homology is computationally expensive and memory intensive. The performance of persistent homology has been rigorously studied, with the aim of optimizing data encodings and intermediate data structures for high-performance computation. This paper provides an application-centric survey of the high-performance computation of persistent homology. Computational topology concepts are reviewed and detailed for a broad data science and engineering audience.
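As a concrete illustration of the computational core these optimizations target, the following is a minimal sketch of the standard persistence algorithm: left-to-right column reduction of a Z/2 boundary matrix. The sparse-set column representation and the pairing logic are generic textbook choices assumed here, not drawn from any particular high-performance implementation.

```python
def low(col):
    """Largest nonzero row index of a column, or None if the column is empty."""
    return max(col) if col else None

def reduce_boundary(columns):
    """Left-to-right reduction of a Z/2 boundary matrix.

    columns[j] is the set of nonzero row indices of simplex j's boundary,
    with simplices ordered by filtration. Returns (birth, death) pairs.
    """
    pivot_of = {}  # pivot row -> index of the column that owns it
    pairs = []
    for j in range(len(columns)):
        col = set(columns[j])
        while col and low(col) in pivot_of:
            # Z/2 column addition is the symmetric difference of supports.
            col ^= columns[pivot_of[low(col)]]
        if col:
            pivot_of[low(col)] = j
            columns[j] = col
            pairs.append((low(col), j))  # simplex low(col) dies entering j
    return pairs

# Filtered triangle: vertices 0-2, then edges 3 (01), 4 (12), 5 (02), face 6.
boundary = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
print(reduce_boundary(boundary))  # -> [(1, 3), (2, 4), (5, 6)]
```

High-performance implementations differ from this sketch mainly in the data layout of the columns, the order in which they are reduced, and shortcuts such as clearing and compression that avoid reducing columns whose fate is already known.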
Minimal Cycle Representatives in Persistent Homology Using Linear Programming: An Empirical Study With User’s Guide
Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we study the effectiveness and computational cost of several ℓ1-minimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: 1) optimization is effective in reducing the size of cycle representatives, though the extent of the reduction varies with the dimension and distribution of the underlying data; 2) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis in most data sets we consider; 3) the choice of linear solver significantly affects the computation time of optimizing cycles; 4) with the Gurobi solver, the computation time of solving an integer program is not significantly longer than that of solving a linear program for most cycle representatives; 5) strikingly, whether or not we require integer solutions, we almost always obtain a solution with the same cost, and almost all solutions found have entries in {−1, 0, 1} and are therefore also solutions to a restricted ℓ0 optimization problem; and 6) we obtain qualitatively different results for generators in Erdős–Rényi random clique complexes than in real-world and synthetic point cloud data.
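To make the linear-programming formulation concrete, here is a minimal sketch, using our own variable names and a toy complex rather than anything from the paper, of the uniform-weighted edge-loss optimization: minimize the ℓ1 norm of a cycle over additions of triangle boundaries, via the standard slack-variable reformulation, solved here with SciPy's HiGHS backend.

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimal_cycle(z, B):
    """Minimize ||z + B y||_1 over real y via a standard LP reformulation."""
    n, m = B.shape
    # Variables x = [y, t]: minimize sum(t) subject to -t <= z + B y <= t.
    c = np.concatenate([np.zeros(m), np.ones(n)])
    A_ub = np.block([[ B, -np.eye(n)],
                     [-B, -np.eye(n)]])
    b_ub = np.concatenate([-z, z])
    bounds = [(None, None)] * m + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return z + B @ res.x[:m]

# Square a-b-c-d with triangle abc filled in; edge order: ab, bc, cd, da, ac.
B = np.array([[1., 1., 0., 0., -1.]]).T  # boundary of the filled triangle
z = np.array([1., 1., 1., 1., 0.])       # initial representative: the perimeter
print(l1_minimal_cycle(z, B))            # -> [0, 0, 1, 1, 1], rerouted via ac
```

In this toy example the optimizer replaces the length-4 perimeter by the length-3 cycle cutting across the diagonal, the kind of size reduction described in finding 1.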
- PAR ID: 10336630
- Date Published:
- Journal Name: Frontiers in Artificial Intelligence
- Volume: 4
- ISSN: 2624-8212
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Kernel methods for solving partial differential equations work coordinate-free on the surface and yield high approximation rates for smooth solutions. Localized Lagrange bases have been shown to alleviate the computational complexity of usual kernel methods for data-fitting problems, but the efficient numerical solution of the ill-conditioned linear systems arising from kernel-based Galerkin solutions to PDEs is a challenging problem that has not previously been addressed in the literature. In this article we apply the framework of the geometric multigrid method with a τ ≥ 2 cycle to scattered, quasi-uniform point clouds on the surface. We show that the resulting solver can be accelerated by exploiting the decay of the Lagrange functions, and we obtain satisfactory convergence rates through a rigorous analysis. In particular, we show that the computational cost of the linear solver scales log-linearly in the degrees of freedom.
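For readers unfamiliar with the τ-cycle, the sketch below shows the generic recursion (τ = 1 is the V-cycle; τ = 2, as in the article, is the W-cycle). The 1D Poisson model problem and weighted-Jacobi smoother are illustrative stand-ins, not the article's kernel-based Galerkin systems on surfaces.

```python
import numpy as np

def poisson_matrix(n):
    """1D Dirichlet Poisson matrix on n interior points, h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def smooth(A, b, x, sweeps=2, omega=2/3):
    """Weighted-Jacobi smoothing."""
    D = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / D
    return x

def restrict(r):
    """Full-weighting restriction onto every other interior point."""
    return 0.25 * r[:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(e, n_fine):
    """Linear interpolation back to the fine grid."""
    out = np.zeros(n_fine)
    out[1:-1:2] = e
    out[:-2:2] += 0.5 * e
    out[2::2] += 0.5 * e
    return out

def tau_cycle(A, b, x, tau=2):
    """One multigrid tau-cycle; tau=1 is the V-cycle, tau=2 the W-cycle."""
    n = len(b)
    if n <= 3:                        # coarsest level: direct solve
        return np.linalg.solve(A, b)
    x = smooth(A, b, x)               # pre-smoothing
    r_c = restrict(b - A @ x)         # restricted residual
    A_c = poisson_matrix(len(r_c))    # geometric coarse-grid operator
    e_c = np.zeros_like(r_c)
    for _ in range(tau):              # tau recursive coarse solves
        e_c = tau_cycle(A_c, r_c, e_c, tau)
    x = x + prolong(e_c, n)           # coarse-grid correction
    return smooth(A, b, x)            # post-smoothing

n = 63                                # 2**6 - 1 keeps every level odd-sized
A, b, x = poisson_matrix(n), np.ones(n), np.zeros(n)
for _ in range(10):
    x = tau_cycle(A, b, x, tau=2)
print(np.linalg.norm(b - A @ x))      # residual drops by orders of magnitude
```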
- We consider the best subset selection problem in linear regression: finding a parsimonious subset of the regression variables that provides the best fit to the data according to some predefined criterion. We are primarily concerned with alternatives to cross-validation methods that do not require data partitioning and involve a range of information criteria extensively studied in the statistical literature. We show that the problem of interest can be modeled using fractional mixed-integer optimization, which can be tackled by leveraging recent advances in modern optimization solvers. The proposed algorithms involve solving a sequence of mixed-integer quadratic optimization problems (or their convexifications) and can be implemented with off-the-shelf solvers. We report encouraging results in our computational experiments with respect to both optimization and statistical performance. Summary of Contribution: This paper considers feature selection problems with information criteria. We show that by adopting a fractional optimization perspective (a well-known field in nonlinear optimization and operations research), it is possible to leverage recent advances in mixed-integer quadratic optimization technology to tackle traditional statistical problems long considered intractable. We present extensive computational experiments, with both synthetic and real data, illustrating that the new fractional optimization approach is orders of magnitude faster than existing approaches in the literature.
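As a point of reference for the objective being optimized, here is a naive brute-force sketch, not the paper's method, that selects the subset minimizing the BIC, one information criterion of the kind studied; the paper's fractional mixed-integer approach is what makes such objectives tractable when enumeration is hopeless.

```python
from itertools import combinations
import numpy as np

def best_subset_bic(X, y):
    """Exhaustively minimize BIC(S) = n*log(RSS_S / n) + |S|*log(n)."""
    n, p = X.shape
    best_bic, best_S = np.inf, ()
    for k in range(1, p + 1):
        for S in combinations(range(p), k):
            Xs = X[:, list(S)]
            beta, rss, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = rss[0] if rss.size else ((y - Xs @ beta) ** 2).sum()
            bic = n * np.log(rss / n) + k * np.log(n)
            if bic < best_bic:
                best_bic, best_S = bic, S
    return best_S

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
y = X[:, 1] - 2 * X[:, 4] + 0.1 * rng.standard_normal(100)
print(best_subset_bic(X, y))  # expected: (1, 4), the true support
```

Enumeration costs 2^p least-squares solves, which is why the mixed-integer formulation matters beyond a handful of variables.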
- We introduce harmonic persistent homology spaces for filtrations of finite simplicial complexes. As a result, we can associate concrete subspaces of cycles to each bar of the barcode of the filtration. We prove stability of the harmonic persistent homology subspaces, as well as of the subspaces associated to the bars of the barcodes, under small perturbations of the functions defining them. We relate the notion of "essential simplices," introduced in earlier work to identify simplices that play a significant role in the birth of a bar, to that of harmonic persistent homology. We prove that the harmonic representatives of simple bars maximize the "relative essential content" among all representatives of the bar, where the relative essential content is the weight a particular cycle puts on the set of essential simplices. (An extended abstract of this paper appeared in the Proceedings of the IEEE Symposium on Foundations of Computer Science, 2021.)
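A minimal sketch of the classical construction underlying harmonic representatives may help; it is our own illustration, not the authors' code, and the paper's contribution is extending this picture to filtrations. Within the space of cycles, the orthogonal complement of the boundary space is the harmonic space, so projecting a cycle away from the triangle boundaries yields the harmonic representative of its class.

```python
import numpy as np

def harmonic_representative(z, B2):
    """Orthogonally project the cycle z off im(B2), the boundary space.

    z is a vector over edges; the columns of B2 are boundaries of 2-simplices.
    The result is the unique harmonic cycle homologous to z.
    """
    coeffs, *_ = np.linalg.lstsq(B2, z, rcond=None)
    return z - B2 @ coeffs

# Square a-b-c-d with triangle abc filled in; edge order: ab, bc, cd, da, ac.
B2 = np.array([[1., 1., 0., 0., -1.]]).T  # boundary of the triangle abc
z = np.array([1., 1., 1., 1., 0.])        # perimeter cycle ab + bc + cd + da
print(harmonic_representative(z, B2))     # -> [1/3, 1/3, 1, 1, 2/3]
```

Note how the harmonic representative spreads weight continuously over the edges rather than committing to one rerouting, which is what makes it a canonical choice.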
- Topological Data Analysis (TDA) is a data mining technique to characterize the topological features of data. Persistent Homology (PH) is an important tool of TDA that has been applied to a wide range of applications. However, its time and space complexities motivate a need for new methods to compute the PH of high-dimensional data. An important, and memory-intensive, element in the computation of PH is the complex constructed from the input data. In general, PH tools use and focus on optimizing simplicial complexes; cubical complexes are also studied, though less frequently. This paper develops a method to construct polytopal complexes (that is, complexes built from any mix of convex polytopes) in ℝⁿ for any dimension n. In general, polytopal complexes are significantly smaller than simplicial or cubical complexes. The paper includes an experimental assessment of the impact that polytopal complexes have on the memory complexity and output of a PH computation.
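The size advantage is easy to see in a back-of-the-envelope experiment (ours, not the paper's): covering the convex hull of a point cloud requires a single polytope cell, whereas a simplicial decomposition of the same region needs a number of simplices that grows quickly with the ambient dimension.

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(1)
for dim in (2, 3, 4):
    pts = rng.standard_normal((50, dim))
    tri = Delaunay(pts)  # simplicial decomposition of the convex hull
    print(f"R^{dim}: 1 polytope cell vs {len(tri.simplices)} simplices")
```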