Topological Data Analysis is a growing area of data science, which aims at computing and characterizing the geometry and topology of data sets, in order to produce useful descriptors for subsequent statistical and machine learning tasks. Its main computational tool is persistent homology, which amounts to track the topological changes in growing families of subsets of the data set itself, called filtrations, and encode them in an algebraic object, called persistence module. Even though algorithms and theoretical properties of modules are now well-known in the single-parameter case, that is, when there is only one filtration to study, much less is known in the multi-parameter case, where several filtrations are given at once. Though more complicated, the resulting persistence modules are usually richer and encode more information, making them better descriptors for data science. In this article, we present the first approximation scheme, which is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence, for computing and decomposing general multi-parameter persistence modules. Our algorithm has controlled complexity and running time, and works in arbitrary dimension, i.e., with an arbitrary number of filtrations. Moreover, when restricting to specific classes of multi-parameter persistence modules, namely the ones that can be decomposed into intervals, we establish theoretical results about the approximation error between our estimate and the true module in terms of interleaving distance. Finally, we present empirical evidence validating output quality and speed-up on several data sets.
more »
« less
Multiparameter Persistence Image for Topological Machine Learning.
In the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks.
more »
« less
- Award ID(s):
- 1912194
- PAR ID:
- 10297072
- Editor(s):
- Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H.
- Date Published:
- Journal Name:
- Advances in neural information processing systems
- Volume:
- 33
- ISSN:
- 1049-5258
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Lawrence, Neil (Ed.)Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks that spans from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which furnishes an exclusive topological imprint of data by tracing the evolution of latent structures as a scale parameter changes. Present PH tools are confined to analyzing data through a single filter parameter. However, many scenarios necessitate the consideration of multiple relevant parameters to attain finer insights into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework. This framework empowers the exploration of data by simultaneously varying multiple scale parameters. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary. It seamlessly integrates established single PH summaries into multidimensional counterparts like EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent data’s multidimensional aspects as matrices and arrays, aligning effectively with diverse ML models. We provide theoretical guarantees and stability proofs for EMP summaries. We demonstrate EMP’s utility in graph classification tasks, showing its effectiveness. Results reveal EMP enhances various single PH descriptors, outperforming cutting-edge methods on multiple benchmark datasets.more » « less
-
Chatter detection has become a prominent subject of interest due to its effect on cutting tool life, surface finish and spindle of machine tool. Most of the existing methods in chatter detection literature are based on signal processing and signal decomposition. In this study, we use topological features of data simulating cutting tool vibrations, combined with four supervised machine learning algorithms to diagnose chatter in the milling process. Persistence diagrams, a method of representing topological features, are not easily used in the context of machine learning, so they must be transformed into a form that is more amenable. Specifically, we will focus on two different methods for featurizing persistence diagrams, Carlsson coordinates and template functions. In this paper, we provide classification results for simulated data from various cutting configurations, including upmilling and downmilling, in addition to the same data with some added noise. Our results show that Carlsson Coordinates and Template Functions yield accuracies as high as 96% and 95%, respectively. We also provide evidence that these topological methods are noise robust descriptors for chatter detection.more » « less
-
Reid, Nancy (Ed.)Manifold learning (ML), also known as nonlinear dimension reduction, is a set of methods to find the low-dimensional structure of data. Dimension reduction for large, high-dimensional data is not merely a way to reduce the data; the new representations and descriptors obtained by ML reveal the geometric shape of high-dimensional point clouds and allow one to visualize, denoise, and interpret them. This review presents the underlying principles of ML, its representative methods, and their statistical foundations, all from a practicing statistician's perspective. It describes the trade-offs and what theory tells us about the parameter and algorithmic choices we make in order to obtain reliable conclusions. Expected final online publication date for the Annual Review of Statistics and Its Application, Volume 11 is March 2024. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.more » « less
-
Recently a new feature representation framework based on a topological tool called persistent homology (and its persistence diagram summary) has gained much momentum. A series of methods have been developed to map a persistence diagram to a vector representation so as to facilitate the downstream use of machine learning tools. In these approaches, the importance (weight) of different persistence features are usually pre-set. However often in practice, the choice of the weight-functions hould depend on the nature of the specific data at hand. It is thus highly desirable to learn a best weight-function (and thus metric for persistence diagrams) from labelled data. We study this problem and develop a new weighted kernel, called WKPI, for persistence summaries, as well as an optimization framework to learn the weight (and thus kernel). We apply the learned kernel to the challenging task of graph classification, and show that our WKPI-based classification framework obtains similar or (sometimes significantly) better results than the best results from a range of previous graph classification frameworks on benchmark datasets.more » « less
An official website of the United States government

