skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Lower Bounding Minimal Faithful Sets of Verbose Persistence Diagrams
Recent developments in shape reconstruction and comparison call for the use of many different types of topological descriptors (persistence diagrams, Euler characteristic functions, etc.). We establish a framework that allows for quantitative comparisons of topological descriptor types and therefore may be used as a tool in more rigorously justifying choices made in applications. We then use this framework to partially order a set of six common topological descriptor types. In particular, the resulting poset gives insight into the advantages of using verbose rather than concise topological descriptors. We then provide lower bounds on the size of sets of descriptors that are complete discrete invariants of simplicial complexes, both tight and worst case. This work sets up a rigorous theory that allows for future comparisons and analysis of topological descriptor types.  more » « less
Award ID(s):
2046730 1854336
PAR ID:
10518579
Author(s) / Creator(s):
; ;
Publisher / Repository:
EuroCG
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent developments in shape reconstruction and comparison call for the use of many different (topological) descriptor types, such as persistence diagrams and Euler characteristic functions. We establish a framework to quantitatively compare the strength of different descriptor types, setting up a theory that allows for future comparisons and analysis of descriptor types and that can inform choices made in applications. We use this framework to partially order a set of six common descriptor types. We then give lower bounds on the size of sets of descriptors that uniquely correspond to simplicial complexes, giving insight into the advantages of using verbose rather than concise topological descriptors. 
    more » « less
  2. Descriptors are physically-inspired, symmetry-preserving schemes for representing atomistic systems that play a central role in the construction of models of potential energy surfaces. Although physical intuition can be flexibly encoded into descriptor schemes, they are generally ultimately guided only by the spatial or topological arrangement of atoms in the system. However, since interatomic potential models aim to capture the variation of the potential energy with respect to atomic configurations, it is conceivable that they would benefit from descriptor schemes that implicitly encode both structural and energetic information rather than structural information alone. Therefore, we propose a novel approach for the optimisation of descriptors based on encoding information about geodesic distances along potential energy manifolds into the hyperparameters of commonly used descriptor schemes. To accomplish this, we combine two ideas: (1) a differential-geometric approach for the fast estimation of approximate geodesic distances [Zhu et al., J. Chem. Phys. 150, 164103 (2019)]; and (2) an information-theoretic evaluation metric – information imbalance – for measuring the shared information between two distance measures [Glielmo et al. PNAS Nexus, 1, 1 (2022)]. Using three example molecules – ethanol, malonaldehyde, and aspirin – from the MD22 dataset, we first show that Euclidean (in Cartesian coordinates) and geodesic distances are inequivalent distance measures, indicating the need for updated ground-truth distance measures that go beyond the Euclidean (or, more broadly, spatial) distance. We then utilize a Bayesian optimisation framework to show that descriptors (in this case, atom-centred symmetry functions) can be optimized to maximally express a certain type of distance information, such as Euclidean or geodesic information. We also show that modifying the Bayesian optimisation algorithm to minimise a combined objective function – the sum of the descriptor↔Euclidean and descriptor↔geodesic information imbalances – can yield descriptors that not only optimally express both Euclidean and geodesic distance information simultaneously, but in fact resolve substantial disagreements between descriptors optimized to encode only one type of distance measure. We discuss the relevance of our approach to the design of more physically rich and informative descriptors that can encode useful, alternative information about molecular systems. 
    more » « less
  3. Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (Ed.)
    In the last decade, there has been increasing interest in topological data analysis, a new methodology for using geometric structures in data for inference and learning. A central theme in the area is the idea of persistence, which in its most basic form studies how measures of shape change as a scale parameter varies. There are now a number of frameworks that support statistics and machine learning in this context. However, in many applications there are several different parameters one might wish to vary: for example, scale and density. In contrast to the one-parameter setting, techniques for applying statistics and machine learning in the setting of multiparameter persistence are not well understood due to the lack of a concise representation of the results. We introduce a new descriptor for multiparameter persistence, which we call the Multiparameter Persistence Image, that is suitable for machine learning and statistical frameworks, is robust to perturbations in the data, has finer resolution than existing descriptors based on slicing, and can be efficiently computed on data sets of realistic size. Moreover, we demonstrate its efficacy by comparing its performance to other multiparameter descriptors on several classification tasks. 
    more » « less
  4. Lawrence, Neil (Ed.)
    Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks that spans from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which furnishes an exclusive topological imprint of data by tracing the evolution of latent structures as a scale parameter changes. Present PH tools are confined to analyzing data through a single filter parameter. However, many scenarios necessitate the consideration of multiple relevant parameters to attain finer insights into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework. This framework empowers the exploration of data by simultaneously varying multiple scale parameters. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary. It seamlessly integrates established single PH summaries into multidimensional counterparts like EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent data’s multidimensional aspects as matrices and arrays, aligning effectively with diverse ML models. We provide theoretical guarantees and stability proofs for EMP summaries. We demonstrate EMP’s utility in graph classification tasks, showing its effectiveness. Results reveal EMP enhances various single PH descriptors, outperforming cutting-edge methods on multiple benchmark datasets. 
    more » « less
  5. Chen, Tsu-Wei; Long, Stephen P (Ed.)
    Abstract Shape plays a fundamental role in biology. Traditional phenotypic analysis methods measure some features but fail to measure the information embedded in shape comprehensively. To extract, compare and analyse this information embedded in a robust and concise way, we turn to topological data analysis (TDA), specifically the Euler characteristic transform. TDA measures shape comprehensively using mathematical representations based on algebraic topology features. To study its use, we compute both traditional and topological shape descriptors to quantify the morphology of 3121 barley seeds scanned with X-ray computed tomography (CT) technology at 127 μm resolution. The Euler characteristic transform measures shape by analysing topological features of an object at thresholds across a number of directional axes. A Kruskal–Wallis analysis of the information encoded by the topological signature reveals that the Euler characteristic transform picks up successfully the shape of the crease and bottom of the seeds. Moreover, while traditional shape descriptors can cluster the seeds based on their accession, topological shape descriptors can cluster them further based on their panicle. We then successfully train a support vector machine to classify 28 different accessions of barley based exclusively on the shape of their grains. We observe that combining both traditional and topological descriptors classifies barley seeds better than using just traditional descriptors alone. This improvement suggests that TDA is thus a powerful complement to traditional morphometrics to comprehensively describe a multitude of ‘hidden’ shape nuances which are otherwise not detected. 
    more » « less