skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Learning Hyperbolic Representations of Topological Features
Learning task-specific representations of persistence diagrams is an important problem in topological data analysis and machine learning. However, current state of the art methods are restricted in terms of their expressivity as they are focused on Euclidean representations. Persistence diagrams often contain features of infinite persistence (i.e., essential features) and Euclidean spaces shrink their importance relative to non-essential features because they cannot assign infinite distance to finite points. To deal with this issue, we propose a method to learn representations of persistence diagrams on hyperbolic spaces, more specifically on the Poincare ball. By representing features of infinite persistence infinitesimally close to the boundary of the ball, their distance to non-essential features approaches infinity, thereby their relative importance is preserved. This is achieved without utilizing extremely high values for the learnable parameters, thus the representation can be fed into downstream optimization methods and trained efficiently in an end-to-end fashion. We present experimental results on graph and image classification tasks and show that the performance of our method is on par with or exceeds the performance of other state of the art methods.  more » « less
Award ID(s):
1932620
PAR ID:
10302276
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the International Conference on Learning Representations (ICLR)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, computing these distances has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, which allows for fast computation of distances. This framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. The training of this model is domain-oblivious in that it can be computed purely from synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without the need for retraining the model. These binary codes, when compared using fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and we compare the quality and performance of our approach to the state-of-the-art. In addition, we show the scalability of our approach on a dataset with 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster with the potential of less memory usage, while retaining comparable or better quality comparisons. 
    more » « less
  2. Instructional materials in organic chemistry include a wide variety of representations, such as chemical formulas, line-angle diagrams, ball-and-stick diagrams, and electrostatic potential maps (EPMs). Students tend to focus on explicit features of a representation while they are reasoning about chemical concepts. This study examined the affordances of electrostatic potential maps in students’ approaches when the maps were integrated into four foundational organic chemistry problems using an experimental design approach. First-semester organic chemistry students were surveyed from two different institutions, where they made predictions and explained their reasoning behind identifying an electrophilic site, predicting the product, selecting the faster reaction, and classifying a mechanism. Students were randomly assigned to one of four surveys that differed by the representation they were given for the prompts: chemical formula, line-angle diagram, ball-and-stick diagram, and EPM. Responses from students with EPMs were analyzed and compared to responses from students with the non-EPM representations. Results indicated that students with EPMs had higher performance depending on the problem. They were also more likely to cite electronic features such as electron density, nucleophilicity, etc., and were more likely to use causal reasoning in their explanations. This study offers evidence in support of affordances of EPMs in promoting students’ use of electronic features and causal reasoning. This evidence adds to the chemistry education literature by offering a potential means for promoting students’ use of electronic features and causal reasoning by incorporating EPMs into assessment items. Implications for instruction include using EPMs in both instruction and assessment as a tool to help students build skills around invoking electrostatics and causal reasoning to solve problems in organic chemistry. 
    more » « less
  3. Defining the similarity between chemical entities is an essential task in polymer informatics, enabling ranking, clustering, and classification. Despite its importance, the pairwise chemical similarity of polymers remains an open problem. Here, a similarity function for polymers with well-defined backbones is designed based on polymers’ stochastic graph representations generated from canonical BigSMILES, a structurally based line notation for describing macromolecules. The stochastic graph representations are separated into three parts: repeat units, end groups, and polymer topology. The earth mover’s distance is utilized to calculate the similarity of the repeat units and end groups, while the graph edit distance is used to calculate the similarity of the topology. These three values can be linearly or nonlinearly combined to yield an overall pairwise chemical similarity score for polymers that is largely consistent with the chemical intuition of expert users and is adjustable based on the relative importance of different chemical features for a given similarity problem. This method gives a reliable solution to quantitatively calculate the pairwise chemical similarity score for polymers and represents a vital step toward building search engines and quantitative design tools for polymer data. 
    more » « less
  4. Goaoc, Xavier and (Ed.)
    The space of persistence diagrams under bottleneck distance is known to have infinite doubling dimension. Because many metric search algorithms and data structures have bounds that depend on the dimension of the search space, the high-dimensionality makes it difficult to analyze and compare asymptotic running times of metric search algorithms on this space. We introduce the notion of nearly-doubling metrics, those that are Gromov-Hausdorff close to metric spaces of bounded doubling dimension and prove that bounded $$k$$-point persistence diagrams are nearly-doubling. This allows us to prove that in some ways, persistence diagrams can be expected to behave like a doubling metric space. We prove our results in great generality, studying a large class of quotient metrics (of which the persistence plane is just one example). We also prove bounds on the dimension of the $$k$$-point bottleneck space over such metrics. The notion of being nearly-doubling in this Gromov-Hausdorff sense is likely of more general interest. Some algorithms that have a dependence on the dimension can be analyzed in terms of the dimension of the nearby metric rather than that of the metric itself. We give a specific example of this phenomenon by analyzing an algorithm to compute metric nets, a useful operation on persistence diagrams. 
    more » « less
  5. Abstract Topological data analysis (TDA) methods can be useful for classification and clustering tasks in many different fields as they can provide two dimensional persistence diagrams that summarize important information about the shape of potentially complex and high dimensional data sets. The space of persistence diagrams can be endowed with various metrics, which admit a statistical structure and allow to use these summaries for machine learning algorithms, e.g. the Wasserstein distance. However, computing the distance between two persistence diagrams involves finding an optimal way to match the points of the two diagrams and may not always be an easy task for classical computers. Recently, quantum algorithms have shown the potential to speedup the process of obtaining the persistence information displayed on persistence diagrams by estimating the spectra of persistent Dirac operators. So, in this work we explore the potential of quantum computers to estimate the distance between persistence diagrams as the next step in the design of a fully quantum framework for TDA. In particular we propose variational quantum algorithms for the Wasserstein distance as well as the d p c distance. Our implementation is a weighted version of the quantum approximate optimization Algorithm that relies on control clauses to encode the constraints of the optimization problem. 
    more » « less