Title: Comparing Distance Metrics on Vectorized Persistence Summaries
The persistence diagram (PD) is an important tool in topological data analysis for encoding an abstract representation of the homology of a shape at different scales. Various vectorizations of PDs are commonly used in machine learning applications; however, distances between vectorized persistence summaries may differ greatly from the distances between the original PDs. Surprisingly, no research has been carried out in this area before. In this work we compare distances between PDs and between different commonly used vectorizations. Our results give new insights into comparing vectorized persistence summaries and can be used to design better feature-based learning models based on PDs.
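The gap between diagram distances and vector distances can be illustrated with a small sketch. It is not the authors' experimental setup: the bottleneck distance (computed here by brute force, feasible only for tiny diagrams), the Betti-curve vectorization, the sampling grid, and all function names are assumptions chosen for illustration.

```python
from itertools import permutations
from math import inf, sqrt


def _cost(p, q):
    """Matching cost; None stands for 'matched to the diagonal'."""
    if p is None and q is None:
        return 0.0
    if p is None:
        return (q[1] - q[0]) / 2.0  # L-infinity distance of q to the diagonal
    if q is None:
        return (p[1] - p[0]) / 2.0
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def bottleneck_bruteforce(a, b):
    """Bottleneck distance by exhaustive matching (tiny diagrams only)."""
    # Pad each diagram with diagonal slots so every point may go unmatched.
    left = list(a) + [None] * len(b)
    right = list(b) + [None] * len(a)
    best = inf
    for perm in permutations(range(len(right))):
        worst = max((_cost(left[i], right[j]) for i, j in enumerate(perm)),
                    default=0.0)
        best = min(best, worst)
    return best


def betti_curve(diagram, grid):
    """Number of features alive (birth <= t < death) at each grid value."""
    return [sum(1 for b, d in diagram if b <= t < d) for t in grid]


def euclidean(u, v):
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Two hypothetical diagrams that differ by one short-lived feature.
pd_a = [(0.0, 1.0), (0.2, 0.3)]
pd_b = [(0.0, 1.0)]
grid = [i / 10 for i in range(10)]  # assumed sampling grid on [0, 1)

print("bottleneck:", bottleneck_bruteforce(pd_a, pd_b))
print("Betti-curve L2:", euclidean(betti_curve(pd_a, grid),
                                   betti_curve(pd_b, grid)))
```

In this toy case the diagrams are close in bottleneck distance (about 0.05), while the Betti-curve vectors are at Euclidean distance 1.0, and that value changes with the grid resolution. Which vectorization and which metric preserve diagram distances best is the question the paper studies; the sketch only shows that the two notions can disagree.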
Award ID(s):
1664858
NSF-PAR ID:
10310977
Journal Name:
Topological Data Analysis and Beyond Workshop at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, Wyoming has developed Computer Science (CS) standards for adoption and use within K-12 classrooms. These standards, adopted in January of 2022, go into effect for the 2022-2023 school year. The University of Wyoming has offered two different week-long computer science professional developments (PDs) for teachers. Many K-12 teachers do not have a CS background, so developing CS lesson plans can be a challenge in these PDs. This research study is centered around three central questions: 1) To what extent did K-12 teachers integrate computing topics into their PD-created lesson plans; 2) How do the teacher perceptions from the two CS PDs compare to each other; and 3) How was the CS PD translated to classroom activity? The first PD opportunity (n=14) was designed to give hands-on learning with CS topics focused on cybersecurity. The second PD opportunity (n=28) focused on integrating CS into existing curricula. At the end of each of these PDs, K-12 teachers incorporated CS topics into their selected existing lesson plan(s). Additionally, a support network was implemented to support excellence in CS education throughout the state. The research team evaluated the lesson plans developed during each PD event by applying a rubric to each lesson plan. Researchers collected exit surveys from the teachers. Implementation metrics were also gathered, including how long each lesson lasted, how many students were involved in the implementation, what grades the students belonged to, the basic demographics of the students, the type of course the lesson plan was housed in, whether the K-12 teacher reached their intended purpose, what evidence the K-12 teacher had of the success of their lesson plan, data summaries based on supplied evidence, how the K-12 teachers would change the lesson, the challenges and successes they experienced, and samples of student work. Quantitative analysis consisted of basic descriptive statistics.
Findings, based on evaluation of 40+ lessons taught to over 1500 K-12 students, indicate that when assessed on a three-point rubric of struggling, emerging, or excellent, certain components (e.g., organization, objectives, integration, activities & assessment, questions, and catch) of K-12 teacher-created lesson plans varied drastically. In particular, lesson plan organization, integration, and questions each had a significant number of submissions which were evaluated as "struggling" [45%, 46%, 41%], though, interestingly, integration, objectives, activities & assessment, and catch all saw submissions which were evaluated as "excellent" [43%, 48%, 43%, 48%]. The relationship between existing K-12 policies and expectations surfaces within these results and, in combination with other findings, leads to implications for the translation of current research practices into pre-collegiate PDs.
  2. Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, computing these distances has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, which allows for fast computation of distances. This framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. The training of this model is domain-oblivious in that it can be computed purely from synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without the need for retraining the model. These binary codes, when compared using fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and we compare the quality and performance of our approach to the state-of-the-art. In addition, we show the scalability of our approach on a dataset with 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster with the potential of less memory usage, while retaining comparable or better quality comparisons. 
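Comparing binary codes with Hamming distance, as described above, reduces to an XOR and a bit count. The sketch below assumes the codes are already available as Python integers; the learned hashing model itself is not reproduced, and the example codes and function names are hypothetical.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(code_a ^ code_b).count("1")


def nearest_code(query: int, codes: list[int]) -> int:
    """Index of the stored code closest to the query in Hamming distance."""
    return min(range(len(codes)), key=lambda i: hamming_distance(query, codes[i]))


# Hypothetical 8-bit codes standing in for hashed persistence diagrams.
stored = [0b10110010, 0b01001101, 0b10011010]
query = 0b10110011

print(hamming_distance(stored[0], stored[2]))  # 2
print(nearest_code(query, stored))             # 0
```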
  3. Many data sets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a data set. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs. 
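A rough sketch of how a persistence image can be built from a diagram follows. It simplifies the construction in several assumed ways: the Gaussian surface is sampled at pixel centres rather than integrated over each pixel, the weight is linear in persistence, and the grid bounds, resolution, and bandwidth `sigma` are illustrative defaults, not values from the paper.

```python
from math import exp, pi


def persistence_image(diagram, resolution=4, lo=0.0, hi=1.0, sigma=0.1):
    """Flattened resolution x resolution image of a persistence diagram.

    Points (birth, death) are mapped to (birth, persistence) coordinates,
    weighted linearly by persistence, and smoothed with an isotropic Gaussian.
    """
    step = (hi - lo) / resolution
    centres = [lo + (k + 0.5) * step for k in range(resolution)]
    image = []
    for y in centres:      # persistence axis (rows)
        for x in centres:  # birth axis (columns)
            value = 0.0
            for birth, death in diagram:
                pers = death - birth
                weight = pers  # zero on the diagonal, grows with persistence
                gauss = exp(-((x - birth) ** 2 + (y - pers) ** 2)
                            / (2 * sigma ** 2)) / (2 * pi * sigma ** 2)
                value += weight * gauss
            image.append(value)
    return image


# Hypothetical diagram: one long-lived feature and one short-lived feature.
vector = persistence_image([(0.1, 1.0), (0.4, 0.5)])
print(len(vector))  # 16
```

The resulting fixed-length vector can be passed to standard vector-based learners; points on the diagonal contribute nothing because their weight is zero.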
  4. The Maker Partnership Program (MPP) is an NSF-supported project that addresses the critical need for models of professional development (PD) and support that help elementary-level science teachers integrate computer science and computational thinking (CS and CT) into their classroom practices. The MPP aims to foster integration of these disciplines through maker pedagogy and curriculum. The MPP was designed as a research-practice partnership that allows researchers and practitioners to collaborate and iteratively design, implement and test the PD and curriculum. This paper describes the key elements of the MPP and early findings from surveys of teachers and students participating in the program. Our research focuses on learning how to develop teachers’ capacity to integrate CS and CT into elementary-level science instruction; understanding whether and how this integrated instruction promotes deeper student learning of science, CS and CT, as well as interest and engagement in these subjects; and exploring how the model may need to be adapted to fit local contexts. Participating teachers reported gaining knowledge and confidence for implementing the maker curriculum through the PDs. They anticipated that the greatest implementation challenges would be lack of preparation time, inaccessible computer hardware, lack of administrative support, and a lack of CS knowledge. Student survey results show that most participants were interested in CS and science at the beginning of the program. Student responses to questions about their disposition toward collaboration and persistence suggest some room for growth. Student responses to questions about who does CS are consistent with prevalent gender stereotypes (e.g., boys are naturally better than girls at computer programming), particularly among boys. 
  5. Through the use of examples, we explain one way in which applied topology has evolved since the birth of persistent homology in the early 2000s. The first applications of topology to data emphasized the global shape of a dataset, such as the three-circle model for 3 × 3 pixel patches from natural images, or the configuration space of the cyclo-octane molecule, which is a sphere with a Klein bottle attached via two circles of singularity. In these studies of global shape, short persistent homology bars are disregarded as sampling noise. More recently, however, persistent homology has been used to address questions about the local geometry of data. For instance, how can local geometry be vectorized for use in machine learning problems? Persistent homology and its vectorization methods, including persistence landscapes and persistence images, provide popular techniques for incorporating both local geometry and global topology into machine learning. Our meta-hypothesis is that the short bars are as important as the long bars for many machine learning tasks. In defense of this claim, we survey applications of persistent homology to shape recognition, agent-based modeling, materials science, archaeology, and biology. Additionally, we survey work connecting persistent homology to geometric features of spaces, including curvature and fractal dimension, and various methods that have been used to incorporate persistent homology into machine learning.