Title: Comparing Distance Metrics on Vectorized Persistence Summaries
The persistence diagram (PD) is an important tool in topological data analysis for encoding an abstract representation of the homology of a shape at different scales. Different vectorizations of PD summaries are commonly used in machine learning applications; however, distances between vectorized persistence summaries may differ greatly from the distances between the original PDs. Surprisingly, no research has been carried out in this area before. In this work we compare distances between PDs and between different commonly used vectorizations. Our results give new insights into comparing vectorized persistence summaries and can be used to design better feature-based learning models based on PDs.
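The mismatch described in the abstract can be illustrated on toy data. The following is a minimal sketch, not the authors' experimental setup: it assumes the bottleneck distance as the PD metric and a Betti curve compared with the Euclidean norm as the vectorization, neither of which is named in the abstract. The function names (`bottleneck`, `betti_curve`) and the three toy diagrams are hypothetical, and the brute-force matching is only practical for very small diagrams.

```python
import itertools
import math

def pair_cost(p, q):
    """L-infinity cost of matching p to q; None stands for the diagonal."""
    if p is None and q is None:
        return 0.0
    if p is None:
        return (q[1] - q[0]) / 2  # q is matched to the diagonal
    if q is None:
        return (p[1] - p[0]) / 2  # p is matched to the diagonal
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def bottleneck(dgm_a, dgm_b):
    """Brute-force bottleneck distance (assumed metric, tiny diagrams only)."""
    # Pad each diagram with diagonal slots so every point may go unmatched.
    left = list(dgm_a) + [None] * len(dgm_b)
    right = list(dgm_b) + [None] * len(dgm_a)
    return min(
        max((pair_cost(p, q) for p, q in zip(left, perm)), default=0.0)
        for perm in itertools.permutations(right)
    )

def betti_curve(dgm, grid):
    """Number of intervals [birth, death) alive at each grid value."""
    return [sum(1 for b, d in dgm if b <= t < d) for t in grid]

# Hypothetical toy diagrams with integer coordinates.
A = [(0, 10)]
B = [(0, 10), (4, 6)]
C = [(0, 10), (4, 6), (4, 6), (4, 6)]
grid = range(11)

bott_ab, bott_ac = bottleneck(A, B), bottleneck(A, C)
vec_ab = math.dist(betti_curve(A, grid), betti_curve(B, grid))
vec_ac = math.dist(betti_curve(A, grid), betti_curve(C, grid))

print("bottleneck:", bott_ab, bott_ac)
print("Betti-curve Euclidean:", vec_ab, vec_ac)
```

In this toy case the bottleneck distance from A is the same for B and C, because every short-lived point is sent to the diagonal at equal cost, while the Betti-curve distance grows with the number of such points. Other metrics or vectorizations could behave differently.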
Journal Name:
Topological Data Analysis and Beyond Workshop at the 34th Conference on Neural Information Processing Systems (NeurIPS 2020)
Sponsoring Org:
National Science Foundation
More Like this
  1. Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, computing these distances has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, which allows for fast computation of distances. This framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. The training of this model is domain-oblivious in that it can be computed purely from synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without the need for retraining the model. These binary codes, when compared using fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and we compare the quality and performance of our approach to the state-of-the-art. In addition, we show the scalability of our approach on a dataset with 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster with the potential of less memory usage, while retaining comparable or better quality comparisons. 
  2. Many data sets can be viewed as a noisy sampling of an underlying space, and tools from topological data analysis can characterize this structure for the purpose of knowledge discovery. One such tool is persistent homology, which provides a multiscale description of the homological features within a data set. A useful representation of this homological information is a persistence diagram (PD). Efforts have been made to map PDs into spaces with additional structure valuable to machine learning tasks. We convert a PD to a finite dimensional vector representation which we call a persistence image (PI), and prove the stability of this transformation with respect to small perturbations in the inputs. The discriminatory power of PIs is compared against existing methods, showing significant performance gains. We explore the use of PIs with vector-based machine learning tools, such as linear sparse support vector machines, which identify features containing discriminating topological information. Finally, high accuracy inference of parameter values from the dynamic output of a discrete dynamical system (the linked twist map) and a partial differential equation (the anisotropic Kuramoto-Sivashinsky equation) provide a novel application of the discriminatory power of PIs. 
  3. In recent years, Wyoming has developed Computer Science (CS) standards for adoption and use within K-12 classrooms. These standards, adopted in January of 2022, go into effect for the 2022-2023 school year. The University of Wyoming has offered two different week-long computer science professional development (PD) programs for teachers. Many K-12 teachers do not have a CS background, so developing CS lesson plans can be a challenge in these PDs. This research study is centered around three central questions: 1) To what extent did K-12 teachers integrate computing topics into their PD-created lesson plans; 2) How do the teacher perceptions from the two CS PDs compare to each other; and 3) How was the CS PD translated to classroom activity? The first PD opportunity (n=14) was designed to give hands-on learning with CS topics focused on cybersecurity. The second PD opportunity (n=28) focused on integrating CS into existing curricula. At the end of each of these PDs, K-12 teachers incorporated CS topics into their selected existing lesson plan(s). Additionally, a support network was implemented to support excellence in CS education throughout the state. The research team evaluated the lesson plans developed during each PD event by applying a rubric to each lesson plan. Researchers collected exit surveys from the teachers. Implementation metrics were also gathered, including how long each lesson lasted, how many students were involved in the implementation, what grades the students belonged to, the basic demographics of the students, the type of course the lesson plan was housed in, whether the K-12 teacher reached their intended purpose, what evidence the K-12 teacher had of the success of their lesson plan, data summaries based on supplied evidence, how the K-12 teachers would change the lesson, the challenges and successes they experienced, and samples of student work. Quantitative analysis consisted of basic descriptive statistics. 
     Findings, based on evaluation of 40+ lessons taught to over 1500 K-12 students, indicate that when assessed on a three-point rubric of struggling, emerging, or excellent, certain components (e.g., organization, objectives, integration, activities & assessment, questions, and catch) of K-12 teacher-created lesson plans varied drastically. In particular, lesson plan organization, integration, and questions each had a significant number of submissions which were evaluated as "struggling" [45%, 46%, 41%], though, interestingly, integration, objectives, activities & assessment, and catch all saw submissions which were evaluated as "excellent" [43%, 48%, 43%, 48%]. The relationship between existing K-12 policies and expectations surfaces within these results and, in combination with other findings, leads to implications for the translation of current research practices into pre-collegiate PDs. 
  4. Aims. With the accumulation of polarization data in the gamma-ray burst (GRB) prompt phase, polarization models can be tested. Methods. We predicted the time-integrated polarizations of 37 GRBs with polarization observation. We used their observed spectral parameters to do this. In the model, the emission mechanism is synchrotron radiation, and the magnetic field configuration in the emission region was assumed to be large-scale ordered. Therefore, the predicted polarization degrees (PDs) are upper limits. Results. For most GRBs detected by the Gamma-ray Burst Polarimeter (GAP), POLAR, and AstroSat, the predicted PD can match the corresponding observed PD. Hence the synchrotron-emission model in a large-scale ordered magnetic field can interpret both the moderately low PDs (∼10%) detected by POLAR and relatively high PDs (∼45%) observed by GAP and AstroSat well. Therefore, the magnetic fields in these GRB prompt phases or at least during the peak times are dominated by the ordered component. However, the predicted PDs of GRB 110721A observed by GAP and GRB 180427A observed by AstroSat are both lower than the observed values. Because the synchrotron emission in an ordered magnetic field predicts the upper-limit of the PD for the synchrotron-emission models, PD observations of the two bursts challenge the synchrotron-emission model. Then we predict the PDs of the High-energy Polarimetry Detector (HPD) and Low-energy Polarimetry Detector (LPD) on board the upcoming POLAR-2. In the synchrotron-emission models, the concentrated PD values of the GRBs detected by HPD will be higher than the LPD, which might be different from the predictions of the dissipative photosphere model. Therefore, more accurate multiband polarization observations are highly desired to test models of the GRB prompt phase. 
  5. The impacts of COVID-19 have led to a rapid pivot in the delivery of professional development (PD) for new teachers to [PROGRAM]. [PROGRAM] previously provided a week-long, in-person, intensive PD in the summer for teachers but PD was shifted online to a mixture of synchronous and asynchronous sessions during the summer of 2020. The goal of this work in progress is to present how the [PROGRAM] team adapted teacher PD to establish community among our teachers and between teachers and staff, use this connection to enhance our responsiveness in PD, and deliver the engaging content of the [PROGRAM] curriculum. Teachers engaging remotely in [PROGRAM] activities have led to productive adaptations based on their challenges. The lessons learned reflecting back upon the PD will inform the design, delivery, and content of future [PROGRAM] teacher PDs. It is expected that future PD and professional learning offerings will continue to utilize flexible modalities and novel online tools, while also working to better align to PD standards. 