Current complex prediction models are the result of fitting deep neural networks, graph convolutional networks or transducers to a set of training data. A key challenge with these models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into a simplified topological view of the prediction landscape. The result is a map of the predictions that enables inspection of the model results with more specificity than dimensionalityreduction methods such as tSNE and UMAP. The methods scale up to large datasets across different domains. We present a case study of a transformerbased model previously designed to predict expression levels of a piece of DNA in thousands of genomic tracks. When the model is used to study mutations in the
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Abstract BRCA1 gene, our topological analysis shows that it is sensitive to the location of a mutation and the exon structure ofBRCA1 in ways that cannot be found with tools based on dimensionality reduction. Moreover, the topological framework offers multiple ways to inspect results, including an error estimate that is more accurate than model uncertainty. Further studies show how these ideas produce useful results in graphbased learning and image classification.Free, publiclyaccessible full text available November 17, 2024 
Abstract The notion of generalized rank in the context of multiparameter persistence has become an important ingredient for defining interesting homological structures such as generalized persistence diagrams. However, its efficient computation has not yet been studied in the literature. We show that the generalized rank over a finite interval
I of a indexed persistence module$$\textbf{Z}^2$$ ${Z}^{2}$M is equal to the generalized rank of the zigzag module that is induced on a certain path inI tracing mostly its boundary. Hence, we can compute the generalized rank ofM overI by computing the barcode of the zigzag module obtained by restricting to that path. IfM is the homology of a bifiltrationF of simplices (while accounting for multicriticality) and$$t$$ $t$I consists of points, this computation takes$$t$$ $t$ time where$$O(t^\omega )$$ $O\left({t}^{\omega}\right)$ is the exponent of matrix multiplication. We apply this result to obtain an improved algorithm for the following problem. Given a bifiltration inducing a module$$\omega \in [2,2.373)$$ $\omega \in [2,2.373)$M , determine whetherM is interval decomposable and, if so, compute all intervals supporting its indecomposable summands. 
Chambers, Erin W. ; Gudmundsson, Joachim (Ed.)We first introduce the notion of metarank for a 2parameter persistence module, an invariant that captures the information behind images of morphisms between 1D slices of the module. We then define the metadiagram of a 2parameter persistence module to be the Möbius inversion of the metarank, resulting in a function that takes values from signed 1parameter persistence modules. We show that the metarank and metadiagram contain information equivalent to the rank invariant and the signed barcode. This equivalence leads to computational benefits, as we introduce an algorithm for computing the metarank and metadiagram of a 2parameter module M indexed by a bifiltration of n simplices in O(n³) time. This implies an improvement upon the existing algorithm for computing the signed barcode, which has O(n⁴) time complexity. This also allows us to improve the existing upper bound on the number of rectangles in the rank decomposition of M from O(n⁴) to O(n³). In addition, we define notions of erosion distance between metaranks and between metadiagrams, and show that under these distances, metaranks and metadiagrams are stable with respect to the interleaving distance. Lastly, the metadiagram can be visualized in an intuitive fashion as a persistence diagram of diagrams, which generalizes the wellunderstood persistence diagram in the 1parameter setting.more » « lessFree, publiclyaccessible full text available June 9, 2024

Abstract Background Interpretation of highthroughput gene expression data continues to require mathematical tools in data analysis that recognizes the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from the predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. This representatives of the entire data facilitates better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
Results The topology relevant curated data that we obtain provides an improvement in shallow learning as well as deep learning based supervised classifications. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures are able to comprehend gene expression levels and classify cohorts accordingly.
Conclusions In this work, we engender representative persistent cycles to discern the gene expression data. These cycles allow us to directly procure genes entailed in similar processes.

Tomography is a widely used tool for analyzing microstructures in three dimensions (3D). The analysis, however, faces difficulty because the constituent materials produce similar greyscale values. Sometimes, this prompts the image segmentation process to assign a pixel/voxel to the wrong phase (active material or pore). Consequently, errors are introduced in the microstructure characteristics calculation. In this work, we develop a filtering algorithm called PerSplat based on topological persistence (a technique used in topological data analysis) to improve segmentation quality. One problem faced when evaluating filtering algorithms is that real image data in general are not equipped with the `ground truth' for the microstructure characteristics. For this study, we construct synthetic images for which the groundtruth values are known. On the synthetic images, we compare the pore tortuosity and Minkowski functionals (volume and surface area) computed with our PerSplat filter and other methods such as total variation (TV) and nonlocal means (NLmeans). Moreover, on a real 3D image, we visually compare the segmentation results provided by our filter against TV and NLmeans. The experimental results indicate that PerSplat provides a significant improvement in segmentation quality.more » « less

Persistent cycles, especially the minimal ones, are useful geometric features functioning as augmentations for the intervals in the purely topological persistence diagrams (also termed as barcodes). In our earlier work, we showed that computing minimal 1dimensional persistent cycles (persistent 1cycles) for finite intervals is NPhard while the same for infinite intervals is polynomially tractable. In this paper, we address this problem for general dimensions with Z2 coefficients. In addition to proving that it is NPhard to compute minimal persistent dcycles (d>1) for both types of intervals given arbitrary simplicial complexes, we identify two interesting cases which are polynomially tractable. These two cases assume the complex to be a certain generalization of manifolds which we term as weak pseudomanifolds. For finite intervals from the dth persistence diagram of a weak (d+1)pseudomanifold, we utilize the fact that persistent cycles of such intervals are nullhomologous and reduce the problem to a minimal cut problem. Since the same problem for infinite intervals is NPhard, we further assume the weak (d+1)pseudomanifold to be embedded in R^{d+1}Rd+1 so that the complex has a natural dual graph structure and the problem reduces to a minimal cut problem. Experiments with both algorithms on scientific data indicate that the minimal persistent cycles capture various significant features of the data.more » « less

Computation of the interleaving distance between persistence modules is a central task in topological data analysis. For $1$D persistence modules, thanks to the isometry theorem, this can be done by computing the bottleneck distance with known efficient algorithms. The question is open for most $n$D persistence modules, $n>1$, because of the well recognized complications of the indecomposables. Here, we consider a reasonably complicated class called {\em $2$D interval decomposable} modules whose indecomposables may have a description of nonconstant complexity. We present a polynomial time algorithm to compute the bottleneck distance for these modules from indecomposables, which bounds the interleaving distance from above, and give another algorithm to compute a new distance called {\em dimension distance} that bounds it from below.more » « less

Recovering hidden graphlike structures from potentially noisy data is a fundamental task in modern data analysis. Recently, a persistenceguided discrete Morsebased framework to extract a geometric graph from lowdimensional data has become popular. However, to date, there is very limited theoretical understanding of this framework in terms of graph reconstruction. This paper makes a first step towards closing this gap. Specifically, first, leveraging existing theoretical understanding of persistenceguided discrete Morse cancellation, we provide a simplified version of the existing discrete Morsebased graph reconstruction algorithm. We then introduce a simple and natural noise model and show that the aforementioned framework can correctly reconstruct a graph under this noise model, in the sense that it has the same loop structure as the hidden groundtruth graph, and is also geometrically close. We also provide some experimental results for our simplified graphreconstruction algorithm.more » « less