skip to main content

Title: Persistent spectral graph

Persistent homology is constrained to purely topological persistence, while multiscale graphs account only for geometric information. This work introduces persistent spectral theory to create a unified low‐dimensional multiscale paradigm for revealing topological persistence and extracting geometric shapes from high‐dimensional datasets. For a point‐cloud dataset, a filtration procedure is used to generate a sequence of chain complexes and associated families of simplicial complexes and chains, from which we construct persistent combinatorial Laplacian matrices. We show that a full set of topological persistence can be completely recovered from the harmonic persistent spectra, that is, the spectra that have zero eigenvalues, of the persistent combinatorial Laplacian matrices. However, non‐harmonic spectra of the Laplacian matrices induced by the filtration offer another powerful tool for data analysis, modeling, and prediction. In this work, fullerene stability is predicted by using both harmonic spectra and non‐harmonic persistent spectra, while the latter spectra are successfully devised to analyze the structure of fullerenes and model protein flexibility, which cannot be straightforwardly extracted from the current persistent homology. The proposed method is found to provide excellent predictions of the protein B‐factors for which current popular biophysical models break down.

more » « less
Award ID(s):
1900473 1761320 1721024
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
International Journal for Numerical Methods in Biomedical Engineering
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    One-dimensional persistent homology is arguably the most important and heavily used computational tool in topological data analysis. Additional information can be extracted from datasets by studying multi-dimensional persistence modules and by utilizing cohomological ideas, e.g. the cohomological cup product. In this work, given a single parameter filtration, we investigate a certain 2-dimensional persistence module structure associated with persistent cohomology, where one parameter is the cup-length$$\ell \ge 0$$0and the other is the filtration parameter. This new persistence structure, called thepersistent cup module, is induced by the cohomological cup product and adapted to the persistence setting. Furthermore, we show that this persistence structure is stable. By fixing the cup-length parameter$$\ell $$, we obtain a 1-dimensional persistence module, called the persistent$$\ell $$-cup module, and again show it is stable in the interleaving distance sense, and study their associated generalized persistence diagrams. In addition, we consider a generalized notion of apersistent invariant, which extends both therank invariant(also referred to aspersistent Betti number), Puuska’s rank invariant induced by epi-mono-preserving invariants of abelian categories, and the recently-definedpersistent cup-length invariant, and we establish their stability. This generalized notion of persistent invariant also enables us to lift the Lyusternik-Schnirelmann (LS) category of topological spaces to a novel stable persistent invariant of filtrations, called thepersistent LS-category invariant.

    more » « less
  2. null (Ed.)
    Graphs model real-world circumstances in many applications where they may constantly change to capture the dynamic behavior of the phenomena. Topological persistence which provides a set of birth and death pairs for the topological features is one instrument for analyzing such changing graph data. However, standard persistent homology defined over a growing space cannot always capture such a dynamic process unless shrinking with deletions is also allowed. Hence, zigzag persistence which incorporates both insertions and deletions of simplices is more appropriate in such a setting. Unlike standard persistence which admits nearly linear-time algorithms for graphs, such results for the zigzag version improving the general O(m^ω) time complexity are not known, where ω < 2.37286 is the matrix multiplication exponent. In this paper, we propose algorithms for zigzag persistence on graphs which run in near-linear time. Specifically, given a filtration with m additions and deletions on a graph with n vertices and edges, the algorithm for 0-dimension runs in O(mlog² n+mlog m) time and the algorithm for 1-dimension runs in O(mlog⁴ n) time. The algorithm for 0-dimension draws upon another algorithm designed originally for pairing critical points of Morse functions on 2-manifolds. The algorithm for 1-dimension pairs a negative edge with the earliest positive edge so that a 1-cycle containing both edges resides in all intermediate graphs. Both algorithms achieve the claimed time complexity via dynamic graph data structures proposed by Holm et al. In the end, using Alexander duality, we extend the algorithm for 0-dimension to compute the (p-1)-dimensional zigzag persistence for ℝ^p-embedded complexes in O(mlog² n+mlog m+nlog n) time. 
    more » « less
  3. Abstract

    Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code.

    Significance Statement

    Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones. Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way—unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.

    more » « less
  4. Persistent cycles, especially the minimal ones, are useful geometric features functioning as augmentations for the intervals in the purely topological persistence diagrams (also termed as barcodes). In our earlier work, we showed that computing minimal 1-dimensional persistent cycles (persistent 1-cycles) for finite intervals is NP-hard while the same for infinite intervals is polynomially tractable. In this paper, we address this problem for general dimensions with Z2 coefficients. In addition to proving that it is NP-hard to compute minimal persistent d-cycles (d>1) for both types of intervals given arbitrary simplicial complexes, we identify two interesting cases which are polynomially tractable. These two cases assume the complex to be a certain generalization of manifolds which we term as weak pseudomanifolds. For finite intervals from the d-th persistence diagram of a weak (d+1)-pseudomanifold, we utilize the fact that persistent cycles of such intervals are null-homologous and reduce the problem to a minimal cut problem. Since the same problem for infinite intervals is NP-hard, we further assume the weak (d+1)-pseudomanifold to be embedded in R^{d+1}Rd+1 so that the complex has a natural dual graph structure and the problem reduces to a minimal cut problem. Experiments with both algorithms on scientific data indicate that the minimal persistent cycles capture various significant features of the data. 
    more » « less
  5. Abstract

    Protein‐ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, and gene expression. Accurate prediction of protein‐ligand binding affinities is vital to rational drug design and the understanding of protein‐ligand binding and binding induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data. Topology provides the ultimate level of abstraction and thus incurs too much reduction in geometric information. Persistent homology embeds geometric information into topological invariants and bridges the gap between complex geometry and abstract topology. However, it oversimplifies biological information. This work introduces element specific persistent homology (ESPH) or multicomponent persistent homology to retain crucial biological information during topological simplification. The combination of ESPH and machine learning gives rise to a powerful paradigm for macromolecular analysis. Tests on 2 large data sets indicate that the proposed topology‐based machine‐learning paradigm outperforms other existing methods in protein‐ligand binding affinity predictions. ESPH reveals protein‐ligand binding mechanism that can not be attained from other conventional techniques. The present approach reveals that protein‐ligand hydrophobic interactions are extended to 40Å  away from the binding site, which has a significant ramification to drug and protein design.

    more » « less