Title: Knot data analysis using multiscale Gauss link integral
In the past decade, topological data analysis has emerged as a powerful algebraic topology approach in data science. Although knot theory and related subjects are a focus of study in mathematics, their success in practical applications is quite limited due to the lack of localization and quantization. We address these challenges by introducing knot data analysis (KDA), a paradigm that incorporates curve segmentation and multiscale analysis into the Gauss link integral. The resulting multiscale Gauss link integral (mGLI) recovers the global topological properties of knots and links at an appropriate scale and offers a multiscale geometric topology approach to capture the local structures and connectivities in data. By integration with machine learning or deep learning, the proposed mGLI significantly outperforms other state-of-the-art methods across various benchmark problems in 13 intricately complex biological datasets, including protein flexibility analysis, protein–ligand interactions, human Ether-à-go-go-Related Gene potassium channel blockade screening, and quantitative toxicity assessment. Our KDA opens a research area—knot deep learning—in data science.
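For orientation, the classical Gauss link integral that the abstract builds on can be estimated numerically for two closed polygonal curves. The sketch below is only an illustration of that classical integral, Lk = (1/4π) ∮∮ (r1 − r2) · (dr1 × dr2) / |r1 − r2|³, using a midpoint rule. It is not the authors' mGLI: the curve segmentation and scale parameters described in the abstract are not modeled here, and the function names (`gauss_linking_integral`, `sample_closed_curve`) are hypothetical helpers introduced for this example.

```python
import math


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Euclidean dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def sample_closed_curve(point_at, n=200):
    """Sample n vertices of a closed curve from a function of t in [0, 2*pi)."""
    return [point_at(2.0 * math.pi * k / n) for k in range(n)]


def gauss_linking_integral(curve_a, curve_b):
    """Midpoint-rule estimate of the Gauss link integral.

    Each curve is a list of (x, y, z) vertices; the last vertex is joined
    back to the first, so both curves are treated as closed polygons.
    The two curves are assumed not to touch each other.
    """
    total = 0.0
    na, nb = len(curve_a), len(curve_b)
    for i in range(na):
        p0, p1 = curve_a[i], curve_a[(i + 1) % na]
        mid_a = tuple((u + v) / 2.0 for u, v in zip(p0, p1))
        da = _sub(p1, p0)  # segment vector, stands in for dr1
        for j in range(nb):
            q0, q1 = curve_b[j], curve_b[(j + 1) % nb]
            mid_b = tuple((u + v) / 2.0 for u, v in zip(q0, q1))
            db = _sub(q1, q0)  # segment vector, stands in for dr2
            r = _sub(mid_a, mid_b)
            dist = math.sqrt(_dot(r, r))
            total += _dot(r, _cross(da, db)) / dist ** 3
    return total / (4.0 * math.pi)


# Two unit circles forming a Hopf link, and a far-away copy that is unlinked.
circle_xy = sample_closed_curve(lambda t: (math.cos(t), math.sin(t), 0.0))
circle_xz = sample_closed_curve(lambda t: (1.0 + math.cos(t), 0.0, math.sin(t)))
circle_far = sample_closed_curve(lambda t: (5.0 + math.cos(t), 0.0, math.sin(t)))

linked_value = gauss_linking_integral(circle_xy, circle_xz)
unlinked_value = gauss_linking_integral(circle_xy, circle_far)
```

For the linked pair the estimate should be close to 1 in absolute value (the sign depends on the orientations chosen), and for the separated pair it should be close to 0. Applying the same sum to open curve segments instead of closed curves gives real-valued, non-integer quantities, which is the kind of local measurement the abstract's segmentation step alludes to.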
Award ID(s):
1900473
PAR ID:
10566697
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
National Academy of Sciences
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
121
Issue:
42
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this article, we define Vassiliev measures of complexity for open curves in 3-space. These are related to the coefficients of the enhanced Jones polynomial of open curves in 3-space. These Vassiliev measures are continuous functions of the curve coordinates; as the ends of the curve tend to coincide, they converge to the corresponding Vassiliev invariants of the resulting knot. We focus on the second Vassiliev measure from the enhanced Jones polynomial for closed and open curves in 3-space. For closed curves, this second Vassiliev measure can be computed by a Gauss code diagram and it has an integral formulation, the double alternating self-linking integral. The double alternating self-linking integral is a topological invariant of closed curves and a continuous function of the curve coordinates for open curves in 3-space. For polygonal curves, the double alternating self-linking integral obtains a simpler expression in terms of geometric probabilities. 
  2. Abstract Khovanov homology has been the subject of much study in knot theory and low dimensional topology since 2000. This work introduces a Khovanov Laplacian and a Khovanov Dirac to study knot and link diagrams. The harmonic spectrum of the Khovanov Laplacian or the Khovanov Dirac retains the topological invariants of Khovanov homology, while their non-harmonic spectra reveal additional information that is distinct from Khovanov homology. 
  3. Abstract Neurodegenerative diseases, like Alzheimer’s, are associated with the presence of neurofibrillary lesions formed by tau protein filaments in the cerebral cortex. While it is known that different morphologies of tau filaments characterize different neurodegenerative diseases, there are few metrics of global and local structure complexity that make it possible to quantify their structural diversity rigorously. In this manuscript, we employ for the first time mathematical topology and geometry to classify neurodegenerative diseases by using cryo-electron microscopy structures of tau filaments that are available in the Protein Data Bank. By employing mathematical topology metrics (Gauss linking integral, writhe and second Vassiliev measure), we achieve a consistent but more refined classification of tauopathies than what was previously observed through visual inspection. Our results reveal a hierarchy of classification from global to local topology and geometry characteristics. In particular, we find that tauopathies can be classified with respect to the handedness of their global conformations and the handedness of the relative orientations of their repeats. Progressive supranuclear palsy is identified as an outlier, with a more complex structure than the rest, reflected by a small but observable knotoid structure (a diagrammatic structure representing non-trivial topology). This topological characteristic can be attributed to a pattern at the beginning of the R3 repeat that is present in all tauopathies but to different extents. Moreover, by comparing single filament to paired filament structures within tauopathies, we find a consistent change in the side-chain orientations with respect to the alpha carbon atoms at the area of interaction.
  4. Abstract The escalating drug addiction crisis in the United States underscores the urgent need for innovative therapeutic strategies. This study embarked on an innovative and rigorous strategy to unearth potential drug repurposing candidates for opioid and cocaine addiction treatment, bridging the gap between transcriptomic data analysis and drug discovery. We initiated our approach by conducting differential gene expression analysis on addiction-related transcriptomic data to identify key genes. We propose a novel topological differentiation to identify key genes from a protein–protein interaction network derived from DEGs. This method utilizes persistent Laplacians to accurately single out pivotal nodes within the network, conducting this analysis in a multiscale manner to ensure high reliability. Through rigorous literature validation, pathway analysis and data-availability scrutiny, we identified three pivotal molecular targets, mTOR, mGluR5 and NMDAR, for drug repurposing from DrugBank. We crafted machine learning models employing two natural language processing (NLP)-based embeddings and a traditional 2D fingerprint, which demonstrated robust predictive ability in gauging binding affinities of DrugBank compounds to selected targets. Furthermore, we elucidated the interactions of promising drugs with the targets and evaluated their drug-likeness. This study delineates a multi-faceted and comprehensive analytical framework, amalgamating bioinformatics, topological data analysis and machine learning, for drug repurposing in addiction treatment, setting the stage for subsequent experimental validation. The versatility of the methods we developed allows for applications across a range of diseases and transcriptomic datasets. 
  5. Abstract Protein‐ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, and gene expression. Accurate prediction of protein‐ligand binding affinities is vital to rational drug design and the understanding of protein‐ligand binding and binding induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data. Topology provides the ultimate level of abstraction and thus incurs too much reduction in geometric information. Persistent homology embeds geometric information into topological invariants and bridges the gap between complex geometry and abstract topology. However, it oversimplifies biological information. This work introduces element specific persistent homology (ESPH) or multicomponent persistent homology to retain crucial biological information during topological simplification. The combination of ESPH and machine learning gives rise to a powerful paradigm for macromolecular analysis. Tests on 2 large data sets indicate that the proposed topology‐based machine‐learning paradigm outperforms other existing methods in protein‐ligand binding affinity predictions. ESPH reveals protein‐ligand binding mechanisms that cannot be attained with other conventional techniques. The present approach reveals that protein‐ligand hydrophobic interactions extend to 40 Å away from the binding site, which has significant ramifications for drug and protein design.