skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The decomposition of the higher-order homology embedding constructed from the k-Laplacian
The null space of the k-th order Laplacian Lk, known as the {\em k-th homology vector space}, encodes the non-trivial topology of a manifold or a network. Understanding the structure of the homology embedding can thus disclose geometric or topological information from the data. The study of the null space embedding of the graph Laplacian L0 has spurred new research and applications, such as spectral clustering algorithms with theoretical guarantees and estimators of the Stochastic Block Model. In this work, we investigate the geometry of the k-th homology embedding and focus on cases reminiscent of spectral clustering. Namely, we analyze the {\em connected sum} of manifolds as a perturbation to the direct sum of their homology embeddings. We propose an algorithm to factorize the homology embedding into subspaces corresponding to a manifold's simplest topological components. The proposed framework is applied to the {\em shortest homologous loop detection} problem, a problem known to be NP-hard in general. Our spectral loop detection algorithm scales better than existing methods and is effective on diverse data such as point clouds and images.  more » « less
Award ID(s):
2015272 1810975
PAR ID:
10347221
Author(s) / Creator(s):
;
Editor(s):
M. Ranzato; A. Beygelzimer; Y. Dauphin; P.S. Liang; J. Wortman Vaughan
Date Published:
Journal Name:
Advances in neural information processing systems
Volume:
34
ISSN:
1049-5258
Page Range / eLocation ID:
15695--15709
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bojańczyk, Mikołaj; Merelli, Emanuela; Woodruff, David P (Ed.)
    Given n points in 𝓁_p^d, we consider the problem of partitioning points into k clusters with associated centers. The cost of a clustering is the sum of p-th powers of distances of points to their cluster centers. For p ∈ [1,2], we design sketches of size poly(log(nd),k,1/ε) such that the cost of the optimal clustering can be estimated to within factor 1+ε, despite the fact that the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space poly(log(nd),k,1/ε). We also obtain a distributed memory algorithm, where the n points are arbitrarily partitioned amongst m machines, each of which sends information to a central party who then computes an approximation of the clustering cost. Prior to this work, no such streaming or distributed-memory algorithm was known with sublinear dependence on d for p ∈ [1,2). 
    more » « less
  2. Accurate detection of protein sequence homology is essential for understanding evolutionary relationships and predicting protein functions, particularly for detecting remote homology in the “twilight zone” (20-35% sequence similarity), where traditional sequence alignment methods often fail. Recent studies show that embeddings from protein language models (pLM) can improve remote homology detection over traditional methods. Alignment-based approaches, such as those combining pLMs with dynamic programming alignment, further improve performance but often suffer from noise in the resulting similarity matrices. To address this, we evaluate a newly developed embedding-based sequence alignment approach that refines residue-level embedding similarity using K-means clustering and double dynamic programming (DDP). We show that the incorporation of clustering and DDP contributes substantially to the improved performance in detecting remote homology. Experimental results demonstrate that our approach outperforms both traditional and state-of-the-art approaches based on pLMs on several benchmarks. Our study illustrates embedding-based alignment refined with clustering and DDP offers a powerful approach for identifying remote homology, with potential to evolve further as pLMs continue to advance. 
    more » « less
  3. Abstract A spectral minimal partition of a manifold is its decomposition into disjoint open sets that minimizes a spectral energy functional. It is known that bipartite spectral minimal partitions coincide with nodal partitions of Courant‐sharp Laplacian eigenfunctions. However, almost all minimal partitions are non‐bipartite. To study those, we define a modified Laplacian operator and prove that the nodal partitions of its Courant‐sharp eigenfunctions are minimal within a certain topological class of partitions. This yields new results in the non‐bipartite case and recovers the above known result in the bipartite case. Our approach is based on tools from algebraic topology, which we illustrate by a number of examples where the topological types of partitions are characterized by relative homology. 
    more » « less
  4. Clustering continues to be an important tool for data engineering and analysis. While advances in deep learning tend to be at the forefront of machine learning, it is only useful for the supervised classification of data sets. Clustering is an essential tool for problems where labeling data sets is either too labor intensive or where there is no agreed upon ground truth. The well studied k-means problem partitions groups of similar vectors into k clusters by iteratively updating the cluster assignment such that it minimizes the within cluster sum of squares metric. Unfortunately k-means can become prohibitive for very large high dimensional data sets as iterative methods often rely on random access to, or multiple passes over, the data set — a requirement that is not often possible for large and potentially unbounded data sets. In this work we explore an randomized, approximate method for clustering called Tree-Walk Random Projection Clustering (TWRP) that is a fast, memory efficient method for finding cluster embedding in high dimensional spaces. TWRP combines random projection with a tree based partitioner to achieve a clustering method that forgoes storing the exhaustive representation of all vectors in the data space and instead performs a bounded search over the implied cluster bifurcation tree represented as approximate vector and count values. The TWRP algorithm is described and experimentally evaluated for scalability and accuracy in the presence of noise against several other well-known algorithms. 
    more » « less
  5. null (Ed.)
    Abstract In this paper, we introduce and study representation homology of topological spaces, which is a natural homological extension of representation varieties of fundamental groups. We give an elementary construction of representation homology parallel to the Loday–Pirashvili construction of higher Hochschild homology; in fact, we establish a direct geometric relation between the two theories by proving that the representation homology of the suspension of a (pointed connected) space is isomorphic to its higher Hochschild homology. We also construct some natural maps and spectral sequences relating representation homology to other homology theories associated with spaces (such as Pontryagin algebras, $${{\mathbb{S}}}^1$$-equivariant homology of the free loop space, and stable homology of automorphism groups of f.g. free groups). We compute representation homology explicitly (in terms of known invariants) in a number of interesting cases, including spheres, suspensions, complex projective spaces, Riemann surfaces, and some 3-dimensional manifolds, such as link complements in $${\mathbb{R}}^3$$ and the lens spaces $ L(p,q) $. In the case of link complements, we identify the representation homology in terms of ordinary Hochschild homology, which gives a new algebraic invariant of links in $${\mathbb{R}}^3$$. 
    more » « less