Abstract Shape is data and data is shape. Biologists are accustomed to thinking about how the shape of biomolecules, cells, tissues, and organisms arises from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features—connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub‐disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data are shape and shape is data, will be relevant as biology transitions into a data‐driven era where the meaningful interpretation of large data sets is a limiting factor.
                    This content will become publicly available on November 28, 2025
                            
                            TopoLoop: A new tool for chromatin loop detection in live cells via single-particle tracking
                        
                    
    
            We present a novel method for identifying topological features of chromatin domains in live cells using single-particle tracking and topological data analysis (TDA). By applying TDA to particle trajectories, we can effectively detect complex spatial patterns, such as loops, that are often missed by traditional time series analysis. Using simulations of polymer bead–spring chains, we have validated the accuracy of our method and determined its limitations for detecting loops. Our approach offers a promising avenue for exploring the topological complexity of chromatin in living cells using TDA techniques. 
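As a rough illustration of what "detecting a loop with TDA" means, the sketch below computes persistent homology in dimensions 0 and 1 for a small point cloud, using a Vietoris-Rips filtration and the standard column reduction over Z/2. It is not the TopoLoop method from the paper, whose pipeline is not described in this abstract; the function name `rips_persistence`, the `max_edge` parameter, the eight-point circle standing in for a looped trajectory, and the 0.5 persistence threshold are all assumptions made for illustration.

```python
import math
from itertools import combinations


def rips_persistence(points, max_edge=math.inf):
    """Return finite (birth, death, dim) bars for H0 and H1 of a Rips filtration.

    Hypothetical helper for illustration; only practical for small point clouds.
    """
    n = len(points)
    d = {(i, j): math.dist(points[i], points[j])
         for i, j in combinations(range(n), 2)}

    # Each simplex is (filtration value, dimension, sorted vertex tuple).
    simplices = [(0.0, 0, (i,)) for i in range(n)]
    simplices += [(v, 1, e) for e, v in d.items() if v <= max_edge]
    for i, j, k in combinations(range(n), 3):
        v = max(d[i, j], d[i, k], d[j, k])
        if v <= max_edge:
            simplices.append((v, 2, (i, j, k)))

    # Sorting by (value, dimension, vertices) places every face before its cofaces.
    simplices.sort()
    index = {verts: pos for pos, (_, _, verts) in enumerate(simplices)}

    reduced = {}  # pivot row -> reduced boundary column (set of row indices)
    bars = []
    for value, dim, verts in simplices:
        if dim == 0:
            continue
        # Boundary over Z/2: the set of codimension-1 faces.
        column = {index[face] for face in combinations(verts, dim)}
        while column and max(column) in reduced:
            column ^= reduced[max(column)]  # add the earlier column mod 2
        if column:
            low = max(column)
            reduced[low] = column
            # Simplex `low` created a feature that this simplex destroys.
            bars.append((simplices[low][0], value, dim - 1))
    return bars


# Toy stand-in for a looped trajectory: eight points evenly spaced on a unit circle.
circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]
loops = [(birth, death) for birth, death, dim in rips_persistence(circle)
         if dim == 1 and death - birth > 0.5]
print(loops)
```

A long-lived dimension-1 bar is the topological signature of a loop; short bars are usually treated as noise. Real trajectories would need a faster library implementation, since this brute-force version enumerates every triangle.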
        
    
- Award ID(s): 1751339
- PAR ID: 10582526
- Publisher / Repository: AIP Publishing
- Date Published:
- Journal Name: The Journal of Chemical Physics
- Volume: 161
- Issue: 20
- ISSN: 0021-9606
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            An emerging method for data analysis is called Topological Data Analysis (TDA). TDA is based in the mathematical field of topology and examines the properties of spaces under continuous deformation. One of the key tools used for TDA is persistent homology, which considers the connectivity of points in a d-dimensional point cloud at different spatial resolutions to identify topological properties (holes, loops, and voids) in the space. Persistent homology then classifies the topological features by their persistence through the range of spatial connectivity. Unfortunately, the memory and run-time complexity of computing persistent homology is exponential, and current tools can only process a few thousand points in R^3. Fortunately, the use of data reduction techniques enables persistent homology to be applied to much larger point clouds. Techniques to reduce the data range from random sampling of points to clustering the data and using the cluster centroids as the reduced data. While several data reduction approaches appear to preserve the large topological features present in the original point cloud, no systematic study comparing the efficacy of different data clustering techniques in preserving the persistent homology results has been performed. This paper explores the question of topology preserving data reductions and describes formally when and how topological features can be mischaracterized or lost by data reduction techniques. The paper also performs an experimental assessment of data reduction techniques and resilient effects on the persistent homology. In particular, data reduction by random selection is compared to cluster centroids extracted from different data clustering algorithms.
- 
            3D genomics methods such as Hi-C and Micro-C have uncovered chromatin loops across the genome and linked these loops to gene regulation. However, these methods only measure 3D interaction probabilities on a relative scale. Here, we overcome this limitation by using live imaging data to calibrate Micro-C in mouse embryonic stem cells, thus obtaining absolute looping probabilities for 36,804 chromatin loops across the genome. We find that the looped state is generally rare, with a mean probability of 2.3% and a maximum of 26% across the quantified loops. On average, CTCF-CTCF loops are stronger than loops between cis-regulatory elements (3.2% vs. 1.1%). Our findings can be extended to human stem cells and differentiated cells under certain assumptions. Overall, we establish an approach for genome-wide absolute loop quantification and report that loops generally occur with low probabilities, generalizing recent live imaging results to the whole genome.
- 
            Transient DNA loops occur throughout the genome due to thermal fluctuations of DNA and the function of SMC complex proteins such as condensin and cohesin. Transient crosslinking within and between chromosomes and loop extrusion by SMCs have profound effects on high-order chromatin organization and exhibit specificity in cell type, cell cycle stage, and cellular environment. SMC complexes anchor one end to DNA with the other extending some distance and retracting to form a loop. How cells regulate loop sizes and how loops distribute along chromatin are emerging questions. To understand loop size regulation, we employed bead–spring polymer chain models of chromatin and the activity of an SMC complex on chromatin. Our study shows that (1) the stiffness of the chromatin polymer chain, (2) the tensile stiffness of chromatin crosslinking complexes such as condensin, and (3) the strength of the internal or external tethering of chromatin chains cooperatively dictate the loop size distribution and compaction volume of induced chromatin domains. When strong DNA tethers are invoked, loop size distributions are tuned by condensin stiffness. When DNA tethers are released, loop size distributions are tuned by chromatin stiffness. In this three-way interaction, the presence and strength of tethering unexpectedly dictates chromatin conformation within a topological domain.
- 
            Topological data analysis (TDA) combines concepts from algebraic topology, machine learning, statistics, and data science which allow us to study data in terms of their latent shape properties. Despite the use of TDA in a broad range of applications, from neuroscience to power systems to finance, the utility of TDA in Earth science applications is yet untapped. The current study aims to offer a new approach for analyzing multi-resolution Earth science datasets using the concept of data shape and associated intrinsic topological data characteristics. In particular, we develop a new topological approach to quantitatively compare two maps of geophysical variables at different spatial resolutions. We illustrate the proposed methodology by applying TDA to aerosol optical depth (AOD) datasets from the Goddard Earth Observing System, Version 5 (GEOS-5) model over the Middle East. Our results show that, contrary to the existing approaches, TDA allows for systematic and reliable comparison of spatial patterns from different observational and model datasets without regridding the datasets into common grids.
