Abstract Shape is data and data is shape. Biologists are accustomed to thinking about how the shape of biomolecules, cells, tissues, and organisms arises from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features: connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub‐disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data is shape and shape is data, will be relevant as biology transitions into a data‐driven era where the meaningful interpretation of large data sets is a limiting factor.
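As a rough illustration of monitoring how one kind of topological feature evolves, the sketch below counts connected components of a small point cloud as a distance threshold grows. It is a minimal example under stated assumptions, not a method taken from the review: the function name `component_counts`, the toy points, and the thresholds are all hypothetical, and loops and voids (which need a full persistent homology computation) are not tracked.

```python
from itertools import combinations
from math import dist


def component_counts(points, thresholds):
    """For each threshold t, count connected components of the graph
    that joins every pair of points at Euclidean distance <= t."""
    counts = []
    for t in thresholds:
        # Fresh union-find forest for each threshold.
        parent = list(range(len(points)))

        def find(i):
            # Follow parent links to the root, compressing the path.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i, j in combinations(range(len(points)), 2):
            if dist(points[i], points[j]) <= t:
                parent[find(i)] = find(j)  # merge the two components

        counts.append(len({find(i) for i in range(len(points))}))
    return counts


# Hypothetical toy data: a tight group of three points and a distant pair.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
thresholds = [0.5, 1.0, 1.5, 8.0]
signature = component_counts(points, thresholds)
print(signature)  # [5, 2, 2, 1]
```

The resulting list is a very small topological signature: five isolated points merge into two clusters, which persist over a range of thresholds before joining into one component.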
Measuring hidden phenotype: quantifying the shape of barley seeds using the Euler characteristic transform
Abstract Shape plays a fundamental role in biology. Traditional phenotypic analysis methods measure some features but fail to measure the information embedded in shape comprehensively. To extract, compare and analyse this embedded information in a robust and concise way, we turn to topological data analysis (TDA), specifically the Euler characteristic transform. TDA measures shape comprehensively using mathematical representations based on features from algebraic topology. To study its use, we compute both traditional and topological shape descriptors to quantify the morphology of 3121 barley seeds scanned with X-ray computed tomography (CT) technology at 127 μm resolution. The Euler characteristic transform measures shape by analysing the topological features of an object at successive thresholds along a number of directional axes. A Kruskal–Wallis analysis of the information encoded by the topological signature reveals that the Euler characteristic transform successfully captures the shape of the crease and bottom of the seeds. Moreover, while traditional shape descriptors can cluster the seeds based on their accession, topological shape descriptors can cluster them further based on their panicle. We then successfully train a support vector machine to classify 28 different accessions of barley based exclusively on the shape of their grains. We observe that combining traditional and topological descriptors classifies barley seeds better than using traditional descriptors alone. This improvement suggests that TDA is a powerful complement to traditional morphometrics, comprehensively describing a multitude of ‘hidden’ shape nuances that are otherwise not detected.
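The idea of recording topological features at thresholds along several directions can be sketched on a toy shape. The example below is only an illustration under stated assumptions, not the authors' pipeline for CT scans: the function name `euler_characteristic_transform`, the triangle, the two directions, and the thresholds are hypothetical, and the shape is given as a small simplicial complex (tuples of vertex indices). For each direction, a simplex enters the filtration once its highest vertex along that direction is at or below the threshold, and the Euler characteristic is the alternating sum of simplex counts by dimension.

```python
import numpy as np


def euler_characteristic_transform(vertices, simplices, directions, thresholds):
    """Return an array of shape (n_directions, n_thresholds) whose entry
    [d, k] is the Euler characteristic of the sub-complex lying at or
    below thresholds[k] along directions[d]."""
    vertices = np.asarray(vertices, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    ect = np.zeros((len(directions), len(thresholds)), dtype=int)
    for d, direction in enumerate(directions):
        # Height of every vertex along this direction.
        heights = vertices @ np.asarray(direction, dtype=float)
        for simplex in simplices:
            # A simplex appears once all of its vertices have appeared.
            entry = max(heights[v] for v in simplex)
            # Vertices count +1, edges -1, triangles +1, and so on.
            sign = (-1) ** (len(simplex) - 1)
            ect[d] += sign * (thresholds >= entry)
    return ect


# Hypothetical toy shape: the boundary of a triangle (a single loop).
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
directions = [(1.0, 0.0), (0.0, 1.0)]
thresholds = [-0.5, 0.0, 0.5, 1.0]

ect = euler_characteristic_transform(vertices, hollow, directions, thresholds)
print(ect)
```

Each row is one Euler characteristic curve. Flattening the rows into a single vector gives a topological shape descriptor that could be passed to a standard classifier; the last column equals the Euler characteristic of the whole shape, which is 0 for this loop and would be 1 if the triangle were filled in.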
- PAR ID: 10358277
- Editor(s): Chen, Tsu-Wei; Long, Stephen P
- Date Published:
- Journal Name: in silico Plants
- Volume: 4
- Issue: 1
- ISSN: 2517-5025
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Topological Data Analysis (TDA) uses concepts from topology to analyze data. In general, TDA considers objects similar when they share a topological invariant. Topological invariants are properties of a topological space that are preserved under homeomorphism, that is, resilient to continuous deformation of the space. The Euler-Poincaré characteristic is a classic topological invariant: the alternating sum of the numbers of vertices, edges, faces, and higher-order cells of a closed surface. Tracking the Euler characteristic over a topological filtration produces an Euler Characteristic Curve (ECC). This study introduces a computational technique to determine the ECC of ℝ² or ℝ³ data; the technique generalizes to higher dimensions. This technique separates landscapes of lower-order homologies utilizing triangulations of the space.
- Recent developments in shape reconstruction and comparison call for the use of many different types of topological descriptors (persistence diagrams, Euler characteristic functions, etc.). We establish a framework that allows for quantitative comparisons of topological descriptor types and therefore may be used as a tool in more rigorously justifying choices made in applications. We then use this framework to partially order a set of six common topological descriptor types. In particular, the resulting poset gives insight into the advantages of using verbose rather than concise topological descriptors. We then provide lower bounds on the size of sets of descriptors that are complete discrete invariants of simplicial complexes, both tight and worst case. This work sets up a rigorous theory that allows for future comparisons and analysis of topological descriptor types.
- Recent developments in shape reconstruction and comparison call for the use of many different (topological) descriptor types, such as persistence diagrams and Euler characteristic functions. We establish a framework to quantitatively compare the strength of different descriptor types, setting up a theory that allows for future comparisons and analysis of descriptor types and that can inform choices made in applications. We use this framework to partially order a set of six common descriptor types. We then give lower bounds on the size of sets of descriptors that uniquely correspond to simplicial complexes, giving insight into the advantages of using verbose rather than concise topological descriptors.
- Topological data analysis (TDA) has proven to be a potent approach for extracting intricate topological structures from complex and high-dimensional data. In this paper, we propose a TDA-based processing pipeline for analyzing multi-channel scalp EEG data. The pipeline starts by extracting both frequency and temporal information from the signals via the Hilbert–Huang Transform. The sequences of instantaneous frequency and instantaneous amplitude across all electrode channels are treated as approximations of curves in a high-dimensional space. TDA features, which represent the local topological structure of the curves, are then extracted and used in the classification models. Three sets of scalp EEG data, one collected in a lab and two from Brain–Computer Interface (BCI) competitions, were used to validate the proposed methods and to compare them with other state-of-the-art TDA methods. The proposed TDA-based approach shows superior performance and outperforms the winner of the BCI competition. Besides BCI, the proposed method can also be applied to spatial and temporal data in other domains such as computer vision, remote sensing, and medical imaging.

