Shape is data and data is shape. Biologists are accustomed to thinking about how the shape of biomolecules, cells, tissues, and organisms arises from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features: connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub-disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data are shape and shape is data, will be relevant as biology transitions into a data-driven era where the meaningful interpretation of large data …
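To make the idea of monitoring topological features concrete, here is a minimal sketch that tracks only connected components as a distance threshold grows. It is an illustration under stated assumptions, not the method of the review: Euclidean distance stands in for the similarity function, the function name `h0_barcode` and the toy points are hypothetical, and loops and voids are not computed.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return (birth, death) bars for connected components.

    Assumption: similarity is plain Euclidean distance. Every point is
    born as its own component at threshold 0; a component dies at the
    distance where it merges into another one.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge at threshold d
            bars.append((0.0, d))  # one of them dies here
    bars.append((0.0, math.inf))   # the last component never dies
    return bars


# Hypothetical toy data: a tight group of three points and a distant pair.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 0.0), (11.0, 0.0)]
bars = h0_barcode(points)
for birth, death in bars:
    print(f"component: born {birth}, dies {death}")
```

The one long finite bar in the output reflects the gap between the two groups of points; short bars correspond to merges within a group. Dedicated TDA libraries would be needed to follow loops and voids as well.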
- Editors: Chen, Tsu-Wei; Long, Stephen P
- Publication Date:
- NSF-PAR ID: 10358277
- Journal Name: in silico Plants
- Volume: 4
- Issue: 1
- ISSN: 2517-5025
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract This paper introduces the use of topological data analysis (TDA) as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries. Using apatite chemistry as a template, we use persistent homology to track the topological connectivity of the input crystal chemistry descriptors that define similarity between different apatite stoichiometries. We show that TDA automatically identifies a hierarchical classification scheme within apatites, based on the number of discrete coordination polyhedra that make up the structural building units shared among the compounds. This information is presented as a barcode of homology classifications, a visualization scheme that tracks the persistence of similarity between compounds. Unlike traditional structure maps, this new “Materials Barcode” schema serves as an automated exploratory machine learning tool that can uncover structural associations from crystal chemistry databases and provide more nuanced insight into what defines similarity among homologous compounds.
-
Abstract Background Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. These representatives of the entire data facilitate better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
Results The topology-relevant curated data that we obtain provide an improvement in both shallow learning and deep learning based supervised classification. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures are able to …
Conclusions In this work, we engender representative persistent cycles to discern the gene expression data. These cycles allow us to directly procure genes entailed in similar processes.
-
Abstract Surface texture influences wear and tribological properties of manufactured parts, and it plays a critical role in end-user products. Therefore, quantifying the order or structure of a manufactured surface provides important information on the quality and life expectancy of the product. Although texture can be intentionally introduced to enhance aesthetics or to satisfy a design function, sometimes it is an inevitable byproduct of surface treatment processes such as Piezo Vibration Striking Treatment (PVST). Measures of order for surfaces have been characterized using statistical, spectral, and geometric approaches. For nearly hexagonal lattices, topological tools have also been used to measure the surface order. This paper explores utilizing tools from Topological Data Analysis for measuring surface texture. We compute measures of order based on optical digital microscope images of surfaces treated using PVST. These measures are applied to the grid obtained from estimating the centers of tool impacts, and they quantify the grid’s deviations from the nominal one. Our results show that TDA provides a convenient framework for characterization of pattern type that bypasses some limitations of existing tools such as difficult manual processing of the data and the need for an expert user to analyze and interpret the surface images.
-
Traditionally, engineering culture has limited rather than fostered diversity in engineering. To address this persistent issue, we examine how diverse students identify with engineering and navigate the culture of engineering. We define diversity not by making a priori categorizations according to traditional demographic information (e.g., race, gender, sexual orientation), but instead by investigating the variation in students’ attitudinal profiles on a host of affective measures. Using these measures, we identify large, “normative” groups of engineers as well as “non-normative” students who emerge as having distinct attitudinal profiles. This mixed methods study investigates the intersectionality of engineering students' personal identities to understand: How do non-normative groups in engineering form an engineering identity and navigate a culture dominated by limited diversity? The focus of this paper is on the first phase of this project, in which students' identities, motivation, psychological traits, perceived supports and barriers to engineering, and other background information are being quantitatively assessed. Pilot survey data were collected from participants enrolled in second semester first-year engineering programs across three institutions (n=374). We used topological data analysis (TDA) to create normative and non-normative attitudinal profiles of respondents. As a relatively new and powerful set of analytic methods, TDA …