Title: CryoETGAN: Cryo-Electron Tomography Image Synthesis via Unpaired Image Translation
Cryo-electron tomography (Cryo-ET) has been regarded as a revolution in structural biology and can reveal molecular sociology. Its unprecedented quality enables it to visualize cellular organelles and macromolecular complexes at nanometer resolution with native conformations. Motivated by developments in nanotechnology and machine learning, establishing machine learning approaches such as classification, detection and averaging for Cryo-ET image analysis has inspired broad interest. Yet, deep learning-based methods for biomedical imaging typically require large labeled datasets for good results, which can be a great challenge due to the expense of obtaining and labeling training data. To deal with this problem, we propose a generative model to simulate Cryo-ET images efficiently and reliably: CryoETGAN. This cycle-consistent and Wasserstein generative adversarial network (GAN) is able to generate images with an appearance similar to the original experimental data. Quantitative and visual grading results on generated images are provided to show that the results of our proposed method achieve better performance compared to the previous state-of-the-art simulation methods. Moreover, CryoETGAN is stable to train and capable of generating plausibly diverse image samples.
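The abstract names two ingredients, cycle consistency and a Wasserstein adversarial objective. The sketch below shows how such losses are commonly combined for unpaired translation between two image domains. It is a minimal NumPy illustration, not the authors' implementation: the toy generators `G` and `F`, the toy `critic`, the choice of a simulated and an experimental domain, and the cycle weight of 10.0 are all assumptions made for the example.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """Mean L1 error after a round trip through both generators."""
    forward = np.mean(np.abs(F(G(x)) - x))   # X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))  # Y -> X -> Y
    return forward + backward

def wasserstein_critic_loss(critic, real, fake):
    """The critic minimizes E[critic(fake)] - E[critic(real)]."""
    return np.mean(critic(fake)) - np.mean(critic(real))

def wasserstein_generator_loss(critic, fake):
    """The generator minimizes -E[critic(fake)]."""
    return -np.mean(critic(fake))

# Toy stand-ins (hypothetical): two batches of 8x8 "images".
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))           # assumed simulated-domain batch
y = rng.normal(loc=1.0, size=(4, 8, 8))  # assumed experimental-domain batch

G = lambda img: 2.0 * img + 1.0          # toy generator X -> Y
F = lambda img: (img - 1.0) / 2.0        # toy generator Y -> X (inverse of G)
critic = lambda img: img.mean(axis=(1, 2))  # toy critic: one score per image

cyc = cycle_consistency_loss(x, y, G, F)
d_loss = wasserstein_critic_loss(critic, real=y, fake=G(x))
g_loss = wasserstein_generator_loss(critic, fake=G(x))

LAMBDA_CYC = 10.0  # assumed weight, not taken from the paper
total_g = g_loss + LAMBDA_CYC * cyc
```

Because the toy `F` is the exact inverse of `G`, the cycle term is numerically zero here. In a real model both generators and the critic would be neural networks, and the critic would additionally need a Lipschitz constraint such as weight clipping or a gradient penalty.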
Award ID(s):
2007595 1949629
Journal Name:
Frontiers in Physiology
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background Cryo-electron tomography is an important and powerful technique to explore the structure, abundance, and location of ultrastructure in a near-native state. It contains detailed information of all macromolecular complexes in a sample cell. However, due to the compact and crowded status, the missing wedge effect, and low signal-to-noise ratio (SNR), it is extremely challenging to recover such information with existing image processing methods. Cryo-electron tomogram simulation is an effective solution to test and optimize the performance of the above image processing methods. The simulated images could be regarded as the labeled data which covers a wide range of macromolecular complexes and ultrastructure. To approximate the crowded cellular environment, it is very important to pack these heterogeneous structures as tightly as possible. Besides, simulating non-deformable and deformable components under a unified framework also needs to be achieved. Result In this paper, we proposed a unified framework for simulating crowded cryo-electron tomogram images including non-deformable macromolecular complexes and deformable ultrastructures. A macromolecule was approximated using multiple balls with fixed relative positions to reduce the vacuum volume. An ultrastructure, such as a membrane or filament, was approximated using multiple balls with flexible relative positions so that this structure could deform under a force field. In the experiment, 400 macromolecules of 20 representative types were packed into simulated cytoplasm by our framework, and numerical verification proved that our method has a smaller volume and higher compression ratio than the baseline single-ball model. We also packed filaments, membranes and macromolecules together, to obtain a simulated cryo-electron tomogram image with deformable structures. The simulated results are closer to the real Cryo-ET, making the analysis more difficult. 
The DOG particle picking method and the image segmentation method are tested on our simulation data, and the experimental results show that these methods still have much room for improvement. Conclusion The proposed multi-ball model can achieve more crowded packaging results and contains richer elements with different properties to obtain more realistic cryo-electron tomogram simulation. This enables users to simulate cryo-electron tomogram images with non-deformable macromolecular complexes and deformable ultrastructures under a unified framework. To illustrate the advantages of our framework in improving the compression ratio, we calculated the volume of simulated macromolecules under our multi-ball method and the traditional single-ball method. We also performed the packing experiment of filaments and membranes to demonstrate the simulation ability of deformable structures. Our method can be used to benchmark existing image processing methods by generating large labeled cryo-ET datasets. Since the content of the simulated cryo-ET is more complex and crowded compared with previous ones, it will pose a greater challenge to existing image processing methods.
  2. Abstract Background Cryo-EM data generated by electron tomography (ET) contains images for individual protein particles in different orientations and tilted angles. Individual cryo-EM particles can be aligned to reconstruct a 3D density map of a protein structure. However, low contrast and high noise in particle images make it challenging to build 3D density maps at intermediate to high resolution (1–3 Å). To overcome this problem, we propose a fully automated cryo-EM 3D density map reconstruction approach based on deep learning particle picking. Results A perfect 2D particle mask is fully automatically generated for every single particle. Then, it uses a computer vision image alignment algorithm (image registration) to fully automatically align the particle masks. It calculates the difference of the particle image orientation angles to align the original particle image. Finally, it reconstructs a localized 3D density map between every two single-particle images that have the largest number of corresponding features. The localized 3D density maps are then averaged to reconstruct a final 3D density map. The constructed 3D density map results illustrate the potential to determine the structures of the molecules using a few samples of good particles. Also, using the localized particle samples (with no background) to generate the localized 3D density maps can improve the process of the resolution evaluation in experimental maps of cryo-EM. Tested on two widely used datasets, Auto3DCryoMap is able to reconstruct good 3D density maps using only a few thousand protein particle images, which is much smaller than hundreds of thousands of particles required by the existing methods. Conclusions We design a fully automated approach for cryo-EM 3D density maps reconstruction (Auto3DCryoMap). Instead of increasing the signal-to-noise ratio by using 2D class averaging, our approach uses 2D particle masks to produce locally aligned particle images. 
Auto3DCryoMap is able to accurately align structural particle shapes. Also, it is able to construct a decent 3D density map from only a few thousand aligned particle images while the existing tools require hundreds of thousands of particle images. Finally, by using the pre-processed particle images, Auto3DCryoMap reconstructs a better 3D density map than using the original particle images.
  3. Abstract

    Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely, persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the sugar, fish, flower, and gravel dataset produced for the study of mesoscale organization of clouds by Rasp et al. We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and we explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that readers of this paper will leave with a better understanding of TDA and persistent homology, will be able to identify problems and datasets of their own for which persistent homology could be helpful, and will gain an understanding of the results they obtain from applying the included GitHub example code.

    Significance Statement

    Information such as the geometric structure and texture of image data can greatly support the inference of the physical state of an observed Earth system, for example, in remote sensing to determine whether wildfires are active or to identify local climate zones. Persistent homology is a branch of topological data analysis that allows one to extract such information in an interpretable way—unlike black-box methods like deep neural networks. The purpose of this paper is to explain in an intuitive manner what persistent homology is and how researchers in environmental science can use it to create interpretable models. We demonstrate the approach to identify certain cloud patterns from satellite imagery and find that the resulting model is indeed interpretable.

  4. Abstract Motivation

    Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that enables the visualization of subcellular structures in situ at near-atomic resolution. Cellular cryo-ET images help in resolving the structures of macromolecules and determining their spatial relationship in a single cell, which has broad significance in cell and structural biology. Subtomogram classification and recognition constitute a primary step in the systematic recovery of these macromolecular structures. Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification, but suffer from limited applicability due to scarcity of annotated data. While generating simulated data for training supervised models is a potential solution, a sizeable difference in the image intensity distribution in generated data as compared with real experimental data will cause the trained models to perform poorly in predicting classes on real subtomograms.


    In this work, we present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification. We use unsupervised multi-adversarial domain adaptation to reduce the domain shift between features of simulated and experimental data. We develop a network-driven domain randomization procedure with ‘warp’ modules to alter the simulated data and help the classifier generalize better on experimental data. We do not use any labeled experimental data to train our model, whereas some of the existing alternative approaches require labeled experimental samples for cross-domain classification. Nevertheless, Cryo-Shift outperforms the existing alternative approaches in cross-domain subtomogram classification in extensive evaluation studies demonstrated herein using both simulated and experimental data.

    Availability and implementation

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  5. Introduction: Vaso-occlusive crises (VOCs) are a leading cause of morbidity and early mortality in individuals with sickle cell disease (SCD). These crises are triggered by sickle red blood cell (sRBC) aggregation in blood vessels and are influenced by factors such as enhanced sRBC and white blood cell (WBC) adhesion to inflamed endothelium. Advances in microfluidic biomarker assays (i.e., SCD Biochip systems) have led to clinical studies of blood cell adhesion onto endothelial proteins, including fibronectin, laminin, P-selectin, and ICAM-1, functionalized in microchannels. These microfluidic assays allow mimicking the physiological aspects of human microvasculature and help characterize biomechanical properties of adhered sRBCs under flow. However, analysis of the microfluidic biomarker assay data has so far relied on manual cell counting and exhaustive visual morphological characterization of cells by trained personnel. Integrating deep learning algorithms with microscopic imaging of adhesion protein functionalized microfluidic channels can accelerate and standardize accurate classification of blood cells in microfluidic biomarker assays. Here we present a deep learning approach as a general-purpose analytical tool covering a wide range of conditions: channels functionalized with different proteins (laminin or P-selectin), with varying degrees of adhesion by both sRBCs and WBCs, and in both normoxic and hypoxic environments. Methods: Our neural networks were trained on a repository of manually labeled SCD Biochip microfluidic biomarker assay whole channel images. Each channel contained adhered cells pertaining to clinical whole blood under constant shear stress of 0.1 Pa, mimicking physiological levels in post-capillary venules. 
The machine learning (ML) framework consists of two phases: Phase I segments pixels belonging to blood cells adhered to the microfluidic channel surface, while Phase II associates pixel clusters with specific cell types (sRBCs or WBCs). Phase I is implemented through an ensemble of seven generative fully convolutional neural networks, and Phase II is an ensemble of five neural networks based on a Resnet50 backbone. Each pixel cluster is given a probability of belonging to one of three classes: adhered sRBC, adhered WBC, or non-adhered / other. Results and Discussion: We applied our trained ML framework to 107 novel whole channel images not used during training and compared the results against counts from human experts. As seen in Fig. 1A, there was excellent agreement in counts across all protein and cell types investigated: sRBCs adhered to laminin, sRBCs adhered to P-selectin, and WBCs adhered to P-selectin. Not only was the approach able to handle surfaces functionalized with different proteins, but it also performed well for high cell density images (up to 5000 cells per image) in both normoxic and hypoxic conditions (Fig. 1B). The average uncertainty for the ML counts, obtained from accuracy metrics on the test dataset, was 3%. This uncertainty is a significant improvement on the 20% average uncertainty of the human counts, estimated from the variance in repeated manual analyses of the images. Moreover, manual classification of each image may take up to 2 hours, versus about 6 minutes per image for the ML analysis. Thus, ML provides greater consistency in the classification at a fraction of the processing time. To assess which features the network used to distinguish adhered cells, we generated class activation maps (Fig. 1C-E). These heat maps indicate the regions of focus for the algorithm in making each classification decision. 
Intriguingly, the highlighted features were similar to those used by human experts: the dimple in partially sickled RBCs, the sharp endpoints for highly sickled RBCs, and the uniform curvature of the WBCs. Overall, the robust performance of the ML approach in our study sets the stage for generalizing it to other endothelial proteins and experimental conditions, a first step toward a universal microfluidic ML framework targeting blood disorders. Such a framework would not only be able to integrate advanced biophysical characterization into fast, point-of-care diagnostic devices, but also provide a standardized and reliable way of monitoring patients undergoing targeted therapies and curative interventions, including stem cell and gene-based therapies for SCD. Disclosures Gurkan: Dx Now Inc.: Patents & Royalties; Xatek Inc.: Patents & Royalties; BioChip Labs: Patents & Royalties; Hemex Health, Inc.: Consultancy, Current Employment, Patents & Royalties, Research Funding.
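Item 5 above describes a Phase II ensemble of five networks that assigns each pixel cluster a probability over three classes. The abstract does not say how the member outputs are combined, so the sketch below assumes the simplest common choice, averaging the members' softmax probabilities. The function names and the hand-made logits are hypothetical and only illustrate the idea.

```python
import numpy as np

CLASSES = ("adhered sRBC", "adhered WBC", "non-adhered / other")

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ensemble_predict(member_logits):
    """Average per-member class probabilities (assumed combination rule).

    member_logits has shape (n_members, n_clusters, n_classes).
    Returns the mean probabilities and the winning class index per cluster.
    """
    probs = softmax(np.asarray(member_logits, dtype=float))
    mean_probs = probs.mean(axis=0)
    labels = mean_probs.argmax(axis=-1)
    return mean_probs, labels

# Hand-made logits for 5 members, 3 pixel clusters, 3 classes.
logits = np.zeros((5, 3, 3))
logits[:, 0, 0] = 4.0   # every member favors class 0 for cluster 0
logits[:, 1, 1] = 4.0   # every member favors class 1 for cluster 1
logits[:3, 2, 2] = 4.0  # three members favor class 2 for cluster 2
logits[3:, 2, 0] = 4.0  # two members favor class 0 for cluster 2

mean_probs, labels = ensemble_predict(logits)
for cluster, label in enumerate(labels):
    print(cluster, CLASSES[label], mean_probs[cluster].round(3))
```

Averaging probabilities rather than taking a hard majority vote keeps a per-class confidence for each cluster, which is useful when members disagree, as they do for the third cluster here.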