<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Auto3DCryoMap: an automated particle alignment approach for 3D cryo-EM density map reconstruction</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>12/01/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10244177</idno>
					<idno type="doi">10.1186/s12859-020-03885-9</idno>
					<title level='j'>BMC Bioinformatics</title>
<idno>1471-2105</idno>
<biblScope unit="volume">21</biblScope>
<biblScope unit="issue">S21</biblScope>					

					<author>Adil Al-Azzawi</author><author>Anes Ouadou</author><author>Ye Duan</author><author>Jianlin Cheng</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Abstract. Background: Cryo-EM data generated by electron tomography (ET) contains images of individual protein particles in different orientations and at different tilt angles. Individual cryo-EM particles can be aligned to reconstruct a 3D density map of a protein structure. However, low contrast and high noise in particle images make it challenging to build 3D density maps at intermediate to high resolution (1–3Å). To overcome this problem, we propose a fully automated cryo-EM 3D density map reconstruction approach based on deep learning particle picking. Results: A perfect 2D particle mask is generated fully automatically for every single particle. The approach then uses a computer vision image alignment algorithm (image registration) to align the particle masks fully automatically. It calculates the difference between the particle image orientation angles to align the original particle image. Finally, it reconstructs a localized 3D density map between every two single-particle images that have the largest number of corresponding features. The localized 3D density maps are then averaged to reconstruct a final 3D density map. The reconstructed 3D density maps illustrate the potential to determine the structures of molecules using a few samples of good particles. Also, using the localized particle samples (with no background) to generate the localized 3D density maps can improve the process of resolution evaluation in experimental cryo-EM maps. Tested on two widely used datasets, Auto3DCryoMap is able to reconstruct good 3D density maps using only a few thousand protein particle images, far fewer than the hundreds of thousands of particles required by existing methods. Conclusions: We design a fully automated approach for cryo-EM 3D density map reconstruction (Auto3DCryoMap).
Instead of increasing the signal-to-noise ratio by using 2D class averaging, our approach uses 2D particle masks to produce locally aligned particle images. Auto3DCryoMap is able to accurately align structural particle shapes. It is also able to construct a decent 3D density map from only a few thousand aligned particle images, whereas existing tools require hundreds of thousands of particle images. Finally, by using the pre-processed particle images, Auto3DCryoMap reconstructs a better 3D density map than it does from the original particle images.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Background</head><p>Cryo-EM (electron microscopy) has emerged as a major method for determining the structures of proteins, particularly large ones <ref type="bibr">[1]</ref>. Purified proteins in solution are frozen, and the frozen film is then imaged with an electron microscope <ref type="bibr">[2]</ref>. The cryo-EM method typically does not require a crystallization step, so it can be applied to a wide range of proteins. During cryo-EM imaging, many 2D images of proteins are produced in varying orientations <ref type="bibr">[3]</ref>. The 2D images are first classified or clustered by orientation. Then, to improve the signal-to-noise ratio, the 2D images are aligned and averaged. Finally, the 3D volume (density map) is produced from the averaged 2D images <ref type="bibr">[4]</ref>. A density map is a 3D grid in which each point has a density value that reflects the electron density at the corresponding point in 3D space. Hundreds of thousands of 2D particle images are required to produce a 3D density map of good quality <ref type="bibr">[5,</ref><ref type="bibr">6]</ref>. EMAN2 <ref type="bibr">[7]</ref>, RELION <ref type="bibr">[8]</ref>, and SPIDER <ref type="bibr">[9]</ref> are popular tools for 3D cryo-EM map reconstruction. These methods require an initial 3D model to build a decent 3D density map, in addition to facing the manual particle picking issue.</p><p>In our approach, 3D density maps are constructed fully automatically with no need for an initial 3D model. First, the particles are picked, isolated, and classified as "good" or "bad" samples fully automatically using our previous model DeepCryoPicker <ref type="bibr">[10]</ref>. Second, the first stage of Auto3DCryoMap fully automates particle alignment. 
Averaging similar particles is the usual way to enhance contrast by increasing the signal-to-noise ratio, which helps the alignment process and the identification of bad or unwanted images (using a reference model). Instead of this averaging process, we use a new strategy based on unsupervised learning to generate a perfect binary mask (circular or square) for the top and side views of particles. We design two fully automated approaches. The first aligns the binary masks of the square (side-view) particles using intensity-based image registration. We then project the angle difference between the original particle mask and the aligned mask onto the original particle. The second approach centralizes circular particles fully automatically. We use the same binary mask generation idea to produce a perfect circular binary mask for each particle. Our approach then determines the center of each particle's mask and rebuilds each particle image around that center so that all particles share the same dimensions. Finally, the second stage of Auto3DCryoMap performs fully automated 3D density map reconstruction. Instead of using the common-line or reference-based methods for 3D classification, together with the pre-alignment steps that are required during 3D construction, we use a new approach that covers both. We design a fully automated approach to build a localized 3D density map between every two aligned particles. First, the original particles are aligned using intensity-based registration of the original particle images, not the binary masks, for perfect alignment. Then, the 3D locations of the locally matched (corresponding) points are estimated. Finally, the localized 3D density maps are combined to produce the final 3D density map for each protein molecule.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Overview of the Auto3DCryoMap workflow</head><p>We design Auto3DCryoMap, a fully automated 3D cryo-EM density map reconstruction method based on deep learning and unsupervised learning approaches. It is designed to reconstruct a 3D density map of a single protein from its cryo-EM image/micrograph data (see Fig. <ref type="figure">1b,</ref> <ref type="figure">g</ref> for examples from the Apoferritin <ref type="bibr">[11]</ref> and KLH <ref type="bibr">[12]</ref> datasets). The workflow of the Auto3DCryoMap framework is shown in Fig. <ref type="figure">1a</ref>. The workflow has five components, described in detail as follows.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Component 1: Micrograph pre-processing</head><p>In this component, a set of pre-processing steps proposed in our three previous models, AutoCryoPicker <ref type="bibr">[13]</ref>, SuperCryoEMPicker <ref type="bibr">[14]</ref>, and DeepCryoPicker <ref type="bibr">[10]</ref>, is used to improve the quality of the cryo-EM images and to accommodate low-SNR images. The pre-processing steps increase the intensity of each particle and group the pixels inside it, which makes the particle easier to isolate.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Component 2: Fully automated single particle picking</head><p>The particles are first detected and picked using DeepCryoPicker <ref type="bibr">[10]</ref> and are then projected back onto the original micrographs, so that two versions of each particle are picked (the original particle and the preprocessed one). Figure <ref type="figure">2b</ref>, f show the whole-micrograph particle picking results for different datasets (Apoferritin <ref type="bibr">[11]</ref> and KLH <ref type="bibr">[12]</ref>). Figure <ref type="figure">2b,</ref> <ref type="figure">c,</ref> <ref type="figure">g</ref> show the original versions of the picked particles, while Fig. <ref type="figure">2d</ref>, e, h show the preprocessed versions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Component 3: Fully automated perfect 2D particle selection</head><p>This component is designed to select good 2D particle images and generate a 2D particle mask for 2D particle alignment. It was proposed in our previous model DeepCryoPicker <ref type="bibr">[10]</ref>. Each original particle image is evaluated and selected based on its mask using the fully automated perfect 2D side-view particle selection algorithm (detailed in Additional file 1: Algorithm S1). First, the original side-view particles are picked from the KLH dataset <ref type="bibr">[12]</ref> (see an example in Fig. <ref type="figure">3a</ref>). The preprocessed versions of the original side-view particles are used to generate initial binary masks using the intensity-based clustering (IBC) algorithm <ref type="bibr">[13]</ref> (see an example in Fig. <ref type="figure">3b,</ref> <ref type="figure">c</ref>). Then, the Feret diameter is computed on the initial binary masks to obtain the side-view dimensions and generate side-view binary masks (see Fig. <ref type="figure">3d</ref>). At this point the result is not accurate enough and the generated binary masks are not perfect (see Fig. <ref type="figure">3e</ref>). For this reason, the initial binary masks are cleaned using the fully automated 2D particle binary mask cleaning and post-processing algorithm (detailed in Additional file 1: Algorithm S2). After cleaning, the Feret diameter detection produces perfect side-view binary masks (see Fig. <ref type="figure">3g,</ref> <ref type="figure">h</ref>).</p></div>
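<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the Feret diameter measurement applied to the binary masks in Component 3, the following Python sketch computes the maximum Feret (caliper) diameter and the axis-aligned bounding box of a small binary mask. It is not the implementation of Algorithm S1 or of DeepCryoPicker: the function names (foreground, max_feret_diameter, bounding_box), the nested-list mask representation, and the brute-force pairwise search are assumptions made for this example only.</p>
<eg xml:space="preserve"><![CDATA[
```python
from itertools import combinations
from math import hypot


def foreground(mask):
    """Return the (row, col) coordinates of all non-zero pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def max_feret_diameter(mask):
    """Largest centre-to-centre distance between any two foreground pixels."""
    pts = foreground(mask)
    if len(pts) < 2:
        return 0.0
    return max(hypot(r1 - r2, c1 - c2)
               for (r1, c1), (r2, c2) in combinations(pts, 2))


def bounding_box(mask):
    """Axis-aligned box (top, left, height, width) enclosing the foreground."""
    pts = foreground(mask)
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    return (min(rows), min(cols),
            max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)


# Toy side-view mask: a 3 x 4 rectangle of foreground pixels (assumed data).
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
diameter = max_feret_diameter(mask)  # distance between opposite corners
box = bounding_box(mask)
```
]]></eg>
<p>The pairwise search is quadratic in the number of foreground pixels, which is acceptable for a single small particle mask; a production implementation would more likely search only the convex hull of the mask.</p></div>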
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 1: Perfect "good" 2D side-view particle selection</head><p>This step is designed to select perfect 2D side-view particles (square shapes). It is based on using the individual binary mask of each particle picked from the KLH <ref type="bibr">[12]</ref> dataset, as shown in Fig. <ref type="figure">3c</ref>. First, the binary mask of each clustered particle image is cleaned by removing small and irrelevant objects (Fig. <ref type="figure">3f</ref>). The cleaned binary images contain almost only the square objects (side-view particles). The connected components of each object are identified, and the remaining artifact objects are removed. Finally, bounding boxes are drawn around each object, together with a list of its pixel locations, using the Feret diameter measurement approach <ref type="bibr">[15]</ref> (see Fig. <ref type="figure">3g</ref>). The details are described in Additional file 1: Algorithm S1. The binary mask post-processing algorithm, which cleans the particle image and removes small and irrelevant objects, is given in Additional file 1: Algorithm S2. An example illustrating the input, intermediate results, and final output of the algorithm (perfect binary mask generation for a side-view particle) is shown in Fig. <ref type="figure">3</ref>.</p></div>
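<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mask cleaning described in Step 1 (identify connected components, drop small or irrelevant objects, then draw a bounding box around what remains) can be sketched as follows. This is a minimal illustration under stated assumptions, not Algorithm S1 or S2 themselves: 4-connectivity, the min_area threshold, and the names label_components, clean_mask, and bounding_box are all hypothetical choices for this example.</p>
<eg xml:space="preserve"><![CDATA[
```python
from collections import deque


def label_components(mask):
    """Group foreground pixels into 4-connected components (lists of (row, col))."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def clean_mask(mask, min_area):
    """Keep only components with at least min_area pixels (assumed criterion)."""
    kept = [p for p in label_components(mask) if len(p) >= min_area]
    cleaned = [[0] * len(mask[0]) for _ in mask]
    for pixels in kept:
        for y, x in pixels:
            cleaned[y][x] = 1
    return cleaned, kept


def bounding_box(pixels):
    """Box (top, left, bottom, right) enclosing one component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


# Toy clustered mask: one square particle plus two isolated noise pixels.
noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
cleaned, kept = clean_mask(noisy, min_area=4)
boxes = [bounding_box(p) for p in kept]
```
]]></eg>
<p>In this toy input only the 3 x 3 square survives the area filter, so a single bounding box is returned for it.</p></div>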
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 2: Perfect "good" 2D top-view particle selection</head><p>This step is designed to select the top-view particles (circular shapes). It is based on using the individual binary mask of each particle (see Fig. <ref type="figure">4c</ref>, k for two different top-view particles picked from different datasets, Apoferritin <ref type="bibr">[11]</ref> and KLH <ref type="bibr">[12]</ref>).</p><p>[...] <ref type="bibr">[11]</ref> and KLH <ref type="bibr">[12]</ref>). b, c The original KLH particle picking results. d, e The preprocessed KLH particle picking results. g The original Apoferritin particle picking results. h The Apoferritin preprocessed particle picking results</p><p>Fig. <ref type="figure">3</ref> Perfect "good" 2D side-view particle selection based on fully automated perfect 2D binary mask generation. a Original side-view particle image picked by DeepCryoPicker <ref type="bibr">[10]</ref> from the KLH dataset <ref type="bibr">[12]</ref>. b The preprocessed version of the original side-view particle. c Initial binary mask of b using the IBC clustering algorithm in AutoCryoPicker <ref type="bibr">[13]</ref>. d Feret diameter detection of c. e Initial 2D binary mask generation of a. f Post-processed version of c. g Perfect Feret diameter detection using f. h Perfect binary mask generation</p><p>The details of the method are described in Additional file 1: Algorithm S3. An example illustrating the input, intermediate results, and final output of the method (perfect binary mask generation for a top-view particle) is shown in Fig. <ref type="figure">4</ref>. The preprocessed version of each particle (see Fig. <ref type="figure">4b,</ref> <ref type="figure">j</ref>) is used to produce the initial clustering masks (see Fig. <ref type="figure">4c,</ref> <ref type="figure">k</ref>) using the IBC clustering algorithm <ref type="bibr">[13]</ref>. Then, a cleaned binary mask of each particle image is produced by removing small and irrelevant objects (Fig. <ref type="figure">4d,</ref> <ref type="figure">l</ref>). The outer and inner circular masks are extracted (see Fig. <ref type="figure">4e,</ref> <ref type="figure">f,</ref> <ref type="figure">m,</ref> <ref type="figure">n</ref>) to produce filled circular binary masks (see Fig. <ref type="figure">4g,</ref> <ref type="figure">o</ref>). Finally, perfect top-view binary masks are generated using the center and the artificial diameter from the modified CHT algorithm <ref type="bibr">[13]</ref> (see Fig. <ref type="figure">4h,</ref> <ref type="figure">p</ref>).</p><p>Once the particles are picked and selected and the perfect 2D mask for each particle is generated, fully automated single-particle alignment is performed to perfectly align the particle images. This component consists of two stages: (1) Stage 1: fully automated side-view particle alignment; (2) Stage 2: fully automated top-view (circular) particle alignment.</p><p>Fig. <ref type="figure">4</ref> Perfect "good" 2D top-view particle selection based on fully automated perfect 2D binary mask generation. a, i Two original top-view particles picked fully automatically by DeepCryoPicker <ref type="bibr">[10]</ref> from the Apoferritin <ref type="bibr">[11]</ref> and KLH <ref type="bibr">[12]</ref> datasets. b, j The preprocessed versions of the original top-view particles a, i, respectively. c, k The initial clustering results of b, j using the IBC clustering algorithm in AutoCryoPicker <ref type="bibr">[13]</ref>. d, l The cleaned circular clustered images of c, k, respectively. e, m The outer circular mask extraction of d, l, respectively. f, n The inner circular mask of e, m, respectively. g, o The filled circular binary masks of f, n, respectively. h, p Perfect top-view binary mask generation of g, o, respectively</p></div>
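<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of turning a ring-shaped (hollow) top-view mask into a filled, ideal circular mask can be sketched as below. Note the substitution: the paper obtains the center and diameter from its modified CHT algorithm, whereas this sketch simply uses the mean of the foreground pixels as the center and the largest pixel distance from that center as the radius. The function name ideal_disk_mask and the small tolerance are assumptions for illustration.</p>
<eg xml:space="preserve"><![CDATA[
```python
from math import hypot


def ideal_disk_mask(ring_mask):
    """Replace a (possibly hollow) circular mask by a filled disk.

    Centre = mean of the foreground pixels; radius = largest distance from
    the centre to a foreground pixel. Both are stand-ins for the values the
    paper takes from its modified circular Hough transform.
    """
    pts = [(r, c) for r, row in enumerate(ring_mask)
           for c, v in enumerate(row) if v]
    cy = sum(r for r, _ in pts) / len(pts)
    cx = sum(c for _, c in pts) / len(pts)
    radius = max(hypot(r - cy, c - cx) for r, c in pts)
    disk = [[1 if hypot(r - cy, c - cx) <= radius + 1e-9 else 0
             for c in range(len(ring_mask[0]))]
            for r in range(len(ring_mask))]
    return disk, (cy, cx), radius


# Toy top-view mask: a ring with a hollow centre (assumed data).
ring = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
disk, centre, radius = ideal_disk_mask(ring)
```
]]></eg>
<p>Because the disk is regenerated from a center and a radius, the hollow in the ring no longer appears in the mask, which is the property the later centralization step relies on.</p></div>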
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 3: Fully automated 2D side-view particle alignment</head><p>This step is designed to align the side-view particles fully automatically. Particle alignment relies on placing the particles into a similar orientation <ref type="bibr">[16]</ref>. Depending on the relative plane of the two images, particles are shifted by [x, y] and/or rotated by &#981;. Technically, image alignment needs to determine the correlation parameters [x, y, &#981;] that map the images onto each other perfectly <ref type="bibr">[17]</ref><ref type="bibr">[18]</ref><ref type="bibr">[19]</ref>. Image registration aims to geometrically estimate and match two images taken from different viewpoints <ref type="bibr">[20,</ref><ref type="bibr">21]</ref>.</p><p>Mathematically, image registration is based on finding the best geometrical transformation that matches the same points in two images. Assume that I_F(x, y) is the fixed or reference image and I_M(x, y) is the moving image (the image that needs to be aligned). The geometrical transformation for the image registration, T(x, y), is estimated according to Eq. (<ref type="formula">1</ref>) <ref type="bibr">[22]</ref>. The estimated transformation T(x, y) is then used to register the moving image I_M(x, y) and produce the close image I_c(x, y), as in Eq. (<ref type="formula">2</ref>) <ref type="bibr">[23]</ref>. Thus, image registration can be formulated as a maximization problem with the optimizer function shown in Eq. (3) <ref type="bibr">[23]</ref>.</p><p>In Eq. (3), T_opt denotes the optimal geometrical transformation for matching I_R(x, y) and I_M(x, y) based on the selected similarity metric (S) over the candidate transformations (T). 
Finally, the geometrical transformation T(x, y) follows a 2D parametric model, a continuous bivariate function that satisfies certain regularity conditions <ref type="bibr">[20,</ref><ref type="bibr">21]</ref>, as given in Eq. (4) <ref type="bibr">[23,</ref><ref type="bibr">24]</ref>, where &#916;x, &#916;y, &#916;&#966; are the three geometrical (motion) parameters and &#945; is the geometrical scaling parameter.</p><p>Intensity-based image registration is an image registration process that uses the similarity of image intensities to define the 2D geometrical transformation by minimizing or maximizing a similarity metric <ref type="bibr">[25]</ref>. It estimates the internal geometrical transformation matrix T(x, y) after applying an image transformation (bilinear interpolation <ref type="bibr">[26]</ref>) to the two images I_F(x, y) and I_M(x, y). Bilinear interpolation <ref type="bibr">[26]</ref> is applied to both images because it is one of the resampling (image scaling) techniques in computer vision and image processing, and it is used to transform each image under a specific transformation (T). Then, the similarity (S) of the transformed images is used to estimate the geometrical transformation T(x, y), with the mean square error (MSE) as the metric that measures the similarity (S) between the two transformed images I_R(x, y) and I_M(x, y), as given in Eq. (<ref type="formula">5</ref>) <ref type="bibr">[27,</ref><ref type="bibr">28]</ref>.</p><p>Then, regular step gradient descent optimization <ref type="bibr">[29]</ref> is used to estimate the optimizer parameters. It adjusts the geometrical transformation parameters by following the gradient of the image similarity metric in the direction of the extrema <ref type="bibr">[30]</ref>. 
Equation (<ref type="formula">6</ref>) shows the typical form of gradient descent optimization used to estimate the image registration optimizer <ref type="bibr">[30]</ref>,</p><p>where &#947;&#8711;F(X_&#951;) is the gradient term that is subtracted from X_0 to move it toward the global minimum (the stop condition), and X_0 is the local minimum of the main function F, which in our case is the similarity metric (S). Finally, the image registration function maps each point in the moving image I_M(x, y) to the corresponding point in the reference image I_R(x, y) based on the correlation parameters [x, y, &#981;] estimated by the similarity metric and optimizer functions.</p><p>Typically, different geometrical transformations can be used to register the two images, such as translation, scaling, rotation, and affine transformation, as shown in Eqs. (<ref type="formula">7</ref>), (<ref type="formula">8</ref>), and (9) <ref type="bibr">[31]</ref>.</p><p>Using the scaling transformation, a point P(x, y) is scaled along the x and y axes to a new point P(x&#8242;, y&#8242;) (see Eqs. (<ref type="formula">10</ref>) and (<ref type="formula">11</ref>)) by multiplying x and y by the scaling factors S_x and S_y (see Eq. (<ref type="formula">12</ref>)) <ref type="bibr">[32]</ref>. Using the rotational transformation, a point P(x, y) is rotated around the origin to a new point P(x&#8242;, y&#8242;) by an angle &#952; (see Eqs. (<ref type="formula">13</ref>), (<ref type="formula">14</ref>), and (<ref type="formula">15</ref>)) <ref type="bibr">[33]</ref>.</p><p>In some cases, translation and rotation are not enough, and scaling is necessary to correct the transformation of the points in I_M(x, y). Therefore, the affine transformation scales the translated and rotated points (see Eq. (<ref type="formula">16</ref>)) and uses the two-dimensional shear transformation shown in Eq. 
(<ref type="formula">17</ref>) <ref type="bibr">[34]</ref>,</p><p>where a and b are the proportionality constants along the x and y axes, respectively <ref type="bibr">[36]</ref>. Perfect binary masks are generated in the second stage of the third component of the Auto3DCryoMap framework ("fully automated perfect 2D particle selection"), namely "stage 2: fully automated 2D particle mask generation based on an unsupervised learning approach". Using these masks, we propose a fully automated approach for perfect side-view particle alignment based on automatic intensity-based image registration. We use the binary masks instead of the original particle images for two reasons. First, it is easier to generate a reference image automatically than to select one manually. Second, it is easier to find the correlation points (corresponding corners) in the generated mask than in the original particle image, because the signal-to-noise ratio of the original image is very low. The main steps of the fully automated particle alignment based on intensity image registration of the generated binary masks are as follows. First, we calculate the average size of the binary particle objects and generate an artificial frontal-view (side-view) reference image, as shown in Fig. <ref type="figure">5a</ref>. Second, for each particle, we take the original particle image and the generated binary mask, as shown in Fig. <ref type="figure">5b,</ref> <ref type="figure">c</ref>. Then, we use intensity-based automated image registration to align the generated binary mask of each particle to the generated reference binary mask (frontal view) using different geometrical transformations (see Fig. <ref type="figure">5e-i</ref>). 
After the alignment is complete, we extract the angles of the original mask and the aligned object, &#952;_original and &#952;_aligned, where each angle is measured between the x-axis and the major axis of the ellipse that has the same second moments as the region (see Fig. <ref type="figure">5j,</ref> <ref type="figure">k</ref>). Then, we compute the orientation angle &#952;_orientation as the difference between the aligned angle and the original angle. Finally, we use the orientation angle &#952;_orientation to rotate the original particle image, as shown in Fig. <ref type="figure">5l</ref>.</p><p>To improve the SNR, class averaging is used over fewer particles to determine the resolution. Researchers used to pick particles manually and perform 2D image averaging to remove false-positive particles from the data. RELION <ref type="bibr">[8]</ref> extracts each individual particle from the micrograph background using a manually user-defined circle radius (normalization procedure), which splits each particle image into a background area (outside the circle) and a particle area (inside the circle) <ref type="bibr">[35]</ref>, and then performs image averaging. Instead, we propose a localized image averaging approach. The localized particle image is generated using the binary mask of each particle, as shown in Fig. <ref type="figure">6</ref>. The original aligned particle image (see Fig. <ref type="figure">6a</ref>) is multiplied by the perfect 2D aligned mask (see Fig. <ref type="figure">6b</ref>) to generate the perfect 2D localized particle image (see Fig. <ref type="figure">6d</ref>). 
The fully automated side-view particle alignment based on intensity image registration and particle masks is described in Additional file 1: Algorithm S4, and the whole framework is illustrated in Additional file 1: Figure <ref type="figure">S1</ref>.</p></div>
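<div xmlns="http://www.tei-c.org/ns/1.0"><p>Three building blocks of Step 3 lend themselves to a compact sketch: the MSE similarity between two images, the orientation angle of a mask computed from its second moments, and the localization of a particle by multiplying it with its binary mask. The code below is an illustrative sketch only, not Algorithm S4. The function names (mse, orientation_deg, localize), the sign convention (angles counter-clockwise, with the image y-axis flipped to point upward), and the toy masks are assumptions of this example.</p>
<eg xml:space="preserve"><![CDATA[
```python
from math import atan2, degrees


def mse(a, b):
    """Mean squared intensity difference between two equally sized images."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def orientation_deg(mask):
    """Angle in degrees (counter-clockwise from the x-axis) of the major axis
    of the foreground, computed from its central second moments."""
    # Use x = column and y = -row so that positive angles are counter-clockwise.
    pts = [(c, -r) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    mu20 = sum((x - mx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - my) ** 2 for _, y in pts) / n
    mu11 = sum((x - mx) * (y - my) for x, y in pts) / n
    return degrees(0.5 * atan2(2 * mu11, mu20 - mu02))


def localize(image, mask):
    """Suppress the background by multiplying the image with its binary mask."""
    return [[p * m for p, m in zip(ri, rm)] for ri, rm in zip(image, mask)]


# Toy masks: the aligned mask is a horizontal bar, the original a tilted one.
aligned_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
original_mask = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
theta_original = orientation_deg(original_mask)
theta_aligned = orientation_deg(aligned_mask)
# Rotation to apply to the original particle image (aligned minus original).
theta_orientation = theta_aligned - theta_original

localized = localize([[5, 7], [9, 11]], [[1, 0], [0, 1]])
similarity = mse([[0, 0]], [[3, 4]])
```
]]></eg>
<p>With these toy masks the tilted bar sits at 45 degrees, so the original particle would be rotated by minus 45 degrees to match the reference. Rotating the image itself requires resampling (for example with bilinear interpolation) and is left out of the sketch.</p></div>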
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 4: Fully automated 2D top-view particle alignment</head><p>This step is designed to align the other common type of particle image, the top view (circular shapes). The top-view particle images are aligned by centering all particles on the same point, which is a common way to align circular particles. Since one particle (in its original form) might be much noisier than another, it is very hard to find the same center point on which to center them. Also, circular particles can be miscentered, especially those that have a hollow in the particle ring. To obtain a perfect, fully automated alignment, we propose a localized centralization approach for top-view (circular) particle alignment. First, we use the localization approach that is applied to the side-view particle images to produce localized top-view particle images (see Fig. <ref type="figure">7d</ref>).</p><p>Second, with the binary mask it is more accurate to find the same center point (the center of the binary circle), and the ring hollow no longer affects the centralization. The center of the circular (binary) object is defined as the average of all points in the circular shape <ref type="bibr">[36]</ref>. Suppose that the circular shape consists of n points x_1, x_2, ..., x_n (white pixels), as shown in Fig. <ref type="figure">8a,</ref> <ref type="figure">e</ref>. The centroid (center) of the circular white (binary) object is defined by Eq. (<ref type="formula">18</ref>) <ref type="bibr">[36]</ref>.</p><p>[...] <ref type="bibr">[11]</ref> and KLH <ref type="bibr">[12]</ref> datasets. b, f Perfect 2D top-view particle binary masks of a, e, respectively, using the modified CHT algorithm <ref type="bibr">[13]</ref> and the IBC <ref type="bibr">[13]</ref>. c, g Perfect 2D top-view particle projection results. d, h Centralized top-view of the localized particle alignment result</p><p>In our case, we use the modified CHT <ref type="bibr">[13]</ref> to extract the exact center point of each particle's mask, as shown in Fig. <ref type="figure">8b,</ref> <ref type="figure">f</ref>. Then, from the extracted center point we draw a new candidate box that takes the same dimensions (x_width, x_height). New bounding boxes are drawn around each top-view region (rectangular area) after offsetting each object center (x, y) by the same factor, and the bounding box dimensions (x_width, x_height) are calculated (see Fig. <ref type="figure">8c,</ref> <ref type="figure">g</ref>). This approach allows particles that have hollows (rings) to be accurately aligned on the center extracted from the particle mask (see Fig. <ref type="figure">8d,</ref> <ref type="figure">h</ref>). Centralized particle alignment based on the perfect binary mask places (aligns) the particles on the same point (center), which helps the 3D map reconstruction overlap the particles, which need to be aligned and shifted in the plane.</p><p>The correlation point (center) extracted from the perfect binary mask (see Fig. <ref type="figure">9b</ref>) allows the particle to be shifted to the fixed center point (see Fig. <ref type="figure">9c</ref>). In this way, all the particles are cross-correlated with each other and shifted as necessary to the same center point. Figure <ref type="figure">9</ref> shows an example of two top-view particles from two different datasets before and after the centralized alignment based on the generated perfect binary mask. The particle image is shifted as far as possible to be centrally aligned. 
The fully automated approach for centralized top-view particle alignment based on the particle mask is given in Additional file 1: Algorithm S5 and illustrated in Additional file 1: Figure <ref type="figure">S2</ref>.</p><p>The basic idea of reconstructing a 3D density map from cryo-EM data is to project the density depth of several thousand 2D cryo-EM particles. The Fourier transform (FT) <ref type="bibr">[37]</ref> is used to represent the 2D particles in another space (Fourier space), in which the structure of each 2D particle is represented by Fourier coefficients. Fourier synthesis is then used to reconstruct the 3D density map through its 3D Fourier transform based on the direction of the projection <ref type="bibr">[7,</ref><ref type="bibr">38,</ref><ref type="bibr">39]</ref>. This approach requires a huge number of 2D particles to build Fourier coefficients that significantly represent the structure of the particle object.</p><p>To obtain an equally decent 3D density map using a smaller number of 2D particle images, we propose a new localized approach for 3D density map reconstruction based on the 2D shape appearance of the real object. The localized 3D density map reconstruction approach is based on extracting the structural information of the particle object from every two 2D particle images <ref type="bibr">[39]</ref>.</p><p>Structure from motion (structural motion information) is a process that estimates the 3D structure (3D matrix) from a set of 2D images <ref type="bibr">[40]</ref>. Several steps are implemented in this component to reconstruct the 3D density map from the structural motion information of the particles. First, match a sparse set of points between every two 2D particle images based on perfect 2D image alignment. Second, estimate the fundamental matrix (3D matrix). Third, track a dense set of points between the two images to capture the estimated structure of the object (the particle in 3D). 
Then, determine the 3D locations of the matched points using triangulation. Finally, recover the actual 3D map by metric reconstruction. The 3D density map, in this case, is built from only the first two particle images. In the end, the average of all localized 3D density maps represents the final 3D density map. The whole framework of the localized 3D density map reconstruction is illustrated in Additional file 1: Figure <ref type="figure">S3</ref>.</p><p>Fig. <ref type="figure">9</ref> Comparing the localized 2D particle image results before and after the perfect top-view centralized particle image alignment. a, e Localized 2D top-view particle images from the KLH <ref type="bibr">[12]</ref> and Apoferritin <ref type="bibr">[11]</ref> datasets before the centralized particle image alignment. b, f Localized 2D top-view particle binary masks of a, e before the centralized particle image alignment. c, g Localized 2D top-view particle images of a, e after the centralized particle image alignment. d, h Localized 2D top-view particle masks of a, e after the centralized particle image alignment</p></div>
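<div xmlns="http://www.tei-c.org/ns/1.0"><p>The centralization idea of Step 4 (take the centroid of the binary mask as the average of its white pixels, then cut a box of fixed size around it) can be sketched as follows. This is a simplified illustration, not Algorithm S5: the paper extracts the center with its modified CHT, while this sketch uses the plain pixel average described for the centroid; the names centroid and centre_crop, the half parameter, and the zero padding outside the image are assumptions.</p>
<eg xml:space="preserve"><![CDATA[
```python
def centroid(mask):
    """Average (row, col) position of all white pixels in a binary mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pts)
    return sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n


def centre_crop(image, mask, half):
    """Cut a (2 * half + 1)-pixel square around the mask centroid.

    Positions that fall outside the image are filled with 0 (assumed padding).
    """
    cy, cx = (round(v) for v in centroid(mask))
    rows, cols = len(image), len(image[0])
    return [[image[r][c] if 0 <= r < rows and 0 <= c < cols else 0
             for c in range(cx - half, cx + half + 1)]
            for r in range(cy - half, cy + half + 1)]


def to_mask(image):
    """Toy stand-in for mask generation: every non-zero pixel is foreground."""
    return [[1 if v else 0 for v in row] for row in image]


# Two toy particle images with the same particle at different positions.
image_a = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
    [0, 9, 9, 9, 0],
]
image_b = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
mask_a, mask_b = to_mask(image_a), to_mask(image_b)
crop_a = centre_crop(image_a, mask_a, half=1)
crop_b = centre_crop(image_b, mask_b, half=1)
```
]]></eg>
<p>After cropping, both particles occupy the same pixels of equally sized boxes, which is the precondition for overlapping them in the later reconstruction stage.</p></div>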
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 1: Perfect 2D particle alignment</head><p>There are multiple ways to find point correspondences between two 2D particle images; a common one is to detect corners in the first image and track them into the second image. For side-view protein particle shapes, we found that in some cases the final localized 2D particle images are not perfectly aligned, which causes the detected points to be mistracked (see Fig. <ref type="figure">10a,</ref><ref type="figure">b</ref>).</p><p>To solve this issue, we apply the same fully automated side-view particle alignment algorithm directly to the aligned particle images. Different transformation models are used to align the two particle images. The default alignment (initial registration) uses an affine transformation, which covers image scaling, rotation, and (possibly) shear (see Fig. <ref type="figure">11d</ref>). The default image alignment (initial registration) is already very good, but some regions are still not perfectly aligned. To improve the image alignment, we adjust the optimizer and metric configuration properties, which control the initial step length (size) used to explore the parameter space when refining the geometric transformation (see Fig. <ref type="figure">11e</ref>). Increasing the maximum number of iterations allows the image registration (alignment) to run longer and potentially find a better registration result (see Fig. <ref type="figure">11f</ref>). The optimized image registration works better than the initial registration. 
For this reason, we can further improve the image alignment (registration) by starting with a simpler transformation such as 'rigid' <ref type="bibr">[41]</ref> and then using its result as the initial condition for a registration with the affine transform (see Fig. <ref type="figure">11g</ref>). Another option is to use the result of a registration with the similarity model as the initial geometric transformation and refine it with the affine transform. In this case, the refined model estimates the image registration result including the shear transformation (see Fig. <ref type="figure">11h</ref>).</p></div>
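<div xmlns="http://www.tei-c.org/ns/1.0"><p>The staged refinement described above (a simpler model first, then its result as the initial condition for the affine model) can be illustrated with a toy example. The sketch below is not the MATLAB intensity-based registration used in this work: it aligns a handful of landmark points instead of pixel intensities, and it swaps in a simple derivative-free compass search for the real optimizer. The parameters initial_step and max_iterations only mimic the role of the optimizer properties mentioned in the text, and all numeric values are made up.</p>
<ab><![CDATA[
```python
def apply_affine(params, point):
    """Apply x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    a, b, tx, c, d, ty = params
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def alignment_cost(params, moving, fixed):
    """Sum of squared distances between transformed moving points and fixed points."""
    total = 0.0
    for p, q in zip(moving, fixed):
        x, y = apply_affine(params, p)
        total += (x - q[0]) ** 2 + (y - q[1]) ** 2
    return total


def similarity_to_affine(p):
    """Expand [s*cos(t), s*sin(t), tx, ty] into the six affine parameters."""
    return [p[0], -p[1], p[2], p[1], p[0], p[3]]


def compass_search(objective, start, initial_step=0.5, max_iterations=300, min_step=1e-9):
    """Derivative-free search: try +/- step on each parameter, halve the step on failure."""
    params = list(start)
    best = objective(params)
    step = initial_step
    for _ in range(max_iterations):
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step *= 0.5
            if step < min_step:
                break
    return params, best


# Toy landmarks; the "true" transform includes shear (made-up values).
moving = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0), (0.0, 0.5)]
true_affine = [1.1, 0.3, 0.2, 0.0, 0.9, -0.1]
fixed = [apply_affine(true_affine, p) for p in moving]

# Stage 1: similarity model (scale, rotation, translation), started from the identity.
sim_params, sim_cost = compass_search(
    lambda p: alignment_cost(similarity_to_affine(p), moving, fixed),
    [1.0, 0.0, 0.0, 0.0])

# Stage 2: affine model, initialized with the stage 1 result.
aff_params, aff_cost = compass_search(
    lambda p: alignment_cost(p, moving, fixed),
    similarity_to_affine(sim_params))
```
]]></ab>
<p>Because the toy target contains shear, the similarity stage cannot fit it exactly, while the affine stage started from the similarity result can. This mirrors the reasoning behind using a simpler model as the initial condition for a richer one.</p></div>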
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 2: Extract and match set of sparse points</head><p>After all the particle images are perfectly aligned, the corresponding points extracted in this step can be tracked accurately because the interest points lie in the same space. There are many ways to find the corresponding points between two particle images. To extract the corresponding points, the first particle image is used as the reference image, and its corner points (features) are detected using the minimum eigenvalue algorithm developed by Shi and Tomasi <ref type="bibr">[42]</ref> with the MATLAB function 'detectMinEigenFeatures' (see Fig. <ref type="figure">12b</ref>). Then, the detected points are tracked in the second image using the Kanade-Lucas-Tomasi (KLT) feature-tracking algorithm <ref type="bibr">[43]</ref><ref type="bibr">[44]</ref><ref type="bibr">[45]</ref> with the MATLAB function 'PointTracker' (see Fig. <ref type="figure">12d</ref>).</p><p>Fig. <ref type="figure">10</ref> Localized 2D side-view particle image before the perfect 2D side-view particle alignment. a, b Two localized 2D particle images that are not perfectly aligned</p></div>
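<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the minimum eigenvalue criterion concrete, the following sketch computes the Shi and Tomasi corner score at a single pixel: the smaller eigenvalue of the 2 x 2 gradient structure tensor accumulated over a small window. It is a plain illustration of the criterion, not a reimplementation of the MATLAB function; the helper name min_eigen_score, the central-difference gradients, the unweighted window, and the toy image are assumptions made for this example.</p>
<ab><![CDATA[
```python
import math


def min_eigen_score(image, row, col, half_window=1):
    """Smaller eigenvalue of the gradient structure tensor around (row, col).

    The caller must keep (row, col) at least half_window + 1 pixels from the border.
    """
    sxx = sxy = syy = 0.0
    for r in range(row - half_window, row + half_window + 1):
        for c in range(col - half_window, col + half_window + 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal gradient
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical gradient
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    mean = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return mean - spread  # closed-form minimum eigenvalue of a symmetric 2 x 2 matrix


# Toy 9 x 9 image: a bright square occupying rows 4..8 and columns 4..8.
image = [[1.0 if (r >= 4 and c >= 4) else 0.0 for c in range(9)] for r in range(9)]

corner_score = min_eigen_score(image, 4, 4)  # at the corner of the square
edge_score = min_eigen_score(image, 6, 4)    # on a straight vertical edge
flat_score = min_eigen_score(image, 2, 2)    # in the uniform background
```
]]></ab>
<p>Only the corner receives a clearly positive score, because both eigenvalues are large there; along a straight edge one eigenvalue vanishes, and in flat regions both do. A detector would keep the pixels whose score exceeds a chosen quality threshold.</p></div>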
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 3: 3D fundamental matrix estimation</head><p>The fundamental matrix is the estimated 3D matrix that relates the corresponding points in two images <ref type="bibr">[46]</ref><ref type="bibr">[47]</ref><ref type="bibr">[48]</ref><ref type="bibr">[49]</ref><ref type="bibr">[50]</ref>. The normalized eight-point algorithm <ref type="bibr">[51]</ref> is used to estimate the 3D matrix from the list of corresponding points in every pair of particle images. The fundamental matrix is specified by Eq. ( <ref type="formula">19</ref>) <ref type="bibr">[51]</ref>:</p><p>where P_1 is a point in the point list of the first image (list1) and P_2 is the corresponding point in the point list of the second image (list2). Outlier points are excluded from the estimate using the random sample consensus (RANSAC) algorithm <ref type="bibr">[48]</ref>. The fundamental matrix is computed in several steps. First, it is initialized as a 3 &#215; 3 matrix of zeros, F_initial. Second, a loop counter is initialized, which repeats the whole process for the specified number of trials N. Each trial evaluates one candidate estimate together with its outlier points. In each iteration, 8 pairs of corresponding points are randomly selected from the two point lists, list1 and list2. Then, the selected 8 pairs are used to compute a candidate f for the fundamental matrix F with the normalized 8-point algorithm <ref type="bibr">[51]</ref> based on Eq. ( <ref type="formula">20</ref>) <ref type="bibr">[51]</ref>:</p><p>where y&#8242; and y are the corresponding selected points from list1 and list2 and F is the estimated 3D matrix, which can equivalently be written as Eq. ( <ref type="formula">21</ref>) <ref type="bibr">[51]</ref>:</p><p>where f denotes the reshaped version of the fundamental matrix F. After the fitness of f is computed over the corresponding points in the two images, F is replaced by f if f fits better than the current F. Then, the number of trials N is updated in every iteration according to the RANSAC algorithm using Eq. ( <ref type="formula">22</ref>) <ref type="bibr">[51]</ref>:</p><p>where p denotes the selected confidence parameter, and r is calculated by Eq. ( <ref type="formula">23</ref>) <ref type="bibr">[51]</ref>:</p><p>where sgn(d(u_i, v_i), t) is the function of the distance d(u_i, v_i) and the threshold t defined by Eq. ( <ref type="formula">24</ref>) <ref type="bibr">[51]</ref>:</p><p>Two different types of distance (algebraic and Sampson) are used to measure the distance between a pair of points, as given by Eqs. ( <ref type="formula">25</ref>) and ( <ref type="formula">26</ref>), respectively <ref type="bibr">[51]</ref>:</p><p>where i denotes the index of the corresponding points and (F u_i^T)_j^2 is the square of the jth entry of the vector F u_i^T. The 3D fundamental matrix estimation algorithm is shown in Additional file 1: Algorithm S6.</p><p>Fig. <ref type="figure">12</ref> [...] <ref type="bibr">[42]</ref>. c The second tested particle image. d Correlation point detection and tracking using the Kanade-Lucas-Tomasi (KLT) feature-tracking algorithm <ref type="bibr">[43]</ref><ref type="bibr">[44]</ref><ref type="bibr">[45]</ref></p></div>
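<div xmlns="http://www.tei-c.org/ns/1.0"><p>The quantities used inside the RANSAC loop can be sketched in a few lines. The formulas below are the commonly used forms and are assumptions here, since the equations themselves are not reproduced in this text: the algebraic distance is taken as (v F u^T)^2, the Sampson distance as that residual multiplied by the sum of two reciprocal terms built from the first two entries of F u^T and of v F, the inlier indicator as 1 when the distance does not exceed the threshold t, and the trial update as N = min(N, log(1 - p) / log(1 - r^8)). All function names and the toy matrix are hypothetical.</p>
<ab><![CDATA[
```python
import math


def mat_vec(F, u):
    """Column vector F u^T for a 3 x 3 matrix F and a homogeneous point u."""
    return [sum(F[i][k] * u[k] for k in range(3)) for i in range(3)]


def vec_mat(v, F):
    """Row vector v F for a homogeneous point v and a 3 x 3 matrix F."""
    return [sum(v[k] * F[k][j] for k in range(3)) for j in range(3)]


def algebraic_distance(F, u, v):
    Fu = mat_vec(F, u)
    return sum(v[i] * Fu[i] for i in range(3)) ** 2


def sampson_distance(F, u, v):
    # Assumed form; degenerate points with zero denominators are not handled.
    Fu = mat_vec(F, u)
    vF = vec_mat(v, F)
    residual_sq = sum(v[i] * Fu[i] for i in range(3)) ** 2
    return residual_sq * (1.0 / (Fu[0] ** 2 + Fu[1] ** 2)
                          + 1.0 / (vF[0] ** 2 + vF[1] ** 2))


def inlier_ratio(distances, threshold):
    """Fraction of point pairs whose distance does not exceed the threshold."""
    return sum(1 for d in distances if d <= threshold) / len(distances)


def updated_trial_count(current_trials, ratio, confidence=0.99, sample_size=8):
    """RANSAC update: fewer trials are needed as the inlier ratio grows."""
    if ratio <= 0.0:
        return current_trials
    if ratio >= 1.0:
        return min(current_trials, 1)
    needed = math.log(1.0 - confidence) / math.log(1.0 - ratio ** sample_size)
    return min(current_trials, math.ceil(needed))


# Toy fundamental matrix of a pure horizontal translation: pairs match when y1 == y2.
F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
pairs = [((0.0, 0.0, 1.0), (3.0, 0.0, 1.0)),   # consistent
         ((1.0, 2.0, 1.0), (4.0, 2.0, 1.0)),   # consistent
         ((2.0, 1.0, 1.0), (5.0, 1.5, 1.0)),   # vertical offset, outlier
         ((0.0, 0.0, 1.0), (3.0, 0.5, 1.0))]   # vertical offset, outlier
distances = [sampson_distance(F, u, v) for u, v in pairs]
ratio = inlier_ratio(distances, threshold=0.01)
trials = updated_trial_count(2000, ratio)
```
]]></ab>
<p>With half of the toy pairs being inliers, the required number of trials drops from the initial 2000 to roughly 1200; a higher inlier ratio shrinks it much further, which is why the loop terminates early on clean data.</p></div>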
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 4: Reconstruct the 3D matched points locations</head><p>To build the localized 3D matrix (density map) from two corresponding images, the 3D locations of the matched (corresponding) points are estimated. In this step, triangulation <ref type="bibr">[53]</ref>, a typical computer vision algorithm, is used to calculate the locations of the corresponding points in 3D space using the 3D matrix estimated in the previous step.</p><p>In general, triangulation <ref type="bibr">[36]</ref> refers to the process of determining the position in 3D space of a point that corresponds between two images <ref type="bibr">[36]</ref>. In other words, triangulation reconstructs the 3D data based on the fact that each point in an image corresponds to a single line in 3D space <ref type="bibr">[52]</ref>. In this case, a set of image points can be the projections of a common 3D point X <ref type="bibr">[52]</ref>. The set of lines generated by the image points must intersect at the 3D point X. The algebraic formulation of computing the 3D point X using triangulation is shown in Eq. ( <ref type="formula">27</ref>) <ref type="bibr">[41]</ref>:</p><p>where d is a distance function between the 3D line L&#8242;_1 and the 3D point x, such that the reconstructed point X_est that joins the two projected lines can be calculated using the mid-point method based on Eq. ( <ref type="formula">29</ref>) <ref type="bibr">[53]</ref>:</p><p>The 3D reconstruction of the matched point locations algorithm is shown in Additional file 1: Algorithm S7.</p></div>
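<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mid-point method can be sketched as follows, assuming each image point has already been back-projected to a 3D line given by an origin and a direction. The function finds the closest point on each line to the other line and returns the midpoint of the connecting segment. The name midpoint_triangulate and the example lines are hypothetical.</p>
<ab><![CDATA[
```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def midpoint_triangulate(o1, d1, o2, d2):
    """Midpoint of the shortest segment between lines o1 + s*d1 and o2 + t*d2."""
    w = [p - q for p, q in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; the midpoint is not unique")
    s = (b * e - c * d) / denom  # parameter of the closest point on line 1
    t = (a * e - b * d) / denom  # parameter of the closest point on line 2
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Two lines that intersect exactly at (1, 2, 5).
x_exact = midpoint_triangulate([0.0, 0.0, 0.0], [1.0, 2.0, 5.0],
                               [2.0, 0.0, 0.0], [-1.0, 2.0, 5.0])

# Two skew lines (noisy case): the estimate lies halfway between them.
x_skew = midpoint_triangulate([0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                              [0.0, 0.0, 2.0], [0.0, 1.0, 0.0])
```
]]></ab>
<p>When the two lines intersect, the midpoint coincides with the intersection; with noisy correspondences the lines are skew and the midpoint serves as the estimate of X.</p></div>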
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Step 5: Metric reconstruction and 3D density map visualization</head><p>To visualize the localized 3D density map reconstructed from the first two particle images, we use the MATLAB point cloud visualization function (pcshow) and plot the point cloud of the first localized 3D density map of the single side-view protein, as shown in Fig. <ref type="figure">13</ref>. For instance, Fig. <ref type="figure">13a</ref> shows the density depth of the first localized 3D density map, while Fig. <ref type="figure">13b</ref> shows a view of the same localized 3D density map. To compute the second localized 3D density map, which is reconstructed between the second and the third particle images, we must construct a new reference particle image, as shown in the main framework of the 3D density map reconstruction (see Figure <ref type="figure">S3</ref>). To construct a new reference image that retains the important information (corresponding points) of the two images, we use image fusion to combine the first two particle images, and we repeat this for the rest of the particles.</p><p>In this case, we combine the first two particle images to form the reference image. Image fusion is the process of combining two images into a single new image <ref type="bibr">[54]</ref>. The new image is more accurate and informative than either of the two individual images since it gathers the corresponding important points (necessary information) between them <ref type="bibr">[54]</ref>. The traditional approach to image fusion is the linear blend <ref type="bibr">[55]</ref>. 
It combines the two images after converting them to grayscale and normalizing the pixel values so that the darkest pixel is represented by 0 and the brightest by 1, using image Z-score normalization as shown in Eq. ( <ref type="formula">30</ref>) <ref type="bibr">[56]</ref>:</p><p>where x&#772; is the mean of the pixel intensity values and &#963; is the standard deviation. Then, the image gradient is computed to detect the directional change in the intensity of an image, as shown in Eq. ( <ref type="formula">31</ref>) <ref type="bibr">[56]</ref>:</p><p>where &#8706;f/&#8706;x and &#8706;f/&#8706;y are the gradients in the x and y directions, respectively. Then, the regions of high spatial variance are combined across one image based on Eq. ( <ref type="formula">32</ref>) <ref type="bibr">[56]</ref>:</p><p>A weight matrix W that encodes the important image information is calculated. The weight matrix combines the input gradients |G| and indicates the desired image output. The basic steps of the image fusion are described in Additional file 1: Algorithm S8. Figure <ref type="figure">14</ref> shows an example of reference image generation by image fusion. Figure <ref type="figure">14a</ref>, b shows the first (reference) particle image and the second, aligned (moving) particle image. Figure <ref type="figure">14c</ref> shows the blended overlay of the fused particle image, obtained by scaling the intensities of the reference image (a) and the aligned moving image (b) jointly as a single data set. 
Figure <ref type="figure">14d</ref> visualizes the fused (blended overlay) image using the red channel for the reference particle image, the green channel for the aligned moving image, and yellow for the areas of similar intensity between the two images.</p><p>After the new reference particle image is generated, the whole process is repeated for the second localized 3D density map reconstruction, and so on until the last particle image in the dataset is processed. Finally, we average all the localized 3D density maps to produce the final 3D density map. The 3D reconstruction of the matched point locations algorithm is shown in Additional file 1: Algorithm S9. The whole pipeline of the localized 3D density map reconstruction is illustrated in Additional file 1: Figure <ref type="figure">S3</ref>.</p></div>
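<div xmlns="http://www.tei-c.org/ns/1.0"><p>The fusion steps (normalization, gradient computation, gradient-weighted combination) can be sketched as below. This is a hedged illustration only: the weighting rule, in which each pixel is blended in proportion to the local gradient magnitudes of the two images, is an assumption standing in for the weight matrix W, whose exact definition is given in Additional file 1: Algorithm S8 and not reproduced here. The function names and the toy images are hypothetical.</p>
<ab><![CDATA[
```python
import math


def zscore(image):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    if std == 0.0:
        return [[0.0 for _ in row] for row in image]
    return [[(p - mean) / std for p in row] for row in image]


def gradient_magnitude(image):
    """Magnitude of unscaled finite differences, with indices clamped at the borders."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        line = []
        for c in range(cols):
            gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
            line.append(math.hypot(gx, gy))
        out.append(line)
    return out


def fuse(reference, moving, eps=1e-12):
    """Blend two aligned images, favoring the one with stronger local gradients."""
    a, b = zscore(reference), zscore(moving)
    ga, gb = gradient_magnitude(a), gradient_magnitude(b)
    fused = []
    for r in range(len(a)):
        line = []
        for c in range(len(a[0])):
            total = ga[r][c] + gb[r][c]
            w = ga[r][c] / total if total > eps else 0.5  # assumed weighting rule
            line.append(w * a[r][c] + (1.0 - w) * b[r][c])
        fused.append(line)
    return fused


# Toy aligned 3 x 3 images (made-up values).
reference = [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
moving = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
new_reference = fuse(reference, moving)
```
]]></ab>
<p>Fusing an image with itself returns its normalized version unchanged, which is a quick sanity check for any weighting rule of this kind.</p></div>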
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Micrograph datasets</head><p>Images from two datasets (the Apoferritin dataset and the Keyhole Limpet Hemocyanin (KLH) dataset) are used to evaluate the method <ref type="bibr">[11]</ref>. Two common shapes of protein particles in cryo-EM images are circles and rectangles. The Apoferritin dataset <ref type="bibr">[11]</ref> uses a multi-frame MRC image format (32-bit float). The size of each micrograph is 1240 by 1200 pixels. It consists of 20 micrographs, each having 50 frames at 2 electrons/&#197;&#178;/frame, where the beam energy is 300 kV. The particle shape in this dataset is circular. The Keyhole Limpet Hemocyanin (KLH) dataset is from the US National Resource for Automated Molecular Microscopy <ref type="bibr">[12]</ref>. The size of each micrograph is 2048 by 2048 pixels. It consists of 82 micrographs at 2.2 electrons/&#197;&#178;/pixel, where the beam energy is 120 kV. There are two main types of projection views in this dataset: the top view (circular particle shape) and the side view (square particle shape). The KLH dataset <ref type="bibr">[12]</ref> is a standard test dataset for particle picking. The dataset is challenging because of different specimens (different particles) and confounding artifacts (ice contamination, degraded particles, particle aggregates, etc.).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiments on fully automated 2D alignment of different particle shapes</head><p>Particle picking has been evaluated in our previous work, DeepCryoPicker <ref type="bibr">[10]</ref>.</p><p>Here, we focus on evaluating particle alignment and density map reconstruction. The first experimental results for perfect 2D particle image alignment concern the generation of perfect 2D masks (square and circular shapes), shown in Table <ref type="table">1</ref>. Some experimental results of the perfect 2D particle mask generation (square and circular) for the KLH <ref type="bibr">[12]</ref> and Apoferritin <ref type="bibr">[11]</ref> datasets are shown in Additional file 1: Figures S4 and S5. The fully automated side-view particle alignment results using intensity-based registration and the perfect generated particle masks are shown in Additional file 1: Figure <ref type="figure">S6</ref>. The experimental results of the regular 2D side-view particle alignment using the KLH dataset <ref type="bibr">[12]</ref> are also shown in Additional file 1: Figure <ref type="figure">S6</ref>. In addition, the experimental results of the localized 2D top and side-view particle alignment using the original and preprocessed particle images from different datasets (KLH <ref type="bibr">[12]</ref> and Apoferritin <ref type="bibr">[11]</ref>) are shown in Additional file 1: Figures <ref type="figure">S7</ref> and S8, respectively. The average structural similarity index (SSIM) for the fully automated single-particle alignment reaches 99.819% using the adjusted initial radius image registration with a maximum iteration number of 300. The reference SSIM score for each particle is 100%, which corresponds to the original view. After aligning a particle, we calculate the SSIM between the original view and the aligned one. 
The average SSIM scores for the fully automated single-particle alignment based on the different approaches are shown in Table <ref type="table">2</ref>. Figure <ref type="figure">15</ref> compares the different fully automated particle alignment methods based on intensity-based image registration, showing for each method the average SSIM score relative to the original view and the time consumed.</p></div>
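<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the similarity score can be sketched with a simplified, single-window version of SSIM computed from the global means, variances, and covariance of two images. Standard SSIM implementations average the same expression over local windows, so this sketch is a simplification and will not reproduce the reported percentages. The function name global_ssim, the constants 0.01 and 0.03, and the toy images are assumptions for this example.</p>
<ab><![CDATA[
```python
def global_ssim(x, y, dynamic_range=1.0):
    """Single-window SSIM between two equally sized grayscale images."""
    px = [v for row in x for v in row]
    py = [v for row in y for v in row]
    n = len(px)
    mx = sum(px) / n
    my = sum(py) / n
    vx = sum((a - mx) * (a - mx) for a in px) / n
    vy = sum((b - my) * (b - my) for b in py) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(px, py)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2.0 * mx * my + c1) * (2.0 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


# Toy images with intensities in [0, 1] (made-up values).
original = [[0.1, 0.5], [0.9, 0.3]]
aligned = [[0.3, 0.9], [0.5, 0.1]]

score_same = global_ssim(original, original)
score_diff = global_ssim(original, aligned)
```
]]></ab>
<p>An image compared with itself scores 1 (100%), which is the reference value mentioned above; any residual misalignment lowers the score.</p></div>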
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiments on fully automated 3D density map reconstruction</head><p>Two different 3D density maps (top and side-view) are reconstructed <ref type="bibr">[39,</ref><ref type="bibr">[57]</ref><ref type="bibr">[58]</ref><ref type="bibr">[59]</ref> for two single protein molecules (Apoferritin <ref type="bibr">[11]</ref> and KLH <ref type="bibr">[12]</ref>). The first 3D density map is automatically reconstructed for the side-view protein molecules using the preprocessed particles from the KLH dataset <ref type="bibr">[12]</ref>. The other molecule in the KLH data <ref type="bibr">[12]</ref> is the top-view protein molecule. Figure <ref type="figure">16a</ref> shows the final 3D density map of the Apoferritin <ref type="bibr">[11]</ref> top-view protein molecule based on the preprocessed particle images. Figure <ref type="figure">16b</ref>, c shows the final 3D density maps of the KLH <ref type="bibr">[12]</ref> top and side-view protein molecules based on the preprocessed particle images.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2 The average similarity metric scores (SSIM) for the fully automated single-particle alignment</head><p>Registration method 1 is based on the "default registration model", which registers the two particle images using an affine transformation to resolve the distortion between the two images, including scaling and rotation. Registration method 2 is based on the "adjusted initial radius model", which improves the particle image registration by adjusting the optimizer and metric configuration properties. Registration method 3 is based on the "adjusted initial radius-based maximum iterations model", in which the maximum number of iterations that the optimizer is allowed to take is increased. This allows the registration search to run longer and potentially find better registration results. Registration method 4 is based on the "affine model-based on similarity initial condition model", which registers the particle images using an "affine" transformation model with the "similarity" results used as an initial condition for the geometric transformation. The refined estimate of this model includes the possibility of shear</p><p>Fig. <ref type="figure">15</ref> The average similarity and the time consumed by the different particle alignment methods using the intensity-based image registration approach. The x-axis shows the average SSIM score (alignment similarity between each particle and its aligned version), and the y-axis shows the different particle alignment methods using the intensity-based registration approach</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussions</head><p>We compare the results of Auto3DCryoMap with two state-of-the-art 3D particle picking and 3D density reconstruction tools, RELION 3.1 <ref type="bibr">[8]</ref> and EMAN 2.31 <ref type="bibr">[7]</ref>, on different molecular datasets (Apoferritin <ref type="bibr">[11]</ref> and KLH <ref type="bibr">[12]</ref>). RELION 3.1 <ref type="bibr">[8]</ref> initially picks and selects 1195 KLH side-view particles from 82 micrographs and removes 44 particles (see the summary of particle selection and structural analysis table in Fig. <ref type="figure">17c</ref>). The refinement reconstruction model by RELION 3.1 <ref type="bibr">[8]</ref> yields a 3D density map reconstruction at a resolution of ~ 2.215 &#197; according to the gold-standard FSC = 0.143 criterion <ref type="bibr">[8]</ref> (see Fig. <ref type="figure">17b</ref>). Auto3DCryoMap picks 1,146 KLH side-view particles and selects 1,089 particles (see the summary of particle selection and structural analysis table in Fig. <ref type="figure">17c</ref>). The preprocessed version of the selected particles is used for the fully automated alignment (see Fig. <ref type="figure">17d</ref>), yielding a cryo-EM structure of ~ 2.19 &#197; according to the gold-standard FSC = 0.143 criterion [64] (see Fig. <ref type="figure">17b</ref>), while EMAN 2.31 <ref type="bibr">[7]</ref> produces a 3D density map reconstruction of ~ 4.378 &#197; according to the gold-standard FSC = 0.143 criterion <ref type="bibr">[8]</ref> (see Fig. <ref type="figure">17b</ref>). Moreover, for different molecular views, RELION 3.1 <ref type="bibr">[8]</ref> picks 33,660 particles and selects 24,640 good Apoferritin top-view particles from 279 micrographs (see the summary of particle selection and structural analysis table in Fig. <ref type="figure">18c</ref>). 
The refinement reconstruction model by RELION 3.1 <ref type="bibr">[8]</ref> yields a 3D density map reconstruction of Apoferritin at a resolution of 2.75 &#197; according to the gold-standard FSC = 0.143 criterion <ref type="bibr">[8]</ref> (see Fig. <ref type="figure">18b</ref>). Auto3DCryoMap automatically selects 24,024 good particles from 32,818 total Apoferritin top-view particles (see the summary of particle selection and structural analysis table in Fig. <ref type="figure">18e</ref>). Auto3DCryoMap constructs a top-view cryo-EM structure of 2.4 &#197;, while EMAN 2.31 <ref type="bibr">[7]</ref> yields a 3D density map reconstruction at a resolution of 3.51 &#197; according to the gold-standard FSC = 0.143 criterion <ref type="bibr">[8]</ref> (see Fig. <ref type="figure">18f</ref>).</p><p>Fig. <ref type="figure">16</ref> [...] 3D density map reconstruction using the preprocessed localized alignment particle images from the KLH dataset respectively <ref type="bibr">[12]</ref>. c Final average top-view 3D density map reconstruction using the preprocessed localized alignment particle images from the Apoferritin dataset <ref type="bibr">[11]</ref></p></div>
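<div xmlns="http://www.tei-c.org/ns/1.0"><p>The resolution values above are read from Fourier shell correlation (FSC) curves at the 0.143 threshold. The sketch below shows one way to read such a value from a sampled curve: it finds the first shell where the curve drops below the threshold, interpolates linearly between the neighboring samples, and converts the spatial frequency to a resolution in &#197;. The function name resolution_at_threshold and the sample curve are hypothetical; the numbers are made up and are not data from this study.</p>
<ab><![CDATA[
```python
def resolution_at_threshold(frequencies, fsc, threshold=0.143):
    """Resolution (1 / spatial frequency) at the first drop of the FSC below threshold.

    frequencies: ascending spatial frequencies in 1/Angstrom
    fsc:         FSC value at each frequency
    Returns None when the curve never falls below the threshold.
    """
    for i in range(1, len(fsc)):
        upper, lower = fsc[i - 1], fsc[i]
        if upper >= threshold > lower:
            fraction = (upper - threshold) / (upper - lower)
            crossing = frequencies[i - 1] + fraction * (frequencies[i] - frequencies[i - 1])
            return 1.0 / crossing
    return None


# Made-up FSC curve for illustration only.
frequencies = [0.1, 0.2, 0.3, 0.4, 0.5]
fsc_values = [1.0, 0.9, 0.5, 0.2, 0.05]
resolution = resolution_at_threshold(frequencies, fsc_values)
```
]]></ab>
<p>For this toy curve the crossing lies between the last two shells, so the reported resolution falls between 2.0 and 2.5 &#197;. Computing the FSC curve itself from two half-maps is outside the scope of this sketch.</p></div>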
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions</head><p>We introduce Auto3DCryoMap, a fully automated approach for cryo-EM 3D density map reconstruction based on deep supervised and unsupervised learning approaches. It uses the fully automated unsupervised learning algorithm ICB <ref type="bibr">[13]</ref> to generate 2D particle shapes that are used for perfect particle alignment. The perfect 2D particle images are also used to produce localized aligned particle images. We show that Auto3DCryoMap is able to accurately align top-view and side-view particle shapes. From only a few thousand aligned particle images, Auto3DCryoMap is able to build a decent 3D density map. In contrast, existing tools require hundreds of thousands of particle images. Finally, by using the preprocessed particle images, Auto3DCryoMap reconstructs a better 3D density map than with the original particle images. In the future, we plan to extend our methods to reconstruct 3D density maps of particles with irregular shapes.</p><p>Fig. <ref type="figure">18</ref> Top-view molecular structural analysis using the Apoferritin dataset. a Particle picking from the Apoferritin micrograph using DeepCryoPicker <ref type="bibr">[10]</ref>. b Fourier shell correlation plots for the final 3D reconstruction. The red curve is based on RELION 3.1 <ref type="bibr">[8]</ref>, the blue curve on Auto3DCryoMap, and the green curve on EMAN 2.31 <ref type="bibr">[7]</ref>. The average resolution of our 3D density map reconstruction using Auto3DCryoMap is ~ 2.4 &#197;, whereas that of RELION is ~ 2.75 &#197; and that of EMAN 2.31 is ~ 3.51 &#197;. c Summary of particle selection and structural analysis. d The preprocessed versions of the top-view particles that are used to generate the 3D density map structure. e 3D density map reconstruction of the Apoferritin top-view protein obtained by Auto3DCryoMap</p></div></body>
		</text>
</TEI>
