<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Convolutional Neural Network Denoising of Focused Ion Beam Micrographs</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>10/25/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10339854</idno>
					<idno type="doi">10.1109/MLSP52302.2021.9596272</idno>
					<title level='j'>IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)</title>
					<author>Minxu Peng</author><author>Mertcan Cokbas</author><author>Unay Dorken Gallastegi</author><author>Prakash Ishwar</author><author>Janusz Konrad</author><author>Brian Kulis</author><author>Vivek K Goyal</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Most research on deep learning algorithms for image denoising has focused on signal-independent additive noise. Focused ion beam (FIB) microscopy with direct secondary electron detection has an unusual Neyman Type A (compound Poisson) measurement model, and sample damage poses fundamental challenges in obtaining training data. Model-based estimation is difficult and ineffective because of the nonconvexity of the negative log likelihood. In this paper, we develop deep learning-based denoising methods for FIB micrographs using synthetic training data generated from natural images. To the best of our knowledge, this is the first attempt in the literature to solve this problem with deep learning. Our results show that the proposed methods slightly outperform a total variation-regularized model-based method that requires time-resolved measurements that are not conventionally available. Improvements over methods using conventional measurements and less accurate noise modeling are dramatic: around 10 dB in peak signal-to-noise ratio.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Focused ion beam (FIB) microscopy plays a crucial role in imaging fine sample structures at sub-nanometer resolution <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref>. A focused beam of ions is raster scanned over the sample. The interaction of incident ions with the sample produces secondary electrons (SEs) to be detected. The mean number of SEs per ion reveals information such as the composition and topography of sample components.</p><p>Measurements obtained by FIB microscopy are inherently noisy due to randomness in both the numbers of incident ions and the numbers of detected SEs per incident ion. Each of these sources of randomness is conventionally modeled with Poisson distributions, resulting in a Neyman Type A distribution for measurements that is detailed in Section 3. The signal-dependent noise level of this model is illustrated in Fig. <ref type="figure">1</ref>. Existing image denoising techniques fare poorly on FIB micrographs because they are developed for Poisson measurements or measurements corrupted by additive white Gaussian noise (AWGN).</p><p>In many types of imaging, one may improve quality by using averaging to reduce noise variance. This often takes the form of increased acquisition time. In FIB microscopy, this corresponds to increasing the dose, i.e., the mean number of incident ions per pixel &#955;, which may be achieved by increasing the product of beam current and pixel dwell time. However, various studies have shown that sample damage due to sputtering and radiation increases with dose <ref type="bibr">[5]</ref><ref type="bibr">[6]</ref><ref type="bibr">[7]</ref>. For dose-sensitive samples, this introduces a limit to the image quality that can be achieved without sophisticated data processing. 
Sample damage also makes it fundamentally infeasible to collect clean-noisy FIB micrograph pairs for training-based methods. Even for samples less prone to damage, acquiring these image pairs would be expensive and time-consuming.</p><p>The concept of time-resolved (TR) measurement that has recently been introduced to FIB microscopy <ref type="bibr">[8]</ref> can be viewed as a way to make measurements more informative without increasing dose. For each pixel, dwell time t is divided into n sub-acquisitions of length t/n, and the number of SEs is counted within each sub-acquisition. Estimation methods developed for TR measurements show significant improvements that are approximately equivalent to making the source produce a deterministic number of ions, i.e., mitigation of source shot noise <ref type="bibr">[8,</ref><ref type="bibr">9]</ref>. Note that these methods have only been developed for unregularized (pixel-by-pixel) estimation, and they require unconventional data collection. Though measuring n sub-acquisitions at each pixel is feasible, it can increase total imaging time, which can increase cost and susceptibility to sample motion. Achieving similar improvement without TR measurement is thus of interest.</p><p>In this work, we use our knowledge of an accurate generative model for FIB microscope data to develop a deep learning-based method that significantly outperforms methods intended for AWGN or Poisson noise removal, without having a database of high-quality FIB micrographs. We adopt a state-of-the-art denoising convolutional neural network (DnCNN) <ref type="bibr">[10]</ref>, developed mainly for the AWGN denoising task. We train our network on noisy natural images where the noise is synthetically generated according to the FIB microscope model. We also extend the network architecture by adding a VGG-16 network to incorporate perceptual loss.
Through extensive experiments, we show that deep learning-based denoising algorithms can be effective on FIB micrographs. Furthermore, we demonstrate that perceptual loss may help to preserve the structure in FIB images during denoising. The deep learning-based methods trained with physically accurate synthetic noise, operating only on conventional data, perform slightly better than total variation-regularized maximum likelihood estimation applied to TR data. (Fig. 1(c) caption: a simulation with a physically accurate model for FIB microscopy; the noise variance is signal dependent to a greater extent than with Poisson data.)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">RELATED WORK</head><p>Traditional methods. Various techniques have been proposed to recover a clean image from a noisy observation based on a measurement model and a regularizer or prior. For instance, total variation (TV) regularization <ref type="bibr">[11]</ref> is very broadly successful in reducing noise without excessive smoothing of sharp edges. We include a comparison with a TV-regularized method here.</p><p>Deep neural network for image denoising. Deep neural networks <ref type="bibr">[10,</ref><ref type="bibr">[12]</ref><ref type="bibr">[13]</ref><ref type="bibr">[14]</ref> have received increasing attention over the past decade. DnCNN <ref type="bibr">[10]</ref> combines a very deep neural network <ref type="bibr">[15]</ref>, batch normalization <ref type="bibr">[16]</ref>, and residual learning <ref type="bibr">[17]</ref> for a series of computer vision tasks, including image denoising for Gaussian noise, super-resolution, and deblocking. DnCNN is based on the assumption that the residual mapping is easier to learn than the original signal of interest, where the residual is defined as the difference between the noisy observation and the ground truth image. The network employs a single residual unit to output the residual image and exhibits a performance boost from the contributions of batch normalization and residual learning.</p><p>Perceptual loss. Pixelwise difference losses have achieved success in recovering images from distortions. However, they fail to capture the perceptual quality of images and often result in oversmoothed reconstructions. Perceptual loss <ref type="bibr">[18]</ref> quantifies perceptual differences between images in a feature space and can be used to increase the spatial-structure fidelity between the ground truth and restored images. 
It is commonly used in style transfer <ref type="bibr">[19]</ref> and super-resolution <ref type="bibr">[18]</ref> and can be adapted to image denoising <ref type="bibr">[20]</ref>. A pre-trained convolutional neural network (e.g., VGG-16 <ref type="bibr">[15]</ref>) is employed to extract high-level features of the output from the denoising neural network to enhance the perceptual appearance with respect to the ground truth image.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">DATA MODEL AND MODEL-BASED ESTIMATION</head><p>In this section, we introduce the measurement model for FIB microscopy with noiseless direct SE detection. Compared with the commercially prevalent use of scintillators and photomultiplier tubes, direct SE detection offers higher signal-to-noise ratio <ref type="bibr">[21]</ref> and is easier to model. This measurement model underlies both model-based reconstruction and our generation of synthetic noisy images for training and testing of learning-based reconstruction. We assume a square micrograph with J pixels and use J = {1, . . . , √J}² to denote the pixel index set. Here the focus is on the SE yield image η ∈ [0, ∞)^J and measurements that allow estimation of this image. These are related to network inputs and outputs in Section 5.2.</p><p>Measurement model. Our model for measurements is separable across the pixels, so we present it here without pixel indexing. During a fixed dwell time t, ion incidences follow a Poisson process. The number of incident ions M is a hidden Poisson random variable with mean λ = Λt, where Λ represents the rate of incident ions per unit time. Dose λ is assumed to be equal for all the pixels. Incident ion i causes Z_i SEs to be detected. Each Z_i is Poisson distributed with mean η, which is called the SE yield. Both uses of Poisson distributions are well established in the particle beam microscopy literature <ref type="bibr">[22]</ref>. The total number of detected SEs, Q = Σ_{i=1}^{M} Z_i, is a Neyman Type A random variable with probability mass function <formula>P_Q(q; λ | η) = e^{−λ} (η^q / q!) Σ_{m=0}^{∞} (λ e^{−η})^m m^q / m!, (1)</formula> mean <formula>E[Q] = λη, (2)</formula> and variance <formula>var(Q) = λη(1 + η). (3)</formula> Note that both mean and variance depend on η, making the noise neither additive nor white. Furthermore, the relationship between mean and variance is substantially different than for Poisson-distributed data, assuming η is not too small. Combining measurements across pixels gives Q ∈ ℕ^J. A comparison of AWGN and this microscopic noise model can be seen in Figure <ref type="figure">1</ref>.</p><p>If each pixel dwell time t is split over n sub-acquisitions, at each pixel we obtain an SE count vector of length n with independent entries V^(k), k ∈ {1, . . . , n}, distributed according to (1) with λ replaced by λ/n. When λ/n is small, the measurement model for the TR vector can be approximated well by using only the m = 0 and m = 1 terms in (1). 
The counts from the sub-acquisitions together comprise the SE count above, so <formula>Q_j = Σ_{k=1}^{n} V_j^(k)</formula> for any pixel index j ∈ J. Thus, a TR dataset is V ∈ ℕ^(J×{1,...,n}), where summing over the last dimension of the tensor gives the non-TR dataset Q.</p><p>Model-based estimation. At any pixel, scaling the detected SE count Q by the total dose λ yields the unbiased estimator <formula>η̂ = Q/λ, (4)</formula> which has mean-squared error (MSE) η(η + 1)/λ. The (η + 1) factor in the MSE is rooted in the (η + 1) factor in (3) by which the variance of a Neyman Type A random variable exceeds the variance of a Poisson random variable with the same mean. This cost of randomness of the number of incident ions can be mitigated by TR measurement <ref type="bibr">[8,</ref><ref type="bibr">9]</ref>. The pixelwise separable maximum likelihood (ML) estimator using the TR data, <formula>η̂_TR-ML = argmax_η Σ_{k=1}^{n} log P_Q(V^(k); λ/n | η), (5)</formula> has been shown empirically to realize most of the performance improvement from TR measurement that is predicted by Fisher information <ref type="bibr">[9]</ref>. Regularized estimation without TR measurement is made difficult by the series form of (1) and the non-convexity of its negative logarithm. With TR measurement, provided λ/n is sufficiently small, the derivative of log P_Q(v^(k); λ/n | η) with respect to η can be approximated efficiently. This allows us to compute the TV-regularized ML estimator for a full SE yield image, <formula>η̂_TR-TV = argmax_η { Σ_{j∈J} Σ_{k=1}^{n} log P_Q(V_j^(k); λ/n | η_j) − w_TV ‖η‖_TV }, (6)</formula> where w_TV is the regularization weight and ‖·‖_TV is the 2D TV norm.</p></div>
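The generative model above is straightforward to simulate, which is also how the synthetic training data of Section 5 are produced. The following minimal NumPy sketch (our illustration, not the authors' code) draws Neyman Type A samples and numerically checks the mean, the variance, and the MSE η(η + 1)/λ of the conventional estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def neyman_type_a(lam, eta, size, rng):
    """Draw Neyman Type A samples: M ~ Poisson(lam) incident ions, and the
    sum of M i.i.d. Poisson(eta) SE counts is Poisson(M * eta) given M."""
    m = rng.poisson(lam, size=size)   # hidden number of incident ions
    return rng.poisson(eta * m)       # total detected SEs Q

lam, eta = 20.0, 4.0
q = neyman_type_a(lam, eta, 200_000, rng)
print(q.mean())  # ≈ lam * eta = 80
print(q.var())   # ≈ lam * eta * (1 + eta) = 400, vs. 80 for Poisson of equal mean

eta_hat = q / lam                     # conventional unbiased estimator
print(((eta_hat - eta) ** 2).mean())  # ≈ eta * (eta + 1) / lam = 1.0
```

For λ = 20 and η = 4, the variance is five times that of a Poisson variable with the same mean; this (1 + η) penalty is exactly the cost of source shot noise that TR measurement mitigates.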
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">DEEP LEARNING ARCHITECTURES AND LOSS FUNCTIONS</head><p>To reconstruct the clean image y &#8712; R J from noisy observation x &#8712; R J , we use a feed-forward convolutional neural network called DnCNN <ref type="bibr">[10]</ref> as the backbone architecture, which can be seen in Figure <ref type="figure">2</ref>. We modified the original network architecture for our method. We pass the predicted image &#375; through a 'sigmoid' function to ensure that &#375; &#8712; [0, 1] J . In the remainder of this section, we discuss three different types of loss functions that we combined separately with DnCNN <ref type="bibr">[10]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Direct Loss</head><p>One natural way to apply a deep learning method to an image denoising problem is to estimate the clean image directly from the noisy image. In this approach, loss is measured on the difference between the clean image and the estimated image. We use the MSE metric: <formula>L_direct = (1/J) ‖ŷ − y‖². (7)</formula></p><p>Fig. <ref type="figure">2</ref>: Denoising network architecture proposed by Zhang et al. <ref type="bibr">[10]</ref>. The network consists of 17 layers: the first layer is a convolutional layer followed by a ReLU; the 15 intermediate layers are each a convolutional layer followed by batch normalization and a ReLU; and the last layer consists solely of a convolutional layer. The output of the network is modified to be either an estimated clean image or an estimated residual image.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Residual Loss</head><p>In DnCNN <ref type="bibr">[10]</ref>, the loss function is designed to estimate the residual between the noisy and clean image instead of estimating the image directly. Let the ground-truth residual r be defined as <formula>r = x − y, (8)</formula> and let r̂ = x − ŷ be the estimated residual. The residual loss function is calculated as the MSE between the ground-truth and estimated residuals: <formula>L_residual = (1/J) ‖r̂ − r‖². (9)</formula></p></div>
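A small NumPy sketch (illustrative, not the training code) makes the relationship between the direct and residual losses concrete: because r̂ − r = y − ŷ, the two objectives take the same value, so the difference between CNN-Direct and CNN-Residual lies in which mapping the network is asked to learn, not in the loss value itself.

```python
import numpy as np

def direct_loss(y_hat, y):
    """Direct loss: per-pixel MSE between estimated and clean images."""
    return np.mean((y_hat - y) ** 2)

def residual_loss(y_hat, x, y):
    """Residual loss: MSE between estimated residual x - y_hat and the
    ground-truth residual r = x - y."""
    r, r_hat = x - y, x - y_hat
    return np.mean((r_hat - r) ** 2)

# (r_hat - r) = (x - y_hat) - (x - y) = y - y_hat, so the two losses agree;
# what changes is whether the network outputs the image or the residual.
rng = np.random.default_rng(0)
y = rng.random((8, 8))                       # toy clean image
x = y + 0.1 * rng.standard_normal((8, 8))    # toy noisy observation
y_hat = np.clip(x, 0.0, 1.0)                 # toy "denoised" estimate
print(np.isclose(direct_loss(y_hat, y), residual_loss(y_hat, x, y)))  # True
```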
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Perceptual Loss</head><p>In <ref type="bibr">[18]</ref>, Johnson et al. proposed a perceptual loss network. This loss network is used to extract descriptive features of the images. From different intermediate layers of this loss network, features with different characteristics can be extracted. For each of the chosen layers, we compute the MSE between the feature vectors of the recovered image from DnCNN <ref type="bibr">[10]</ref> and the clean image. Perceptual loss is then defined as <formula>L_perceptual = Σ_{i∈S} w_i ‖F_i(ŷ) − F_i(y)‖², (10)</formula> where S denotes the set of different layers chosen to extract features from the loss network, F_i(·) denotes the feature vector coming from the ith layer of the loss network, and w_i represents its weight. We select the weights empirically. For our application, we choose VGG-16 <ref type="bibr">[15]</ref> as our loss network. The layers for extracting the feature vectors are kept the same as in <ref type="bibr">[18]</ref>.</p></div>
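The structure of the perceptual loss is a weighted sum of feature-space MSEs. In the sketch below, simple average-pooling callables stand in for the VGG-16 intermediate layers; this substitution is purely for illustration (the paper uses the layers chosen in [18]), but the loss wrapper itself matches the weighted-sum form above.

```python
import numpy as np

def pool(img, k):
    """Average-pool a 2D image over non-overlapping k-by-k windows
    (a toy stand-in for a pretrained feature extractor)."""
    h, w = img.shape[0] // k * k, img.shape[1] // k * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def perceptual_loss(y_hat, y, feature_fns, weights):
    """Weighted sum over chosen 'layers' of the MSE between feature maps
    of the restored image y_hat and the clean image y."""
    return sum(w * np.mean((f(y_hat) - f(y)) ** 2)
               for f, w in zip(feature_fns, weights))

# Two toy "layers" at different scales, mimicking increasingly coarse features.
feats = [lambda im: pool(im, 2), lambda im: pool(im, 4)]
rng = np.random.default_rng(1)
y = rng.random((16, 16))
print(perceptual_loss(y, y, feats, [0.5, 0.5]))  # 0.0 for a perfect restoration
```

Swapping `feats` for activations of a pretrained VGG-16 (e.g., via torchvision) recovers the loss used in the paper.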
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">DATA AND EXPERIMENTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Datasets</head><p>Due to the lack of a FIB micrograph denoising dataset, we use regular camera images to train our network. We use the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500) <ref type="bibr">[23]</ref>: 400 images for training and the remaining 100 images for validation. We use the techniques introduced in the next section to distort the images for training, validation, and testing. Distorted images serve as the input, and the clean images serve as the ground truth.</p><p>Our testing is two-fold. The first testing phase is quantitative, using two benchmark datasets: the Berkeley segmentation dataset (BSD68) <ref type="bibr">[24]</ref>, composed of 68 images, and the 12-image dataset used in <ref type="bibr">[10]</ref>. The second testing phase is qualitative, using the microscopic image shown in Figure <ref type="figure">3</ref>. For the chosen microscopic image, we also provide quantitative results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Data Generation and Experimental Setup</head><p>In this work, three sets of experiments vary the training loss, vary the synthetic generation of training data, and provide model-based reconstructions for comparison. In all trained networks, we use the same network parameters (number of layers, number of channels, batch normalization, etc.) as the original DnCNN <ref type="bibr">[10]</ref>. For optimization, we use the Adam optimizer with a learning rate of 5 × 10^−5. We schedule a learning rate decay every 20 epochs by a factor of 0.2. Overall, we use 100 epochs to train each model. Experiments are performed on a PC with an Intel(R) Core(TM) i7-6700K CPU at 4.00 GHz and an Nvidia GeForce GTX 1070. Training a single network takes approximately 16 hours on a GPU.</p><p>The first set of experiments compares the three networks discussed in Section 4, all trained with noisy images generated following the data model described in Section 3. We refer to these methods as CNN-Direct, CNN-Residual, and CNN-Perceptual. Each clean image y ∈ [0, 1] is scaled through <formula>η = 6y + 2 (11)</formula> to give ground-truth SE yield values in the interval [2, 8], matching the range suggested in <ref type="bibr">[1]</ref>. Given the SE yield image η, an image of total detected SEs Q is generated for ion dose λ = 20 following the model described in (1). The final noisy image x is Q/320, with the scaling chosen so that most pixels in most realizations are in the range [0, 1].</p><p>The second set of experiments is designed to demonstrate the impact of lacking an accurate noise model for FIB microscopy. We present results only with the direct loss of Section 4.1; the residual and perceptual losses give worse quantitative performance and no additional insights. 
We maintain the image scaling of (11) and the dose of λ = 20, but we model the data generation in three ways. If the source beam provided λ incident particles deterministically, the number of SEs would follow a Poisson distribution with mean λη. Without the physical justification of a deterministic source beam, one might also use a Poisson model out of naivety or because it is a suitable model for scanning electron microscopy. We refer to the denoising method trained with Poisson data as CNN-Direct-Poisson. Two additional networks are trained using images distorted by AWGN. Let η̄ denote the mean over the pixels of the SE yield image η. Then training with AWGN variance λη̄ is called CNN-Direct-AWGN-Poisson because it matches the average variance of Poisson-distributed data, and training with AWGN variance λη̄(1 + η̄) is called CNN-Direct-AWGN-Neyman because it matches the average variance to (3).</p><p>The final set of experiments simulates performance of the model-based methods. The values of η and λ are unchanged from above. We refer to the estimator in (4) as Conventional. The total dose λ is split over n = 100 sub-acquisitions for TR methods. We refer to the estimator in (5) as TR-Unregularized and to the estimator in (6) as TR-TV. The value of w_TV = 1.4 is optimized to minimize MSE.</p></div>
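The synthetic data generation described above can be sketched end to end. This is our reading of the pipeline, not the authors' released code: the linear map η = 6y + 2 is how we interpret the scaling of [0, 1] images to SE yields in [2, 8], while the dose λ = 20 and the 1/320 rescaling follow the text.

```python
import numpy as np

def fib_noisy_image(y, lam=20.0, rng=None):
    """Synthesize a noisy FIB measurement from a clean image y in [0, 1]:
    map to SE yield in [2, 8], draw Neyman Type A counts at dose lam,
    and rescale by 320 so most pixels land back in [0, 1]."""
    rng = rng or np.random.default_rng()
    eta = 6.0 * y + 2.0                  # ground-truth SE yield image
    m = rng.poisson(lam, size=y.shape)   # hidden per-pixel ion counts
    q = rng.poisson(eta * m)             # detected SE counts Q, Neyman Type A
    return q / 320.0                     # final noisy network input x

rng = np.random.default_rng(0)
y = rng.random((64, 64))                 # toy "clean" training image
x = fib_noisy_image(y, rng=rng)
print(x.min() >= 0.0)                    # True: counts are nonnegative
```

Replacing the two `rng.poisson` draws with a single Poisson of mean `lam * eta`, or with Gaussian noise of matched variance, reproduces the CNN-Direct-Poisson and AWGN baselines.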
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">RESULTS AND DISCUSSION</head><p>In this section, we present qualitative and quantitative results for all the experiments discussed in Section 5.2. Table <ref type="table">1</ref> shows quantitative results for methods tested on noisy images corrupted following the model in (1), using the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) metrics. The best-performing algorithm with respect to both evaluation metrics is CNN-Direct. We also observe that the performance of CNN-Residual is very close to that of CNN-Direct. Recalling that the Neyman Type A distribution gives signal-dependent variance, this finding suggests that learning based on residual loss is less effective when the residual is not signal-independent. Moreover, both CNN-Direct and CNN-Residual outperform CNN-Perceptual in the stated metrics. However, these metrics do not always align with human perception. For instance, based on the qualitative results in Figure <ref type="figure">3</ref>, it is arguable that CNN-Direct oversmooths somewhat. Depending on the application, loss of details such as textures and edges may not be acceptable. For assessing how well the texture is preserved, we can focus on the red-framed section of the images and any detailed part of this mesh structure. Perceptual loss may be the best method in terms of preserving the texture and edges.</p><p>Among the three methods representing less accurate modeling of the FIB microscopy data, CNN-Direct-AWGN-Poisson achieves the worst performance. This modeling accounts for neither the particle nature of the data nor the correct signal-dependent variance model (3). Slightly better performance, though still more than 10 dB worse than CNN-Direct, is obtained with CNN-Direct-Poisson. Both CNN-Direct-Poisson and CNN-Direct-AWGN-Poisson underestimate the variance of the measurements by a large margin. 
By matching the average variance to (3), CNN-Direct-AWGN-Neyman achieves performance within about 2 dB of the networks trained under the Neyman Type A model, which demonstrates the significance of accurate noise knowledge.</p><p>The improvement of TR-Unregularized over Conventional reconstruction is about 6 dB, which is roughly consistent with previous theoretical and empirical results <ref type="bibr">[8,</ref><ref type="bibr">9]</ref>. Including TV regularization significantly improves the quantitative and visual performance; the PSNR increases by more than 7 dB and the SSIM is doubled. The TR-TV method achieves performance comparable with the best deep learning method, with a PSNR difference of less than 0.7 dB. Although TV regularization is not generally competitive with deep learning-based methods, this is not a comparison based on the same input data: time-resolved measurements are fundamentally more informative.</p></div>
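As a reference for the reported gaps, PSNR for unit-peak images can be computed as below (the standard definition, not code from the paper). Each 10 dB of PSNR corresponds to a factor of 10 in MSE, so the more-than-10 dB deficit of the Poisson-matched baselines reflects more than a tenfold MSE penalty.

```python
import numpy as np

def psnr(y_hat, y, peak=1.0):
    """Peak signal-to-noise ratio in dB; assumes y_hat differs from y
    somewhere, so that the MSE is strictly positive."""
    mse = np.mean((y_hat - y) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 0.1 vs. 0.01 shows the 10x-MSE-per-10-dB relationship.
y = np.zeros((4, 4))
print(psnr(y + 0.10, y))  # ≈ 20 dB (MSE = 1e-2)
print(psnr(y + 0.01, y))  # ≈ 40 dB (MSE = 1e-4)
```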
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">CONCLUSION</head><p>In this paper, we demonstrated how to apply deep learning-based image denoising algorithms to FIB micrograph denoising despite a lack of training data from that imaging modality. We show that deep learning algorithms can achieve performance similar to a TV-regularized model-based method that requires unconventional time-resolved data. The original versions of the deep learning methods that we adapted were mainly tested for AWGN. In our work, we used the compound Poisson noise model that is physically accurate for FIB microscopy. We studied how performance degrades under less accurate modeling of the noisy observations. Results suggest that CNN-Direct and CNN-Residual perform the best in terms of PSNR and SSIM. Furthermore, we show that perceptual loss can be used to preserve structure.</p><p>We have provided results only for the ion dose level of λ = 20, which combines with the SE yield to determine the noise level. Among our areas of future work is demonstrating the robustness of the proposed methods over a range of noise levels. In addition, data following a Neyman Type A distribution displays spatially variant noise related to the ground truth value, as shown in (3). The DnCNN network employed here lacks the flexibility to deal with spatially variant noise; we aim to design a network that can handle such noise. Finally, our results reinforce the merit of having time-resolved data. We want to extend our work to incorporate time-resolved measurements into learned-denoiser technology.</p></div></body>
		</text>
</TEI>
