<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Recovery of continuous 3D refractive index maps from discrete intensity-only measurements using neural fields</title></titleStmt>
			<publicationStmt>
				<publisher>Springer Nature</publisher>
				<date>09/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10357702</idno>
					<idno type="doi">10.1038/s42256-022-00530-3</idno>
					<title level='j'>Nature Machine Intelligence</title>
					<idno type="ISSN">2522-5839</idno>
<biblScope unit="volume">4</biblScope>
<biblScope unit="issue">9</biblScope>					

					<author>Renhao Liu</author><author>Yu Sun</author><author>Jiabei Zhu</author><author>Lei Tian</author><author>Ulugbek S. Kamilov</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Intensity diffraction tomography (IDT) refers to a class of optical microscopy techniques for imaging the three-dimensional refractive index (RI) distribution of a sample from a set of two-dimensional intensity-only measurements. The reconstruction of artefact-free RI maps is a fundamental challenge in IDT due to the loss of phase information and the missing-cone problem. Neural fields have recently emerged as a new deep learning approach for learning continuous representations of physical fields. The technique uses a coordinate-based neural network to represent the field by mapping the spatial coordinates to the corresponding physical quantities, in our case the complex-valued refractive index values. We present Deep Continuous Artefact-free RI Field (DeCAF) as a neural-fields-based IDT method that can learn a high-quality continuous representation of an RI volume from its intensity-only and limited-angle measurements. The representation in DeCAF is learned directly from the measurements of the test sample by using the IDT forward model without any ground-truth RI maps. We qualitatively and quantitatively evaluate DeCAF on simulated and experimental biological samples. Our results show that DeCAF can generate high-contrast and artefact-free RI maps and lead to an up to 2.1-fold reduction in the mean squared error over existing methods.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The refractive index (RI) measures the optical density, which determines the interaction between light and matter within a sample. The real part of the RI characterizes the phase, whereas its imaginary part characterizes absorption. The RI can thus serve as an endogenous source of optical contrast for imaging samples without staining or labelling. By quantitatively characterizing the three-dimensional (3D) distribution of the RI, one can visualize cellular or subcellular structures useful for morphogenesis <ref type="bibr">1</ref> , oncology <ref type="bibr">2</ref> , cellular pathophysiology <ref type="bibr">3</ref> , biochemistry <ref type="bibr">4</ref> and beyond (see the review papers in refs. <ref type="bibr">5,</ref><ref type="bibr">6</ref> ).</p><p>Intensity diffraction tomography (IDT) is a recent technique for recovering the 3D RI maps of a sample by measuring the light it scatters. In the standard IDT set-up, a sample is illuminated multiple times from different angles and a set of two-dimensional (2D) intensity projections is captured by the camera (see Fig. <ref type="figure">1c</ref>). A tomographic image reconstruction algorithm is then used to computationally reconstruct the desired 3D RI distribution from the set of 2D measurements. Unlike traditional optical diffraction tomography, which uses interferometry to record the complex-valued light fields <ref type="bibr">[7]</ref><ref type="bibr">[8]</ref><ref type="bibr">[9]</ref> , IDT only measures the squared amplitude of the scattered light, leading to an easy set-up on standard transmission optical microscopes with inexpensive hardware modifications. 
Such flexibility has spurred different IDT variants integrating object scanning <ref type="bibr">10,</ref><ref type="bibr">11</ref> , angled illumination <ref type="bibr">[12]</ref><ref type="bibr">[13]</ref><ref type="bibr">[14]</ref><ref type="bibr">[15]</ref> , pupil engineering <ref type="bibr">16,</ref><ref type="bibr">17</ref> and multiple scattering <ref type="bibr">18,</ref><ref type="bibr">19</ref> . Set-ups achieving high resolution <ref type="bibr">18</ref> and fast acquisition <ref type="bibr">20</ref> have also been reported.</p><p>Despite the rich literature on IDT, image reconstruction remains a fundamental challenge. The first issue is that the phase of the scattered light field is missing from the measurements, resulting in a nonlinear measurement system that is not characterizable by classical linear Fourier diffraction theory <ref type="bibr">21</ref> . This rules out the use of standard filtered-backprojection methods and calls for advanced computational algorithms. The second issue is the well-known missing cone problem, which causes elongation of the reconstructed object along the optical axis (z-dimension) and hence reduction of the axial resolution. The missing cone problem is a result of a limited-angle tomographic set-up, in which illuminations can come only from one side of the sample plane with a limited range for angle variation (less than ~40&#176; in our set-ups). This leads to incomplete coverage of the 3D Fourier spectra with a missing, cone-shaped region in the axial direction. These missing phase and cone problems make image reconstruction in IDT a severely ill-posed inverse problem.</p><p>Regularization methods are commonly used for mitigating the ill-posed nature of many inverse problems. 
These methods are based on minimizing a cost function that consists of a data-fidelity term and a regularization term, where the former uses a physical model to quantify the mismatch between the predicted and acquired measurements, whereas the latter promotes solutions that are consistent with a priori knowledge of the sample. For example, the least-squares loss and Tikhonov regularizer (&#8467; 2 -penalty) are widely used for obtaining closed-form solutions to inverse problems <ref type="bibr">14</ref> . The work on plug-and-play priors has generalized the notion of image priors to implicit regularizers characterized by image denoisers <ref type="bibr">[22]</ref><ref type="bibr">[23]</ref><ref type="bibr">[24]</ref><ref type="bibr">[25]</ref> . Deep learning has recently emerged as a powerful framework for image reconstruction. A traditional deep learning reconstruction is based on training a convolutional neural network (CNN) on a large dataset to learn a mapping from low-quality images to their high-quality counterparts. The state-of-the-art performance of such methods has been demonstrated in X-ray computed tomography <ref type="bibr">26,</ref><ref type="bibr">27</ref> , magnetic resonance imaging <ref type="bibr">28,</ref><ref type="bibr">29</ref> , optical tomography <ref type="bibr">30,</ref><ref type="bibr">31</ref> and seismic imaging 32 (see refs. <ref type="bibr">[33]</ref><ref type="bibr">[34]</ref><ref type="bibr">[35]</ref> ). Although deep learning has considerably improved image reconstruction in many modalities, traditional deep learning methods are impractical for image reconstruction in IDT, where it is difficult to acquire high-quality ground-truth RI maps in experiments. 
Although a physics-based simulator has been proposed to generate datasets for training IDT artefact-suppressing CNNs, the results are still limited by the mismatch between the simulation and experiments <ref type="bibr">36</ref> .</p><p>Neural fields (NF) is a recent deep learning framework that has gained popularity in computer vision and graphics for representing and rendering 3D scenes using coordinate-based deep neural networks <ref type="bibr">37,</ref><ref type="bibr">38</ref> . It is worth mentioning that although NF was deemed to be the most appropriate term <ref type="bibr">39,</ref><ref type="bibr">40</ref> , this idea currently goes by various names in the vision/graphics literature, including neural coordinate-based representations or neural implicit models. It has been shown that NF can learn a high-quality representation of a complex scene from a sparse set of data without any external training dataset. Motivated by this property, we propose Deep Continuous Artefact-free RI field (DeCAF) as a novel NF-based IDT method for learning a high-quality, continuous 3D RI map from intensity-only and limited-angle measurements without any external training dataset of ground-truth RI maps. Figure <ref type="figure">1</ref> provides a conceptual illustration of DeCAF, the key features of which are as follows. The central component of DeCAF is a multilayer perceptron (MLP), a fully connected (non-convolutional) deep network, for learning a function that maps 3D coordinates (x, y, z) to the corresponding complex-valued RI values. The trained MLP thus provides a continuous neural representation of the RI map. The RI value at any spatial location can be retrieved by querying the trained MLP with the corresponding coordinate. By decoupling the representation from an explicit voxel grid, DeCAF can efficiently store large 3D volumes. DeCAF is a self-supervised method, meaning that it does not need to be trained on an external dataset of ground-truth RI maps. 
This is possible as the same MLP is used at every 3D location, enabling it to learn natural redundancies and correlations within an RI volume. The MLP is trained directly at test time by using only the IDT measurements of the sample that we seek to reconstruct. The IDT forward model is used to map the output of the MLP to the intensity measurements, and gradient back-propagation is used to update the MLP weights. DeCAF enables easy integration of additional prior knowledge on the unknown sample using an explicit regularization term in the loss function. In this paper we explored the potential of such synergistic integration by including an anisotropic 3D regularizer that separately imposes penalties in the x-y plane and z direction. Specifically, the x-y penalty uses a deep denoising CNN pre-trained on natural images to remove additive white Gaussian noise <ref type="bibr">41,</ref><ref type="bibr">42</ref> , whereas the z penalty is based on one-dimensional total variation. Although our denoising CNN was not trained explicitly on RI images, we show through ablation studies that it improves the performance by mitigating noise and imaging artefacts.</p><p>The pipeline of the proposed method is visually illustrated in Fig. <ref type="figure">1a</ref>. In the training phase, the input of DeCAF is a set of spatial coordinates {(x, y, z)} taken from a pre-defined grid. DeCAF first maps the input coordinates to encodings using a non-trainable radial expansion, followed by a standard fully connected neural network to map the encodings to the RI values at the input coordinates. We introduced a novel type of encoding called radial encoding, which facilitates high-quality, artefact-free reconstruction of RI maps (see the Methods for details). 
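The coordinate-to-RI mapping just described can be sketched in a few lines. The sketch below uses a generic random Fourier-feature encoding as a stand-in for the paper's radial encoding, and randomly initialized weights; the encoding, layer sizes and initialization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(coords, B):
    """Map (N, 3) coordinates to Fourier features [sin(cB), cos(cB)].
    A stand-in for the radial encoding described in the Methods."""
    proj = coords @ B                      # (N, num_freqs)
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

def mlp_query(coords, B, weights):
    """Query the coordinate network: each 3D point yields one
    complex-valued RI sample (real part: phase; imag part: absorption)."""
    h = encode(coords, B)
    for W, b in weights[:-1]:
        h = np.maximum(h @ W + b, 0.0)     # ReLU hidden layers
    W, b = weights[-1]
    out = h @ W + b                        # (N, 2): real and imaginary RI
    return out[:, 0] + 1j * out[:, 1]

# Illustrative sizes: 3D coordinates, 16 random frequencies, two hidden layers.
B = rng.standard_normal((3, 16))
dims = [32, 64, 64, 2]
weights = [(0.1 * rng.standard_normal((dims[i], dims[i + 1])),
            np.zeros(dims[i + 1])) for i in range(len(dims) - 1)]

# The RI at any (x, y, z) is obtained simply by querying the network.
coords = rng.uniform(-1, 1, size=(5, 3))
ri = mlp_query(coords, B, weights)
print(ri.shape, ri.dtype)
```

Because the representation lives in the network weights rather than on a voxel grid, the same trained weights can be queried at any coordinate density.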
DeCAF is trained to solve the following optimization with an objective consisting of a measurement loss L and a regularizer R:</p><p>&#981;* = argmin &#981; { L(F(M &#981; (c)), y) + R(M &#981; (c)) },</p><p>where x = M &#981; (c) is the predicted RI map, y represents the intensity measurements of the test sample, F is the IDT forward model and M &#981; is the MLP (which includes the radial encoding) parameterized by weights &#981;. Note that the test measurements are the only input required in DeCAF. After the optimal &#981;* is learned, one can render the test sample on a voxel grid with arbitrary density by simply querying M &#981;* using the corresponding coordinates, as illustrated in Fig. <ref type="figure">1e</ref>. Past applications of NF include novel view synthesis <ref type="bibr">[43]</ref><ref type="bibr">[44]</ref><ref type="bibr">[45]</ref> , dynamic scene representation <ref type="bibr">[46]</ref><ref type="bibr">[47]</ref><ref type="bibr">[48]</ref> , object lighting <ref type="bibr">49,</ref><ref type="bibr">50</ref> and computed tomography <ref type="bibr">51</ref> . Our work makes several contributions to the existing NF literature: (1) DeCAF considers learning NF by accounting for the diffraction and scattering effects due to the wave nature of the light, whereas the existing work in the area has focused on ray-tracing models in graphics; (2) DeCAF extends the use of NF to the recovery of the phase information from intensity-only data; (3) DeCAF combines an implicit MLP regularization with an additional explicit image regularizer (for example, based on a deep denoiser) to achieve the best of both worlds, that is, to improve on the separate usage of an implicit and explicit regularization; (4) DeCAF introduces radial encoding as a novel type of encoding layer for improving the ability of NF to represent complex samples. Details on the network architecture and the learning procedure of DeCAF are provided in the Methods and the Supplementary Information. 
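The shape of this training objective can be sketched as follows. The forward model here is a toy placeholder (a sum along z rather than a diffraction model), and the x-y penalty, a pretrained denoising CNN in the paper, is omitted; only the z-direction one-dimensional total-variation term is implemented. Treat this as a shape-level illustration, not the IDT model.

```python
import numpy as np

def tv_z(x):
    """One-dimensional total variation along the axial (z) dimension,
    as used for the z-penalty of the anisotropic regularizer."""
    return np.abs(np.diff(x, axis=2)).sum()

def decaf_loss(x, y, forward, tau_z):
    """Sketch of the objective: data fidelity through the forward model
    plus explicit regularization (x-y denoiser penalty omitted here)."""
    residual = forward(x) - y
    return np.sum(np.abs(residual) ** 2) + tau_z * tv_z(x)

# Toy example: x is an (H, W, Z) RI volume and the placeholder forward
# model sums over z; a real IDT model would compute diffraction integrals.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
y = rng.standard_normal((8, 8))
print(decaf_loss(x, y, lambda v: v.sum(axis=2), tau_z=0.1))
```

In DeCAF itself, x is produced by the MLP, so the gradient of this loss is back-propagated through the forward model into the MLP weights at test time.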
In the next section, we present both qualitative and quantitative results that demonstrate DeCAF's ability to reconstruct high-quality RI maps.</p><p>Results. Experimental validation. We validated DeCAF's ability to recover high-quality RI maps with accurate biological features and minimal artefacts on experimentally collected IDT data. We used DeCAF on four biological samples, including Spirogyra and diatom algae, human buccal epithelial cells and Caenorhabditis elegans (C. elegans). We adopted the existing light-propagation models to formulate the inverse problems associated with the dense <ref type="bibr">14</ref> , annular <ref type="bibr">20</ref> and multiplexed <ref type="bibr">52</ref> illumination patterns. As absorption provides a lower contrast for the considered samples, we focus on comparing the phase images. In the subsequent sections we use x, y and z to denote length, width and depth, respectively. We first show the effectiveness of DeCAF for dense IDT (dIDT) on stained Spirogyra (Fisher Scientific S68786, embedded in water, RI &#8776; 1.33): unicellular algae containing a helical arrangement of chloroplasts oriented in the 3D space. In total we collected 89 brightfield intensity measurements using a 0.25 NA objective lens; two example measurements are presented in Fig. <ref type="figure">1c(i,</ref><ref type="figure">ii</ref>). In the experiment we compared DeCAF with two existing IDT reconstruction baselines: Tikhonov regularization <ref type="bibr">14</ref> and SIMBA <ref type="bibr">53</ref> , as both methods have been extensively validated under similar imaging settings. SIMBA is a recently proposed model-based algorithm that leverages a deep learning denoiser as an image prior. Each method's final reconstructed RI volume consists of 40 axial slices of 1,024 &#215; 1,024 pixels, equally spaced between -30 &#956;m and 50 &#956;m, forming a volume of 665.6 &#215; 665.6 &#215; 80 &#956;m 3 . 
We define z = 0 &#956;m as the focal plane throughout the paper.</p><p>Figure <ref type="figure">2</ref> visualizes the experimental results. To demonstrate the overall structure of the sample, a rendered 2D image that accumulates all of the z layers of the DeCAF reconstruction is presented in Fig. <ref type="figure">2a</ref>. As shown, DeCAF successfully reconstructed the spiral structure of the Spirogyra. Figure <ref type="figure">2b</ref> compares the 2D axial slices obtained by DeCAF, SIMBA and Tikhonov at the depths z &#8712; {4, 16, 28, 40} &#956;m. The results show that DeCAF provides superior axial sectioning ability to the other two methods (that is, a pattern emerges only in the slices to which it belongs and fades rapidly as we move axially to different depths). This is demonstrated by the clarity and sharpness of the spirals (in the dashed cyan boxes) that appear at a specific depth, which show that DeCAF removes the artefacts (in the red dashed boxes) generated by the diffraction from the adjacent slices. These artefacts remain in the reconstructions by SIMBA and Tikhonov. We further evaluate the axial resolution of each reconstruction by comparing the lateral views that correspond to the cutlines shown in Fig. <ref type="figure">2c,</ref><ref type="figure">d</ref>. DeCAF substantially reduces the elongation artefacts caused by the missing cone problem. Line profiles presented in Fig. <ref type="figure">2e</ref> quantitatively characterize the reduction of z-elongation by DeCAF.</p><p>We next applied DeCAF to annular IDT (aIDT) to explore its capability for efficient data processing. We imaged two distinct classes of biological samples, including diatom microalgae (S68786, Fisher Scientific) and unstained human epithelial buccal cells. The former is a unicellular alga with a regular arrangement of punctae, whereas the latter is a complex cell environment consisting of intracellular bacteria. 
We acquired 24 intensity images using a 0.65 NA objective lens under oblique illuminations for each sample. The diatom and cell cluster samples are fixed in glycerin gelatin (RI &#8776; 1.47) and water, respectively. We used Tikhonov as the baseline method for comparison. Figure <ref type="figure">3</ref> presents the results for diatom algae. Two example measurements are provided in Fig. <ref type="figure">1c</ref>(iii,iv). Both DeCAF and Tikhonov were configured to reconstruct 52 slices of 700 &#215; 700 pixels equally spaced between -10 &#956;m and 16 &#956;m, forming a volume of 113.75 &#215; 113.75 &#215; 26 &#956;m 3 . The 3D illustration of the volume reconstructed by DeCAF is presented in Fig. <ref type="figure">3a</ref>, demonstrating the overall reconstruction quality. Figure <ref type="figure">3e</ref> presents the slices reconstructed by each method at depths z &#8712; {-1, 1, 3} &#956;m. DeCAF demonstrates better sectioning capability than Tikhonov regularization. Superior removal of the missing cone artefacts is also shown in the lateral views in Fig. <ref type="figure">3c,</ref><ref type="figure">d</ref>. As DeCAF learns a continuous representation of the RI distribution, it can generate images on arbitrarily dense voxel grids without additional retraining. Figure <ref type="figure">3f</ref> demonstrates this unique ability of DeCAF by interpolating 26.7-times more pixels than the original reconstruction in the x-y planar region shown in Fig. <ref type="figure">3b</ref>. For comparison, we used nearest neighbour (Pixel), bilinear (Bilinear) and bicubic (Bicubic) interpolation methods to upsample the same region. Our results show that DeCAF is able to resolve small features with strong cross-scale consistency while avoiding interpolation artefacts highlighted by the arrows.</p><p>We next present the results of epithelial buccal cell clusters in Fig. <ref type="figure">4</ref>. 
A background-removed intensity measurement showing the distribution of the whole cell cluster is presented in Fig. <ref type="figure">4a</ref>. In Fig. <ref type="figure">4b,</ref><ref type="figure">c</ref> we focus on two complex regions where cells overlap with each other to highlight the superior axial sectioning capability of DeCAF. The sizes of the two volumes are 81.25 &#215; 81.25 &#215; 16 &#956;m 3 and 97.5 &#215; 97.5 &#215; 16 &#956;m 3 , discretized to 32 slices of 500 &#215; 500 and 600 &#215; 600 pixels, respectively. DeCAF successfully resolves different cells with clear separation, whereas Tikhonov generates strong artefacts that blur the boundaries. A visual demonstration of the axial slices of these cells is provided in Fig. <ref type="figure">4d-f</ref>. In each reconstructed slice, DeCAF recovers clear cell membranes, cytoplasm, micronuclei and bacteria while removing the diffraction and scattering artefacts.</p><p>We further show the continuous representation learned by DeCAF by upsampling it along z, meaning that DeCAF is used to interpolate an entire axial slice that was not part of the grid used during training. Figure <ref type="figure">4g</ref>,h presents the interpolated slices of the bacteria clusters highlighted in Fig. <ref type="figure">4d,</ref><ref type="figure">e</ref>. A z-axis is provided in each figure to show the axial location of each slice. Note that {-5.5, -4.5, -3.5} &#956;m and {1.5, 2.5, 3.5} &#956;m are the axial coordinates pre-defined in the training grid. The interpolated slices in Fig. <ref type="figure">4g</ref>,h clearly show the appearance and disappearance of the bacteria clusters at different values of z. As shown, the interpolated biological features are consistent with those lying in the pre-defined grid, making the whole transition smooth across axial layers. This strong axial consistency preserved in DeCAF enables it to produce high-fidelity interpolations without any additional retraining. 
We finally validate DeCAF on multiplexed IDT microscopy (mIDT). This modality allows more rapid acquisition by simultaneously illuminating the sample from multiple angles for each intensity measurement. We imaged a C. elegans worm specimen by using a 0.65 NA objective lens to acquire 16 measurements, each from the simultaneous illumination of six different light-emitting diode (LED) sources. Figure <ref type="figure">1c(v,</ref><ref type="figure">vi)</ref> shows two example measurements. The sample is challenging due to its thickness and complicated arrangement of organs. As the worm is alive and moving during the acquisition, we reconstructed two volumes of 162.5 &#215; 162.5 &#215; 20 &#956;m 3 , discretized to 40 slices of 1,000 &#215; 1,000 pixels, at different times to cover the worm bodies with the biological features of interest. Extended Data Fig. <ref type="figure">1</ref> presents the reconstructed volumes with clear quantification of the internal biological tissues. For example, the buccal cavity, anterior and terminal pharyngeal bulbs, isthmus and intestine are clearly restored in our reconstruction. Smaller features are also distinguishable with high contrast, as shown in the regions expanded in Fig. <ref type="figure">5f-h</ref>. For example, lysosomes, a grinder and the lumen of intestine are accurately visualized with clear separation from the other tissues. Figure <ref type="figure">5c</ref>-e shows the y-z lateral views, where the oval shape of C. elegans is reconstructed without the missing cone artefacts, and fine features such as buccal cavity and grinder are preserved and recovered at different axial layers. Extended Data Fig. <ref type="figure">2</ref> highlights the space used for storing the MLP weights in DeCAF. As the representation is decoupled from a predefined voxel grid, DeCAF can be trained on a sparse grid to reduce the storage cost, but can still produce the final reconstruction on a grid of desired density. 
The storage reduction is demonstrated by comparing the memory requirements of DeCAF and Tikhonov for the reconstruction of the C. elegans worm. DeCAF retains the small memory size of 3 MB across different grid densities, while that of Tikhonov increases as the grid becomes denser.</p><p>Quantitative evaluation. In this section we present quantitative evaluations of DeCAF using a high-fidelity cell phantom. We used CytoPacq <ref type="bibr">54</ref> to generate a granulocyte phantom containing tens of granulocyte cells randomly distributed in a volume of 60 &#215; 60 &#215; 12 &#956;m 3 , discretized to 40 slices of 454 &#215; 454 pixels. The real part of the RI in the cell is set to a fixed maximum value and the imaginary part of the sample's RI is set to zero, meaning that there is no absorption. The immersion medium is assumed to be air (n 0 = 1). The simulation is based on the aIDT set-up and used the split-step non-paraxial simulator to simulate the full wave propagation <ref type="bibr">55,</ref><ref type="bibr">56</ref> . The set-up includes an annular LED array at 515 nm wavelength for illumination and an objective lens with 0.65 NA. In total, 24 measurements are taken during the acquisition.</p><p>The results of the quantitative evaluation are summarized in Fig. <ref type="figure">6</ref>. Figure <ref type="figure">6a</ref> shows the overall 3D structure of the phantom. Section views of axial and lateral planes are compared for DeCAF, SIMBA and Tikhonov, with quantitative evaluation of the peak signal-to-noise ratio (PSNR) values, defined as</p><p>PSNR = 20 log10 ( Max(x) / &#8730;MSE(x&#770;, x) ),</p><p>where the mean squared error is computed by MSE(&#8226;, &#8226;) and the maximum pixel value in the image is returned by Max(&#8226;). DeCAF outperforms both baselines by mitigating the missing cone artefacts and removing the cell shadows due to axial elongations (highlighted using dashed circles). Extended Data Fig. <ref type="figure">3</ref> visualizes the 3D volumes reconstructed by each method using Fiji 57 under the default configuration. 
From left to right, the figure displays the 3D volumes corresponding to the ground-truth, DeCAF, SIMBA and Tikhonov. Peak signal-to-noise ratio values are labelled on each volume in green. DeCAF clearly outperforms SIMBA and Tikhonov by reconstructing cells that look most similar to the ground-truth. For example, consider the cells highlighted in the zoom-in volume. DeCAF reconstructs these cells with clear shapes and sharp edges, while the reconstructions of SIMBA and Tikhonov are either axially elongated or blurry. Quantitative results further highlight the accuracy of DeCAF, showing PSNR improvements of 1.6 dB and 3.3 dB with respect to SIMBA and Tikhonov, respectively (equivalent to a 1.5- and 2.1-fold reduction in MSE).</p><p>Discussion. Difference to SIMBA. DeCAF offers several benefits over the existing SIMBA method. First, test-time learning: SIMBA does not adapt to the specifics of a test sample; it uses a fixed forward model and a fixed pre-trained prior. On the other hand, DeCAF is a test-time learning method in which the MLP weights are adjusted for each test sample, leading to the better reconstruction performance reported throughout this paper. Second, grid-free representation: SIMBA reconstructs a discrete volume on a pre-defined voxel grid. DeCAF decouples the representation of the reconstructed 3D RI from the grid by using an MLP. This enables one to synthesize any part of the 3D RI volume 'on demand' on any grid by simply querying the relevant coordinates of the MLP. Thus, the complexity of storing the sample reconstructed by DeCAF is decoupled from the voxel grid. Third, internal and external regularization: unlike SIMBA, DeCAF synergistically uses internal and external regularization offered by the MLP and a CNN denoiser, respectively. Our quantitative results show that the MLP offers a substantial amount of regularization, even when no external regularizer is used; however, the best results are achieved when both regularizers are used.</p><p>Limitations of DeCAF. 
An obvious limitation of DeCAF is that it relies on linear IDT forward models based on the first Born approximation. This limits the applicability of the current implementation to relatively thin and weakly scattering samples. This limitation can be observed in the reconstruction of a relatively thick C. elegans sample. Future work will explore the extension of DeCAF to thicker and more strongly scattering samples by using forward models accounting for multiple scattering, such as those based on variations of the beam propagation method <ref type="bibr">12,</ref><ref type="bibr">58</ref> . Another limitation of DeCAF is that it is currently slower than the existing IDT reconstruction methods, Tikhonov and SIMBA, which is due to our implementation of the NF training. Our model takes less than a day (~20 h) to infer each real sample, while the runtimes of Tikhonov and SIMBA are at the levels of several minutes and hours, respectively. Furthermore, DeCAF's hyperparameters need to be tuned manually on real samples due to the lack of ground-truth, which potentially leads to further increases in runtime. Future work will explore faster DeCAF implementations that leverage recent progress in accelerating NF methods (for example, Instant Neural Graphics Primitives 59 suggests an order of magnitude acceleration).</p></div>
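The PSNR metric and the dB-to-MSE conversions quoted in the Results can be checked numerically; a minimal sketch (PSNR = 20 log10(Max(x)/sqrt(MSE)), with illustrative arrays):

```python
import numpy as np

def psnr(x_hat, x_true):
    """PSNR = 20 log10( Max(x_true) / sqrt(MSE(x_hat, x_true)) )."""
    mse = np.mean((x_hat - x_true) ** 2)
    return 20 * np.log10(x_true.max() / np.sqrt(mse))

# A k-fold reduction in MSE corresponds to a 10 log10(k) dB PSNR gain,
# consistent with the reported 3.3 dB ~ 2.1x and 1.6 dB ~ 1.5x figures.
print(10 * np.log10(2.1))  # ~3.2 dB
print(10 * np.log10(1.5))  # ~1.8 dB
```

A closer estimate yields a larger PSNR, so the dB improvements reported for DeCAF translate directly into the fold reductions in MSE cited in the text.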
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion.</head><p>We proposed a novel self-supervised deep learning method, DeCAF, for enabling high-quality 3D reconstruction of the RI distribution from intensity-only measurements. We extensively validated DeCAF on experimentally collected datasets of multiple biological samples under three different IDT set-ups. The results show that DeCAF can mitigate the missing cone artefacts while maintaining the fine details of small biological features. We also provide quantitative evidence to further corroborate our argument. Results show that DeCAF can reduce the MSE by up to 2.1-fold. The continuous representation in DeCAF also makes it possible to generate images on voxel grids of arbitrary density without retraining of the deep network, which is useful for addressing computational and memory bottlenecks in image reconstruction and analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methods</head><p>IDT experiments. IDT resolution. The lateral and axial resolutions of the IDT system are limited by the support of the optical transfer function, which is determined by the objective NA and illumination NA <ref type="bibr">14</ref> . For both the aIDT and mIDT set-ups, our maximum illumination angle is close to the objective NA; thus, the recovered lateral spatial frequency can reach the incoherent diffraction limit 4NA/&#955;, and the axial Fourier coverage is up to 2(n 0 - &#8730;(n 0 &#178; - NA&#178;))/&#955;, where n 0 is the RI of the background media.</p><p>Dense IDT. Our dense IDT system consists of: a Nikon TE 2000-U microscope equipped with a custom programmable LED array (each LED approximately provides a plane wave with a central wavelength of &#955; = 632 nm); a &#215;10/0.25 NA objective (Nikon, CFI Plan Achromat); and an sCMOS camera (PCO.Edge 5.5). The LED array is placed about 79 mm away from the sample. It is controlled via a microcontroller and is synchronized with the camera. A small subset of the LEDs on the array, containing the 89 LEDs within the brightfield region, is used to illuminate the sample sequentially.</p><p>Annular IDT. Our annular IDT system consists of a Nikon ECLIPSE E200 microscope equipped with a programmable ring LED unit (Adafruit, 1586 NeoPixel Ring). The microscope objective is &#215;40/0.65 NA (Nikon, CFI Plan Achromat), and each LED approximately provides a plane wave with a central wavelength of &#955; = 515 nm. The ring LED unit has 24 LED lights and is 60 mm in diameter. It is centred at the optical axis and placed approximately 35 mm away from the sample, which sets the angle between the wave vector and the optical axis to about 40&#176;, matching the microscope objective NA.</p><p>Multiplexed IDT. Our multiplexed IDT system has the same hardware specification as the dense IDT system except that the microscope objective is &#215;40/0.65 NA (Nikon, CFI Plan Achromat). 
In addition, the subset of the LEDs used in the experiment changes to 96 LEDs, corresponding to an illumination NA range from 0.3 to 0.575. This design contains 16 disjoint illumination patterns, each multiplexing six LEDs. The camera is synchronized with the LED array and captures 16 measurements, one per illumination pattern.</p><p>Sample and data preparation. Spirogyra algae. This sample is a part of Fisher Science Education algae basic slide set S68786. We captured 89 intensity-only brightfield measurements. We pre-processed each measurement by removing the background intensity followed by normalization. The same pre-processing procedure is also applied to other samples. We consider a reconstruction volume of 665.6 &#215; 665.6 &#215; 80 &#956;m 3 , positioned between -30 &#956;m and 50 &#956;m around the focal plane. The volume is discretized into 40 slices along the z-axis, with each slice having 1,024 &#215; 1,024 pixels. Here, a single voxel corresponds to 0.65 &#215; 0.65 &#215; 2 &#956;m 3 .</p></div>
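The resolution limits quoted above can be evaluated numerically. The lateral limit 4NA/&#955; is stated in the text; the axial expression below, 2(n0 - sqrt(n0**2 - NA**2))/&#955;, is our reconstruction of a garbled formula in the source and should be treated as an assumption.

```python
import numpy as np

def lateral_cutoff(na, wavelength_um):
    """Incoherent lateral frequency limit 4 NA / lambda (cycles per um)."""
    return 4.0 * na / wavelength_um

def axial_coverage(na, n0, wavelength_um):
    """Assumed axial Fourier coverage 2 (n0 - sqrt(n0**2 - na**2)) / lambda
    for matched illumination and objective NA (reconstructed expression)."""
    return 2.0 * (n0 - np.sqrt(n0**2 - na**2)) / wavelength_um

# aIDT set-up from the text: 0.65 NA objective, 515 nm illumination,
# diatom sample fixed in glycerin gelatin (background RI n0 of about 1.47).
print(lateral_cutoff(0.65, 0.515))        # cycles per micrometre
print(axial_coverage(0.65, 1.47, 0.515))  # much smaller: the missing cone
```

The axial coverage is roughly an order of magnitude smaller than the lateral cutoff, which is the quantitative face of the missing cone problem discussed throughout the paper.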
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Diatom algae (aIDT)</head><p>This sample is a part of Fisher Science Education algae basic slide set S68786. We captured 24 measurements and consider a reconstruction volume of 113.75 × 113.75 × 26 μm³, positioned between -10 μm and 16 μm around the focal plane. The volume is discretized into 52 slices along the z-axis, with each slice having 700 × 700 pixels. Here, a single voxel corresponds to 0.1625 × 0.1625 × 0.5 μm³.</p></div>
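The voxel sizes quoted for each sample follow directly from dividing the field of view by the number of grid points along each axis. The one-liner below (our illustration, not code from the paper) makes that check explicit for the diatom aIDT volume.

```python
def voxel_pitch(fov_um: float, n_samples: int) -> float:
    """Sampling pitch (um) implied by a field of view and a voxel count along one axis."""
    return fov_um / n_samples

# Diatom (aIDT): 113.75 x 113.75 x 26 um^3 on a 700 x 700 x 52 grid.
print(voxel_pitch(113.75, 700))  # ~0.1625 um lateral
print(voxel_pitch(26, 52))       # 0.5 um axial
```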
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Diatom algae (mIDT)</head><p>This sample is a part of Fisher Science Education algae basic slide set S68786. We captured 16 measurements, and each measurement used six LEDs. We consider a reconstruction volume of 130 × 130 × 30 μm³, positioned between -15 μm and 15 μm around the focal plane. The volume is discretized into 60 slices along the z-axis, with each slice having 800 × 800 pixels. Here, a single voxel corresponds to 0.1625 × 0.1625 × 0.5 μm³.</p><p>Human buccal epithelial cells. This sample was swabbed from a researcher's buccal cavity. The individual rinsed their mouth with clean water and then twirled a wooden swab against the inner cheek. The end of the swab was immersed in a drop of purified water on a glass slide and covered with a coverslip. We captured 24 measurements of the cell cluster and consider two volumes in the regions shown in Fig. <ref type="figure">4b,</ref><ref type="figure">c</ref>. The former measures 81.25 × 81.25 × 16 μm³ and the latter 97.5 × 97.5 × 16 μm³. Both volumes are positioned between -8 μm and 8 μm around the focal plane. They are discretized into 32 slices of 500 × 500 and 600 × 600 pixels, respectively. Here, a single voxel corresponds to 0.1625 × 0.1625 × 0.5 μm³.</p><p>C. elegans. Young adult C. elegans were mounted on 3% agarose pads in a drop of nematode growth medium buffer. Glass coverslips were then gently placed on top of the pads and sealed with a 1:1 mixture of paraffin and petroleum jelly.</p><p>As the C. elegans were alive and moving during data acquisition, we captured a video at 4 fps in which each frame contained 16 measurements and each measurement used six LEDs. We picked two frames, at 1.5 s and 44 s, for reconstruction, where the sample was relatively steady. 
We consider a unified reconstruction volume of 162.5 × 162.5 × 20 μm³, positioned between -10 μm and 10 μm around the focal plane. The volume is divided into 40 slices along the z-axis, with each slice having 1,000 × 1,000 pixels. Here, a single voxel corresponds to 0.1625 × 0.1625 × 0.5 μm³.</p><p>DeCAF framework. A linearized approximation of the IDT forward measurement system can be described by equation (<ref type="formula">3</ref>): y_ρ = A_ρ Δϵ, where Δϵ is the unknown volume of complex-valued permittivity contrast, y_ρ is the collection of the background-removed intensity measurements corresponding to the LED illuminations emitted at a set of locations ρ, and A_ρ denotes the measurement matrices that model the sample-to-intensity mapping associated with these illuminations. The reconstruction of Δϵ is equivalent to the reconstruction of the RI distribution via equation (4): Δϵ = (n_re + i·n_im)² − n0², where n_re and n_im are the real and imaginary parts of the sample's RI, and n0 is the RI of the background medium (where the attenuation is often assumed to be zero). In equation (<ref type="formula">4</ref>), all operations are evaluated in an element-wise manner. We derived the formulations of A_ρ by following past works on dIDT <ref type="bibr">14</ref>, aIDT <ref type="bibr">20</ref> and mIDT <ref type="bibr">52</ref> (see the 'IDT forward model' section in the Supplementary Information).</p><p>The central piece of DeCAF is a coordinate-based MLP, M, which maps the 3D coordinate (x, y, z) to the corresponding values of n_re and n_im. We normalize the coordinate grid to the cube [-1, 1]³ before feeding it into M. The deep network M consists of two subnetworks: the first is an encoding layer γ(x, y, z), pre-defined before training, and the second is a standard MLP N : γ(x, y, z) → (n_re, n_im), parameterized by the trainable parameters ϕ. A visual illustration of the detailed network architecture is provided in Extended Data Fig. 
<ref type="figure">4a</ref>.</p><p>Radial encoding. It has been shown that a Fourier-type encoding of the spatial coordinates is essential for an MLP to represent high-frequency variations in the signal <ref type="bibr">43</ref> and to impose implicit regularization <ref type="bibr">60</ref>. In DeCAF, we consider a decomposition of the input coordinate (x, y, z) into (x, y) and z, and use different strategies to expand (x, y) and z. This is due to the non-isotropic resolution of the imaging system along the x-y plane and the z dimension. Our experiments showed that existing encoding strategies, such as positional <ref type="bibr">43</ref> and Gaussian <ref type="bibr">61</ref> encoding, lead to suboptimal reconstruction of RI images along the x-y dimensions. We propose radial encoding as an alternative for expanding v := (x, y).</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>NATURE MACHINE INTELLIGENCE | VOL 4 | SEPTEMBER 2022 | 781-791 | www.nature.com/natmachintell</p></note>
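As a minimal, self-contained sketch of the coordinate-network idea, the code below maps normalized (x, y, z) coordinates through an encoding γ and a small MLP to two outputs playing the role of (n_re, n_im). This is our illustration, not the paper's implementation: it uses a generic Fourier-feature (positional) encoding in place of the proposed radial encoding, and randomly initialized weights stand in for the trained parameters ϕ; all names and layer sizes are assumptions.

```python
import numpy as np

def fourier_encode(coords: np.ndarray, n_freqs: int = 6) -> np.ndarray:
    """Map (..., 3) coordinates in [-1, 1]^3 to sin/cos features at octave frequencies."""
    freqs = 2.0 ** np.arange(n_freqs) * np.pi            # (n_freqs,)
    angles = coords[..., None] * freqs                   # (..., 3, n_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*coords.shape[:-1], -1)         # (..., 3 * 2 * n_freqs)

def mlp_forward(x: np.ndarray, widths=(64, 64, 2), seed=0) -> np.ndarray:
    """Tiny ReLU MLP with random weights; the 2 outputs stand in for (n_re, n_im)."""
    rng = np.random.default_rng(seed)
    for i, w in enumerate(widths):
        W = rng.normal(0.0, 1.0 / np.sqrt(x.shape[-1]), (x.shape[-1], w))
        x = x @ W
        if i < len(widths) - 1:
            x = np.maximum(x, 0.0)                       # ReLU on hidden layers only
    return x

# Query the field on a small 4 x 4 x 4 grid of normalized coordinates.
coords = np.stack(np.meshgrid(*[np.linspace(-1, 1, 4)] * 3, indexing="ij"), axis=-1)
out = mlp_forward(fourier_encode(coords))
print(out.shape)  # (4, 4, 4, 2)
```

Because the network is a continuous function of (x, y, z), the field can be queried at any coordinate, not just on the training grid; this is the property that lets neural fields represent the RI volume continuously.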
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_2"><p>Extended Data Fig. 3 | Reconstruction of the 3D Granulocyte Phantom using DeCAF, SIMBA, and Tikhonov. (a) From left to right, 3D volumes correspond to Groundtruth, DeCAF, SIMBA, and Tikhonov, respectively. (b) Close-up views of the reconstructions at the location shown in (a). Note how DeCAF reconstructs sharper and better quality cell images compared to both SIMBA and Tikhonov.</p></note>
		</body>
		</text>
</TEI>
