<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Random two-frame interferometry based on deep learning</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>08/07/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10271594</idno>
					<idno type="doi">doi.org/10.1364/OE.397904</idno>
					<title level='j'>Optics express</title>
<idno>1094-4087</idno>
<biblScope unit="volume">28</biblScope>
<biblScope unit="issue">17</biblScope>					

					<author>Xinyang Li Ziqiang Li</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[A two-frame phase-shifting interferometric wavefront reconstruction method based on deep learning is proposed. By learning from a large amount of simulated data based on a physical model, the wrapped phase can be calculated accurately from two interferograms with an unknown phase step. The phase step can be any value except integer multiples of π, and the size of the interferograms is flexible. This method does not need pre-filtering to subtract the direct-current term; it needs only a simple normalization. Compared with other two-frame methods in both simulations and experiments, the proposed method achieves better performance.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>As one of the most popular techniques in optical metrology <ref type="bibr">[1]</ref>, traditional phase-shifting interferometry (PSI) requires three or more interferograms with fixed, known phase steps <ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref>. However, the acquisition of these interferograms is time-consuming and sensitive to mechanical vibrations, ambient air turbulence and temperature changes. Thus, the number of interferograms should be reduced to minimize the recording time. However, it is difficult to reconstruct the phase from only one interferogram. For example, a single interferogram cannot reveal the sign of defocus, so it is difficult to distinguish convex surfaces from concave surfaces. Without additional prior information, at least two interferograms are needed to resolve this sign ambiguity. Takeda et al. developed a Fourier-transform method to extract the phase from a single interferogram <ref type="bibr">[5]</ref>. By introducing a large spatial-carrier frequency, the phase information can be separated from unwanted irradiance variations in the Fourier domain. However, this method cannot be applied to an interferogram with closed-fringes. Thus, a large tilt is needed to generate the required spatial carrier, which makes the fringes denser. If the fringes are too dense, the camera cannot resolve them, and the large tilt also causes considerable retrace error <ref type="bibr">[6,</ref><ref type="bibr">7]</ref>. To simplify the measurement process and reduce the instrument cost, phase reconstruction from two interferograms has been investigated extensively in the past decade <ref type="bibr">[8]</ref><ref type="bibr">[9]</ref><ref type="bibr">[10]</ref><ref type="bibr">[11]</ref>. 
In reference <ref type="bibr">[8]</ref>, a demodulation method based on the Fourier transform of two interferograms, known as the Kreis method, was proposed. It can demodulate the phase from two interferograms without sign ambiguity. It is one of the most classical algorithms in this field, but it is very sensitive to noise. In <ref type="bibr">[9]</ref>, a two-step interferometric method based on a regularized optical flow (OF) algorithm was proposed. This method does not need to normalize the fringe pattern, but it needs to subtract the direct-current (DC) term. In reference <ref type="bibr">[10]</ref>, a phase reconstruction method based on Gram-Schmidt (GS) orthogonalization, with the two fringe patterns treated as independent vectors, was proposed. Because of its high accuracy and low computational cost, this method has achieved great success and is widely used in two-frame interferometry. This method also requires the DC term to be removed. In reference <ref type="bibr">[11]</ref>, a fast and accurate (FA) two-frame PSI wavefront reconstruction method was proposed. It also requires a high-pass Gaussian filter to remove the DC term before phase reconstruction; it then estimates the cosine of the unknown phase step between the two interferograms by solving a quartic polynomial equation, and finally calculates the wrapped phase.</p><p>In recent years, deep learning (DL) technology based on artificial neural networks (ANNs) has developed rapidly, especially in the field of computer vision. In reference <ref type="bibr">[12]</ref>, a convolutional neural network named U-Net was proposed for biomedical image segmentation. In reference <ref type="bibr">[13]</ref>, the U-Net was modified to serve as the generator of a generative adversarial network (GAN) applied to image-to-image translation. 
These latest developments in computer vision have inspired applications in optical metrology, such as phase unwrapping and denoising <ref type="bibr">[14]</ref><ref type="bibr">[15]</ref><ref type="bibr">[16]</ref>. In reference <ref type="bibr">[17]</ref>, Kando et al. proposed a deep learning-based method to extract phases from single-shot interferograms, even when the interferogram includes closed ring-shaped fringes. However, this method is not suitable for the precise measurement of freeform surfaces because it is not applicable to interferograms that include more than one closed-fringe pattern. To overcome the shortcomings of the above methods, we modified the generator in <ref type="bibr">[13]</ref> and propose a new two-frame phase-shifting method based on deep learning. We call the proposed network the Phase U-Net (PUN). The PUN needs neither a spatial carrier nor a filter for subtracting the DC term, but only a simple normalization. It can accurately recover the wrapped phase from two interferograms with an unknown phase step, excluding the singular case in which the phase step is an integer multiple of &#960;.</p><p>The paper is organized as follows. In Section 2, we analyze the advantages of two-frame methods over one-frame methods. In Section 3, we describe how we generated the datasets for training the network and explain the architecture of the PUN, including details of the training process. In Section 4, we perform simulations and experiments to compare the algorithms and present a detailed error analysis, followed by the conclusion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Advantages of two-frame methods over one-frame methods</head><p>In traditional four-step PSI <ref type="bibr">[18]</ref>, the four phase-shifted interferograms can be described as:</p><p><formula notation="TeX">I_1(x, y) = a(x, y) + b(x, y)\cos[W(x, y)], \tag{1}</formula></p><p><formula notation="TeX">I_2(x, y) = a(x, y) + b(x, y)\cos[W(x, y) + \pi/2], \tag{2}</formula></p><p><formula notation="TeX">I_3(x, y) = a(x, y) + b(x, y)\cos[W(x, y) + \pi], \tag{3}</formula></p><p><formula notation="TeX">I_4(x, y) = a(x, y) + b(x, y)\cos[W(x, y) + 3\pi/2], \tag{4}</formula></p><p>where I_1, I_2, I_3 and I_4 are the intensities of the four interferograms, a is the DC term, b is the modulation term, W is the original phase, and (x, y) is the coordinate. We can eliminate the unwanted background intensity a and modulation b by using the equation below:</p><p><formula notation="TeX">\varphi(x, y) = \arctan\left[\frac{I_4(x, y) - I_2(x, y)}{I_1(x, y) - I_3(x, y)}\right], \tag{5}</formula></p><p>where &#966; is the wrapped phase of the original phase W.</p><p>The traditional four-step PSI requires four interferograms with accurate phase steps of &#960;/2. This method is sensitive to mechanical vibrations, ambient air turbulence and temperature changes. Thus, many advanced algorithms have been proposed to reduce the number of interferograms.</p><p>The Fourier-transform method was first proposed by Takeda et al. <ref type="bibr">[5]</ref>. If the tilt is not set to zero, the fringe pattern can be expressed as:</p><p><formula notation="TeX">I(x, y) = a(x, y) + b(x, y)\cos[2\pi f_0 x + W(x, y)], \tag{6}</formula></p><p>where f_0 is the spatial-carrier frequency introduced by the tilt. In most cases a, b and W vary slowly compared with f_0.</p><p>An image-sensing device (such as a CCD or CMOS sensor) with enough resolution to satisfy the sampling-theorem requirement is used to capture the fringes. The fringe pattern is rewritten in the following form for convenience of explanation:</p><p><formula notation="TeX">I(x, y) = a(x, y) + c(x, y)\exp(2\pi i f_0 x) + c^*(x, y)\exp(-2\pi i f_0 x), \tag{7}</formula></p><p>with</p><p><formula notation="TeX">c(x, y) = \frac{1}{2} b(x, y)\exp[iW(x, y)], \tag{8}</formula></p><p>where * denotes the complex conjugate.</p><p>Then the fast-Fourier-transform (FFT) algorithm is used to transform Eq. (<ref type="formula">7</ref>) into the Fourier domain:</p><p><formula notation="TeX">FI(f_x, f_y) = A(f_x, f_y) + C(f_x - f_0, f_y) + C^*(f_x + f_0, f_y), \tag{9}</formula></p><p>where FI denotes the Fourier spectrum of the intensity I, and the other capital letters denote the Fourier spectra of the corresponding terms in Eq. (<ref type="formula">7</ref>). 
f_x and f_y are the spatial frequencies in the x and y directions, respectively. Since the spatial variations of a, b and W are slow compared with the spatial frequency f_0, the Fourier spectra in Eq. (<ref type="formula">9</ref>) are separated by the carrier frequency f_0. Thus, C(f_x - f_0, f_y) can be extracted and shifted by f_0 toward the origin to obtain C(f_x, f_y). Note that the unwanted background variation a has been filtered out at this stage. Again using the FFT algorithm, we compute the inverse Fourier transform of C(f_x, f_y) with respect to (f_x, f_y) and obtain c(x, y), defined by Eq. (<ref type="formula">8</ref>). Then we calculate the complex logarithm of Eq. (<ref type="formula">8</ref>):</p><p><formula notation="TeX">\log[c(x, y)] = \log\left[\tfrac{1}{2} b(x, y)\right] + i\varphi(x, y). \tag{10}</formula></p><p>Now we have the wrapped phase &#966;(x, y) in the imaginary part, completely separated from the unwanted amplitude variation b(x, y) in the real part. However, if the spatial-carrier frequency is not large enough, the spectra in Eq. (<ref type="formula">9</ref>) overlap and this method becomes invalid. This kind of cross-talk in the Fourier domain corresponds to closed-fringes in the spatial domain. In other words, the Fourier-transform method has the limitation that it cannot be applied to an interferogram that includes closed-fringes.</p><p>Each closed-fringe pattern in an interferogram corresponds to two possible situations: a concave surface or a convex surface. This is a typical one-to-many mapping. A one-to-one mapping can be enforced by assuming that all closed-fringe patterns correspond to concave surfaces (or all to convex surfaces) <ref type="bibr">[19]</ref>. Based on this assumption, a deep learning-based method or an improved Fourier-transform method can extract phases from single-shot interferograms with closed-fringe patterns <ref type="bibr">[17,</ref><ref type="bibr">19]</ref>. However, a freeform surface can be regarded as a combination of many convex and concave regions, and we can never know from a single interferogram whether each region is convex or concave <ref type="bibr">[19]</ref>. 
For example, Fig. <ref type="figure">1(a)</ref> shows a sphere with two small defects in the center: one of them can be seen as a small convex surface while the other can be seen as a small concave surface. A tilt has been added to the sphere, but it is not large enough to turn the closed-fringes in the interferogram into open-fringes. Figure <ref type="figure">1(b)</ref> is the interferogram of that sphere. Two points, P1 and P2, lie on adjacent fringes as shown in Fig. <ref type="figure">1(b)</ref>. The phase difference between P1 and P2 is known to be 2&#960;, but we cannot know which point is higher; the same holds for the other defect. Thus, there are four possible combinations of those two small defects that give the same interferogram in Fig. <ref type="figure">1(b)</ref>, and the other three are shown in Fig. <ref type="figure">2</ref>.</p><p>Two-frame methods do not have the sign ambiguity problem. Figure <ref type="figure">3</ref>(a) shows the second interferograms for the four surface shapes in Fig. <ref type="figure">1</ref>(a) and Fig. <ref type="figure">2</ref>. These interferograms differ from one another, so the four situations above can be distinguished when each is combined with the first interferogram in Fig. <ref type="figure">1</ref>(b).</p><p>In conclusion, one-frame interferometry methods, whether based on the Fourier transform or on deep learning, need a large tilt to fully resolve the sign ambiguity, and this tilt may cause overly dense fringe patterns and retrace error. Two-frame methods do not need such a tilt and are more suitable for high-precision optical metrology. Therefore, in Section 4, we compare the PUN with other two-frame methods rather than one-frame methods.</p></div>
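<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sign ambiguity discussed in this section can be checked numerically. The following is a minimal sketch, not part of the original method: it assumes a one-dimensional defocus-like profile, constant a and b, and the model I = a + b cos(W + step); the helper names interferogram and max_abs_diff are hypothetical. It compares a profile with its sign-flipped twin for one frame, for a second frame with a 1 rad step, and for the singular step of &#960;.</p>
<p><![CDATA[
```python
import math


def interferogram(phase, a=0.5, b=0.5, step=0.0):
    """Intensity a + b*cos(W + step) for each sample of a 1-D phase profile."""
    return [a + b * math.cos(w + step) for w in phase]


def max_abs_diff(u, v):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(p - q) for p, q in zip(u, v))


# A defocus-like profile and its sign-flipped twin (convex versus concave).
# The amplitude 6.0 is an arbitrary illustrative value.
xs = [i / 50.0 - 1.0 for i in range(101)]
convex = [6.0 * x * x for x in xs]
concave = [-w for w in convex]

# One frame: cos(W) equals cos(-W), so both surfaces give the same fringes.
single_frame_gap = max_abs_diff(interferogram(convex), interferogram(concave))

# Second frame with a 1 rad step: cos(W + d) and cos(-W + d) differ in general.
second_frame_gap = max_abs_diff(
    interferogram(convex, step=1.0), interferogram(concave, step=1.0)
)

# Singular case: a step of pi only flips the contrast, so the ambiguity remains.
singular_gap = max_abs_diff(
    interferogram(convex, step=math.pi), interferogram(concave, step=math.pi)
)

print(single_frame_gap, second_frame_gap, singular_gap)
```
]]></p>
<p>In this sketch the first and third gaps vanish up to rounding, while the second does not, which is consistent with the statement that a second frame removes the ambiguity unless the phase step is an integer multiple of &#960;.</p></div>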
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data generation and the training process</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data generation</head><p>All deep learning-based methods start with data. In this paper, we generate the training data by simulation. To enhance the generalization ability of the network and make it perform well in all possible situations, we use two different ways to generate the original phase. The first way is to use the Zernike polynomials. Since any continuous wavefront shape may be represented by a linear combination of the Zernike polynomials <ref type="bibr">[18]</ref>, the original phase W can be expressed as follows:</p><p><formula notation="TeX">W(\rho, \theta) = \sum_{n=0}^{L} \sum_{m=0}^{n} Z_r U_r(\rho, \theta),</formula></p><p>where L is the maximum power, Z_r is the Zernike coefficient, and &#961; and &#952; are the radius and angle, respectively. The single index r is defined by:</p><p><formula notation="TeX">r = \frac{n(n + 1)}{2} + m + 1,</formula></p><p>and the polynomial U_r can be expressed as follows:</p><p><formula notation="TeX">U_r(\rho, \theta) = R_n^{n-2m}(\rho) \begin{Bmatrix} \sin \\ \cos \end{Bmatrix} \left[(n - 2m)\theta\right],</formula></p><p>where the sine function is used when n - 2m &gt; 0, and the cosine function is used when n - 2m &#8804; 0. The radial polynomial is given by:</p><p><formula notation="TeX">R_n^{n-2m}(\rho) = \sum_{s=0}^{m} (-1)^s \frac{(n - s)!}{s!\,(m - s)!\,(n - m - s)!} \rho^{n-2s}.</formula></p><p>The other way is to introduce the actuator influence function, which is commonly used in adaptive optics to model the surface of deformable mirrors <ref type="bibr">[20]</ref>:</p><p><formula notation="TeX">P_i(x, y) = \exp\left[\ln(k) \left(\frac{\sqrt{(x - u_i)^2 + (y - v_i)^2}}{d}\right)^{g}\right],</formula></p><p>where P_i is the actuator influence function, (x, y) is the coordinate of each point on the original phase, and (u_i, v_i) is the coordinate of the i-th actuator. The coefficients k, d and g simulate different types of deformable mirrors and have little impact on the final results in this paper, because the interferograms remain of the same type when the coefficients change. Here k is the coupling coefficient, set to 0.15; d is the actuator spacing, set to 8; and g is the Gaussian index, set to 2.5. The original phase is then obtained as follows:</p><p><formula notation="TeX">W(x, y) = \sum_{i=1}^{N} H_i P_i(x, y),</formula></p><p>where W is the original phase, H_i is the influence amplitude coefficient and N is the number of actuators.</p><p>We used several groups of Zernike coefficients (up to 11 terms, including the tip and tilt) to generate original phases. 
The terms and amplitudes of the Zernike coefficients in each group were carefully selected to avoid generating overly dense fringes. For example, Group 1 contained 5 Zernike terms (1st to 5th order), whose amplitudes were randomly chosen from -15 to 15. Group 2 contained 3 Zernike terms (8th, 9th and 10th order), with amplitudes varying between -10 and 10. In this way, we could generate many different types of aberrations without overly dense fringes in the interferograms. As a supplement to the Zernike polynomials, we also generated original phases by using the actuator influence function. The influence amplitude coefficients were randomly chosen from -25 to 25. After the original phases were generated, we computed the wrapped phases to be the outputs in the training set. The wrapped phase &#966; is the phase angle of the complex exponential of the original phase W, and can be calculated by using the MATLAB function 'angle':</p><p><formula notation="TeX">\varphi(x, y) = \mathrm{angle}\left\{\exp\left[iW(x, y)\right]\right\}.</formula></p><p>Figure <ref type="figure">4</ref> shows an example of two original phases generated by the Zernike polynomials method and the actuator influence function method, respectively. Their corresponding wrapped phases and interferograms are also shown. According to <ref type="bibr">[18]</ref>, we can model the interferograms in two-frame phase-shifting interferometry as:</p><p><formula notation="TeX">I_1(x, y) = a(x, y) + b(x, y)\cos[W(x, y)],</formula></p><p><formula notation="TeX">I_2(x, y) = a(x, y) + b(x, y)\cos[W(x, y) + \delta],</formula></p><p>where &#948; is the unknown phase-shifting step.</p><p>To simulate non-uniform background intensity and modulation amplitude, a and b were not constant when generating the training set. To better simulate real conditions, additive white Gaussian noise was added to these interferograms, with the signal-to-noise ratio (SNR) varying from 20 to 100 dB. Figure <ref type="figure">5</ref> shows some simulated interferograms with different background intensities and noise levels. These interferograms are normalized between 0 and 1 to be the inputs in the training set, while the wrapped phases &#966; are the outputs. 
To read the data more efficiently and accelerate the training process, we used the TFRecord format to store the training set. The TFRecord format is a simple format for storing a sequence of binary records. It is helpful to serialize the data and store it in a set of files (100-200 MB each) that can each be read linearly. Thus, we stored 64 pairs of inputs and outputs in each TFRecord file (192 MB). Since the PUN is a deep network, we wanted more than 50,000 pairs of data to train it. Therefore, we generated 782 TFRecord files, containing 50,048 pairs of data in total. We also generated a test dataset, separate from the training set, to test the network and to evaluate whether it was overfitting. Note that the image size of both the interferograms and the wrapped phases was 512&#215;512. The phase step &#948; was kept between 0 and &#960; when generating the training data.</p></div>
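<div xmlns="http://www.tei-c.org/ns/1.0"><p>The data-generation steps described above (original phase, wrapped phase, two phase-shifted frames, additive noise, normalization) can be sketched as follows. This is an illustrative sketch under several assumptions, not the code used in this work: the original phase contains only tilt and defocus terms, a and b follow simple hand-picked slowly varying forms, the SNR is defined as mean signal power over noise power, each frame is min-max normalized independently, and a small margin keeps the phase step away from 0 and &#960;. The names wrap, add_awgn, normalize and make_sample are hypothetical, and a small 64&#215;64 grid is used so that plain Python lists suffice.</p>
<p><![CDATA[
```python
import math
import random


def wrap(w):
    """Wrapped phase in [-pi, pi], equivalent to angle(exp(1j * w))."""
    return math.atan2(math.sin(w), math.cos(w))


def add_awgn(img, snr_db, rng):
    """Add white Gaussian noise; SNR is assumed to be mean signal power / noise power."""
    flat = [v for row in img for v in row]
    signal_power = sum(v * v for v in flat) / len(flat)
    sigma = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in img]


def normalize(img):
    """Linearly rescale an image to the range [0, 1] (min-max, an assumed choice)."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    return [[(v - lo) / (hi - lo) for v in row] for row in img]


def make_sample(size=64, snr_db=40.0, seed=0):
    """Return (frame1, frame2, wrapped_phase, delta) for one simulated pair."""
    rng = random.Random(seed)
    delta = rng.uniform(0.1, math.pi - 0.1)  # assumed margin away from 0 and pi
    tilt = rng.uniform(-15.0, 15.0)
    defocus = rng.uniform(-15.0, 15.0)
    frame1, frame2, target = [], [], []
    for i in range(size):
        y = 2.0 * i / (size - 1) - 1.0
        row1, row2, row_t = [], [], []
        for j in range(size):
            x = 2.0 * j / (size - 1) - 1.0
            w = tilt * x + defocus * (2.0 * (x * x + y * y) - 1.0)
            a = 0.5 + 0.1 * x        # slowly varying background (assumed form)
            b = 0.4 - 0.1 * y * y    # slowly varying modulation (assumed form)
            row1.append(a + b * math.cos(w))
            row2.append(a + b * math.cos(w + delta))
            row_t.append(wrap(w))
        frame1.append(row1)
        frame2.append(row2)
        target.append(row_t)
    frame1 = normalize(add_awgn(frame1, snr_db, rng))
    frame2 = normalize(add_awgn(frame2, snr_db, rng))
    return frame1, frame2, target, delta


frame1, frame2, wrapped, delta = make_sample()
```
]]></p>
<p>In practice an array library would replace the nested lists, and the Zernike and actuator influence function models of this section would replace the two-term phase used here.</p></div>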
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Architecture of PUN and the training process</head><p>The architecture of our network, the PUN, is illustrated in Fig. <ref type="figure">6</ref>. Unlike the original U-Net in <ref type="bibr">[12]</ref>, the proposed network has no pooling layers, because fine details are important and pooling layers may lose them.</p><p>In the PUN shown in Fig. <ref type="figure">6</ref>, we used LeakyReLU <ref type="bibr">[21]</ref> activation functions in the downsampling steps and ReLU <ref type="bibr">[22]</ref> in the upsampling steps. Batch Normalization <ref type="bibr">[23]</ref> was used in some layers to accelerate training, and the Dropout technique <ref type="bibr">[24]</ref> was used in some upsampling steps to avoid over-fitting. Whereas the original U-Net uses softmax in the output layer to treat image segmentation as a classification problem, we used the ELU activation function <ref type="bibr">[25]</ref> in the output layer to predict pixel values. The fact that the outputs of the ELU activation function are not confined to the range from 0 to 1 has little impact, because in post-processing all negative outputs can simply be replaced by zeros and values larger than one by ones. We tried different loss functions and found that the mean absolute error loss function performed best:</p><p><formula notation="TeX">\mathrm{Loss} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| \varphi_{ij} - \chi_{ij} \right|,</formula></p><p>where &#966;_ij is the (i, j)th value of the real wrapped phase &#966;, &#967;_ij is the corresponding value of the predicted wrapped phase, and M and N are the image dimensions. We applied the Adam solver <ref type="bibr">[26]</ref> to train the PUN for 120 epochs on a high-performance computing cluster using one Intel Xeon E5-2695 V3 CPU, one Nvidia P100 GPU (with 16 GB VRAM), and 224 GB RAM. The version of the TensorFlow framework was 2.0.0. 
It is worth noting that there is no fully-connected layer in the network architecture, so we can predict wrapped phases not only from inputs of size 512&#215;512 but also from larger images such as 768&#215;768, 1024&#215;1024 and 1280&#215;1280. However, this does not mean that the input size is unlimited. The input size must be an integer multiple of 256&#215;256; for example, a size of 32&#215;32 is not acceptable. This is because the architecture of the PUN consists of two parts: the downsampling part and the upsampling part. In the downsampling part, the image size is halved from each layer to the next, so a 32&#215;32 input cannot even reach the bottom layer of the downsampling part. Most ANNs do not allow users to change the input size, for two reasons. One reason is that most ANNs have fully-connected layers in their architectures. Changing the input size would change the number of parameters in the fully-connected layers, but the number of parameters of a trained network is fixed. Thus, the input size of an ANN with fully-connected layers cannot be changed. The other reason is that the test set (that is, the real problem) should be independent and identically distributed (i.i.d.) with the training set. Although the PUN has no fully-connected layer and thus allows users to change the input size, the receptive field of the network is limited by the convolution layers. For the same convolution kernel and the same original phase, the fringe pattern of a large interferogram is sparser than that of a small interferogram. This means that the input size cannot be enlarged without limit, because doing so may break the i.i.d. assumption. In short, this network architecture allows users to choose the input size freely within a certain range. 
For example, the trained PUN in this paper prefers input sizes between 512&#215;512 and 1280&#215;1280, and the tests in Section 4 show that its performance fluctuates little in that range. If the CCD size were changed to 2048&#215;2048, it would be better to re-train the network using a new dataset, and that newly trained network may perform better with input sizes between 2048&#215;2048 and 2816&#215;2816.</p></div>
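<div xmlns="http://www.tei-c.org/ns/1.0"><p>Three small pieces of this section lend themselves to a short illustration: the mean absolute error loss, the clipping of network outputs to the range from 0 to 1 in post-processing, and the rule that the input size must be an integer multiple of 256&#215;256. The sketch below uses plain Python lists rather than the TensorFlow tensors used for training, and the function names mae_loss, clip_unit and accepts_size are hypothetical.</p>
<p><![CDATA[
```python
def mae_loss(target, predicted):
    """Mean absolute error between two equally sized 2-D arrays (lists of rows)."""
    m, n = len(target), len(target[0])
    total = sum(
        abs(t - p)
        for t_row, p_row in zip(target, predicted)
        for t, p in zip(t_row, p_row)
    )
    return total / (m * n)


def clip_unit(img):
    """Replace negative values by 0 and values above 1 by 1."""
    return [[min(1.0, max(0.0, v)) for v in row] for row in img]


def accepts_size(height, width, multiple=256):
    """True if both dimensions are positive integer multiples of `multiple`."""
    return (
        height > 0
        and width > 0
        and height % multiple == 0
        and width % multiple == 0
    )


print(mae_loss([[0.0, 1.0], [1.0, 0.0]], [[0.5, 1.0], [1.0, 1.0]]))
print(clip_unit([[-0.2, 0.3, 1.4]]))
print(accepts_size(768, 768), accepts_size(32, 32))
```
]]></p>
<p>The size check only encodes the divisibility rule stated above; it says nothing about the preferred range of sizes, which depends on the data the network was trained on.</p></div>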
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Error analysis and experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Error analysis by simulation</head><p>We first examined the robustness of the proposed method to changes in the phase step. We calculated the RMS errors of the wrapped phases computed by all five two-frame methods at different phase steps in the range from 0 rad to &#960; rad (excluding 0 and &#960;). No noise was added to the interferograms. The results are plotted in Fig. <ref type="figure">7</ref>, showing that the trained PUN is consistently better than the other methods. To test the robustness under noisy conditions, we simulated interferograms with a phase step of 1 rad, added white Gaussian noise with SNR ranging from 20 dB to 100 dB, and plotted the RMS errors in Fig. <ref type="figure">8</ref>. This simulation confirmed that the PUN had the best performance over the whole range of tested SNR. We also tested the PUN with different input sizes to make sure that it performs well at common image sizes. We generated another two sets of interferograms with sizes of 512&#215;512, 768&#215;768, 1024&#215;1024 and 1280&#215;1280. The first set had a phase step of 1 rad and Gaussian noise with 100 dB SNR. The RMS errors of all five methods are listed in Table <ref type="table">1</ref>, together with the corresponding processing times. We then repeated the experiment on the second set, which differed only in that the SNR was 20 dB, as shown in Table <ref type="table">2</ref>. It should be noted that the PUN was written in Python and run in the Spyder IDE, while the other algorithms were implemented in MATLAB. All methods (including the PUN) were run on a laptop with one Intel Core i5-8250U processor and 8 GB RAM when testing the performance. As can be seen in Table <ref type="table">1</ref> and Table <ref type="table">2</ref>, the PUN is the most accurate of all the methods, while the FA algorithm remains the fastest, especially for large image sizes. 
For a more intuitive understanding, the PUN results from Table <ref type="table">1</ref> are illustrated in Fig. <ref type="figure">9</ref>.</p></div>
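<div xmlns="http://www.tei-c.org/ns/1.0"><p>The RMS error used as the figure of merit in this section could be computed along the following lines. This is a sketch under an assumption that the text does not state: the difference between the estimated and reference wrapped phases is re-wrapped before squaring, so that a pixel lying on opposite sides of the &#177;&#960; boundary is not counted as an error of nearly 2&#960;. The function name rms_phase_error is hypothetical.</p>
<p><![CDATA[
```python
import math


def rms_phase_error(reference, estimate):
    """RMS of the re-wrapped difference between two wrapped-phase maps (radians)."""
    total = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            # Re-wrap the difference into [-pi, pi] (assumed convention).
            d = math.atan2(math.sin(e - r), math.cos(e - r))
            total += d * d
            count += 1
    return math.sqrt(total / count)


# Two pixels: a plain 0.1 rad error, and a pair straddling the wrap boundary.
reference = [[0.0, 3.1]]
estimate = [[0.1, -3.1]]
print(rms_phase_error(reference, estimate))
```
]]></p>
<p>Without the re-wrapping step, the second pixel in this example would contribute an error of 6.2 rad instead of about 0.083 rad.</p></div>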
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Experimental validation</head><p>To further verify the effectiveness of the proposed method, we collected two sets of interferograms in experiments with a &#960;/2 phase step, as shown in Fig. <ref type="figure">10</ref>(a) and Fig. <ref type="figure">12</ref>.</p><p>Although the result of standard four-step interferometry is not a true ground truth, it can be regarded as reliable regardless of the noise in the interferograms. Therefore, judged against this reference, the PUN results can be regarded as more accurate than those of the other two-frame interferometry algorithms.</p></div>
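<div xmlns="http://www.tei-c.org/ns/1.0"><p>A four-step reference phase of the kind used for comparison here can be obtained from four frames with &#960;/2 steps as atan2(I4 - I2, I1 - I3). The sketch below assumes this standard estimator and checks it on synthetic, noise-free frames with constant a and b; the phase map, the grid size and the function name four_step_phase are illustrative choices, not the experimental data.</p>
<p><![CDATA[
```python
import math


def four_step_phase(i1, i2, i3, i4):
    """Wrapped phase from four frames with pi/2 steps, via atan2(I4 - I2, I1 - I3)."""
    return [
        [math.atan2(p4 - p2, p1 - p3) for p1, p2, p3, p4 in zip(r1, r2, r3, r4)]
        for r1, r2, r3, r4 in zip(i1, i2, i3, i4)
    ]


# Synthetic tilted and defocused phase map on a small grid (illustrative values).
size = 32
phase = [
    [8.0 * (x / size) + 5.0 * ((x / size) ** 2 + (y / size) ** 2) for x in range(size)]
    for y in range(size)
]

# Four frames a + b*cos(W + k*pi/2), k = 0..3, with constant a = 0.5 and b = 0.4.
frames = [
    [[0.5 + 0.4 * math.cos(w + k * math.pi / 2) for w in row] for row in phase]
    for k in range(4)
]

recovered = four_step_phase(*frames)

# Compare with the wrapped original phase, re-wrapping the difference.
max_error = max(
    abs(math.atan2(math.sin(r - w), math.cos(r - w)))
    for rec_row, w_row in zip(recovered, phase)
    for r, w in zip(rec_row, w_row)
)
print(max_error)
```
]]></p>
<p>Using atan2 rather than a plain arctangent of the ratio keeps the result in the full range from -&#960; to &#960; and avoids a division by zero where I1 equals I3.</p></div>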
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>In conclusion, we have proposed a new deep learning-based method called PUN to estimate the wrapped phase by using only two interferograms. The advantages of the PUN over one-frame methods and other two-frame methods have been explained in this paper. Admittedly, the proposed method is not the fastest of the two-frame methods compared; however, it is the most accurate. Moreover, whereas deep learning-based methods normally need a fixed input size, our network architecture can use different input sizes such as 512&#215;512, 768&#215;768, 1024&#215;1024 and 1280&#215;1280, or even larger sizes after being re-trained with new datasets. Both simulations and experiments have been carried out to verify the performance.</p></div></body>
		</text>
</TEI>
