<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>DRLO: Deep Representation Learning for Large Scale Off-track Satellite Remote Sensing Data</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>12/15/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10496901</idno>
					<idno type="doi">10.1109/BigData59044.2023.10386306</idno>
					
					<author>Xin Huang</author><author>Chenxi Wang</author><author>Wenbin Zhang</author><author>Sanjay Purushotham</author><author>Jianwu Wang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Collocation of measurements from active and passive satellite sensors refers to the combination of data from two sensors that observe the same geographic area at nearly the same time but with differing spatial resolutions and viewing angles. This collocated data, often known as on-track data, comes with precise product labels from the active sensor but comprises only the pixels located directly on the path of an active satellite's orbit. As a result, its spatial coverage is quite limited, especially when compared to the vast quantities of off-track data. Handling the abundant and information-dense off-track data is crucial for training machine learning models that can effectively integrate the unique features of this data along with on-track data. However, the sheer volume of off-track data presents significant challenges for these models. To address the challenges of large amounts of unlabeled off-track data in remote sensing applications, we introduce a self-supervised representation learning model with VAE and domain adaptation methods to learn a domain-invariant classifier for the on-track and off-track data. The model's performance is enhanced by pre-training a VAE generative model on off-track data to learn a good representation that can be transferred to the downstream domain adaptation and classification tasks. The classifier is built on these representations to classify different cloud types in passive sensing data, with the goal of achieving higher accuracy in cloud property retrieval. Extensive quantitative and qualitative evaluations demonstrate that our method achieves higher accuracy in cloud property retrieval for off-track remote sensing data.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Constantly covering about two-thirds of Earth's surface, clouds play a critical role in our climate system, with fundamental influence on the energy, water, and biological cycles. Satellite-based remote sensing is essential for observing clouds at a global scale. Numerous satellite sensors have been developed to observe and retrieve cloud properties. They can be largely divided into two groups: active sensors such as CALIPSO and CloudSat, and passive sensors such as MODIS, VIIRS, and ABI. The advantages of active sensors include their ability to resolve the vertical location of the cloud layer and their better performance at night and over polar regions. On the other hand, passive sensors have a much better spatial sampling rate.</p><p>Remote sensing data now grows at an astronomical pace as satellite instruments become more and more powerful, which poses serious challenges to the computational efficiency of physically-based retrieval algorithms. Take the geostationary satellite for example. Until a few years ago, the operational Geostationary Operational Environmental Satellite (GOES) multispectral imager only had a handful of spectral bands and provided a "full-disk" scan of Earth only every 3 hours. The Advanced Baseline Imager (ABI) on the latest GOES-R series (GOES-16+) can provide full-disk scans every 15 minutes in 16 spectral bands with better spatial resolutions. This increased capability greatly increases the volume of data that the physically-based retrieval algorithms must process. What makes the problem even more challenging is the fact that remote sensing data has also become more heterogeneous. 
The great success of the A-Train satellite constellation <ref type="bibr">[1]</ref> has clearly demonstrated that coordinated and collocated observations from sensors with complementary capabilities, e.g., the combination of passive MODIS and active CALIPSO-CloudSat, can provide a more comprehensive perspective and richer information about clouds that cannot be achieved from the individual instruments alone. However, it is challenging to combine and fuse heterogeneous observations in physically based algorithms because different types of observations often involve dramatically different physics.</p><p>Fig. <ref type="figure">1</ref>. An example plot of the one-day daytime VIIRS (global coverage) and CALIOP (green lines) orbit tracks (February 8, 2022). Credits: NASA.</p><p>Domain adaptation has been thoroughly studied in computer vision <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref> and natural language processing (NLP) applications <ref type="bibr">[4]</ref>, <ref type="bibr">[5]</ref>. Recently, the deep learning paradigm has become popular in domain adaptation due to its ability to learn rich, flexible, non-linear domain-invariant representations <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref>. However, few of these approaches have been adapted for remote sensing applications. Moreover, domain adaptation techniques using deep neural networks have been mainly used to solve the distribution drift problem in homogeneous domains <ref type="bibr">[8]</ref>. The data in homogeneous domains usually share similar feature spaces and have the same dimensionalities. Nevertheless, real-world applications often deal with heterogeneous domains that come from completely different feature spaces and have different dimensionalities. In our remote sensing application, the two remote sensor datasets collected by active and passive sensors respectively are heterogeneous. 
In particular, CALIOP actively collects 25 bands of sensing data, has better sensitivity to aerosol types and cloud phases, and provides data that are fully labeled with 6 cloud types. VIIRS uses a spectroradiometer sensor to passively collect 20 bands of sensing data with no labels or inaccurate labels.</p><p>Collocation of measurements from active (e.g., CALIOP) and passive (e.g., VIIRS) satellite sensors involves pairing measurements from two sensors that observe the same location quasi-simultaneously but with different spatial resolutions and at different angles. The collocated data, widely known as on-track data, consists only of the pixels on an active satellite's orbiting track and thus has very limited spatial coverage compared to the large amounts of off-track data. Typically, on-track data are labeled with accurate product types from the active sensor, while off-track data are unlabeled or have inaccurate labels from the passive sensor product. Given the vast amount of valuable information present in off-track data, it is crucial to develop a machine learning model that can integrate the unique features of this data along with on-track data. However, the sheer volume of off-track data makes it difficult to train machine learning models effectively, as this data is often unlabeled and includes considerable noise. Figure <ref type="figure">1</ref> shows the coverage difference for a full day of data, visualized with the NASA Earthdata Worldview website. VIIRS has nearly full coverage of the Earth, while CALIOP only covers the green line area, which is much smaller than the coverage of VIIRS.</p><p>To tackle the issue of handling vast volumes of unlabeled off-track data in remote sensing, we present DRLO, a self-supervised deep representation learning model tailored for large-scale off-track satellite remote sensing data. 
This model combines Variational Autoencoders (VAEs) with domain adaptation techniques to develop a domain invariant classifier that efficiently processes both on-track and off-track data. The effectiveness of the model is significantly enhanced by initially pre-training the VAE generative model on off-track data. This pre-training step enables the model to learn a robust representation, which is then applied to domain adaptation and classification tasks. Our comprehensive quantitative and qualitative assessments show that our approach outperforms existing methods in retrieving cloud properties from off-track remote sensing data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>Over the past few decades, a variety of aerosol and cloud remote sensing algorithms have been developed based on the physical principles and the radiative transfer of light scattering and absorption within aerosol and cloud fields <ref type="bibr">[9]</ref>. These physically-based algorithms are widely used in aerosol and cloud property products for weather and climate studies <ref type="bibr">[10]</ref>, <ref type="bibr">[11]</ref>, <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref>. Traditionally, many of these algorithms use a lookup table (LUT) approach, in which one must prescribe aerosol and surface properties. The challenge is to ensure the algorithm has the means to select the appropriate model.</p><p>Although highly successful, these physically-based algorithms are challenging to improve. For example, according to <ref type="bibr">[14]</ref>, there is no absolute separation between "aerosol" and "cloud". Most, if not all, retrieval techniques rely on manually set thresholds for scene identification, and a test in one region (e.g., tropical ocean) may not work well in another region (e.g., polar land), which necessitates many such "tests" and painstakingly tuned thresholds. When more than 3 or 4 tests must be combined in a non-sequential way, the result is beyond human ability to interpret. Even with all this effort, dust and cloud scenes are not always separated correctly. In addition, even if two sensors are "nearly" the same (e.g., MODIS and VIIRS), their spectral bands, resolution, calibration, etc., may be different enough that a threshold applied for one sensor may need revision for another. Thus, physically-based algorithms are expensive to develop and maintain.</p><p>Machine learning (ML) and artificial intelligence (AI) techniques may overcome the challenges faced by physically-based algorithms. 
Since ML algorithms are written to autonomously find information (e.g., patterns of spectral, spatial, and/or time series data), they can learn hidden signatures of different types of objects. The authors of <ref type="bibr">[15]</ref> introduced two Random Forest (RF) machine learning models for cloud mask and cloud thermodynamic phase detection using spectral observations from VIIRS data. The authors of <ref type="bibr">[16]</ref> developed a deterministic self-organizing map (SOM) approach and applied it to cloud type classification based on satellite data. Deep learning <ref type="bibr">[17]</ref> is also a promising technique, having already revolutionized fields such as computer vision <ref type="bibr">[18]</ref> and natural language processing <ref type="bibr">[19]</ref>, and is increasingly being used in remote sensing applications <ref type="bibr">[20]</ref>. Those approaches can learn representations of multiple variables in a single domain but not across multiple domains. Domain adaptation has been widely used in computer vision <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref> and natural language processing (NLP) applications <ref type="bibr">[4]</ref>, <ref type="bibr">[5]</ref>. Recently, the deep learning paradigm has become popular in domain adaptation <ref type="bibr">[7]</ref>, <ref type="bibr">[6]</ref>, <ref type="bibr">[8]</ref> due to its ability to learn rich, flexible, non-linear domain-invariant representations. To the best of our knowledge, those domain adaptation methods mainly solve distribution drift problems for homogeneous data collected in different environments and cannot be directly adapted to heterogeneous domains such as the active and passive satellite sensor data in remote sensing.</p><p>A previous study <ref type="bibr">[21]</ref> proposed domain adaptation based cloud type detection methods (DAMA and DAMA-WL) using active and passive satellite data. 
That study develops a deterministic classifier for pixel-level classification but lacks the capacity to capture the spatial correlation among the pixels, which are generated by orbiting satellites and have a strong spatial relationship. The work in <ref type="bibr">[22]</ref> further advanced domain adaptation techniques for multi-satellite remote sensing data by proposing a new Variational Autoencoder (VAE) based domain adaptation method that can capture the spatial correlation and has better generalizability. However, those approaches only work with on-track collocated data from the paired remote sensor datasets, and they cannot directly train a model on the large-scale unlabeled off-track data, which has a large distribution drift from the on-track data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. DEEP REPRESENTATION LEARNING WITH VAE AND DOMAIN ADAPTATION METHODS</head><p>The passive sensor and active sensor datasets raise further challenges, as the two datasets are high-dimensional, global in coverage, and heterogeneous. Since off-track data carry rich information and dominate the passive remote sensor data, it is important to design a model that can incorporate the off-track data in the representation learning, in order to develop a cloud property retrieval method for multi-satellite remote sensing data. Moreover, distribution drift also occurs from on-track to off-track data, as some off-track data are very far away from the active sensor's orbiting track, and the environments or surface types change depending on the distance of the off-track data to the on-track data.</p><p>To tackle the challenge of distribution drift from on-track to off-track data, we design a deep representation learning framework with VAE and domain adaptation methods that can incorporate off-track feature representation learning to develop a domain-invariant classifier for off-track remote sensor data, as shown in Figure <ref type="figure">2</ref>.</p><p>In the training phase, there are three branches of inputs that take source domain data features (CALIOP), off-track target domain data features (off-track VIIRS), and on-track target domain data features (on-track VIIRS). As shown in Figure <ref type="figure">2</ref>, our model introduces a heterogeneous domain mapping to transform the feature space of the target domain into the feature space of the source domain and uses a feature extraction layer to learn the shared representative features between the source and target domains. 
After the domain mapping stage, we pre-train a VAE encoder-decoder using off-track target domain data (red block in Figure <ref type="figure">2</ref>), and the trained variational encoder is shared with the on-track VAE for the target domain (yellow block in Figure <ref type="figure">2</ref>). The data then go through the VAE-based domain adaptation framework between the on-track source domain (green block) and the on-track target domain (yellow block). In particular, a maximum mean discrepancy (MMD) based domain alignment is applied to the latent spaces generated by the source and target domain encoders. By incorporating the domain alignment loss and the classification loss when training this end-to-end deep domain adaptation network, we find that the network can maximize the classification accuracy on the target domain.</p><p>In the testing phase, only target domain data is sent into the deep neural network, passing through the deep domain mapping layer and the VAE-based feature extraction encoder, which has already captured the hidden representations of both on-track and off-track target data during the training phase. The trained classifier can then be applied to the output of the feature extraction layer, as this flow has already generated the domain-invariant feature representation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Heterogeneous Deep Domain Mapping</head><p>Similar to <ref type="bibr">[21]</ref>, <ref type="bibr">[22]</ref>, we utilize a deep neural network to perform the deep domain mapping (DDM) between the source and target domains. The DDM adjusts for the dissimilarity between the two domains by learning a transformation that aligns the target feature space with the source feature space. The goal is to map the VIIRS dataset (target domain) to the CALIOP dataset (source domain) in order to preserve the discriminating power of CALIOP data and transfer it to a downstream machine learning model. This approach ensures that the number of features in both domains is equal and that they are in the same feature space.</p><p>For the collocated source and target domain data, the input of the DDM network is the on-track target domain data and the output of the network is the transformed target domain data in the source domain feature space. The source and target domain data in this study are collocated remote sensing data that have the same longitude and latitude coordinates, thus a mean squared error (MSE) loss function is utilized to calculate the error of the DDM network. Specifically, we are given source domain training examples <formula notation="tex">D_s = \{x_i^s\},\ x^s \in \mathbb{R}^{d_s},\ i = 1, \ldots, n_s</formula> and an unlabeled target data set <formula notation="tex">D_t = \{x_j^t\},\ x^t \in \mathbb{R}^{d_t},\ j = 1, \ldots, n_t</formula>, with <formula notation="tex">n_s = n_t</formula>. The DDM model is trained to convert the target domain into the feature space of the source domain by minimizing the <formula notation="tex">L_2</formula> loss function: <formula notation="tex">\mathcal{L}_{ddm} = \frac{1}{n_s} \sum_{i=1}^{n_s} \left\| \mathrm{DDM}(x_i^t) - x_i^s \right\|_2^2</formula></p></div>
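<div xmlns="http://www.tei-c.org/ns/1.0"><p>The mean squared error objective described above for the DDM network can be illustrated with a minimal sketch. It assumes that each collocated pair consists of a target feature vector and a source feature vector stored as plain Python lists; the names ddm_mse_loss and toy_linear_mapper are hypothetical and are not taken from the paper, and the single linear layer merely stands in for the deep network that the paper trains.</p>
<ab type="code"><![CDATA[
```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def toy_linear_mapper(weights: Sequence[Vector]) -> Callable[[Vector], List[float]]:
    """Hypothetical stand-in for the DDM network: one linear layer, no bias.

    Each row of `weights` produces one output feature, so the number of rows
    is the source dimensionality and the row length is the target dimensionality.
    """
    def apply(x: Vector) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) for row in weights]
    return apply


def ddm_mse_loss(mapper: Callable[[Vector], Vector],
                 target_batch: Sequence[Vector],
                 source_batch: Sequence[Vector]) -> float:
    """Mean squared error between mapped target pixels and collocated source pixels."""
    if len(target_batch) != len(source_batch) or not source_batch:
        raise ValueError("collocated batches must be non-empty and of equal length")
    total = 0.0
    for x_t, x_s in zip(target_batch, source_batch):
        mapped = mapper(x_t)
        if len(mapped) != len(x_s):
            raise ValueError("mapped target must match the source dimensionality")
        # Average the squared error over the source feature dimensions.
        total += sum((m - s) ** 2 for m, s in zip(mapped, x_s)) / len(x_s)
    # Average over all collocated pairs.
    return total / len(source_batch)


if __name__ == "__main__":
    # Toy example: 3 target features mapped onto 2 source features.
    mapper = toy_linear_mapper([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(ddm_mse_loss(mapper, [[1.0, 2.0, 3.0]], [[2.0, 4.0]]))
```
]]></ab>
<p>In practice the mapper would be a trainable network whose parameters are updated to reduce this loss; the sketch only shows how the loss pairs each on-track target pixel with its collocated source pixel.</p></div>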
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Pre-training VAE for off-track target domain</head><p>We first apply the DDM trained on the on-track domains to transform off-track target domain data into the feature space of the source domain. Then we perform self-supervised learning by using a variational autoencoder (VAE) to learn a latent space that captures the hidden structures of the off-track target domain data. The off-track target domain VAE is composed of an encoder for feature extraction, a latent feature vector, and a decoder for data reconstruction. The encoder's purpose is to take input data and create a representation of it in a latent feature space using a parameterized model. The goal is to optimize the parameters of the neural network in order to maximize the variational lower bound by minimizing the KL divergence between the estimated latent vector and the true latent vector, represented as <formula notation="tex">\mathcal{L}_{kl}</formula>, and maximizing the expectation of the data points reconstructed from the latent vector, represented as <formula notation="tex">\mathcal{L}_{r}</formula>. We can rewrite the final VAE loss we need to optimize, i.e., the negative variational lower bound to be minimized, as: <formula notation="tex">\mathcal{L}_{vae} = \mathcal{L}_{kl} - \mathcal{L}_{r}</formula></p><p>The learned parameters of the encoder will be shared with the encoders used in the domain adaptation module for the on-track data.</p></div>
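<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two VAE loss terms described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: it assumes a diagonal Gaussian posterior with a standard normal prior, for which the KL term has a closed form, and it uses a squared-error reconstruction term, which corresponds to a Gaussian likelihood up to constants. The function names and the beta weight are hypothetical.</p>
<ab type="code"><![CDATA[
```python
import math
from typing import Sequence


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Squared reconstruction error (assumed Gaussian likelihood, constants dropped)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x: Sequence[float], x_hat: Sequence[float],
             mu: Sequence[float], log_var: Sequence[float],
             beta: float = 1.0) -> float:
    """Negative variational lower bound for one sample.

    `beta` is a hypothetical weight on the KL term; beta = 1 gives the plain bound.
    """
    return reconstruction_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # One pixel with two features and a one-dimensional latent vector.
    print(vae_loss([1.0, 2.0], [0.0, 2.0], mu=[1.0], log_var=[0.0]))
```
]]></ab>
<p>Minimizing the squared reconstruction error plays the role of maximizing the expected reconstruction likelihood, so the sum above is minimized in the same way as the loss described in the text.</p></div>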
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Domain Adaptation on On-track Source and Target Domains</head><p>The domain adaptation module for the on-track source and target domains consists of four components: (1) a VAE for the on-track source domain, (2) a VAE for the on-track target domain, (3) domain alignment, and (4) classifiers on the source domain and target domain.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) VAE fine tuning:</head><p>We propose a new method to enhance feature representation learning by transferring the knowledge from large-scale off-track data to on-track data. The structure and relationships among the off-track data can be learned via a self-supervised VAE and transferred to on-track data by exploiting the idea of transfer learning. To implement the transfer learning based feature representation learning, we use a pre-training and fine-tuning approach. First, the encoder layer of the VAE is pre-trained on off-track target domain data as introduced in Section III-B, and then the pre-trained model with updated parameters is loaded into our model to give it prior 'knowledge' before training. In particular, transfer learning with the help of off-track target domain data is performed for the external learning of the backbone structure and mapping. The weights and parameters of the encoders are optimized during pre-training, and the outputs of pre-training are used as starting parameters to train a similar network on on-track target domain data.</p><p>Our VAE-based domain adaptation model utilizes two networks: one for the source domain (CALIOP) and another for the target domain (VIIRS). The model optimizes VAE losses for both networks: <formula notation="tex">\mathcal{L}_{vae}^{s} + \mathcal{L}_{vae}^{t}</formula>, where the superscripts s and t denote the source and target networks.</p><p>2) Domain alignment: The use of VAE can uncover hidden features in input data, but it can be difficult to extract domain-invariant features due to a significant difference between the source and target domains. To address this issue, we propose a domain alignment module that reduces the discrepancy between the source and target domains and improves the robustness of the classifier in the target domain. 
Specifically, we add a feature adaptation layer to the VAE for both the source and target domains and use the maximum mean discrepancy (MMD) <ref type="bibr">[23]</ref>, <ref type="bibr">[22]</ref> as a metric for measuring the difference between the domains. In particular, it maps the features of both domains to a common reproducing kernel Hilbert space (RKHS) so that the distance between distributions can be represented as the distance between their kernel embeddings.</p><p>The VAE encoder layers <formula notation="tex">E_s</formula> and <formula notation="tex">E_t</formula> are applied to source domain data and target domain data, respectively, resulting in hidden features represented as <formula notation="tex">E_s(X_s)</formula> and <formula notation="tex">E_t(X_t)</formula>. The domain alignment loss that needs to be optimized is defined as: <formula notation="tex">\mathcal{L}_{mmd} = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \phi\left(E_s(x_i^s)\right) - \frac{1}{n_t} \sum_{j=1}^{n_t} \phi\left(E_t(x_j^t)\right) \right\|_{\mathcal{H}}^2</formula>, where <formula notation="tex">\phi</formula> is the feature map into the RKHS <formula notation="tex">\mathcal{H}</formula>.</p><p>3) Weakly supervised classifiers with focal loss: In our remote sensing application, the classifier predicts three cloud properties: Clear Sky, Liquid Cloud, and Ice Cloud. The labels for the source domain (CALIOP) data are considered accurate and serve as the ground truth due to the active remote sensing method used. However, the labels for the target domain (VIIRS) data are generated by a physically-based retrieval algorithm and are therefore less accurate; we refer to them as "weak labels". These labels contain only three cloud types (Clear Sky, Liquid Cloud, Ice Cloud) and are only 86% accurate when compared to the ground truth (CALIOP) labels. In addition, the cloud types studied in our remote sensing application are imbalanced.</p><p>To encourage the classifier to account for the difficulty of classifying samples and to confront the challenge of class imbalance, we introduce a weighted focal loss that down-weights the loss for well-classified examples, up-weights the loss for misclassified examples, and adds a weight for class imbalance. This is achieved by introducing a modulating factor (referred to as &#947;) in the cross-entropy loss, which makes the loss function more focused on the hard examples. 
The weighted focal loss is defined as: <formula notation="tex">FL(p_t) = -w \, (1 - p_t)^{\gamma} \log(p_t)</formula>, where <formula notation="tex">p_t = p</formula> for the true class and <formula notation="tex">1 - p</formula> otherwise, p is the predicted class probability, and w is the rescaling weight given to each class.</p><p>Our model utilizes the latent feature vectors obtained from the VAEs to construct a source classifier <formula notation="tex">C_s</formula> and a target classifier <formula notation="tex">C_t</formula> by adding fully connected feature layers and ReLU activation functions to the encoders of the source and target domains, respectively. The weighted focal loss <formula notation="tex">FL_c^s</formula> is used for source labels and the weighted focal loss <formula notation="tex">FL_c^t</formula> for target labels. Additionally, the fully connected feature layers for the source and target domains share the same weights, resulting in a classifier that is invariant to the domain.</p></div>
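<div xmlns="http://www.tei-c.org/ns/1.0"><p>The domain alignment and classification losses described in this section can be sketched together. The sketch below is illustrative only: it assumes a Gaussian kernel with a hypothetical bandwidth for the MMD estimate (the paper does not state its kernel here), uses the simple biased estimator of the squared MMD, and picks gamma = 2.0 as a hypothetical default for the focal loss. All function names are illustrative.</p>
<ab type="code"><![CDATA[
```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the bandwidth value is an assumption."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(source: Sequence[Vector], target: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of latent features."""
    def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def weighted_focal_loss(probs: Sequence[float], label: int,
                        class_weights: Sequence[float], gamma: float = 2.0) -> float:
    """Weighted focal loss for one sample.

    `probs` holds the predicted class probabilities and `label` is the index of
    the true (or weak) label, so p_t is the probability assigned to that class.
    """
    p_t = max(probs[label], 1e-12)  # clamp to avoid log(0)
    return -class_weights[label] * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    source_latents = [[0.0, 0.1], [0.2, 0.0]]
    target_latents = [[1.0, 1.1], [0.9, 1.0]]
    print(mmd_squared(source_latents, target_latents))
    # Three classes in the order Clear Sky, Liquid Cloud, Ice Cloud.
    print(weighted_focal_loss([0.7, 0.2, 0.1], label=0, class_weights=[1.0, 1.0, 1.0]))
```
]]></ab>
<p>With gamma set to 0 and unit class weights, the focal loss reduces to the ordinary cross-entropy, which makes the effect of the modulating factor easy to check.</p></div>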
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. EXPERIMENTS AND EVALUATIONS</head><p>We conduct experiments on real-world remote sensing datasets to compare the performance of our proposed model with state-of-the-art ML models.</p><p>A. Dataset 1) Training data: Our experiments are performed using satellite remote sensing datasets from the CALIOP active sensor (source) and the VIIRS passive sensor (target). The source domain data from CALIOP includes 25 attributes, collected by the CALIOP active spaceborne Lidar sensor. The target domain data from VIIRS includes 20 attributes, collected by the VIIRS passive spectroradiometer sensor. The details of the attributes can be found in <ref type="bibr">[24]</ref>.</p><p>The ground truth labels for the source domain are obtained from the aerosol-free pixels of CALIOP, which are divided into three categories: Clear Sky, Pure Liquid Cloud, and Pure Ice Cloud. The weak labels for the target domain are obtained from the aerosol-free pixels of the VIIRS Cloud Top and Optical Properties Product <ref type="bibr">[25]</ref>. It should be noted that 86% of the VIIRS data points are found to match the labels of the corresponding CALIOP data points.</p><p>Our training datasets consist of an off-track target domain (VIIRS) dataset and on-track collocated source domain (CALIOP) and target domain (VIIRS) datasets. The off-track VIIRS dataset has 600k samples randomly selected from the year 2016 VIIRS data. The on-track dataset is collocated from the January 2016 CALIOP and VIIRS datasets and contains 1,197,536 data points.</p><p>2) Testing data: The CATS is a lidar remote-sensing instrument that measures atmospheric components from the International Space Station (ISS), which has a non-sun-synchronous orbit <ref type="bibr">[26]</ref>, <ref type="bibr">[27]</ref>. Launched on January 10th, 2015, CATS provides more than two years of continuous observations of clouds and aerosols. 
Similar to CALIPSO, backscattered signals in the 1064nm channel are used to identify many types of clouds and aerosols and derive vertical structures along the ISS orbit <ref type="bibr">[27]</ref>.</p><p>In this study, we collocate testing data from VIIRS and CATS for the entire CATS operation period (March 2015 - October 2017). Level-1 observations from VIIRS at native 750-m resolution in 16 VIIRS M-bands <ref type="bibr">[28]</ref> and vertically resolved cloud-aerosol types from the CATS Level-2 Layer product at 5-km resolution <ref type="bibr">[27]</ref> are collected. Since the two instruments have different platforms and orbit types, observations outside a 15-minute window and a 5-km distance are discarded. In total, 420k, 740k, and 605k pixels are collected to evaluate our models on the 2015, 2016, and 2017 datasets, respectively.</p><p>For model testing purposes, we create pixel labels using collocated CATS Level-2 Layer products. Similar to the labels used for model training, we also use clear sky, pure liquid cloud, and pure ice cloud to represent all kinds of pixel types. In particular, clear means no cloud, and liquid (ice) means only liquid (ice)-phase clouds are found in the whole column. It is important to emphasize that, as CALIPSO has dual 532 and 1064nm channels which offer better aerosol detection capabilities <ref type="bibr">[29]</ref>, the generated labels from CALIPSO and CATS may have some differences.</p><p>3) Evaluation metrics: The first evaluation metric we use to compare all the models is accuracy, which is defined as: <formula notation="tex">\mathrm{Accuracy} = \frac{\text{Total number of correct predictions}}{\text{Total number of data points}} \quad (4)</formula></p><p>We also leverage receiver operating characteristic (ROC) curves to assess and compare the efficacy of various models. A ROC curve graphically illustrates the relationship between the true positive rate (TPR), commonly known as sensitivity, and the false positive rate (FPR), referred to as 1-specificity, across different classification score thresholds. 
By integrating the curve (TPR values) with respect to the FPR values from zero to one, we calculate the area under the ROC curve (AUC). This AUC metric serves as a comprehensive performance indicator, encapsulating the model's effectiveness across all conceivable thresholds. The AUC values fall within the range of [0, 1], with larger AUC values indicating superior performance.</p></div>
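<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two evaluation metrics described above can be sketched as follows. The sketch assumes binary labels for the ROC curve, so a three-class problem such as the cloud types here would be handled one class at a time (one class versus the rest); that one-vs-rest treatment and the function names are assumptions made for illustration. The area under the curve is obtained with the trapezoidal rule over the (FPR, TPR) points.</p>
<ab type="code"><![CDATA[
```python
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of data points whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the score threshold step by step.

    `labels` uses 1 for the positive class and 0 for the rest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to draw a ROC curve")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))
    print(auc(roc_points([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])))
```
]]></ab>
<p>A classifier that ranks every positive sample above every negative one reaches an AUC of 1, while scores that carry no information give an AUC of 0.5.</p></div>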
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Quantitative Evaluation</head><p>For domain adaptation model comparisons, we conducted experiments on five models that use both source and target data variables (features). These comparison models include:</p><p>As shown in Table <ref type="table">I</ref>, our proposed model outperforms the other domain adaptation baselines significantly. Firstly, our method improves accuracy by 33% and AUC by 9%, averaged over the predictions for all three years, when compared to the model without the DDM module. The low accuracy (around 54%) in predicting without domain mapping illustrates the difficulties in representing heterogeneous data and the challenges of directly using existing domain adaptation methods in such domains. Our proposed deep domain mapping method can bridge the gap between heterogeneous source and target domains and extract a domain-invariant representation by combining it with a domain adaptation technique.</p><p>Secondly, we see that the weighted focal loss improves our method's accuracy by about 1.1% for all three years used in the evaluation. Thirdly, DANN <ref type="bibr">[30]</ref> and DSAN <ref type="bibr">[31]</ref>, the state-of-the-art domain adaptation techniques, only achieve about 42% and 50% accuracy, respectively, about 40% lower than our DRLO method. This demonstrates that existing domain adaptation techniques, which are primarily designed to address distribution shifts in homogeneous domains, have inferior performance when applied to heterogeneous domains. 
Finally, we also observe that the proposed model outperforms the domain adaptation methods developed for satellite remote sensing data in <ref type="bibr">[21]</ref> (DAMA-WL) and <ref type="bibr">[22]</ref>, showing around 7% and 6% accuracy improvement, respectively, for all three testing datasets.</p><p>We further illustrate the classification performance for each cloud label using ROC-AUC on the CATS 2015, 2016, and 2017 evaluation datasets, as shown in Figure <ref type="figure">3</ref>, in which the clear sky classification has superior performance over liquid and ice clouds. Moreover, the similar ROC-AUC performance on datasets of different years also demonstrates that DRLO is stable and robust enough to classify cloud types over long periods when trained on previous years' data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Qualitative Evaluation</head><p>We conduct extensive climatology evaluations of the proposed approach for satellite remote sensing cloud properties. We collaborate with climatology domain experts in our team to conduct a scientific evaluation and compare the datasets generated by the proposed approach with physically-based algorithms and current EOSDIS level-2 products through multiple aspects, including statistics, climatology, ground observation, and ad hoc case studies. We compare the climatology of a variety of cloud bulk properties, such as cloud fraction and cloud phase (Liquid/Ice cloud fraction), for the VIIRS off-track data.</p><p>Figure <ref type="figure">4</ref> shows the cloud and liquid cloud fraction of our model's prediction versus the sensor zenith angle. A good prediction model that captures the inherent characteristics of the data should be independent of the sensor's zenith angle. As the bars are nearly flat across the zenith angles in Figure <ref type="figure">4</ref> for both cloud fraction and liquid cloud fraction, our model's prediction is independent of the sensor's zenith angle and captures good representations of the data.</p><p>We also conduct a qualitative evaluation with the help of our NASA collaborators from climate science to compare DRLO and DAMA-WL by visualizing the predictions on a granule of VIIRS data, as shown in Figure <ref type="figure">5</ref>. The clear sky is shown in blue, the ice cloud is shown in red, and the liquid cloud is shown in green. Compared to DAMA-WL, DRLO's prediction is more balanced, revealing more ice clouds (red) and clear sky pixels (blue), due to the better representation learned by our pre-training/fine-tuning technique and the weighted focal loss used to deal with the imbalanced data. 
Moreover, in the regions dominated by clear sky (blue), the proposed method is able to identify some ice clouds that the DAMA-WL method fails to detect.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. CONCLUSION</head><p>This paper introduces a novel representation learning method using a pre-trained and fine-tuned VAE together with domain adaptation to predict cloud types for off-track remote sensing data. The method pre-trains a VAE-based generative model using large-scale off-track data to capture the structures of unlabeled data. It then uses a fine-tuning strategy to load the pre-trained VAE model into a domain adaptation network to learn domain-invariant representations from multiple satellite remote sensing datasets. The method then uses these representations to classify different cloud types in passive sensing data, with the goal of achieving higher accuracy in cloud property retrieval. Extensive quantitative and qualitative evaluations show that this method outperforms other state-of-the-art machine learning methods. We also want to highlight our work's impact, as better cloud retrieval from the proposed model could help in understanding climate change.</p></div></body>
		</text>
</TEI>
