Title: Scalable Aggregation Service for Satellite Remote Sensing Data
With the advances of satellite remote sensing techniques, we are receiving a huge amount of satellite observation data for the Earth. While the data greatly helps Earth scientists in their research, conducting data processing and analytics on the data is getting more and more time-consuming and complicated. One common data processing task is to aggregate satellite observation data from the original pixel level to the latitude-longitude grid level to easily obtain global information and work with global climate models. This paper focuses on how to best aggregate NASA MODIS satellite data products from pixel level to grid level in a distributed environment and provision the aggregation capability as a service for Earth scientists to use easily. We propose three different approaches of parallel data aggregation and employ three parallel platforms (Spark, Dask and MPI) to implement the approaches. We run extensive experiments based on these parallel approaches and platforms on a local cluster to benchmark their differences in execution performance and discuss key factors to achieve good speedup. We also study how to make the provisioned service adaptable to different service libraries and protocols via a unified framework.
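As a rough illustration of the pixel-to-grid aggregation task described in the abstract, the following Python sketch averages pixel values into latitude-longitude cells. It is not the authors' implementation: the function names (`partial_aggregate`, `merge_partials`, `cell_means`), the 1-degree default cell size, and the per-chunk sum/count merge are all assumptions chosen for illustration. The abstract does not describe the three parallel approaches in detail, so the chunked reduction here only stands in for work that a platform such as Spark, Dask or MPI could distribute.

```python
import math
from collections import defaultdict


def cell_index(lat, lon, cell_deg=1.0):
    """Map a (lat, lon) pair to a (row, col) grid cell index (hypothetical layout)."""
    n_rows = int(round(180.0 / cell_deg))
    n_cols = int(round(360.0 / cell_deg))
    # Clamp so that lat = 90 and lon = 180 fall into the last cell.
    row = min(math.floor((lat + 90.0) / cell_deg), n_rows - 1)
    col = min(math.floor((lon + 180.0) / cell_deg), n_cols - 1)
    return row, col


def partial_aggregate(pixels, cell_deg=1.0):
    """Reduce one chunk of (lat, lon, value) pixels to per-cell [sum, count]."""
    acc = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in pixels:
        if value is None or math.isnan(value):
            continue  # skip missing observations
        cell = cell_index(lat, lon, cell_deg)
        acc[cell][0] += value
        acc[cell][1] += 1
    return dict(acc)


def merge_partials(partials):
    """Combine per-chunk results; sums and counts add, so merge order is irrelevant."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for cell, (total, count) in partial.items():
            merged[cell][0] += total
            merged[cell][1] += count
    return dict(merged)


def cell_means(merged):
    """Turn per-cell [sum, count] pairs into per-cell means."""
    return {cell: total / count for cell, (total, count) in merged.items()}


# Two chunks that could be processed by separate workers.
chunk_a = [(10.2, 20.7, 1.0), (10.5, 20.5, float("nan"))]
chunk_b = [(10.8, 20.1, 3.0), (-45.5, 100.5, 5.0)]
grid = cell_means(merge_partials([partial_aggregate(chunk_a),
                                  partial_aggregate(chunk_b)]))
```

Keeping a sum and a count per cell, rather than a running mean, is what lets each chunk be reduced independently and merged afterwards in any order.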
MODIS (Moderate Resolution Imaging Spectroradiometer) is a key instrument onboard NASA’s Terra (launched in 1999) and Aqua (launched in 2002) satellite missions as part of the more extensive Earth Observation System (EOS). By measuring the reflection and emission by the Earth-Atmosphere system in 36 spectral bands from the visible to thermal infrared with near-daily global coverage and high-spatial-resolution (250 m ~ 1 km at nadir), MODIS is playing a vital role in developing validated, global, interactive Earth system models. MODIS products are processed into three levels, i.e., Level-1 (L1), Level-2 (L2) and Level-3 (L3). To shift the current static and “one-size-fits-all” data provision method of MODIS products, in this paper, we propose a service-oriented flexible and efficient MODIS aggregation framework. Using this framework, users only need to get aggregated MODIS L3 data based on their unique requirements and the aggregation can run in parallel to achieve a speedup. The experiments show that our aggregation results are almost identical to the current MODIS L3 products and our parallel execution with 8 computing nodes can work 88.63 times faster than a serial code execution on a single node.
Surface albedo is a fundamental radiative parameter as it controls the Earth’s energy budget and directly affects the Earth’s climate. Satellite observations have long been used to capture the temporal and spatial variations of surface albedo because of their continuous global coverage. However, space-based albedo products are often affected by errors in the atmospheric correction, multi-angular bi-directional reflectance distribution function (BRDF) modelling, as well as spectral conversions. To validate space-based albedo products, an in situ tower albedometer is often used to provide continuous “ground truth” measurements of surface albedo over an extended area. Since space-based albedo and tower-measured albedo are produced at different spatial scales, they can be directly compared only for specific homogeneous land surfaces. However, most land surfaces are inherently heterogeneous with surface properties that vary over a wide range of spatial scales. In this work, tower-measured albedo products, including both directional hemispherical reflectance (DHR) and bi-hemispherical reflectance (BHR), are upscaled to coarse satellite spatial resolutions using a new method. This strategy uses high-resolution satellite derived surface albedos to fill the gaps between the albedometer’s field-of-view (FoV) and coarse satellite scales. The high-resolution surface albedo is generated from a combination of surface reflectance retrieved from high-resolution Earth Observation (HR-EO) data and moderate resolution imaging spectroradiometer (MODIS) BRDF climatology over a larger area. We implemented a recently developed atmospheric correction method, the Sensor Invariant Atmospheric Correction (SIAC), to retrieve surface reflectance from HR-EO (e.g., Sentinel-2 and Landsat-8) top-of-atmosphere (TOA) reflectance measurements. 
This SIAC processing provides an estimated uncertainty for the retrieved surface spectral reflectance at the HR-EO pixel level and shows excellent agreement with the standard Landsat 8 Surface Reflectance Code (LaSRC) in retrieving Landsat-8 surface reflectance. Atmospheric correction of Sentinel-2 data is vastly improved by SIAC when compared against the use of in situ AErosol RObotic NETwork (AERONET) data. Based on this, we can trace the uncertainty of tower-measured albedo during its propagation through high-resolution EO measurements up to coarse satellite scales. These upscaled albedo products can then be compared with space-based albedo products over heterogeneous land surfaces. In this study, both tower-measured albedo and upscaled albedo products are examined at Ground Based Observation for Validation (GbOV) stations (https://land.copernicus.eu/global/gbov/), and used to compare with satellite observations, including Copernicus Global Land Service (CGLS) based on ProbaV and VEGETATION 2 data, MODIS and multi-angle imaging spectroradiometer (MISR).
Surface albedo is of crucial interest in land–climate interaction studies, since it is a key parameter that affects the Earth’s radiation budget. The temporal and spatial variation of surface albedo can be retrieved from conventional satellite observations after a series of processes, including atmospheric correction to surface spectral bi-directional reflectance factor (BRF), bi-directional reflectance distribution function (BRDF) modelling using these BRFs, and, where required, narrow-to-broadband albedo conversions. This processing chain introduces errors that can accumulate and then affect the accuracy of the retrieved albedo products. In this study, the albedo products derived from the multi-angle imaging spectroradiometer (MISR), moderate resolution imaging spectroradiometer (MODIS) and the Copernicus Global Land Service (CGLS), based on the VEGETATION and now the PROBA-V sensors, are compared with albedometer and upscaled in situ measurements from 19 tower sites in the FLUXNET, Surface Radiation Budget Network (SURFRAD) and Baseline Surface Radiation Network (BSRN) networks. The MISR sensor onboard the Terra satellite has 9 cameras at different view angles, which allows a near-simultaneous retrieval of surface albedo. Using a 16-day retrieval algorithm, MODIS generates daily albedo products (MCD43A) at a 500-m resolution. The CGLS albedo products are derived from VEGETATION and PROBA-V, and updated every 10 days using a weighted 30-day window. We describe a newly developed method to derive the two types of albedo, which are directional hemispherical reflectance (DHR) and bi-hemispherical reflectance (BHR), directly from three tower-measured variables of shortwave radiation: downwelling, upwelling and diffuse shortwave radiation. In the validation process, the MISR, MODIS and CGLS-derived albedos (DHR and BHR) are first compared with tower-measured albedos, using pixel-to-point analysis, between 2012 and 2016.
The tower-measured point albedos are then upscaled to coarse-resolution albedos, based on atmospherically corrected BRFs from high-resolution Earth observation (HR-EO) data, alongside MODIS BRDF climatology from a larger area. Then a pixel-to-pixel comparison is performed between DHR and BHR retrieved from coarse-resolution satellite observations and DHR and BHR upscaled from accurate tower measurements. Experimental results are presented that explore the parameter space associated with land cover type, heterogeneous vs. homogeneous surfaces, and instantaneous vs. time-composite retrievals of surface albedo.
Barajas, Carlos; Guo, Pei; Mukherjee, Lipi; Hoban, Susan; Wang, Jianwu; Jin, Daeho; Gangopadhyay, Aryya; Gobbert, Matthias K
(International Symposium on Benchmarking, Measuring and Optimization)
The study of clouds, i.e., where they occur and what their characteristics are, plays a key role in the understanding of climate change. Clustering is a common machine learning technique used in atmospheric science to classify cloud types. Many parallelism techniques, e.g., MPI, OpenMP and Spark, could achieve efficient and scalable clustering of large-scale satellite observation data. In order to understand their differences, this paper studies and compares three different approaches to parallel clustering of satellite observation data. Benchmarking experiments with k-means clustering are conducted with three parallelism techniques, namely OpenMP, OpenMP+MPI, and Spark, on an HPC cluster using up to 16 nodes.
Buscombe, Daniel; Wernette, Phillipe; Fitzpatrick, Sharon; Favela, Jaycee; Goldstein, Evan B.; Enwright, Nicholas M.
(Scientific Data)
The world’s coastlines are spatially highly variable, coupled-human-natural systems that comprise a nested hierarchy of component landforms, ecosystems, and human interventions, each interacting over a range of space and time scales. Understanding and predicting coastline dynamics necessitates frequent observation from imaging sensors on remote sensing platforms. Machine Learning models that carry out supervised (i.e., human-guided) pixel-based classification, or image segmentation, have transformative applications in spatio-temporal mapping of dynamic environments, including transient coastal landforms, sediments, habitats, waterbodies, and water flows. However, these models require large and well-documented training and testing datasets consisting of labeled imagery. We describe “Coast Train,” a multi-labeler dataset of orthomosaic and satellite images of coastal environments and corresponding labels. These data include imagery that is diverse in space and time, and contain 1.2 billion labeled pixels, representing over 3.6 million hectares. We use a human-in-the-loop tool especially designed for rapid and reproducible Earth surface image segmentation. Our approach permits image labeling by multiple labelers, in turn enabling quantification of pixel-level agreement over individual and collections of images.
Wang, Jianwu, Huang, Xin, Zheng, Jianyu, Rajapakshe, Chamara, Kay, Savio, Kandoor, Lakshmi, Maxwell, Thomas, and Zhang, Zhibo. Scalable Aggregation Service for Satellite Remote Sensing Data. Retrieved from https://par.nsf.gov/biblio/10303961. Proceedings of the 20th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2020). doi:10.1007/978-3-030-60239-0_13.
Wang, Jianwu, Huang, Xin, Zheng, Jianyu, Rajapakshe, Chamara, Kay, Savio, Kandoor, Lakshmi, Maxwell, Thomas, & Zhang, Zhibo. Scalable Aggregation Service for Satellite Remote Sensing Data. Proceedings of the 20th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2020). Retrieved from https://par.nsf.gov/biblio/10303961. https://doi.org/10.1007/978-3-030-60239-0_13
Wang, Jianwu, Huang, Xin, Zheng, Jianyu, Rajapakshe, Chamara, Kay, Savio, Kandoor, Lakshmi, Maxwell, Thomas, and Zhang, Zhibo. "Scalable Aggregation Service for Satellite Remote Sensing Data". Proceedings of the 20th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2020). https://doi.org/10.1007/978-3-030-60239-0_13. https://par.nsf.gov/biblio/10303961.
@article{osti_10303961,
place = {Country unknown/Code not available},
title = {Scalable Aggregation Service for Satellite Remote Sensing Data},
url = {https://par.nsf.gov/biblio/10303961},
DOI = {10.1007/978-3-030-60239-0_13},
abstractNote = {With the advances of satellite remote sensing techniques, we are receiving a huge amount of satellite observation data for the Earth. While the data greatly helps Earth scientists in their research, conducting data processing and analytics on the data is getting more and more time-consuming and complicated. One common data processing task is to aggregate satellite observation data from the original pixel level to the latitude-longitude grid level to easily obtain global information and work with global climate models. This paper focuses on how to best aggregate NASA MODIS satellite data products from pixel level to grid level in a distributed environment and provision the aggregation capability as a service for Earth scientists to use easily. We propose three different approaches of parallel data aggregation and employ three parallel platforms (Spark, Dask and MPI) to implement the approaches. We run extensive experiments based on these parallel approaches and platforms on a local cluster to benchmark their differences in execution performance and discuss key factors to achieve good speedup. We also study how to make the provisioned service adaptable to different service libraries and protocols via a unified framework.},
journal = {Proceedings of the 20th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2020)},
author = {Wang, Jianwu and Huang, Xin and Zheng, Jianyu and Rajapakshe, Chamara and Kay, Savio and Kandoor, Lakshmi and Maxwell, Thomas and Zhang, Zhibo},
}