Title: Scalable Aggregation Service for Satellite Remote Sensing Data
With advances in satellite remote sensing techniques, we are receiving a huge amount of satellite observation data of the Earth. While the data greatly help Earth scientists in their research, processing and analyzing the data is becoming increasingly time-consuming and complicated. One common data processing task is to aggregate satellite observation data from the original pixel level to the latitude-longitude grid level, which makes it easier to obtain global information and to work with global climate models. This paper focuses on how best to aggregate NASA MODIS satellite data products from pixel level to grid level in a distributed environment and how to provision the aggregation capability as a service that Earth scientists can use easily. We propose three different approaches to parallel data aggregation and employ three parallel platforms (Spark, Dask and MPI) to implement them. We run extensive experiments with these parallel approaches and platforms on a local cluster to benchmark their differences in execution performance and discuss the key factors in achieving good speedup. We also study how to make the provisioned service adaptable to different service libraries and protocols via a unified framework.
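
As a rough illustration of the pixel-to-grid aggregation described above, the following minimal sketch bins pixel values into a regular latitude-longitude grid and merges partial results computed on separate chunks. It is not one of the paper's three approaches or its implementation: the function names `aggregate_to_grid` and `merge_partials`, the 1-degree cell size, and the plain-mean statistic are all assumptions made for this example, and the inputs are assumed to be arrays already read from the data files. Keeping per-cell sums and counts (rather than means) is what allows chunks to be processed independently and combined afterwards, which is the kind of reduction that platforms such as Spark, Dask or MPI can distribute.

```python
import numpy as np


def aggregate_to_grid(lat, lon, values, cell_deg=1.0):
    """Return per-cell (sum, count) grids for one chunk of pixels.

    lat, lon, values are 1-D arrays of equal length (assumed inputs);
    NaN values are treated as missing and skipped.
    """
    n_lat = int(round(180.0 / cell_deg))
    n_lon = int(round(360.0 / cell_deg))
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    values = np.asarray(values, dtype=float)

    # Index of the grid cell containing each pixel. Clipping puts
    # lat = 90 and lon = 180 into the last row and column.
    row = np.clip(((lat + 90.0) / cell_deg).astype(int), 0, n_lat - 1)
    col = np.clip(((lon + 180.0) / cell_deg).astype(int), 0, n_lon - 1)
    flat = row * n_lon + col

    valid = ~np.isnan(values)
    n_cells = n_lat * n_lon
    sums = np.bincount(flat[valid], weights=values[valid], minlength=n_cells)
    counts = np.bincount(flat[valid], minlength=n_cells)
    return sums.reshape(n_lat, n_lon), counts.reshape(n_lat, n_lon)


def merge_partials(partials):
    """Combine (sum, count) pairs from several chunks into a mean grid."""
    total_sum = sum(p[0] for p in partials)
    total_count = sum(p[1] for p in partials)
    # Cells with no valid pixels become NaN instead of raising a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total_count > 0, total_sum / total_count, np.nan)
```

In a distributed setting, each worker would call `aggregate_to_grid` on its own files and a final reduce step would call `merge_partials`; how the work is partitioned is exactly the kind of design choice the abstract says the paper benchmarks.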
Award ID(s):
1730250 1726023
NSF-PAR ID:
10303961
Journal Name:
Proceedings of the 20th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2020)
Sponsoring Org:
National Science Foundation
More Like this
  1. MODIS (Moderate Resolution Imaging Spectroradiometer) is a key instrument onboard NASA’s Terra (launched in 1999) and Aqua (launched in 2002) satellite missions, which are part of the broader Earth Observing System (EOS). By measuring the reflection and emission of the Earth-atmosphere system in 36 spectral bands from the visible to the thermal infrared, with near-daily global coverage and high spatial resolution (250 m to 1 km at nadir), MODIS plays a vital role in developing validated, global, interactive Earth system models. MODIS products are processed into three levels: Level-1 (L1), Level-2 (L2) and Level-3 (L3). To move beyond the current static, “one-size-fits-all” method of providing MODIS products, in this paper we propose a flexible and efficient service-oriented MODIS aggregation framework. With this framework, users only need to obtain aggregated MODIS L3 data tailored to their own requirements, and the aggregation can run in parallel to achieve a speedup. The experiments show that our aggregation results are almost identical to the current MODIS L3 products and that our parallel execution with 8 computing nodes runs 88.63 times faster than serial code execution on a single node.
  2.
    Surface albedo is a fundamental radiative parameter as it controls the Earth’s energy budget and directly affects the Earth’s climate. Satellite observations have long been used to capture the temporal and spatial variations of surface albedo because of their continuous global coverage. However, space-based albedo products are often affected by errors in the atmospheric correction, multi-angular bi-directional reflectance distribution function (BRDF) modelling, as well as spectral conversions. To validate space-based albedo products, an in situ tower albedometer is often used to provide continuous “ground truth” measurements of surface albedo over an extended area. Since space-based albedo and tower-measured albedo are produced at different spatial scales, they can be directly compared only for specific homogeneous land surfaces. However, most land surfaces are inherently heterogeneous with surface properties that vary over a wide range of spatial scales. In this work, tower-measured albedo products, including both directional hemispherical reflectance (DHR) and bi-hemispherical reflectance (BHR), are upscaled to coarse satellite spatial resolutions using a new method. This strategy uses high-resolution satellite derived surface albedos to fill the gaps between the albedometer’s field-of-view (FoV) and coarse satellite scales. The high-resolution surface albedo is generated from a combination of surface reflectance retrieved from high-resolution Earth Observation (HR-EO) data and moderate resolution imaging spectroradiometer (MODIS) BRDF climatology over a larger area. We implemented a recently developed atmospheric correction method, the Sensor Invariant Atmospheric Correction (SIAC), to retrieve surface reflectance from HR-EO (e.g., Sentinel-2 and Landsat-8) top-of-atmosphere (TOA) reflectance measurements. 
This SIAC processing provides an estimated uncertainty for the retrieved surface spectral reflectance at the HR-EO pixel level and shows excellent agreement with the standard Landsat 8 Surface Reflectance Code (LaSRC) in retrieving Landsat-8 surface reflectance. Atmospheric correction of Sentinel-2 data is vastly improved by SIAC when compared against the use of in situ AErosol RObotic NETwork (AERONET) data. Based on this, we can trace the uncertainty of tower-measured albedo during its propagation through high-resolution EO measurements up to coarse satellite scales. These upscaled albedo products can then be compared with space-based albedo products over heterogeneous land surfaces. In this study, both tower-measured albedo and upscaled albedo products are examined at Ground Based Observation for Validation (GbOV) stations (https://land.copernicus.eu/global/gbov/) and compared with satellite observations, including Copernicus Global Land Service (CGLS) products based on PROBA-V and VEGETATION 2 data, MODIS and the multi-angle imaging spectroradiometer (MISR).
  3. Surface albedo is of crucial interest in land–climate interaction studies, since it is a key parameter that affects the Earth’s radiation budget. The temporal and spatial variation of surface albedo can be retrieved from conventional satellite observations after a series of processes, including atmospheric correction to surface spectral bi-directional reflectance factor (BRF), bi-directional reflectance distribution function (BRDF) modelling using these BRFs, and, where required, narrow-to-broadband albedo conversions. This processing chain introduces errors that can accumulate and then affect the accuracy of the retrieved albedo products. In this study, the albedo products derived from the multi-angle imaging spectroradiometer (MISR), the moderate resolution imaging spectroradiometer (MODIS) and the Copernicus Global Land Service (CGLS), based on the VEGETATION and now the PROBA-V sensors, are compared with albedometer and upscaled in situ measurements from 19 tower sites from the FLUXNET network, the Surface Radiation Budget Network (SURFRAD) and the Baseline Surface Radiation Network (BSRN). The MISR sensor onboard the Terra satellite has 9 cameras at different view angles, which allows a near-simultaneous retrieval of surface albedo. Using a 16-day retrieval algorithm, MODIS generates daily albedo products (MCD43A) at a 500-m resolution. The CGLS albedo products are derived from VEGETATION and PROBA-V, and are updated every 10 days using a weighted 30-day window. We describe a newly developed method to derive the two types of albedo, directional hemispherical reflectance (DHR) and bi-hemispherical reflectance (BHR), directly from three tower-measured variables of shortwave radiation: downwelling, upwelling and diffuse shortwave radiation. In the validation process, the MISR, MODIS and CGLS-derived albedos (DHR and BHR) are first compared with tower-measured albedos, using pixel-to-point analysis, between 2012 and 2016. 
The tower-measured point albedos are then upscaled to coarse-resolution albedos, based on atmospherically corrected BRFs from high-resolution Earth observation (HR-EO) data, alongside MODIS BRDF climatology from a larger area. A pixel-to-pixel comparison is then performed between DHR and BHR retrieved from coarse-resolution satellite observations and DHR and BHR upscaled from accurate tower measurements. The experimental results explore the parameter space associated with land cover type, heterogeneous vs. homogeneous surfaces, and instantaneous vs. time-composite retrievals of surface albedo.
  4. Ocean colour is recognised as an Essential Climate Variable (ECV) by the Global Climate Observing System (GCOS); and spectrally-resolved water-leaving radiances (or remote-sensing reflectances) in the visible domain, and chlorophyll-a concentration are identified as required ECV products. Time series of the products at the global scale and at high spatial resolution, derived from ocean-colour data, are key to studying the dynamics of phytoplankton at seasonal and inter-annual scales; their role in marine biogeochemistry; the global carbon cycle; the modulation of how phytoplankton distribute solar-induced heat in the upper layers of the ocean; and the response of the marine ecosystem to climate variability and change. However, generating a long time series of these products from ocean-colour data is not a trivial task: algorithms that are best suited for climate studies have to be selected from a number that are available for atmospheric correction of the satellite signal and for retrieval of chlorophyll-a concentration; since satellites have a finite life span, data from multiple sensors have to be merged to create a single time series, and any uncorrected inter-sensor biases could introduce artefacts in the series, e.g., different sensors monitor radiances at different wavebands such that producing a consistent time series of reflectances is not straightforward. Another requirement is that the products have to be validated against in situ observations. Furthermore, the uncertainties in the products have to be quantified, ideally on a pixel-by-pixel basis, to facilitate applications and interpretations that are consistent with the quality of the data. 
This paper outlines an approach that was adopted for generating an ocean-colour time series for climate studies, using data from the MERIS (MEdium spectral Resolution Imaging Spectrometer) sensor of the European Space Agency; the SeaWiFS (Sea-viewing Wide-Field-of-view Sensor) and MODIS-Aqua (Moderate-resolution Imaging Spectroradiometer-Aqua) sensors from the National Aeronautics and Space Administration (USA); and VIIRS (Visible and Infrared Imaging Radiometer Suite) from the National Oceanic and Atmospheric Administration (USA). The time series now covers the period from late 1997 to the end of 2018. To ensure that the products meet, as well as possible, the requirements of the user community, marine-ecosystem modellers and remote-sensing scientists were consulted at the outset on their immediate and longer-term requirements as well as on their expectations of ocean-colour data for use in climate research. Taking the user requirements into account, a series of objective criteria was established, against which available algorithms for processing ocean-colour data were evaluated and ranked. The algorithms that performed best with respect to the climate user requirements were selected to process data from the satellite sensors. Remote-sensing reflectance data from MODIS-Aqua, MERIS, and VIIRS were band-shifted to match the wavebands of SeaWiFS. Overlapping data were used to correct for mean biases between sensors at every pixel. The remote-sensing reflectance data derived from the sensors were merged, and the selected in-water algorithm was applied to the merged data to generate maps of chlorophyll concentration, inherent optical properties at SeaWiFS wavelengths, and the diffuse attenuation coefficient at 490 nm. The merged products were validated against in situ observations. 
The uncertainties established on the basis of comparisons with in situ data were combined with an optical classification of the remote-sensing reflectance data using a fuzzy-logic approach, and were used to generate uncertainties (root mean square difference and bias) for each product at each pixel. 
  5. Assimilation of remote-sensing products of sea ice thickness (SIT) into sea ice–ocean models has been shown to improve the quality of sea ice forecasts. Key open questions are whether assimilation of lower-level data products such as radar freeboard (RFB) can further improve model performance and what performance gains can be achieved through joint assimilation of these data products in combination with a snow depth product. The Arctic Mission Benefit Analysis system was developed to address this type of question. Using the quantitative network design (QND) approach, the system can evaluate, in a mathematically rigorous fashion, the observational constraints imposed by individual data products and by groups of data products. We demonstrate the approach by presenting assessments of the observation impact (added value) of different Earth observation (EO) products in terms of the uncertainty reduction in a 4-week forecast of sea ice volume (SIV) and snow volume (SNV) for three regions along the Northern Sea Route in May 2015 using a coupled model of the sea ice–ocean system, specifically the Max Planck Institute Ocean Model. We assess seven satellite products: three real products and four hypothetical products. The real products are monthly SIT, sea ice freeboard (SIFB), and RFB, all derived from CryoSat-2 by the Alfred Wegener Institute. These are complemented by two hypothetical monthly laser freeboard (LFB) products with low and high accuracy, as well as two hypothetical monthly snow depth products with low and high accuracy. On the basis of the per-pixel uncertainty ranges provided with the CryoSat-2 SIT, SIFB, and RFB products, the SIT and RFB achieve a much better performance for SIV than the SIFB product. For SNV, the performance of SIT is low, the performance of SIFB is higher and the performance of RFB is higher still. A hypothetical LFB product with low accuracy (20 cm uncertainty) falls between SIFB and RFB in performance for both SIV and SNV. 
A reduction in the uncertainty of the LFB product to 2 cm yields a significant increase in performance. Combining either of the SIT or freeboard products with a hypothetical snow depth product achieves a significant performance increase. The uncertainty in the snow product matters: a higher-accuracy product achieves an extra performance gain. Providing spatial and temporal uncertainty correlations with the EO products would be beneficial not only for QND assessments, but also for assimilation of the products. 