

Title: TopoLens: Building a CyberGIS community data service for enhancing the usability of high‐resolution national topographic datasets
Summary

In recent years, geospatial data have exploded in volume and diversity, causing serious usability issues for researchers in various scientific areas. This paper describes a cyberGIS community data service framework that facilitates geospatial big data access, processing, and sharing based on a hybrid supercomputer architecture. Specifically, the framework aims to enhance the usability of the national elevation dataset released by the U.S. Geological Survey for the contiguous United States at arc‐second‐scale resolution. A community data service, TopoLens, is created to demonstrate the workflow integration of the national elevation dataset and the associated computation and analysis. Two user‐friendly environments, a publicly available web application and a private workspace based on the Jupyter notebook, let users access both precomputed and on‐demand computed high‐resolution elevation data. The system architecture of TopoLens is implemented on the ROGER supercomputer, the first cyberGIS supercomputer dedicated to geospatial problem‐solving. The usability of TopoLens has been acknowledged in an evaluation by the topographic user community.

 
NSF-PAR ID:
10075762
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Concurrency and Computation: Practice and Experience
Volume:
31
Issue:
16
ISSN:
1532-0626
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    The interdisciplinary field of cyberGIS (geographic information science and systems (GIS) based on advanced cyberinfrastructure) has a major focus on data‐ and computation‐intensive geospatial analytics. The rapidly growing needs across many application and science domains for such analytics based on disparate geospatial big data pose significant challenges to conventional GIS approaches. This paper describes CyberGIS‐Jupyter, an innovative cyberGIS framework for achieving data‐intensive, reproducible, and scalable geospatial analytics using Jupyter Notebook based on ROGER, the first cyberGIS supercomputer. The framework adapts the Notebook with built‐in cyberGIS capabilities to accelerate gateway application development and sharing, while associated data, analytics, and workflow runtime environments are encapsulated into application packages that can be elastically reproduced through cloud‐computing approaches. As a result, data‐intensive and scalable geospatial analytics can be efficiently developed and improved, and seamlessly reproduced among multidisciplinary users in a novel cyberGIS science gateway environment.

     
  2. Geospatial research and education have become increasingly dependent on cyberGIS to tackle computation and data challenges. However, using advanced cyberinfrastructure resources for geospatial research and education is extremely challenging because of the steep learning curve for users and the high software development and integration costs for developers, both of which stem from the limited availability of middleware tools that make such resources easily accessible. This tutorial describes CyberGIS-Compute, a middleware framework that addresses these challenges and provides access to high-performance resources through simple, easy-to-use interfaces. The CyberGIS-Compute framework provides an easy-to-use application interface and a Python SDK for accessing CyberGIS capabilities, allowing geospatial applications to scale easily and employ advanced cyberinfrastructure resources. In this tutorial, we will start with the basics of CyberGIS-Jupyter and CyberGIS-Compute, then introduce the Python SDK for CyberGIS-Compute with a simple Hello World example. Next, we will walk through multiple real-world geospatial application use cases, such as spatial accessibility and wildfire evacuation simulation using agent-based modeling. We will also provide pointers on how to contribute applications to the CyberGIS-Compute framework.
  3. Romanach, Stephanie S. (Ed.)
    Massive biological databases of species occurrences, or georeferenced locations where a species has been observed, are essential inputs for modeling present and future species distributions. Location accuracy is often assessed by determining whether the observation geocoordinates fall within the boundaries of the declared political divisions. This otherwise simple validation is complicated by the difficulty of matching political division names to the correct geospatial object. Spelling errors, abbreviations, alternative codes, and synonyms in multiple languages present daunting name disambiguation challenges. The inability to resolve political division names reduces usable data, and analysis of erroneous observations can lead to flawed results. Here, we present the Geographic Name Resolution Service (GNRS), an application for correcting, standardizing, and indexing world political division names. The GNRS resolves political division names against a reference database that combines names and codes from GeoNames with geospatial object identifiers from the Global Administrative Areas Database (GADM). In a trial resolution of political division names extracted from >270 million species occurrences, only 1.9%, representing just 6% of occurrences, matched exactly to GADM political divisions in their original form. The GNRS was able to resolve, completely or in part, 92% of the remaining 378,568 political division names, or 86% of the full biodiversity occurrence dataset. In assessing geocoordinate accuracy for >239 million species occurrences, resolution of political divisions by the GNRS enabled the detection of an order of magnitude more errors and an order of magnitude more error-free occurrences. By providing a novel solution to a significant data quality impediment, the GNRS liberates a tremendous amount of biodiversity data for quantitative biodiversity research. 
The GNRS runs as a web service and is accessible via an API, an R package, and a web-based graphical user interface. Its modular architecture is easily integrated into existing data validation workflows. 
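
The name disambiguation problem described above can be illustrated with a minimal sketch. This is not the GNRS algorithm or its API: the reference table, the `normalize` and `resolve` helpers, and the similarity cutoff are all hypothetical, and the fuzzy step uses Python's standard `difflib` rather than whatever matching the GNRS performs. It only shows the general idea of normalizing a submitted name, trying an exact lookup against known names, codes, and synonyms, and falling back to approximate matching for spelling errors.

```python
import difflib
import unicodedata

# Hypothetical reference table: canonical name -> known codes and synonyms.
REFERENCE = {
    "United States": ["USA", "US", "United States of America", "Estados Unidos"],
    "Brazil": ["BR", "BRA", "Brasil"],
    "Côte d'Ivoire": ["CI", "CIV", "Ivory Coast"],
}


def normalize(name):
    """Lowercase the name and strip accents and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_accents.lower() if c.isalnum() or c == " ")
    return kept.strip()


# Map every normalized variant back to its canonical name.
INDEX = {}
for canonical, variants in REFERENCE.items():
    for variant in [canonical, *variants]:
        INDEX[normalize(variant)] = canonical


def resolve(name, cutoff=0.8):
    """Return (canonical_name, match_type); canonical_name is None if unresolved."""
    key = normalize(name)
    if key in INDEX:
        return INDEX[key], "exact"
    # Fall back to approximate string matching to absorb spelling errors.
    close = difflib.get_close_matches(key, list(INDEX), n=1, cutoff=cutoff)
    if close:
        return INDEX[close[0]], "fuzzy"
    return None, "unresolved"


if __name__ == "__main__":
    for raw in ["Brasil", "Cote d'Ivoire", "Untied States", "Atlantis"]:
        print(raw, "->", resolve(raw))
```

In a validation workflow, the resolved canonical name would then be used to look up the matching geospatial object before testing whether the observation coordinates fall inside it.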
  4. Abstract

    Many chemical processes depend non‐linearly on temperature. Gravity‐wave‐induced temperature perturbations have been shown to affect atmospheric chemistry, but accounting for this process in chemistry‐climate models has been a challenge because many gravity waves have scales smaller than the typical model resolution. Here, we present a method to account for subgrid‐scale orographic gravity‐wave‐induced temperature perturbations on the global scale for the Whole Atmosphere Community Climate Model. Temperature perturbation amplitudes consistent with the model's subgrid‐scale gravity wave parameterization are derived and then used as a sinusoidal temperature perturbation in the model's chemistry solver. Because of limitations in the parameterization, we explore scaling of the amplitudes between 0.6 and 1 based on comparisons to altitude‐dependent distributions of satellite and reanalysis data, where we discuss uncertainties. We probe the impact on the chemistry from the grid‐point to global scales, and show that the parameterization is able to represent mountain wave events as reported by previous literature. The gravity waves, for example, lead to increased surface area densities of stratospheric aerosols. This increases chlorine activation, with impacts on the associated chemical composition. We obtain large local changes in some chemical species (e.g., active chlorine, NOx, N2O5), which are likely to be important for comparisons to airborne or satellite observations, but the changes to ozone loss are more modest. This approach enables the chemistry‐climate modeling community to account for subgrid‐scale gravity wave temperature perturbations interactively, consistent with the internal parameterizations, and is expected to yield more realistic interactions and a better representation of the chemistry.
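
Why a sinusoidal temperature perturbation matters for a non‐linear process can be shown with a small numerical sketch. It is not the model's chemistry solver: the Arrhenius‐type rate law, the activation energy, the prefactor, and the function names are assumptions chosen purely for illustration. The sketch averages a temperature‐dependent rate over one wave cycle and compares it with the rate evaluated at the mean temperature.

```python
import math

R_GAS = 8.314  # universal gas constant, J mol^-1 K^-1


def arrhenius(temp_k, prefactor=1.0, activation_energy=50_000.0):
    """Assumed Arrhenius-type rate; the parameter values are illustrative only."""
    return prefactor * math.exp(-activation_energy / (R_GAS * temp_k))


def wave_averaged_rate(mean_temp_k, amplitude_k, n_phases=360):
    """Average the rate over one cycle of T = mean + amplitude * sin(phase)."""
    total = 0.0
    for i in range(n_phases):
        phase = 2.0 * math.pi * i / n_phases
        total += arrhenius(mean_temp_k + amplitude_k * math.sin(phase))
    return total / n_phases


if __name__ == "__main__":
    base = arrhenius(200.0)
    for amplitude in (0.0, 2.0, 5.0):
        ratio = wave_averaged_rate(200.0, amplitude) / base
        print(f"amplitude {amplitude:.1f} K -> averaged rate / mean-T rate = {ratio:.3f}")
```

Because the assumed rate law is convex in temperature near 200 K, the cycle‐averaged rate exceeds the rate at the mean temperature, which is the kind of effect a model misses if it only sees the grid‐mean temperature.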

     
  5. Abstract

    Data required to calibrate uncertain general circulation model (GCM) parameterizations are often only available in limited regions or time periods, for example, observational data from field campaigns, or data generated in local high‐resolution simulations. This raises the question of where and when to acquire additional data to be maximally informative about parameterizations in a GCM. Here we construct a new ensemble‐based parallel algorithm to automatically target data acquisition to regions and times that maximize the uncertainty reduction, or information gain, about GCM parameters. The algorithm uses a Bayesian framework that exploits a quantified distribution of GCM parameters as a measure of uncertainty. This distribution is informed by time‐averaged climate statistics restricted to local regions and times. The algorithm is embedded in the recently developed calibrate‐emulate‐sample framework, which performs efficient model calibration and uncertainty quantification with far fewer model evaluations than are typically needed for traditional approaches to Bayesian calibration. We demonstrate the algorithm with an idealized GCM, with which we generate surrogates of local data. In this perfect‐model setting, we calibrate parameters and quantify uncertainties in a quasi‐equilibrium convection scheme in the GCM. We consider targeted data that are (a) localized in space for statistically stationary simulations, and (b) localized in space and time for seasonally varying simulations. In these proof‐of‐concept applications, the calculated information gain reflects the reduction in parametric uncertainty obtained from Bayesian inference when harnessing a targeted sample of data. The largest information gain typically, but not always, results from regions near the intertropical convergence zone.
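
The idea of ranking candidate regions by information gain can be sketched in a toy setting. This is not the paper's algorithm or the calibrate‐emulate‐sample framework: it assumes a single scalar parameter with a Gaussian prior, a linear observation model with Gaussian noise, and entropy reduction as the measure of information gain. The site names, sensitivities, and noise variances are hypothetical.

```python
import math


def posterior_variance(prior_var, sensitivity, noise_var):
    """Conjugate Gaussian update for y = sensitivity * theta + noise."""
    return 1.0 / (1.0 / prior_var + sensitivity**2 / noise_var)


def information_gain(prior_var, post_var):
    """Entropy reduction, in nats, between two one-dimensional Gaussians."""
    return 0.5 * math.log(prior_var / post_var)


def rank_sites(prior_var, candidates):
    """Return (best_site, gains) for candidates mapping name -> (sensitivity, noise_var)."""
    gains = {
        name: information_gain(prior_var, posterior_variance(prior_var, s, nv))
        for name, (s, nv) in candidates.items()
    }
    return max(gains, key=gains.get), gains


if __name__ == "__main__":
    # Hypothetical sites: (sensitivity of the local statistic to the parameter, noise variance).
    sites = {"site_a": (2.0, 1.0), "site_b": (0.5, 1.0), "site_c": (0.1, 0.5)}
    best, gains = rank_sites(prior_var=1.0, candidates=sites)
    for name, gain in gains.items():
        print(f"{name}: {gain:.3f} nats")
    print("most informative:", best)
```

In this toy model the gain reduces to 0.5 ln(1 + prior_var * sensitivity^2 / noise_var), so sites where the local statistic responds strongly to the parameter relative to its noise are the ones selected.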

     