Title: Parallel R Computing on the Web
R is the preferred language for data analytics due to its open-source development and high extensibility. Exponential growth in data has led to longer processing times, driving the adoption of parallel computing technologies for analysis. However, using R together with high-performance computing resources remains a cumbersome task. This paper proposes a framework that gives users access to high-performance computing resources and simplifies configuration, programming, data uploading, and job scheduling through a web user interface. In addition, it provides two modes of parallelization for data-intensive computing tasks, catering to a wide range of users. The case studies emphasize the utility and efficiency of the framework, which provides better performance, ease of use, and high scalability.
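The abstract does not detail the two parallelization modes, so the following is only a generic sketch of two common ways a data-intensive task can be parallelized from R using the base parallel package: forked multicore workers and an explicit socket cluster. The chunking scheme and the per-chunk function here are illustrative placeholders, not the framework's API.

```r
# Minimal sketch: two common modes of parallelizing a task in R,
# using only the base 'parallel' package.
library(parallel)

# Stand-in for a real data-intensive per-chunk analysis.
process_chunk <- function(chunk) {
  mean(chunk)
}

# Divide a large vector into 8 chunks.
chunks <- split(rnorm(1e6), rep(1:8, length.out = 1e6))

# Mode 1: forked multicore workers (Unix-like systems only).
res_multicore <- mclapply(chunks, process_chunk, mc.cores = 4)

# Mode 2: an explicit socket cluster, which can also span machines.
cl <- makeCluster(4)
res_cluster <- parLapply(cl, chunks, process_chunk)
stopCluster(cl)
```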
Award ID(s):
1726532
NSF-PAR ID:
10168177
Author(s) / Creator(s):
Date Published:
Journal Name:
2019 IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
3416 to 3423
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With the increase in data-driven analytics, the demand for high-performance computing resources has risen. Many high-performance computing centers provide cyberinfrastructure (CI) for academic research, but access barriers keep these resources from reaching a broad range of users. Users who are new to the data analytics field are not yet equipped to take advantage of the tools CI offers. In this paper, we propose a framework that lowers the access barriers to high-performance computing resources for users who do not have the training to utilize the capabilities of CI. The framework uses the divide-and-conquer (DC) paradigm for data-intensive computing tasks. It consists of three major components: a user interface (UI), a parallel scripts generator (PSG), and the underlying cyberinfrastructure (CI). The goal of the framework is to provide a user-friendly method for parallelizing data-intensive computing tasks with minimal user intervention. Key design goals are usability, scalability, and reproducibility. Users can focus on their problem and leave the parallelization details to the framework, as the sketch below illustrates.
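As a hedged illustration of the divide-and-conquer pattern such a framework automates, the hypothetical wrapper below (not the framework's actual interface) splits the data, applies the user's function to each part in parallel, and combines the results; the user supplies only the data and the per-chunk function.

```r
# Illustrative divide-and-conquer wrapper (hypothetical, not the framework's API):
# parallelization details are hidden behind a single call.
library(parallel)

dc_apply <- function(data, fun, n_parts = 4, combine = c) {
  parts <- split(data, rep(seq_len(n_parts), length.out = length(data)))  # divide
  cl <- makeCluster(n_parts)
  on.exit(stopCluster(cl))
  results <- parLapply(cl, parts, fun)                                    # conquer
  do.call(combine, results)                                               # combine
}

# The user focuses on the analysis, not on the parallelism:
dc_apply(rnorm(1e5), fun = function(x) sum(x^2), combine = sum)
```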
  2. With distributed communication, computation, and storage resources close to end users, edge computing has great potential to support delay-sensitive industrial applications involving intelligent edge devices. Cognitive portable ground penetrating radars (GPRs) are expected to achieve high-quality sensing performance in a variety of industrial environments by operating intelligently and adaptively under varying sensing conditions. Although edge computing makes the development of cognitive portable GPRs very promising, both strict performance requirements and trade-offs between communication and computation pose significant challenges. This paper presents an edge computing (EC) framework for cognitive portable GPRs. Specifically, the system architecture of an EC-enabled cognitive portable GPR is developed. Based on the identification of the computation tasks involved, an offloading policy is proposed to determine whether each computation task should be executed locally or offloaded to the edge server; a sketch of such a decision rule follows. Experimental results show the efficacy of the proposed methods. The framework also provides insight into the design of the cognitive Internet of Things (IoT) supported by edge computing.
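The paper's actual offloading policy is not given in this abstract; the sketch below only illustrates the generic latency trade-off such a policy weighs: offload a task when transmitting its input plus computing remotely is estimated to be faster than computing locally. All names and numbers are illustrative.

```r
# Illustrative offloading decision (not the paper's policy): compare the
# estimated local execution time against transmit-plus-edge execution time.
should_offload <- function(input_bits, task_cycles, cpu_local_hz,
                           cpu_edge_hz, uplink_bps) {
  t_local <- task_cycles / cpu_local_hz   # run on the device
  t_tx    <- input_bits / uplink_bps      # ship the input to the edge server
  t_edge  <- task_cycles / cpu_edge_hz    # run on the edge server
  t_tx + t_edge < t_local
}

# Example: a 2 Mbit GPR scan needing 5e9 CPU cycles, with a 1 GHz device CPU,
# an 8 GHz edge CPU, and a 20 Mbps uplink: 0.1 s + 0.625 s < 5 s.
should_offload(2e6, 5e9, 1e9, 8e9, 20e6)  # TRUE: offloading is faster
```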
  3. The GLEON Research And PRAGMA Lake Expedition (GRAPLE) is a collaborative effort between computer science and lake ecology researchers. It aims to improve our understanding and predictive capacity regarding threats to the water quality of our freshwater resources, including climate change. This paper presents GRAPLEr, a distributed computing system used to address the modeling needs of GRAPLE researchers. GRAPLEr integrates and applies overlay virtual network, high-throughput computing, and web service technologies in a novel way. First, its user-level IP-over-P2P overlay network allows compute and storage resources distributed across independently administered institutions (including private and public clouds) to be aggregated into a common virtual network, despite the presence of firewalls and network address translators. Second, resources aggregated by the IP-over-P2P virtual network run unmodified high-throughput computing middleware, enabling large numbers of model simulations to be executed concurrently across the distributed computing resources. Third, a web service interface allows end users to submit job requests to the system using client libraries that integrate with the R statistical computing environment, as sketched below. The paper presents the GRAPLEr architecture, describes its implementation, and reports on its performance for batches of General Lake Model simulations across three cloud infrastructures (University of Florida, CloudLab, and Microsoft Azure).
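The GRAPLEr client's real API is not shown in this abstract, so the following is a hypothetical illustration of how a web service submission workflow of this kind might look from R; the service URL, endpoint paths, and response fields are all invented for illustration.

```r
# Hypothetical sketch of an R client for a GRAPLEr-style web service;
# the URL, endpoints, and response fields below are illustrative only.
library(httr)

service_url <- "https://graple.example.org"  # placeholder endpoint

# Submit a batch of model simulations as an archive of input files.
submit <- POST(paste0(service_url, "/run-experiment"),
               body = list(inputs = upload_file("simulations.tar.gz")))
job_id <- content(submit)$uid

# Poll for completion, then download the results archive.
repeat {
  status <- content(GET(paste0(service_url, "/status/", job_id)))
  if (identical(status$state, "completed")) break
  Sys.sleep(30)
}
GET(paste0(service_url, "/results/", job_id),
    write_disk("results.tar.gz", overwrite = TRUE))
```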
  4. Each year, vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite increases in high-dimensional data, workflow complexity, and the diversity of computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High-performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software, and for the scientific community's efforts to create reproducible methods. Specific examples highlight the utility of this framework, and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.
  5. Geospatial research and education have become increasingly dependent on cyberGIS to tackle computation and data challenges. However, the use of advanced cyberinfrastructure resources for geospatial research and education is extremely challenging, due both to the high learning curve for users and to the high software development and integration costs for developers, a consequence of the limited availability of middleware tools that make such resources easily accessible. This tutorial describes CyberGIS-Compute, a middleware framework that addresses these challenges and provides access to high-performance resources through simple, easy-to-use interfaces. The CyberGIS-Compute framework provides an easy-to-use application interface and a Python SDK exposing CyberGIS capabilities, allowing geospatial applications to easily scale and employ advanced cyberinfrastructure resources. In this tutorial, we will first cover the basics of CyberGISJupyter and CyberGIS-Compute, then introduce the Python SDK for CyberGIS-Compute with a simple Hello World example. We will then work through multiple real-world geospatial application use cases, such as spatial accessibility analysis and wildfire evacuation simulation using agent-based modeling. We will also provide pointers on how to contribute applications to the CyberGIS-Compute framework.