With the increase in data-driven analytics, the demand for high performing computing resources has risen. There are many high-performance computing centers providing cyberinfrastructure (CI) for academic research. However, there exists access barriers in bringing these resources to a broad range of users. Users who are new to data analytics field are not yet equipped to take advantage of the tools offered by CI. In this paper, we propose a framework to lower the access barriers that exist in bringing the high-performance computing resources to users that do not have the training to utilize the capability of CI. The framework uses divide-and-conquer (DC) paradigm for data-intensive computing tasks. It consists of three major components - user interface (UI), parallel scripts generator (PSG) and underlying cyberinfrastructure (CI). The goal of the framework is to provide a user-friendly method for parallelizing data-intensive computing tasks with minimal user intervention. Some of the key design goals are usability, scalability and reproducibility. The users can focus on their problem and leave the parallelization details to the framework.
more »
« less
Parallel R Computing on the Web
R is the preferred language for Data analytics due to its open source development and high extensibility. Exponential growth in data has caused longer processing times leading to the rise in parallel computing technologies for analysis. Using R together with high performance computing resources is a cumbersome task. This paper proposes a framework that provides users with access to high-performance computing resources and simplifies the configuration, programming, uploading data and job scheduling through a web user interface. In addition to that, it provides two modes of parallelization of data-intensive computing tasks, catering to a wide range of users. The case studies emphasize the utility and efficiency of the framework. The framework provides better performance, ease of use and high scalability.
more »
« less
- Award ID(s):
- 1726532
- PAR ID:
- 10168177
- Date Published:
- Journal Name:
- 2019 IEEE International Conference on Big Data (Big Data)
- Page Range / eLocation ID:
- 3416 to 3423
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This work presents a framework for estimating job wait times in High-Performance Computing (HPC) scheduling queues, leverag- ing historical job scheduling data and real-time system metrics. Using machine learning techniques, specifically Random Forest and Multi-Layer Perceptron (MLP) models, we demonstrate high accuracy in predicting wait times, achieving 94.2% reliability within a 10-minute error margin. The framework incorporates key fea- tures such as requested resources, queue occupancy, and system utilization, with ablation studies revealing the significance of these features. Additionally, the framework offers users wait time esti- mates for different resource configurations, enabling them to select optimal resources, reduce delays, and accelerate computational workloads. Our approach provides valuable insights for both users and administrators to optimize job scheduling, contributing to more efficient resource management and faster time to scientific results.more » « less
-
With distributed communication, computation, and storage resources close to end users, edge computing has great potentials to support delay-sensitive industrial applications involving intelligent edge devices. Cognitive portable ground penetrating radars (GPRs) are expected to achieve high-quality sensing performance in a variety of industrial environments by operating intelligently and adaptively under varying sensing conditions. Although edge computing makes it very promising to develop cognitive portable GPRs, both strict performance requirement and trade-offs between communication and computation pose significant challenges. This paper presents an edge computing framework for cognitive portable GPRs. Specifically, the system architecture of an EC-enabled cognitive portable GPR is developed. Based on the identification of various involved computation tasks, an offloading policy was proposed to determine whether computation tasks should be executed locally or offloaded to the edge server. Experimental results show the efficacy of the proposed methods. The framework also provides insight into the design of cognitive Internet of things (IoT) supported by edge computing.more » « less
-
Abstract Computer applications often leave traces or residues that enable forensic examiners to gain a detailed understanding of the actions a user performed on a computer. Such digital breadcrumbs are left by a large variety of applications, potentially (and indeed likely) unbeknownst to their users. This paper presents the concept of residue-free computing in which a user can operate any existing application installed on their computer in a mode that prevents trace data from being recorded to disk, thus frustrating the forensic process and enabling more privacy-preserving computing. In essence, residue-free computing provides an “incognito mode” for any application. We introduce our implementation of residue-free computing, R esidue F ree , and motivate R esidue F ree by inventorying the potentially sensitive and privacy-invasive residue left by popular applications. We demonstrate that R esidue F ree allows users to operate these applications without leaving trace data, while incurring modest performance overheads.more » « less
-
Geospatial research and education have become increasingly dependent on cyberGIS to tackle computation and data challenges. However, the use of advanced cyberinfrastructure resources for geospatial research and education is extremely challenging due to both high learning curve for users and high software development and integration costs for developers, due to limited availability of middleware tools available to make such resources easily accessible. This tutorial describes CyberGIS-Compute as a middleware framework that addresses these challenges and provides access to high-performance resources through simple easy to use interfaces. The CyberGIS-Compute framework provides an easy to use application interface and a Python SDK to provide access to CyberGIS capabilities, allowing geospatial applications to easily scale and employ advanced cyberinfrastructure resources. In this tutorial, we will first start with the basics of CyberGISJupyter and CyberGIS-Compute, then introduce the Python SDK for CyberGIS-Compute with a simple Hello World example. Then, we will take multiple real-world geospatial applications use-cases like spatial accessibility and wildfire evacuation simulation using agent based modeling. We will also provide pointers on how to contribute applications to the CyberGIS-Compute framework.more » « less
An official website of the United States government

