

Title: Evaluating the Effectiveness of an Online Learning Platform in Transitioning Users from a High Performance Computing to a Commercial Cloud Computing Environment
Developments in large scale computing environments have led to the design of workflows that rely on containers and analytics platforms that are well supported by the commercial cloud. The National Science Foundation also envisions a future in science and engineering that includes commercial cloud service providers (CSPs) such as Amazon Web Services, Azure and Google Cloud. These twin forces have made researchers consider the commercial cloud as an alternative to current high performance computing (HPC) environments. Training and knowledge on how to migrate workflows, cost control, data management, and system administration remain some of the most commonly listed concerns with the adoption of cloud computing. To address these concerns, CSPs have developed online and in-person training platforms. Scalability, the ability to impart knowledge, evaluation of knowledge gain, and accreditation are the core concepts that have driven this approach. Here, we present a review of our experience using Google’s Qwiklabs online platform for remote and in-person training from the perspective of an HPC user. For this study, we completed over 50 online courses, earned five badges and attended a one-day session. We identify the strengths of the approach, suggest avenues to refine it, and consider means to further community engagement. We further evaluate the readiness of these resources for a cloud-curious researcher who is familiar with HPC. Finally, we present recommendations on how the large scale computing community can leverage these opportunities to work with CSPs to assist researchers nationally and at their home institutions.
Award ID(s):
1730695 1925764
NSF-PAR ID:
10179043
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Journal of computational science education
Volume:
11
Issue:
1
ISSN:
2153-4136
Page Range / eLocation ID:
93-99
Sponsoring Org:
National Science Foundation
More Like this
  1. COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and the broad response to it from the scientific community have forged new relationships among domain experts, mathematical modelers, and scientific computing specialists. Computationally, however, it also revealed critical gaps in the ability of researchers to exploit advanced computing systems. These challenging areas include gaining access to scalable computing systems, porting models and workflows to new systems, sharing data of varying sizes, and producing results that can be reproduced and validated by others. Informed by our team’s work in supporting public health decision makers during the COVID-19 pandemic and by the identified capability gaps in applying high-performance computing (HPC) to the modeling of complex social systems, we present the goals, requirements, and initial implementation of OSPREY, an open science platform for robust epidemic analysis. The prototype implementation demonstrates an integrated, algorithm-driven HPC workflow architecture, coordinating tasks across federated HPC resources, with robust, secure and automated access to each of the resources. We demonstrate scalable and fault-tolerant task execution, an asynchronous API to support fast time-to-solution algorithms, an inclusive, multi-language approach, and efficient wide-area data management. The example OSPREY code is made available on a public repository.
  2. Composable infrastructure holds the promise of accelerating the pace of academic research and discovery by enabling researchers to tailor the resources of a machine (e.g., GPUs, storage, NICs), on-demand, to address application needs. We were first introduced to composable infrastructure in 2018, and at the same time, there was growing demand among our College of Engineering faculty for GPU systems for data science, artificial intelligence / machine learning / deep learning, and visualization. Many purchased their own individual desktop or deskside systems, a few pursued more costly cloud and HPC solutions, and others looked to the College or campus computer center for GPU resources which, at the time, were scarce. After surveying the diverse needs of our faculty and studying product offerings by a few nascent startups in the composable infrastructure sector, we applied for and received a grant from the National Science Foundation in November 2019 to purchase a mid-scale system, configured to our specifications, for use by faculty and students for research and research training. This paper describes our composable infrastructure solution and implementation for our academic community. Given how modern workflows are progressively moving to containers and cloud frameworks (using Kubernetes) and to programming notebooks (primarily Jupyter), both for ease of use and for ensuring reproducible experiments, we initially adapted these tools for our system. We have since made it simpler to use our system, and now provide our users with a public facing JupyterHub server. We also added an expansion chassis to our system to enable composable co-location, which is a shared central architecture in which our researchers can insert and integrate specialized resources (GPUs, accelerators, networking cards, etc.) needed for their research. 
In February 2020, our system was installed and made operational, and we began providing access to faculty in the College of Engineering. Now, two years later, it is used by over 40 faculty and students, plus some external collaborators, for research and research training. Their use cases and experiences are briefly described in this paper. Composable infrastructure has proven to be a useful computational system for workload variability, uneven applications, and modern workflows in academic environments.
  3. The landscape of research in science and engineering is heavily reliant on computation and data processing. There is continued and expanded usage by disciplines that have historically used advanced computing resources, new usage by disciplines that have not traditionally used HPC, and new modalities of usage in Data Science, Machine Learning, and other areas of AI. Along with these new patterns have come new advanced computing resource methods and approaches, including the availability of commercial cloud resources. The Coalition for Academic Scientific Computation (CASC) has long been an advocate representing the needs of academic researchers using computational resources, sharing best practices and offering advice to create a national cyberinfrastructure to meet US science, engineering, and other academic computing needs. CASC has completed the first of what we intend to be an annual survey of academic cloud and data center usage and practices in analyzing return on investment in cyberinfrastructure. Critically important findings from this first survey include the following: many of the respondents are engaged in some form of analysis of return in research computing investments, but only a minority currently report the results of such analyses to their upper-level administration. Most respondents are experimenting with use of commercial cloud resources, but no respondent indicated that they have found use of commercial cloud services to create financial benefits compared to their current methods. There is a clear correlation between levels of investment in research cyberinfrastructure and the scale of both CPU core-hours delivered and the financial level of supported research grants.
Also interesting is that almost every respondent indicated that they participate in some sort of national cooperative or nationally provided research computing infrastructure project and most were involved in academic computing-related organizations, indicating a high degree of engagement by institutions of higher education in building and maintaining national research computing ecosystems. Institutions continue to evaluate cloud-based HPC service models, despite having generally concluded that so far cloud HPC is too expensive to use compared to their current methods. 
  4. High Performance Computing (HPC) stands at the forefront of engineering innovation. With affordable and advanced HPC resources more readily accessible than ever before, computational simulation of complex physical phenomena becomes an increasingly attractive strategy to predict the physical behavior of diverse engineered systems. Furthermore, novel applications of HPC in engineering are highly interdisciplinary, requiring advanced skills in mathematical modeling and algorithm development as well as programming skills for parallel, distributed and concurrent architectures and environments. This and other possible reasons have created a shortage of a qualified workforce to conduct the much-needed research and development in these areas. This paper describes our experience with mentoring a cohort of ten high-achieving undergraduate students in Summer 2019 to conduct engineering HPC research for ten weeks at X University. Our mentoring activity was informed and motivated by an initial informal study with the goal of learning the roles and status of HPC in engineering research and what can be improved to make more effective use of it. Through a combination of email surveys, in-person interviews, and a manual analysis of faculty research profiles at X University, we learned several lessons. First, a large proportion of the engineering faculty conducts research that is highly mathematical and computational and driven by disciplinary sciences, where simulation and HPC are widely needed as solutions. Second, due to the lack of resources to provide the necessary training in software development to their students, the interviewed engineering groups are limited in their ability to fully leverage HPC capability in their research. Therefore, novel pathways for training and educating engineering researchers in HPC software development must be explored in order to further advance the engineering research capability in HPC.
With multi-year support from NSF, our summer research mentoring activities were able to accommodate ten high-achieving undergraduate students recruited from across the USA and their faculty mentors on the theme of HPC applications in engineering research. We describe the processes of student recruitment and selection, training and engagement, research mentoring, and professional development for the students. Best practices and lessons learned are identified and summarized based on our own observations and the evaluation conducted by an independent evaluator. In particular, improvements are being planned so as to deliver a more holistic and rigorous research experience for future cohorts.
  5. Scientific workflows drive most modern large-scale science breakthroughs by allowing scientists to define their computations as a set of jobs executed in a given order based on their data dependencies. Workflow management systems (WMSs) have become key to automating scientific workflows, executing computational jobs and orchestrating data transfers between those jobs running on complex high-performance computing (HPC) platforms. Traditionally, WMSs use files to communicate between jobs: a job writes out files that are read by other jobs. However, HPC machines face a growing gap between their storage and compute capabilities. To address that concern, the scientific community has adopted a new approach called in situ, which bypasses costly parallel filesystem I/O operations with faster in-memory or in-network communications. When using in situ approaches, communication and computations can be interleaved. In this work, we leverage the Decaf in situ dataflow framework to accelerate task-based scientific workflows managed by the Pegasus WMS, by replacing file communications with faster MPI messaging. We propose a new execution engine that uses Decaf to manage communications within a sub-workflow (i.e., set of jobs) to optimize inter-job communications. We consider two workflows in this study: (i) a synthetic workflow that benchmarks and compares file- and MPI-based communication; and (ii) a realistic bioinformatics workflow that computes mutational overlaps in the human genome. Experiments show that in situ communication can improve the bioinformatics workflow execution time by 22% to 30% compared with file communication. Our results motivate further opportunities and challenges for bridging traditional WMSs with in situ frameworks.