Title: Scalable Software Infrastructure for Integrating Supercomputing with Volunteer Computing and Cloud Computing
Volunteer Computing (VC) is a computing model that uses donated computing cycles on devices such as laptops, desktops, and tablets for scientific computing. BOINC, the most popular software framework for VC, connects projects that need computing cycles with volunteers interested in donating cycles on their devices. It has already enabled projects with high societal impact to harness several petaFLOPS of donated computing power. Given its potential for elastically augmenting the capacity of existing supercomputing resources for running High-Throughput Computing (HTC) jobs, we have extended the BOINC software infrastructure to make it amenable to integration with supercomputing and cloud computing environments. We have named this extension BOINC@TACC and are using it to route *qualified* HTC jobs from the supercomputers at the Texas Advanced Computing Center (TACC) not only to the typically volunteered devices but also to cloud computing resources such as Jetstream and Chameleon. BOINC@TACC can be extremely useful for researchers who are running low on their allocations of compute cycles on the supercomputers, or who want to reduce the turnaround time of their HTC jobs when the supercomputers are over-subscribed. We have also developed a web application through which TACC users can submit their HTC jobs, from the convenience of their web browser, to run on the resources volunteered by the community. This paper presents an overview of the BOINC@TACC project. The BOINC@TACC software infrastructure is open-source and can be easily adapted by other supercomputing centers interested in building their own volunteer communities and connecting them with researchers who need multi-petascale (and even exascale) computing power for their HTC jobs.
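To make the routing idea concrete, here is a minimal sketch of the kind of decision the abstract implies: a small, independent (non-MPI) job is treated as *qualified* and sent to the volunteer/cloud pool, while everything else stays in the regular supercomputer queue. The `Job` fields, the thresholds, and the `submit_to_boinc`/`submit_to_slurm` helpers are illustrative assumptions, not the actual BOINC@TACC API.

```python
# Illustrative sketch only: the Job fields, thresholds, and submit_* helpers
# are assumptions standing in for the real BOINC@TACC qualification rules
# and submission paths.
from dataclasses import dataclass

@dataclass
class Job:
    image: str            # containerized application image
    cores: int            # cores requested per task
    runtime_hours: float  # expected wall time per task
    needs_mpi: bool       # tightly coupled jobs are poor HTC candidates

def is_qualified(job: Job) -> bool:
    """A plausible 'qualified' test: small, independent, short-running."""
    return not job.needs_mpi and job.cores <= 4 and job.runtime_hours <= 24

def route(job: Job) -> str:
    if is_qualified(job):
        return submit_to_boinc(job)  # volunteered devices, Jetstream, Chameleon
    return submit_to_slurm(job)      # stays in the TACC batch queue

def submit_to_boinc(job: Job) -> str:
    return f"boinc:{job.image}"      # placeholder for a real work-unit submission

def submit_to_slurm(job: Job) -> str:
    return f"slurm:{job.image}"      # placeholder for a real sbatch call
```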
Award ID(s): 1664022
PAR ID: 10091633
Author(s) / Creator(s):
Date Published:
Journal Name: Communications in Computer and Information Science
Volume: 964
ISSN: 1865-0929
Page Range / eLocation ID: 105-119
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Volunteer computing (VC) uses consumer digital electronics products, such as PCs, mobile devices, and game consoles, for high-throughput scientific computing. Device owners participate in VC by installing a program that, in the background, downloads and executes jobs from servers operated by science projects. Most VC projects use BOINC, an open-source middleware system for VC. BOINC allows scientists to create and operate VC projects and enables volunteers to participate in them: volunteers install a single application (the BOINC client) and then choose projects to support. We have developed a BOINC project, nanoHUB@home, to make use of VC in support of the nanoHUB science gateway. VC has greatly expanded the computational resources available for nanoHUB simulations. We are using VC to support “speculative exploration”, a model of computing that explores the input parameters of online simulation tools published through the nanoHUB gateway, pre-computing results that have not yet been requested by users. These results are stored in a cache, and when a user launches an interactive simulation our system first checks the cache; if the result is already available, it is returned to the user immediately, freeing the computational resources and avoiding re-computation of existing results (see the sketch below). The cache is also useful for machine learning (ML) studies, building surrogate models for nanoHUB simulation tools that allow us to quickly estimate results before running an expensive simulation. VC resources also allow us to support uncertainty quantification (UQ) in nanoHUB simulation tools, going beyond simulations to deliver real-world predictions. Models are typically simulated with precise input values, but real-world experiments involve imprecise values for device measurements, material properties, and stimuli. The imprecise values can be expressed as a probability distribution, such as a Gaussian with a mean and standard deviation, or an actual distribution measured from experiments. Stochastic collocation methods can then predict the resulting outputs given a set of probability distributions for the inputs. These computations require hundreds or thousands of simulation runs for each prediction. This workload is well suited to VC, since the runs are completely independent and only their results are combined in a statistical analysis.
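A minimal sketch of the cache-first lookup described above, assuming a simple key-value cache keyed by tool name and canonicalized input parameters; the cache layout and the `run_simulation` placeholder are illustrative assumptions, not the nanoHUB@home internals. The same cache naturally serves the ML and UQ workloads, since stochastic collocation fans out into many independent runs whose inputs may repeat across studies.

```python
# Illustrative sketch only: the cache layout, key scheme, and
# run_simulation() placeholder are assumptions, not nanoHUB@home internals.
import hashlib
import json

cache: dict[str, dict] = {}  # stands in for the persistent result cache

def cache_key(tool: str, params: dict) -> str:
    """Deterministic key from the tool name and canonicalized inputs."""
    blob = json.dumps({"tool": tool, "params": params}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def simulate(tool: str, params: dict) -> dict:
    key = cache_key(tool, params)
    if key in cache:          # hit: pre-computed (possibly speculatively)
        return cache[key]     # returned immediately, nothing re-computed
    result = run_simulation(tool, params)  # miss: run, e.g. on a VC host
    cache[key] = result       # store for future users, ML surrogates, UQ
    return result

def run_simulation(tool: str, params: dict) -> dict:
    return {"output": sum(params.values())}  # placeholder for real physics
```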
  2. Supercomputers are used to power discoveries and to reduce the time-to-results in a wide variety of disciplines such as engineering, physical sciences, and healthcare. They are globally considered vital for staying competitive in defense, the financial sector, several mainstream businesses, and even agriculture. An integral requirement for using supercomputers, as with any other computer, is the availability of software. Scalable and efficient software is typically required to make optimal use of large-scale supercomputing platforms and thereby effectively leverage the investments in advanced cyberinfrastructure (CI). However, developing and maintaining such software is challenging due to several factors, such as (1) the lack of well-defined processes or guidelines for writing software that can ensure high performance on supercomputers, and (2) a shortfall of trained workforce with skills in both software engineering and supercomputing. With the rapid advancement of the computer architecture discipline, the complexity of the processors used in supercomputers is also increasing, which in turn makes the task of developing efficient software for supercomputers even more challenging. To mitigate these challenges, there is a need for a common platform that brings together stakeholders from the areas of supercomputing and software engineering. To provide such a platform, the second workshop on Software Challenges to Exascale Computing (SCEC) was organized in Delhi, India, during December 13–14, 2018. The SCEC 2018 workshop informed participants about the challenges in large-scale HPC software development and steered them toward building international collaborations for finding solutions to those challenges. The workshop provided a forum through which hardware vendors and software developers can communicate with each other and influence the architecture of next-generation supercomputing systems and the supporting software stack. By fostering cross-disciplinary associations, the workshop served as a stepping stone toward future innovations. We are very grateful to the Organizing and Program Committees, the sponsors (US National Science Foundation, Indian National Supercomputing Mission, Atos, Mellanox, Centre for Development of Advanced Computing, San Diego Supercomputing Center, Texas Advanced Computing Center), and the participants for their contributions to making the SCEC 2018 workshop a success.
  3. As research projects grow more complex and researchers use a mix of tools - command-line scripts, science gateways, and Jupyter notebooks - it becomes increasingly difficult to track exactly how a final result was produced. Each tool often keeps its own logs, making it hard to reconstruct the full sequence of computational steps. This lack of end-to-end visibility poses a serious challenge for scientific reproducibility. Yet advanced computing remains a critical part of nearly every field of academic research, and researchers continue to rely on a wide range of interfaces to run their scientific software. To address this challenge, the Advanced Computing Interfaces group at the Texas Advanced Computing Center (TACC) created a system that collates logs from multiple sources - science gateways, Jupyter notebooks, and the Tapis platform - into one unified “audit trail.” The TACC Research Audit and Integration of Logs (TRAIL) system allows researchers and staff to follow the complete path a dataset or file took: from the moment it was first uploaded to TACC, through every step of computation, to the final result. This kind of tracking helps ensure scientific results can be reproduced and gives advanced computing services better insight into how data and resources are being used.
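As a rough illustration of the idea behind a unified audit trail like TRAIL, log records from different sources can be normalized into one event shape and merged into a time-ordered history per dataset. The `Event` fields below are hypothetical, not the actual TRAIL schema, which the abstract does not describe.

```python
# Illustrative sketch only: the Event fields are hypothetical assumptions,
# not the actual TRAIL schema.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str   # e.g. "gateway", "jupyter", "tapis"
    dataset: str  # identifier of the file or dataset the event touches
    action: str   # e.g. "upload", "job-start", "job-end", "download"

def audit_trail(events: list[Event], dataset: str) -> list[Event]:
    """Every event touching one dataset, time-ordered, regardless of
    which tool produced the original log record."""
    return sorted((e for e in events if e.dataset == dataset),
                  key=lambda e: e.timestamp)
```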