
Title: Distributed Computing Software and Data Access Patterns in OSG Midscale Collaborations
In this paper we showcase the Open Science Grid (OSG) support for midscale collaborations, the region of computing and storage scale where multi-institutional researchers collaborate to execute their science workflows on the grid without dedicated technical support teams of their own. Collaboration Services enables such collaborations to take advantage of the distributed resources of the Open Science Grid by facilitating access to submission hosts, supporting the deployment of their applications, and meeting their data management requirements. Distributed computing software adopted from large-scale collaborations, such as CVMFS, Rucio, and xCache, lowers the barrier for intermediate-scale research to integrate with existing infrastructure.
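As a hedged illustration of how a collaboration's application might be distributed through CVMFS and invoked on an OSG worker node, the Python sketch below runs a binary from a CVMFS path. The repository name and application path are hypothetical; real collaborations publish their software under their own CVMFS repository.

```python
# Minimal sketch: locate and run a collaboration's application from CVMFS.
# The repository "collab.opensciencegrid.org" and the binary name are
# hypothetical placeholders, not the deployment described in the paper.
import os
import subprocess

CVMFS_APP = "/cvmfs/collab.opensciencegrid.org/software/bin/analyze"

def run_analysis(input_file: str, output_file: str) -> None:
    """Run the collaboration's application if the CVMFS repository is mounted."""
    if not os.path.exists(CVMFS_APP):
        raise RuntimeError("CVMFS repository not mounted on this worker node")
    subprocess.run([CVMFS_APP, "--in", input_file, "--out", output_file], check=True)

if __name__ == "__main__":
    run_analysis("event_data.root", "results.json")
```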
Authors:
Editors:
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W.
Award ID(s):
1841475
Publication Date:
NSF-PAR ID:
10213853
Journal Name:
EPJ Web of Conferences
Volume:
245
Page Range or eLocation-ID:
03005
ISSN:
2100-014X
Sponsoring Org:
National Science Foundation
More Like this
  1. The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. SciTokens introduces a capabilities-based authorization infrastructure for distributed scientific computing, to help scientists manage their security credentials more reliably and securely. SciTokens uses IETF-standard OAuth JSON Web Tokens for capability-based secure access to remote scientific data. These access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems. In this extended abstract, we present the results over the past year of our open source implementation of the SciTokens model and its deployment in the Open Science Grid, including new OAuth support added in the HTCondor 8.8 release series.
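As a rough illustration of the capability-based token model described above, the following Python sketch builds a JSON Web Token whose scope claim names specific read/write authorizations rather than an identity. The issuer, audience, paths, and HS256 shared key are illustrative stand-ins only; production SciTokens deployments use asymmetric keys published by the token issuer, and the sketch uses the generic PyJWT library rather than the SciTokens software itself.

```python
# Sketch of a capability-style access token in the spirit of SciTokens.
# All names and the symmetric signing key are hypothetical, for illustration.
import time
import jwt  # pip install PyJWT

SIGNING_KEY = "demo-secret"  # toy shared secret; not production key management

claims = {
    "iss": "https://token.example.org",        # hypothetical token issuer
    "aud": "https://storage.example.org",      # service the token is intended for
    "exp": int(time.time()) + 3600,            # short-lived: expires in one hour
    "scope": "read:/collab/input write:/collab/results",  # capabilities, not identity
}

token = jwt.encode(claims, SIGNING_KEY, algorithm="HS256")
print(token)
```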
  2. The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, design, and implementation addressing use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely-used software that supports distributed scientific computing, including HTCondor, CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables use of distributed computing for scientific domains that require greater data protection and 2) enables use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems.
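Complementing the token-creation sketch above, the hedged example below shows how a storage service might check the scope claim in such a token before allowing an operation. The audience, paths, and shared-secret verification are illustrative only; a real service would verify the issuer's signature with its published public key.

```python
# Sketch of capability enforcement: allow an operation only if the token's
# scope grants it on the requested path. Names and the toy key are hypothetical.
import jwt  # pip install PyJWT

SIGNING_KEY = "demo-secret"  # must match the issuer's key in this toy example

def authorize(token: str, operation: str, path: str) -> bool:
    """Return True only if the token grants `operation` (read/write) on `path`."""
    claims = jwt.decode(
        token,
        SIGNING_KEY,
        algorithms=["HS256"],
        audience="https://storage.example.org",  # reject tokens meant for other services
    )
    for capability in claims.get("scope", "").split():
        op, _, prefix = capability.partition(":")
        if op == operation and path.startswith(prefix):
            return True
    return False
```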
  3. The General Lake Model (GLM) is a one-dimensional open-source code designed to simulate the hydrodynamics of lakes, reservoirs, and wetlands. GLM was developed to support the science needs of the Global Lake Ecological Observatory Network (GLEON), a network of researchers using sensors to understand lake functioning and address questions about how lakes around the world respond to climate and land use change. The scale and diversity of lake types, locations, and sizes, and the expanding observational datasets created the need for a robust community model of lake dynamics with sufficient flexibility to accommodate a range of scientific and management questions relevant to the GLEON community. This paper summarizes the scientific basis and numerical implementation of the model algorithms, including details of sub-models that simulate surface heat exchange and ice cover dynamics, vertical mixing, and inflow–outflow dynamics. We demonstrate the suitability of the model for different lake types that vary substantially in their morphology, hydrology, and climatic conditions. GLM supports a dynamic coupling with biogeochemical and ecological modelling libraries for integrated simulations of water quality and ecosystem health, and options for integration with other environmental models are outlined. Finally, we discuss utilities for the analysis of model outputs and uncertainty assessments, model operation within a distributed cloud-computing environment, and as a tool to support the learning of network participants.
  4. The General Lake Model (GLM) is a one-dimensional open-source model code designed to simulate the hydrodynamics of lakes, reservoirs and wetlands. GLM was developed to support the science needs of the Global Lake Ecological Observatory Network (GLEON), a network of lake sensors and researchers attempting to understand lake functioning and address questions about how lakes around the world vary in response to climate and land-use change. The scale and diversity of lake types, locations and sizes, as well as the observational data within GLEON, created the need for a robust community model of lake dynamics with sufficient flexibility to accommodate a range of scientific and management needs of the GLEON community. This paper summarises the scientific basis and numerical implementation of the model algorithms, including details of sub-models that simulate surface heat exchange and ice-cover dynamics, vertical mixing and inflow/outflow dynamics. A summary of typical parameter values for lakes and reservoirs collated from a range of sources is included. GLM supports a dynamic coupling with biogeochemical and ecological modelling libraries for integrated simulations of water quality and ecosystem health. An overview of approaches for integration with other models, and utilities for the analysis of model outputs and for undertaking sensitivity and uncertainty assessments is also provided. Finally, we discuss application of the model within a distributed cloud-computing environment, and as a tool to support learning of network participants.
  5. Data distribution for opportunistic users is challenging, as they own neither the computing resources they are using nor any nearby storage. Users are motivated to use opportunistic computing to expand their data processing capacity, but they require storage and fast networking to distribute data to that processing. Because it requires significant management overhead, it is rare for resource providers to allow opportunistic access to storage. Additionally, in order to use opportunistic storage at several distributed sites, users assume the responsibility to maintain their data. In this paper we present StashCache, a distributed caching federation that enables opportunistic users to utilize nearby opportunistic storage. StashCache comprises four components: data origins, redirectors, caches, and clients. StashCache has been deployed in the Open Science Grid for several years and has been used by many projects. Caches are deployed in geographically distributed locations across the U.S. and Europe. We present the architecture of StashCache, as well as utilization information for the infrastructure. We also present a performance analysis comparing distributed HTTP proxies with StashCache.
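As a hedged sketch of the cache-then-origin access pattern that a caching federation such as StashCache provides, the Python example below tries a nearby cache endpoint first and falls back to the data origin. The hostnames, ports, and file path are hypothetical, and real clients typically discover a suitable cache through the federation's redirectors rather than hard-coding one.

```python
# Sketch of a federation client: prefer a nearby cache, fall back to the origin.
# Endpoints and the dataset path are hypothetical placeholders.
import shutil
import urllib.request

CACHE = "http://cache.example.org:8000"    # hypothetical nearby cache
ORIGIN = "http://origin.example.org:1094"  # hypothetical data origin

def fetch(path: str, dest: str) -> str:
    """Download `path` to `dest`, preferring the cache; return the endpoint used."""
    for base in (CACHE, ORIGIN):
        try:
            with urllib.request.urlopen(base + path, timeout=30) as resp, \
                 open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            return base
        except OSError:
            continue  # cache miss or unreachable endpoint: try the next one
    raise RuntimeError(f"could not fetch {path} from cache or origin")

if __name__ == "__main__":
    served_by = fetch("/collab/public/dataset-001.h5", "dataset-001.h5")
    print("served by", served_by)
```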