This content will become publicly available on March 1, 2026

Title: HPC-ED: Building a Sustainable Community Driven CyberTraining Catalog
HPC-ED is working to improve discovery and sharing of CyberTraining resources through the combination of the HPC-ED CyberTraining Catalog, an effective and flexible interface, thoughtful metadata design, and active community participation. HPC-ED encourages authors to share training resource information while retaining ownership and allows organizations to enrich their local portals with shared materials. By basing the architecture on an established, flexible framework, HPC-ED can provide a range of solutions that people and organizations can employ for sharing and discovering materials. In this paper we describe the initial pilot phase of the project, in which we prototyped the HPC-ED catalog, established an initial metadata set, provided documentation, and began using the system to share and discover materials. We gathered community feedback through a variety of means and are now planning an implementation phase that evolves our architecture to meet community needs through improved interfaces and tools designed to address a range of preferences.
Award ID(s):
2320977
PAR ID:
10639461
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Journal of Computational Science Education
Date Published:
Journal Name:
The Journal of Computational Science Education
Volume:
16
Issue:
1
ISSN:
2153-4136
Page Range / eLocation ID:
7 to 13
Subject(s) / Keyword(s):
Education, Training, Community Engagement, HPC, Cyberinfrastructure, Metadata, Globus
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. To improve the sharing and discovery of CyberTraining materials, the HPC-ED Pilot project team is building a platform for the community to better share and find training materials through a federated catalog. The platform, currently in early test mode, is focused on a flexible platform, informative metadata, and community participation. By creating a framework for identifying, sharing, and including content broadly, HPC-ED will: allow providers of training materials to reach new groups of learners; extend the breadth and depth of training materials; and enable local sites to add or extend local portals.
  2. Throughout the cyberinfrastructure community there is a large range of resources available to train faculty and young scholars in the successful use of computational resources for research. The challenge that the community faces is that training materials abound, but they can be difficult to find and often carry little information about the quality or relevance of offerings. Building on existing software technology, we propose to build a way for the community to better share and find training and education materials through a federated training repository. In this scenario, organizations and authors retain physical and legal ownership of their materials by sharing only catalog information, organizations can refine local portals to use the best and most appropriate materials from both local and remote sources, and learners can take advantage of materials that are reviewed and described more clearly. In this paper, we introduce the HPC-ED pilot project, a federated training repository designed to allow resource providers, campus portals, schools, and other institutions both to incorporate training from multiple sources into their own familiar interfaces and to publish their local training materials.
  3. During 2018, 2019, and 2020, the UMBC CyberTraining initiative “Big Data + HPC + Atmospheric Sciences” created an online team-based training program for advanced graduate students and junior researchers that trained a total of 58 participants. The year 2020 included 6 undergraduate students. Based on this experience, the authors created the summer undergraduate research program Online Interdisciplinary Big Data Analytics in Science and Engineering that will conduct 8-week online team-based undergraduate research programs (bigdatareu.umbc.edu) in the summers of 2021, 2022, and 2023. Given that many institutions may be expanding their online instruction, we share our experience of how the successful lessons from CyberTraining transfer to a high-intensity, full-time online summer undergraduate research program.
  4. Automated experimentation methods are unlocking a new data-rich research paradigm in materials science that promises to accelerate the pace of materials discovery. However, if our data management practices do not keep pace with progress in automation, this revolution threatens to drown us in unusable data. In this perspective, we highlight the need to update data management practices to track, organize, process, and share data collected from laboratories with deeply integrated automation equipment. We argue that a holistic approach to data management that integrates multiple scales (experiment, group and community scales) is needed. We propose a vision for what this integrated data future could look like and compare existing work against this vision to find gaps in currently available data management tools. To realize this vision, we believe that development of standard protocols for communicating with equipment and data sharing, the development of new open-source software tools for managing data in research groups, and leadership and direction from funding agencies and other organizations are needed. 
  5. Software organizations are increasingly incorporating machine learning (ML) into their product offerings, driving a need for new data management tools. Many of these tools facilitate the initial development of ML applications, but sustaining these applications post-deployment is difficult due to lack of real-time feedback (i.e., labels) for predictions and silent failures that could occur at any component of the ML pipeline (e.g., data distribution shift or anomalous features). We propose a new type of data management system that offers end-to-end observability, or visibility into complex system behavior, for deployed ML pipelines through assisted (1) detection, (2) diagnosis, and (3) reaction to ML-related bugs. We describe new research challenges and suggest preliminary solution ideas in all three aspects. Finally, we introduce an example architecture for a "bolt-on" ML observability system, or one that wraps around existing tools in the stack.