
Title: Workshop Report: Rethinking NSF’s Computational Ecosystem for 21st Century Science and Engineering
This report summarizes the discussions from a workshop convened at NSF on May 30-31, 2018, in Alexandria, VA. The overarching objective of the workshop was to rethink the nature and composition of the NSF-supported computational ecosystem given changing application requirements and changing resource and technology landscapes. The workshop included roughly 50 participants, drawn from high-performance computing (HPC) centers, campus computing facilities, cloud service providers (academic and commercial), and distributed resource providers. Participants spanned both large research institutions and smaller universities. Organized by Daniel Reed (University of Utah, chair), David Lifka (Cornell University), David Swanson (University of Nebraska), Rommie Amaro (UCSD), and Nancy Wilkins-Diehr (UCSD/SDSC), the workshop was motivated by the following observations. First, there have been dramatic changes in the number and nature of applications using NSF-funded resources, as well as their resource needs. As a result, there are new demands on the type (e.g., data centric) and location (e.g., close to the data or the users) of the resources as well as new usage modes (e.g., on-demand and elastic). Second, there have been dramatic changes in the landscape of technologies, resources, and delivery mechanisms, spanning large scientific instruments, ubiquitous sensors, and cloud services, among others.
Author(s) / Creator(s):
Reed, Daniel A.; Lifka, David; Swanson, David; Amaro, Rommie; Wilkins-Diehr, Nancy
Journal Name:
NSF Workshop Reports
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Obeid, Iyad ; Picone, Joseph ; Selesnick, Ivan (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing a large open source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000-image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a National Science Foundation (NSF) planning grant [4] to explore how the community can best leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public. The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio’s eSlide Manager (eSM) software [6]. We have currently digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case with one report per case. 
The data is organized by tissue type as shown below:

Filenames:
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx

Explanation:
tudp: root directory of the corpus
v1.0.0: version number of the release
svs: the image data type
gastro: the type of tissue
000001: six-digit sequence number used to control directory complexity
00123456: 8-digit patient MRN
2015_03_05: the date the specimen was captured
0s15_12345: the clinical case name
0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001) and a token number (e.g., s000)
0s15_12345_00123456.docx: the filename for the corresponding case report

We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio’s “.svs” format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference. Another goal of this poster presentation is to share our experiences with the larger community since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning there are a lot that need to be sifted through and discarded for peeling or cracking. Additionally, during scanning a slide can get stuck, stalling a scan session for hours, resulting in a significant loss of productivity. 
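The directory and filename convention described above can be split into its named components mechanically. The following is a minimal sketch, not the project's own tooling: the class name `TudpImagePath` and the function `parse_tudp_image_path` are hypothetical, and the field order inside the image filename (case name, site code, MRN, cut, token) is an assumption inferred from the single example path, since the explanation does not mention the repeated MRN explicitly.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TudpImagePath:
    """Components of one image path (hypothetical container, for illustration)."""
    corpus: str        # root directory, e.g. "tudp"
    version: str       # release version, e.g. "v1.0.0"
    data_type: str     # image data type, e.g. "svs"
    tissue: str        # tissue type, e.g. "gastro"
    sequence: str      # six-digit sequence number
    patient_mrn: str   # 8-digit patient MRN
    capture_date: str  # specimen capture date, e.g. "2015_03_05"
    case_name: str     # clinical case name, e.g. "0s15_12345"
    site_code: str     # site code, e.g. "0a001"
    cut: str           # type and depth of the cut, e.g. "lvl0001"
    token: str         # token number, e.g. "s000"


def parse_tudp_image_path(path: str) -> TudpImagePath:
    """Split an image path into its fields, assuming the layout of the example."""
    parts = PurePosixPath(path).parts
    if len(parts) != 9:
        raise ValueError(f"expected 9 path components, got {len(parts)}")
    corpus, version, data_type, tissue, sequence, mrn, date, case, filename = parts

    # The filename repeats the case name, followed by four more fields.
    stem = PurePosixPath(filename).stem
    prefix = case + "_"
    if not stem.startswith(prefix):
        raise ValueError("filename does not start with the case name")
    fields = stem[len(prefix):].split("_")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields after the case name, got {len(fields)}")
    site_code, file_mrn, cut, token = fields
    if file_mrn != mrn:
        raise ValueError("MRN in the filename does not match the directory MRN")

    return TudpImagePath(corpus, version, data_type, tissue, sequence,
                         mrn, date, case, site_code, cut, token)
```

Applied to the example image path, the sketch yields `tissue == "gastro"`, `site_code == "0a001"`, `cut == "lvl0001"`, and `token == "s000"`. The MRN cross-check is a design choice made here to catch misfiled images, not a documented rule of the corpus.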
Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes. This scanning project began in January of 2018 when the scanner was first installed. The scanning process was slow at first since there was a learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019, ~20,000 slides were scanned. In the past 6 months, from May to November, we have tripled that number and now hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow. The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking the software such as the tissue finder parameter settings, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process: it takes a couple of hours to sort, clean and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks. 
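The throughput figures above (400 slides per night, 5 nights per week, ~60,000 slides on hand) allow a rough estimate of how long the stated targets would take. The sketch below is only back-of-the-envelope arithmetic under stated assumptions: a single scanner running at the quoted weekly ceiling without interruption, and an optional `failure_rate` parameter (hypothetical, introduced here) that treats failed scans as lost capacity.

```python
import math

SLIDES_PER_NIGHT = 400   # from the abstract
NIGHTS_PER_WEEK = 5      # from the abstract
CURRENT_SLIDES = 60_000  # approximate holdings quoted in the abstract


def weeks_to_target(target: int,
                    current: int = CURRENT_SLIDES,
                    per_week: int = SLIDES_PER_NIGHT * NIGHTS_PER_WEEK,
                    failure_rate: float = 0.0) -> int:
    """Whole weeks of scanning needed to reach `target` slides.

    Assumes constant throughput; `failure_rate` is the fraction of scans
    that fail and are not recovered (an illustrative simplification).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    remaining = max(0, target - current)
    effective_per_week = per_week * (1.0 - failure_rate)
    return math.ceil(remaining / effective_per_week)


print(weeks_to_target(100_000))    # 20 weeks to the 100,000-slide release
print(weeks_to_target(1_000_000))  # 470 weeks, roughly nine years
```

Under these assumptions the one-million-slide goal takes about nine years at the current ceiling, which is in line with a decade-long collection effort on a single scanner.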
To bridge the gap between hospital operations and research, we are using Aperio’s eSM software. Our goal is to provide pathologists access to high-quality digital images of their patients’ slides. eSM is a secure website that holds the images with their metadata labels, patient report, and path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information. Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma in situ (DCIS) or invasive breast cancer tissue. There will be approximately 1,000 images per class in this corpus. 
Based on the number of features annotated, we can train on a two-class problem of DCIS or benign, or increase the difficulty by increasing the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic, etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv to be kept informed of the latest developments in this project. You can learn more from our project website.
  2. In this proposal, we will share some initial findings about how teacher and student engagement in cogenerative dialogues influenced the development of the Culturally Relevant Pedagogical Guidelines for Computational Thinking and Computer Science (CRPG-CSCT). The CRPG-CSCT’s purpose is to provide computer science teachers with tools to enhance their instruction by accurately reflecting students’ diverse cultural resources in the classroom. Additionally, the CRPG-CSCT will provide guidance to non-computer science teachers on how to facilitate the integration of computational thinking skills into a broad spectrum of classes in the arts, humanities, sciences, social sciences, and mathematics. Our initial findings shared here are part of a larger NSF-funded research project (Award No. 2122367) which aims to better understand the barriers to entry and challenges for success faced by underrepresented secondary school students in computer science, through direct engagement with the students themselves. Throughout the 2022-23 academic year, the researchers have been working with a small team of secondary school teachers, students, and instructional designers, as well as university faculty in computer science, secondary education, and sociology to develop the CRPG-CSCT. The CRPG-CSCT is rooted in the tenets of culturally relevant pedagogy (Ladson-Billings, 1995) and borrows from Muhammad’s (2020) work in Cultivating Genius: An Equity Framework for Culturally and Historically Responsive Literacy. The CRPG-CSCT is being developed over six day-long workshops held throughout the academic year. At the time of this submission, five of the six workshops had been completed. Each workshop utilized cogenerative dialogues (cogens) as the primary tool for organizing and sustaining participants’ engagement. Through cogens, participants more deeply learn about students’ cultural capital and the value of utilizing that capital within the classroom (Roth, Lawless, & Tobin, 2000). 
The success of cogens relies on following specific protocols (Emdin, 2016), such as listening attentively, ensuring there are equal opportunities for all participants to share, and affirming the experiences of other participants. The goal of a cogen is to reach a collective decision, based on the dialogue, that will positively impact students by explicitly addressing barriers to their engagement in the classroom. During each workshop, one member of the research team and one undergraduate research assistant observed the interactions among cogen participants and documented these in the form of ethnographic field notes. Another undergraduate research assistant took detailed notes during the workshop to record the content of small and large group discussions, presentations, and questions/responses throughout the workshops. A grounded theory approach was used to analyze the field notes. Additionally, at the conclusion of each workshop, participants completed a Cogen Feedback Survey (CFS) to gather additional information. The CFS were analyzed through open thematic coding, memos, and code frequencies. Our preliminary results demonstrate high levels of engagement from teacher and student participants during the workshops. Students identified that the cogen structure allowed them to participate comfortably, openly, and honestly. Further, students described feeling valued and heard. Students’ ideas and experiences were frequently affirmed, which served as an important step toward dismantling traditional teacher-student boundaries that might otherwise prevent them from sharing freely. Another result from the use of cogens was the shared experience of participants comprehending views from the other group’s perspective in the classroom. Students appreciated the opportunity to learn from teachers about their struggles in keeping students engaged. 
Teachers appreciated the opportunity to better understand students’ schooling experiences and how these may affirm or deny aspects of their identity. Finally, all participants shared meaningful suggestions and strategies for future workshops and for the collective betterment of the group. Initial findings shared here are important for several reasons. First, our findings suggest that cogens are an effective approach for fostering participants’ commitment to creating the conditions for students’ success in the classroom. Within the context of the workshops, cogens provided teachers, students, and faculty with opportunities to engage in authentic conversations for addressing the recruitment and retention problems in computer science for underrepresented students. These conversations often resulted in the development of tangible pedagogical approaches, examples, metaphors, and other strategies to directly address the recruitment and retention of underrepresented students in computer science. Finally, while we are still developing the CRPG-CSCT, cogens provided us with the opportunity to ensure the voices of teachers and students are well represented in and central to the document. 
  3. As the volume and sophistication of cyber-attacks grow, cybersecurity researchers, engineers and practitioners rely on advanced cyberinfrastructure (CI) techniques like big data and machine learning, as well as advanced CI platforms, e.g., cloud and high-performance computing (HPC), to assess cyber risks, identify and mitigate threats, and achieve defense in depth. There is a training gap: current cybersecurity curricula at many universities do not introduce advanced CI techniques to the future cybersecurity workforce. At Old Dominion University (ODU), we are bridging this gap through an innovative training program named DeapSECURE (Data-Enabled Advanced Training Program for Cyber Security Research and Education). We developed six non-degree training modules to expose cybersecurity students to advanced CI platforms and techniques rooted in big data, machine learning, neural networks, and high-performance programming. Each workshop includes a lecture providing the motivation and context for a CI technique, which is then examined during a hands-on session. The modules are delivered through (1) monthly workshops for ODU students, and (2) summer institutes for students from other universities and Research Experiences for Undergraduates participants. Future plans for the training program include an online continuous learning community as an extension to the workshops, and making all learning materials available as open educational resources, which will facilitate widespread adoption, adaptations, and contributions. The project leverages existing partnerships to ensure broad participation and adoption of advanced CI techniques in the cybersecurity community. We employ a rigorous evaluation plan rooted in diverse metrics of success to improve the curriculum and demonstrate its effectiveness. 
  4. The computing landscape is evolving rapidly. Exascale computers have arrived, which can perform 10^18 mathematical operations per second. At the same time, quantum supremacy has been demonstrated, where quantum computers have outperformed these fastest supercomputers for certain problems. Meanwhile, artificial intelligence (AI) is transforming every aspect of science and engineering. A highly anticipated application of the emerging nexus of exascale computing, quantum computing and AI is computational design of new materials with desired functionalities, which has been the elusive goal of the federal materials genome initiative. The rapid change in the computing landscape resulting from these developments has not been matched by the pedagogical developments needed to train the next generation of the materials engineering cyberworkforce. This gap in curricula across colleges and universities offers a unique opportunity to create educational tools, enabling decentralized training of the cyberworkforce. To achieve this, we have developed training modules for a new generation of quantum materials simulator, named AIQ-XMaS (AI and quantum-computing enabled exascale materials simulator), which integrates exascalable quantum, reactive and neural-network molecular dynamics simulations with unique AI and quantum-computing capabilities to study a wide range of materials and devices of high societal impact such as optoelectronics and health. As a single-entry access point to these training modules, we have also built a CyberMAGICS (cyber training on materials genome innovation for computational software) portal, which includes step-by-step instructions in Jupyter notebooks and associated tutorials, while providing an online cloud service for those who do not have access to an adequate computing platform. 
The modules are incorporated into our open-source AIQ-XMaS software suite as tutorial examples and are piloted in classroom and workshop settings to directly train many users at the University of Southern California (USC) and Howard University—one of the largest historically black colleges and universities (HBCUs), with a strong focus on underrepresented groups. In this paper, we summarize these educational developments, including findings from the first CyberMAGICS Workshop for Underrepresented Groups, along with an introduction to the AIQ-XMaS software suite. Our training modules also include a new generation of open programming languages for exascale computing (e.g., OpenMP target) and quantum computing (e.g., Qiskit) used in our scalable simulation and AI engines that underlie AIQ-XMaS. Our training modules essentially support unique dual-degree opportunities at USC in the emerging exa-quantum-AI era: Ph.D. in science or engineering, concurrently with MS in computer science specialized in high-performance computing and simulations, MS in quantum information science or MS in materials engineering with machine learning. The developed modular cyber-training pedagogy is applicable to broad engineering education at large. 
  5. On August 9-10, 2023, the Thomas J. O’Keefe Institute for Sustainable Supply of Strategic Minerals at Missouri University of Science and Technology (Missouri S&T) hosted the third annual workshop on ‘Resilient Supply of Critical Minerals’. The workshop was funded by the National Science Foundation (NSF) and was attended by 218 participants. 128 participants attended in-person in the Havener Center on the Missouri S&T campus in Rolla, Missouri, USA. Another 90 participants attended online via Zoom. Fourteen participants (including nine students) received travel support through the NSF grant to attend the conference in Rolla. Additionally, the online participation fee was waived for another six students and early career researchers to attend the workshop virtually. Out of the 218 participants, 190 stated their sectors of employment during registration showing that 87 participants were from academia (32 students), 62 from the private sector and 41 from government agencies. Four topical sessions were covered: A. The Critical Mineral Potential of the USA: Evaluation of existing, and exploration for new resources. B. Mineral Processing and Recycling: Maximizing critical mineral recovery from existing production streams. C. Critical Mineral Policies: Toward effective and responsible governance. D. Resource Sustainability: Ethical and environmentally sustainable supply of critical minerals. Each topical session was composed of two keynote lectures and complemented by oral and poster presentations by the workshop participants. Additionally, a panel discussion with panelists from academia, the private sector and government agencies was held that discussed ‘How to grow the American critical minerals workforce’. 
The 2023 workshop was followed by a post-workshop field trip to the lead-zinc mining operations of the Doe Run Company in southeast Missouri that was attended by 18 workshop participants from academia (n=10; including 4 students), the private sector (n=4), and government institutions (n=4). Discussions during the workshop led to the following suggestions to increase the domestic supply of critical minerals: (i) Research to better understand the geologic critical mineral potential of the USA, including primary reserves/resources, historic mine wastes, and mineral exploration potential. (ii) Development of novel extraction techniques targeted at the recovery of critical minerals as co-products from existing production streams, mine waste materials, and recyclables. (iii) Faster and more transparent permitting processes for mining and mineral processing operations. (iv) A more environmentally sustainable and ethical approach to mining and mineral processing. (v) Development of a highly skilled critical minerals workforce. This workshop report provides a detailed summary of the workshop discussions and describes a way forward for this workshop series for 2024 and beyond. 