

Title: Astro2020 APC White Paper: Elevating the Role of Software as a Product of the Research Enterprise
Software is a critical part of modern research, and yet there are insufficient mechanisms in the scholarly ecosystem to acknowledge, cite, and measure the impact of research software. The majority of academic fields rely on a one-dimensional credit model whereby academic articles (and their associated citations) are the dominant factor in the success of a researcher's career. In the petabyte era of astronomical science, citing software and measuring its impact enables academia to retain and reward researchers that make significant software contributions. These highly skilled researchers must be retained to maximize the scientific return from petabyte-scale datasets. Evolving beyond the one-dimensional credit model requires overcoming several key challenges, including the current scholarly ecosystem and scientific culture issues. This white paper will present these challenges and suggest practical solutions for elevating the role of software as a product of the research enterprise.
Award ID(s):
1743747
NSF-PAR ID:
10111875
Author(s) / Creator(s):
Date Published:
Journal Name:
APC
ISSN:
2257-8587
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Meeting the United Nations’ Sustainable Development Goals (SDGs) calls for an integrative scientific approach, combining expertise, data, models and tools across many disciplines towards addressing sustainability challenges at various spatial and temporal scales. This holistic approach, while necessary, exacerbates the big data and computational challenges already faced by researchers. Many challenges in sustainability research can be tackled by harnessing the power of advanced cyberinfrastructure (CI). The objective of this paper is to highlight the key components and technologies of CI necessary for meeting the data and computational needs of the SDG research community. An overview of the CI ecosystem in the United States is provided with a specific focus on the investments made by academic institutions, government agencies and industry at national, regional, and local levels. Despite these investments, this paper identifies barriers to the adoption of CI in sustainability research that include, but are not limited to, access to support structures; recruitment, retention and nurturing of an agile workforce; and lack of local infrastructure. Relevant CI components such as data, software, computational resources, and human-centered advances are discussed to explore how to resolve the barriers. The paper highlights multiple challenges in pursuing SDGs based on the outcomes of several expert meetings. These include multi-scale integration of data and domain-specific models, availability and usability of data, uncertainty quantification, mismatch between spatiotemporal scales at which decisions are made and the information generated from scientific analysis, and scientific reproducibility. We discuss ongoing and future research for bridging CI and SDGs to address these challenges.

     
  2. An enormous reserve of information about the subglacial bedrock, tectonic and topographic evolution of Marie Byrd Land (MBL) exists within glaciomarine sediments of the Amundsen Sea shelf, slope and deep sea, and MBL marine shelf. Investigators of the NSF ICI-Hot and NSF Linchpin projects partnered with Arizona Laserchron Center to provide course-based undergraduate research experiences (CUREs) for students from groups who do not ordinarily find access points to Antarctic science. Our courses enlist BIPOC and gender-expansive undergraduates in studies of ice-rafted debris (IRD) and bedrock samples, in order to impart skills, train in the use of research instrumentation, help students to develop confidence in their scientific abilities, and collaboratively address WAIS research questions at an early academic stage. CUREs also afford benefits to graduate researchers (GRs) and postdoctoral scientists (PDs) who join in as instructional faculty: CUREs allow GRs and PDs to engage in teaching that closely ties to their active research, yet provides practical experience to strengthen the academic portfolio (Cascella & Jez, 2018). Team members also develop art-science initiatives that engage students and community members who may not ordinarily engage with science, forging connections that make science relatable. Re-casting science topics through art centers personal connections and humanizes science, to promote understanding that goes beyond the purely analytical. Academic research shows that diverse undergraduates gain markedly from the convergence of art and science, and from involvement in collaborative research conducted within a CURE cohort, rather than as an individualized experience (e.g. Shanahan et al. 2022). The CUREs are offered as regular courses for credit, making access equitable via course enrollment. The course designation carries a legitimacy that is sought by students who balance academics with part-time employment.
Course information is disseminated via STEM Bridge programs and/or an academic advising hub that reaches students from groups that are insufficiently represented within STEM and cryosphere science. CURE investigation of Amundsen Sea and WAIS problems is a worthy objective because: 1) A variety of sample preparation, geochemical methods, and scientific best-practices can be imparted, while educating students about Antarctica’s geological configuration and role in the Earth climate system. 2) Individual projects that are narrowly defined can readily scaffold into collaborative science at the time of data synthesis and interpretation. 3) There is a high likelihood of scientific discovery that contributes to grant objectives. 4) Enrolled students will experience ambiguity and instrumentation setbacks alongside their faculty and instructors, and will likely have an opportunity to withstand/overcome challenges in a manner that trains students in complex problem solving and imparts resilience (St. John et al., 2019). Based on our experiences, we consider CUREs as a means to create more inclusive and equitable spaces for learning to do research, and a basis for broadening the future WAIS community. Our groups have yet to assess student learning gains and STEM entry in a robust way, but we can report that two presenters at WAIS 2022 came from our 2021 CURE, and four polar science graduate researchers gained experience via CURE teaching. Data obtained by CURE students is contributing to our NSF projects’ aims to obtain isotope, age, and petrogenetic criteria with bearing on the subglacial bedrock geology, tectonic and landscape evolution, and ice sheet history of MBL. Cited and recommended works: Cascella & Jez, 2018, doi: 10.1021/acs.jchemed.7b00705; Gentile et al., 2017, doi: 10.17226/24622; Shanahan et al., 2022, https://www.cur.org/assets/1/23/01-01_TOC_SPUR_Winter21.pdf; Shortlidge & Brownell, 2016, doi: 10.1128/jmbe.v17i3.1103; St. John et al., 2019, EOS, doi: 10.1029/2019EO127285.
  3. Obeid, Iyad ; Picone, Joseph ; Selesnick, Ivan (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing a large open source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000 image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a National Science Foundation (NSF) planning grant [4] to explore how best the community can leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public. The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio’s eSlide Manager (eSM) software [6]. We currently have digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case with one report per case. 
The data is organized by tissue type as shown below:

Filenames:
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs
tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx

Explanation:
tudp: root directory of the corpus
v1.0.0: version number of the release
svs: the image data type
gastro: the type of tissue
000001: six-digit sequence number used to control directory complexity
00123456: 8-digit patient MRN
2015_03_05: the date the specimen was captured
0s15_12345: the clinical case name
0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001) and a token number (e.g., s000)
0s15_12345_00123456.docx: the filename for the corresponding case report

We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio’s “.svs” format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference. Another goal of this poster presentation is to share our experiences with the larger community since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning there are a lot that need to be sifted through and discarded for peeling or cracking. Additionally, during scanning a slide can get stuck, stalling a scan session for hours, resulting in a significant loss of productivity.
Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes. This scanning project began in January of 2018 when the scanner was first installed. The scanning process was slow at first since there was a learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019, ~20,000 slides were scanned. In the past 6 months, from May to November, we have tripled that number and now hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow. The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking the software such as the tissue finder parameter settings, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process; it takes a couple of hours to sort, clean and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks.
To bridge the gap between hospital operations and research, we are using Aperio’s eSM software. Our goal is to provide pathologists access to high quality digital images of their patients’ slides. eSM is a secure website that holds the images with their metadata labels, patient report, and path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information. Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma in situ (DCIS) or invasive breast cancer tissue. There will be approximately 1,000 images per class in this corpus.
Based on the number of features annotated, we can train on a two-class problem of DCIS or benign, or increase the difficulty by increasing the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic, etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv, nedc_tuh_dpath@googlegroups.com, to be kept informed of the latest developments in this project. You can learn more from our project website: https://www.isip.piconepress.com/projects/nsf_dpath.
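
The directory and filename convention described in this abstract is regular enough to parse mechanically. The following is a minimal sketch in Python, assuming every image path has exactly the nine components shown in the example and that the filename fields are underscore-separated in the order given. The names `SlidePath` and `parse_slide_path` are hypothetical and are not part of any NEDC tooling.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SlidePath:
    """Fields recovered from one image path (hypothetical container)."""
    version: str
    data_type: str
    tissue: str
    sequence: str
    mrn: str
    captured: date
    case: str
    site: str
    cut: str
    token: str


def parse_slide_path(path: str) -> SlidePath:
    """Split an image path into the fields named in the abstract."""
    parts = PurePosixPath(path).parts
    # Assumed layout: root/version/type/tissue/sequence/MRN/date/case/filename
    if len(parts) != 9 or parts[0] != "tudp":
        raise ValueError(f"unexpected path layout: {path}")
    _, version, data_type, tissue, sequence, mrn, day, case, filename = parts

    # The image filename repeats the case name, then adds
    # site code, MRN, cut type and depth, and token number.
    stem = PurePosixPath(filename).stem
    prefix = case + "_"
    if not stem.startswith(prefix):
        raise ValueError(f"filename does not repeat the case name: {filename}")
    fields = stem[len(prefix):].split("_")
    if len(fields) != 4 or fields[1] != mrn:
        raise ValueError(f"not an image filename: {filename}")
    site, _, cut, token = fields

    # Dates are written as YYYY_MM_DD in the directory name.
    year, month, dom = (int(x) for x in day.split("_"))
    return SlidePath(version, data_type, tissue, sequence, mrn,
                     date(year, month, dom), case, site, cut, token)


example = parse_slide_path(
    "tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/"
    "0s15_12345_0a001_00123456_lvl0001_s000.svs"
)
```

Case-report paths (the `.docx` files) carry only the case name and MRN in the filename, so this sketch rejects them with a `ValueError` rather than guessing at missing fields.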
  4. There is a critical need for more students with engineering and computer science majors to enter into, persist in, and graduate from four-year postsecondary institutions. Increasing the diversity of the workforce by inclusive practices in engineering and science is also a profound identified need. According to national statistics, the largest groups of underrepresented minority students in engineering and science attend U.S. public higher education institutions. Most often, a large proportion of these students come to colleges and universities with unique challenges and needs, and are more likely to be first in their family to attend college. In response to these needs, engineering education researchers and practitioners have developed, implemented and assessed interventions to provide support and help students succeed in college, particularly in their first year. These interventions typically target relatively small cohorts of students and can be managed by a small number of faculty and staff. In this paper, we report on “work in progress” research in a large-scale, first-year engineering and computer science intervention program at a public, comprehensive university using multivariate comparative statistical approaches. Large-scale intervention programs are especially relevant to minority serving institutions that prepare growing numbers of students who are first in their family to attend college and who are also under-resourced, financially. These students most often encounter academic difficulties and come to higher education with challenging experiences and backgrounds. Our studied first-year intervention program, first piloted in 2015, is now in its 5th year of implementation. 
Its intervention components include: (a) first-year block schedules, (b) project-based introductory engineering and computer science courses, (c) an introduction to mechanics course, which provides students with the foundation needed to succeed in a traditional physics sequence, and (d) peer-led supplemental instruction workshops for calculus, physics and chemistry courses. This intervention study responds to three research questions: (1) What role does the first-year intervention’s components play in students’ persistence in engineering and computer science majors across undergraduate program years? (2) What role do particular pedagogical and cocurricular support structures play in students’ successes? And (3) What role do various student socio-demographic and experiential factors play in the effectiveness of first-year interventions? To address these research questions and therefore determine the formative impact of the first-year engineering and computer science program on which we are conducting research, we have collected diverse student data including grade point averages, concept inventory scores, and data from a multi-dimensional questionnaire that measures students’ use of support practices across their four to five years in their degree program, and diverse background information necessary to determine the impact of such factors on students’ persistence to degree. Background data includes students’ experiences prior to enrolling in college, their socio-demographic characteristics, and their college social capital throughout their higher education experience. For this research, we compared students who were enrolled in the first-year intervention program to those who were not enrolled in the first-year intervention. We have engaged in cross-sectional data collection from students’ freshman through senior years and employed multivariate statistical analytical techniques on the collected student data. Results of these analyses were interesting and diverse.
Generally, in terms of backgrounds, our research indicates that students’ parental education is positively related to their success in engineering and computer science across program years. Likewise, longitudinally (across program years), students’ college social capital predicted their academic success and persistence to degree. With regard to the study’s comparative research of the first-year intervention, our results indicate that students who were enrolled in the first-year intervention program as freshmen continued to use more support practices to assist them in academic success across their degree matriculation compared to students who were not in the first-year program. This suggests that the students continued to recognize the value of such supports as a consequence of having supports required as first-year students. In terms of students’ understanding of scientific or engineering-focused concepts, we found significant impact resulting from student support practices that were academically focused. We also found that enrolling in the first-year intervention was a significant predictor of the time that students spent preparing for classes and ultimately their grade point average, especially in STEM subjects across students’ years in college. In summary, we found that the studied first-year intervention program has longitudinal, positive impacts on students’ success as they navigate through their undergraduate experiences toward engineering and computer science degrees. 
  5. Science policy makers are looking for approaches to increase the extent of collaboration in the production of scientific software, looking to open collaborations in open source software for inspiration. We examine the software ecosystem surrounding BLAST, a key bioinformatics tool, identifying outside improvements and interviewing their authors. We find that academic credit is a powerful motivator for the production and revealing of improvements. Yet surprisingly, we also find that improvements motivated by academic credit are less likely to be integrated than those with other motivations, including financial gain. We argue that this is because integration makes it harder to see who has contributed what and thereby undermines the ability of reputation to function as a reward for collaboration. We consider how open source avoids these issues and conclude with policy approaches to promoting wider collaboration by addressing incentives for integration. 