Title: Understanding the scientific software ecosystem and its impact: Current and future measures
Software is increasingly important to the scientific enterprise, and science-funding agencies are increasingly funding software work. Accordingly, many different participants need insight into the relationship between software, its development, its use, and its scientific impact. In this article, we draw on interviews and participant observation to describe the information needs of domain scientists, software component producers, infrastructure providers, and ecosystem stewards, including science funders. We provide a framework for categorizing different types of measures and their relationships as they extend from funding and development through scientific use to scientific impact. We use this framework to organize a presentation of existing measures and techniques, and to identify areas in which techniques are either not widespread or entirely missing. We conclude with policy recommendations designed to improve insight into the scientific software ecosystem, make it more understandable, and thereby contribute to the progress of science.
Award ID(s):
0943168 1064209
NSF-PAR ID:
10038373
Author(s) / Creator(s):
Date Published:
Journal Name:
Research evaluation
Volume:
24
Issue:
4
ISSN:
1471-5449
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Who, and by what means, do we ensure that engineering education evolves to meet the ever-changing needs of our society? This and other papers presented by our research team at this conference offer our initial set of findings from an NSF-sponsored collaborative study on engineering education reform. Organized around the notion of higher education governance and the practice of educational reform, our open-ended study is based on semi-structured interviews at over three dozen universities and engineering professional societies and organizations, along with a handful of scholars engaged in engineering education research. Organized as a multi-site, multi-scale study, our goal is to document the differences in perspectives and interests that exist across organizational levels and institutions, and to describe the coordination that occurs (or fails to occur) in engineering education given the distributed structure of the engineering profession.
This paper offers all engineering educators and administrators a qualitative and retrospective analysis of ABET EC 2000 and its implementation. The paper opens with a historical background on the Engineers Council for Professional Development (ECPD) and engineering accreditation; the rise of quantitative standards during the 1950s as a result of the push to implement an engineering science curriculum appropriate to the Cold War era; EC 2000 and its call for greater emphasis on professional skill sets amid concerns about US manufacturing productivity and national competitiveness; the development of outcomes assessment and its implementation; and the successive negotiations about assessment practice and the training of both program evaluators and assessment coordinators for the degree programs undergoing evaluation. It was these negotiations and the evolving practice of assessment that resulted in the latest set of changes in ABET engineering accreditation criteria ("1-7" versus "a-k").
To provide insight into the origins of EC 2000, we examine the "Gang of Six," a group of individuals loyal to ABET who used the pressure exerted by external organizations, along with a shared rhetoric of national competitiveness, to forge a common vision organized around an expanded emphasis on professional skill sets. It was also significant that the Gang of Six was aware that the regional accreditation agencies were already contemplating a shift towards outcomes assessment; several also had backgrounds in industrial engineering. However, this resulted in an assessment protocol for EC 2000 that remained ambiguous about whether the stated learning outcomes (Criterion 3) were something faculty had to demonstrate for all of their students, or whether EC 2000's main emphasis was continuous improvement. When it proved difficult to demonstrate learning outcomes on the part of all students, ABET itself began to place greater emphasis on total quality management and continuous process improvement (TQM/CPI). This gave institutions an opening to begin using increasingly limited and proximate measures for the "a-k" student outcomes as evidence of effort and improvement. In what social scientists would describe as "tactical" resistance to perceived oppressive structures, this enabled ABET coordinators and the faculty in charge of degree programs, many of whom had their own internal improvement processes, to begin referring to the a-k criteria as "difficult to achieve" and "ambiguous," which they sometimes were. Inconsistencies in evaluation outcomes enabled those most discontented with the a-k student outcomes to use ABET's own organizational processes to drive the latest revisions to EAC accreditation criteria, although the organization's own process for member and stakeholder input ultimately restored much of the professional skill sets found in the original EC 2000 criteria.
Other refinements were also made to the standard, including a new emphasis on diversity. That said, many within our interview population believe that EC 2000 had already achieved many of the changes it set out to achieve, especially with regard to broader professional skills such as communication, teamwork, and design. Regular faculty review of curricula is now also a more routine part of the engineering education landscape. While programs vary in their engagement with ABET, many are skeptical about whether the new criteria will produce further improvements to their programs, arguing instead that their own internal processes are now the primary drivers of change.
  2. Abstract (100 words): Jurors are increasingly exposed to scientific information in the courtroom. To determine whether providing jurors with gist information would assist their ability to make well-informed decisions, the present experiment utilized a Fuzzy Trace Theory-inspired intervention and tested it against traditional legal safeguards (i.e., judge instructions) while varying the scientific quality of the evidence. The results indicate that jurors who viewed high-quality evidence rated the scientific evidence significantly higher than those who viewed low-quality evidence, but were unable to moderate the credibility of the expert witness or apply damages appropriately, resulting in poor calibration.
Summary (<1000 words): Jurors and juries are increasingly exposed to scientific information in the courtroom, and it remains unclear when they will base their decisions on a reasonable understanding of the relevant scientific information. Without such knowledge, the ability of jurors and juries to make well-informed decisions may be at risk, increasing the chances of unjust outcomes (e.g., false convictions in criminal cases). Therefore, there is a critical need to understand the conditions that affect jurors' and juries' sensitivity to the qualities of scientific information and to identify safeguards that can assist with scientific calibration in the courtroom. The current project addresses these issues with an ecologically valid experimental paradigm, making it possible to assess causal effects of evidence quality and safeguards, as well as the role of a host of individual-difference variables that may affect perceptions of testimony by scientific experts and of liability in a civil case. Our main goal was to develop a simple, theoretically grounded tool to enable triers of fact (individual jurors) with a range of scientific reasoning abilities to appropriately weigh scientific evidence in court.
We did so by testing a Fuzzy Trace Theory-inspired intervention in court against traditional legal safeguards. Appropriate use of scientific evidence reflects good calibration, which we define as being influenced more by strong scientific information than by weak scientific information. Inappropriate use reflects poor calibration, defined as relative insensitivity to the strength of scientific information. Fuzzy Trace Theory (Reyna & Brainerd, 1995) predicts that techniques for improving calibration can come from presenting an easy-to-interpret, bottom-line "gist" of the information. Our central hypothesis was that laypeople's appropriate use of scientific information would be moderated both by external situational conditions (e.g., the quality of the scientific information itself; a decision aid designed to convey clearly the "gist" of the information) and by individual differences among people (e.g., scientific reasoning skills, cognitive reflection tendencies, numeracy, need for cognition, attitudes toward and trust in science). Identifying factors that promote jurors' appropriate understanding of and reliance on scientific information will contribute to general theories of reasoning based on scientific evidence, while also providing an evidence-based framework for improving the courts' use of scientific information. All hypotheses were preregistered on the Open Science Framework.
Method: Participants completed six questionnaires (counterbalanced): Need for Cognition Scale (NCS; 18 items), Cognitive Reflection Test (CRT; 7 items), Abbreviated Numeracy Scale (ABS; 6 items), Scientific Reasoning Scale (SRS; 11 items), Trust in Science (TIS; 29 items), and Attitudes towards Science (ATS; 7 items). Participants then viewed a video depicting a civil trial in which the defendant sought damages from the plaintiff for injuries caused by a fall.
The defendant (bar patron) alleged that the plaintiff (bartender) pushed him, causing him to fall and hit his head on the hard floor. Participants were informed at the outset that the defendant was liable; therefore, their task was to determine if the plaintiff should be compensated. Participants were randomly assigned to 1 of 6 experimental conditions: 2 (quality of scientific evidence: high vs. low) x 3 (safeguard to improve calibration: gist information, no-gist information [control], judge instructions). An expert witness (neuroscientist) hired by the court testified regarding the scientific strength of fMRI data (high [90-to-10 signal-to-noise ratio] vs. low [50-to-50 signal-to-noise ratio]) and presented gist or no-gist information both verbally (i.e., fairly high/about average) and visually (i.e., a graph). After viewing the video, participants were asked if they would like to award damages; if they indicated yes, they were asked to enter a dollar amount. Participants then completed the Positive and Negative Affect Schedule-Modified Short Form (PANAS-MSF; 16 items), the Expert Witness Credibility Scale (WCS; 20 items), Witness Credibility and Influence on Damages for each witness, manipulation-check questions, and Understanding Scientific Testimony (UST; 10 items); three additional measures were collected but are beyond the scope of the current investigation. Finally, participants completed demographic questions, including questions about their scientific background and experience. The study was completed via Qualtrics, with participation from students (online vs. in-lab), MTurkers, and non-student community members. After removing those who failed attention-check questions, 469 participants remained (243 men, 224 women, 2 did not specify gender) from a variety of racial and ethnic backgrounds (70.2% White, non-Hispanic).
Results and Discussion: There were three primary outcomes: quality of the scientific evidence, expert credibility (WCS), and damages.
During initial analyses, each dependent variable was submitted to a separate 3 (Gist Safeguard: gist, no gist, judge instructions) x 2 (Scientific Quality: high, low) analysis of variance (ANOVA). Consistent with hypotheses, there was a significant main effect of scientific quality on strength of evidence, F(1, 463)=5.099, p=.024; participants who viewed the high-quality evidence rated the scientific evidence significantly higher (M=7.44) than those who viewed the low-quality evidence (M=7.06). There were no significant main effects or interactions for witness credibility, indicating that the expert who provided scientific testimony was seen as equally credible regardless of scientific quality or gist safeguard. Finally, for damages, consistent with hypotheses, there was a marginally significant interaction between Gist Safeguard and Scientific Quality, F(2, 273)=2.916, p=.056. However, post hoc t-tests revealed that significantly higher damages were awarded for low (M=11.50) versus high (M=10.51) scientific-quality evidence, F(1, 273)=3.955, p=.048, in the no-gist-with-judge-instructions condition, which was contrary to hypotheses. The data suggest that the judge instructions alone are reversing the pattern: although the difference was nonsignificant, those in the no-gist-without-judge-instructions condition awarded higher damages for high (M=11.34) versus low (M=10.84) scientific-quality evidence, F(1, 273)=1.059, p=.30. Together, these provide promising initial results indicating that participants were able to effectively differentiate between high and low scientific quality of evidence, though they used the scientific evidence inappropriately through their inability to discern expert credibility and apply damages, resulting in poor calibration.
These results will provide the basis for more sophisticated analyses, including higher-order interactions with individual differences (e.g., need for cognition) as well as tests of mediation using path analyses. [References omitted but available by request]
Learning Objective: Participants will be able to determine whether providing jurors with gist information would assist their ability to award damages in a civil trial.
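The calibration construct defined above (being influenced more by strong scientific information than by weak) lends itself to a simple difference-of-means summary. The sketch below is purely illustrative and assumed, not the study's actual analysis code; function names and the sample ratings are hypothetical, chosen only to echo the reported group means (7.44 vs. 7.06).

```python
# Hypothetical sketch of a calibration summary: the difference between
# mean evidence-strength ratings under high- vs. low-quality evidence.
# All names and numbers here are illustrative, not the study's own code.

def mean(xs):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(xs) / len(xs)

def calibration_score(high_quality_ratings, low_quality_ratings):
    """Positive score: stronger evidence is rated higher (good calibration).
    Near zero: insensitivity to evidence strength (poor calibration)."""
    return mean(high_quality_ratings) - mean(low_quality_ratings)

# Illustrative ratings echoing the reported pattern of group means:
high = [7.0, 7.5, 7.8, 7.5]   # ratings under high-quality evidence
low = [6.8, 7.1, 7.2, 7.1]    # ratings under low-quality evidence
print(calibration_score(high, low))  # positive: ratings track quality
```

On this definition, a group that rates strong and weak evidence identically would score near zero, which is the pattern the abstract describes as poor calibration in the damages data.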
  3. Abstract

    Modeling and simulation is transforming modern materials science, becoming an important tool for the discovery of new materials and material phenomena, for gaining insight into the processes that govern materials behavior, and, increasingly, for quantitative predictions that can be used as part of a design tool in full partnership with experimental synthesis and characterization. Modeling and simulation is the essential bridge from good science to good engineering, spanning from fundamental understanding of materials behavior to deliberate design of new materials technologies leveraging new properties and processes. This Roadmap presents a broad overview of the extensive impact computational modeling has had in materials science in the past few decades, and offers focused perspectives on where the path forward lies as this rapidly expanding field evolves to meet the challenges of the next few decades. The Roadmap offers perspectives on advances within disciplines as diverse as phase-field methods to model mesoscale behavior and molecular dynamics methods to deduce the fundamental atomic-scale dynamical processes governing materials response, as well as on the challenges involved in the interdisciplinary research that tackles complex materials problems where the governing phenomena span different scales of materials behavior, requiring multiscale approaches. The shift from understanding fundamental materials behavior to developing quantitative approaches to explain and predict experimental observations requires advances in the methods and practice of simulation for reproducibility and reliability, and in interaction with a computational ecosystem that integrates new theory development, innovative applications, and an increasingly integrated software and computational infrastructure that takes advantage of increasingly powerful computational methods and computing hardware.

     
  4. Abstract: We investigate the link between individual differences in science reasoning skills and mock jurors' deliberation behavior; specifically, how much they talk about the scientific evidence presented in a complicated, ecologically valid case during deliberation. Consistent with our preregistered hypothesis, mock jurors strong in scientific reasoning discussed the scientific evidence more during deliberation than those with weaker science reasoning skills.
Summary: With increasing frequency, legal disputes involve complex scientific information (Faigman et al., 2014; Federal Judicial Center, 2011; National Research Council, 2009). Yet people often have trouble consuming scientific information effectively (McAuliff et al., 2009; National Science Board, 2014; Resnick et al., 2016). Individual differences in reasoning styles and skills can affect how people comprehend complex evidence (e.g., Hans, Kaye, Dann, Farley, Alberston, 2011; McAuliff & Kovera, 2008). Recently, scholars have highlighted the importance of studying group deliberation contexts as well as individual decision contexts (Salerno & Diamond, 2010; Kovera, 2017). If individual differences influence how jurors understand scientific evidence, it invites questions about how these individual differences may affect the way jurors discuss science during group deliberations. The purpose of the current study was to examine how individual differences in the way people process scientific information affect the extent to which jurors discuss scientific evidence during deliberations.
Methods: We preregistered the data collection plan, sample size, and hypotheses on the Open Science Framework. Jury-eligible community participants (303 jurors across 50 juries) from Phoenix, AZ (Mage=37.4, SD=16.9; 58.8% female; 51.5% White, 23.7% Latinx, 9.9% African-American, 4.3% Asian) were paid $55 for a 3-hour mock jury study.
Participants completed a set of individual questionnaires related to science reasoning skills and attitudes toward science prior to watching a 45-minute mock armed-robbery trial. The trial included various pieces of evidence and testimony, including forensic experts testifying about mitochondrial DNA evidence (mtDNA; based on Hans et al., 2011 materials). Participants were then given 45 minutes to deliberate. The deliberations were video-recorded and transcribed to text for analysis. We analyzed the deliberation content for discussions related to the scientific evidence presented during trial. We hypothesized that those with stronger scientific and numeric reasoning skills, higher need for cognition, and more positive views towards science would discuss scientific evidence more than their counterparts during deliberation.
Measures: We measured Attitudes Toward Science (ATS) with indices of scientific promise and scientific reservations (Hans et al., 2011; originally developed by the National Science Board, 2004; 2006). We used Drummond and Fischhoff's (2015) Scientific Reasoning Scale (SRS) to measure scientific reasoning skills. Weller et al.'s (2012) Numeracy Scale (WNS) measured proficiency in reasoning with quantitative information. The NFC-Short Form (Cacioppo et al., 1984) measured need for cognition.
Coding: We identified verbal utterances related to the scientific evidence presented in court. These included references to DNA evidence in general (e.g., nuclear DNA being more conclusive than mtDNA), the database that was used to compare the DNA sample (e.g., the database size, how representative it was), exclusion rates (e.g., how many other people could not be excluded as a possible match), and the forensic DNA experts (e.g., how credible they were perceived to be). We used word count to operationalize the extent to which each juror discussed scientific information. First, we calculated the total word count for each complete jury deliberation transcript.
Based on the above coding scheme, we determined the number of words each juror spent discussing scientific information. To compare across juries, we wanted to account for the differing lengths of deliberation; thus, we calculated each juror's scientific-deliberation word count as a proportion of their jury's total word count.
Results: On average, jurors discussed the science for about 4% of their total deliberation (SD=4%, range 0-22%). We regressed the proportion of the deliberation jurors spent discussing scientific information on the four individual-difference measures (i.e., SRS, NFC, WNS, ATS). Using the adjusted R-squared, the measures significantly accounted for 5.5% of the variability in scientific-information discussion, SE=0.04, F(4, 199)=3.93, p=0.004. When controlling for all other variables in the model, the Scientific Reasoning Scale was the only measure that remained significant, b=0.003, SE=0.001, t(203)=2.02, p=0.045. To analyze how much variability each measure accounted for, we performed a stepwise regression, with NFC entered at step 1, ATS entered at step 2, WNS entered at step 3, and SRS entered at step 4. At step 1, NFC accounted for 2.4% of the variability, F(1, 202)=5.95, p=0.02. At step 2, ATS did not significantly account for any additional variability. At step 3, WNS accounted for an additional 2.4% of the variability, ΔF(1, 200)=5.02, p=0.03. Finally, at step 4, SRS significantly accounted for an additional 1.9% of the variability in scientific-information discussion, ΔF(1, 199)=4.06, p=0.045, for a total adjusted R-squared of 0.055.
Discussion: This study provides additional support for previous findings that scientific reasoning skills affect the way jurors comprehend and use scientific evidence. It expands on previous findings by suggesting that these individual differences also impact the way scientific evidence is discussed during jury deliberations.
In addition, this study advances the literature by identifying scientific reasoning skills as a potentially more robust explanatory individual-differences variable than more well-studied constructs like need for cognition in jury research. Our next steps for this research, which we plan to present at AP-LS as part of this presentation, include further analysis of the deliberation content (e.g., not just the mention of, but the accuracy of, references to scientific evidence in discussion). We are currently coding these data with a software program called Noldus Observer XT, which will allow us to present more sophisticated results during the presentation.
Learning Objective: Participants will be able to describe how individual differences in scientific reasoning skills affect how much jurors discuss scientific evidence during deliberation.
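The word-count normalization described in this study (each juror's science-related words as a proportion of the jury's total deliberation words) can be sketched in a few lines. This is an assumed reconstruction for illustration only: the utterance tagging is taken as already done (a boolean flag per utterance), and all names and the toy transcript are hypothetical, not the study's materials or code.

```python
# Hypothetical sketch of the proportion measure described above. Each
# utterance is (juror_id, text, is_science); tagging science-related
# utterances is assumed to have been done by coders beforehand.

def science_talk_proportion(utterances):
    """Return {juror_id: that juror's science-related word count as a
    proportion of the jury's total deliberation word count}."""
    total_words = sum(len(text.split()) for _, text, _ in utterances)
    by_juror = {}
    for juror, text, is_science in utterances:
        if is_science:
            by_juror[juror] = by_juror.get(juror, 0) + len(text.split())
    return {j: n / total_words for j, n in by_juror.items()}

# Toy transcript (invented, for illustration):
transcript = [
    ("J1", "the mtDNA match seems strong to me", True),
    ("J2", "I just think he looked guilty", False),
    ("J1", "but the database was small", True),
]
print(science_talk_proportion(transcript))
```

Dividing by the jury's total word count, rather than the juror's own, is what makes the measure comparable across juries with deliberations of different lengths, as the study notes.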
  5. Translational software research bridges the gap between scientific innovations and practical applications, driving impactful societal advancements. However, developing such software is challenging due to interdisciplinary collaboration, technology adoption, and postfunding sustainability. This article presents the experiences and insights of the Scalable Adaptive Graphics Environment (SAGE) team, which has spent two decades developing translational, cross-disciplinary, collaboration tools to benefit computational science research. With a focus on SAGE and its next-generation iterations, we explore the inherent challenges in translational research, such as fostering cross-disciplinary collaboration, motivating technology adoption, and ensuring postfunding product sustainability. We also discuss the roles of funding agencies, policymakers, and academic institutions in promoting translational research. Although the journey is fraught with challenges, the societal impact and satisfaction derived from translational research underscore its significance in the broader scientific landscape. This article aims to encourage further conversation and the development of effective models for translational software projects. 