
Title: Getting Messy with Authentic Data: Exploring the Potential of Using Data from Scientific Research to Support Student Data Literacy
Data are becoming increasingly important in science and society, and thus data literacy is a vital asset to students as they prepare for careers both inside and outside science, technology, engineering, and mathematics and go on to lead productive lives. In this paper, we discuss why the strongest learning experiences surrounding data literacy may arise when students are given opportunities to work with authentic data from scientific research. First, we explore the overlap among the fields of quantitative reasoning, data science, and data literacy, focusing specifically on how data literacy results from practicing quantitative reasoning and data science in the context of authentic data. Next, we identify and describe features that influence the complexity of authentic data sets (selection, curation, scope, size, and messiness) and their implications for data-literacy instruction. Finally, we discuss areas for future research aimed at identifying the impact that authentic data may have on student learning, including defining desired learning outcomes for data use in the classroom and identifying best teaching practices for developing students’ data-literacy abilities.
Award ID(s):
1832042 1637653 1027253
NSF-PAR ID:
10112628
Journal Name:
CBE—Life Sciences Education
Volume:
18
Issue:
2
Page Range or eLocation-ID:
es2
ISSN:
1931-7913
Sponsoring Org:
National Science Foundation
More Like This
  1. Authentic, “messy data” contain variability that comes from many sources, such as natural variation in the system under study, chance occurrences during research, and human error. It is this messiness that both deters potential users of authentic data and gives data the power to create unique learning opportunities that reveal the nature of science itself. While the value of bringing contemporary research and messy data into the classroom is recognized, implementation can seem overwhelming. We discuss the importance of frequent interactions with messy data throughout K–16 science education as a mechanism for students to engage in the practices of science, such as visualizing, analyzing, and interpreting data. Next, we describe strategies to help facilitate the use of messy data in the classroom while building complexity over time. Finally, we outline one potential sequence of activities, with specific examples, to highlight how various activity types can be used to scaffold students’ interactions with messy data (a brief code sketch of this kind of data cleanup appears after this list).
  2. Despite efforts to diversify the engineering workforce, the field remains dominated by White, male engineers. Research shows that underrepresented groups, including women and minorities, are less likely to identify and engage with scientific texts and literacy practices. Often, children of minority groups and/or working-class families do not receive the same kinds of exposure to science, technology, engineering, and mathematics (STEM) knowledge and practices as those from majority groups. Consequently, these children are less likely to engage in school subjects that provide pathways to engineering careers. Therefore, to mitigate the lack of diversity in engineering, new approaches able to broadly support engineering literacy are needed. One promising approach is disciplinary literacy instruction (DLI). DLI is a method for teaching students how advanced practitioners in a given field generate, interpret, and evaluate discipline-specific texts. DLI helps teachers provide access to high-quality, discipline-specific content to all students, regardless of race, ethnicity, gender, or socioeconomic status. Therefore, DLI has the potential to reduce literacy-based barriers that discourage underrepresented students from pursuing engineering careers. While models of DLI have been developed and implemented in history, science, and mathematics, little is known about DLI in engineering. The purpose of this research is to identify the authentic texts, practices, and evaluative frameworks employed by professional engineers to inform a model of DLI in engineering. While critiques of this approach may suggest that a DLI model will reflect the literacy practices of majority engineering groups (i.e., White male engineers), we argue that a DLI model can directly empower diverse K-16 students to become engineers by instructing them in the normed knowledge and practices of engineering. This paper presents a comparative case study conducted to investigate the literacy practices of electrical and mechanical engineers. We scaffolded our research using situated learning theory and rhetorical genre studies and considered the engineering profession as a community of practice. We generated multiple types of qualitative data with four participants (i.e., two electrical and two mechanical engineers), including written field notes from engineer observations, interview transcripts, think-aloud protocols, and engineer logs of literacy practices. We used constant comparative analysis (CCA) coding techniques to examine how electrical and mechanical engineers read, wrote, and evaluated texts and to identify the frameworks that guide their literacy practices. We then conducted within-group and cross-group CCA to compare and contrast the literacy practices specific to each sub-discipline. Findings suggest that there are two types of engineering literacy practices: those that resonate across both mechanical and electrical engineering disciplines and those that are specific to each discipline. For example, both electrical and mechanical engineers used test procedures to review and assess steps taken to evaluate electrical or mechanical system performance. In contrast, engineers from the two sub-disciplines used different forms of representation when depicting components and arrangements of engineering systems.
While practices that are common across sub-disciplines will inform a model of DLI in engineering for K-12 settings, discipline-specific practices can be used to develop and/or improve undergraduate engineering curricula.
  3. Abstract: We investigate the link between individual differences in science reasoning skills and mock jurors’ deliberation behavior; specifically, how much they talk about the scientific evidence presented in a complicated, ecologically valid case during deliberation. Consistent with our preregistered hypothesis, mock jurors strong in scientific reasoning discussed the scientific evidence more during deliberation than those with weaker science reasoning skills.
Summary: With increasing frequency, legal disputes involve complex scientific information (Faigman et al., 2014; Federal Judicial Center, 2011; National Research Council, 2009). Yet people often have trouble consuming scientific information effectively (McAuliff et al., 2009; National Science Board, 2014; Resnick et al., 2016). Individual differences in reasoning styles and skills can affect how people comprehend complex evidence (e.g., Hans, Kaye, Dann, Farley, & Albertson, 2011; McAuliff & Kovera, 2008). Recently, scholars have highlighted the importance of studying group deliberation contexts as well as individual decision contexts (Salerno & Diamond, 2010; Kovera, 2017). If individual differences influence how jurors understand scientific evidence, it invites questions about how those differences may affect the way jurors discuss science during group deliberations. The purpose of the current study was to examine how individual differences in the way people process scientific information affect the extent to which jurors discuss scientific evidence during deliberations.
Methods: We preregistered the data collection plan, sample size, and hypotheses on the Open Science Framework. Jury-eligible community participants (303 jurors across 50 juries) from Phoenix, AZ (mean age = 37.4, SD = 16.9; 58.8% female; 51.5% White, 23.7% Latinx, 9.9% African American, 4.3% Asian) were paid $55 for a 3-hour mock jury study. Participants completed a set of individual questionnaires related to science reasoning skills and attitudes toward science prior to watching a 45-minute mock armed-robbery trial. The trial included various pieces of evidence and testimony, including forensic experts testifying about mitochondrial DNA (mtDNA) evidence (based on the Hans et al., 2011 materials). Participants were then given 45 minutes to deliberate. The deliberations were video recorded and transcribed to text for analysis. We analyzed the deliberation content for discussions related to the scientific evidence presented during trial. We hypothesized that those with stronger scientific and numeric reasoning skills, higher need for cognition, and more positive views toward science would discuss scientific evidence more than their counterparts during deliberation.
Measures: We measured Attitudes Toward Science (ATS) with indices of scientific promise and scientific reservations (Hans et al., 2011; originally developed by the National Science Board, 2004, 2006). We used Drummond and Fischhoff’s (2015) Scientific Reasoning Scale (SRS) to measure scientific reasoning skills. Weller et al.’s (2012) Numeracy Scale (WNS) measured proficiency in reasoning with quantitative information. The NFC-Short Form (Cacioppo et al., 1984) measured need for cognition.
Coding: We identified verbal utterances related to the scientific evidence presented in court. For instance, we coded references to DNA evidence in general (e.g., nuclear DNA being more conclusive than mtDNA), the database used to compare the DNA sample (e.g., its size and how representative it was), exclusion rates (e.g., how many other people could not be excluded as a possible match), and the forensic DNA experts (e.g., how credible they were perceived to be). We used word count to operationalize the extent to which each juror discussed scientific information. First, we calculated the total word count for each complete jury deliberation transcript; based on the coding scheme above, we then determined the number of words each juror spent discussing scientific information. To account for differing deliberation lengths across juries, we calculated each juror’s scientific word count as a proportion of the jury’s total word count (a code sketch of this calculation appears after this list).
Results: On average, jurors discussed the science for about 4% of their total deliberation (SD = 4%, range 0–22%). We regressed the proportion of deliberation each juror spent discussing scientific information on the four individual-difference measures (i.e., SRS, NFC, WNS, ATS). Using the adjusted R-squared, the measures accounted for a significant 5.5% of the variability in discussion of scientific information, SE = 0.04, F(4, 199) = 3.93, p = 0.004. When controlling for all other variables in the model, the Scientific Reasoning Scale was the only measure that remained significant, b = 0.003, SE = 0.001, t(203) = 2.02, p = 0.045. To analyze how much variability each measure accounted for, we performed a stepwise regression, with NFC entered at step 1, ATS at step 2, WNS at step 3, and SRS at step 4. At step 1, NFC accounted for 2.4% of the variability, F(1, 202) = 5.95, p = 0.02. At step 2, ATS did not account for significant additional variability. At step 3, WNS accounted for an additional 2.4% of the variability, ΔF(1, 200) = 5.02, p = 0.03. Finally, at step 4, SRS accounted for a significant additional 1.9% of the variability in discussion of scientific information, ΔF(1, 199) = 4.06, p = 0.045, for a total adjusted R-squared of 0.055.
Discussion: This study provides additional support for previous findings that scientific reasoning skills affect the way jurors comprehend and use scientific evidence. It extends those findings by suggesting that these individual differences also shape the way scientific evidence is discussed during jury deliberations. The study also advances the literature by identifying scientific reasoning skill as a potentially more robust explanatory individual-difference variable than better-studied constructs such as need for cognition in jury research. Our next steps for this research, which we plan to present at AP-LS as part of this presentation, include further analysis of the deliberation content (e.g., not just the mention of, but the accuracy of, references to scientific evidence in discussion). We are currently coding these data with the software program Noldus Observer XT, which will allow us to present more sophisticated results during the presentation.
Learning Objective: Participants will be able to describe how individual differences in scientific reasoning skills affect how much jurors discuss scientific evidence during deliberation.
  4. In mechanics, the standard 3-credit, 45-hour course is sufficient to deliver standard lectures with prepared examples and questions. Moreover, it is not only feasible but preferable to employ any of a variety of active learning and teaching techniques. Nevertheless, even when active learning is strategically used, students and instructors alike experience pressure to accomplish their respective learning and teaching goals under the constraints of the academic calendar, raising questions as to whether the allocated time is sufficient to enable authentic learning. One way to assess learning progress is to examine the learning cycles through which students attempt, re-think, and re-attempt their work. This article provides data to benchmark the time required to learn key Statics concepts, based on instruction of approximately 50 students in a Statics class at a public research university during the Fall 2020 semester. Two parallel techniques are employed to foster and understand student learning cycles.
• Through a Mastery-Based Learning model, 15 weekly pass/fail “Mastery Tests” are given. Students who do not pass may re-test with a different but similar test on the same topic each week until the semester’s conclusion. The tests are highly structured in that they are well posed and highly focused. For example, some tests focus only on drawing Free Body Diagrams, with no equations or calculations; others focus on writing equilibrium equations from a given Free Body Diagram. Passing the first six tests is required to earn the grade of D; passing the next three earns a C; the next three, a B; and the final three, an A (this scheme is summarized in a short sketch after this list). Evaluations include coding of student responses to infer student reasoning. Learning cycles occur as students repeat the same topics, and their progress is assessed by passing rates and by comparing evolving responses to the same test topics.
• Concept Questions that elicit qualitative responses and written explanations are deployed at least weekly. The learning cycle here consists of students answering a question, seeing the overall class results (but not the correct answer), having a chance to explore the question with other students and the instructor, and finally re-answering the same question, anywhere from a few minutes to a couple of days later. Sometimes the same question is given a third time to encourage further effort or progress.
To date, results from both cycles appear to agree on one important conclusion: the rate of demonstrated learning is quite low. For example, each Mastery Test has a passing rate of 20%–30%, including for students with several repeats. With the Concept Questions, typically no more than half of the students who answered incorrectly change to the correct answer by the time of the final poll. The final article will provide quantitative and qualitative results from each type of cycle, including tracking of coded responses on Mastery Tests, written responses on Concept Questions, and cross-comparisons thereof. Additional results will be presented from student surveys. Since the Mastery Tests and Concept Questions follow typical Statics topics, this work has the potential to lead to a standardized set of benchmarks and standards for measuring student learning, and its rate, in Statics.
  5. Data science has been growing in prominence across both academia and industry, but there is still little formal consensus about how to teach it. Many people who currently teach data science are practitioners, such as computational researchers in academia or data scientists in industry. To understand how these practitioner-instructors pass their knowledge on to novices, and how that contrasts with teaching more traditional forms of programming, we interviewed 20 data scientists who teach in settings ranging from small-group workshops to large online courses. We found that: 1) they must empathize with a diverse array of student backgrounds and expectations; 2) they teach technical workflows that integrate authentic practices surrounding code, data, and communication; and 3) they face challenges involving authenticity versus abstraction in software setup, finding and curating pedagogically relevant datasets, and acclimating students to living with uncertainty in data analysis. These findings can point the way toward better tools for data science education and help bring data literacy to more people around the world.
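To make item 1 concrete, the following is a minimal Python sketch, ours rather than the authors’, of the kind of messy-data cleanup and visualization described there; the file name and column names are hypothetical.

    # Minimal sketch (not from the paper) of cleaning and visualizing a messy
    # field data set; the file name and column names are hypothetical.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("field_measurements.csv")  # hypothetical raw data export

    # Human error and chance gaps: inconsistent labels and missing values.
    df["site"] = df["site"].str.strip().str.lower()
    df = df.dropna(subset=["mass_g"])
    df = df[df["mass_g"] > 0]  # drop physically impossible entries

    # Natural variation is signal, not error: visualize it rather than erase it.
    df.boxplot(column="mass_g", by="site")
    plt.ylabel("mass (g)")
    plt.show()

Even this small example surfaces the decisions the abstract emphasizes: which records count as errors, and which variability is the phenomenon itself.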
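The word-count analysis in item 3 can likewise be sketched in a few lines. This is our reconstruction under stated assumptions (one row per juror; all file and column names hypothetical), not the authors’ code.

    # Sketch of the deliberation word-count analysis in item 3; assumes one
    # row per juror, with hypothetical file and column names.
    import pandas as pd
    import statsmodels.formula.api as smf

    jurors = pd.read_csv("juror_deliberations.csv")  # hypothetical coded data

    # Express each juror's science talk as a proportion of the jury's total
    # words, making counts comparable across deliberations of different length.
    jury_total = jurors.groupby("jury_id")["total_words"].transform("sum")
    jurors["sci_prop"] = jurors["science_words"] / jury_total

    # Simultaneous regression on the four individual-difference measures.
    model = smf.ols("sci_prop ~ SRS + NFC + WNS + ATS", data=jurors).fit()
    print(model.summary())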
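Finally, the mastery-test grading scheme in item 4 reduces to a simple threshold mapping. The sketch below assumes the 15 tests are passed in their fixed order; the function name and the failing-grade label are ours, not the article’s.

    # Grading scheme from item 4: six passes earn a D, and each further block
    # of three raises the grade one letter (15 tests in total). The "F" for
    # fewer than six passes is an assumption, not stated in the abstract.
    def mastery_grade(tests_passed: int) -> str:
        if tests_passed >= 15:
            return "A"
        if tests_passed >= 12:
            return "B"
        if tests_passed >= 9:
            return "C"
        if tests_passed >= 6:
            return "D"
        return "F"

    assert mastery_grade(7) == "D"  # six passes required for D, nine for C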