Title: INVESTIGATING DATA LIKE A DATA SCIENTIST: KEY PRACTICES AND PROCESSES
With a call for schools to infuse data across the curriculum, many are creating curricula and examining students’ thinking in data-intensive problems. As the discipline of statistics education broadens to data science education, there is a need to examine how practices in data science can inform work in K-12. We synthesize literature about statistics investigation processes, data science as a field and practices of data scientists. Further, we provide results from an ethnographic and interview study of the work of data scientists. Together, these inform a new framework to support data investigation processes. We explicate the practices and dispositions needed and offer a glimpse of how the framework can be used to move the discipline of data science education forward.
Award ID(s):
1908760
NSF-PAR ID:
10352639
Journal Name:
STATISTICS EDUCATION RESEARCH JOURNAL
Volume:
21
Issue:
2
ISSN:
1570-1824
Page Range / eLocation ID:
3
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Statisticians have a long history of consulting and collaborating with experts from a variety of fields. Now with the rise of data science, collaborating across disciplines is both more important and more prevalent than ever before. This paper examines the goals of statistics and data science collaborations and uses the ASCCR (Attitude-Structure-Communication-Content-Relationship) Framework for Collaboration to connect these goals. Specifically, we propose that a useful way of guiding consultations and collaborations is for statisticians and data scientists to work toward two terminal goals of a collaboration: to make a deep contribution to the field and create a strong relationship with the domain expert. To help in achieving these goals, statisticians and data scientists should strive to achieve three instrumental goals: adopt an attitude of collaboration, provide effective structure for the collaboration, and communicate to create shared understanding. We show how these five goals map onto the ASCCR Frame, how they are connected to each other, and how to have conversations about these goals. The goal of this paper is to show statisticians and data scientists how they can become more effective collaborators by providing motivation for using the ASCCR framework to improve their practice of statistics and data science. 
  2. Investigative data journalists work with a variety of data sources to tell a story. Prior work has indicated a close relationship between journalists' data work practices and those of data scientists. However, these relationships and data work practices have not been empirically examined, and understanding them is crucial to inform the design of tools that are used by different groups of people, including data scientists and data journalists. To bridge this gap, we studied investigative reporters' data work practices with one non-profit investigative newsroom. Our study design includes two activities: 1) semi-structured interviews with journalists, and 2) a sketching activity allowing journalists to depict examples of their work practices. By analyzing these data and synthesizing them across related prior work, we propose the major phases in the data-driven investigative journalism story idea generation process. Our study findings show that the journalists employ a collection of multiple, iterative, cyclic processes to identify journalistically "interesting" story ideas. These processes both significantly resemble and show subtle, nuanced differences from data science work practices identified in prior research. We further verified our proposal through a member check with key informants. This work offers three primary contributions. First, it provides a close glimpse into the main phases of investigative journalists' data-driven story idea generation technique. Second, it complements prior work studying formal data science practices by examining data-driven investigative journalists, whose primary expertise lies outside computing. Third, it identifies particular points in the data exploration processes that would benefit from design interventions and suggests future research directions.
  3. Abstract

    There is now a significant research literature devoted to reconceptualizing scientific activities, such as modeling, explanation, and argumentation, to realize a vision of science‐as‐practice in classrooms. As yet, however, not all scientific practices have received equal attention. Planning and Carrying Out Investigations is one of the eight scientific practices identified in the Next Generation Science Standards, and there is a long line of research from both psychological and science education traditions that addresses topics about investigation, such as the generation and interpretation of evidence. However, investigation has not been subject to concerted reconceptualization within recent research and instructional design efforts focused on science‐as‐practice. In this article, we propose a framework that centers the investigation as a key locus for constructing alignments among phenomena, data, and explanatory models and makes visible the work that scientists engage in as they develop and stabilize alignments. We argue that these alignments are currently under‐theorized and under‐utilized in instructional environments. We explore four opportunities that we argue are both accessible to students from a young age and can support conceptual innovation. These are (a) developing empirical systems, (b) getting a grip on empirical systems, (c) determining, defining and operationalizing data as “evidence,” and (d) making sense of what the results of empirical systems do and do not help us understand.

     
  4.
    Prompted by the skyrocketing demand for data scientists, progress made by the ACM Data Science Task Force on defining data science competencies, and inquiries about data science accreditation, ABET is in the process of developing accreditation criteria for undergraduate data science programs. The effort is led by members of a joint data science criteria subcommittee appointed by ABET’s Computing Accreditation Commission (CAC) and CSAB (the lead society for computing accreditation). Establishing data science accreditation criteria is a notable milestone in the maturing data science discipline, indicating the presence of an accepted body of knowledge, standards of practice, and ethical codes for practitioners. This position paper motivates the effort and discusses prior work towards defining data science education requirements. It describes the ongoing process for creating and obtaining approval of the accreditation criteria, and how feedback was and will be solicited from the computing and statistical communities. The current draft data science criteria, which were approved in July 2020 by the relevant ABET bodies for a year of public review and comment, are presented. These criteria emphasize the three pillars of data science: computing foundations, mathematical/statistical foundations, and experience in at least one data application domain. This report thus serves both to inform and to stimulate the academic discussion needed to finalize appropriate data science accreditation by ABET.
  5. Abstract

    The data sets used in statistics education have changed over time, from mathematically “well‐behaved” ones that facilitated computation, to more context‐rich sources and now, with the increasing influence of data science practices, to “found” data, often from open data sites. As data sources change, it is important for educators to take a fresh look at the ways we engage students in thinking about the processes that generated the data they encounter. The use of already collected data requires particular attention because many of the decisions that went into the processes of obtaining the data are hidden. Students need to learn to ask “Who, When, How, Where, and Why?” data were collected and to wonder if the data really measure what needs to be measured. Our advocacy in this paper is to deepen the educational treatment of data production to better reflect the current and future practice of statistics and data science.

     