Text provides a compelling example of unstructured data that can be used to motivate and explore classification problems. Challenges arise in representing the features of text and in how students link the representation of text as character strings to the identification of features that connect to underlying phenomena. To observe how students reason with text data in scenarios designed to elicit certain aspects of the domain, we employed a task‐based interview method using a structured protocol with six pairs of undergraduate students. Our goal was to shed light on students' understanding of text as data using a motivating task to classify headlines as “clickbait” or “news.” Three types of features (function, content, and form) surfaced, the majority from the first scenario. Our analysis of the interviews indicates that this sequence of activities engaged the participants in thinking at both the human‐perception level and the computer‐extraction level and in conceptualizing connections between them.
Free, publicly-accessible full text available May 6, 2024
While coursework provides undergraduate data science students with some relevant analytic skills, many are not given the rich experiences with data and computing they need to be successful in the workplace. Additionally, students often have limited exposure to team-based data science and the principles and tools of collaboration that are encountered outside of school.
In this paper, we describe the DSC-WAV program, an NSF-funded data science workforce development project in which teams of undergraduate sophomores and juniors work with a local non-profit organization on a data-focused problem. To help students develop a sense of agency and improve confidence in their technical and non-technical data science skills, the project promoted a team-based approach to data science, adopting several processes and tools intended to facilitate this collaboration.
Evidence from the project evaluation, including participant survey and interview data, is presented to document the degree to which the project was successful in engaging students in team-based data science, and how the project changed the students' perceptions of their technical and non-technical skills. We also examine opportunities for improvement and offer insight to other data science educators who may want to implement a similar team-based approach to data science projects at their own ...