Text provides a compelling example of unstructured data that can be used to motivate and explore classification problems. Challenges arise regarding the representation of features of text and student linkage between text representations as character strings and identification of features that embed connections with underlying phenomena. In order to observe how students reason with text data in scenarios designed to elicit certain aspects of the domain, we employed a task‐based interview method using a structured protocol with six pairs of undergraduate students. Our goal was to shed light on students' understanding of text as data using a motivating task to classify headlines as “clickbait” or “news.” Three types of features (function, content, and form) surfaced, the majority from the first scenario. Our analysis of the interviews indicates that this sequence of activities engaged the participants in thinking at both the human‐perception level and the computer‐extraction level and conceptualizing connections between them.
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract -
Abstract To date, many AI initiatives (eg, AI4K12, CS for All) developed standards and frameworks as guidance for educators to create accessible and engaging Artificial Intelligence (AI) learning experiences for K‐12 students. These efforts revealed a significant need to prepare youth to gain a fundamental understanding of how intelligence is created, applied, and its potential to perpetuate bias and unfairness. This study contributes to the growing interest in K‐12 AI education by examining student learning of modelling real‐world text data. Four students from an Advanced Placement computer science classroom at a public high school participated in this study. Our qualitative analysis reveals that the students developed nuanced and in‐depth understandings of how text classification models—a type of AI application—are trained. Specifically, we found that in modelling texts, students: (1) drew on their social experiences and cultural knowledge to create predictive features, (2) engineered predictive features to address model errors, (3) described model learning patterns from training data and (4) reasoned about noisy features when comparing models. This study contributes to an initial understanding of student learning of modelling unstructured data and offers implications for scaffolding in‐depth reasoning about model decision making.
Practitioner notes What is already known about this topic
Scholarly attention has turned to examining Artificial Intelligence (AI) literacy in K‐12 to help students understand the working mechanism of AI technologies and critically evaluate automated decisions made by computer models.
While efforts have been made to engage students in understanding AI through building machine learning models with data, few of them go in‐depth into teaching and learning of feature engineering, a critical concept in modelling data.
There is a need for research to examine students' data modelling processes, particularly in the little‐researched realm of unstructured data.
What this paper adds
Results show that students developed nuanced understandings of models learning patterns in data for automated decision making.
Results demonstrate that students drew on prior experience and knowledge in creating features from unstructured data in the learning task of building text classification models.
Students needed support in performing feature engineering practices, reasoning about noisy features and exploring features in rich social contexts that the data set is situated in.
Implications for practice and/or policy
It is important for schools to provide hands‐on model building experiences for students to understand and evaluate automated decisions from AI technologies.
Students should be empowered to draw on their cultural and social backgrounds as they create models and evaluate data sources.
To extend this work, educators should consider opportunities to integrate AI learning in other disciplinary subjects (ie, outside of computer science classes).
-
It’s critical to foster artificial intelligence (AI) literacy for high school students, the first generation to grow up surrounded by AI, to understand working mechanism of data-driven AI technologies and critically evaluate automated decisions from predictive models. While efforts have been made to engage youth in understanding AI through developing machine learning models, few provided in-depth insights into the nuanced learning processes. In this study, we examined high school students’ data modeling practices and processes. Twenty-eight students developed machine learning models with text data for classifying negative and positive reviews of ice cream stores. We identified nine data modeling practices that describe students’ processes of model exploration, development, and testing and two themes about evaluating automated decisions from data technologies. The results provide implications for designing accessible data modeling experiences for students to understand data justice as well as the role and responsibility of data modelers in creating AI technologies.more » « less
-
null (Ed.)Fanfiction presents an opportunity as a data source for research in NLP, education, and social science. However, answering specific research questions with this data is difficult, since fanfiction contains more diverse writing styles than formal fiction. We present a text processing pipeline for fanfiction, with a fo- cus on identifying text associated with characters. The pipeline includes modules for character identification and coreference, as well as the attribution of quotes and narration to those characters. Additionally, the pipeline contains a novel approach to character coreference that uses knowledge from quote attribution to resolve pronouns within quotes. For each module, we evaluate the effectiveness of various approaches on 10 annotated fanfiction stories. This pipeline outperforms tools developed for formal fiction on the tasks of character coreference and quote attribution.more » « less
-
null (Ed.)In this paper, we present a co-design study with teachers to contribute towards the development of a technology-enhanced Artificial Intelligence (AI) curriculum, focusing on modeling unstructured data. We created an initial design of a learning activity prototype and explored ways to incorporate the design into high school classes. Specifically, teachers explored text classification models with the prototype and reflected on the exploration as a user, learner, and teacher. They provided insights about learning opportunities in the activity and feedback for integrating it into their teaching. Findings from qualitative analysis demonstrate that exploring text classification models provided an accessible and comprehensive approach for integrated learning of mathematics, language arts, and computing with the potential of supporting the understanding of core AI concepts including identifying structure within unstructured data and reasoning about the roles of human insight in developing AI technologies.more » « less
-
de Vries, E. ; Hod, Y. ; Ahn, J. (Ed.)In this paper, we present a co-design study with teachers to contribute towards development of a technology-enhanced Artificial Intelligence (AI) curriculum, focusing on modeling unstructured data. We created an initial design of a learning activity prototype and explored ways to incorporate the design into high school classes. Specifically, teachers explored text classification models with the prototype and reflected on the exploration as a user, learner, and teacher. They provided insights about learning opportunities in the activity and feedback for integrating it into their teaching. Findings from qualitative analysis demonstrate that exploring text classification models provided an accessible and comprehensive approach for integrated learning of mathematics, language arts, and computing with the potential of supporting the understanding of core AI concepts including identifying structure within unstructured data and reasoning about the roles of human insight in developing AI technologies.more » « less