Title: How learners produce data from text in classifying clickbait
Abstract: Text provides a compelling example of unstructured data that can be used to motivate and explore classification problems. Challenges arise in representing features of text and in how students link text, represented as character strings, to features that connect with underlying phenomena. To observe how students reason with text data in scenarios designed to elicit certain aspects of the domain, we employed a task-based interview method using a structured protocol with six pairs of undergraduate students. Our goal was to shed light on students' understanding of text as data using a motivating task: classifying headlines as “clickbait” or “news.” Three types of features (function, content, and form) surfaced, the majority from the first scenario. Our analysis of the interviews indicates that this sequence of activities engaged the participants in thinking at both the human-perception level and the computer-extraction level and in conceptualizing connections between them.
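As a rough illustration of the feature types named above, the sketch below combines word-level “content” features with surface “form” features (length, punctuation, capitalization) in one classifier; “function” features (what a headline is trying to do to the reader) sit at the human-perception level and are not directly extractable this way. The example headlines, the specific feature choices, and the use of scikit-learn are assumptions for demonstration, not the study's materials or method.

```python
# Illustrative headline classifier: bag-of-words "content" features
# plus hand-crafted "form" features. Data and features are invented
# for demonstration purposes only.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

def form_features(headlines):
    # Surface cues a computer can read directly off the character string:
    # headline length, question marks, exclamation points, all-caps words.
    return np.array([
        [len(h), h.count("?"), h.count("!"),
         sum(w.isupper() for w in h.split())]
        for h in headlines
    ])

model = Pipeline([
    ("features", FeatureUnion([
        ("content", CountVectorizer(lowercase=True)),  # word-level features
        ("form", FunctionTransformer(form_features)),  # string-level features
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

headlines = [
    "You Won't Believe What Happened Next!",
    "Senate passes annual budget resolution",
    "10 Tricks Doctors Don't Want You To Know",
    "Central bank holds interest rates steady",
]
labels = ["clickbait", "news", "clickbait", "news"]

model.fit(headlines, labels)
print(model.predict(["This One Weird Trick Will Change Everything"]))
```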
Award ID(s):
1949110
PAR ID:
10419861
Author(s) / Creator(s):
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Teaching Statistics
Volume:
45
Issue:
S1
ISSN:
0141-982X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: To date, many AI initiatives (eg, AI4K12, CS for All) have developed standards and frameworks as guidance for educators to create accessible and engaging Artificial Intelligence (AI) learning experiences for K-12 students. These efforts revealed a significant need to prepare youth to gain a fundamental understanding of how intelligence is created and applied, and of its potential to perpetuate bias and unfairness. This study contributes to the growing interest in K-12 AI education by examining student learning of modelling real-world text data. Four students from an Advanced Placement computer science classroom at a public high school participated in this study. Our qualitative analysis reveals that the students developed nuanced and in-depth understandings of how text classification models—a type of AI application—are trained. Specifically, we found that in modelling texts, students: (1) drew on their social experiences and cultural knowledge to create predictive features, (2) engineered predictive features to address model errors, (3) described model learning patterns from training data and (4) reasoned about noisy features when comparing models. This study contributes to an initial understanding of student learning of modelling unstructured data and offers implications for scaffolding in-depth reasoning about model decision making.

Practitioner notes

What is already known about this topic
- Scholarly attention has turned to examining Artificial Intelligence (AI) literacy in K-12 to help students understand the working mechanism of AI technologies and critically evaluate automated decisions made by computer models.
- While efforts have been made to engage students in understanding AI through building machine learning models with data, few go in-depth into the teaching and learning of feature engineering, a critical concept in modelling data.
- There is a need for research to examine students' data modelling processes, particularly in the little-researched realm of unstructured data.

What this paper adds
- Results show that students developed nuanced understandings of how models learn patterns in data for automated decision making.
- Results demonstrate that students drew on prior experience and knowledge in creating features from unstructured data in the learning task of building text classification models.
- Students needed support in performing feature engineering practices, reasoning about noisy features and exploring features in the rich social contexts in which the data set is situated.

Implications for practice and/or policy
- It is important for schools to provide hands-on model building experiences for students to understand and evaluate automated decisions from AI technologies.
- Students should be empowered to draw on their cultural and social backgrounds as they create models and evaluate data sources.
- To extend this work, educators should consider opportunities to integrate AI learning in other disciplinary subjects (ie, outside of computer science classes).
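To make the error-driven feature-engineering cycle described above concrete, here is a minimal sketch: fit a model, inspect its errors, then add a hand-crafted feature intended to repair them and refit. The feature names, example texts, and scikit-learn toolchain are illustrative assumptions, not the classroom tools the study used.

```python
# Minimal sketch of the feature-engineering cycle: train, inspect
# errors, add a feature meant to fix them, retrain, compare.
# All data and feature names here are invented for illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def featurize(text, with_extra_feature=False):
    feats = {
        "num_words": len(text.split()),
        "has_you": "you" in text.lower(),  # second-person address
    }
    if with_extra_feature:
        # Feature added after error analysis: listicle-style leading digit
        feats["starts_with_digit"] = text[0].isdigit()
    return feats

texts = [
    "10 foods you should never eat",
    "City council approves new transit plan",
    "7 secrets of highly productive people",
    "Storm expected to reach coast by Friday",
]
labels = [1, 0, 1, 0]  # 1 = clickbait-like, 0 = news-like

for with_extra in (False, True):
    vec = DictVectorizer()
    X = vec.fit_transform([featurize(t, with_extra) for t in texts])
    clf = LogisticRegression().fit(X, labels)
    print(f"extra feature={with_extra}: "
          f"training accuracy={clf.score(X, labels):.2f}")
```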
  2. Abstract: Using a mixed methods approach, we explore a relationship between students' graph reasoning and graph selection via a fully online assessment. Our population includes 673 students enrolled in college algebra, an introductory undergraduate mathematics course, across four U.S. postsecondary institutions. The assessment is accessible on computers, tablets, and mobile phones. There are six items; for each, students are to view a video animation of a dynamic situation (e.g., a toy car moving along a square track), declare their understanding of the situation, select a Cartesian graph to represent a relationship between given attributes in the situation, and enter text to explain their graph choice. To theorize students' graph reasoning, we draw on Thompson's theory of quantitative reasoning, which explains students' conceptions of attributes as being possible to measure. To code students' written responses, we appeal to Johnson and colleagues' graph reasoning framework, which distinguishes students' quantitative reasoning about one or more attributes capable of varying (Covariation, Variation) from students' reasoning about observable elements in a situation (Motion, Iconic). Quantitizing those qualitative codes, we examine connections between the latent variables of students' graph reasoning and graph selection. Using structural equation modeling, we report a significant finding: students' graph reasoning explains 40% of the variance in their graph selection (standardized regression weight 0.64, p < 0.001). Furthermore, our results demonstrate that students' quantitative forms of graph reasoning (i.e., variational and covariational reasoning) influence the accuracy of their graph selection.
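One way to read the reported statistics together: if graph reasoning is the model's single standardized predictor of graph selection, the variance explained is the squared path coefficient, which lines up with the reported 40%. This reading assumes a single structural path, which the description above suggests but does not state outright.

```latex
% Variance explained by a single standardized regression path:
R^2 = \beta^2 = 0.64^2 \approx 0.41 \approx 40\%
```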
  3. Abstract: Background: It is well known that earning a bachelor's degree in engineering is a demanding task, but ripe with opportunity. For students from historically excluded demographic groups, this task is exacerbated by oppressive circumstances. Although considerable research has documented how student outcomes differ across demographic groups, much less is known about the dynamic processes that marginalize some students. Purpose: The purpose of this article is to propose a conceptual model of student navigation in the context of undergraduate engineering programs. Our goal is to illustrate how localized, structural features unjustly shape the demands and opportunities encountered by students and influence how they respond. Scope/Method: We developed our model using an iterative, four-stage process. This process included (1) clarifying the purpose of the development process; (2) identifying concepts and insights from prior research; (3) synthesizing the concepts and insights into propositions; and (4) visualizing the suspected relationships between the salient constructs in the propositions. Results: Our model focuses on the dynamic interactions between the characteristics of students, the embedded contexts in which they are situated, and the support infrastructure of their learning environment. Conclusion: The resulting model illustrates the influence of structural features on how students (a) respond to demands and opportunities and (b) navigate obstacles present in the learning environment. Although its focus is on marginalized students in undergraduate engineering programs, the model may be applicable to STEM higher education more broadly.
  4. Abstract: Ideological divisions in the United States have become increasingly prominent in daily communication. Accordingly, there has been much research on political polarization, including many recent efforts that take a computational perspective. By detecting political biases in a text document, one can attempt to discern and describe its polarity. Intuitively, the named entities (i.e., the nouns and the phrases that act as nouns) and hashtags in text often carry information about political views. For example, people who use the term “pro-choice” are likely to be liberal and people who use the term “pro-life” are likely to be conservative. In this paper, we seek to reveal political polarities in social-media text data and to quantify these polarities by explicitly assigning a polarity score to entities and hashtags. Although this idea is straightforward, it is difficult to perform such inference in a trustworthy quantitative way. Key challenges include the small number of known labels, the continuous spectrum of political views, and the preservation of both a polarity score and a polarity-neutral semantic meaning in an embedding vector of words. To attempt to overcome these challenges, we propose the Polarity-aware Embedding Multi-task learning (PEM) model. This model consists of (1) a self-supervised context-preservation task, (2) an attention-based tweet-level polarity-inference task, and (3) an adversarial learning task that promotes independence between an embedding's polarity component and its semantic component. Our experimental results demonstrate that our PEM model can successfully learn polarity-aware embeddings that perform well at tweet-level and account-level classification tasks. We examine a variety of applications—including a study of spatial and temporal distributions of polarities and a comparison between tweets from Twitter and posts from Parler—and we thereby demonstrate the effectiveness of our PEM model. We also discuss important limitations of our work and encourage caution when applying the PEM model to real-world scenarios.
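The abstract names the PEM model's three training tasks but not how they are combined. The sketch below shows one conventional way to compose such a multi-task objective, using a gradient-reversal layer for the adversarial term; the module names, equal loss weights, and the gradient-reversal choice are assumptions, not necessarily the paper's published architecture.

```python
# Hedged sketch of a three-part multi-task objective in the spirit of
# the PEM description: context preservation + polarity inference +
# an adversarial term. Composition and weighting are assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity on the forward pass; flips gradients on the backward
    # pass, so the encoder learns to strip polarity information out
    # of the semantic component (the adversarial task).
    @staticmethod
    def forward(ctx, x):
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def pem_style_loss(context_logits, context_targets,
                   polarity_logits, polarity_targets,
                   semantic_component, adversary):
    ce = nn.CrossEntropyLoss()
    # (1) self-supervised context-preservation task
    l_context = ce(context_logits, context_targets)
    # (2) tweet-level polarity-inference task
    l_polarity = ce(polarity_logits, polarity_targets)
    # (3) adversarial task: a discriminator tries to recover polarity
    # from the semantic component; reversed gradients push the two
    # components toward independence
    adv_logits = adversary(GradReverse.apply(semantic_component))
    l_adversarial = ce(adv_logits, polarity_targets)
    return l_context + l_polarity + l_adversarial  # equal weights assumed
```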
  5. Abstract: We know that reading involves coordination between textual characteristics and visual attention, but research linking eye movements during reading and comprehension assessed after reading is surprisingly limited, especially for reading long connected texts. We tested two competing possibilities: (a) the weak association hypothesis: links between eye movements and comprehension are weak and short-lived, versus (b) the strong association hypothesis: the two are robustly linked, even after a delay. Using a predictive modeling approach, we trained regression models to predict comprehension scores from global eye movement features, using participant-level cross-validation to ensure that the models generalize across participants. We used data from three studies in which readers (Ns = 104, 130, 147) answered multiple-choice comprehension questions ~30 min after reading a 6,500-word text, or after reading up to eight 1,000-word texts. The models generated accurate predictions of participants' text comprehension scores (correlations between observed and predicted comprehension: 0.384, 0.362, 0.372; ps < .001), in line with the strong association hypothesis. We found that making more, but shorter, fixations consistently predicted comprehension across all studies. Furthermore, models trained on one study's data could successfully predict comprehension on the others, suggesting generalizability across studies. Collectively, these findings suggest that there is a robust link between eye movements and subsequent comprehension of a long connected text, thereby connecting theories of low-level eye movements with those of higher-order text processing during reading.
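Participant-level cross-validation, as described above, means held-out folds never share a reader with the training folds. Below is a minimal sketch on synthetic data; the feature set, data shapes, and choice of ridge regression are assumptions, not the studies' actual pipeline.

```python
# Sketch: predict comprehension from per-text eye-movement features,
# holding out whole participants via GroupKFold. Data are synthetic.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import GroupKFold, cross_val_predict

rng = np.random.default_rng(0)
n_participants, texts_per_reader, n_features = 100, 8, 6
n_rows = n_participants * texts_per_reader

X = rng.normal(size=(n_rows, n_features))  # e.g., fixation count, duration
y = rng.normal(size=n_rows)                # comprehension scores (synthetic)
groups = np.repeat(np.arange(n_participants), texts_per_reader)

# Each fold holds out complete participants, never individual texts
predicted = cross_val_predict(Ridge(), X, y, groups=groups,
                              cv=GroupKFold(n_splits=5))
r, p = pearsonr(y, predicted)
print(f"observed vs. predicted comprehension: r = {r:.3f} (p = {p:.3f})")
```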