Title: Towards validity for a formative assessment for language-specific program tracing skills
Formative assessments can have positive effects on learning, but few exist for computing, even for basic skills such as program tracing. Instead, teachers often rely on overly broad test questions that lack the diagnostic granularity needed to measure early learning. We followed Kane's framework for assessment validity to design a formative assessment of JavaScript program tracing, developing "an argument for effectiveness for a specific use." This included: 1) a fine-grained scoring model to guide practice, 2) item design to test parts of our fine-grained model with low confound-caused variance, 3) a covering test design that samples from a space of items and covers the scoring model, and 4) a feasibility argument for effectiveness for formative use (can target and improve learning). We contribute a distillation of Kane's framework situated for computing education, and a novel application of Kane's framework to formative assessment of program tracing, focusing on scoring, generalization, and use. Our application also contributes a novel way of modeling possible conceptions of a programming language's semantics by modeling prevalent compositions of control flow and data flow graphs and the paths through them, a process for generating test items, and principles for minimizing item confounds.
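As a loose illustration of the path-based modeling idea in the abstract, the sketch below enumerates bounded execution paths through a toy control-flow graph and treats each distinct path as a candidate tracing item. The function names and the example graph are hypothetical, not the paper's actual item-generation process.

    # Hypothetical sketch: enumerate bounded execution paths through a tiny
    # control-flow graph (CFG); each distinct path is a candidate tracing item.
    def enumerate_paths(cfg, entry, exit_node, max_len=8):
        """Depth-first enumeration of entry-to-exit paths, bounded to tame loops."""
        paths = []

        def dfs(node, path):
            if len(path) > max_len:
                return  # bound loop unrolling
            if node == exit_node:
                paths.append(path)
                return
            for succ in cfg[node]:
                dfs(succ, path + [succ])

        dfs(entry, [entry])
        return paths

    # Toy CFG for: if (x > 0) { y = 1; } else { y = 2; } return y;
    cfg = {
        "cond": ["then", "else"],
        "then": ["join"],
        "else": ["join"],
        "join": [],
    }

    for path in enumerate_paths(cfg, "cond", "join"):
        # A covering test design would sample items so all such paths are hit.
        print(" -> ".join(path))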
Award ID(s):
1735123
PAR ID:
10190890
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM Koli Calling International Conference on Computing Education
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Unlike summative assessment, which is aimed at grading students at the end of a unit or academic term, formative assessment is assessment for learning, aimed at monitoring ongoing student learning to provide feedback to both student and teacher, so that learning gaps can be addressed during the learning process. Education research points to formative assessment as a crucial vehicle for improving student learning. Formative assessment in K-12 CS and programming classrooms remains a crucial unaddressed need. Given that assessment for learning is closely tied to teacher pedagogical content knowledge, formative assessment literacy needs to also be a topic of CS teacher PD. This position paper addresses the broad need to understand formative assessment and build a framework to understand the what, why, and how of formative assessment of introductory programming in K-12 CS. It shares specific programming examples to articulate the cycle of formative assessment, diagnostic evaluation, feedback, and action, as in the sketch below. The design of formative assessment items is informed by CS research on assessment design, albeit related largely to summative assessment and CS1 contexts, and on the learning of programming, especially student misconceptions. It describes what teacher formative assessment literacy PD should entail and how to catalyze assessment-focused collaboration among K-12 CS teachers through assessment platforms and repositories.
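    A minimal sketch of what one diagnostic item in such a cycle might look like: wrong answers are tagged with the misconception they suggest, so feedback can target the learning gap. The item, tags, and feedback strings are invented for illustration, not drawn from the paper.

        # Hypothetical sketch: a formative item whose distractors are tagged
        # with the misconception they diagnose, enabling targeted feedback.
        ITEM = {
            "prompt": "What does this print?\n  x = 3\n  x = x + 1\n  print(x)",
            "answer": "4",
            "distractors": {
                "3": "believes assignment does not update x",
                "x + 1": "reads '=' as algebraic equality, not assignment",
            },
        }

        def feedback(response: str) -> str:
            if response == ITEM["answer"]:
                return "Correct."
            diagnosis = ITEM["distractors"].get(response)
            if diagnosis:
                return f"Likely misconception: {diagnosis}"
            return "Unrecognized response; review assignment semantics."

        print(feedback("3"))  # -> Likely misconception: believes assignment ...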
  2. Argumentation, a key scientific practice presented in the Framework for K-12 Science Education, requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open-response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research has emphasized that the features (i.e., complexity, diversity, and structure) of the assessment construct are critical to ML scoring accuracy, yet how the assessment construct may be associated with machine scoring accuracy remains unknown. This study investigated how the features associated with the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses to use as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen's kappa (mean = 0.60; range 0.38 to 0.89), indicating good to almost perfect performance. We found that higher levels of Complexity and Diversity of the assessment task were associated with decreased model performance; similarly, the relationship between levels of Structure and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher-complexity items and multidimensional assessments.
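    For reference, machine-human agreement of the kind reported above can be computed with scikit-learn's cohen_kappa_score. A minimal sketch with made-up labels, not the study's data:

        # Minimal sketch: Cohen's kappa between machine scores and human
        # consensus scores. Labels below are invented for illustration.
        from sklearn.metrics import cohen_kappa_score

        human = [0, 1, 2, 2, 1, 0, 2, 1, 1, 0]
        machine = [0, 1, 2, 1, 1, 0, 2, 2, 1, 0]

        kappa = cohen_kappa_score(human, machine)
        print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect, 0 = chance-level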
  3. Modeling student knowledge is important for assessment design, adaptive testing, curriculum design, and pedagogical intervention. The assessment design community has primarily focused on continuous latent-skill models with strong conditional independence assumptions among knowledge items, while the prerequisite discovery community has developed many models that aim to exploit the interdependence of discrete knowledge items. This paper attempts to bridge the gap by asking, "When does modeling assessment item interdependence improve predictive accuracy?" A novel adaptive testing evaluation framework is introduced that is amenable to techniques from both communities, and an efficient algorithm, Directed Item-Dependence And Confidence Thresholds (DIDACT), is introduced and compared with an Item-Response-Theory based model on several real and synthetic datasets. Experiments suggest that assessments with closely related questions benefit significantly from modeling item interdependence. 
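    As a point of reference for the Item-Response-Theory baseline mentioned above, a one-parameter (Rasch) model predicts a correct response from the gap between student ability and item difficulty, and it assumes conditional independence between items, which is exactly the assumption DIDACT relaxes. A minimal sketch with invented parameter values:

        # Minimal sketch of the Rasch (1PL) item response model:
        # P(correct) = sigmoid(ability - difficulty). Values are invented.
        import math

        def p_correct(ability: float, difficulty: float) -> float:
            return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

        # An average student (ability = 0.0) on items of rising difficulty.
        for difficulty in (-1.0, 0.0, 1.5):
            print(f"difficulty={difficulty:+.1f}  "
                  f"P(correct)={p_correct(0.0, difficulty):.2f}")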
  4.
    Research-based assessment instruments (RBAIs) are essential tools to measure aspects of student learning and improve pedagogical practice. RBAIs are designed to measure constructs related to a well-defined learning goal. However, relatively few RBAIs exist that are suitable for the specific learning goals of upper-division physics lab courses. One such learning goal is modeling, the process of constructing, testing, and refining models of physical and measurement systems. Here, we describe the creation of one component of an RBAI to measure proficiency with modeling. The RBAI is called the Modeling Assessment for Physics Laboratory Experiments (MAPLE). For use with large numbers of students, MAPLE must be scalable, which means it must not require impractical amounts of labor to analyze its data, as is often the case with large free-response assessments. We therefore use the coupled multiple response (CMR) format, whose data can be analyzed by a computer, to create items for measuring student reasoning in this component of MAPLE. We describe the process we used to create a set of CMR items for MAPLE, provide an example of this process for an item, and lay out an argument for construct validity of the resulting items based on our process.
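    A coupled multiple response item pairs a forced choice with a check-all-that-apply list of reasoning statements, so responses stay machine-scorable. A hypothetical sketch of such a structure and one plausible scoring rule, not MAPLE's actual item bank or rubric:

        # Hypothetical sketch of a coupled multiple response (CMR) item: a
        # closed choice coupled with selectable reasoning statements.
        from dataclasses import dataclass, field

        @dataclass
        class CMRItem:
            stem: str
            choices: list[str]            # mutually exclusive claim options
            correct_choice: int
            reasonings: list[str]         # check-all-that-apply justifications
            correct_reasonings: set[int] = field(default_factory=set)

            def score(self, choice: int, selected: set[int]) -> float:
                # One point for the claim, partial credit for reasoning overlap.
                claim = 1.0 if choice == self.correct_choice else 0.0
                overlap = len(selected & self.correct_reasonings)
                union = len(selected | self.correct_reasonings) or 1
                return claim + overlap / union

        item = CMRItem(
            stem="Do the model and the data agree within uncertainty?",
            choices=["Yes", "No"],
            correct_choice=1,
            reasonings=[
                "The residuals show a systematic trend.",
                "The fit line passes near most error bars.",
                "Two points lie many standard deviations from the fit.",
            ],
            correct_reasonings={0, 2},
        )
        print(item.score(choice=1, selected={0, 2}))  # -> 2.0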
  5. Knowledge tracing (KT), or modeling student knowledge state given their past activity sequence, is one of the essential tasks in online education systems. Research has demonstrated that students benefit from both assessed (e.g., solving problems, which can be graded) and non-assessed learning activities (e.g., watching video lectures, which cannot be graded), and thus, modeling student knowledge from multiple types of activities with knowledge transfer between them is crucial. However, current approaches to multi-activity knowledge tracing cannot capture coarse-grained between-type associations and are primarily evaluated by predicting student performance on upcoming assessed activities (labeled data). Therefore, they are inadequate in incorporating signals from non-assessed activities (unlabeled data). We propose Graph-enhanced Multi-activity Knowledge Tracing (GMKT) that addresses these challenges by jointly learning a fine-grained recurrent memory-augmented student knowledge model and a coarse-grained graph neural network. In GMKT, we formulate multi-activity knowledge tracing as a semi-supervised sequence learning problem and optimize for accurate student performance and activity type at each time step. We demonstrate the effectiveness of our proposed model by experimenting on three real-world datasets. 
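    For context, knowledge tracing in its classic form, Bayesian Knowledge Tracing (BKT), a far simpler predecessor of models like GMKT, updates a single mastery probability after each graded response. A minimal sketch with invented parameters, shown only as a baseline for the task GMKT addresses:

        # Minimal sketch of classic Bayesian Knowledge Tracing (BKT), a simple
        # baseline for the knowledge-tracing task. Parameter values invented.
        def bkt_update(p_mastery, correct, p_guess=0.2, p_slip=0.1, p_learn=0.15):
            # Posterior probability of mastery given the observed response.
            if correct:
                num = p_mastery * (1 - p_slip)
                den = num + (1 - p_mastery) * p_guess
            else:
                num = p_mastery * p_slip
                den = num + (1 - p_mastery) * (1 - p_guess)
            posterior = num / den
            # Account for learning between practice opportunities.
            return posterior + (1 - posterior) * p_learn

        p = 0.3  # prior probability of mastery
        for obs in [True, True, False, True]:
            p = bkt_update(p, obs)
            print(f"after {'correct' if obs else 'wrong'}: P(mastery) = {p:.2f}")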