

Title: Towards validity for a formative assessment for language-specific program tracing skills
Formative assessments can have positive effects on learning, but few exist for computing, even for basic skills such as program tracing. Instead, teachers often rely on overly broad test questions that lack the diagnostic granularity needed to measure early learning. We followed Kane's framework for assessment validity to design a formative assessment of JavaScript program tracing, developing "an argument for effectiveness for a specific use." This included: 1) a fine-grained scoring model to guide practice, 2) item design to test parts of our fine-grained model with low confound-caused variance, 3) a covering test design that samples from a space of items and covers the scoring model, and 4) a feasibility argument for effectiveness for formative use (can target and improve learning). We contribute a distillation of Kane's framework situated for computing education, and a novel application of Kane's framework to formative assessment of program tracing, focusing on scoring, generalization, and use. Our application also contributes a novel way of modeling possible conceptions of a programming language's semantics by modeling prevalent compositions of control flow and data flow graphs and the paths through them, a process for generating test items, and principles for minimizing item confounds.
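To make the abstract's approach concrete, here is a minimal, hypothetical sketch (the names and data structure are illustrative, not taken from the paper) of modeling one control-flow construct as a graph and enumerating its paths, each of which could then seed a tracing item:

```javascript
// Hypothetical sketch: model `if (cond) { ... } else { ... }` as a small
// control-flow graph, then enumerate every acyclic entry-to-exit path.
// A covering test design would sample items so each path is exercised.
const cfg = {
  entry: ["cond"],
  cond: ["then", "else"], // a branch has two possible successors
  then: ["exit"],
  else: ["exit"],
  exit: [],
};

// Depth-first enumeration of all paths from `node` to any sink node.
function enumeratePaths(graph, node = "entry", path = []) {
  const sofar = [...path, node];
  if (graph[node].length === 0) return [sofar];
  return graph[node].flatMap((succ) => enumeratePaths(graph, succ, sofar));
}

const paths = enumeratePaths(cfg);
console.log(paths);
// → [["entry","cond","then","exit"], ["entry","cond","else","exit"]]
```

Each enumerated path would then be instantiated as a concrete JavaScript snippet whose execution a learner traces, so that a wrong answer localizes to a specific construct rather than to a broad, undifferentiated skill.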
Award ID(s): 1735123
NSF-PAR ID: 10190890
Journal Name: ACM Koli Calling International Conference on Computing Education
Page Range / eLocation ID: 1 to 10
Sponsoring Org: National Science Foundation
More Like this
  1.
    This research paper describes the development of an assessment instrument for use with middle school students that provides insight into students' interpretive understanding by looking at early indicators of developing expertise in students' responses to solution generation, reflection, and concept demonstration tasks. We begin by detailing a synthetic assessment model that served as the theoretical basis for assessing specific thinking skills. We then describe our process of developing test items by working with a Teacher Design Team (TDT) of instructors in our partner school system to set guidelines that would better orient the assessment in that context, working within the framework of standards and disciplinary core ideas enumerated in the Next Generation Science Standards (NGSS). We next specify our process of refining the assessment from 17 items across three separate item pools to a final total of three open-response items. We then provide evidence for the validity and reliability of the assessment instrument from the standards of (1) content, (2) meaningfulness, (3) generalizability, and (4) instructional sensitivity. As part of the discussion of the standards of generalizability and instructional sensitivity, we detail a study carried out in our partner school system in the fall of 2019. The instrument was administered to students in treatment (n = 201) and non-treatment (n = 246) groups; the former participated in a two-to-three-week, NGSS-aligned experimental instructional unit introducing the principles of engineering design, which engaged students using the Imaginative Education teaching approach, while the latter group was taught using the district's existing engineering design curriculum. Statistical analysis of student responses showed that the interrater reliability of the scoring procedures was good to excellent, with intraclass correlation coefficients ranging between .72 and .95.
To gauge the instructional sensitivity of the assessment instrument, a series of non-parametric comparative analyses (independent two-group Mann-Whitney tests) was carried out. These found statistically significant differences between treatment and non-treatment student responses for the outcomes of fluency and elaboration, but not reflection.
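The non-parametric analysis referenced above can be illustrated with a small sketch. This is not the study's code, and the scores below are invented; the function name is ours:

```javascript
// Illustrative Mann-Whitney U computation for two independent groups:
// pool the observations, rank them (averaging ranks for ties), and
// derive U from the rank sum of one group.
function mannWhitneyU(groupA, groupB) {
  const pooled = [
    ...groupA.map((v) => [v, "A"]),
    ...groupB.map((v) => [v, "B"]),
  ].sort((x, y) => x[0] - y[0]);

  // Assign ranks, giving tied values the average of their rank range.
  const ranks = new Array(pooled.length);
  for (let i = 0; i < pooled.length; ) {
    let j = i;
    while (j < pooled.length && pooled[j][0] === pooled[i][0]) j++;
    const avg = (i + 1 + j) / 2; // average of 1-based ranks i+1 .. j
    for (let k = i; k < j; k++) ranks[k] = avg;
    i = j;
  }

  // Rank sum of group A, then U for A; report the smaller of the two U's.
  let rankSumA = 0;
  pooled.forEach(([, group], idx) => {
    if (group === "A") rankSumA += ranks[idx];
  });
  const nA = groupA.length;
  const nB = groupB.length;
  const uA = rankSumA - (nA * (nA + 1)) / 2;
  return Math.min(uA, nA * nB - uA);
}

// Invented example scores: complete separation of groups yields U = 0.
console.log(mannWhitneyU([1, 2, 3], [4, 5, 6])); // → 0
```

A small U relative to nA*nB suggests the two groups' score distributions differ; significance is then judged against the U distribution or a normal approximation.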
  3. Our research, Landscapes of Deep Time in the Red Earth of France (an NSF International Research Experience for Students project), aims to mentor U.S. undergraduate science students from underserved populations (e.g., students of Native American heritage and/or first-generation college students) in geological research. During the first field season (June 2018), the formative and summative assessments outlined below will be issued to assist in our evaluation of student learning. The material advancement of a student's sedimentological skillsets and the development of self-efficacy in research applications are direct measures of our program's success. (1) Immediately before and after the program, students will self-rank their competency in specific skillsets (e.g., data collection, lithologic description, use of field equipment) in an anonymous summative assessment. (2) Formative assessments throughout the field season (e.g., describing a stratigraphic section independently, oral and written communication of results) will assess improved comprehension of the scientific process. (3) An anonymous attitudinal survey will be issued at the conclusion of the field season to shed light on the program's overall quality, its influence on students' desire to pursue a higher-level degree or career in STEM, and its effectiveness in helping participants develop confidence and self-efficacy in research design and application. We discuss herein the results of first-year assessments, with a focus on strategies for improvement. We expect each individual's outcomes to differ depending on his/her own characteristics and background. Furthermore, some of the most valued intentions of this experience are inherently difficult to measure (e.g., improved understanding of the scientific process, a stimulated passion to pursue a STEM career). We hope to address shortcomings in design, e.g.: Where did we lose visibility on certain aspects of the learning experience? How can we revise the format and content of our assessments to better evaluate student participants and improve our program in subsequent years?
  5.
    Integrated approaches to teaching science, technology, engineering, and mathematics (commonly referred to as STEM education) in K-12 classrooms have resulted in a growing number of teachers incorporating engineering in their science classrooms. Such changes are a result of shifts in science standards to include engineering, as evidenced by the Next Generation Science Standards. To date, 20 states and the District of Columbia have adopted the NGSS, and another 24 have adopted standards based on the Framework for K-12 Science Education. Despite the increased presence of engineering and integrated STEM education in K-12 education, there are several concerns to consider. One concern is the limited availability of observation instruments appropriate for instruction where multiple STEM disciplines are present and integrated with one another. Addressing this concern requires the development of a new observation instrument designed with integrated STEM instruction in mind. Such an instrument has implications for both research and practice. For example, research using this instrument could help educators compare integrated STEM instruction across grade bands. Additionally, this tool could be useful in the preparation of pre-service teachers, in the professional development of in-service teachers new to integrated STEM education, and in formative learning through professional learning communities or classroom coaching. The work presented here describes in detail the development of an integrated STEM observation instrument, the STEM Observation Protocol (STEM-OP), that can be used for both research and practice. Over a period of approximately 18 months, a team of STEM educators and educational researchers developed a 10-item integrated STEM observation instrument for use in K-12 science and engineering classrooms.
The process of developing the STEM-OP began with establishing a conceptual framework, drawing on the integrated STEM research literature, national standards documents, and frameworks for both K-12 engineering education and integrated STEM education. As part of the instrument development process, the project team had access to over 2,000 classroom videos in which integrated STEM education took place. Initial analysis of a selection of these videos helped the project team write a preliminary draft instrument consisting of 79 items. Through several rounds of revision, which included constructing detailed scoring levels, collapsing items that significantly overlapped, and piloting the instrument for usability, items were added, edited, and/or removed for various reasons. These reasons included the intricacy of the observed phenomenon or an item not being specific to integrated STEM education (e.g., questioning). In its final form, the STEM-OP consists of 10 items, each comprising four descriptive levels. Each item is also accompanied by a set of user guidelines, which have been refined by the project team as a result of piloting the instrument and reviewed by external experts in the field. The instrument has been shown to be reliable within the project team, and further validation is underway. The STEM-OP will be of use to a wide variety of educators and educational researchers looking to understand the implementation of integrated STEM education in K-12 science and engineering classrooms.