Title: Work in Progress: Evaluating the Effect of Symbolic Problem Solving on Testing Validity and Reliability
Problem solving is a typical form of assessment in engineering dynamics tests. To solve a problem, students must set up equations and find a numerical answer. Depending on its difficulty and complexity, a quantitative problem can take anywhere from ten to thirty minutes to solve. Because of the time constraints of in-class testing, a typical test may contain only a limited number of problems, covering an insufficient range of problem types. This can reduce validity and reliability, two crucial qualities of an assessment. A test with high validity should cover appropriate content and should distinguish high-performing students from low-performing students and every student in between. A reliable test should have a sufficient number of items to provide consistent information about students' mastery of the material. In this work-in-progress study, we will investigate to what extent a newly developed assessment is valid and reliable. Symbolic problem solving in this study refers to solving problems by setting up a system of equations without finding numeric solutions. Such problems usually take much less time, so a single test can include more problems across a wider variety of types; this efficiency is what allows a diverse range of problems to fit in one test whose validity and reliability we then evaluate. We will follow the Standards for Educational and Psychological Testing, referred to as the Standards, which were developed jointly by three professional organizations: the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME). We will use the Standards to evaluate the content validity and internal consistency of a collection of symbolic problems. Examples on rectilinear kinematics and angular motion will be provided to illustrate how symbolic problem solving is used in both homework and assessments. Numerous studies in the literature have shown that symbolic questions impose greater challenges because of students' algebraic difficulties, so we will also share strategies for preparing students to approach such problems.
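As a hypothetical illustration of the symbolic approach described above (not an example from the paper; the variable names and the SymPy tooling are assumptions), a constant-acceleration rectilinear kinematics problem can be set up and answered entirely in symbols:

```python
# Minimal sketch: symbolic setup of a rectilinear kinematics problem.
# All symbols are illustrative; no numeric values are ever substituted.
import sympy as sp

v0, a, t, v, s = sp.symbols('v0 a t v s', real=True)

# Governing equations: v = v0 + a*t and s = v0*t + (1/2)*a*t**2
eq_velocity = sp.Eq(v, v0 + a * t)
eq_position = sp.Eq(s, v0 * t + sp.Rational(1, 2) * a * t**2)

# The "answer" is an expression, not a number: the time to reach speed v,
# and the corresponding displacement.
t_expr = sp.solve(eq_velocity, t)[0]                    # (v - v0)/a
s_expr = sp.simplify(eq_position.rhs.subs(t, t_expr))   # equivalent to (v**2 - v0**2)/(2*a)

print(t_expr, s_expr)
```

Grading such a submission amounts to checking that the system of equations is set up correctly and that the symbolic manipulation is sound, which is consistent with the abstract's point that these problems take much less time and therefore allow a wider variety of problem types on one test.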
Award ID(s):
1927284
PAR ID:
10483245
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ASEE
Date Published:
Journal Name:
ASEE annual conference exposition proceedings
ISSN:
2153-5868
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
1. [This paper is part of the Focused Collection in Artificial Intelligence Tools in Physics Teaching and Physics Education Research.] One of the greatest weaknesses of physics education research is the paucity of research on graduate education. While there are a growing number of investigations of graduate student degree progress and admissions, there are very few investigations of learning at the graduate level. Additionally, existing studies of learning in physics graduate programs frequently focus on content knowledge rather than professional skills such as problem solving. Given that over 90% of physics Ph.D. graduates report solving technical problems regularly in the workplace, we sought to develop an assessment to measure how well graduate programs are training students to solve problems. Using a framework that characterizes expert-like problem-solving skills as a set of decisions to be made, we developed and validated such an assessment in graduate quantum mechanics (QM), following recently developed design frameworks for measuring problem solving and best practices for assessment validation. We collected validity evidence through think-aloud interviews with practicing physicists and physics graduate students, as well as written solutions provided by physics graduate and undergraduate students. The assessment shows strong potential in differentiating novice and expert problem solving in QM and showed reliability in repeated testing with similar populations. These results show the promise of measuring expert decision making in graduate QM and provide baseline measurements for future educational interventions to more effectively teach these skills. Published by the American Physical Society, 2025.
2. The Standards for Educational and Psychological Testing were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify that assessment developers establish five types of validity evidence: test content, response processes, internal structure, relationship to other variables, and consequential/bias. Relevant to this proposal is consequential validity evidence that identifies the potential negative impact of testing or bias. Standard 3.1 of the Standards (2014) on fairness in testing states that “those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations” (p. 63). Three types of bias are construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnic group, socioeconomic status, native language, and disability. DIF occurs when “equally able test takers differ in their probabilities answering a test item correctly as a function of group membership” (AERA et al., 2005, p. 51). DIF indicates systematic error, as opposed to real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are removed, or reviewed for sources of bias to determine modifications that allow an item to be retained and tested further. The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018). Experts independently evaluate each item for potential sources leading to DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among experts, as established through some criterion (e.g., median agreement rating, item quartile range, or percent agreement). The technique allows researchers to “identify, learn, and share the ideas of experts by searching for agreement among experts” (Yildirim & Büyüköztürk, 2018, p. 451). Research has illustrated this technique applied after DIF is detected, but not before administering items in the field. The current research is a methodological illustration of the Delphi technique applied in the item-construction phase of assessment development, as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S.A. grades 6-8 in a computer-adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item-writing process. Results from two three-person panels, each reviewing a set of 45 PSM items, are used to illustrate the technique. Advantages and limitations identified through a survey by participating experts and researchers are outlined to advance the method.
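As a purely illustrative aside (none of the data, variable names, or modeling choices below come from the proposal above), uniform DIF on a single dichotomous item is often screened with a logistic regression that conditions on a matching score and tests the group term:

```python
# Hypothetical sketch of a logistic-regression DIF screen for one item.
# `responses` are 0/1 item scores, `total` is a matching criterion (e.g., a
# rest-score), `group` codes focal (1) vs. reference (0); all names invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
group = rng.integers(0, 2, n)                  # two illustrative subgroups
ability = rng.normal(0, 1, n)
total = ability + rng.normal(0, 0.3, n)        # noisy proxy for the matching score
# Simulate an item with mild uniform DIF against the focal group
p = 1 / (1 + np.exp(-(ability - 0.4 * group)))
responses = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([total, group]))
fit = sm.Logit(responses, X).fit(disp=False)

# After conditioning on the matching score, a significant coefficient on
# `group` flags the item for review as potentially biased.
print(fit.params[2], fit.pvalues[2])
```

A flagged item would then undergo exactly the kind of expert review that the Delphi procedure described above formalizes, only here the review happens before items are fielded.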
3. Problem solving is central to mathematics learning (NCTM, 2014). Assessments are needed that appropriately measure students' problem-solving performance. More importantly, assessments must be grounded in robust validity evidence that justifies their interpretations and outcomes (AERA et al., 2014). Thus, measures that are grounded in validity evidence are warranted for use by practitioners and scholars. The purpose of this presentation is to convey validity evidence for a new measure titled the Problem-Solving Measure for Grade Four (PSM4). The research question is: What validity evidence supports PSM4 administration? The PSM4 is one assessment within the previously published PSM series designed for elementary and middle grades students. Problems are grounded in Schoenfeld's (2011) framework and rely upon Verschaffel et al.'s (1999) perspective that word problems should be open, complex, and realistic. The mathematics in the problems is tied to USA grade-level content and practice standards (CCSSI, 2010).
4. This evidence-based practices paper discusses the method employed in validating the use of a project-modified version of the PROCESS tool (Grigg, Van Dyken, Benson, & Morkos, 2013) for measuring student problem-solving skills. The PROCESS tool allows raters to score students' ability in the domains of Problem definition, Representing the problem, Organizing information, Calculations, Evaluating the solution, Solution communication, and Self-assessment. Specifically, this research compares student performance on solving traditional textbook problems with novel, student-generated learning activities (i.e., reverse-engineering videos in order to then create their own homework problem and solution). The use of student-generated learning activities to assess student problem-solving skills has theoretical underpinning in Felder's (1987) work on "creating creative engineers," as well as the need to develop students' abilities to transfer learning and solve problems in a variety of real-world settings. In this study, four raters used the PROCESS tool to score the performance of 70 students randomly selected from two undergraduate chemical engineering cohorts at two Midwest universities. Students from both cohorts solved 12 traditional textbook-style problems, and students from the second cohort solved an additional nine student-generated video problems. Any large-scale assessment in which multiple raters use a rating tool requires the investigation of several aspects of validity. The many-facet Rasch measurement model (MFRM; Linacre, 1989) has the psychometric properties to determine whether any characteristics other than "student problem-solving skills" influence the scores assigned, such as rater bias, problem difficulty, or student demographics. Before implementing the full rating plan, MFRM was used to examine how raters interacted with the six items on the modified PROCESS tool to score a random selection of 20 students' performance in solving one problem. An external evaluator led "inter-rater reliability" meetings where raters deliberated the rationale for their ratings, and differences were resolved by recourse to Pretz et al.'s (2003) problem-solving cycle, which informed the development of the PROCESS tool. To test the new understandings of the PROCESS tool, raters were assigned to score one new problem from a different randomly selected group of six students. Those results were then analyzed in the same manner as before. This iterative process resulted in substantial increases in reliability, which can be attributed to increased confidence that raters were operating with common definitions of the items on the PROCESS tool and rating with consistent and comparable severity. This presentation will include examples of the student-generated problems and a discussion of common discrepancies and solutions in the raters' initial use of the PROCESS tool. Findings, as well as the adapted PROCESS tool used in this study, can be useful to engineering educators and engineering education researchers.
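For readers unfamiliar with the model referenced in the entry above, the many-facet Rasch measurement model is commonly written, in its generic form (Linacre, 1989) rather than the specific parameterization used in that study, as a log-odds decomposition across the facets of student, item, rater, and rating category:

```latex
% Generic many-facet Rasch measurement model; notation is illustrative
% and not taken from the study summarized above.
\[
  \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]
% B_n : problem-solving ability of student n
% D_i : difficulty of rated item i
% C_j : severity of rater j
% F_k : threshold between rating categories k-1 and k
```

Because rater severity ($C_j$) and item difficulty ($D_i$) enter the model as separate facets from student ability ($B_n$), unexpected rater behavior can be isolated from genuine differences in performance, which is the property the entry above relies on to examine rater bias and problem difficulty.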
5. Problem solving is a central focus of mathematics teaching and learning. If teachers are expected to support students' problem-solving development, it stands to reason that teachers should also be able to solve problems aligned to grade-level content standards. The purpose of this validation study is twofold: (1) to present evidence supporting the use of the Problem Solving Measures Grades 3–5 with preservice teachers (PSTs), and (2) to examine PSTs' abilities to solve problems aligned to grades 3–5 academic content standards. This study used Rasch measurement techniques to support psychometric analysis of the Problem Solving Measures when used with PSTs. Results indicate that the Problem Solving Measures are appropriate for use with PSTs and that performance on the Problem Solving Measures differed between first-year PSTs and end-of-program PSTs. Implications include program evaluation and the potential benefits of using K-12 student-level assessments as measures of PSTs' content knowledge.
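For context on the Rasch techniques mentioned in the entry above, the dichotomous Rasch model underlying this kind of analysis is typically stated as follows (a generic formulation, not the calibration reported in that study):

```latex
% Dichotomous Rasch model; symbols are generic, not study-specific.
\[
  P(X_{ni} = 1 \mid \theta_n, \delta_i)
    = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
\]
% theta_n : latent problem-solving ability of examinee n (here, a PST)
% delta_i : difficulty of item i on the measure
```

Comparing estimated $\theta_n$ distributions for first-year and end-of-program PSTs on the common item-difficulty scale is one way a performance difference like the one reported above could be examined.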