Title: Methods for Utilizing Item Response Theory with Coupled, Multiple-Response Assessments
Recent years have seen a movement within the research-based assessment development community toward item formats that go beyond simple multiple choice. Some developers have moved toward free-response questions, particularly at the upper-division level; however, free-response items carry the constraint that they must be scored by hand. To avoid this limitation, some assessment developers have moved toward formats that maintain a closed-response structure while still providing more nuanced insight into student reasoning. One such format is known as coupled, multiple response (CMR). This format pairs the multiple-choice and multiple-response formats, allowing students both to commit to an answer and to select options that correspond with their reasoning. In addition to being machine-scorable, this format allows for more nuanced scoring than simple right or wrong. However, such nuanced scoring presents a potential challenge with respect to utilizing certain testing theories to construct validity arguments for the assessment. In particular, Item Response Theory (IRT) models often assume dichotomously scored items. While polytomous IRT models do exist, each brings with it certain constraints and limitations. Here, we explore multiple IRT models and scoring schemas using data from an existing CMR test, with the goal of providing guidance and insight into possible methods for simultaneously leveraging the affordances of both the CMR format and IRT models in the context of constructing validity arguments for research-based assessments.
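For context, here is a minimal worked contrast between the two model families the abstract invokes; these are the standard two-parameter logistic (2PL) and generalized partial credit model (GPCM) forms from the IRT literature, not equations drawn from this paper. For examinee i with ability \theta_i responding to item j:

    % Dichotomous 2PL: probability of a correct (score 1) response
    P(X_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}

    % Polytomous GPCM: probability of ordered score category k \in \{0, \dots, m_j\},
    % with step parameters b_{jv} and the empty sum for k = 0 taken to be 0
    P(X_{ij} = k \mid \theta_i) = \frac{\exp\{\sum_{v=1}^{k} a_j(\theta_i - b_{jv})\}}{\sum_{c=0}^{m_j} \exp\{\sum_{v=1}^{c} a_j(\theta_i - b_{jv})\}}

The polytomous form supports partial credit for CMR-style responses but adds step parameters for every item, one concrete instance of the "constraints and limitations" noted above.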
Award ID(s):
2013332 2013339
PAR ID:
10446610
Author(s) / Creator(s):
Date Published:
Journal Name:
Physics Education Research Conference Proceedings
Page Range / eLocation ID:
488 to 493
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Research-based assessment instruments (RBAIs) are essential tools for measuring aspects of student learning and improving pedagogical practice. RBAIs are designed to measure constructs related to a well-defined learning goal. However, relatively few RBAIs exist that are suited to the specific learning goals of upper-division physics lab courses. One such learning goal is modeling: the process of constructing, testing, and refining models of physical and measurement systems. Here, we describe the creation of one component of an RBAI to measure proficiency with modeling, called the Modeling Assessment for Physics Laboratory Experiments (MAPLE). For use with large numbers of students, MAPLE must be scalable, which means its data cannot require impractical amounts of labor to analyze, as is often the case with large free-response assessments. We therefore use the coupled multiple response (CMR) format, whose data can be analyzed by a computer, to create items for measuring student reasoning in this component of MAPLE. We describe the process we used to create a set of CMR items for MAPLE, provide an example of this process for one item, and lay out an argument for the construct validity of the resulting items based on our process.
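    As a purely hypothetical illustration of why CMR items are machine-scorable (the rubric below is invented for this sketch and is not the MAPLE scoring scheme), a CMR response reduces to an answer choice plus a set of selected reasoning options, which a short function can score:

        # Hypothetical CMR scoring sketch: half credit for the committed answer,
        # half for reasoning options, with a penalty for selected distractors.
        def score_cmr_item(answer, reasoning, key_answer, key_reasoning, distractors):
            """Return a score in [0, 1] for one coupled multiple-response item."""
            answer_credit = 0.5 if answer == key_answer else 0.0
            hits = len(reasoning & key_reasoning)    # correct reasoning selected
            misses = len(reasoning & distractors)    # distractors selected
            reasoning_credit = 0.5 * max(hits - misses, 0) / len(key_reasoning)
            return answer_credit + reasoning_credit

        # Example: committed answer "B" with reasoning options {2, 3, 5} selected.
        print(score_cmr_item("B", {2, 3, 5}, "B", {2, 3}, {5, 6}))  # 0.75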
  2. Item response theory (IRT) has become one of the most popular statistical frameworks in psychometrics, the field concerned with the theory and techniques of psychological measurement. IRT models are latent factor models tailored to the analysis, interpretation, and prediction of individuals' behavior in answering a set of measurement items that typically yield categorical response data. Many important questions of measurement are answered, directly or indirectly, through the use of IRT models, including scoring individuals' test performances, validating a test scale, and linking two tests. This paper provides a review of item response theory, including its statistical framework and psychometric applications. We establish connections between item response theory and related topics in statistics, including empirical Bayes, nonparametric methods, matrix completion, regularized estimation, and sequential analysis. Possible future directions for IRT are discussed from the perspective of statistical learning.
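    As a concrete companion sketch (the item parameters and responses below are assumed values, not data from the review), maximum-likelihood scoring of one examinee under the dichotomous 2PL model takes only a few lines:

        # Minimal sketch: maximum-likelihood ability estimation under the 2PL model.
        import numpy as np
        from scipy.optimize import minimize_scalar

        a = np.array([1.2, 0.8, 1.5])   # item discriminations (assumed)
        b = np.array([-0.5, 0.0, 1.0])  # item difficulties (assumed)
        x = np.array([1, 1, 0])         # one examinee's 0/1 responses (assumed)

        def neg_log_lik(theta):
            p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # P(correct) for each item
            return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

        theta_hat = minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x
        print(f"estimated ability: {theta_hat:.2f}")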
  3. Determining the most appropriate method of scoring an assessment depends on multiple factors, including the intended use of the results, the assessment's purpose, and time constraints. Both the dichotomous and partial credit models have their advantages, yet direct comparisons of assessment outcomes from the two methods are uncommon for constructed-response items. The present study compared the impact of both scoring methods on the internal structure and consequential validity of a middle-grades problem-solving assessment, the Problem Solving Measure for Grade Six (PSM6). After the assessment was scored both ways, Rasch dichotomous and partial credit analyses indicated similarly strong psychometric findings across models. Student outcome measures on the PSM6, scored both dichotomously and with partial credit, showed a strong, positive, significant correlation. Similar demographic patterns were observed regardless of scoring method. Both scoring methods produced similar results, suggesting that either is appropriate for the PSM6.
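    For reference, the two Rasch-family models the study compares have the following standard forms (generic notation, not the study's own), where \theta_i is person ability and the \delta terms are item or step difficulties:

        % Rasch dichotomous model: probability of a correct response
        P(X_{ij} = 1 \mid \theta_i) = \frac{\exp(\theta_i - \delta_j)}{1 + \exp(\theta_i - \delta_j)}

        % Partial credit model: probability of score category x \in \{0, \dots, m_j\},
        % with the empty sum for x = 0 taken to be 0
        P(X_{ij} = x \mid \theta_i) = \frac{\exp\bigl(\sum_{k=1}^{x} (\theta_i - \delta_{jk})\bigr)}{\sum_{h=0}^{m_j} \exp\bigl(\sum_{k=1}^{h} (\theta_i - \delta_{jk})\bigr)}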