-
The purpose of this study was to examine the effects of different data conditions on item parameter recovery and classification accuracy of three dichotomous mixture item response theory (IRT) models: the Mix1PL, Mix2PL, and Mix3PL. Manipulated factors in the simulation included sample size (11 sample sizes from 100 to 5,000), test length (10, 30, and 50 items), number of classes (2 and 3), degree of latent class separation (normal/no separation, small, medium, and large), and class sizes (equal vs. nonequal). Effects were assessed using root mean square error (RMSE) and the classification accuracy percentage computed between true and estimated parameters. The results of this simulation study showed that more precise estimates of item parameters were obtained with larger sample sizes and longer test lengths. Recovery of item parameters deteriorated as the number of classes increased and the sample size decreased. Classification accuracy was also better for two-class solutions than for three-class solutions. Both item parameter estimates and classification accuracy differed by model type: more complex models and models with larger class separations produced less accurate results. Mixture proportions affected RMSE and classification accuracy differently; groups of equal size produced more precise item parameter estimates, but the reverse was the case for classification accuracy. Results suggested that dichotomous mixture IRT models require more than 2,000 examinees to obtain stable results, as even shorter tests required sample sizes of this magnitude for precise estimates. This number increased as the number of latent classes, the degree of separation, and model complexity increased.
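For readers who want the models named above in symbols, a conventional way to write the mixture 2PL uses class-specific item parameters, with recovery summarized by the RMSE between generating and estimated values. The notation below is a standard presentation added for orientation, not something taken from the abstract; the Mix1PL fixes a_{ig} = 1, and the Mix3PL adds a class-specific lower asymptote c_{ig}.

```latex
% Mixture 2PL: response probability for examinee j in latent class g on item i
P(X_{ij}=1 \mid \theta_j, g) =
  \frac{\exp\!\big[a_{ig}(\theta_j - b_{ig})\big]}
       {1 + \exp\!\big[a_{ig}(\theta_j - b_{ig})\big]}

% Recovery criterion over R replications, shown here for the difficulty parameters
\mathrm{RMSE}(b_{ig}) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\big(\hat{b}_{ig}^{(r)} - b_{ig}\big)^{2}}
```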
-
Results are reported from a comprehensive simulation study investigating the effects of sample size, test length, number of attributes, and base rate of mastery on item parameter recovery and classification accuracy of four DCMs (i.e., the C-RUM, DINA, DINO, and LCDM-REDUCED). Effects were evaluated using bias and RMSE computed between true (i.e., generating) parameters and estimated parameters. Effects of the simulated factors on attribute assignment were also evaluated using the percentage of classification accuracy. More precise estimates of item parameters were obtained with larger sample sizes and longer test lengths. Recovery of item parameters decreased as the number of attributes increased from three to five, but the base rate of mastery had a varying effect on item recovery. Item parameter recovery and classification accuracy were higher for the DINA and DINO models.
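As a point of reference for two of the models named above (added here, not part of the original abstract), the DINA item response function conditions success on whether an examinee possesses all attributes an item requires, with item-level slip and guessing parameters, while DINO replaces that conjunctive condition with a disjunctive one:

```latex
% DINA: eta_{ij} is 1 only if examinee j has every attribute required by item i (Q-matrix entries q_{ik})
\eta_{ij} = \prod_{k=1}^{K} \alpha_{jk}^{\,q_{ik}}, \qquad
P(X_{ij}=1 \mid \boldsymbol{\alpha}_j) = (1 - s_i)^{\eta_{ij}}\, g_i^{\,1-\eta_{ij}}

% DINO: omega_{ij} is 1 if examinee j has at least one required attribute
\omega_{ij} = 1 - \prod_{k=1}^{K} (1-\alpha_{jk})^{\,q_{ik}}, \qquad
P(X_{ij}=1 \mid \boldsymbol{\alpha}_j) = (1 - s_i)^{\omega_{ij}}\, g_i^{\,1-\omega_{ij}}
```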
-
Selected response items and constructed response (CR) items are often found in the same test. Conventional psychometric models for these two item types typically focus on scores for the correctness of responses. Recent research suggests, however, that CR items may offer more information than correctness scores alone. In this study, we describe an approach in which a statistical topic model and a diagnostic classification model (DCM) were applied to a mixed-format formative test of English Language Arts. The DCM was used to estimate students’ mastery status on reading skills. These mastery statuses were then included in the topic model as covariates to predict students’ use of each latent topic in their written answers to a CR item. This approach enabled investigation of the effects of reading-skill mastery on writing patterns. Results indicated that one of the skills, Integration of Knowledge and Ideas, helped detect and explain students’ writing patterns with respect to their use of individual topics.
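The abstract does not give implementation details; as a rough, hypothetical illustration of the idea (mastery statuses informing topic use), one could fit a topic model to the constructed responses and then relate each student's topic proportions to their estimated mastery profile. The sketch below is a simplified two-step stand-in for the covariate-based approach described in the study, with made-up variable names and synthetic data, not the study's actual model.

```python
# Hypothetical sketch: relate DCM mastery statuses to topic use in CR answers.
# Two-step approximation only; the study includes mastery as a covariate in the topic model itself.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

# Toy constructed-response texts and a binary mastery status for one skill.
responses = [
    "the author supports the claim with evidence from both passages",
    "I think the story was fun and I liked the ending",
    "combining details from the text and the chart shows the main idea",
    "the character was nice",
]
mastery_integration = np.array([1, 0, 1, 0])  # e.g., Integration of Knowledge and Ideas

# Step 1: fit a small LDA topic model on the responses.
X = CountVectorizer().fit_transform(responses)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)  # per-response topic proportions

# Step 2: regress each topic's proportion on mastery status to see whether
# masters tend to use that topic more than non-masters.
for k in range(theta.shape[1]):
    reg = LinearRegression().fit(mastery_integration.reshape(-1, 1), theta[:, k])
    print(f"topic {k}: mastery effect = {reg.coef_[0]:.3f}")
```

In the study itself, the mastery statuses enter the topic model directly as covariates (in the spirit of covariate-augmented topic models); the two-step version above only gestures at that relationship.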
-
In this paper, we focus on the design of assessments of mathematics teachers’ knowledge by emphasising the importance of identifying the purpose of the assessment, defining the specific construct to be measured, and considering the affordances of particular psychometric models for the development of assessments and for how they communicate learning or understanding. We add to the literature by providing illustrations of the interactions among these critical considerations in determining what inferences can be drawn from an assessment. We illustrate how the considerations shape assessments by discussing both existing and ongoing research projects. In particular, we discuss two projects on which the authors of this paper are collaborating to demonstrate the affordances of attending to all three considerations when designing assessments of mathematics teachers’ knowledge, giving readers an opportunity to see those considerations in use.
-
Conventional analysis of student results on rubric-based assessments (RBA) has emphasized numeric scores as the primary way of communicating information to teachers about their students’ learning. In this light, rethinking and reflecting on not only how scores are generated but also what analyses are done with them to inform classroom practices is of utmost importance. Informed by Systemic Functional Linguistics and Latent Dirichlet Allocation analyses, this study uses an innovative bilingual (Spanish–English) constructed response assessment of science and language practices for middle and high school students to perform a multilayered analysis of student responses. We explore multiple ways of looking at students’ performance through their written assessments and discuss features of student responses that are made visible through these analyses. Findings from this study suggest that science educators would benefit from a multidimensional model that deploys complementary ways of interpreting student performance. This understanding leads us to think that researchers and developers in the field of assessment need to promote approaches that analyze student science performance as a multilayered phenomenon.
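For context on the topic-modeling layer mentioned above (and in the earlier entry on constructed response items), Latent Dirichlet Allocation treats each written response as a mixture of latent topics. The standard generative form, stated here for orientation rather than drawn from the abstract, is:

```latex
% LDA generative process for response d, with topic proportions theta_d,
% per-word topic assignments z_{dn}, and topic-word distributions beta_k
\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad
z_{dn} \mid \theta_d \sim \mathrm{Categorical}(\theta_d), \qquad
w_{dn} \mid z_{dn} \sim \mathrm{Categorical}(\beta_{z_{dn}})
```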