Title: The Impact of Sample Size and Various Other Factors on Estimation of Dichotomous Mixture IRT Models
The purpose of this study was to examine the effects of different data conditions on item parameter recovery and classification accuracy of three dichotomous mixture item response theory (IRT) models: the Mix1PL, Mix2PL, and Mix3PL. Manipulated factors in the simulation included sample size (11 sample sizes from 100 to 5,000), test length (10, 30, and 50 items), number of latent classes (2 and 3), degree of latent class separation (normal/no separation, small, medium, and large), and class sizes (equal vs. nonequal). Effects were assessed using the root mean square error (RMSE) between true and estimated parameters and the percentage of classification accuracy. The results of this simulation study showed that more precise estimates of item parameters were obtained with larger sample sizes and longer test lengths. Recovery of item parameters deteriorated as the number of classes increased and the sample size decreased. Classification accuracy was also better for two-class solutions than for three-class solutions. Both item parameter estimates and classification accuracy differed by model type: more complex models and models with larger class separations produced less accurate results. Mixture proportions affected RMSE and classification accuracy differently: groups of equal size produced more precise item parameter estimates, but the reverse was the case for classification accuracy. Results suggested that dichotomous mixture IRT models require more than 2,000 examinees to obtain stable results; even shorter tests needed sample sizes of this magnitude for precise estimates, and the required number increased with the number of latent classes, the degree of separation, and model complexity.
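As a rough illustration of the two evaluation criteria named in the abstract (RMSE between true and estimated parameters, and percent classification accuracy), here is a minimal numpy sketch with simulated values; the item parameters, error rates, and class labels are hypothetical stand-ins, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" item difficulties and noisy stand-in estimates;
# the real study obtains estimates from fitted mixture IRT models.
n_items = 30
true_b = rng.normal(0.0, 1.0, n_items)            # item difficulties
est_b = true_b + rng.normal(0.0, 0.1, n_items)    # illustrative estimates

def rmse(true, est):
    # Root mean square error between true and estimated parameters.
    return np.sqrt(np.mean((np.asarray(true) - np.asarray(est)) ** 2))

# Classification accuracy: percentage of examinees assigned to their
# true latent class (labels simulated here with a 10% error rate).
true_class = rng.integers(0, 2, 1000)
misassigned = rng.random(1000) < 0.1
est_class = np.where(misassigned, 1 - true_class, true_class)
accuracy = float(np.mean(true_class == est_class) * 100)
```

In the study, these two summaries are computed per condition (sample size, test length, number of classes, separation, class sizes) and compared across conditions.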
Award ID(s):
1813760
PAR ID:
10345698
Author(s) / Creator(s):
Date Published:
Journal Name:
Educational and Psychological Measurement
ISSN:
0013-1644
Page Range / eLocation ID:
001316442210943
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Results of a comprehensive simulation study are reported investigating the effects of sample size, test length, number of attributes, and base rate of mastery on item parameter recovery and classification accuracy of four DCMs (i.e., C-RUM, DINA, DINO, and reduced LCDM). Effects were evaluated using bias and RMSE computed between true (i.e., generating) parameters and estimated parameters. Effects of the simulated factors on attribute assignment were also evaluated using the percentage of classification accuracy. More precise estimates of item parameters were obtained with larger sample sizes and longer test lengths. Recovery of item parameters decreased as the number of attributes increased from three to five, but the base rate of mastery had a varying effect on item recovery. Item parameter recovery and classification accuracy were higher for the DINA and DINO models.
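Of the four DCMs named above, the DINA model has the simplest item response rule: an examinee responds correctly with probability 1 − slip if they have mastered every attribute the item's Q-matrix row requires, and with the guessing probability otherwise. A schematic sketch (the guess/slip values below are hypothetical, not the study's estimates):

```python
def dina_prob(alpha, q, guess, slip):
    """DINA item response probability.

    alpha: examinee's attribute mastery vector (0/1 per attribute).
    q:     the item's Q-matrix row (1 = attribute required).
    Returns P(correct) = 1 - slip if all required attributes are
    mastered, else guess.
    """
    eta = all(a >= needed for a, needed in zip(alpha, q))
    return (1.0 - slip) if eta else guess
```

In a recovery study, the guess and slip parameters estimated per item are compared against the generating values via bias and RMSE, as described above.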
  2. von Davier, Matthias (Ed.)
    Computerized assessment provides rich multidimensional data including trial-by-trial accuracy and response time (RT) measures. A key question in modeling this type of data is how to incorporate RT data, for example, in aid of ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for corresponding paper-and-pencil scores. Then, we reformulate and reparameterize the model to capture the relationship between the model parameters, to facilitate the prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on the decomposition of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria can quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we have conducted several simulation studies to examine the empirical performance of the proposed model assessment criteria and have illustrated the application of these criteria using a real dataset from a computerized educational assessment program.
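The two likelihood components named in the abstract above (a two-parameter IRT model for accuracy and a log-normal model for RT) can be sketched as follows; the parameter names are illustrative and this is not the paper's reparameterized Bayesian formulation:

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL IRT: P(correct) for ability theta, discrimination a,
    difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def logpdf_lognormal_rt(t, mu, sigma):
    """Log-density of a log-normal response time t > 0 with
    location mu and scale sigma on the log scale."""
    z = (math.log(t) - mu) / sigma
    return -math.log(t * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
```

A joint model multiplies (adds, on the log scale) these per-item contributions and links their parameters, e.g., by correlating ability with a speed parameter entering mu.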
  3. Abstract Normal ogive (NO) models have contributed substantially to the advancement of item response theory (IRT) and have become popular educational and psychological measurement models. However, estimating NO models remains computationally challenging. The purpose of this paper is to propose an efficient and reliable computational method for fitting NO models. Specifically, we introduce a novel and unified expectation‐maximization (EM) algorithm for estimating NO models, including two‐parameter, three‐parameter, and four‐parameter NO models. A key improvement in our EM algorithm lies in augmenting the NO model to be a complete data model within the exponential family, thereby substantially streamlining the implementation of the EM iteration and avoiding the numerical optimization computation in the M‐step. Additionally, we propose a two‐step expectation procedure for implementing the E‐step, which reduces the dimensionality of the integration and effectively enables numerical integration. Moreover, we develop a computing procedure for estimating the standard errors (SEs) of the estimated parameters. Simulation results demonstrate the superior performance of our algorithm in terms of its recovery accuracy, robustness, and computational efficiency. To further validate our methods, we apply them to real data from the Programme for International Student Assessment (PISA). The results affirm the reliability of the parameter estimates obtained using our method. 
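For reference, the two-parameter normal ogive item response function discussed above replaces the logistic curve with the standard normal CDF. A minimal sketch (an illustrative parameterization, not the paper's EM algorithm):

```python
import math

def normal_ogive_2p(theta, a, b):
    """Two-parameter normal ogive: P(correct) = Phi(a * (theta - b)),
    where Phi is the standard normal CDF, computed via math.erf."""
    z = a * (theta - b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

The computational difficulty the paper addresses comes from embedding this probit link inside marginal maximum likelihood; the proposed EM augmentation avoids numerical optimization in the M-step.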
  4. Abstract Recent empirical studies have quantified correlation between survival and recovery by estimating these parameters as correlated random effects with hierarchical Bayesian multivariate models fit to tag-recovery data. In these applications, increasingly negative correlation between survival and recovery has been interpreted as evidence for increasingly additive harvest mortality. The power of these hierarchical models to detect nonzero correlations has rarely been evaluated, and these few studies have not focused on tag-recovery data, which is a common data type. We assessed the power of multivariate hierarchical models to detect negative correlation between annual survival and recovery. Using three priors for multivariate normal distributions, we fit hierarchical effects models to a mallard (Anas platyrhynchos) tag-recovery data set and to simulated data with sample sizes corresponding to different levels of monitoring intensity. We also demonstrate more robust summary statistics for tag-recovery data sets than total individuals tagged. Different priors led to substantially different estimates of correlation from the mallard data. Our power analysis of simulated data indicated most prior distribution and sample size combinations could not estimate strongly negative correlation with useful precision or accuracy. Many correlation estimates spanned the available parameter space (−1, 1) and underestimated the magnitude of negative correlation. Only one prior combined with our most intensive monitoring scenario provided reliable results. Underestimating the magnitude of correlation coincided with overestimating the variability of annual survival, but not annual recovery. The inadequacy of prior distributions and sample size combinations previously assumed adequate for obtaining robust inference from tag-recovery data represents a concern in the application of Bayesian hierarchical models to tag-recovery data.
Our analysis approach provides a means for examining prior influence and sample size on hierarchical models fit to capture–recapture data while emphasizing transferability of results between empirical and simulation studies. 
  5.
    Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing, especially in modern regimes where the number of classes is rather large. In this paper, we take a step in this direction by providing the first asymptotically precise analysis of linear multiclass classification. Our theoretical analysis allows us to precisely characterize how the test error varies over different training algorithms, data distributions, problem dimensions as well as number of classes, inter/intra class correlations and class priors. Specifically, our analysis reveals that the classification accuracy is highly distribution-dependent with different algorithms achieving optimal performance for different data distributions and/or training/features sizes. Unlike linear regression/binary classification, the test error in multiclass classification relies on intricate functions of the trained model (e.g., correlation between some of the trained weights) whose asymptotic behavior is difficult to characterize. This challenge is already present in simple classifiers, such as those minimizing a square loss. Our novel theoretical techniques allow us to overcome some of these challenges. The insights gained may pave the way for a precise understanding of other classification algorithms beyond those studied in this paper.
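As a concrete (and much simplified) instance of the setting described above, a linear multiclass classifier minimizing a square loss against one-hot labels can be sketched as follows; the Gaussian data model, dimensions, and class count are illustrative assumptions, not the paper's asymptotic regime:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative Gaussian mixture data: k classes in d dimensions.
n, d, k = 300, 10, 3
means = rng.normal(0.0, 2.0, (k, d))       # assumed class means
y = rng.integers(0, k, n)                  # true class labels
X = means[y] + rng.normal(0.0, 1.0, (n, d))
Y = np.eye(k)[y]                           # one-hot label matrix

# Least-squares multiclass classifier: minimize ||X W - Y||^2,
# then classify each point by the largest of the k linear scores.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = np.argmax(X @ W, axis=1)
train_error = float(np.mean(pred != y))
```

The paper's analysis characterizes how the test error of classifiers like this behaves asymptotically as n, d, and k grow, as a function of the data distribution and class priors.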