The recent surge in computerized testing brings challenges for analyzing testing data with classic item response theory (IRT) models. To handle individually varying and irregularly spaced longitudinal dichotomous responses, we adopt a dynamic IRT model framework and extend it hierarchically to link growth in ability to individual characteristics. We further develop an algorithm to select the individual characteristics that best capture changes in ability under this multilevel dynamic IRT model, computing the Bayes factors of candidate models with different covariates from a single Markov chain Monte Carlo (MCMC) output of the full model. In addition, we establish model selection consistency under a modified Zellner–Siow prior and conduct simulations to illustrate this consistency in finite samples. Finally, we apply the proposed model and computational algorithms to a real dataset in educational testing, the EdSphere dataset.
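The covariate-selection step above hinges on computing Bayes factors for nested submodels from one MCMC run of the full model. The paper's exact computation follows its modified Zellner–Siow prior; as a generic illustration only, the sketch below uses the Savage–Dickey density ratio, with a Cauchy prior marginal standing in as an assumption, to estimate the Bayes factor for excluding a single covariate.

```python
# Sketch: Savage-Dickey density ratio for covariate exclusion from a single
# MCMC output. Generic illustration, not the paper's algorithm; the Cauchy
# prior marginal (loosely in the spirit of Zellner-Siow) is an assumption.
import numpy as np
from scipy.stats import gaussian_kde, cauchy

def savage_dickey_bf(beta_draws: np.ndarray, prior_scale: float = 1.0) -> float:
    """Bayes factor in favor of excluding a covariate (H0: beta = 0).

    beta_draws : posterior draws of one covariate coefficient taken from
                 the full-model MCMC output.
    """
    posterior_at_zero = gaussian_kde(beta_draws)(0.0)[0]  # KDE estimate at 0
    prior_at_zero = cauchy(loc=0.0, scale=prior_scale).pdf(0.0)
    return posterior_at_zero / prior_at_zero

# Synthetic example: draws concentrated away from zero give a small BF01,
# i.e., evidence for keeping the covariate in the model.
rng = np.random.default_rng(0)
draws = rng.normal(0.8, 0.2, size=5000)
print(savage_dickey_bf(draws))
```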
                    
                            
Bayesian Model Assessment for Jointly Modeling Multidimensional Response Data with Application to Computerized Testing
                        
                    
    
Computerized assessment provides rich multidimensional data, including trial-by-trial accuracy and response time (RT) measures. A key question in modeling this type of data is how to incorporate RT data, for example, in aid of ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for corresponding paper-and-pencil scores. We then reformulate and reparameterize the model to capture the relationships among the model parameters, to facilitate prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on decompositions of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria can quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we conduct several simulation studies to examine the empirical performance of the proposed model assessment criteria and illustrate their application using a real dataset from a computerized educational assessment program.
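To make the joint structure concrete, here is a minimal sketch of the three-part likelihood the abstract describes: a two-parameter logistic IRT model for the dichotomous responses, a log-normal model for the RTs, and a normal model for the paper-and-pencil score. The specific parameterization (a person speed parameter entering the RT mean, and the score centered at ability) is an illustrative assumption, not the paper's reparameterization.

```python
# Minimal sketch of the joint log-likelihood: 2PL IRT + log-normal RT +
# normal external score. Parameterizations are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def log_joint_likelihood(y, log_t, score,
                         theta, tau, mu_s, sigma_s,
                         a, b, lam, sigma_t):
    """y:      (persons, items) 0/1 responses
       log_t:  (persons, items) log response times
       score:  (persons,) paper-and-pencil scores
       theta:  (persons,) abilities; tau: (persons,) speeds
       a, b:   (items,) 2PL discrimination / difficulty
       lam:    (items,) time intensities; sigma_t: (items,) RT sds
       mu_s, sigma_s: location/scale of the normal score model."""
    # 2PL: P(y = 1) = logistic(a * (theta - b))
    eta = a[None, :] * (theta[:, None] - b[None, :])
    p = 1.0 / (1.0 + np.exp(-eta))
    ll_y = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    # Log-normal RT: log T_ij ~ N(lam_j - tau_i, sigma_t_j^2)
    ll_t = np.sum(norm.logpdf(log_t, lam[None, :] - tau[:, None],
                              sigma_t[None, :]))
    # Normal model for the external score, centered at mu_s + theta
    ll_s = np.sum(norm.logpdf(score, mu_s + theta, sigma_s))
    return ll_y + ll_t + ll_s
```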
        
    
- Award ID(s): 1848451
- PAR ID: 10348490
- Editor(s): von Davier, Matthias
- Date Published:
- Journal Name: Psychometrika
- ISSN: 0033-3123
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Nowadays, multidimensional data are often available from educational testing. A natural issue is to identify whether additional dimensions of data are useful in fitting the item response data. To address this issue, we develop a new decomposition of the Widely Applicable Information Criterion (WAIC) via the posterior predictive ordinate (PPO) under a joint model for the response, the response time, and two additional educational testing scores. Based on this decomposition, a new model assessment criterion is proposed that allows us to determine which of the response time and the two additional scores are most useful in fitting the response data, and whether further dimensions are needed once one of them is already included in the joint model with the response data. In addition, an efficient Monte Carlo method is developed to compute the PPO. An extensive simulation study examines the empirical performance of the proposed joint model and model assessment criterion in the psychological setting, and the methodology is further applied to a real dataset from a computerized educational assessment program. (A minimal WAIC-from-draws sketch appears after this list.)
- In this paper, we develop a mixed stochastic approximation expectation-maximization (MSAEM) algorithm coupled with a Gibbs sampler to compute the marginalized maximum a posteriori estimate (MMAPE) of a confirmatory multidimensional four-parameter normal ogive (M4PNO) model. The proposed MSAEM algorithm not only retains the computational advantages of the stochastic approximation expectation-maximization (SAEM) algorithm for multidimensional data, but also alleviates the potential instability caused by label switching, thereby improving estimation accuracy. Simulation studies illustrate the good performance of the proposed MSAEM method, which consistently outperforms SAEM and some other existing methods in multidimensional item response theory. Moreover, the proposed method is applied to a real dataset from the 2018 Programme for International Student Assessment (PISA) to demonstrate the usefulness of the 4PNO model as well as MSAEM in practice. (A generic stochastic-approximation EM update is sketched after this list.)
- The goal of item response theoretic (IRT) models is to provide estimates of latent traits from binary observed indicators and, at the same time, to learn the item response functions (IRFs) that map from latent trait to observed response. However, in many cases observed behavior can deviate significantly from the parametric assumptions of traditional IRT models. Nonparametric IRT (NIRT) models overcome these challenges by relaxing assumptions about the form of the IRFs, but standard tools are unable to simultaneously estimate flexible IRFs and recover ability estimates for respondents. We propose a Bayesian nonparametric model that solves this problem by placing Gaussian process priors on the latent functions defining the IRFs. This allows us to relax assumptions about the shape of the IRFs while preserving the ability to estimate latent traits, and in turn makes it easy to extend the model to further tasks such as active learning. GPIRT therefore provides a simple and intuitive solution to several longstanding problems in the IRT literature. (A minimal GP-prior IRF sketch appears after this list.)
- Recent years have seen a movement within the research-based assessment development community toward item formats that go beyond simple multiple choice. Some developers have moved toward free-response questions, particularly at the upper-division level; however, free-response items must be scored by hand. To avoid this limitation, some assessment developers have adopted formats that remain closed-response while still providing more nuanced insight into student reasoning. One such format is coupled multiple response (CMR), which pairs multiple-choice and multiple-response formats so that students both commit to an answer and select options that correspond to their reasoning. In addition to being machine-scorable, this format allows for more nuanced scoring than simple right or wrong. Such nuanced scoring, however, presents a potential challenge for using certain testing theories to construct validity arguments for the assessment. In particular, item response theory (IRT) models often assume dichotomously scored items. While polytomous IRT models do exist, each brings certain constraints and limitations. Here, we explore multiple IRT models and scoring schemes using data from an existing CMR test, with the goal of providing guidance and insight into methods for simultaneously leveraging the affordances of both the CMR format and IRT models in the context of constructing validity arguments for research-based assessments.
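For the first item above (the WAIC decomposition via PPO), a minimal generic sketch: given an S x n matrix of pointwise log-likelihood draws, the PPO for each observation is the Monte Carlo average of its likelihood over the draws, and WAIC follows from the standard lppd and effective-parameter terms. The per-data-type decomposition developed in that paper is not reproduced here.

```python
# Sketch: WAIC from pointwise log-likelihood draws; PPO_i is the Monte
# Carlo average of p(y_i | params) over posterior draws, done in log space.
import numpy as np

def waic(loglik: np.ndarray):
    """loglik: (S, n) pointwise log-likelihoods from S posterior draws."""
    S = loglik.shape[0]
    log_ppo = np.logaddexp.reduce(loglik, axis=0) - np.log(S)  # log PPO_i
    lppd = log_ppo.sum()
    p_waic = loglik.var(axis=0, ddof=1).sum()  # effective parameter count
    return -2.0 * (lppd - p_waic), p_waic
```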
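For the MSAEM item, the following is a generic stochastic-approximation EM loop, not the paper's exact algorithm: latent variables are imputed by a sampler (such as a Gibbs step), complete-data sufficient statistics are smoothed with a decreasing step size after a burn-in of constant steps, and the M-step maximizes using the smoothed statistics. All callables and the step-size schedule are illustrative placeholders.

```python
# Sketch: generic SAEM-style loop. sample_latent, suff_stats, and m_step
# are user-supplied placeholders; the step-size schedule is an assumption.
def saem(y, init_params, sample_latent, suff_stats, m_step, n_iter=500):
    params, s = init_params, None
    for k in range(1, n_iter + 1):
        z = sample_latent(y, params)            # simulation (e.g., Gibbs) step
        s_new = suff_stats(y, z)                # complete-data statistics
        gamma = 1.0 if k <= 100 else 1.0 / (k - 100)  # burn-in, then decay
        s = s_new if s is None else s + gamma * (s_new - s)
        params = m_step(s)                      # maximization step
    return params
```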
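For the GPIRT item, a minimal sketch of the core idea: draw a latent item response function from a Gaussian process over a grid of ability values and map it to probabilities with a probit link. The RBF kernel and its hyperparameters are illustrative assumptions, and this raw prior draw does not enforce monotonicity of the IRF.

```python
# Sketch: a GP-prior draw of an item response function over an ability grid.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
theta_grid = np.linspace(-3, 3, 100)            # latent ability grid

def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

K = rbf_kernel(theta_grid) + 1e-8 * np.eye(theta_grid.size)  # jitter for PSD
f = rng.multivariate_normal(np.zeros(theta_grid.size), K)    # latent IRF draw
irf = norm.cdf(f)   # P(correct | theta) via probit link
```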