Title: Curvilinearity in the Reference Composite and Practical Implications for Measurement
Abstract: Item difficulty and dimensionality often correlate, implying that unidimensional IRT approximations to multidimensional data (i.e., reference composites) can take a curvilinear form in the multidimensional space. Although this issue has been previously discussed in the context of vertical scaling applications, we illustrate how such a phenomenon can also easily occur within individual tests. Measures of reading proficiency, for example, often use different task types within a single assessment, a feature that may not only lead to multidimensionality, but also to an association between item difficulty and dimensionality. Using a latent regression strategy, we demonstrate through simulations and empirical analysis how associations between dimensionality and difficulty yield a nonlinear reference composite where the weights of the underlying dimensions change across the scale continuum according to the difficulties of the items associated with the dimensions. We further show how this form of curvilinearity produces systematic forms of misspecification in traditional unidimensional IRT models (e.g., 2PL) and can be better accommodated by models such as monotone-polynomial or asymmetric IRT models. Simulations and a real-data example from the Early Childhood Longitudinal Study-Kindergarten are provided for demonstration. Some implications for measurement modeling and for understanding the effects of 2PL misspecification on measurement metrics are discussed.
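The difficulty-dimensionality mechanism described in the abstract is easy to reproduce in simulation. Below is a minimal Python sketch of that kind of data-generating setup, assuming a compensatory two-dimensional 2PL in which the easier items load on one dimension and the harder items on the other; all parameter values and array names are our illustrative assumptions, not the paper's actual simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 2000, 40

# Two correlated latent dimensions (e.g., two reading task types).
theta = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=N)

# Difficulty-dimensionality association: the easier half of the items
# loads on dimension 1, the harder half on dimension 2.
b = np.linspace(-2.0, 2.0, J)
a = np.zeros((J, 2))
a[: J // 2, 0] = 1.2
a[J // 2 :, 1] = 1.2

# Compensatory multidimensional 2PL probabilities and item responses.
P = 1.0 / (1.0 + np.exp(-(theta @ a.T - b)))
X = rng.binomial(1, P)

# A unidimensional IRT model fit to X approximates the reference
# composite; because the effective dimension weights shift with item
# difficulty, that composite bends in the (theta1, theta2) plane.
```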
Award ID(s):
1749275
PAR ID:
10517239
Author(s) / Creator(s):
; ;
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Journal of Educational Measurement
ISSN:
0022-0655
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Item response theory (IRT) has become one of the most popular classes of statistical models in psychometrics, a field of study concerned with the theory and techniques of psychological measurement. IRT models are latent factor models tailored to the analysis, interpretation, and prediction of individuals' behaviors in answering a set of measurement items that typically involve categorical response data. Many important questions of measurement are directly or indirectly answered through the use of IRT models, including, among others, scoring individuals' test performances, validating a test scale, and linking two tests. This paper provides a review of item response theory, including its statistical framework and psychometric applications. We establish connections between item response theory and related topics in statistics, including empirical Bayes, nonparametric methods, matrix completion, regularized estimation, and sequential analysis. Possible future directions of IRT are discussed from the perspective of statistical learning.
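As a concrete instance of the scoring application mentioned above, here is a small sketch of MAP ability estimation under a unidimensional 2PL with a standard normal prior. This is one common estimation strategy rather than this paper's specific method, and the item parameters and response pattern are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def p_2pl(theta, a, b):
    """2PL item response function: P(X = 1 | theta)."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def map_score(x, a, b):
    """MAP ability estimate under a standard normal prior on theta."""
    def neg_log_post(theta):
        p = p_2pl(theta, a, b)
        loglik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
        return -(loglik - 0.5 * theta ** 2)  # N(0, 1) prior term
    return minimize_scalar(neg_log_post, bounds=(-4, 4), method="bounded").x

# Illustrative item parameters and a response pattern (1 = correct).
a = np.array([1.0, 1.5, 0.8])
b = np.array([-0.5, 0.0, 1.0])
x = np.array([1, 1, 0])
print(map_score(x, a, b))  # point estimate of theta
```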
  2. Research on spatial thinking requires reliable and valid measures of individual differences in various component skills. Spatial perspective taking (PT), the ability to represent viewpoints different from one's own, is one kind of spatial skill that is especially relevant to navigation. This study had two goals. First, the psychometric properties of four PT tests were examined: Four Mountains Task (FMT), Spatial Orientation Task (SOT), Perspective-Taking Task for Adults (PTT-A), and Photographic Perspective-Taking Task (PPTT). Item response theory (IRT) was used to evaluate item difficulty, discriminability, and the efficiency of item information functions. Second, the relation of PT scores to general intelligence, working memory, and mental rotation (MR) was assessed. All tasks showed good construct validity except for FMT. PPTT tapped a wide range of PT ability, with maximum measurement precision at average ability. PTT-A captured a lower range of ability. Although SOT contributed less measurement information than other tasks, it did well across a wide range of PT ability. After controlling for general intelligence and working memory, original and IRT-refined versions of PT tasks were each related to MR. PTT-A and PPTT showed relatively more divergent validity from MR than SOT. Tests of dimensionality indicated that PT tasks share one common PT dimension, with secondary task-specific factors also impacting the measurement of individual differences in performance. Advantages and disadvantages of a hybrid PT test that includes a combination of items across tasks are discussed.
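For readers unfamiliar with the item-information criterion used to compare these tasks, the 2PL Fisher information has a simple closed form, sketched below; the a and b values are illustrative, not estimates from this study.

```python
import numpy as np

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

# Information peaks at theta = b, so an item with b near 0 measures
# average ability most precisely (cf. PPTT), while shifting b downward
# targets a lower ability range (cf. PTT-A).
theta_grid = np.linspace(-3.0, 3.0, 7)
print(item_information_2pl(theta_grid, a=1.5, b=0.0))
```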
  3. von Davier, Matthias (Ed.)
    Computerized assessment provides rich multidimensional data, including trial-by-trial accuracy and response time (RT) measures. A key question in modeling this type of data is how to incorporate RT data, for example, in aid of ability estimation in item response theory (IRT) models. To address this, we propose a joint model consisting of a two-parameter IRT model for the dichotomous item response data, a log-normal model for the continuous RT data, and a normal model for corresponding paper-and-pencil scores. Then, we reformulate and reparameterize the model to capture the relationship between the model parameters, to facilitate the prior specification, and to make the Bayesian computation more efficient. Further, we propose several new model assessment criteria based on the decomposition of the deviance information criterion (DIC) and the logarithm of the pseudo-marginal likelihood (LPML). The proposed criteria can quantify the improvement in the fit of one part of the multidimensional data given the other parts. Finally, we have conducted several simulation studies to examine the empirical performance of the proposed model assessment criteria and have illustrated the application of these criteria using a real dataset from a computerized educational assessment program.
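A rough sketch of the accuracy-plus-RT core of such a joint model is given below. It uses a van der Linden-style log-normal RT component as a stand-in and omits the paper's third component (the normal model for paper-and-pencil scores) and its reparameterization, so every symbol and value here is an illustrative assumption rather than the authors' specification.

```python
import numpy as np

def joint_loglik(theta, tau, x, t, a, b, beta, sigma):
    """Joint log-likelihood for one examinee over J items:
    accuracy x_j ~ Bernoulli(2PL(theta; a_j, b_j)) and response time
    log t_j ~ N(beta_j - tau, sigma_j^2), where tau is a latent speed
    parameter and beta_j a time-intensity parameter."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    ll_acc = np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))
    mu = beta - tau                       # expected log response time
    ll_rt = np.sum(-np.log(t * sigma * np.sqrt(2.0 * np.pi))
                   - (np.log(t) - mu) ** 2 / (2.0 * sigma ** 2))
    return ll_acc + ll_rt

# Illustrative values for a 3-item test.
a, b = np.ones(3), np.array([-0.5, 0.0, 0.5])
beta, sigma = np.full(3, 3.0), np.full(3, 0.4)
x, t = np.array([1, 1, 0]), np.array([12.0, 20.0, 35.0])
print(joint_loglik(theta=0.3, tau=0.1, x=x, t=t,
                   a=a, b=b, beta=beta, sigma=sigma))
```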
  4. Abstract: Traditional approaches to the modeling of multiple-choice item response data (e.g., 3PL, 4PL models) emphasize slips and guesses as random events. In this paper, an item response model is presented that characterizes both disjunctively interacting guessing and conjunctively interacting slipping processes as proficiency-related phenomena. We show how evidence for this perspective is seen in the systematic form of invariance violations for item slip and guess parameters under four-parameter IRT models when compared across populations of different mean proficiency levels. Specifically, higher proficiency populations tend to show higher guess and lower slip probabilities than lower proficiency populations. The results undermine the use of traditional models for IRT applications that require invariance and suggest that alternatives deserve greater attention.
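The contrast between random and proficiency-related slipping and guessing can be made concrete in code. The sketch below pairs the standard 4PL with a toy alternative in which guessing is disjunctive and slipping conjunctive, both tied to theta; the specific logistic forms chosen for the guess and slip probabilities are our illustrative assumptions, not the paper's model.

```python
import numpy as np

def p_4pl(theta, a, b, g, s):
    """Standard 4PL: guessing floor g and slipping ceiling 1 - s,
    both treated as random events unrelated to proficiency."""
    return g + (1.0 - g - s) / (1.0 + np.exp(-a * (theta - b)))

def p_ability_related(theta, a, b):
    """Toy alternative: a disjunctive guess route (success if the core
    process OR a guess succeeds) and a conjunctive slip filter (the
    examinee must also not slip), with both probabilities tied to theta."""
    core = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    guess = 1.0 / (1.0 + np.exp(-(theta - 2.0)))  # rises with theta
    slip = 1.0 / (1.0 + np.exp(theta + 2.0))      # falls with theta
    return (1.0 - slip) * (core + (1.0 - core) * guess)

# A 4PL fit to data generated from p_ability_related in a low- vs. a
# high-ability population would recover different g and s estimates,
# reproducing the invariance violations described above.
print(p_ability_related(np.array([-1.0, 1.0]), a=1.0, b=0.0))
```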
  5. Abstract: Pairwise comparison models are an important type of latent attribute measurement model with broad applications in the social and behavioural sciences. Current pairwise comparison models are typically unidimensional, and the existing multidimensional ones tend to be difficult to interpret and cannot identify groups of raters that share the same rater-specific parameters. To fill this gap, we propose a new multidimensional pairwise comparison model with enhanced interpretability, which explicitly models how object attributes on different dimensions are differentially perceived by raters. Moreover, we add a Dirichlet process prior on rater-specific parameters, which allows us to flexibly cluster raters into groups with similar perceptual orientations. We conduct simulation studies to show that the new model is able to recover the true latent variable values from the observed binary choice data. We use the new model to analyse original survey data regarding the perceived truthfulness of statements on COVID-19 collected in the summer of 2020. By leveraging the strengths of the new model, we find that the partisanship of the speaker and the partisanship of the respondent account for the majority of the variation in perceived truthfulness, with statements made by co-partisans being viewed as more truthful.
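The likelihood kernel of such a multidimensional pairwise comparison model can be sketched compactly: each rater weights the attribute dimensions differently before a logistic choice. The sketch below omits the Dirichlet process clustering of rater parameters, and all names, sizes, and distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def choice_prob(z_i, z_j, w_r):
    """Probability that rater r prefers object i over object j, where
    w_r holds rater r's weights on the D attribute dimensions."""
    return 1.0 / (1.0 + np.exp(-w_r @ (z_i - z_j)))

D, n_obj, n_raters = 2, 10, 50
Z = rng.normal(size=(n_obj, D))             # latent object attributes
W = np.abs(rng.normal(size=(n_raters, D)))  # rater perceptual weights

# One simulated binary comparison: rater 3 judges objects 0 vs. 1.
y = rng.binomial(1, choice_prob(Z[0], Z[1], W[3]))
print(y)
```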