This content will become publicly available on September 15, 2026

Title: A Novel Method for Detecting Intersectional DIF: Multilevel Random Item Effects Model with Regularized Gaussian Variational Estimation
Abstract: Differential item functioning (DIF) screening has long been suggested to ensure assessment fairness. Traditional DIF methods typically focus on the main effects of demographic variables on item parameters, overlooking the interactions among multiple identities. Drawing on the intersectionality framework, we define intersectional DIF as deviations in item parameters that arise from the interactions among demographic variables beyond their main effects and propose a novel item response theory (IRT) approach for detecting intersectional DIF. Under our framework, fixed effects are used to account for traditional DIF, while random item effects are introduced to capture intersectional DIF. We further introduce the concept of intersectional impact, which refers to interaction effects on group-level mean ability. Depending on which item parameters are affected and whether intersectional impact is considered, we propose four models, which aim to detect intersectional uniform DIF (UDIF), intersectional UDIF with intersectional impact, intersectional non-uniform DIF (NUDIF), and intersectional NUDIF with intersectional impact, respectively. For efficient model estimation, a regularized Gaussian variational expectation-maximization algorithm is developed. Simulation studies demonstrate that our methods can effectively detect intersectional UDIF, although their detection of intersectional NUDIF is more limited.
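The decomposition the abstract describes, fixed effects for traditional uniform DIF plus a random item-by-demographic-cell deviation for intersectional UDIF, can be sketched on the difficulty parameter of a 2PL item. This is a minimal illustration with hypothetical function names and numeric values; the paper's actual models are fitted with the regularized GVEM algorithm, which is not shown here.

```python
import math
import random

def irf(theta, a, b):
    """2PL item response function: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_difficulty(b0, betas, groups, u):
    """Item difficulty for a respondent with demographic indicators `groups`.

    b0     : baseline difficulty
    betas  : fixed main effects of each demographic variable (traditional uniform DIF)
    groups : 0/1 indicators, e.g. (gender, race)
    u      : dict mapping an interaction cell (g1, g2) to its random deviation
             (intersectional uniform DIF), drawn here from N(0, sigma_u^2)
    """
    return b0 + sum(beta * g for beta, g in zip(betas, groups)) + u.get(tuple(groups), 0.0)

# Hypothetical values, not estimates from the paper
random.seed(0)
sigma_u = 0.3
cells = [(g1, g2) for g1 in (0, 1) for g2 in (0, 1)]
u = {cell: random.gauss(0.0, sigma_u) for cell in cells}

b_main = item_difficulty(0.0, (0.5, -0.2), (1, 1), {})   # main effects only
b_full = item_difficulty(0.0, (0.5, -0.2), (1, 1), u)    # plus interaction deviation
print(b_full - b_main)  # the gap is the intersectional DIF for cell (1, 1)
```

The gap between the two difficulties is exactly the random interaction deviation for that demographic cell, which is what the proposed models try to detect.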
Award ID(s): 1846747, 2150601
PAR ID: 10645467
Author(s) / Creator(s):
Publisher / Repository: Cambridge University Press
Date Published:
Journal Name: Psychometrika
ISSN: 0033-3123
Page Range / eLocation ID: 1 to 25
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. As algorithmic decision making is increasingly deployed in every walk of life, many researchers have raised concerns about fairness-related bias from such algorithms. But there is little research on harnessing psychometric methods to uncover potential discriminatory bias inside decision-making algorithms. The main goal of this article is to propose a new framework for algorithmic fairness based on differential item functioning (DIF), which has been commonly used to measure item fairness in psychometrics. Our fairness notion, which we call differential algorithmic functioning (DAF), is defined based on three pieces of information: a decision variable, a “fair” variable, and a protected variable such as race or gender. Under the DAF framework, an algorithm can exhibit uniform DAF, nonuniform DAF, or neither (i.e., non-DAF). For detecting DAF, we provide modifications of well-established DIF methods: Mantel–Haenszel test, logistic regression, and residual-based DIF. We demonstrate our framework through a real dataset concerning decision-making algorithms for grade retention in K–12 education in the United States. 
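The Mantel–Haenszel test adapted in this framework pools 2×2 (decision by group) tables across levels of the matching "fair" variable; a common odds ratio far from 1 signals uniform DAF. A minimal sketch of the MH common odds ratio (the table counts below are hypothetical, not from the grade-retention data):

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio across strata.

    Each stratum is a 2x2 table (a, b, c, d):
        a = reference group, favorable decision
        b = reference group, unfavorable decision
        c = protected group, favorable decision
        d = protected group, unfavorable decision
    Strata are levels of the matching ("fair") variable.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Hypothetical strata in which the odds of a favorable decision for the
# reference group are twice those for the protected group at every level
tables = [(20, 10, 10, 10), (40, 20, 20, 20)]
print(mantel_haenszel_or(tables))  # 2.0
```

An MH odds ratio of 1 is consistent with no uniform DAF; values far from 1 (here, 2.0) indicate that, conditional on the fair variable, the decision odds differ by protected-group membership.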
  2. Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study was performed to compare the new framework with traditional logistic regression, with respect to Type I error and power rates of the uniform DIF test statistics and bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias but suffered from low power and lack of precision. Implications for practice are discussed. 
  3. When measuring academic skills among students whose primary language is not English, standardized assessments are often provided in languages other than English. The degree to which alternate-language test translations yield unbiased, equitable assessment must be evaluated; however, traditional methods of investigating measurement equivalence are susceptible to confounding group differences. The primary purposes of this study were to investigate differential item functioning (DIF) and item bias across Spanish and English forms of an assessment of early mathematics skills. Secondary purposes were to investigate the presence of selection bias and demonstrate a novel approach for investigating DIF that uses a regression discontinuity design framework to control for selection bias. Data were drawn from 1,750 Spanish-speaking Kindergarteners participating in the Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999, who were administered either the Spanish or English version of the mathematics assessment based on their performance on an English language screening measure. Evidence of selection bias—differences between groups in SES, age, approaches to learning, self-control, social interaction, country of birth, childcare, household composition and number in the home, books in the home, and parent involvement—highlighted limitations of a traditional approach for investigating DIF that only controlled for ability. When controlling for selection bias, only 11% of items displayed DIF, and subsequent examination of item content did not suggest item bias. Results provide evidence that the Spanish translation of the ECLS-K mathematics assessment is an equitable and unbiased assessment accommodation for young dual language learners. 
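The regression discontinuity idea exploits the fact that form assignment was determined by a cutoff on the screening score: fit the outcome on the running variable separately on each side of the cutoff and compare the two fits at the cutoff, which controls for selection bias rather than only for ability. A minimal sharp-RDD sketch (the cutoff, scores, and outcomes below are hypothetical, noise-free data chosen so the jump is recovered exactly):

```python
def linfit(xs, ys):
    """Ordinary least squares for y = a + b*x (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def rdd_jump(scores, outcomes, cutoff):
    """Sharp RDD: fit a line on each side of the cutoff on the running
    variable (here, the screening score that assigned the test form) and
    return the discontinuity in predicted outcome at the cutoff."""
    left = [(s, y) for s, y in zip(scores, outcomes) if s < cutoff]
    right = [(s, y) for s, y in zip(scores, outcomes) if s >= cutoff]
    a_l, b_l = linfit([s for s, _ in left], [y for _, y in left])
    a_r, b_r = linfit([s for s, _ in right], [y for _, y in right])
    return (a_r + b_r * cutoff) - (a_l + b_l * cutoff)

# Hypothetical data with a true jump of 3 at cutoff 10
scores = list(range(21))
outcomes = [0.5 * s + (3 if s >= 10 else 0) for s in scores]
print(rdd_jump(scores, outcomes, 10))  # 3.0
```

A near-zero jump for an item, after this local control for the running variable, is evidence against DIF on that item.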
  4. Abstract: Establishing the invariance property of an instrument (e.g., a questionnaire or test) is a key step in establishing its measurement validity. Measurement invariance is typically assessed by differential item functioning (DIF) analysis, i.e., detecting DIF items whose response distribution depends not only on the latent trait measured by the instrument but also on group membership. DIF analysis is confounded by the group difference in the latent trait distributions. Many DIF analyses require knowing several anchor items that are DIF-free in order to draw inferences on whether each of the rest is a DIF item, where the anchor items are used to identify the latent trait distributions. When no prior information on anchor items is available, or some anchor items are misspecified, item purification methods and regularized estimation methods can be used. The former iteratively purifies the anchor set by a stepwise model selection procedure, and the latter selects the DIF-free items by a LASSO-type regularization approach. Unfortunately, unlike methods based on a correctly specified anchor set, these methods are not guaranteed to provide valid statistical inference (e.g., confidence intervals and p-values). In this paper, we propose a new method for DIF analysis under a multiple indicators and multiple causes (MIMIC) model for DIF. This method adopts a minimal $$L_1$$-norm condition for identifying the latent trait distributions. Without requiring prior knowledge about an anchor set, it can accurately estimate the DIF effects of individual items and further draw valid statistical inferences for quantifying the uncertainty. Specifically, the inference results allow us to control the Type I error for DIF detection, which may not be possible with item purification and regularized estimation methods. We conduct simulation studies to evaluate the performance of the proposed method and compare it with the anchor-set-based likelihood ratio test approach and the LASSO approach. The proposed method is applied to analysing the three personality scales of the Eysenck Personality Questionnaire-Revised (EPQ-R).
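The minimal $$L_1$$-norm identification can be illustrated with a toy computation: a constant shift of all item-level group effects is confounded with the group difference in latent trait means, and the shift that minimizes the L1 norm of the remaining effects is the median of the raw effects, so when most items are DIF-free their effects come out exactly zero without pre-specifying an anchor set. This is a rough sketch of the identification idea only (the raw effects below are hypothetical, and the paper's actual MIMIC estimator and inference procedure are more involved):

```python
def minimal_l1_shift(raw_effects):
    """Identify DIF effects under a minimal L1-norm condition (sketch).

    The shift c minimizing sum_j |d_j - c| is the median of the raw
    effects; subtracting it yields the representation with the smallest
    L1 norm, absorbing the confounding constant into the latent trait
    mean difference."""
    d = sorted(raw_effects)
    n = len(d)
    c = d[n // 2] if n % 2 == 1 else 0.5 * (d[n // 2 - 1] + d[n // 2])
    return c, [x - c for x in raw_effects]

# Hypothetical raw item effects: five items share only the confounding
# shift 0.5, while two items carry genuine DIF on top of it
raw = [0.5, 0.5, 0.5, 0.5, 0.5, 1.5, -0.7]
shift, dif = minimal_l1_shift(raw)
print(shift)  # 0.5
print(dif)    # zeros for clean items; nonzero entries flag DIF items
```

The recovered shift (0.5) is attributed to the latent trait distributions, and only the two genuinely deviating items retain nonzero DIF effects.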
  5. Abstract: The management of sustainable harvest of animal populations is of great ecological and conservation importance. Development of formal quantitative tools to estimate and mitigate the impacts of harvest on animal populations has positively impacted conservation efforts. The vast majority of existing harvest models, however, do not simultaneously estimate ecological and harvest impacts on demographic parameters and population trends. Given that the impacts of ecological drivers are often equal to or greater than the effects of harvest, and can covary with harvest, this disconnect has the potential to lead to flawed inference. In this study, we used Bayesian hierarchical models and a 43-year capture–mark–recovery dataset from 404,241 female mallards (Anas platyrhynchos) released in the North American midcontinent to estimate mallard demographic parameters. Furthermore, we model the dynamics of waterfowl hunters and habitat, and the direct and indirect effects of anthropogenic and ecological processes on mallard demographic parameters. We demonstrate that density dependence, habitat conditions and harvest can simultaneously impact demographic parameters of female mallards, and discuss implications for existing and future harvest management models. Our results demonstrate the importance of controlling for multicollinearity among demographic drivers in harvest management models, and provide evidence for multiple mechanisms that lead to partial compensation of mallard harvest. We provide a novel model structure to assess these relationships that may allow for improved inference and prediction in future iterations of harvest management models across taxa.
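The partial compensation referred to above can be illustrated with the classic additive/compensatory survival–harvest relationship S = S0(1 − bK). This simple formulation is not the paper's Bayesian hierarchical model; it is only a sketch of the additive-to-compensatory continuum the analysis investigates, with hypothetical numbers:

```python
def annual_survival(s0, kill_rate, b):
    """Survival under harvest in the classic additive/compensatory framework.

    s0        : annual survival in the absence of harvest
    kill_rate : proportion of the population harvested (K)
    b         : slope of the harvest-survival relationship:
                b = 1     -> fully additive harvest mortality
                b = 0     -> fully compensatory (below some threshold)
                0 < b < 1 -> partial compensation
    """
    return s0 * (1.0 - b * kill_rate)

s0, k = 0.6, 0.2
print(annual_survival(s0, k, 1.0))  # additive: 0.48
print(annual_survival(s0, k, 0.5))  # partial compensation: 0.54
print(annual_survival(s0, k, 0.0))  # fully compensatory: 0.6
```

Evidence for partial compensation, as the abstract reports for mallards, corresponds to an estimated slope strictly between the additive and fully compensatory extremes.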