Title: DIF Statistical Inference Without Knowing Anchoring Items
Abstract: Establishing the invariance property of an instrument (e.g., a questionnaire or test) is a key step in establishing its measurement validity. Measurement invariance is typically assessed by differential item functioning (DIF) analysis, i.e., detecting DIF items whose response distribution depends not only on the latent trait measured by the instrument but also on group membership. DIF analysis is confounded by the group difference in the latent trait distributions. Many DIF analyses require knowing several anchor items that are DIF-free in order to draw inferences on whether each of the remaining items is a DIF item, where the anchor items are used to identify the latent trait distributions. When no prior information on anchor items is available, or some anchor items are misspecified, item purification methods and regularized estimation methods can be used. The former iteratively purify the anchor set by a stepwise model selection procedure, and the latter select the DIF-free items by a LASSO-type regularization approach. Unfortunately, unlike the methods based on a correctly specified anchor set, these methods are not guaranteed to provide valid statistical inference (e.g., confidence intervals and p-values). In this paper, we propose a new method for DIF analysis under a multiple indicators and multiple causes (MIMIC) model for DIF. This method adopts a minimal $L_1$ norm condition for identifying the latent trait distributions. Without requiring prior knowledge about an anchor set, it can accurately estimate the DIF effects of individual items and further draw valid statistical inferences for quantifying the uncertainty. Specifically, the inference results allow us to control the type-I error for DIF detection, which may not be possible with item purification and regularized estimation methods. We conduct simulation studies to evaluate the performance of the proposed method and compare it with the anchor-set-based likelihood ratio test approach and the LASSO approach. The proposed method is applied to analysing the three personality scales of the Eysenck personality questionnaire-revised (EPQ-R).
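To make the minimal $L_1$ norm condition concrete, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not spell out the parametrization. The sketch assumes that preliminary DIF-effect estimates (the hypothetical `gamma_tilde`) are determined only up to a common shift `c` scaled by each item's loading (the hypothetical `loadings`), and that the shift is chosen to minimize the sum of absolute DIF effects. All names and numbers are illustrative.

```python
from typing import List, Sequence, Tuple


def l1_objective(c: float, gamma_tilde: Sequence[float], loadings: Sequence[float]) -> float:
    """Sum of absolute DIF effects after shifting by c (assumed form: gamma_j = gamma_tilde_j - a_j * c)."""
    return sum(abs(g - a * c) for g, a in zip(gamma_tilde, loadings))


def minimal_l1_shift(
    gamma_tilde: Sequence[float], loadings: Sequence[float]
) -> Tuple[float, List[float]]:
    """Pick the shift c that minimizes the L1 norm of the shifted DIF effects.

    The objective is convex and piecewise linear in c, so a minimizer
    lies at one of the breakpoints gamma_tilde_j / a_j (items with a_j != 0).
    """
    candidates = [g / a for g, a in zip(gamma_tilde, loadings) if a != 0]
    if not candidates:
        raise ValueError("need at least one item with a non-zero loading")
    c_hat = min(candidates, key=lambda c: l1_objective(c, gamma_tilde, loadings))
    gamma_hat = [g - a * c_hat for g, a in zip(gamma_tilde, loadings)]
    return c_hat, gamma_hat


if __name__ == "__main__":
    # Hypothetical example: five items with unit loadings; only the fourth has DIF,
    # and every preliminary estimate is offset by the same unknown shift.
    c_hat, gamma_hat = minimal_l1_shift([0.5, 0.5, 0.5, 1.3, 0.5], [1.0] * 5)
    print(c_hat, gamma_hat)
```

Under this assumed setup, when most items are DIF-free the minimizer recovers the common shift, so no anchor set has to be named in advance. The sketch covers only the identification step and says nothing about the inference procedure the paper proposes.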
Award ID(s): 2150601; 1846747
PAR ID: 10502887
Publisher / Repository: Springer
Journal Name: Psychometrika
Volume: 88
Issue: 4
ISSN: 0033-3123
Page Range / eLocation ID: 1097 to 1122
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Background: Individuals on the autism spectrum are reported to display alterations in interoception, the sense of the internal state of the body. The Interoception Sensory Questionnaire (ISQ) is a 20-item self-report measure of interoception specifically intended to measure this construct in autistic people. The psychometrics of the ISQ, however, have not previously been evaluated in a large sample of autistic individuals. Methods: Using confirmatory factor analysis, we evaluated the latent structure of the ISQ in a large online sample of adults on the autism spectrum and found that the unidimensional model fit the data poorly. Using misspecification analysis to identify areas of local misfit and item response theory to investigate the appropriateness of the seven-point response scale, we removed redundant items and collapsed the response options to put forth a novel eight-item, five-response choice ISQ. Results: The revised, five-response choice ISQ (ISQ-8) showed much improved fit while maintaining high internal reliability. Differential item functioning (DIF) analyses indicated that the items of the ISQ-8 were answered in comparable ways by autistic adolescents and adults and across multiple other sociodemographic groups. Limitations: Our results were limited by the fact that we did not collect data for typically developing controls, preventing the analysis of DIF by diagnostic status. Additionally, while this study proposes a new five-response scale for the ISQ-8, our data were not collected using this method; thus, the psychometric properties of the revised version of this instrument require further investigation. Conclusion: The ISQ-8 shows promise as a reliable and valid measure of interoception in adolescents and adults on the autism spectrum, but additional work is needed to examine its psychometrics in this population. A free online score calculator has been created to facilitate the use of ISQ-8 latent trait scores for further studies of autistic adolescents and adults (available at https://asdmeasures.shinyapps.io/ISQ_score/).
  2. It is well established that access to social supports is essential for engineering students’ persistence, and yet access to supports varies across groups. Understanding the differential supports inherent in students’ social networks and then working to provide additional needed supports can help the field of engineering education become more inclusive of all students. Our work contributes to this effort by examining the reliability and fairness of a social capital instrument, the Undergraduate Supports Survey (USS). We examined the extent to which two scales were reliable across ability levels (level of social capital), gender groups, and year-in-school. We fit two item response theory (IRT) models using a graded response model and performed differential item functioning (DIF) tests to detect item differences by gender and year-in-school. Our results indicate that most items have acceptable to good item discrimination and difficulty. DIF analysis shows that multiple items in the Expressive Support scale exhibit DIF across gender groups, in favor of women and nonbinary engineering students. DIF analysis also shows that year-in-school has little to no effect on items, with only one DIF item. Therefore, engineering educators can use the USS confidently to examine expressive and instrumental social capital in undergraduates across year-in-school. Our work can be used by the engineering education research community to identify and address differences in students’ access to support. We recommend that the engineering education community work to be explicit in their expressive and instrumental support. Future work will explore the measurement invariance of Expressive Support items across gender.
  4. The Standards for Educational and Psychological Testing were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify that assessment developers establish five types of validity evidence: test content, response processes, internal structure, relationship to other variables, and consequential/bias. Relevant to this proposal is consequential validity evidence that identifies the potential negative impact of testing or bias. Standard 3.1 of The Standards (2014) on fairness in testing states that “those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations” (p. 63). Three types of bias are construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnic group, socioeconomic status, native language, or disability. DIF occurs when “equally able test takers differ in their probabilities answering a test item correctly as a function of group membership” (AERA et al., 2005, p. 51). DIF indicates systematic error as compared to real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are either removed or reviewed for sources of bias, so that they can be modified, retained, and tested further. The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018). Experts independently evaluate each item for potential sources leading to DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among experts as established through some criterion (e.g., median agreement rating, item quartile range, and percent agreement). The technique allows researchers to “identify, learn, and share the ideas of experts by searching for agreement among experts” (Yildirim & Büyüköztürk, 2018, p. 451). Research has illustrated this technique applied after DIF is detected, but not before administering items in the field. The current research is a methodological illustration of the Delphi technique applied in the item construction phase of assessment development as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S.A. grades 6-8 in a computer adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item writing process. Results from two three-person panels, each reviewing a set of 45 PSM items, are utilized to illustrate the technique. Advantages and limitations identified through a survey by participating experts and researchers are outlined to advance the method.
  5. In a physical system with conformal symmetry, observables depend on cross-ratios, measures of distance invariant under global conformal transformations (conformal geometry for short). We identify a quantum information-theoretic mechanism by which the conformal geometry emerges at the gapless edge of a 2+1D quantum many-body system with a bulk energy gap. We introduce a novel pair of information-theoretic quantities $(\mathfrak{c}_{\textrm{tot}}, \eta)$ that can be defined locally on the edge from the wavefunction of the many-body system, without prior knowledge of any distance measure. We posit that, for a topological groundstate, the quantity $\mathfrak{c}_{\textrm{tot}}$ is stationary under arbitrary variations of the quantum state, and study the logical consequences. We show that stationarity, modulo an entanglement-based assumption about the bulk, implies (i) $\mathfrak{c}_{\textrm{tot}}$ is a non-negative constant that can be interpreted as the total central charge of the edge theory; (ii) $\eta$ is a cross-ratio, obeying the full set of mathematical consistency rules, which further indicates the existence of a distance measure of the edge with global conformal invariance. Thus, the conformal geometry emerges from a simple assumption on groundstate entanglement. We show that stationarity of $\mathfrak{c}_{\textrm{tot}}$ is equivalent to a vector fixed-point equation involving $\eta$, making our assumption locally checkable. We also derive similar results for 1+1D systems under a suitable set of assumptions.