When measuring academic skills among students whose primary language is not English, standardized assessments are often provided in languages other than English. The degree to which alternate-language test translations yield unbiased, equitable assessment must be evaluated; however, traditional methods of investigating measurement equivalence are susceptible to confounding group differences. The primary purposes of this study were to investigate differential item functioning (DIF) and item bias across Spanish and English forms of an assessment of early mathematics skills. Secondary purposes were to investigate the presence of selection bias and demonstrate a novel approach for investigating DIF that uses a regression discontinuity design framework to control for selection bias. Data were drawn from 1,750 Spanish-speaking kindergartners participating in the Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999, who were administered either the Spanish or English version of the mathematics assessment based on their performance on an English language screening measure. Evidence of selection bias (differences between groups in SES, age, approaches to learning, self-control, social interaction, country of birth, childcare, household composition and number in the home, books in the home, and parent involvement) highlighted limitations of a traditional approach for investigating DIF that only controlled for ability. When controlling for selection bias, only 11% of items displayed DIF, and subsequent examination of item content did not suggest item bias. Results provide evidence that the Spanish translation of the ECLS-K mathematics assessment is an equitable and unbiased assessment accommodation for young dual language learners.
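As a hedged illustration of why covariate adjustment matters in this setting, the sketch below runs a likelihood-ratio DIF screen via logistic regression on simulated data, first matching on ability alone and then adding selection-related covariates. This is a generic screen, not the study's regression discontinuity analysis, and every name in it (ability, group, ses, age, item) is a hypothetical placeholder.

```python
# Hedged sketch: covariate-adjusted logistic-regression DIF screen on
# simulated data. NOT the study's regression discontinuity procedure;
# all variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(size=n)   # stand-in for the matching variable (e.g., total score)
ses = rng.normal(size=n)       # selection-related covariate
age = rng.normal(size=n)       # selection-related covariate
# Form assignment mimics a language screener correlated with SES,
# creating selection bias between the two test-form groups.
group = (0.8 * ability + 0.6 * ses + rng.normal(size=n) < 0).astype(int)
df = pd.DataFrame({"ability": ability, "group": group, "ses": ses, "age": age})
# Item responses depend on ability and SES, but NOT on form/group, so a
# fair analysis should find no DIF once SES is controlled.
logit = -0.5 + 1.2 * ability + 0.6 * ses
df["item"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def uniform_dif_test(data, covariates=()):
    """Likelihood-ratio test: does 'group' improve fit beyond ability
    (and any covariates)? A significant result flags uniform DIF."""
    base = ["ability", *covariates]
    m0 = sm.Logit(data["item"], sm.add_constant(data[base])).fit(disp=0)
    m1 = sm.Logit(data["item"], sm.add_constant(data[base + ["group"]])).fit(disp=0)
    lr = 2 * (m1.llf - m0.llf)
    return lr, chi2.sf(lr, df=1)

# The ability-only screen typically flags spurious "DIF" here, because
# group membership proxies SES; adding covariates removes the artifact.
print("ability only:    LR=%.2f, p=%.3f" % uniform_dif_test(df))
print("plus covariates: LR=%.2f, p=%.3f" % uniform_dif_test(df, ("ses", "age")))
```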
This content will become publicly available on January 1, 2026
Measuring visual ability in linguistically diverse populations
Abstract Measurement of object recognition (OR) ability could predict learning and success in real-world settings, and there is hope that it may reduce bias often observed in cognitive tests. Although the measurement of visual OR is not expected to be influenced by the language of participants or the language of instructions, these assumptions remain largely untested. Here, we address the challenges of measuring OR abilities across linguistically diverse populations. In Study 1, we find that English–Spanish bilinguals, when randomly assigned to the English or Spanish version of the novel object memory test (NOMT), exhibit highly similar overall performance. Study 2 extends this by assessing psychometric equivalence using an approach grounded in item response theory (IRT). We examined whether groups fluent in English or Spanish differed in (a) latent OR ability, as assessed by a three-parameter logistic IRT model, and (b) the mapping of observed item responses onto the latent OR construct, as assessed by differential item functioning (DIF) analyses. Spanish speakers performed better than English speakers, a difference we suggest is due to motivational differences between groups of vastly different size on the Prolific platform. That we found no substantial DIF between the groups tested in English or Spanish on the NOMT indicates measurement invariance. The feasibility of increasing diversity by combining groups tested in different languages remains unexplored. Adopting this approach could enable visual scientists to enhance diversity, equity, and inclusion in their research, and potentially in the broader application of their work in society.
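For readers unfamiliar with the model class named above, the following is a minimal sketch of the three-parameter logistic (3PL) item response function; the parameter values are illustrative assumptions, not estimates from the NOMT data.

```python
# Minimal sketch of the 3PL item response function; parameters here are
# illustrative, not NOMT estimates.
import numpy as np

def p_correct_3pl(theta, a, b, c):
    """P(correct | theta) with discrimination a, difficulty b, and
    pseudo-guessing lower asymptote c."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)   # grid of latent OR ability values
print(p_correct_3pl(theta, a=1.5, b=0.0, c=0.2))
# DIF would mean a or b differs across language groups for the same
# item; measurement invariance means one curve fits both groups.
```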
- Award ID(s): 2316474
- PAR ID: 10589970
- Publisher / Repository: Springer
- Date Published:
- Journal Name: Behavior Research Methods
- Volume: 57
- Issue: 1
- ISSN: 1554-3528
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract When large-scale assessment programs are developed and administered in a particular language, students from other native-language backgrounds may experience considerable barriers to appropriate measurement of the targeted knowledge and skills. Empirical work is needed to determine whether one of the most commonly applied accommodations to address language barriers, namely extended test time limits, corresponds to score comparability for students who use it. Prior work has examined score comparability for English learners (ELs) eligible to use extended time on tests in the United States, but not specifically for those who show evidence of actually using the accommodation. NAEP process data were used to explore score comparability for two groups of ELs eligible for extended time: those who used extended time and those who did not. Analysis of differential item functioning (DIF) was applied to examine potential item bias for these groups when compared to a reference group of native English speakers. Items showing significant and large DIF were identified in both comparisons, with slightly more DIF items identified for the comparison involving ELs who used extended time. Item location and word counts were examined for the items displaying DIF, with results showing some alignment with the notion that language-related barriers may persist for ELs even when extended time is used. Overall, results point to a need for ongoing consideration of the unique needs of ELs during large-scale testing, and to the opportunities test process data offer for more comprehensive analyses of accommodation use and effectiveness.
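As a hedged aside, one standard DIF statistic for dichotomous items in comparisons like these is the Mantel-Haenszel common odds ratio reported on the ETS delta scale; the sketch below computes it on simulated data and is an assumed method for illustration, not a reconstruction of the NAEP analysis.

```python
# Hedged sketch of the Mantel-Haenszel DIF statistic on simulated data;
# the study's exact DIF method is not specified here.
import numpy as np

def mantel_haenszel_delta(correct, group, strata):
    """correct: 0/1 item responses; group: 0 = reference (native
    speakers), 1 = focal (ELs); strata: matched total-score levels.
    Returns the ETS delta value; |delta| >= 1.5 is a common
    rule of thumb for large ('C'-level) DIF."""
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum((group[m] == 0) & (correct[m] == 1))  # reference right
        b = np.sum((group[m] == 0) & (correct[m] == 0))  # reference wrong
        c = np.sum((group[m] == 1) & (correct[m] == 1))  # focal right
        d = np.sum((group[m] == 1) & (correct[m] == 0))  # focal wrong
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    alpha = num / den               # MH common odds ratio
    return -2.35 * np.log(alpha)    # ETS delta metric

rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, n)
strata = rng.integers(0, 5, n)          # matched score bands
correct = rng.binomial(1, 0.3 + 0.1 * strata)  # no simulated DIF
print(round(mantel_haenszel_delta(correct, group, strata), 2))  # near 0
```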
The present study examined the role of script in bilingual speech planning by comparing the performance of same- and different-script bilinguals. Spanish-English bilinguals (Experiment 1) and Japanese-English bilinguals (Experiment 2) performed a picture-word interference task in which they were asked to name a picture of an object in English, their second language, while ignoring a visual distractor word in Spanish or Japanese, their first language. Results replicated the general pattern seen in previous bilingual picture-word interference studies for the same-script Spanish-English bilinguals but not for the different-script Japanese-English bilinguals. Both groups showed translation facilitation, whereas only Spanish-English bilinguals demonstrated semantic interference, phonological facilitation, and phono-translation facilitation. These results suggest that when the script of the language not in use is present in the task, bilinguals appear to exploit the perceptual difference as a language cue to direct lexical access to the intended language earlier in the process of speech planning.
Abstract Establishing the invariance property of an instrument (e.g., a questionnaire or test) is a key step in establishing its measurement validity. Measurement invariance is typically assessed by differential item functioning (DIF) analysis, i.e., detecting DIF items whose response distribution depends not only on the latent trait measured by the instrument but also on group membership. DIF analysis is confounded by group differences in the latent trait distributions. Many DIF analyses require knowing several anchor items that are DIF-free in order to draw inferences on whether each of the rest is a DIF item, where the anchor items are used to identify the latent trait distributions. When no prior information on anchor items is available, or some anchor items are misspecified, item purification methods and regularized estimation methods can be used. The former iteratively purifies the anchor set by a stepwise model selection procedure, and the latter selects the DIF-free items by a LASSO-type regularization approach. Unfortunately, unlike methods based on a correctly specified anchor set, these methods are not guaranteed to provide valid statistical inference (e.g., confidence intervals and p-values). In this paper, we propose a new method for DIF analysis under a multiple indicators and multiple causes (MIMIC) model for DIF. This method adopts a minimal L1-norm condition for identifying the latent trait distributions. Without requiring prior knowledge about an anchor set, it can accurately estimate the DIF effects of individual items and further draw valid statistical inferences for quantifying the uncertainty. Specifically, the inference results allow us to control the type-I error for DIF detection, which may not be possible with item purification and regularized estimation methods. We conduct simulation studies to evaluate the performance of the proposed method and compare it with the anchor-set-based likelihood ratio test approach and the LASSO approach. The proposed method is applied to analysing the three personality scales of the Eysenck Personality Questionnaire-Revised (EPQ-R).
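To make the LASSO-selection mechanism concrete, the toy sketch below applies soft-thresholding, which is the exact L1-penalized solution under an artificially assumed orthogonal design, to noisy per-item DIF estimates; the paper's MIMIC-model estimator with its minimal L1-norm identification condition is substantially more involved than this.

```python
# Toy sketch of LASSO-type DIF selection: under an (assumed) orthogonal
# design, the L1-penalized estimate of each item's DIF effect is the
# soft-thresholded unpenalized estimate, so most items shrink exactly
# to zero and are declared DIF-free. Illustrates the mechanism only.
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * |x|, the core of LASSO shrinkage."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(2)
true_dif = np.zeros(20)
true_dif[[3, 7]] = 0.8                 # only items 3 and 7 truly have DIF
naive_est = true_dif + rng.normal(scale=0.1, size=20)  # noisy estimates

lasso_est = soft_threshold(naive_est, lam=0.4)
print("items flagged as DIF:", np.flatnonzero(lasso_est != 0))
```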
It is well established that access to social supports is essential for engineering students' persistence, and yet access to supports varies across groups. Understanding the differential supports inherent in students' social networks, and then working to provide additional needed supports, can help the field of engineering education become more inclusive of all students. Our work contributes to this effort by examining the reliability and fairness of a social capital instrument, the Undergraduate Supports Survey (USS). We examined the extent to which two scales were reliable across ability levels (levels of social capital), gender groups, and year-in-school. We fit two item response theory (IRT) models using a graded response model and performed differential item functioning (DIF) tests to detect item differences by gender and year-in-school. Our results indicate that most items have acceptable to good item discrimination and difficulty. DIF analysis shows that multiple items in the Expressive Support scale exhibit DIF across gender groups, in favor of women and nonbinary engineering students. DIF analysis shows that year-in-school has little to no effect on items, with only one DIF item. Therefore, engineering educators can use the USS confidently to examine expressive and instrumental social capital in undergraduates across year-in-school. Our work can be used by the engineering education research community to identify and address differences in students' access to support. We recommend that the engineering education community work to be explicit in its expressive and instrumental support. Future work will explore measurement invariance in Expressive Support items across gender.
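For context, the following is a minimal sketch of the graded response model (GRM) named above: the probability of each ordered response category is the difference of adjacent cumulative two-parameter logistic curves. Parameter values are illustrative assumptions, not USS estimates.

```python
# Minimal GRM sketch: category probabilities for one polytomous item as
# differences of adjacent cumulative 2PL curves. Illustrative values only.
import numpy as np

def grm_probs(theta, a, thresholds):
    """a: discrimination; thresholds: increasing category boundaries.
    Returns one row per theta, one column per response category."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    # Cumulative P(X >= k) for k = 1..K-1, padded with P(X >= 0) = 1
    # and P(X >= K) = 0 so adjacent differences give category probs.
    cum = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(thresholds))))
    cum = np.hstack([np.ones((len(theta), 1)), cum, np.zeros((len(theta), 1))])
    return cum[:, :-1] - cum[:, 1:]   # each row sums to 1

probs = grm_probs([-1.0, 0.0, 1.0], a=1.8, thresholds=[-0.5, 0.4, 1.2])
print(np.round(probs, 3))
```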