Measurement of object recognition (OR) ability could predict learning and success in real-world settings, and there is hope that it may reduce bias often observed in cognitive tests. Although the measurement of visual OR is not expected to be influenced by the language of participants or the language of instructions, these assumptions remain largely untested. Here, we address the challenges of measuring OR abilities across linguistically diverse populations. In Study 1, we find that English–Spanish bilinguals, when randomly assigned to the English or Spanish version of the novel object memory test (NOMT), exhibit highly similar overall performance. Study 2 extends this by assessing psychometric equivalence using an approach grounded in item response theory (IRT). We examined whether groups fluent in English or Spanish differed in (a) latent OR ability as assessed by a three-parameter logistic IRT model, and (b) the mapping of observed item responses onto the latent OR construct, as assessed by differential item functioning (DIF) analyses. Spanish speakers performed better than English speakers, a difference we suggest is due to motivational differences between groups of vastly different size on the Prolific platform. That we found no substantial DIF between the groups tested in English or Spanish on the NOMT indicates measurement invariance. The feasibility of increasing diversity by combining groups tested in different languages remains unexplored. Adopting this approach could enable visual scientists to enhance diversity, equity, and inclusion in their research, and potentially in the broader application of their work in society.
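For readers unfamiliar with the three-parameter logistic (3PL) model mentioned above, the following is a minimal sketch of its item response function in its standard textbook form. The function name and the parameter values are illustrative assumptions; they are not taken from the study.

```python
import math

def three_pl(theta, a, b, c):
    """Probability of a correct response under a 3PL IRT model.

    theta: latent ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing). All values used below are hypothetical.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# When ability equals difficulty, the probability sits halfway
# between the guessing floor c and 1.
p_mid = three_pl(theta=0.0, a=1.2, b=0.0, c=0.25)
print(p_mid)
```

In this framing, an item shows DIF when equally able test takers from two groups are best described by different values of a, b, or c for that item.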
Are translated mathematics items a valid accommodation for dual language learners? Evidence from ECLS-K
When measuring academic skills among students whose primary language is not English, standardized assessments are often provided in languages other than English. The degree to which alternate-language test translations yield unbiased, equitable assessment must be evaluated; however, traditional methods of investigating measurement equivalence are susceptible to confounding group differences. The primary purposes of this study were to investigate differential item functioning (DIF) and item bias across Spanish and English forms of an assessment of early mathematics skills. Secondary purposes were to investigate the presence of selection bias and demonstrate a novel approach for investigating DIF that uses a regression discontinuity design framework to control for selection bias. Data were drawn from 1,750 Spanish-speaking Kindergarteners participating in the Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999, who were administered either the Spanish or English version of the mathematics assessment based on their performance on an English language screening measure. Evidence of selection bias—differences between groups in SES, age, approaches to learning, self-control, social interaction, country of birth, childcare, household composition and number in the home, books in the home, and parent involvement—highlighted limitations of a traditional approach for investigating DIF that only controlled for ability. When controlling for selection bias, only 11% of items displayed DIF, and subsequent examination of item content did not suggest item bias. Results provide evidence that the Spanish translation of the ECLS-K mathematics assessment is an equitable and unbiased assessment accommodation for young dual language learners.
- Award ID(s): 1749275
- PAR ID: 10340578
- Date Published:
- Journal Name: Early childhood research quarterly
- Volume: 57
- Issue: 2021
- ISSN: 0885-2006
- Page Range / eLocation ID: 89-101
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study was performed to compare the new framework with traditional logistic regression, with respect to Type I error and power rates of the uniform DIF test statistics and bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias but suffered from low power and lack of precision. Implications for practice are discussed.
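As a rough illustration of the traditional logistic regression approach to uniform DIF named in this abstract (not the regression discontinuity framework the article proposes), the sketch below fits two nested logistic models to one item and compares them with a likelihood-ratio test. The simulated data, the effect sizes, and all function names are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for r in range(k):
                grad[r] += (yi - p) * xi[r]
                for s in range(k):
                    hess[r][s] += w * xi[r] * xi[s]
        beta = [b + d for b, d in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        loglik += math.log(p) if yi else math.log(1.0 - p)
    return beta, loglik

def uniform_dif_test(score, group, resp):
    """Likelihood-ratio test of a group effect after conditioning on the matching score."""
    reduced = [[1.0, s] for s in score]                       # intercept + score
    full = [[1.0, s, float(g)] for s, g in zip(score, group)]  # + group indicator
    _, ll0 = fit_logistic(reduced, resp)
    beta, ll1 = fit_logistic(full, resp)
    g2 = max(0.0, 2.0 * (ll1 - ll0))
    p_value = math.erfc(math.sqrt(g2 / 2.0))  # chi-square survival function, 1 df
    return beta[2], g2, p_value

# Hypothetical simulated item with uniform DIF of 0.8 logits favoring group 1.
rng = random.Random(0)
n = 2000
score = [rng.gauss(0.0, 1.0) for _ in range(n)]
group = [rng.randint(0, 1) for _ in range(n)]
resp = [1 if rng.random() < sigmoid(-0.2 + 1.0 * s + 0.8 * g) else 0
        for s, g in zip(score, group)]
effect, g2, p_value = uniform_dif_test(score, group, resp)
print(effect, g2, p_value)
```

This test conditions only on the matching score, so it inherits the limitation the abstract describes: if the groups differ on other characteristics that affect responses, the group coefficient can reflect selection rather than item bias.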
-
The Standards for educational and psychological assessment were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify assessment developers establish five types of validity evidence: test content, response processes, internal structure, relationship to other variables, and consequential/bias. Relevant to this proposal is consequential validity evidence that identifies the potential negative impact of testing or bias. Standard 3.1 of The Standards (2014) on fairness in testing states that “those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations” (p. 63). Three types of bias include construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnic group, socioeconomic status, native language, or disability. DIF is when “equally able test takers differ in their probabilities answering a test item correctly as a function of group membership” (AERA et al., 2005, p. 51). DIF indicates systematic error as compared to real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are removed or reviewed for sources leading to bias to determine modifications to retain and further test an item. The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018). 
Experts independently evaluate each item for potential sources leading to DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among experts as established through some criterion (e.g., median agreement rating, item quartile range, and percent agreement). The technique allows researchers to “identify, learn, and share the ideas of experts by searching for agreement among experts” (Yildirim & Büyüköztürk, 2018, p. 451). Research has illustrated this technique applied after DIF is detected, but not before administering items in the field. The current research is a methodological illustration of the Delphi technique applied in the item construction phase of assessment development as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S.A. grades 6-8 in a computer adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item writing process. Results from two three-person panels each reviewing a set of 45 PSM items are utilized to illustrate the technique. Advantages and limitations identified through a survey by participating experts and researchers are outlined to advance the method.
-
The early development of spatial reasoning skills has been linked to future success in mathematics (Wai, Lubinski, & Benbow, 2009), but research to date has mainly focused on the development of these skills within classroom settings rather than at home. The home environment is often the first place students are exposed to, and develop, early mathematics skills, including spatial reasoning (Blevins-Knabe, 2016; Hart, Ganley, & Purpura, 2016). The purpose of the current study is to develop a survey instrument to better understand Kindergarten through Grade 2 students’ opportunities to learn spatial reasoning skills at home. Using an argument-based approach to validation (Kane, 2013), we collected multiple sources of validity evidence, including expert review of item wording and content and pilot data from 201 parent respondents. This manuscript outlines the interpretation/use argument that guides our validation study and presents evidence collected to evaluate the scoring inferences for using the survey to measure students’ opportunities to learn spatial reasoning skills at home.
-
Cross-linguistic interactions are the hallmark of bilingual development. Theoretical perspectives highlight the key role of cross-linguistic distances and language structure in literacy development. Despite the strong theoretical assumptions, the impact of such bilingualism factors in heritage-language speakers remains elusive given high variability in children's heritage-language experiences. A longitudinal inquiry of heritage-language learners of structurally distinct languages – Spanish–English and Chinese–English bilinguals (N = 181, mean age = 7.57, measured 1.5 years apart) aimed to fill this gap. Spanish–English bilinguals showed stronger associations between morphological awareness skills across their two languages, across time, likely reflecting cross-linguistic similarities in vocabulary and lexical morphology between Spanish and English. Chinese–English bilinguals, however, showed stronger associations between morphological and word reading skills in English, likely reflecting the critical role of morphology in spoken and written Chinese word structure. The findings inform theories of literacy by uncovering the mechanisms by which bilingualism factors influence child literacy development.