

Title: Are translated mathematics items a valid accommodation for dual language learners? Evidence from ECLS-K
When measuring academic skills among students whose primary language is not English, standardized assessments are often provided in languages other than English. The degree to which alternate-language test translations yield unbiased, equitable assessment must be evaluated; however, traditional methods of investigating measurement equivalence are susceptible to confounding group differences. The primary purposes of this study were to investigate differential item functioning (DIF) and item bias across Spanish and English forms of an assessment of early mathematics skills. Secondary purposes were to investigate the presence of selection bias and demonstrate a novel approach for investigating DIF that uses a regression discontinuity design framework to control for selection bias. Data were drawn from 1,750 Spanish-speaking kindergarteners participating in the Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999, who were administered either the Spanish or English version of the mathematics assessment based on their performance on an English language screening measure. Evidence of selection bias (differences between groups in SES, age, approaches to learning, self-control, social interaction, country of birth, childcare, household composition and number in the home, books in the home, and parent involvement) highlighted limitations of a traditional approach for investigating DIF that controlled only for ability. When controlling for selection bias, only 11% of items displayed DIF, and subsequent examination of item content did not suggest item bias. Results provide evidence that the Spanish translation of the ECLS-K mathematics assessment is an equitable and unbiased assessment accommodation for young dual language learners.
Award ID(s):
1749275
NSF-PAR ID:
10340578
Journal Name:
Early childhood research quarterly
Volume:
57
Issue:
2021
ISSN:
0885-2006
Page Range / eLocation ID:
89-101
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Standards for Educational and Psychological Testing were developed by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (AERA et al., 2014). The Standards specify that assessment developers establish five types of validity evidence: test content, response processes, internal structure, relationship to other variables, and consequential/bias. Relevant to this proposal is consequential validity evidence that identifies the potential negative impact of testing or bias. Standard 3.1 of The Standards (2014) on fairness in testing states that “those responsible for test development, revision, and administration should design all steps of the testing process to promote valid score interpretations for intended score uses for the widest possible range of individuals and relevant sub-groups in the intended populations” (p. 63). Three types of bias include construct, method, and item bias (Boer et al., 2018). Testing for differential item functioning (DIF) is a standard analysis adopted to detect item bias against a subgroup (Boer et al., 2018). Example subgroups include gender, race/ethnic group, socioeconomic status, native language, or disability. DIF occurs when “equally able test takers differ in their probabilities answering a test item correctly as a function of group membership” (AERA et al., 2005, p. 51). DIF indicates systematic error as compared to real mean group differences (Camilli & Shepard, 1994). Items exhibiting significant DIF are either removed or reviewed for sources of bias, so that they can be modified, retained, and tested further. The Delphi technique is an emergent systematic research method whereby expert panel members review item content through an iterative process (Yildirim & Büyüköztürk, 2018).
Experts independently evaluate each item for potential sources leading to DIF, researchers group their responses, and experts then independently complete a survey to rate their level of agreement with the anonymously grouped responses. This process continues until saturation and consensus are reached among experts as established through some criterion (e.g., median agreement rating, item quartile range, and percent agreement). The technique allows researchers to “identify, learn, and share the ideas of experts by searching for agreement among experts” (Yildirim & Büyüköztürk, 2018, p. 451). Prior research has illustrated this technique applied after DIF is detected, but not before items are administered in the field. The current research is a methodological illustration of the Delphi technique applied in the item construction phase of assessment development as part of a five-year study to develop and test new problem-solving measures (PSM; Bostic et al., 2015, 2017) for U.S. grades 6-8 in a computer adaptive testing environment. As part of an iterative design-science-based methodology (Middleton et al., 2008), we illustrate the integration of the Delphi technique into the item writing process. Results from two three-person panels each reviewing a set of 45 PSM items are utilized to illustrate the technique. Advantages and limitations identified through a survey by participating experts and researchers are outlined to advance the method.
  2. Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study was performed to compare the new framework with traditional logistic regression, with respect to Type I error and power rates of the uniform DIF test statistics and bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias but suffered from low power and lack of precision. Implications for practice are discussed. 
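     For orientation, the traditional logistic regression approach to uniform DIF referred to in this abstract is usually written along the following lines. This is the common textbook form, not necessarily the exact specification used in the article; the symbols $Y_{ij}$, $\theta_i$, and $G_i$ are notation assumed here for illustration.

```latex
\operatorname{logit} P(Y_{ij} = 1 \mid \theta_i, G_i)
  = \beta_{0j} + \beta_{1j}\,\theta_i + \beta_{2j}\,G_i
```

     Here $Y_{ij}$ is the scored response of test taker $i$ to item $j$, $\theta_i$ is the matching ability measure (often a total score), and $G_i$ indicates the test form or group. Uniform DIF on item $j$ corresponds to $\beta_{2j} \neq 0$; a nonuniform DIF test would add a $\theta_i G_i$ interaction term.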
  3. Context

    The Medicare Shared Savings Program (MSSP) establishes incentives for participating accountable care organizations (ACOs) to lower spending for their attributed fee‐for‐service Medicare patients. Turnover in ACO physicians and patient panels has raised concerns that ACOs may be earning shared‐savings bonuses by selecting lower‐risk patients or providers with lower‐risk panels.

    Methods

    We conducted three sets of analyses of Medicare claims data. First, we estimated overall MSSP savings through 2015 using a difference‐in‐differences approach and methods that eliminated selection bias from ACO program exit or changes in the practices or physicians included in ACO contracts. We then checked for residual risk selection at the patient level. Second, we reestimated savings with methods that address undetected risk selection but could introduce bias from other sources. These included patient fixed effects, baseline or prospective assignment, and area‐level MSSP exposure to hold patient populations constant. Third, we tested for changes in provider composition or provider billing that may have contributed to bonuses, even if they were eliminated as sources of bias in the evaluation analyses.

    Findings

    MSSP participation was associated with modest and increasing annual gross savings in the 2012‐2013 entry cohorts of ACOs, reaching $139 to $302 per patient by 2015. Savings in the 2014 entry cohort were small and not statistically significant. Robustness checks revealed no evidence of residual risk selection. Alternative methods to address risk selection produced results that were substantively consistent with our primary analysis but varied somewhat and were more sensitive to adjustment for patient characteristics, suggesting the introduction of bias from within‐patient changes in time‐varying characteristics. We found no evidence of ACO manipulation of provider composition or billing to inflate savings. Finally, larger savings for physician group ACOs were robust to consideration of differential changes in organizational structure among non‐ACO providers (e.g., from consolidation).

    Conclusions

    Participation in the original MSSP program was associated with modest savings and not with favorable risk selection. These findings suggest an opportunity to build on early progress. Understanding the effect of new opportunities and incentives for risk selection in the revamped MSSP will be important for guiding future program reforms.

  4. Background/Context:

    Computer programming is rarely accessible to K–12 students, especially for those from culturally and linguistically diverse backgrounds. Middle school age is a transitioning time when adolescents are more likely to make long-term decisions regarding their academic choices and interests. Having access to productive and positive knowledge and experiences in computer programming can grant them opportunities to realize their abilities and potential in this field.

    Purpose/Focus of Study:

    This study focuses on the exploration of the kind of relationship that bilingual Latinx students developed with themselves and computer programming and mathematics (CPM) practices through their participation in a CPM after-school program, first as students and then as cofacilitators teaching CPM practices to other middle school peers.

    Setting:

    An after-school program, Advancing Out-of-School Learning in Mathematics and Engineering (AOLME), was held at two middle schools located in rural and urban areas in the Southwest. It was designed to support an inclusive cultural environment that nurtured students’ opportunities to learn CPM practices through the inclusion of languages (Spanish and English), tasks, and participants congruent to students in the program. Students learned how to represent, design, and program digital images and videos using a sequence of 2D arrays of hexadecimal numbers with Python on a Raspberry Pi computer. The six bilingual cofacilitators attended Levels 1 and 2 as students and were offered the opportunity to participate as cofacilitators in the next implementation of Level 1.

    Research Design:

    This longitudinal case study focused on analyzing the experiences and shifts (if any) of students who participated as cofacilitators in AOLME. Their narratives were analyzed collectively, and our analysis describes the experiences of the cofacilitators as a single case study (with embedded units) of what it means to be a bilingual cofacilitator in AOLME. Data included individual exit interviews of the six cofacilitators and their focus groups (30–45 minutes each), an adapted 20-item CPM attitude 5-point Likert scale, and self-report from each of them. Results from attitude scales revealed cofacilitators’ greater initial and posterior connections to CPM practices. The self-reports on CPM included two number lines (0–10) for before and after AOLME for students to self-assess their liking and knowledge of CPM. The numbers were used as interview prompts to converse with students about experiences. The interview data were analyzed qualitatively and coded through a contrast-comparative process regarding students’ description of themselves, their experiences in the program, and their perception of and relationship toward CPM practices.

    Findings:

    Findings indicated that students had continued/increased motivation and confidence in CPM as they engaged in a journey as cofacilitators, described through two thematic categories: (a) shifting views by personally connecting to CPM, and (b) affirming CPM practices through teaching. The shift in connecting to CPM practices evolved as students argued that they found a new way of learning mathematics, in that they used mathematics as a tool to create videos and images that they programmed by using Python while making sense of the process bilingually (Spanish and English). This mathematics was viewed by students as high level, which in turn helped students gain self-confidence in CPM practices. Additionally, students affirmed their knowledge and confidence in CPM practices by teaching them to others, a process in which they had to mediate beyond the understanding of CPM practices. They came up with new ways of explaining CPM practices bilingually to their peers. In this new role, cofacilitators considered the topic and language, and promoted communal support among the peers they worked with.

    Conclusions/Recommendations:

    Bilingual middle school students can not only program, but also teach bilingually and embrace new roles with nurturing support. Schools can promote new student roles, which can yield new goals and identities. There is a great need to redesign the school mathematics curriculum as a discipline that teenagers can use and connect with by creating and finding things they care about. In this way, school mathematics can support a closer “fit” with students’ identification with the world of mathematics. Cofacilitators learned more about CPM practices by teaching them, extending beyond what was given to them, and constructing new goals that were in line with a sophisticated knowledge and shifts in the practice. Assigned responsibility in a new role can strengthen students’ self-image, agency, and ways of relating to mathematics.

  5. Bilinguals occasionally produce language intrusion errors (inadvertent translations of the intended word), especially when attempting to produce function word targets, and often when reading aloud mixed-language paragraphs. We investigate whether these errors are due to a failure of attention during speech planning, or failure of monitoring speech output by classifying errors based on whether and when they were corrected, and investigating eye movement behaviour surrounding them. Prior research on this topic has primarily tested alphabetic languages (e.g., Spanish–English bilinguals) in which part of speech is confounded with word length, which is related to word skipping (i.e., decreased attention). Therefore, we tested 29 Chinese–English bilinguals whose languages differ in orthography, visually cueing language membership, and for whom part of speech (in Chinese) is less confounded with word length. Despite the strong orthographic cue, Chinese–English bilinguals produced intrusion errors with similar effects as previously reported (e.g., especially with function word targets written in the dominant language). Gaze durations did differ by whether errors were made and corrected or not, but these patterns were similar for function and content words and therefore cannot explain part of speech effects. However, bilinguals regressed to words produced as errors more often than to correctly produced words, but regressions facilitated correction of errors only for content, not for function words. These data suggest that the vulnerability of function words to language intrusion errors primarily reflects automatic retrieval and failures of speech monitoring mechanisms from stopping function versus content word errors after they are planned for production.
