Title: Bias on the Force Concept Inventory across the intersection of gender and race
Education researchers often compare performance across race and gender on research-based assessments of physics knowledge to investigate the impacts of racism and sexism on physics student learning. These investigations' claims rely on research-based assessments providing reliable, unbiased measures of student knowledge across social identity groups. We used classical test theory and differential item functioning (DIF) analysis to examine whether the items on the Force Concept Inventory (FCI) provided unbiased data across social identifiers for race, gender, and their intersections. The data were accessed through the Learning About STEM Student Outcomes (LASSO) platform and included posttest responses from 4,848 students in 152 calculus-based introductory physics courses at 16 institutions. The results indicated that the majority of items (22) on the FCI were biased toward one group or another. These results point to the need for instrument validation to account for item bias and for the identification or development of fair research-based assessments.
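To make the method concrete, here is a minimal sketch of one common DIF screen, the Mantel-Haenszel procedure, applied to a single simulated dichotomous item with total score as the matching variable. Everything in it (the data, the two-group contrast, the five coarse score strata) is hypothetical; the paper's actual DIF procedure and the LASSO responses are not reproduced here.

```python
# Hypothetical Mantel-Haenszel DIF screen for one dichotomous item.
# Simulated data only; not the paper's analysis or the LASSO dataset.
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)        # 0 = reference group, 1 = focal group
total = rng.integers(0, 30, n)       # total FCI-like score (matching variable)
# Item responses depend on overall score plus a group shift (i.e., DIF)
p = 1 / (1 + np.exp(-(0.2 * (total - 15) + 0.5 * group)))
correct = rng.random(n) < p

# One 2x2 table (group x correct/incorrect) per coarse score stratum
tables = []
for idx in np.array_split(np.argsort(total), 5):
    g, c = group[idx], correct[idx]
    tables.append(np.array([[np.sum((g == 0) & c), np.sum((g == 0) & ~c)],
                            [np.sum((g == 1) & c), np.sum((g == 1) & ~c)]]))

st = StratifiedTable(tables)
print("MH common odds ratio:", st.oddsratio_pooled)   # 1.0 means no DIF
print("CMH test p-value:", st.test_null_odds().pvalue)
```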
Award ID(s): 1928596
PAR ID: 10355682
Author(s) / Creator(s):
Date Published:
Journal Name: 2021 Physics Education Research Conference Proceedings
Page Range / eLocation ID: 69 to 74
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Research-based assessments (RBAs) allow researchers and practitioners to compare student performance across different contexts and institutions. In recent years, research attention has focused on the student populations these RBAs were initially developed with, because much of that research was done with "samples of convenience" that were predominantly white men. Prior research using differential item functioning (DIF) analysis has found that the Force Concept Inventory (FCI) behaved differently for men and women. We extend this research in two ways. First, we test the FCI for DIF across the intersection of gender and race for Asian, Black, Hispanic, White, and White Hispanic men and women. Second, we apply the Eaton and Willoughby five-factor model of the FCI to interpret the results of the DIF analysis. We found large DIF on many FCI items, and the pattern of items with large DIF follows the five-factor model. The alignment of DIF with this factor structure, along with the measurement invariance of this factor structure across these ten social identities, indicates that the items on the FCI are likely not biased but are instead measuring real differences in physics knowledge among these groups. We frame these differences as educational debts that society owes to these marginalized groups and that physics instruction needs to actively repay.
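One way to extend a DIF screen beyond a two-group contrast, as this abstract's intersectional analysis requires, is logistic-regression DIF: regress the item response on total score, then test whether adding group membership improves the fit. The sketch below uses simulated data and placeholder group labels, not the paper's procedure or its ten identity groups.

```python
# Hypothetical logistic-regression DIF test across several groups at once.
# Simulated data with placeholder group labels A/B/C, not the study's data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(1)
n = 1200
df = pd.DataFrame({
    "total": rng.integers(0, 30, n),
    "group": rng.choice(["A", "B", "C"], n),
})
# Item difficulty depends on total score plus a group-specific shift
shift = df["group"].map({"A": 0.0, "B": 0.4, "C": -0.3})
p = 1 / (1 + np.exp(-(0.2 * (df["total"] - 15) + shift)))
df["item"] = (rng.random(n) < p).astype(int)

base = smf.logit("item ~ total", df).fit(disp=0)            # no-DIF model
dif = smf.logit("item ~ total + C(group)", df).fit(disp=0)  # uniform DIF
lr = 2 * (dif.llf - base.llf)                               # likelihood ratio
pval = stats.chi2.sf(lr, df=dif.df_model - base.df_model)
print(f"LR chi-square = {lr:.2f}, p = {pval:.4f}")
```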
  2. We investigated the intersectional nature of race/racism and gender/sexism in broad-scale inequities in physics student learning using a critical quantitative intersectionality approach. To provide transparency and create a nuanced picture of learning, we problematized the measurement of equity by using two competing operationalizations of equity: Equity of Individuality and Equality of Learning. These two models led to conflicting conclusions. The analyses used hierarchical linear models to examine students' conceptual learning as measured by gains in scores on research-based assessments administered as pretests and posttests. The data came from the Learning About STEM Student Outcomes (LASSO) national database and included data from 13,857 students in 187 first-semester college physics courses. Findings showed differences in student gains across gender and race. Large gender differences existed for White and Hispanic students but not for Asian, Black, and Pacific Islander students. The models predicted larger gains for students in collaborative learning than in lecture-based courses. The Equity of Individuality operationalization indicated that collaborative instruction improved equity because all groups learned more with collaborative learning. The Equality of Learning operationalization indicated that collaborative instruction did not improve equity because differences between groups were unaffected. We discuss the implications of these mixed findings and identify areas for future research using critical quantitative perspectives in education research.
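The hierarchical linear models described here account for students being nested within courses. A minimal two-level sketch with statsmodels appears below; the variable names (pre, collab, course) and effect sizes are invented stand-ins, not fields from the LASSO database.

```python
# Hypothetical two-level model: students nested in courses, with a
# course-level pedagogy indicator. Simulated data, invented effect sizes.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_courses, per_course = 40, 30
course = np.repeat(np.arange(n_courses), per_course)
collab = np.repeat(rng.integers(0, 2, n_courses), per_course)  # pedagogy flag
pre = rng.normal(40, 10, n_courses * per_course)               # pretest score
course_effect = np.repeat(rng.normal(0, 3, n_courses), per_course)
gain = 10 + 0.1 * pre + 5 * collab + course_effect + rng.normal(0, 8, len(pre))

df = pd.DataFrame({"gain": gain, "pre": pre, "collab": collab, "course": course})
# Random intercept for each course captures between-course variation
model = smf.mixedlm("gain ~ pre + collab", df, groups=df["course"]).fit()
print(model.summary())
```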
  3. In physics education research, instructors and researchers often use research-based assessments (RBAs) to assess students' skills and knowledge. In this paper, we support the development of a mechanics cognitive diagnostic to test and implement effective and equitable pedagogies for physics instruction. Adaptive assessments using cognitive diagnostic models provide significant advantages over the fixed-length RBAs commonly used in physics education research. As part of a broader project to develop a cognitive diagnostic assessment for introductory mechanics within an evidence-centered design framework, we identified and tested student models of four skills that cross content areas in introductory physics: apply vectors, conceptual relationships, algebra, and visualizations. We developed the student models in three steps. First, we based the models on learning objectives from instructors. Second, we coded the items on RBAs using the student models. Finally, we tested and refined this coding using a common cognitive diagnostic model, the deterministic inputs, noisy "and" gate (DINA) model. The data included 19,889 students who completed the Force Concept Inventory, the Force and Motion Conceptual Evaluation, or the Energy and Momentum Conceptual Survey on the LASSO platform. The results indicated a good to adequate fit for the student models, with high accuracies for classifying students on many of the skills. The items from these three RBAs do not cover all of the skills in enough detail; however, they will form a useful initial item bank for the development of the mechanics cognitive diagnostic.
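The DINA model named in this abstract has a simple item response function: a student answers correctly with probability 1 - slip when they have mastered every skill the item's Q-matrix row requires, and with probability guess otherwise. The sketch below illustrates that function with a made-up Q-matrix and parameter values, not the calibrated mechanics diagnostic.

```python
# Hypothetical DINA ("deterministic inputs, noisy 'and' gate") item
# response function. Q-matrix, slip, and guess values are illustrative.
import numpy as np

# Rows = items, columns = skills (e.g., vectors, concepts, algebra, visuals)
Q = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]])
slip = np.array([0.10, 0.15, 0.20])   # P(wrong | all required skills mastered)
guess = np.array([0.25, 0.20, 0.15])  # P(right | some required skill missing)

def p_correct(alpha):
    """P(correct) on each item for a 0/1 skill-mastery profile alpha."""
    eta = np.all(alpha >= Q, axis=1)  # mastered every skill the item needs?
    return np.where(eta, 1 - slip, guess)

# A student who has mastered skills 1, 2, and 4 but not skill 3
print(p_correct(np.array([1, 1, 0, 1])))   # -> [0.90, 0.20, 0.80]
```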
  4. This study investigates differences in student participation rates between in-class and online administrations of research-based assessments. A sample of 1,310 students from 25 sections of 3 different introductory physics courses over two semesters was instructed to complete the CLASS attitudinal survey and the concept inventory relevant to their course, either the FCI or the CSEM. Each student was randomly assigned to take one of the surveys in class and the other survey online at home using the Learning About STEM Student Outcomes (LASSO) platform. Results indicate large variations in participation rates across both test conditions (online and in class). A hierarchical generalized linear model (HGLM) of the student data using logistic regression indicates that student grades in the course and faculty assessment administration practices were both significant predictors of student participation. When the recommended online assessment administration practices were implemented, participation rates were similar across test conditions. Implications for student and course assessment methodologies are discussed.
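As a simplified stand-in for the HGLM described here, the sketch below fits a flat logistic regression of participation on course grade, administration practices, and test condition. The predictors and coefficients are invented, and a faithful replication would add the course-level random effects of the study's hierarchical model.

```python
# Hypothetical (non-hierarchical) logistic model of assessment
# participation. Predictor names and effect sizes are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1310
df = pd.DataFrame({
    "grade": rng.normal(2.8, 0.8, n),      # course grade in GPA points
    "practices": rng.integers(0, 2, n),    # recommended admin practices used?
    "online": rng.integers(0, 2, n),       # online vs. in-class condition
})
logit_p = -1.0 + 0.8 * df["grade"] + 1.2 * df["practices"] - 0.3 * df["online"]
df["participated"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

model = smf.logit("participated ~ grade + practices + online", df).fit(disp=0)
print(np.exp(model.params))   # odds ratios for each predictor
```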
  5. Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume that tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents evidence that, among nationally representative samples of 15-year-olds in the United States participating in the 2009, 2012, and 2015 PISA math and reading tests, there are consistent item-format-by-gender differences: on average, male students answer multiple-choice items correctly relatively more often, and female students answer constructed-response items correctly relatively more often. These patterns were consistent across 34 additional participating PISA jurisdictions, although the size of the format differences varied and was larger on average in reading than in math. The average magnitude of the format differences is not large enough to be flagged by routine differential item functioning analyses intended to detect test bias, but it is large enough to raise questions about the validity of inferences based on comparisons of scores across gender groups. Researchers and other test users should account for test item format, particularly when comparing scores across gender groups.
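The format-by-gender contrast described here amounts to comparing a standardized gender gap on multiple-choice items with the same gap on constructed-response items. The sketch below computes that contrast on simulated proportions correct; the data and gap sizes are invented, not PISA results.

```python
# Hypothetical format-by-gender contrast on simulated proportion-correct
# scores. Gap sizes are invented; this is not PISA data.
import numpy as np

rng = np.random.default_rng(4)
n = 500
male = rng.integers(0, 2, n)
# Simulate a small advantage for males on MC and for females on CR
mc = rng.normal(0.60 + 0.03 * male, 0.15, n)   # multiple-choice score
cr = rng.normal(0.55 - 0.03 * male, 0.15, n)   # constructed-response score

def cohens_d(x, g):
    """Standardized male-minus-female mean difference."""
    diff = x[g == 1].mean() - x[g == 0].mean()
    pooled_sd = np.sqrt((x[g == 1].var(ddof=1) + x[g == 0].var(ddof=1)) / 2)
    return diff / pooled_sd

d_mc, d_cr = cohens_d(mc, male), cohens_d(cr, male)
print(f"MC gender gap d = {d_mc:+.2f}")
print(f"CR gender gap d = {d_cr:+.2f}")
print(f"format-by-gender contrast = {d_mc - d_cr:+.2f}")
```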