Title: Best vs. All: Equity and Accuracy of Standardized Test Score Reporting
We study a game-theoretic model of standardized testing for college admissions. Students are of two types, High and Low, and a college would like to admit the High-type students. Students take a potentially costly standardized exam that provides a noisy signal of their type. The students come from two populations that are identical in talent (i.e., the type distribution is the same) but differ in their access to resources: the higher-resourced population can, at their option, take the exam multiple times, whereas the lower-resourced population can take the exam only once. We study two models of score reporting, which capture existing policies used by colleges. The first policy (sometimes known as "super-scoring") allows students to report the maximum of the scores they achieve; the other requires that all scores be reported. We find in our model that requiring all scores to be reported results in superior outcomes in equilibrium, both from the perspective of the college (the admissions rule is more accurate) and from the perspective of equity across populations: a student's probability of admission is independent of their population, conditional on their type. In particular, the false positive and false negative rates are identical across the highly and poorly resourced student populations. This holds even though the more highly resourced students can, at their option, either report a more accurate signal of their type or pool with the lower-resourced population under this policy.
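The contrast between the two reporting policies can be illustrated with a small simulation. The sketch below is not from the paper: it assumes a toy parameterization (a 50/50 type mix, Gaussian exam noise, and three attempts for the higher-resourced group) purely to show why reporting only the best of several noisy scores inflates the signal coming from students who can retake.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameterization (not from the paper): an exam score is the
# student's type mean plus Gaussian noise; both populations have the same
# type mix, but only the higher-resourced group can retake the exam.
MU = {"High": 1.0, "Low": 0.0}
SIGMA = 1.0

def reported(student_type, attempts, policy):
    """Scores a student ends up reporting under a given policy."""
    scores = MU[student_type] + SIGMA * rng.standard_normal(attempts)
    if policy == "best":              # "super-scoring": only the best attempt is seen
        return np.array([scores.max()])
    return scores                     # "all": every attempt is seen

n = 100_000
best_of_three = np.array([reported("Low", 3, "best")[0] for _ in range(n)])
single_attempt = np.array([reported("Low", 1, "best")[0] for _ in range(n)])

# The best of several noisy draws is inflated relative to a single draw, so a
# fixed admission cutoff treats the two populations differently even though
# their talent distributions are identical.
print(f"mean reported Low-type score, best of 3: {best_of_three.mean():.2f}")
print(f"mean reported Low-type score, 1 attempt: {single_attempt.mean():.2f}")
```

Under the all-scores policy, every attempt is visible to the college, and the abstract's equilibrium result is that admission probability then depends only on a student's type, not on which population the student comes from.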
Award ID(s):
1763307
PAR ID:
10333171
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study a two-stage model in which students are (1) admitted to college on the basis of an entrance exam that is a noisy signal of their qualifications (type), and then (2) those students who were admitted can be hired by an employer as a function of their college grades, which are an independently drawn noisy signal of their type. Students are drawn from one of two populations, which might have different type distributions. We assume that the employer at the end of the pipeline is rational, in the sense that it computes a posterior distribution on student type conditional on all information available to it (college admission, grades, and group membership) and makes a decision based on posterior expectation. We then study what kinds of fairness goals can be achieved by the college by setting its admissions rule and grading policy. For example, the college might have the goal of guaranteeing equal opportunity across populations: the probability of passing through the pipeline and being hired by the employer should be independent of group membership, conditioned on type. Alternatively, the college might have the goal of incentivizing the employer to use a group-blind hiring rule. We show that both goals can be achieved when the college does not report grades. On the other hand, we show that under reasonable conditions, these goals are impossible to achieve even in isolation when the college uses an (even minimally) informative grading policy. (A minimal sketch of the employer's Bayesian inference appears after this list.)
  2. Efforts to improve college-completion rates have dominated higher education policy agendas. Performance-based funding (PBF) aims to improve college completion by linking state funding for public colleges and universities to performance measures. One critique of PBF policies is that institutions might restrict student access. This study uses a difference-in-differences design and institution-level data from 2001 to 2014 to examine whether four-year public institutions become more selective or enroll fewer underrepresented students under PBF. Our findings, supported by various robustness checks, suggest that institutions subject to PBF enroll students with higher standardized test scores and enroll fewer first-generation students. PBF models tied to institutions' base funding are more strongly associated with increased standardized test scores and enrollment of Pell students. (A generic difference-in-differences estimation sketch appears after this list.)
  3. In this research-based paper, we explore the relationships among Rice University STEM students' high school preparation, psychological characteristics, and career aspirations. Although greater high school preparation in STEM coursework predicts higher STEM retention and performance in college [1], objective academic preparation and college performance do not fully explain STEM retention decisions, and the students who leave STEM are often not the lowest-performing students [2]. Certain psychosocial experiences may also influence students' STEM decisions. We explored the predictive validity of (1) a STEM diagnostic exam, as an objective measure of high school science and math preparation, and (2) self-efficacy, as a psychological measure, for the long-term (three years later) STEM career aspirations and STEM identity of underprepared Rice STEM students. University administrators use diagnostic exam scores (along with other evidence of high school underpreparation) to identify students who might benefit from additional support. Using linear regression to explore the link between diagnostic exam scores and self-efficacy, we found that exam scores predicted self-efficacy a semester after students' first semester in college; exam scores were also marginally correlated with self-efficacy three years later. Early STEM career aspirations predicted later career aspirations, accounting for 21.3% of the variance in career outcome expectations three years later (β=.462, p=.006). Scores on the math diagnostic exam accounted for an additional 10.1% of the variance in students' three-year STEM career aspirations (p=.041). Self-efficacy after students' first semester did not predict future STEM aspirations. Early STEM identity explained 28.8% of the variance in three-year STEM identity (p=.001). Math diagnostic exam scores accounted for only marginal incremental variance after STEM identity, and self-efficacy after students' first semester did not predict three-year STEM aspirations. Overall, we found that the diagnostic exam provided incremental predictive validity for STEM career aspirations after students' sixth semester of college, indicating that early STEM preparation has long-lasting ramifications for students' STEM career intentions. Our next steps include examining whether students' diagnostic exam scores predict STEM graduation rates and final GPAs for science and math versus engineering majors. (A toy sketch of the incremental-variance comparison appears after this list.)
  4. We explore how course policies affect students' studying and learning when a second-chance exam is offered. High-stakes, one-off exams remain a de facto standard for assessing student knowledge in STEM, despite compelling evidence that other assessment paradigms, such as mastery learning, can improve student learning. Unfortunately, mastery learning can be costly to implement. We explore the use of optional second-chance testing to sustainably reap the benefits of mastery-based learning at scale. Prior work has shown that course policies affect students' studying and learning but has not compared these effects within the same course context. We conducted a quasi-experimental study in a single course to compare the effect of two grading policies for second-chance exams and the effect of increasing the range of dates available for taking asynchronous exams. The first grading policy, called 90-cap, allowed students to optionally take a second-chance exam that would fully replace their score on a first-chance exam, except that the second-chance exam would be capped at 90% credit. The second grading policy, called 90-10, combined students' first- and second-chance exam scores as a weighted average (90% max score + 10% min score). The 90-10 policy significantly increased the likelihood that marginally competent students would take the second-chance exam. Further, our data suggest that students learned more under the 90-10 policy, providing improved student learning outcomes at no cost to the instructor. Most students took exams on the last day an exam was available, regardless of how many days the exam was available. (The two grading formulas are sketched in code after this list.)
  5. Students often find biology courses to be very difficult and isolating, particularly if they identify as part of a group that has been historically excluded from STEM. Some of this anxiety and isolation comes from high-stakes exams. We decided to use the collaborative structure of two-stage exams to try to overcome the isolation of assessment. In two-stage exams, students take an individual exam and then immediately get into groups and take the exam again, discussing the questions and the rationale behind the answers. Their exam scores are a combination of the two attempts. Our move to emergency online learning because of the COVID-19 pandemic forced us to try our two-stage exams online. In this Teaching Tools and Strategies essay, we discuss our process of offering two-stage exams online at two different institutions: a two-year community college and a four-year research university. We share feedback from the students and discuss our iterative improvements to two-stage exam use.
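For the two-stage pipeline in item 1, the employer's rationality amounts to a Bayes update on admission, grade, and group membership. The sketch below is a minimal illustration under assumed Gaussian signals and a threshold admission rule; the parameter values and function names are not from the paper.

```python
from scipy.stats import norm

def employer_posterior(prior_high, admit_cut, grade, mu, sigma, grades_reported=True):
    """P(type = High | admitted, grade, group) for a Bayesian employer.

    Illustrative assumptions: the entrance exam and the college grade are
    independent Gaussian signals centered on the student's type, the college
    admits exactly the students whose exam exceeds `admit_cut`, and
    `prior_high` is the type distribution of the student's group.
    """
    def likelihood(t):
        admitted = norm.sf(admit_cut, loc=mu[t], scale=sigma)                  # P(exam > cut | type)
        grade_l = norm.pdf(grade, loc=mu[t], scale=sigma) if grades_reported else 1.0
        return admitted * grade_l

    num = prior_high * likelihood("High")
    return num / (num + (1 - prior_high) * likelihood("Low"))

mu = {"High": 1.0, "Low": 0.0}
print(employer_posterior(0.5, 0.8, grade=0.9, mu=mu, sigma=1.0))
print(employer_posterior(0.5, 0.8, grade=0.9, mu=mu, sigma=1.0, grades_reported=False))
```

When grades are not reported, the grade likelihood drops out and the employer's posterior depends on the group only through the admission event, which is the lever the abstract says the college can use to pursue its fairness goals.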
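Item 2's difference-in-differences design compares outcome changes at institutions before and after PBF adoption against institutions never subject to PBF. A generic two-way fixed-effects version of that comparison, on simulated data, might look like the following; the variable names, adoption years, and effect sizes are invented for illustration and are not the authors' specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulated institution-year panel, 2001-2014 (illustrative only).
years = np.arange(2001, 2015)
inst = np.arange(200)
df = pd.DataFrame([(i, y) for i in inst for y in years], columns=["unitid", "year"])
adopt_year = {i: (2008 if i < 80 else np.inf) for i in inst}   # institutions 0-79 adopt PBF in 2008
df["pbf_active"] = (df["year"] >= df["unitid"].map(adopt_year)).astype(int)
df["avg_sat"] = (
    1050 + 2 * (df["year"] - 2001) + 5 * df["pbf_active"] + rng.normal(0, 20, len(df))
)

# Two-way fixed-effects DiD: institution and year fixed effects absorb level and
# common-trend differences; the coefficient on pbf_active is the DiD estimate.
model = smf.ols("avg_sat ~ pbf_active + C(unitid) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unitid"]}
)
print(model.params["pbf_active"], model.bse["pbf_active"])
```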
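The incremental-variance claims in item 3 (for example, the diagnostic exam "accounted for an additional 10.1% of the variance") come from comparing nested regression models. The toy sketch below shows that comparison on simulated data; all variable names and numbers are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 120  # illustrative sample size, not the study's

# Simulated student-level data (illustrative only).
df = pd.DataFrame({
    "aspiration_y1": rng.normal(0, 1, n),   # early STEM career aspirations
    "math_diag": rng.normal(0, 1, n),       # math diagnostic exam score
})
df["aspiration_y3"] = 0.45 * df["aspiration_y1"] + 0.3 * df["math_diag"] + rng.normal(0, 1, n)

# Step 1: early aspirations alone; Step 2: add the diagnostic exam score.
base = smf.ols("aspiration_y3 ~ aspiration_y1", data=df).fit()
full = smf.ols("aspiration_y3 ~ aspiration_y1 + math_diag", data=df).fit()

# Incremental variance explained by the diagnostic exam (Delta R^2).
print(f"R^2 step 1: {base.rsquared:.3f}")
print(f"R^2 step 2: {full.rsquared:.3f}  (Delta R^2 = {full.rsquared - base.rsquared:.3f})")
```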
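The two grading policies in item 4 reduce to simple score formulas, sketched below as a direct reading of the abstract (scores as percentages). How a skipped second-chance exam is handled under 90-10 is not stated there, so the helper assumes the first-chance score is kept.

```python
def final_score_90cap(first, second=None):
    """'90-cap': an optional second-chance exam fully replaces the first-chance
    score, but the replacement is capped at 90% credit."""
    if second is None:                 # second-chance exam not taken
        return first
    return min(second, 90.0)

def final_score_90_10(first, second=None):
    """'90-10': the two attempts are combined as a weighted average,
    90% of the max score plus 10% of the min score."""
    if second is None:                 # assumption: skipping keeps the first-chance score
        return first
    return 0.9 * max(first, second) + 0.1 * min(first, second)

# Example: a student scoring 70 on the first chance and 95 on the second.
print(final_score_90cap(70, 95))   # 90.0 (capped)
print(final_score_90_10(70, 95))   # 92.5
```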