Addressing discretization-induced bias in demographic prediction

Dong, Evan; Schein, Aaron; Wang, Yixin; Garg, Nikhil

doi:10.1093/pnasnexus/pgaf027

Citation Details

This content will become publicly available on February 1, 2026

Addressing discretization-induced bias in demographic prediction

Abstract Racial and other demographic imputation is necessary for many applications, especially in auditing disparities and outreach targeting in political campaigns. The canonical approach is to construct continuous predictions—e.g. based on name and geography—and then to often discretize the predictions by selecting the most likely class (argmax), potentially with a minimum threshold (thresholding). We study how this practice produces discretization bias. For example, we show that argmax labeling, as used by a prominent commercial voter file vendor to impute race/ethnicity, results in a substantial under-count of Black voters, e.g. by 28.2% points in North Carolina. This bias can have substantial implications in downstream tasks that use such labels. We then introduce a joint optimization approach—and a tractable data-driven threshold heuristic—that can eliminate this bias, with negligible individual-level accuracy loss. Finally, we theoretically analyze discretization bias, show that calibrated continuous models are insufficient to eliminate it, and that an approach such as ours is necessary. Broadly, we warn researchers and practitioners against discretizing continuous demographic predictions without considering downstream consequences. more »

Award ID(s):: 2339427

PAR ID:: 10616974

Author(s) / Creator(s):: Dong, Evan; Schein, Aaron; Wang, Yixin; Garg, Nikhil

Editor(s):: Levy, Morris

Publisher / Repository:: PNAS Nexus

Date Published:: 2025-02-01

Journal Name:: PNAS Nexus

Volume:: 4

Issue:: 2

ISSN:: 2752-6542

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
This content will become publicly available on February 1, 2026
Journal Article:
https://doi.org/10.1093/pnasnexus/pgaf027

More Like this