Retraining with Predicted Hard Labels Provably Increases Model Accuracy

Das, Rudrajit; Dhillon, Inderjit S.; Epasto, Alessandro; Javanmard, Adel; Mao, Jieming; Mirrokni, Vahab; Sanghavi, Sujay; Zhong, Peilin

This content will become publicly available on May 7, 2026

Training with noisy labels often yields suboptimal performance, but retraining a model with its own predicted hard labels (binary 1/0 outputs) has been empirically shown to improve accuracy. This paper provides the first theoretical characterization of this phenomenon. In the setting of linearly separable binary classification with randomly corrupted labels, the authors prove that retraining can indeed improve the population accuracy compared to initial training with noisy labels. Retraining also has practical implications for local label differential privacy (DP), where models are trained with noisy labels. The authors propose consensus-based retraining, where retraining is done selectively on samples for which the predicted label matches the given noisy label. This approach significantly improves DP training accuracy at no additional privacy cost. For example, training ResNet-18 on CIFAR-100 with ε = 3 label DP achieves over 6% accuracy improvement with consensus-based retraining.
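The consensus-based retraining step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: labels are encoded as +1/-1, the noise is modeled as binary randomized response (keep each label with probability e^ε / (1 + e^ε)), and the model is a simple averaging linear classifier. The function names (`randomized_response`, `fit_mean_classifier`, `predict_hard`, `consensus_retrain`) are hypothetical.

```python
import numpy as np


def randomized_response(labels, epsilon, rng):
    """Assumed noise model: flip each +1/-1 label with probability 1 / (1 + e^epsilon)."""
    flip = rng.random(labels.shape[0]) < 1.0 / (1.0 + np.exp(epsilon))
    return np.where(flip, -labels, labels)


def fit_mean_classifier(X, y):
    """Assumed toy model: weight vector is the average of y_i * x_i."""
    return (X * y[:, None]).mean(axis=0)


def predict_hard(w, X):
    """Hard +1/-1 predictions from the sign of the linear score."""
    return np.where(X @ w >= 0, 1, -1)


def consensus_retrain(X, noisy_y):
    """Train on noisy labels, then retrain only where prediction == noisy label."""
    w_initial = fit_mean_classifier(X, noisy_y)
    predicted = predict_hard(w_initial, X)
    consensus = predicted == noisy_y  # samples where model and noisy label agree
    if not consensus.any():
        # Nothing to retrain on; fall back to the initial model.
        return w_initial, consensus
    w_retrained = fit_mean_classifier(X[consensus], noisy_y[consensus])
    return w_retrained, consensus


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -1.0])
    X = rng.normal(size=(2000, 2))
    y_clean = predict_hard(w_true, X)
    y_noisy = randomized_response(y_clean, epsilon=1.0, rng=rng)
    w, mask = consensus_retrain(X, y_noisy)
    print("samples kept for retraining:", int(mask.sum()))
    print("accuracy vs clean labels:", float((predict_hard(w, X) == y_clean).mean()))
```

The retraining step reads only the already-noised labels and the model's own predictions, which is consistent with the abstract's statement that the approach incurs no additional privacy cost. The sketch makes no claim about how much accuracy changes; the abstract's ResNet-18 on CIFAR-100 result concerns a multi-class deep-learning setting that this toy example does not reproduce.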

Award ID(s): 2505865

PAR ID: 10631934

Author(s) / Creator(s): Das, Rudrajit; Dhillon, Inderjit S.; Epasto, Alessandro; Javanmard, Adel; Mao, Jieming; Mirrokni, Vahab; Sanghavi, Sujay; Zhong, Peilin

Publisher / Repository: https://doi.org/10.48550/arXiv.2406.11206

Date Published: 2025-05-07

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.