Title: Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation
Black-box risk scoring models permeate our lives, yet are typically proprietary or opaque. We propose Distill-and-Compare, an approach to audit such models without probing the black-box model API or pre-defining features to audit. To gain insight into black-box models, we treat them as teachers, training transparent student models to mimic the risk scores assigned by the black-box models. We compare the mimic model trained with distillation to a second, un-distilled transparent model trained on ground truth outcomes, and use differences between the two models to gain insight into the black-box model. We demonstrate the approach on four data sets: COMPAS, Stop-and-Frisk, Chicago Police, and Lending Club. We also propose a statistical test to determine if a data set is missing key features used to train the black-box model. Our test finds that the ProPublica data is likely missing key feature(s) used in COMPAS.
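A minimal sketch of the Distill-and-Compare idea described in the abstract, assuming preloaded NumPy arrays X (features), risk_scores (black-box outputs), and y (ground-truth outcomes); shallow scikit-learn trees stand in for the transparent model class, so this is an illustration rather than the paper's exact setup.

```python
# Minimal sketch of the Distill-and-Compare recipe (illustrative names only).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

def distill_and_compare(X, risk_scores, y, max_depth=4):
    X_tr, X_te, s_tr, s_te, y_tr, y_te = train_test_split(
        X, risk_scores, y, test_size=0.3, random_state=0)

    # 1) Mimic model: transparent student distilled from black-box risk scores.
    mimic = DecisionTreeRegressor(max_depth=max_depth).fit(X_tr, s_tr)

    # 2) Un-distilled model: same transparent class, trained on ground truth.
    outcome = DecisionTreeClassifier(max_depth=max_depth).fit(X_tr, y_tr)

    # 3) Compare what the two models learned; here, crudely, via feature
    #    importances. Large gaps flag features the black box appears to
    #    weight differently than the actual outcomes warrant.
    gap = mimic.feature_importances_ - outcome.feature_importances_
    return mimic, outcome, gap
```

Auditing then amounts to inspecting where the mimic model and the outcome model disagree, feature by feature.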
Award ID(s):
1712554
PAR ID:
10298501
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society
Page Range / eLocation ID:
303 to 310
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep learning models have demonstrated impressive accuracy in predicting acute kidney injury (AKI), a condition affecting up to 20% of ICU patients, yet their black-box nature prevents clinical adoption in high-stakes critical care settings. While existing interpretability methods such as SHAP, LIME, and attention mechanisms can identify important features, they fail to capture the temporal dynamics essential for clinical decision-making and cannot communicate when specific risk factors become critical in a patient's trajectory. This limitation is particularly problematic in the ICU, where the timing of interventions can significantly impact patient outcomes. We present a novel interpretable framework that brings temporal awareness to deep learning predictions for AKI. Our approach introduces three key innovations: (1) a latent convolutional concept bottleneck that learns clinically meaningful patterns from ICU time series without requiring manual concept annotation, leveraging Conv1D layers to capture localized temporal patterns such as sudden physiological changes; (2) Temporal Concept Tracing (TCT), a gradient-based method that identifies not only which risk factors matter but precisely when they become critical, addressing the fundamental question of temporal relevance missing from current XAI techniques; and (3) integration with MedAlpaca to generate structured, time-aware clinical explanations that translate model insights into actionable bedside guidance. We evaluate our framework on MIMIC-IV data, demonstrating that our approach outperforms existing explainability frameworks, Occlusion and LIME, in terms of comprehensiveness score, sufficiency score, and processing time. The proposed method also better captures risk-factor inflection points in patient timelines than conventional concept bottleneck methods, including dense-layer and attention-based variants. This work represents the first comprehensive solution for interpretable temporal deep learning in critical care that addresses both the what and the when of clinical risk factors. By making AKI predictions transparent and temporally contextualized, our framework bridges the gap between model accuracy and clinical utility, offering a path toward trustworthy AI deployment in time-sensitive healthcare settings.
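As a loose illustration of item 1's latent convolutional concept bottleneck and gradient-based temporal tracing, here is a hypothetical PyTorch sketch; the layer sizes, n_concepts, and the saliency-style tracing routine are assumptions, not the authors' architecture.

```python
# Illustrative Conv1D concept bottleneck for ICU time series (assumed shapes).
import torch
import torch.nn as nn

class ConvConceptBottleneck(nn.Module):
    def __init__(self, n_features, n_concepts=8):
        super().__init__()
        # Conv1d encoder captures localized temporal patterns
        # (e.g., sudden physiological changes) in the input series.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, n_concepts, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.head = nn.Linear(n_concepts, 1)   # AKI risk from pooled concepts

    def forward(self, x):                      # x: (batch, features, time)
        concepts = self.encoder(x)             # (batch, concepts, time)
        pooled = concepts.mean(dim=-1)         # aggregate concepts over time
        return torch.sigmoid(self.head(pooled)), concepts

def temporal_trace(model, x):
    """Gradient-based sketch of when time steps matter for the risk score."""
    x = x.clone().requires_grad_(True)
    risk, _ = model(x)
    risk.sum().backward()
    # Saliency per time step: absolute gradients summed over input features.
    return x.grad.abs().sum(dim=1)             # (batch, time)
```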
  2. Motivated by the need to audit complex, black-box models, there has been extensive research on quantifying how data features influence model predictions. Feature influence can be direct (a direct influence on model outcomes) or indirect (model outcomes are influenced via proxy features). Feature influence can also be expressed in aggregate over the training or test data, or locally with respect to a single point. Current research has typically focused on only one option along each of these dimensions. In this paper, we develop disentangled influence audits, a procedure to audit the indirect influence of features. Specifically, we show that disentangled representations provide a mechanism to identify proxy features in the dataset while allowing an explicit computation of feature influence on either individual or aggregate-level outcomes. We show through both theory and experiments that disentangled influence audits can both detect proxy features and show, for each individual or in aggregate, which of these proxy features most affects the classifier being audited. In this respect, our method is more powerful than existing methods for ascertaining feature influence.
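A highly simplified sketch of the indirect-influence idea from item 2, with a plain linear regression standing in for the paper's disentangled-representation machinery; clf is assumed to be a fitted binary classifier exposing predict_proba, and all names are illustrative.

```python
# Crude stand-in for disentangled influence audits: gauge how much of a
# feature is reconstructable from the others (proxy strength) and how much
# the audited classifier's outputs move once that reconstructable part is
# removed (indirect influence). An illustration only, not the paper's method.
import numpy as np
from sklearn.linear_model import LinearRegression

def indirect_influence(clf, X, feature_idx):
    rest = np.delete(X, feature_idx, axis=1)
    target = X[:, feature_idx]

    # 1) Proxy check: can the remaining features predict this one?
    proxy_model = LinearRegression().fit(rest, target)
    proxy_strength = proxy_model.score(rest, target)  # R^2 as a rough gauge

    # 2) Indirect influence: strip the proxy-carried part of the feature and
    #    measure the shift in the audited model's predicted probabilities.
    X_cleaned = X.copy()
    X_cleaned[:, feature_idx] = target - proxy_model.predict(rest)
    shift = np.mean(np.abs(clf.predict_proba(X)[:, 1]
                           - clf.predict_proba(X_cleaned)[:, 1]))
    return proxy_strength, shift
```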
  3. With the increasing adoption of predictive models trained using machine learning across a wide range of high-stakes applications, e.g., health care, security, criminal justice, finance, and education, there is a growing need for effective techniques for explaining such models and their predictions. We aim to address this problem in settings where the predictive model is a black box; that is, we can only observe the response of the model to various inputs, but have no knowledge of the internal structure of the predictive model, its parameters, the objective function, or the algorithm used to optimize the model. We reduce the problem of interpreting a black-box predictive model to that of estimating the causal effects of each of the model inputs on the model output, from observations of the model inputs and the corresponding outputs. We estimate the causal effects of model inputs on model output using variants of the Rubin-Neyman potential outcomes framework for estimating causal effects from observational data. We show how the resulting causal attribution of responsibility for model output to the different model inputs can be used to interpret the predictive model and to explain its predictions. We present experimental results demonstrating the effectiveness of our approach to interpreting black-box predictive models via causal attribution, using deep neural network models trained on one synthetic data set (where the input variables that impact the output variable are known by design) and two real-world data sets: handwritten digit classification and Parkinson's disease severity prediction. Because our approach does not require knowledge of the predictive model's algorithm and makes no assumptions about the black-box predictive model beyond observable input-output responses, it can be applied, in principle, to any black-box predictive model.
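To make item 3's causal framing concrete, here is a toy sketch of attributing model output to a single input via an average causal effect; f is any callable black-box scoring function and the intervention values are illustrative. This captures only the potential-outcomes intuition, not the authors' estimator.

```python
# Toy causal attribution for a black-box scorer f: intervene on one input,
# holding the other observed inputs fixed, and average the change in output.
import numpy as np

def average_causal_effect(f, X, feature_idx, low, high):
    X_low, X_high = X.copy(), X.copy()
    X_low[:, feature_idx] = low      # "control" potential outcome
    X_high[:, feature_idx] = high    # "treatment" potential outcome
    return float(np.mean(f(X_high) - f(X_low)))
```

Repeating this for each input and ranking the magnitudes of the estimated effects gives a rough causal attribution of the model's output to its inputs.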