MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines

Grafberger, Stefan; Guha, Shubha; Stoyanovich, Julia; Schelter, Sebastian

doi:10.1145/3448016.3452759

Citation Details


Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policymakers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. While bias detection cannot be fully automated, computational tools can help pinpoint particular types of data issues. We recently proposed mlinspect, a library that enables lightweight lineage-based inspection of ML preprocessing pipelines. In this demonstration, we show how mlinspect can be used to detect data distribution bugs in a representative pipeline. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries, such as estimator/transformer pipelines, can handle both relational and matrix data, and does not require manual code instrumentation. The library is publicly available at https://github.com/stefan-grafberger/mlinspect.
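To make the notion of a data distribution bug concrete, the following is a minimal sketch in plain Python of the kind of check such a tool could automate: comparing each group's share of the data before and after a preprocessing step. It does not use mlinspect or its API; the function names, the toy records, the `age_group` and `income` columns, and the 0.25 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def group_shares(rows, column):
    """Return the fraction of rows that fall into each value of `column`."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def share_changes(before, after, column):
    """Change in each group's share caused by a step (after minus before)."""
    shares_before = group_shares(before, column)
    # An empty output means every group's share dropped to zero.
    shares_after = group_shares(after, column) if after else {}
    return {g: shares_after.get(g, 0.0) - s for g, s in shares_before.items()}


# Hypothetical toy records; not taken from the paper.
patients = [
    {"age_group": "young", "income": 20},
    {"age_group": "young", "income": 25},
    {"age_group": "old", "income": 60},
    {"age_group": "old", "income": 70},
]

# A seemingly harmless filter that silently removes one group entirely.
filtered = [p for p in patients if p["income"] > 30]

changes = share_changes(patients, filtered, "age_group")

# Flag groups whose share dropped by more than an (assumed) threshold.
flagged = sorted(g for g, delta in changes.items() if delta < -0.25)

print(changes)
print(flagged)
```

In this sketch the check is written by hand around a single filter. The abstract describes mlinspect as performing this kind of inspection without manual code instrumentation, which is exactly the step the sketch does not attempt to reproduce.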

Award ID(s): 1926250 1934464 1922658

PAR ID: 10287318

Author(s) / Creator(s): Grafberger, Stefan; Guha, Shubha; Stoyanovich, Julia; Schelter, Sebastian

Date Published: 2021-01-01

Journal Name: ACM SIGMOD: International Conference on Management of Data

Page Range / eLocation ID: 2736 to 2739

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper:
https://doi.org/10.1145/3448016.3452759
