The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process, and collect recommendations to address these challenges.
Seldonian Toolkit: Building Software with Safe and Fair Machine Learning
Abstract—We present the Seldonian Toolkit, which enables software engineers to integrate provably safe and fair machine learning algorithms into their systems. Software systems that use data and machine learning are routinely deployed in settings ranging from medical applications and autonomous vehicles to the criminal justice system and hiring processes. These systems, however, can produce unsafe and unfair behavior, such as suggesting potentially fatal medical treatments, making racist or sexist predictions, or facilitating radicalization and polarization. To reduce these undesirable behaviors, software engineers need the ability to easily integrate their machine-learning-based systems with domain-specific safety and fairness requirements defined by domain experts, such as doctors and hiring managers. The Seldonian Toolkit provides special machine learning algorithms that enable software engineers to incorporate such expert-defined requirements of safety and fairness into their systems, while provably guaranteeing those requirements will be satisfied. A video demonstrating the Seldonian Toolkit is available at https://youtu.be/wHR-hDm9jX4/.
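The guarantee rests on the Seldonian framework (Thomas et al., Science 2019), which splits the data into a candidate-selection set and a safety set, and returns a model only if a high-confidence safety test passes on the held-out safety data. Below is a minimal illustrative sketch of that pattern using a Hoeffding-style confidence bound; it is a simplified rendering of the framework under stated assumptions, not the toolkit's actual API.

```python
import numpy as np

def hoeffding_upper_bound(g, delta, lo=-1.0, hi=1.0):
    """One-sided (1 - delta)-confidence upper bound on E[g] via Hoeffding's
    inequality, for i.i.d. samples g assumed bounded in [lo, hi]."""
    n = len(g)
    return np.mean(g) + (hi - lo) * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

def seldonian_train(data, train_fn, constraint_fn, delta=0.05, seed=0):
    """Candidate selection + safety test, the core Seldonian pattern.
    `data` is assumed to be a NumPy array of examples; `train_fn` and
    `constraint_fn` are user-supplied. `constraint_fn(model, split)` must
    return per-sample values of a constraint function g whose expectation
    must be <= 0 (e.g., a group-wise false-positive-rate gap minus a
    tolerance). Returns the model, or None for 'No Solution Found' (NSF)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    half = len(data) // 2
    candidate_split, safety_split = data[idx[:half]], data[idx[half:]]

    model = train_fn(candidate_split)        # candidate selection
    g = constraint_fn(model, safety_split)   # constraint samples on held-out data
    if hoeffding_upper_bound(g, delta) <= 0.0:
        return model                         # constraint holds with prob. >= 1 - delta
    return None                              # refuse to return a model rather than risk violation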
- Award ID(s): 2018372
- PAR ID: 10462986
- Date Published:
- Journal Name: 2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The development of Artificial Intelligence (AI) systems involves a significant level of judgment and decision making on the part of engineers and designers to ensure the safety, robustness, and ethical design of such systems. However, the kinds of judgments that practitioners employ while developing AI platforms are rarely foregrounded or examined to explore areas where practitioners might need ethical support. In this short paper, we employ the concept of design judgment to foreground and examine the kinds of sensemaking software engineers use to inform their decision-making while developing AI systems. Relying on data generated from two exploratory observation studies of student software engineers, we connect the concept of fairness to the foregrounded judgments to implicate their potential algorithmic fairness impacts. Our findings surface some ways in which the design judgment of software engineers could adversely impact the downstream goal of ensuring fairness in AI systems. We discuss the implications of these findings for fostering positive innovation and enhancing fairness in AI systems, drawing attention to the need to provide ethical guidance, support, or intervention to practitioners as they engage in situated and contextual judgments while developing AI systems.
-
Machine learning models are increasingly being used in important decision-making software, such as approving bank loans, recommending criminal sentencing, and hiring employees. It is important to ensure the fairness of these models so that no discrimination is made between different groups in a protected attribute (e.g., race, sex, age) during decision making. Algorithms have been developed to measure unfairness and to mitigate it to a certain extent. In this paper, we focus on the empirical evaluation of fairness and mitigation on real-world machine learning models. We created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks and evaluated their fairness using a comprehensive set of fairness metrics. We then applied 7 mitigation techniques to these models and analyzed the fairness, the mitigation results, and the impacts on performance. We found that some model optimization techniques induce unfairness in the models. On the other hand, although some fairness control mechanisms exist in machine learning libraries, they are not documented. The mitigation algorithms also exhibit common patterns: mitigation in the post-processing stage is often costly (in terms of performance), and mitigation in the pre-processing stage is preferred in most cases. We also present different trade-off choices for fairness mitigation decisions. Our study suggests future research directions to reduce the gap between theoretical fairness-aware algorithms and the software engineering methods needed to leverage them in practice.
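As an illustration of the kind of group-fairness metrics such evaluations rely on, here is a small sketch computing two standard measures, statistical parity difference and equal opportunity difference; the definitions are the widely used ones, not the paper's exact toolchain, and the toy data is invented for the example.

```python
import numpy as np

def statistical_parity_difference(y_pred, privileged):
    """P(yhat = 1 | unprivileged) - P(yhat = 1 | privileged); 0 means parity."""
    return y_pred[~privileged].mean() - y_pred[privileged].mean()

def equal_opportunity_difference(y_true, y_pred, privileged):
    """True-positive-rate gap between unprivileged and privileged groups."""
    def tpr(mask):
        positives = mask & (y_true == 1)
        return y_pred[positives].mean()
    return tpr(~privileged) - tpr(privileged)

# Toy example: predictions for 8 applicants with a binary protected attribute.
y_true     = np.array([1, 0, 1, 1, 0, 1, 0, 1])
y_pred     = np.array([1, 0, 1, 0, 0, 1, 1, 1])
privileged = np.array([True, True, True, True, False, False, False, False])

print(statistical_parity_difference(y_pred, privileged))        # 0.25
print(equal_opportunity_difference(y_true, y_pred, privileged)) # ~0.33
```

A pre-processing mitigation (e.g., reweighing the training data) would aim to drive these gaps toward zero before the model is fit, which is one reason the paper finds pre-processing mitigation preferred in most cases.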
-
Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we propose MLTE (Machine Learning Test and Evaluation, colloquially referred to as "melt"), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.
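To make "expressing model requirements" concrete, here is a hypothetical sketch of the kind of threshold check such a process enables; the `Requirement` class, field names, and metric names below are illustrative assumptions, not MLTE's actual domain-specific language.

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    """Hypothetical model requirement: a named metric with a validator."""
    metric: str
    threshold: float
    higher_is_better: bool = True

    def validate(self, measured: float) -> bool:
        # A requirement passes when the measured value clears its threshold
        # in the stated direction.
        if self.higher_is_better:
            return measured >= self.threshold
        return measured <= self.threshold

# Stakeholders state requirements; engineers collect measurements.
requirements = [
    Requirement("accuracy", 0.90),
    Requirement("p99_latency_ms", 50.0, higher_is_better=False),
]
measured = {"accuracy": 0.93, "p99_latency_ms": 61.2}

for r in requirements:
    status = "PASS" if r.validate(measured[r.metric]) else "FAIL"
    print(f"{r.metric}: {measured[r.metric]} -> {status}")
```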