Title: Aspirations and Practice of ML Model Documentation: Moving the Needle with Nudging and Traceability
The documentation practice for machine-learned (ML) models often falls short of established practices for traditional software, which impedes model accountability and inadvertently abets inappropriate use or misuse of models. Recently, model cards, a proposal for model documentation, have attracted notable attention, but their impact on actual practice is unclear. In this work, we systematically study model documentation in the field and investigate how to encourage more responsible and accountable documentation practice. Our analysis of publicly available model cards reveals a substantial gap between the proposal and the practice. We then design a tool named DocML that aims to (1) nudge data scientists to comply with the model cards proposal during model development, especially in the sections related to ethics, and (2) assess and manage documentation quality. A lab study reveals the benefit of our tool for long-term documentation quality and accountability.
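The abstract does not spell out how DocML's nudging works, but the idea can be illustrated with a minimal sketch: a checker that scans a model card draft for the sections named in the original model cards proposal (Mitchell et al., 2019) and flags what is missing, calling out the ethics-related sections. The function name, the Markdown heading convention, and the nudge wording are assumptions for illustration, not DocML's actual behavior.

```python
import re

# Sections from the original model cards proposal (Mitchell et al., 2019).
REQUIRED_SECTIONS = [
    "Model Details", "Intended Use", "Factors", "Metrics",
    "Evaluation Data", "Training Data", "Ethical Considerations",
    "Caveats and Recommendations",
]
ETHICS_SECTIONS = {"Ethical Considerations", "Caveats and Recommendations"}

def check_model_card(markdown_text: str) -> list[str]:
    """Return nudge messages for proposal sections missing from a draft."""
    headings = [m.strip() for m in re.findall(r"^#+\s*(.+)$", markdown_text, re.M)]
    nudges = []
    for section in REQUIRED_SECTIONS:
        if not any(section.lower() in h.lower() for h in headings):
            tag = " (ethics-related)" if section in ETHICS_SECTIONS else ""
            nudges.append(f"Missing section{tag}: {section}")
    return nudges

if __name__ == "__main__":
    draft = "# Model Details\nA sentiment classifier.\n\n# Metrics\nMacro F1 = 0.91\n"
    for nudge in check_model_card(draft):
        print(nudge)
```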
Award ID(s):
2131477
PAR ID:
10444831
Author(s) / Creator(s):
Date Published:
Journal Name:
CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
Page Range / eLocation ID:
1 to 17
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recently, there have been increasing calls for computer science curricula to complement existing technical training with topics related to Fairness, Accountability, Transparency, and Ethics (FATE). In this paper, we present Value Cards, an educational toolkit that informs students and practitioners of the social impacts of different machine learning models via deliberation. The paper reports an early use of our approach in a college-level computer science course. Through an in-class activity, we report empirical data on the initial effectiveness of our approach. Our results suggest that the Value Cards toolkit can improve students' understanding of both the technical definitions and the trade-offs of performance metrics, help them apply those metrics in real-world contexts, help them recognize the significance of considering diverse social values in the development and deployment of algorithmic systems, and enable them to communicate, negotiate, and synthesize the perspectives of diverse stakeholders. Our study also surfaces a number of caveats to consider when using the different variants of the Value Cards toolkit. Finally, we discuss the challenges as well as future applications of our approach. (A sketch of the per-group metric trade-offs discussed here appears after this list.)
  2. Measuring the level of institutional capacity for grantsmanship within higher education informs administrators about the needs of their organization and where resources and institutional supports can be implemented to support faculty and staff. Receiving grant funding can enable cutting-edge programming and research support, which could improve the quality of education provided and, ultimately, student retention. While conducting an institutional capacity needs assessment is crucial for making data-informed decisions, there is a significant gap in institutional capacity research; specifically, there is no valid and reliable assessment tool designed to measure institutional capacity for grantsmanship. The present study aims to develop an assessment tool for higher education institutions to evaluate support systems and identify the needs of their faculty and administrators for grant-writing efforts. The study used a mixed-method approach over three phases to understand the indicators behind measuring institutional capacity for grantsmanship. We developed six reliable scales: promoting grant proposal writing, proposal writing (for faculty), proposal writing (for administrators), proposal writing (all respondents), submitting grant proposals, implementing grant activities, and managing awards. This study contributes to our understanding of institutional capacity and produced a reliable assessment tool to support grantsmanship.
  3. Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high-quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in previously published error analysis practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs. (A simplified sketch of precisely defined error groups appears after this list.)
  4. Qualitative nonsimulated models (causal loop diagrams, stock-flow diagrams, or hybrids of both) have been used since within a decade of the inception of system dynamics (SD). In this article, we assert that the well-known weaknesses of nonsimulated models need to be balanced against the contexts, purposes, and strengths that nonsimulated models provide. We propose a framework consisting of a set of best practices for model reporting and documentation that would improve the quality, consistency, and transparency of nonsimulated models. Several high-quality examples are described and referenced in the framework to illustrate support of each criterion. The framework's purpose is to help improve transparency around the creation and evaluation of nonsimulated models, thereby enhancing confidence in them and their legitimate use in SD practice. Ultimately, high-quality nonsimulated models can offer broader access to the powerful body of SD knowledge to audiences likely never to have access to formal SD simulation models. © 2023 System Dynamics Society.
  5. Motivation: Annotations of biochemical models provide details of chemical species, documentation of chemical reactions, and other essential information. Unfortunately, the vast majority of biochemical models have few, if any, annotations, or the annotations provide insufficient detail to understand the limitations of the model. The quality and quantity of annotations can be improved by developing tools that recommend annotations. For example, recommender tools have been developed for annotations of genes. Although annotating genes is conceptually similar to annotating biochemical models, there are important technical differences that make it difficult to directly apply this prior work. Results: We present AMAS, a system that predicts annotations for elements of models represented in the Systems Biology Markup Language (SBML) community standard. We provide a general framework for predicting model annotations for a query element based on a database of annotated reference elements and a match score function that calculates the similarity between the query element and reference elements. The framework is instantiated to specific element types (e.g., species, reactions) by specifying the reference database (e.g., ChEBI for species) and the match score function (e.g., string similarity). We analyze the computational efficiency and prediction quality of AMAS for species and reactions in BiGG and BioModels and find that it has sub-second response times and accuracy between 80% and 95%, depending on the specifics of what is predicted. We have incorporated AMAS into an open-source, pip-installable Python package that can run as a command-line tool to predict and add annotations for species and reactions in an SBML model. Availability: Our project is hosted at https://github.com/sys-bio/AMAS, where we provide examples, documentation, and source code files. Our source code is licensed under the MIT open-source license. (A simplified sketch of the match-score recipe appears after this list.)
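The performance-metric trade-offs that Value Cards students deliberate over (item 1) can be made concrete with a minimal sketch, not part of the toolkit itself: computing accuracy and false positive rate separately per demographic group shows how a model with equal overall accuracy can still treat groups unequally. The toy data and field names are illustrative assumptions.

```python
from collections import defaultdict

def per_group_metrics(records):
    """records: (group, y_true, y_pred) triples; returns accuracy and FPR per group."""
    stats = defaultdict(lambda: {"correct": 0, "n": 0, "fp": 0, "neg": 0})
    for group, y_true, y_pred in records:
        s = stats[group]
        s["n"] += 1
        s["correct"] += int(y_true == y_pred)
        if y_true == 0:  # negative instance: a false positive is possible here
            s["neg"] += 1
            s["fp"] += int(y_pred == 1)
    return {g: {"accuracy": s["correct"] / s["n"],
                "fpr": s["fp"] / s["neg"] if s["neg"] else 0.0}
            for g, s in stats.items()}

# Toy data: both groups see 75% accuracy, but group B bears all the false positives.
data = [("A", 0, 0), ("A", 0, 0), ("A", 1, 1), ("A", 1, 0),
        ("B", 0, 1), ("B", 0, 0), ("B", 1, 1), ("B", 1, 1)]
print(per_group_metrics(data))
```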
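Errudite's actual domain-specific language (item 3) is not reproduced in the abstract; the sketch below only illustrates the underlying principle of precisely defined, reusable error groups applied to the full dataset, positive and negative examples alike, using plain Python predicates. The `Instance` fields and group definitions are assumptions for illustration, not Errudite's API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Instance:
    question: str
    gold: str
    prediction: str

    @property
    def is_error(self) -> bool:
        return self.prediction != self.gold

# Precisely defined, reusable group definitions instead of ad hoc eyeballing;
# including a "correct" group guards against spurious conclusions.
GROUPS: dict[str, Callable[[Instance], bool]] = {
    "all_errors": lambda x: x.is_error,
    "long_question_errors": lambda x: x.is_error and len(x.question.split()) > 12,
    "long_question_correct": lambda x: not x.is_error and len(x.question.split()) > 12,
}

def group_sizes(instances: list[Instance]) -> dict[str, int]:
    """Apply every group filter to the full dataset, not a hand-picked sample."""
    return {name: sum(pred(x) for x in instances) for name, pred in GROUPS.items()}

data = [
    Instance("What year did the war end?", "1945", "1945"),
    Instance("Which of the following peace treaties formally ended the long "
             "conflict between the major powers in Europe?", "Versailles", "Paris"),
]
print(group_sizes(data))  # {'all_errors': 1, 'long_question_errors': 1, 'long_question_correct': 0}
```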
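AMAS's match score functions are configurable per element type (item 5); the sketch below shows the general recipe with one plausible instantiation: ranking candidate annotations for a query species name by normalized string similarity via Python's difflib. The tiny reference dictionary stands in for a real database such as ChEBI, and the similarity choice is an assumption, not AMAS's actual implementation.

```python
from difflib import SequenceMatcher

# Illustrative stand-in for a reference database such as ChEBI.
REFERENCE = {
    "CHEBI:17234": "glucose",
    "CHEBI:15361": "pyruvate",
    "CHEBI:16761": "ADP",
    "CHEBI:15422": "ATP",
}

def match_score(query: str, reference: str) -> float:
    """One possible match score: normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, query.lower(), reference.lower()).ratio()

def recommend_annotations(query: str, top_k: int = 3) -> list[tuple[str, str, float]]:
    """Rank reference elements by match score against the query element's name."""
    scored = [(rid, name, match_score(query, name)) for rid, name in REFERENCE.items()]
    return sorted(scored, key=lambda t: t[2], reverse=True)[:top_k]

print(recommend_annotations("glc"))  # a terse SBML species id meaning glucose
```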