This content will become publicly available on April 19, 2024

Title: Aspirations and Practice of ML Model Documentation: Moving the Needle with Nudging and Traceability
The documentation practice for machine-learned (ML) models often falls short of established practices for traditional software, which impedes model accountability and inadvertently abets inappropriate use or misuse of models. Recently, model cards, a proposal for model documentation, have attracted notable attention, but their impact on actual practice is unclear. In this work, we systematically study model documentation in the field and investigate how to encourage more responsible and accountable documentation practice. Our analysis of publicly available model cards reveals a substantial gap between the proposal and the practice. We then design a tool named DocML that aims to (1) nudge data scientists to comply with the model cards proposal during model development, especially in the sections related to ethics, and (2) assess and manage documentation quality. A lab study reveals the benefit of our tool for long-term documentation quality and accountability.
Journal Name:
CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
Page Range / eLocation ID:
1 to 17
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recently, there have been increasing calls for computer science curricula to complement existing technical training with topics related to Fairness, Accountability, Transparency and Ethics (FATE). In this paper, we present Value Cards, an educational toolkit to inform students and practitioners of the social impacts of different machine learning models via deliberation. This paper presents an early use of our approach in a college-level computer science course. Through an in-class activity, we report empirical data for the initial effectiveness of our approach. Our results suggest that the use of the Value Cards toolkit can improve students' understanding of both the technical definitions and trade-offs of performance metrics and their ability to apply them in real-world contexts, help them recognize the significance of considering diverse social values in the development and deployment of algorithmic systems, and enable them to communicate, negotiate and synthesize the perspectives of diverse stakeholders. Our study also demonstrates a number of caveats we need to consider when using the different variants of the Value Cards toolkit. Finally, we discuss the challenges as well as future applications of our approach.
  2. Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high-quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in prior published error analysis practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs.
  3. Motivation: Annotations of biochemical models provide details of chemical species, documentation of chemical reactions, and other essential information. Unfortunately, the vast majority of biochemical models have few, if any, annotations, or the annotations provide insufficient detail to understand the limitations of the model. The quality and quantity of annotations can be improved by developing tools that recommend annotations. For example, recommender tools have been developed for annotations of genes. Although annotating genes is conceptually similar to annotating biochemical models, there are important technical differences that make it difficult to directly apply this prior work. Results: We present AMAS, a system that predicts annotations for elements of models represented in the Systems Biology Markup Language (SBML) community standard. We provide a general framework for predicting model annotations for a query element based on a database of annotated reference elements and a match score function that calculates the similarity between the query element and reference elements. The framework is instantiated to specific element types (e.g., species, reactions) by specifying the reference database (e.g., ChEBI for species) and the match score function (e.g., string similarity). We analyze the computational efficiency and prediction quality of AMAS for species and reactions in BiGG and BioModels and find that it has sub-second response times and accuracy between 80% and 95% depending on specifics of what is predicted. We have incorporated AMAS into an open-source, pip-installable Python package that can run as a command-line tool that predicts and adds annotations to species and reactions to an SBML model. Availability: Our project is hosted at, where we provide examples, documentation, and source code files. Our source code is licensed under the MIT open-source license.
  4. The role of computation in science is ever‐expanding and is enabling scientists to investigate complex phenomena in more powerful ways and tackle previously intractable problems. The growing role of computation has prompted calls to integrate computational thinking (CT) into science instruction in order to more authentically mirror contemporary science practice and to support inclusive engagement in science pathways. In this multimethods study, we present evidence for the Computational Thinking for Science (CT+S) instructional model designed to support broader participation in science, technology, engineering, and mathematics (STEM) pathways by (1) providing opportunities for students to learn CT within the regular school day, in core science classrooms; and (2) reframing coding as a tool for developing solutions to compelling real‐world problems. We present core pedagogical strategies employed in the CT+S instructional model and describe its implementation in two 10‐lesson instructional units for middle‐school science classrooms. In the first unit, students create computational models of a coral reef ecosystem. In the second unit, students write code to create, analyze, and interpret data visualizations using a large air quality dataset from the United States Environmental Protection Agency to understand, communicate, and evaluate solutions for air quality concerns. In our investigation of the model's implementation through these two units, we found that participating students demonstrated statistically significant advancements in CT, competency beliefs for computation in STEM, and value assigned to computation in STEM. We also examine evidence for how the CT+S model's core pedagogical strategies may be contributing to observed outcomes. We discuss the implications of these findings and propose a testable theory of action for the model that can serve future researchers, evaluators, educators, and instructional designers.
  5. This is a protocol for generating images to be used in 3D model building via Agisoft Metashape for coral photogrammetry. This will cover underwater, field-based methods and tips to collect photographs and preprocessing of photos to improve model building. Image capture is the most important part of 3D photogrammetry because the photos taken at this point will be all that you'll have to build models and collect data. As such, you want to ensure you have enough photos to work with in the future so, in general, more is better. That being said, too many blurry or out-of-focus pictures will hamper model building. You can optimize your time in the field by taking enough photos from the appropriate angles; however, efficiency will come with practice. This is the protocol developed and used by the Kenkel lab to phenotype Acropora cervicornis colonies as part of field operations in the Florida Keys. We incorporate Agisoft Metashape markers in this workflow to scale models and improve model building. The scaling objects used by the Kenkel lab are custom-made, adjustable PVC arrays that include unique markers and bleaching color cards, affectionately called the "Tomahawk". Specs for building a Tomahawk are included in this protocol. Filtering and pre-processing of photos is not always necessary but can be used to salvage 3D models that would be otherwise blurry or incomplete. Here, we describe photo editing in Adobe Lightroom to adjust several characteristics of hundreds of images simultaneously. For a walkthrough and scripts to run Agisoft Metashape on the command line, see For directions to phenotype coral from 3D models see our Phenotyping in MeshLab protocol. These protocols, while created for branching coral, can be applied to 3D models of any coral morphology or any other object. Our goal is to make easy-to-use protocols using accessible software in the hopes of creating a standardized method for 3D photogrammetry in coral biology.