Title: Checklist Design Reconsidered: Understanding Checklist Compliance and Timing of Interactions
Abstract: We examine the association between user interactions with a checklist and task performance in a time-critical medical setting. By comparing 98 logs from a digital checklist for trauma resuscitation with activity logs generated by video review, we identified three non-compliant checklist use behaviors: failure to check items for completed tasks, falsely checking items when tasks were not performed, and inaccurately checking items for incomplete tasks. Using video review, we found that user perceptions of task completion were often misaligned with the clinical practices that guided activity coding, thereby contributing to non-compliant check-offs. Our analysis of associations between different contexts and the timing of check-offs showed longer delays when (1) checklist users were absent during patient arrival, (2) patients had penetrating injuries, and (3) resuscitations were assigned the highest acuity level. We discuss opportunities for reconsidering checklist designs to reduce non-compliant checklist use.
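The abstract describes classifying check-offs by comparing entries in the digital checklist log against activity logs coded from video. The sketch below (in Python) illustrates one way such a comparison could be structured; the record fields, labels, matching rule, and timestamps are illustrative assumptions, not the authors' actual coding scheme.

    # Hypothetical sketch: labeling one checklist item by matching the digital
    # check-off log against a video-coded task record. All field names and
    # labels are assumptions for illustration.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class CheckOff:
        item: str                    # checklist item, e.g. "airway assessed"
        checked_at: Optional[float]  # seconds since patient arrival; None if never checked

    @dataclass
    class VideoCodedTask:
        item: str
        completed_at: Optional[float]  # None if the task was never completed

    def classify(check: CheckOff, task: VideoCodedTask) -> str:
        """Label one checklist item against the video-coded ground truth."""
        if task.completed_at is not None and check.checked_at is None:
            return "missed check-off (task done, item never checked)"
        if task.completed_at is None and check.checked_at is not None:
            return "non-compliant check-off (item checked, task not completed)"
        if task.completed_at is not None and check.checked_at is not None:
            delay = check.checked_at - task.completed_at
            return f"compliant check-off ({delay:+.0f}s after task completion)"
        return "not applicable (task not performed, item not checked)"

    # Example with made-up timestamps: the item was checked 35 seconds after
    # the video-coded task completion.
    print(classify(CheckOff("airway assessed", 95.0),
                   VideoCodedTask("airway assessed", 60.0)))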
Award ID(s):
1763509
PAR ID:
10185607
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems
Page Range / eLocation ID:
1 to 13
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as well as a software tool to generate a large and diverse number of test cases quickly. We illustrate the utility of CheckList with tests for three tasks, identifying critical failures in both commercial and state-of-the-art models. In a user study, a team responsible for a commercial sentiment analysis model found new and actionable bugs in an extensively tested model. In another user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
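CheckList pairs general linguistic capabilities with test types such as minimum functionality tests and invariance tests. The sketch below, written without the actual CheckList library, illustrates those two test types in miniature; the toy sentiment model and the templates are assumptions for illustration, not the paper's tool or its API.

    # Illustrative sketch of two CheckList-style behavioral tests, written
    # without the CheckList library. The toy model and templates are made up.

    def toy_sentiment_model(text: str) -> str:
        """Stand-in for a real classifier; returns 'pos' or 'neg'."""
        return "neg" if "not" in text or "terrible" in text else "pos"

    # Minimum Functionality Test (MFT): simple cases the model must get right.
    mft_cases = [("The flight was great.", "pos"),
                 ("The flight was terrible.", "neg"),
                 ("The food was not good.", "neg")]
    mft_failures = [(t, e) for t, e in mft_cases if toy_sentiment_model(t) != e]

    # Invariance test (INV): perturbing an irrelevant detail (here, a name)
    # should not change the prediction.
    template = "{name} said the service was good."
    names = ["Maria", "John", "Wei", "Aisha"]
    preds = {n: toy_sentiment_model(template.format(name=n)) for n in names}
    inv_failures = [n for n in names if preds[n] != preds[names[0]]]

    print(f"MFT failures: {len(mft_failures)}/{len(mft_cases)}")
    print(f"INV failures: {len(inv_failures)}/{len(names)}")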
  2. During the COVID-19 pandemic, the World Health Organization provided a checklist to help people distinguish between accurate information and misinformation. In controlled experiments in the United States and Germany, we investigated the utility of this ordered checklist and designed an interactive version to lower the cost of acting on checklist items. Across interventions, we observe non-trivial differences between the two countries in participants' ability to distinguish accurate information from misinformation, and we discuss possible reasons that may predict the future helpfulness of the checklist in different environments. The checklist item that provides source labels was most frequently followed and was considered most helpful. Based on our empirical findings, we recommend practitioners focus on providing source labels rather than on interventions that support readers in performing their own fact-checks, even though this recommendation may be influenced by the WHO's chosen order. We discuss the complexity of providing such source labels and provide design recommendations.
  3. Software testing is an essential skill for computer science students. Prior work reports that students desire support in determining what code to test and which scenarios should be tested. In response, we present a lightweight testing checklist that contains both tutorial information and testing strategies to guide students in what and how to test. To assess the impact of the testing checklist, we conducted an experimental, controlled A/B study with 32 undergraduate and graduate students. The study task was writing a test suite for an existing program. Students were given either the testing checklist (the experimental group) or a tutorial on a standard coverage tool with which they were already familiar (the control group). By analyzing the combination of student-written tests and survey responses, we found that students with the checklist performed as well as or better than the coverage tool group, suggesting a potential positive impact of the checklist (or, at minimum, a non-negative one). This is particularly noteworthy given that the control condition, the coverage tool, represents the state of the practice. These findings suggest that testing tool support does not need to be sophisticated to be effective.
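The abstract does not reproduce the checklist itself, so the following is only a generic, hypothetical illustration of checklist-guided testing: a small unittest suite organized around common strategies (typical input, boundary values, invalid input). The function under test and the strategy labels are assumptions, not the study's materials.

    # Hypothetical example of checklist-guided tests for a simple function.
    import unittest

    def mean(values):
        """Function under test: arithmetic mean of a non-empty list of numbers."""
        if not values:
            raise ValueError("mean() requires at least one value")
        return sum(values) / len(values)

    class TestMeanWithChecklist(unittest.TestCase):
        # Checklist strategy: test a typical, representative input.
        def test_typical_case(self):
            self.assertEqual(mean([2, 4, 6]), 4)

        # Checklist strategy: test boundary cases (smallest valid input, negatives).
        def test_single_value(self):
            self.assertEqual(mean([7]), 7)

        def test_negative_values(self):
            self.assertEqual(mean([-3, 3]), 0)

        # Checklist strategy: test invalid input and the expected error behavior.
        def test_empty_input_raises(self):
            with self.assertRaises(ValueError):
                mean([])

    if __name__ == "__main__":
        unittest.main()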
  4. Generating regional checklists for insects frequently relies on combining data sources that range from literature and expert assertions, which merely imply the existence of an occurrence, to aggregated, standard-compliant data on uniquely identified specimens. The increasing diversity of data sources also means that checklist authors face new responsibilities, effectively acting as filters who select and use an expert-validated subset of all available data. Authors also face the technical obstacle of bringing more occurrences into Darwin Core-based data aggregation, even when the corresponding specimens belong to external institutions. We illustrate these issues with a partial update of the Kimsey et al. 2017 checklist of darkling beetles - Tenebrionidae sec. Bousquet et al. 2018 - inhabiting the Algodones Dunes of California. Our update entails 54 species-level concepts for this group and region, of which 31 concepts were found to be represented in three specimen-data aggregator portals, based on our interpretations of the aggregators' data. We reassess the distributions and biogeographic affinities of these species, focusing on taxa that are precinctive (highly geographically restricted) to the Lower Colorado River Valley in the context of recent dune formation from the Colorado River. Throughout, we apply taxonomic concept labels (taxonomic name according to source) to contextualize preferred name usages, but we also show that the identification data of aggregated occurrences are very rarely well contextualized or annotated. Doing so is a prerequisite for publishing open, dynamic checklist versions that finely credit the incremental expert effort spent to improve the quality of checklists and aggregated occurrence data.
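One concrete way to contextualize a preferred name usage in aggregated occurrence data is the Darwin Core term nameAccordingTo, which records the source defining the taxon concept (the "sec." reference). The record below is a hypothetical sketch: only the family name and the concept source come from the abstract; the identifier, coordinates, dates, and other values are made up.

    # Hypothetical Darwin Core-style occurrence record carrying a taxonomic
    # concept label via nameAccordingTo. Values other than the family name and
    # the concept source are invented for illustration.
    occurrence = {
        "occurrenceID": "urn:example:algodones:0001",   # made-up identifier
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Tenebrionidae",              # family-level name from the abstract
        "nameAccordingTo": "Bousquet et al. 2018",      # concept source ("sec.")
        "identifiedBy": "Example Expert",               # placeholder
        "dateIdentified": "2017-05-01",                 # placeholder
        "locality": "Algodones Dunes, Imperial County, California",
        "decimalLatitude": 32.99,                       # approximate, illustrative
        "decimalLongitude": -115.12,                    # approximate, illustrative
    }

    def has_concept_label(record: dict) -> bool:
        """A record is 'well contextualized' in this sketch if it names its concept source."""
        return bool(record.get("nameAccordingTo"))

    print(has_concept_label(occurrence))  # True for this record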
  5. Objective: Online surveys are a common method of data collection. The use of “attention-check” questions is an effective method of identifying careless responding in surveys (Liu & Wronski, 2018; Meade & Craig, 2012; Ward & Meade, 2023), which occurs in 10-12% of undergraduate samples (Meade & Craig, 2012). Instructed-response attention checks are straightforward and the most recommended (Meade & Craig, 2012; Ward & Meade, 2023). This study evaluated the effect of instructed-response attention check questions on the measurement of math ability and non-cognitive factors commonly related to math (self-efficacy and math anxiety). We evaluated both level differences and whether check questions alter the relationship of non-cognitive factors to math. We expected that incorrect responding to check questions would lower math performance but were unable to make hypotheses about the level of self-reported non-cognitive factors. We predicted that incorrect responding to check questions would moderate the relationship of both math anxiety and self-efficacy to math performance.

Participants and Methods: Participants were 424 undergraduates (mean age 20.4, SD = 2.7) at a large southwestern university. The sample was majority female (74%) but diverse socioeconomically and in race/ethnicity. The non-cognitive measures were researcher-developed Math Anxiety (MA) and Math Self-Efficacy (MSE; Betz & Hackett, 1993) scales, with items selected to directly target the use and manipulation of math in everyday life; both showed good reliability (α=.95). The two math scales were also researcher-developed; one was a pure symbolic computational measure (EM-A) and the other consisted of word problems in an everyday context (EM-B). These measures had good reliability (α=.80 and α=.73). The four check questions were embedded in the surveys and two groupings were formed: one consisting of those who answered all check items correctly versus those who did not, and a second consisting of those who answered all items correctly or missed only one versus those who missed more. Correlational, ANOVA, and ANCOVA models were utilized.

Results: Descriptively, check questions were skewed: 75% of participants answered all check questions correctly, and 8% missed only one. Relations of both MA and MSE with EM-A and EM-B were modest though significant (|r| = .22 to .37) and in the expected directions (all p < .001). Check questions were related to the level of all tasks (p < .001), with incorrect responses resulting in lower math performance, lower MSE, and higher MA. Check questions did not moderate the relation of MA or MSE to either math performance measure, with some suggestion that MA was more strongly related to EM-B in those who missed check questions, though only when several were missed.

Conclusions: Check questions showed a clear relation to both self-report and math performance measures. However, check questions did not alter the relation of MA or MSE to math performance in general. These results affirm extant relations of key self-perceptions to math using novel measures and highlight the need to evaluate the validity of self-report measures, even outside of objective performance indicators. Future work could examine the effect of attention checks in domains other than math and investigate other types of attention checks.
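The moderation question in the abstract above (whether check-question performance alters the relation of math anxiety to math performance) is commonly tested with an interaction term in a regression model. The sketch below uses simulated data; the variable names, effect sizes, and the choice of statsmodels are assumptions, not the study's analysis code.

    # Hypothetical moderation analysis with an interaction term, on simulated
    # data. A significant math_anxiety:passed_all_checks coefficient would
    # indicate that attention-check performance moderates the anxiety-performance
    # relation.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 424  # sample size reported in the abstract
    df = pd.DataFrame({
        "math_anxiety": rng.normal(0, 1, n),
        "passed_all_checks": rng.binomial(1, 0.75, n),  # ~75% passed all checks
    })
    # Simulate a modest negative anxiety effect, in the |r| = .2-.4 range
    # reported in the abstract; the exact coefficients are arbitrary.
    df["math_score"] = (-0.3 * df["math_anxiety"]
                        + 0.4 * df["passed_all_checks"]
                        + rng.normal(0, 1, n))

    model = smf.ols("math_score ~ math_anxiety * passed_all_checks", data=df).fit()
    print(model.summary().tables[1])  # coefficients, incl. the interaction term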