Title: DASS Good: Explainable Data Mining of Spatial Cohort Data

Abstract: Developing applicable clinical machine learning models is a difficult task when the data includes spatial information, for example, radiation dose distributions across adjacent organs at risk. We describe the co-design of a modeling system, DASS, to support the hybrid human-machine development and validation of predictive models for estimating long-term toxicities related to radiotherapy doses in head and neck cancer patients. Developed in collaboration with domain experts in oncology and data mining, DASS incorporates human-in-the-loop visual steering, spatial data, and explainable AI to augment domain knowledge with automatic data mining. We demonstrate DASS with the development of two practical clinical stratification models and report feedback from domain experts. Finally, we describe the design lessons learned from this collaborative experience.
Award ID(s): 1854815, 2320261
PAR ID: 10426643
Publisher / Repository: Wiley-Blackwell
Journal Name: Computer Graphics Forum
Volume: 42
Issue: 3
ISSN: 0167-7055
Page Range / eLocation ID: p. 283-295
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like This
  1. Background: Trust is a critical driver of technology usage behaviors and is essential for technology adoption. Thus, nurses’ participation in software development is critical for influencing their involvement, competency, and overall perceptions of software quality. Purpose: To engage nurses as subject matter experts to develop a machine learning (ML) Pain Recognition Automated Monitoring System. Method: Using the Human-centered Design for Embedded Machine Learning Solutions (HCDe-MLS) model, nurses informed the development of an intuitive data labeling software solution, Human-to-Artificial Intelligence (H2AI). Findings: H2AI facilitated efficient data labeling, stored labeled data to train ML models, and tracked inter-rater reliability. OpenCV provided efficient video-to-image data pre-processing for data labeling. MobileFaceNet demonstrated superior results for default landmark placement on neonatal video images. Discussion: Nurses’ engagement in clinical decision support software development is critical for ensuring the end-product addresses nurses’ priorities, reflects nurses’ actual cognitive and decision-making processes, and garners nurses’ trust and technology adoption. 
  2. Abstract Clinical, biomedical, and translational science has reached an inflection point in the breadth and diversity of available data and the potential impact of such data to improve human health and well‐being. However, the data are often siloed, disorganized, and not broadly accessible due to discipline‐specific differences in terminology and representation. To address these challenges, the Biomedical Data Translator Consortium has developed and tested a pilot knowledge graph‐based “Translator” system capable of integrating existing biomedical data sets and “translating” those data into insights intended to augment human reasoning and accelerate translational science. Having demonstrated feasibility of the Translator system, the Translator program has since moved into development, and the Translator Consortium has made significant progress in the research, design, and implementation of an operational system. Herein, we describe the current system’s architecture, performance, and quality of results. We apply Translator to several real‐world use cases developed in collaboration with subject‐matter experts. Finally, we discuss the scientific and technical features of Translator and compare those features to other state‐of‐the‐art, biomedical graph‐based question‐answering systems. 
  3. Digital pathology is a relatively new field that stands to gain from modern big data and machine learning techniques. In the United States alone, millions of pathology slides are created and interpreted by a human expert each year, suggesting that there is ample data available to support machine learning research. However, the relevant corpora that currently exist contain only hundreds of images, not enough to develop sophisticated deep learning models. This lack of publicly accessible data also hinders the advancement of clinical science. Our digital pathology corpus is an effort to place a large amount of clinical pathology images collected at Temple University Hospital into the public domain to support the development of automatic interpretation technology. The goal of this ambitious project is to create a corpus of 1M images. We have already released 10,000 images from 600 clinical cases. In this paper, we describe the corpus under development and discuss some of the underlying technology that was developed to support this project. 
  4. Democratizing Data Science requires a fundamental rethinking of the way data analytics and model discovery are done. Available tools for analyzing massive data sets and curating machine learning models are limited in several fundamental ways. First, existing tools require well-trained data scientists to select the appropriate techniques to build models and to evaluate their outcomes. Second, existing tools require heavy data preparation steps and are often too slow to give interactive feedback to domain experts in the model building process, severely limiting the possible interactions. Third, current tools do not provide adequate analysis of statistical risk factors in model development. In this work, we present the first iteration of QuIC-M (pronounced quick-m), an interactive human-in-the-loop data exploration and model building suite. The goal is to enable domain experts to build machine learning pipelines an order of magnitude faster than machine learning experts while achieving model quality comparable to expert solutions.
  5. Passively collected big data sources are increasingly used to inform critical development policy decisions in low- and middle-income countries. While prior work highlights how such approaches may reveal sensitive information, enable surveillance, and centralize power, less is known about the corresponding privacy concerns, hopes, and fears of the people directly impacted by these policies, people sometimes referred to as experiential experts. To understand the perspectives of experiential experts, we conducted semi-structured interviews with people living in rural villages in Togo shortly after an entirely digital cash transfer program was launched that used machine learning and mobile phone metadata to determine program eligibility. This paper documents participants' privacy concerns surrounding the introduction of big data approaches in development policy. We find that the privacy concerns of our experiential experts differ from those raised by privacy and development domain experts. To facilitate a more robust and constructive account of privacy, we discuss implications for policies and designs that take seriously the privacy concerns raised by both experiential experts and domain experts.