Search for: All records

Creators/Authors contains: "Harrison, Galen"


  1. Current algorithmic fairness tools focus on auditing completed models, neglecting the potential downstream impacts of iterative decisions about cleaning data and training machine learning models. In response, we developed Retrograde, a JupyterLab environment extension for Python that generates real-time, contextual notifications for data scientists about decisions they are making regarding protected classes, proxy variables, missing data, and demographic differences in model performance. Our novel framework uses automated code analysis to trace data provenance in JupyterLab, enabling these notifications. In a between-subjects online experiment, 51 data scientists constructed loan-decision models with Retrograde providing notifications continuously throughout the process, only at the end, or never. Retrograde’s notifications successfully nudged participants to account for missing data, avoid using protected classes as predictors, minimize demographic differences in model performance, and exhibit healthy skepticism about their models. 
    Free, publicly accessible full text available May 11, 2025
  2. Free, publicly accessible full text available August 4, 2024
  3. Prior work suggests that users conceptualize the organization of personal collections of digital files through the lens of similarity. However, it is unclear to what degree similar files are actually located near one another (e.g., in the same directory) in actual file collections, or whether leveraging file similarity can improve information retrieval and organization for disorganized collections of files. To this end, we conducted an online study combining automated analysis of 50 Google Drive and Dropbox users' cloud accounts with a survey asking about pairs of files from those accounts. We found that many files located in different parts of file hierarchies were similar in how they were perceived by participants, as well as in their algorithmically extractable features. Participants often wished to co-manage similar files (e.g., deleting one file implied deleting the other file) even if they were far apart in the file hierarchy. To further understand this relationship, we built regression models, finding several algorithmically extractable file features to be predictive of human perceptions of file similarity and desired file co-management. Our findings pave the way for leveraging file similarity to automatically recommend access, move, or delete operations based on users' prior interactions with similar files.
  4. There are many competing definitions of what statistical properties make a machine learning model fair. Unfortunately, research has shown that some key properties are mutually exclusive. Realistic models are thus necessarily imperfect, choosing one side of a trade-off or the other. To gauge perceptions of the fairness of such realistic, imperfect models, we conducted a between-subjects experiment with 502 Mechanical Turk workers. Each participant compared two models for deciding whether to grant bail to criminal defendants. The first model equalized one potentially desirable model property, with the other property varying across racial groups. The second model did the opposite. We tested pairwise trade-offs between the following four properties: accuracy; false positive rate; outcomes; and the consideration of race. We also varied which racial group the model disadvantaged. We observed a preference among participants for equalizing the false positive rate between groups over equalizing accuracy. Nonetheless, no preferences were overwhelming, and both sides of each trade-off we tested were strongly preferred by a non-trivial fraction of participants. We observed nuanced distinctions between participants considering a model "unbiased" and considering it "fair." Furthermore, even when a model within a trade-off pair was seen as fair and unbiased by a majority of participants, we did not observe consensus that a machine learning model was preferable to a human judge. Our results highlight challenges for building machine learning models that are perceived as fair and broadly acceptable in realistic situations.
  5. In Spring 2021, the highly transmissible SARS-CoV-2 Delta variant began to cause increases in cases, hospitalizations, and deaths in parts of the United States. At the time, with slowed vaccination uptake, this novel variant was expected to increase the risk of pandemic resurgence in the US in summer and fall 2021. As part of the COVID-19 Scenario Modeling Hub, an ensemble of nine mechanistic models produced 6-month scenario projections for July–December 2021 for the United States. These projections estimated substantial resurgences of COVID-19, driven by the more transmissible Delta variant, across most of the US, coinciding with school and business reopening. The scenarios revealed that reaching higher vaccine coverage in July–December 2021 substantially reduced the size and duration of the projected resurgence, with the expected impacts largely concentrated in a subset of states with lower vaccination coverage. Although the models accurately projected the occurrence and timing of COVID-19 surges, they substantially underestimated the magnitude of the reported cases, hospitalizations, and deaths during July–December 2021, highlighting the continued challenges of predicting the evolving COVID-19 pandemic. Vaccination uptake remains critical to limiting transmission and disease, particularly in states with lower vaccination coverage. Higher vaccination goals at the onset of the new variant's surge were estimated to avert over 1.5 million cases and 21,000 deaths, although they may have had even greater impacts, considering that the models underestimated the magnitude of the resurgence.
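The fourth study in this list contrasts models that equalize one statistical property across racial groups while letting another property vary. Below is a minimal Python sketch, not drawn from the paper, of how two of those properties (accuracy and false positive rate) might be computed separately for each group. The `group_metrics` function, the `GroupMetrics` class, and the toy labels are hypothetical illustrations. The toy data are constructed so that both groups have the same accuracy but different false positive rates.

```python
from dataclasses import dataclass


@dataclass
class GroupMetrics:
    accuracy: float
    false_positive_rate: float


def group_metrics(y_true, y_pred, groups):
    """Compute accuracy and false positive rate separately for each group.

    y_true, y_pred: aligned sequences of 0/1 labels (1 = positive class).
    groups: aligned sequence of group identifiers.
    """
    results = {}
    for g in sorted(set(groups)):
        # Indices of the examples belonging to this group.
        idx = [i for i, gi in enumerate(groups) if gi == g]
        correct = sum(1 for i in idx if y_true[i] == y_pred[i])
        # False positive rate: share of true negatives predicted positive.
        negatives = [i for i in idx if y_true[i] == 0]
        false_pos = sum(1 for i in negatives if y_pred[i] == 1)
        fpr = false_pos / len(negatives) if negatives else 0.0
        results[g] = GroupMetrics(correct / len(idx), fpr)
    return results


if __name__ == "__main__":
    # Hypothetical toy labels: four examples per group.
    y_true = [0, 0, 1, 1, 0, 0, 1, 1]
    y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
    groups = ["A"] * 4 + ["B"] * 4
    for g, m in group_metrics(y_true, y_pred, groups).items():
        print(g, m.accuracy, m.false_positive_rate)
```

Running it prints `A 0.75 0.5` and `B 0.75 0.0`: accuracy is equal across the two groups while the false positive rate is not, which corresponds to one side of the accuracy versus false positive rate trade-off the study asked participants to judge.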