NSF PAR Search | NSF Public Access Repository

JupyterLab in Retrograde: Contextual Notifications That Highlight Fairness and Bias Issues for Data Scientists

Harrison, Galen; Bryson, Kevin; Bamba, Ahmad Emmanuel; Dovichi, Luca; Binion, Aleksander Herrmann; Borem, Arthur; Ur, Blase (May 2024, Proceedings of the CHI Conference on Human Factors in Computing Systems)

Current algorithmic fairness tools focus on auditing completed models, neglecting the potential downstream impacts of iterative decisions about cleaning data and training machine learning models. In response, we developed Retrograde, a JupyterLab environment extension for Python that generates real-time, contextual notifications for data scientists about decisions they are making regarding protected classes, proxy variables, missing data, and demographic differences in model performance. Our novel framework uses automated code analysis to trace data provenance in JupyterLab, enabling these notifications. In a between-subjects online experiment, 51 data scientists constructed loan-decision models with Retrograde providing notifications continuously throughout the process, only at the end, or never. Retrograde’s notifications successfully nudged participants to account for missing data, avoid using protected classes as predictors, minimize demographic differences in model performance, and exhibit healthy skepticism about their models.
Full Text Available
Identifying Complicated Contagion Scenarios from Cascade Data

https://doi.org/10.1145/3580305.3599841

Harrison, Galen; Alabsi Aljundi, Amro; Chen, Jiangzhuo; Ravi, S.S.; Vullikanti, Anil Kumar; Marathe, Madhav V.; Adiga, Abhijin (August 2023, ACM)

Full Text Available
Synthetic Information and Digital Twins for Pandemic Science: Challenges and Opportunities

https://doi.org/10.1109/TPS-ISA58951.2023.00013

Harrison, Galen; Porebski, Przemyslaw; Chen, Jiangzhuo; Wilson, Mandy; Mortveit, Henning; Bhattacharya, Parantapa; Xie, Dawen; Hoops, Stefan; Vullikanti, Anil; Xiong, Li; et al (November 2023, IEEE)

Full Text Available
Files of a Feather Flock Together? Measuring and Modeling How Users Perceive File Similarity in Cloud Storage

https://doi.org/10.1145/3404835.3462845

Brackenbury, Will; Harrison, Galen; Chard, Kyle; Elmore, Aaron; Ur, Blase (January 2021, Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21))
Prior work suggests that users conceptualize the organization of personal collections of digital files through the lens of similarity. However, it is unclear to what degree similar files are actually located near one another (e.g., in the same directory) in actual file collections, or whether leveraging file similarity can improve information retrieval and organization for disorganized collections of files. To this end, we conducted an online study combining automated analysis of 50 Google Drive and Dropbox users' cloud accounts with a survey asking about pairs of files from those accounts. We found that many files located in different parts of file hierarchies were similar in how they were perceived by participants, as well as in their algorithmically extractable features. Participants often wished to co-manage similar files (e.g., deleting one file implied deleting the other file) even if they were far apart in the file hierarchy. To further understand this relationship, we built regression models, finding several algorithmically extractable file features to be predictive of human perceptions of file similarity and desired file co-management. Our findings pave the way for leveraging file similarity to automatically recommend access, move, or delete operations based on users' prior interactions with similar files.
Full Text Available
An empirical study on the perceived fairness of realistic, imperfect machine learning models

https://doi.org/10.1145/3351095.3372831

Harrison, Galen; Hanson, Julia; Jacinto, Christine; Ramirez, Julio; Ur, Blase (January 2020, Conference on Fairness, Accountability, and Transparency (FAT* ’20))
There are many competing definitions of what statistical properties make a machine learning model fair. Unfortunately, research has shown that some key properties are mutually exclusive. Realistic models are thus necessarily imperfect, choosing one side of a trade-off or the other. To gauge perceptions of the fairness of such realistic, imperfect models, we conducted a between-subjects experiment with 502 Mechanical Turk workers. Each participant compared two models for deciding whether to grant bail to criminal defendants. The first model equalized one potentially desirable model property, with the other property varying across racial groups. The second model did the opposite. We tested pairwise trade-offs between the following four properties: accuracy; false positive rate; outcomes; and the consideration of race. We also varied which racial group the model disadvantaged. We observed a preference among participants for equalizing the false positive rate between groups over equalizing accuracy. Nonetheless, no preferences were overwhelming, and both sides of each trade-off we tested were strongly preferred by a non-trivial fraction of participants. We observed nuanced distinctions between participants considering a model "unbiased" and considering it "fair." Furthermore, even when a model within a trade-off pair was seen as fair and unbiased by a majority of participants, we did not observe consensus that a machine learning model was preferable to a human judge. Our results highlight challenges for building machine learning models that are perceived as fair and broadly acceptable in realistic situations.
Full Text Available
Projected resurgence of COVID-19 in the United States in July—December 2021 resulting from the increased transmissibility of the Delta variant and faltering vaccination

https://doi.org/10.7554/eLife.73584

Truelove, Shaun; Smith, Claire P; Qin, Michelle; Mullany, Luke C; Borchering, Rebecca K; Lessler, Justin; Shea, Katriona; Howerton, Emily; Contamin, Lucie; Levander, John; et al (June 2022, eLife)

In Spring 2021, the highly transmissible SARS-CoV-2 Delta variant began to cause increases in cases, hospitalizations, and deaths in parts of the United States. At the time, with slowed vaccination uptake, this novel variant was expected to increase the risk of pandemic resurgence in the US in summer and fall 2021. As part of the COVID-19 Scenario Modeling Hub, an ensemble of nine mechanistic models produced 6-month scenario projections for July–December 2021 for the United States. These projections estimated substantial resurgences of COVID-19 resulting from the more transmissible Delta variant, projected to occur across most of the US and coinciding with school and business reopening. The scenarios revealed that reaching higher vaccine coverage in July–December 2021 reduced the size and duration of the projected resurgence substantially, with the expected impacts largely concentrated in a subset of states with lower vaccination coverage. Despite accurate projection of the occurrence and timing of COVID-19 surges, the models substantially underestimated their magnitude compared with the reported cases, hospitalizations, and deaths occurring during July–December 2021, highlighting the continued challenges of predicting the evolving COVID-19 pandemic. Vaccination uptake remains critical to limiting transmission and disease, particularly in states with lower vaccination coverage. Higher vaccination goals at the onset of the surge of the new variant were estimated to avert over 1.5 million cases and 21,000 deaths, although they may have had even greater impacts, considering the underestimated resurgence magnitude from the models.
