

Search for: All records

Award ID contains: 2040898


  1. Social distancing remains an effective nonpharmaceutical behavioral intervention to limit the spread of COVID-19 and other airborne diseases, but monitoring and enforcement create nontrivial challenges. Several jurisdictions have turned to “311” resident complaint platforms to engage the public in reporting social distancing non-compliance, but differences in sensitivity to social distancing behaviors can lead to a mis-allocation of resources and increased health risks for vulnerable communities. Using hourly visit data to designated establishments and more than 71,000 social distancing complaints in New York City during the first wave of the pandemic, we develop a method, derived from the Weber-Fechner law, to quantify neighborhood sensitivity and assess how tolerance to social distancing infractions and complaint reporting behaviors vary with neighborhood characteristics. We find that sensitivity to non-compliance is lower in minority and low-income neighborhoods, as well as in lower density areas, resulting in fewer reported complaints than expected given measured levels of overcrowding.

  2. Weinberger, Kilian (Ed.)
    The field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Over the last decade, several formal, mathematical definitions of fairness have gained prominence. Here we first assemble and categorize these definitions into two broad families: (1) those that constrain the effects of decisions on disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions typically result in strongly Pareto dominated decision policies. For example, in the case of college admissions, adhering to popular formal conceptions of fairness would simultaneously result in lower student-body diversity and a less academically prepared class, relative to what one could achieve by explicitly tailoring admissions policies to achieve desired outcomes. In this sense, requiring that these fairness definitions hold can, perversely, harm the very groups they were designed to protect. In contrast to axiomatic notions of fairness, we argue that the equitable design of algorithms requires grappling with their context-specific consequences, akin to the equitable design of policy. We conclude by listing several open challenges in fair machine learning and offering strategies to ensure algorithms are better aligned with policy goals. 
  3. With an increased focus on incorporating fairness in machine learning models, it becomes imperative not only to assess and mitigate bias at each stage of the machine learning pipeline but also to understand the downstream impacts of bias across stages. Here we consider a general, but realistic, scenario in which a predictive model is learned from (potentially biased) training data, and model predictions are assessed post-hoc for fairness by some auditing method. We provide a theoretical analysis of how a specific form of data bias, differential sampling bias, propagates from the data stage to the prediction stage. Unlike prior work, we evaluate the downstream impacts of data biases quantitatively rather than qualitatively and prove theoretical guarantees for detection. Under reasonable assumptions, we quantify how the amount of bias in the model predictions varies as a function of the amount of differential sampling bias in the data, and at what point this bias becomes provably detectable by the auditor. Through experiments on two criminal justice datasets (the well-known COMPAS dataset and historical data from the NYPD’s stop-and-frisk policy), we demonstrate that the theoretical results hold in practice even when our assumptions are relaxed.
  4. Field studies in many domains have found evidence of decision fatigue, a phenomenon describing how decision quality can be impaired by the act of making previous decisions. Debate remains, however, over posited psychological mechanisms underlying decision fatigue, and the size of effects in high-stakes settings. We examine an extensive set of pretrial arraignments in a large, urban court system to investigate how judicial release and bail decisions are influenced by the time an arraignment occurs. We find that release rates decline modestly in the hours before lunch and before dinner, and these declines persist after statistically adjusting for an extensive set of observed covariates. However, we find no evidence that arraignment time affects pretrial release rates in the remainder of each decision-making session. Moreover, we find that release rates remain unchanged after a meal break even though judges have the opportunity to replenish their mental and physical resources by resting and eating. In a complementary analysis, we find that the rate at which judges concur with prosecutorial bail requests does not appear to be influenced by either arraignment time or a meal break. Taken together, our results imply that to the extent that decision fatigue plays a role in pretrial release judgments, effects are small and inconsistent with previous explanations implicating psychological depletion processes. 
  5. We generalize the spatial and subset scan statistics from the single to the multiple subset case. The two main approaches to defining the log-likelihood ratio statistic in the single subset case—the population-based and expectation-based scan statistics—are considered, leading to risk partitioning and multiple cluster detection scan statistics, respectively. We show that, for distributions in a separable exponential family, the risk partitioning scan statistic can be expressed as a scaled f-divergence of the normalized count and baseline vectors, and the multiple cluster detection scan statistic as a sum of scaled Bregman divergences. In either case, however, maximization of the scan statistic by exhaustive search over all partitionings of the data requires exponential time. To make this optimization computationally feasible, we prove sufficient conditions under which the optimal partitioning is guaranteed to be consecutive. This Consecutive Partitions Property generalizes the linear-time subset scanning property from two partitions (the detected subset and the remaining data elements) to the multiple partition case. While the number of consecutive partitionings of n elements into t partitions scales as O(n^(t−1)), making it computationally expensive for large t, we present a dynamic programming approach which identifies the optimal consecutive partitioning in O(n^2 t) time, thus allowing for the exact and efficient solution of large-scale risk partitioning and multiple cluster detection problems. Finally, we demonstrate the detection performance and practical utility of partition scan statistics using simulated and real-world data. Supplementary materials for this article are available online. 
  6. Chaudhuri, Kamalika ; Jegelka, Stefanie ; Song, Le ; Szepesvari, Csaba ; Niu, Gang ; Sabato, Sivan (Ed.)
    Recent work highlights the role of causality in designing equitable decision-making algorithms. It is not immediately clear, however, how existing causal conceptions of fairness relate to one another, or what the consequences are of using these definitions as design principles. Here, we first assemble and categorize popular causal definitions of algorithmic fairness into two broad families: (1) those that constrain the effects of decisions on counterfactual disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions almost always—in a measure theoretic sense—result in strongly Pareto dominated decision policies, meaning there is an alternative, unconstrained policy favored by every stakeholder with preferences drawn from a large, natural class. For example, in the case of college admissions decisions, policies constrained to satisfy causal fairness definitions would be disfavored by every stakeholder with neutral or positive preferences for both academic preparedness and diversity. Indeed, under a prominent definition of causal fairness, we prove the resulting policies require admitting all students with the same probability, regardless of academic qualifications or group membership. Our results highlight formal limitations and potential adverse consequences of common mathematical notions of causal fairness. 
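
The dynamic programming idea described in record 5 (finding the best split of n ordered elements into t consecutive groups in O(n^2 t) time) can be sketched as below. This is a minimal illustration, not the authors' implementation. Everything specific in it is an assumption: the function names, the Poisson-style score c*log(c/b) + b - c, and the choice to order elements by their count-to-baseline ratio. The abstract states only that an optimal partitioning is consecutive under certain sufficient conditions and that the objective is a sum over groups.

```python
import math
from itertools import accumulate


def segment_score(c, b):
    """Illustrative per-group score (an assumption, not taken from the source):
    c*log(c/b) + b - c, for aggregate count c and aggregate baseline b > 0."""
    if c <= 0:
        return b  # limit of c*log(c/b) + b - c as c -> 0
    return c * math.log(c / b) + b - c


def best_consecutive_partition(counts, baselines, t):
    """Split the elements, ordered by count/baseline ratio (assumed ordering),
    into exactly t consecutive non-empty groups that maximize the summed score.
    Returns (best_total_score, groups), where groups holds original indices."""
    n = len(counts)
    if not 1 <= t <= n:
        raise ValueError("t must satisfy 1 <= t <= number of elements")
    order = sorted(range(n), key=lambda i: counts[i] / baselines[i])

    # Prefix sums make each group's aggregate count and baseline an O(1) lookup.
    pc = [0] + list(accumulate(counts[i] for i in order))
    pb = [0.0] + list(accumulate(baselines[i] for i in order))

    neg_inf = float("-inf")
    # best[k][j]: best score for the first j ordered elements using k groups.
    best = [[neg_inf] * (n + 1) for _ in range(t + 1)]
    back = [[0] * (n + 1) for _ in range(t + 1)]
    best[0][0] = 0.0

    for k in range(1, t + 1):          # t layers
        for j in range(k, n + 1):      # right end of the k-th group
            for i in range(k - 1, j):  # left end of the k-th group
                if best[k - 1][i] == neg_inf:
                    continue
                s = best[k - 1][i] + segment_score(pc[j] - pc[i], pb[j] - pb[i])
                if s > best[k][j]:
                    best[k][j] = s
                    back[k][j] = i

    # Walk the back-pointers from the last group to the first.
    groups, j = [], n
    for k in range(t, 0, -1):
        i = back[k][j]
        groups.append([order[m] for m in range(i, j)])
        j = i
    groups.reverse()
    return best[t][n], groups
```

The three nested loops give the O(n^2 t) running time directly, and the prefix sums keep each score evaluation constant-time. Any other score that is additive across groups could be swapped in for `segment_score` without changing the recurrence.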