skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 2007932

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Decisions involving algorithmic rankings affect our lives in many ways, from product recommendations, receiving scholarships, to securing jobs. While tools have been developed for interactively constructing fair consensus rankings from a handful of rankings, addressing the more complex real‐world scenario— where diverse opinions are represented by a larger collection of rankings— remains a challenge. In this paper, we address these challenges by reformulating the exploration of rankings as a dimension reduction problem in a system called FairSpace. FairSpace provides new views, including Fair Divergence View and Cluster Views, by juxtaposing fairness metrics of different local and alternative global consensus rankings to aid ranking analysis tasks. We illustrate the effectiveness of FairSpace through a series of use cases, demonstrating via interactive workflows that users are empowered to create local consensuses by grouping rankings similar in their fairness or utility properties, followed by hierarchically aggregating local consensuses into a global consensus through direct manipulation. We discuss how FairSpace opens the possibility for advances in dimension reduction visualization to benefit the research area of supporting fair decision‐making in ranking based decision‐making contexts. Code, datasets and demo video available at:osf.io/d7cwk 
    more » « less
    Free, publicly-accessible full text available June 1, 2026
  2. The plethora of fairness metrics developed for ranking-based decision-making raises the question: which metrics align best with people’s perceptions of fairness, and why? Most prior studies examining people’s perceptions of fairness metrics tend to use ordinal rating scales (e.g., Likert scales). However, such scales can be ambiguous in their interpretation across participants, and can be influenced by interface features used to capture responses.We address this gap by exploring the use of two-alternative forced choice methodologies— used extensively outside the fairness community for comparing visual stimuli— to quantitatively compare participant perceptions across fairness metrics and ranking characteristics. We report a crowdsourced experiment with 224 participants across four conditions: two alternative rank fairness metrics, ARP and NDKL, and two ranking characteristics, lists of 20 and 100 candidates, resulting in over 170,000 individual judgments. Quantitative results show systematic differences in how people interpert these metrics, and surprising exceptions where fairness metrics disagree with people’s perceptions. Qualitative analyses of participant comments reveals an interplay between cognitive and visual strategies that affects people’s perceptions of fairness. From these results, we discuss future work in aligning fairness metrics with people’s perceptions, and highlight the need and benefits of expanding methodologies for fairness studies. 
    more » « less
    Free, publicly-accessible full text available October 20, 2026
  3. Rated preference aggregation is conventionally performed by averaging ratings from multiple evaluators to create a consensus ordering of candidates from highest to lowest average rating. Ideally, the consensus is fair, meaning critical opportunities are not withheld from marginalized groups of candidates, even if group biases may be present in the to-be-combined ratings. Prior work operationalizing fairness in preference aggregation is limited to settings where evaluators provide rankings of candidates (e.g., Joe > Jack > Jill). Yet, in practice, many evaluators assign ratings such as Likert scales or categories (e.g., yes, no, maybe) to each candidate. Ratings convey different information than rankings leading to distinct fairness issues during their aggregation. The existing literature does not characterize these fairness concerns nor provide applicable bias-mitigation solutions. Unlike the ranked setting studied previously, two unique forms of bias arise in rating aggregation. First, biased rating stems from group disparities in to-be-aggregated evaluator ratings. Second, biased tie-breaking occurs because ties in average ratings must be resolved when aggregating ratings into a consensus ranking, and this tie-breaking act can unfairly advantage certain groups. To address this gap, we define the open fair rated preference aggregation problem and introduce the corresponding Fate methodology. Fate offers the first group fairness metric specifically for rated preference data. We propose two Fate algorithms. Fate-Break works in settings when ties need to be broken, explicitly fairness-enhancing such processes without lowering consensus utility. Fate-Rate mitigates disparities in how groups are rated, by using a Markov-chain approach to generate outcomes where groups are, in as much as possible, equally represented. Our experimental study illustrates the FATE methods provide the most bias-mitigation compared to adapting prior methods to fair tie-breaking and rating aggregation. 
    more » « less
    Free, publicly-accessible full text available June 23, 2026
  4. We present FairRankTune, a multi-purpose open-source Python toolkit offering three primary services: quantifying fairness-related harms, leveraging bias mitigation algorithms, and constructing custom fairness-relevant datasets. FairRankTune provides researchers and practitioners with a self-contained resource for fairness auditing, experimentation, and advancing research. The central piece of FairRankTune is a novel fairness-tunable ranked data generator, RankTune, that streamlines the creation of custom fairness-relevant ranked datasets. FairRankTune also offers numerous fair ranking metrics and fairness-aware ranking algorithms within the same plug-and-play package. We demonstrate the key innovations of FairRankTune, focusing on features that are valuable to stakeholders via use cases highlighting workflows in the end-to-end process of mitigating bias in ranking systems. FairRankTune addresses the gap of limited publicly available datasets, auditing tools, and implementations for fair ranking. 
    more » « less
  5. As learning-to-rank models are increasingly deployed for decision-making in areas with profound life implications, the FairML community has been developing fair learning-to-rank (LTR) models. These models rely on the availability of sensitive demographic features such as race or sex. However, in practice, regulatory obstacles and privacy concerns protect this data from collection and use. As a result, practitioners may either need to promote fairness despite the absence of these features or turn to demographic inference tools to attempt to infer them. Given that these tools are fallible, this paper aims to further understand how errors in demographic inference impact the fairness performance of popular fair LTR strategies. In which cases would it be better to keep such demographic attributes hidden from models versus infer them? We examine a spectrum of fair LTR strategies ranging from fair LTR with and without demographic features hidden versus inferred to fairness-unaware LTR followed by fair re-ranking. We conduct a controlled empirical investigation modeling different levels of inference errors by systematically perturbing the inferred sensitive attribute. We also perform three case studies with real-world datasets and popular open-source inference methods. Our findings reveal that as inference noise grows, LTR-based methods that incorporate fairness considerations into the learning process may increase bias. In contrast, fair re-ranking strategies are more robust to inference errors. All source code, data, and experimental artifacts of our experimental study are available here: https://github.com/sewen007/hoiltr.git 
    more » « less
  6. Poster. 
    more » « less
  7. Algorithmic decision-making using rankings— prevalent in areas from hiring and bail to university admissions— raises concerns of potential bias. In this paper, we explore the alignment between people’s perceptions of fairness and two popular fairness metrics designed for rankings. In a crowdsourced experiment with 480 participants, people rated the perceived fairness of a hypothetical scholarship distribution scenario. Results suggest a strong inclination towards relying on explicit score values. There is also evidence of people’s preference for one fairness metric, NDKL, over the other metric, ARP. Qualitative results paint a more complex picture: some participants endorse meritocratic award schemes and express concerns about fairness metrics being used to modify rankings; while other participants acknowledge socio-economic factors in score-based rankings as justification for adjusting rankings. In summary, we find that operationalizing algorithmic fairness in practice is a balancing act between mitigating harms towards marginalized groups and societal conventions of leveraging traditional performance scores such as grades in decision-making contexts. 
    more » « less
  8. Preference aggregation mechanisms help decision-makers combine diverse preference rankings produced by multiple voters into a single consensus ranking. Prior work has developed methods for aggregating multiple rankings into a fair consensus over the same set of candidates. Yet few real-world problems present themselves as such precisely formulated aggregation tasks with each voter fully ranking all candidates. Instead, preferences are often expressed as rankings over partial and even disjoint subsets of candidates. For instance, hiring committee members typically opt to rank their top choices instead of exhaustively ordering every single job applicant. However, the existing literature does not offer a framework for characterizing nor ensuring group fairness in such partial preference aggregation tasks. Unlike fully ranked settings, partial preferences imply both a selection decision of whom to rank plus an ordering decision of how to rank the selected candidates. Our work fills this gap by conceptualizing the open problem of fair partial preference aggregation. We introduce an impossibility result for fair selection from partial preferences and design a computational framework showing how we can navigate this obstacle. Inspired by Single Transferable Voting, our proposed solution PreFair produces consensus rankings that are fair in the selection of candidates and also in their relative ordering. Our experimental study demonstrates that PreFair achieves the best performance in this dual fairness objective compared to state-of-the-art alternatives adapted to this new problem while still satisfying voter preferences. 
    more » « less