Algorithmic decision-making using rankings, prevalent in areas from hiring and bail to university admissions, raises concerns of potential bias. In this paper, we explore the alignment between people’s perceptions of fairness and two popular fairness metrics designed for rankings. In a crowdsourced experiment with 480 participants, people rated the perceived fairness of a hypothetical scholarship distribution scenario. Results suggest a strong inclination toward relying on explicit score values. There is also evidence that people prefer one fairness metric, NDKL, over the other, ARP. Qualitative results paint a more complex picture: some participants endorse meritocratic award schemes and express concerns about fairness metrics being used to modify rankings, while others point to socio-economic factors underlying score-based rankings as justification for adjusting them. In summary, we find that operationalizing algorithmic fairness in practice is a balancing act between mitigating harms toward marginalized groups and societal conventions of leveraging traditional performance scores, such as grades, in decision-making contexts.
This content will become publicly available on October 20, 2026

Exploring “Just Noticeable” Group Fairness in Rankings
The plethora of fairness metrics developed for ranking-based decision-making raises the question: which metrics align best with people’s perceptions of fairness, and why? Most prior studies examining people’s perceptions of fairness metrics use ordinal rating scales (e.g., Likert scales). However, such scales can be ambiguous in their interpretation across participants, and can be influenced by the interface features used to capture responses. We address this gap by exploring the use of two-alternative forced choice methodologies, used extensively outside the fairness community for comparing visual stimuli, to quantitatively compare participant perceptions across fairness metrics and ranking characteristics. We report a crowdsourced experiment with 224 participants across four conditions: two alternative rank fairness metrics, ARP and NDKL, and two ranking characteristics, lists of 20 and 100 candidates, resulting in over 170,000 individual judgments. Quantitative results show systematic differences in how people interpret these metrics, and surprising exceptions where fairness metrics disagree with people’s perceptions. Qualitative analyses of participant comments reveal an interplay between cognitive and visual strategies that affects people’s perceptions of fairness. From these results, we discuss future work in aligning fairness metrics with people’s perceptions, and highlight the need for and benefits of expanding methodologies for fairness studies.
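To make the NDKL metric mentioned above concrete, here is a minimal sketch in the spirit of its usual formulation: a discount-weighted average, over every top-i prefix of the ranking, of the KL divergence between that prefix’s group proportions and a target distribution. The function names, the smoothing constant, and the natural-log base inside the KL term are our assumptions for illustration, not the exact implementation used in the study.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discrete distributions, smoothed to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def ndkl(ranking, target, groups):
    """Discount-weighted average divergence of each top-i prefix from the target.

    ranking: list of group labels, ordered top to bottom.
    target:  dict mapping group label -> desired proportion.
    Returns 0 when every prefix matches the target exactly; larger values
    indicate a less representative ranking.
    """
    q = [target[g] for g in groups]
    z = total = 0.0
    for i in range(1, len(ranking) + 1):
        prefix = ranking[:i]
        p = [prefix.count(g) / i for g in groups]
        w = 1.0 / np.log2(i + 1)  # positions near the top weigh more
        z += w
        total += w * kl_divergence(p, q)
    return total / z
```

Under this sketch, a ranking that alternates two equally sized groups scores lower (fairer) than one that places a single group in every position, which matches the intuition the abstract probes.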
- Award ID(s):
- 2007932
- PAR ID:
- 10634931
- Publisher / Repository:
- Association for the Advancement of Artificial Intelligence (www.aaai.org).
- Date Published:
- Format(s):
- Medium: X
- Location:
- AIES 2025, Madrid, Spain
- Sponsoring Org:
- National Science Foundation
More Like this
- For applications where multiple stakeholders provide recommendations, a fair consensus ranking must not only ensure that the preferences of rankers are well represented, but must also mitigate disadvantages among socio-demographic groups in the final result. However, there is little empirical guidance on the value or challenges of visualizing and integrating fairness metrics and algorithms into human-in-the-loop systems to aid decision-makers. In this work, we design a study to analyze the effectiveness of integrating such fairness-metric visualizations and algorithms. We explore this through a task-based crowdsourced experiment comparing an interactive visualization system for constructing consensus rankings, ConsensusFuse, with a similar system that includes visual encodings of fairness metrics and fair-rank generation algorithms, FairFuse. We analyze the measure of fairness, the agreement of rankers’ decisions, and user interactions in constructing the fair consensus ranking across these two systems. In our study with 200 participants, results suggest that providing these fairness-oriented support features nudges users to align their decisions with the fairness metrics while minimizing the tedious process of manually amending the consensus ranking. We discuss the implications of these results for the design of next-generation fairness-oriented systems, along with emerging directions for future research.
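As background for the consensus-ranking task described above, one classic baseline for aggregating several rankers’ preferences into a single ordering is the Borda count. This sketch is purely illustrative of rank aggregation in general; it is not the algorithm used by ConsensusFuse or FairFuse, and the function name is our own.

```python
from collections import defaultdict

def borda_consensus(rankings):
    """Aggregate several full rankings of the same candidates.

    Each candidate earns (n - 1 - position) points from each ranker,
    so the top spot is worth the most; the consensus orders candidates
    by total points, breaking ties alphabetically for determinism.
    """
    n = len(rankings[0])
    scores = defaultdict(float)
    for ranking in rankings:
        for pos, cand in enumerate(ranking):
            scores[cand] += n - 1 - pos
    return sorted(scores, key=lambda c: (-scores[c], c))
```

A fairness-aware system like the one studied would then adjust or constrain such a consensus so that socio-demographic groups are not systematically pushed down the final ranking.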
- With the increasing prevalence of automatic decision-making systems, concerns regarding the fairness of these systems also arise. Without a universally agreed-upon definition of fairness, given an automated decision-making scenario, researchers often adopt a crowdsourced approach to solicit people’s preferences across multiple fairness definitions. However, it is often found that crowdsourced fairness preferences are highly context-dependent, making it intriguing to explore the driving factors behind these preferences. One plausible hypothesis is that people’s fairness preferences reflect their perceived risk levels for different decision-making mistakes, such that the fairness definition that equalizes across groups the type of mistakes perceived as most serious will be preferred. To test this conjecture, we conduct a human-subject study (N = 213) of people’s fairness perceptions in three societal contexts. In particular, these three societal contexts differ in the expected level of risk associated with different types of decision mistakes, and we elicit both people’s fairness preferences and risk perceptions for each context. Our results show that people can often distinguish between different levels of decision risk across societal contexts. However, we find that people’s fairness preferences do not vary significantly across the three selected societal contexts, except within a certain subgroup of people (e.g., people with a certain racial background). As such, we observe minimal evidence that people’s risk perceptions of decision mistakes correlate with their fairness preferences. These results highlight that fairness preferences are highly subjective and nuanced, and they might be primarily affected by factors other than the perceived risks of decision mistakes.
- The field of machine learning fairness has developed metrics, methodologies, and data sets for experimenting with classification algorithms. However, equivalent research is lacking in the area of personalized recommender systems. This 180-minute hands-on tutorial will introduce participants to concepts in fairness-aware recommendation, and metrics and methodologies for evaluating recommendation fairness. Participants will also gain hands-on experience with conducting fairness-aware recommendation experiments with the LibRec recommendation system using the librec-auto scripting platform, and learn the steps required to configure their own experiments, incorporate their own data sets, and design their own algorithms and metrics.
- Visual characteristics of urban environments influence human perception and behavior, including choices for living, recreation, and modes of transportation. Although geospatial visualizations hold great potential to better inform urban planning and design, computational methods are lacking to realistically measure and model urban and parkland viewscapes at sufficiently fine-scale resolution. In this study, we develop and evaluate an integrative approach to measuring and modeling fine-scale viewscape characteristics of a mixed-use urban environment, a city park. Our viewscape approach improves the integration of geospatial and perception elicitation techniques by combining high-resolution lidar-based digital surface models, visual obstruction, and photorealistic immersive virtual environments (IVEs). We assessed the realism of our viewscape models by comparing metrics of viewscape composition and configuration to human subject evaluations of IVEs across multiple landscape settings. We found strongly significant correlations between viewscape metrics and participants’ perceptions of viewscape openness and naturalness, and moderately strong correlations with landscape complexity. These results suggest that lidar-enhanced viewscape models can adequately represent visual characteristics of fine-scale urban environments. Findings also indicate the existence of relationships between human perception and landscape pattern. Our approach allows urban planners and designers to model and virtually evaluate high-resolution viewscapes of urban parks and natural landscapes with fine-scale details never before demonstrated.
 An official website of the United States government