Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Algorithmic decision-making using rankings— prevalent in areas from hiring and bail to university admissions— raises concerns of potential bias. In this paper, we explore the alignment between people’s perceptions of fairness and two popular fairness metrics designed for rankings. In a crowdsourced experiment with 480 participants, people rated the perceived fairness of a hypothetical scholarship distribution scenario. Results suggest a strong inclination towards relying on explicit score values. There is also evidence of people’s preference for one fairness metric, NDKL, over the other metric, ARP. Qualitative results paint a more complex picture: some participants endorse meritocratic award schemes and express concerns about fairness metrics being used to modify rankings; while other participants acknowledge socio-economic factors in score-based rankings as justification for adjusting rankings. In summary, we find that operationalizing algorithmic fairness in practice is a balancing act between mitigating harms towards marginalized groups and societal conventions of leveraging traditional performance scores such as grades in decision-making contexts.more » « lessFree, publicly-accessible full text available June 3, 2025
-
Visualization literacy is an essential skill for accurately interpreting data to inform critical decisions. Consequently, it is vital to understand the evolution of this ability and devise targeted interventions to enhance it, requiring concise and repeatable assessments of visualization literacy for individuals. However, current assessments, such as the Visualization Literacy Assessment Test ( vlat ), are time-consuming due to their fixed, lengthy format. To address this limitation, we develop two streamlined computerized adaptive tests ( cats ) for visualization literacy, a-vlat and a-calvi , which measure the same set of skills as their original versions in half the number of questions. Specifically, we (1) employ item response theory (IRT) and non-psychometric constraints to construct adaptive versions of the assessments, (2) finalize the configurations of adaptation through simulation, (3) refine the composition of test items of a-calvi via a qualitative study, and (4) demonstrate the test-retest reliability (ICC: 0.98 and 0.98) and convergent validity (correlation: 0.81 and 0.66) of both CATS via four online studies. We discuss practical recommendations for using our CATS and opportunities for further customization to leverage the full potential of adaptive assessments. All supplemental materials are available at https://osf.io/a6258/ .more » « less
-
reVISit is an open-source software toolkit and framework for creating, deploying, and monitoring empirical visualization studies. Running a quality empirical study in visualization can be demanding and resource-intensive, requiring substantial time, cost, and technical expertise from the research team. These challenges are amplified as research norms trend towards more complex and rigorous study methodologies, alongside a growing need to evaluate more complex interactive visualizations. reVISit aims to ameliorate these challenges by introducing a domain-specific language for study set-up, and a series of software components, such as UI elements, behavior provenance, and an experiment monitoring and management interface. Together with interactive or static stimuli provided by the experimenter, these are compiled to a ready-to-deploy web-based experiment. We demonstrate reVISit's functionality by re-implementing two studies --- a graphical perception task and a more complex, interactive study. reVISit is an open-source community project, available at https://revisit.dev/.more » « less
-
For applications where multiple stakeholders provide recommendations, a fair consensus ranking must not only ensure that the preferences of rankers are well represented, but must also mitigate disadvantages among socio-demographic groups in the final result. However, there is little empirical guidance on the value or challenges of visualizing and integrating fairness metrics and algorithms into human-in-the-loop systems to aid decision-makers. In this work, we design a study to analyze the effectiveness of integrating such fairness metrics-based visualization and algorithms. We explore this through a task-based crowdsourced experiment comparing an interactive visualization system for constructing consensus rankings, ConsensusFuse, with a similar system that includes visual encodings of fairness metrics and fair-rank generation algorithms, FairFuse. We analyze the measure of fairness, agreement of rankers’ decisions, and user interactions in constructing the fair consensus ranking across these two systems. In our study with 200 participants, results suggest that providing these fairness-oriented support features nudges users to align their decision with the fairness metrics while minimizing the tedious process of manually having to amend the consensus ranking. We discuss the implications of these results for the design of next-generation fairness oriented-systems and along with emerging directions for future research.more » « less
-
Fair consensus building combines the preferences of multiple rankers into a single consensus ranking, while ensuring any group defined by a protected attribute (such as race or gender) is not disadvantaged compared to other groups. Manually generating a fair consensus ranking is time-consuming and impractical- even for a fairly small number of candidates. While algorithmic approaches for auditing and generating fair consensus rankings have been developed, these have not been operationalized in interactive systems. To bridge this gap, we introduce FairFuse, a visualization system for generating, analyzing, and auditing fair consensus rankings. We construct a data model which includes base rankings entered by rankers, augmented with measures of group fairness, and algorithms for generating consensus rankings with varying degrees of fairness. We design novel visualizations that encode these measures in a parallel-coordinates style rank visualization, with interactions for generating and exploring fair consensus rankings. We describe use cases in which FairFuse supports a decision-maker in ranking scenarios in which fairness is important, and discuss emerging challenges for future efforts supporting fairness-oriented rank analysis. Code and demo videos available at https://osf.io/hd639/.more » « less
-
Graphical perception studies typically measure visualization encoding effectiveness using the error of an “average observer”, leading to canonical rankings of encodings for numerical attributes: e.g., position > area > angle > volume. Yet different people may vary in their ability to read different visualization types, leading to variance in this ranking across individuals not captured by population-level metrics using “average observer” models. One way we can bridge this gap is by recasting classic visual perception tasks as tools for assessing individual performance, in addition to overall visualization performance. In this article we replicate and extend Cleveland and McGill's graphical comparison experiment using Bayesian multilevel regression, using these models to explore individual differences in visualization skill from multiple perspectives. The results from experiments and modeling indicate that some people show patterns of accuracy that credibly deviate from the canonical rankings of visualization effectiveness. We discuss implications of these findings, such as a need for new ways to communicate visualization effectiveness to designers, how patterns in individuals’ responses may show systematic biases and strategies in visualization judgment, and how recasting classic visual perception tasks as tools for assessing individual performance may offer new ways to quantify aspects of visualization literacy. Experiment data, source code, and analysis scripts are available at the following repository: https://osf.io/8ub7t/?view_only=9be4798797404a4397be3c6fc2a68cc0 .more » « less
-
Interactive web-based applications play an important role for both service providers and consumers. However, web applications tend to be complex, produce high-volume data, and are often ripe for attack. Attack analysis and remediation are complicated by adversary obfuscation and the difficulty in assembling and analyzing logs. In this work, we explore the web application analysis task through log file fusion, distillation, and visualization. Our approach consists of visualizing the logs of web and database traffic with detailed function execution traces. We establish causal links between events and their associated behaviors. We evaluate the effectiveness of this process using data volume reduction statistics, user interaction models, and usage scenarios. Across a set of scenarios, we find that our techniques can filter at least 97.5% of log data and reduce analysis time by 93-96%.more » « less
-
Quantifying user performance with metrics such as time and accuracy does not show the whole picture when researchers evaluate complex, interactive visualization tools. In such systems, performance is often influenced by different analysis strategies that statistical analysis methods cannot account for. To remedy this lack of nuance, we propose a novel analysis methodology for evaluating complex interactive visualizations at scale. We implement our analysis methods in reVISit, which enables analysts to explore participant interaction performance metrics and responses in the context of users' analysis strategies. Replays of participant sessions can aid in identifying usability problems during pilot studies and make individual analysis processes salient. To demonstrate the applicability of reVISit to visualization studies, we analyze participant data from two published crowdsourced studies. Our findings show that reVISit can be used to reveal and describe novel interaction patterns, to analyze performance differences between different analysis strategies, and to validate or challenge design decisions.more » « less