- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name: Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (AIES ’21)
- Sponsoring Org: National Science Foundation
More Like this
Information access systems, such as search and recommender systems, often use ranked lists to present results believed to be relevant to the user’s information need. Evaluating these lists for fairness alongside traditional metrics provides a more complete understanding of an information access system’s behavior beyond accuracy or utility constructs. To measure the (un)fairness of rankings, particularly with respect to protected group(s) of producers or providers, several metrics have been proposed in the last several years. However, an empirical and comparative analysis of these metrics, showing their applicability to specific scenarios or real data and their conceptual similarities and differences, is still lacking. We aim to bridge the gap between the theory and the practical application of these metrics. In this paper we describe several fair ranking metrics from the existing literature in a common notation, enabling direct comparison of their approaches and assumptions, and empirically compare them on the same experimental setup and data sets in the context of three information access tasks. We also provide a sensitivity analysis to assess the impact of the design choices and parameter settings that go into these metrics and point to additional work needed to improve fairness measurement.
A variety of fairness constraints have been proposed in the literature to mitigate group-level statistical bias. Their impacts have largely been evaluated for different population groups corresponding to a set of sensitive attributes, such as race or gender. Nonetheless, the community has not sufficiently explored how imposing fairness constraints fares at the instance level. Building on the concept of the influence function, a measure that characterizes the impact of a training example on the target model and its predictive performance, this work studies the influence of training examples when fairness constraints are imposed. We find that, under certain assumptions, the influence function with respect to fairness constraints can be decomposed into a kernelized combination of training examples. One promising application of the proposed fairness influence function is to identify suspicious training examples that may cause model discrimination by ranking their influence scores. We demonstrate with extensive experiments that training on a subset of weighty data examples leads to lower fairness violations with a trade-off in accuracy.
Where machine-learned predictive risk scores inform high-stakes decisions, such as bail and sentencing in criminal justice, fairness has been a serious concern. Recent work has characterized the disparate impact that such risk scores can have when used for a binary classification task. This may not account, however, for the more diverse downstream uses of risk scores and their non-binary nature. To better account for this, in this paper, we investigate the fairness of predictive risk scores from the point of view of a bipartite ranking task, where one seeks to rank positive examples higher than negative ones. We introduce the xAUC disparity as a metric to assess the disparate impact of risk scores and define it as the difference in the probabilities of ranking a random positive example from one protected group above a negative one from another group and vice versa. We provide a decomposition of bipartite ranking loss into components that involve the discrepancy and components that involve pure predictive ability within each group. We use xAUC analysis to audit predictive risk scores for recidivism prediction, income prediction, and cardiac arrest prediction, where it describes disparities that are not evident from simply comparing within-group predictive performance.
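As an illustration of the definition above, here is a minimal sketch of an empirical xAUC disparity estimate. The function names, the argument layout, and the choice to count tied scores as one half are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def xauc(scores_pos, scores_neg):
    """Empirical probability that a random positive outscores a random negative.

    Ties are counted as 1/2 (an assumption of this sketch).
    """
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    # Compare every positive score with every negative score.
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def xauc_disparity(scores, labels, groups, a, b):
    """xAUC(a, b) - xAUC(b, a).

    xAUC(a, b): a positive from group a is ranked above a negative from group b.
    xAUC(b, a): a positive from group b is ranked above a negative from group a.
    Each group needs at least one positive and one negative example.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    groups = np.asarray(groups)

    pos_a = scores[(labels == 1) & (groups == a)]
    neg_a = scores[(labels == 0) & (groups == a)]
    pos_b = scores[(labels == 1) & (groups == b)]
    neg_b = scores[(labels == 0) & (groups == b)]

    return xauc(pos_a, neg_b) - xauc(pos_b, neg_a)


# Toy data (hypothetical): two examples per group.
scores = [0.9, 0.7, 0.6, 0.4]
labels = [1, 0, 1, 0]
groups = ["a", "a", "b", "b"]
disparity = xauc_disparity(scores, labels, groups, "a", "b")
```

In this toy data each group ranks its own positive above its own negative, so within-group AUC is perfect for both, yet the cross-group comparison is lopsided. This is the kind of disparity that comparing within-group performance alone would miss.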
INTRODUCTION: Quadriceps tendon autografts have experienced a rapid rise in popularity for anterior cruciate ligament (ACL) reconstruction due to advantages in graft sizing and potential improvement in biomechanics. While there is a growing body of literature on the use of quadriceps tendon grafts, deeper investigation into the biomechanical properties of stitch techniques in this construct has been limited. The purpose of this study was to evaluate the performance of a novel suture needle against different conventional suture needles by comparing the biomechanical properties of two commonly used stitch methods, a whip stitch and a locking stitch, in quadriceps tendon. It was hypothesized that the new device would be capable of creating both whip stitches and locking stitches that are biomechanically equivalent to similar stitch techniques performed with conventional needle products. METHODS: This was a controlled biomechanical study. A total of 24 matched-pair cadaveric knees were dissected and a total of 48 quadriceps tendons were harvested and tested. All tendon grafts were standardized to the same size. Samples were then randomized into the following groups, keeping the matched pairs together: (Group 1, n=16) consisted of Company W’s novel two-part suture needle design, (Group 2, n=16) consisted of Company A suture, and …
House sparrows (Passer domesticus) adjusted hypothalamic‐pituitary‐adrenal axis negative feedback and perch hopping activities in response to a single repeated stimulus
Chronic stress has been extensively studied in both laboratory and field settings; however, a conclusive and consistent phenotype has not been reached. Several studies have reported attenuation of the hypothalamic–pituitary–adrenal axis during experiments intended to cause chronic stress. We sought to determine whether this attenuation could be indicative of habituation. Importantly, we were not investigating habituation to a specific stimulus, as many stress physiology studies do, but rather we assessed how the underlying physiology and behavior changed in response to repeated stressor presentation. We exposed house sparrows (Passer domesticus) to a single stimulus twice per day at random times for 8 consecutive days. We predicted that this period of time would be long enough for the birds to determine that these acute stressors were not, in fact, dangerous and they would, therefore, acclimate. A second control group remained undisturbed for the same period of time. We measured baseline, stress‐induced, negative feedback strength, and maximum production of corticosterone as well as neophobic behavior before, during, and after this 8‐day experiment. When birds experienced a stimulus for 4 days, their negative feedback strength was significantly diminished, but recovered after the second 4 days. Additionally, perch hopping decreased and recovered in this same time …