Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the validity of such systems using the stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. We develop a methodology for an external audit of the stability of algorithmic personality tests, and instantiate this methodology in an audit of two systems, Humantic AI and Crystal. Rather than challenging or affirming the assumptions made in psychometric testing --- that personality traits are meaningful and measurable constructs, and that they are indicative of future success on the job --- we frame our methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. In our audit of Humantic AI and Crystal, we find that both systems show substantial instability with respect to key facets of measurement, and so cannot be considered valid testing instruments. For example, Crystal frequently computes different personality scores when the same resume is submitted as a PDF versus as raw text, violating the assumption that the output of an algorithmic personality test is stable across job-irrelevant input variations. Among other notable findings is evidence of persistent --- and often incorrect --- data linkage by Humantic AI. An open-source implementation of our auditing methodology, and of the audits of Humantic AI and Crystal, is available at https://github.com/DataResponsibly/hiring-stability-audit.
An external stability audit framework to test the validity of personality prediction in AI hiring
Abstract
Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers’ resumes or social media profiles. We interrogate the validity of such systems using the stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. Crucially, rather than challenging or affirming the assumptions made in psychometric testing — that personality is a meaningful and measurable construct, and that personality traits are indicative of future success on the job — we frame our audit methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. Our main contribution is the development of a socio-technical framework for auditing the stability of algorithmic systems. This contribution is supplemented with an open-source software library that implements the technical components of the audit, and can be used to conduct similar stability audits of algorithmic systems. We instantiate our framework with the audit of two real-world personality prediction systems, namely, Humantic AI and Crystal. The application of our audit framework demonstrates that both these systems show substantial instability with respect to key facets of measurement, and hence cannot be considered valid testing instruments.
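To make the kind of check described in the abstract concrete, the sketch below scores the same resume submitted in two job-irrelevant formats (PDF and raw text) and compares the resulting trait scores. This is a minimal illustration only: the score_resume wrapper, trait handling, and printed summary are assumptions made for this sketch, not the authors' implementation, which is available in the linked repository.

# Minimal sketch of a job-irrelevant perturbation check (Python).
# score_resume() is a hypothetical client, not a real vendor API.
from typing import Dict

def score_resume(path: str) -> Dict[str, float]:
    """Hypothetical client that submits a resume file to a personality
    prediction system and returns per-trait scores (e.g., Big Five or DiSC)."""
    raise NotImplementedError("replace with a vendor-specific API client")

def max_trait_gap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Largest absolute per-trait difference between two sets of scores."""
    return max(abs(a[trait] - b[trait]) for trait in a)

if __name__ == "__main__":
    pdf_scores = score_resume("resume.pdf")  # resume submitted as PDF
    txt_scores = score_resume("resume.txt")  # identical content as raw text
    gap = max_trait_gap(pdf_scores, txt_scores)
    # Under the vendors' own assumptions, a format change alone should not
    # move any trait score; a large gap is evidence of instability.
    print(f"Maximum per-trait score difference across formats: {gap:.2f}")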
- PAR ID:
- 10372227
- Publisher / Repository:
- Springer Science + Business Media
- Date Published:
- Journal Name:
- Data Mining and Knowledge Discovery
- Volume:
- 36
- Issue:
- 6
- ISSN:
- 1384-5810
- Format(s):
- Medium: X
- Size(s):
- p. 2153-2193
- Sponsoring Org:
- National Science Foundation
More Like this
- This research paper delves into the evolving landscape of fine-tuning large language models (LLMs) to align with human users, extending beyond basic alignment to propose "personality alignment" for language models in organizational settings. Acknowledging the impact of training methods on the formation of undefined personality traits in AI models, the study draws parallels with human fitting processes using personality tests. Through an original case study, we demonstrate the necessity of personality fine-tuning for AIs and raise intriguing questions about applying human-designed tests to AIs, engineering specialized AI personality tests, and shaping AI personalities to suit organizational roles. The paper serves as a starting point for discussions and developments in the burgeoning field of AI personality alignment, offering a foundational anchor for future exploration in human-machine teaming and co-existence.
- The aggregation of individual personality tests to predict team performance is widely accepted in management theory but has significant limitations: the isolated nature of individual personality surveys fails to capture much of the team dynamics that drive real-world team performance. Artificial Swarm Intelligence (ASI), a technology that enables networked teams to think together in real-time and answer questions as a unified system, promises a solution to these limitations by enabling teams to take personality tests together, whereby the team uses ASI to converge upon answers that best represent the group’s disposition. In the present study, the group personality of 94 small teams was assessed by having teams take a standard Big Five Inventory (BFI) test both as individuals, and as a real-time system enabled by an ASI technology known as Swarm AI. The predictive accuracy of each personality assessment method was assessed by correlating the BFI personality traits to a range of real-world performance metrics. The results showed that assessments of personality generated using Swarm AI were far more predictive of team performance than the traditional survey-based method, showing a significant improvement in correlation with at least 25% of performance metrics, and in no case showing a significant decrease in predictive performance. This suggests that Swarm AI technology may be used as a highly effective team personality assessment tool that more accurately predicts future team performance than traditional survey approaches.
- This study investigates how high school-aged youth engage in algorithm auditing to identify and understand biases in artificial intelligence and machine learning (AI/ML) tools they encounter daily. With AI/ML technologies being increasingly integrated into young people’s lives, there is an urgent need to equip teenagers with AI literacies that build both technical knowledge and awareness of social impacts. Algorithm audits (also called AI audits) have traditionally been employed by experts to assess potential harmful biases, but recent research suggests that non-expert users can also participate productively in auditing. We conducted a two-week participatory design workshop with 14 teenagers (ages 14–15), where they audited the generative AI model behind TikTok’s Effect House, a tool for creating interactive TikTok filters. We present a case study describing how teenagers approached the audit, from deciding what to audit to analyzing data using diverse strategies and communicating their results. Our findings show that participants were engaged and creative throughout the activities, independently raising and exploring new considerations, such as age-related biases, that are uncommon in professional audits. We drew on our expertise in algorithm auditing to triangulate their findings as a way to examine whether the workshop supported participants in reaching coherent conclusions in their audit. Although the numbers of changes in race, gender, and age representation uncovered by the teens were slightly different from ours, we reached similar conclusions. This study highlights the potential for auditing to inspire learning activities to foster AI literacies, empower teenagers to critically examine AI systems, and contribute fresh perspectives to the study of algorithmic harms.