Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the validity of such systems using the stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. Crucially, rather than challenging or affirming the assumptions made in psychometric testing (that personality is a meaningful and measurable construct, and that personality traits are indicative of future success on the job), we frame our audit methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. Our main contribution is the development of a socio-technical framework for auditing the stability of algorithmic systems. This contribution is supplemented with an open-source software library that implements the technical components of the audit and can be used to conduct similar stability audits of algorithmic systems. We instantiate our framework with an audit of two real-world personality prediction systems, namely Humantic AI and Crystal. The application of our audit framework demonstrates that both of these systems show substantial instability on key facets of measurement and therefore cannot be considered valid testing instruments.
- Publication Date:
- NSF-PAR ID: 10372227
- Journal Name: Data Mining and Knowledge Discovery
- Volume: 36
- Issue: 6
- Page Range or eLocation-ID: p. 2153-2193
- ISSN: 1384-5810
- Publisher: Springer Science + Business Media
- Sponsoring Org: National Science Foundation
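To make the stability criterion in the abstract above concrete, here is a minimal sketch of one possible test-retest check. It is illustrative only, not the paper's open-source library: the trait names, the synthetic score data, and the Spearman-based measure are assumptions about how such a check could be set up. It compares per-trait candidate rankings produced by two runs of a system on identical inputs; correlations well below 1.0 would signal the kind of instability a stability audit is designed to surface.

```python
# Minimal sketch of a rank-order stability check (illustrative only; not the
# audit paper's open-source library). Scores are assumed to be Big Five
# traits on a 0-1 scale, one list per trait, over the same five candidates.
from scipy.stats import spearmanr

TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

# Hypothetical scores from two runs of the same system on identical inputs.
scores_run1 = {
    "openness":          [0.71, 0.42, 0.88, 0.55, 0.63],
    "conscientiousness": [0.35, 0.77, 0.52, 0.61, 0.48],
    "extraversion":      [0.64, 0.58, 0.31, 0.72, 0.45],
    "agreeableness":     [0.52, 0.69, 0.44, 0.38, 0.81],
    "neuroticism":       [0.29, 0.51, 0.66, 0.47, 0.58],
}
scores_run2 = {
    "openness":          [0.69, 0.45, 0.85, 0.51, 0.66],
    "conscientiousness": [0.38, 0.74, 0.55, 0.58, 0.50],
    "extraversion":      [0.61, 0.60, 0.35, 0.70, 0.42],
    "agreeableness":     [0.55, 0.65, 0.48, 0.35, 0.78],
    "neuroticism":       [0.31, 0.49, 0.62, 0.50, 0.55],
}

def rank_order_stability(run_a, run_b, traits=TRAITS):
    """Spearman correlation per trait between two runs on the same candidates.

    Values near 1.0 indicate stable rankings; values well below 1.0 indicate
    the kind of instability a stability audit is designed to detect.
    """
    return {t: spearmanr(run_a[t], run_b[t])[0] for t in traits}

if __name__ == "__main__":
    for trait, rho in rank_order_stability(scores_run1, scores_run2).items():
        print(f"{trait:>17}: rho = {rho:.2f}")
```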
More Like this
- Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the reliability of such systems using the stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. We develop a methodology for an external audit of the stability of algorithmic personality tests, and instantiate this methodology in an audit of two systems, Humantic AI and Crystal. Rather than challenging or affirming the assumptions made in psychometric testing (that personality traits are meaningful and measurable constructs, and that they are indicative of future success on the job), we frame our methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. In our audit of Humantic AI and Crystal, we find that both systems show substantial instability on key facets of measurement, and so cannot be considered valid testing instruments. For example, Crystal frequently computes different personality scores if the same resume is given in PDF vs. in raw text, violating the …
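As a concrete illustration of the file-format check described in the example above, here is a minimal sketch. It is hypothetical throughout: the `predict_from_pdf` and `predict_from_text` stand-ins are not a real Crystal or Humantic AI API, and the zero tolerance simply encodes the vendors' implicit claim that job-irrelevant input variation should not change the scores.

```python
# Sketch of a file-format stability check (illustrative; the vendor-call
# functions below are hypothetical stand-ins, not a real Crystal/Humantic API).

TOLERANCE = 0.0  # vendors' framing implies scores should be identical

def predict_from_pdf(pdf_path: str) -> dict:
    """Hypothetical: submit a PDF resume to the system under audit."""
    raise NotImplementedError("replace with the audited system's client call")

def predict_from_text(txt_path: str) -> dict:
    """Hypothetical: submit the same resume as raw text."""
    raise NotImplementedError("replace with the audited system's client call")

def format_instabilities(pdf_scores: dict, text_scores: dict,
                         tolerance: float = TOLERANCE) -> dict:
    """Return traits whose scores differ by more than `tolerance`.

    A non-empty result under job-irrelevant input variation (PDF vs. raw
    text) is evidence against the stability the vendors implicitly assume.
    """
    return {
        trait: (pdf_scores[trait], text_scores[trait])
        for trait in pdf_scores
        if abs(pdf_scores[trait] - text_scores[trait]) > tolerance
    }

# Example with made-up scores standing in for two submissions of one resume:
pdf_scores  = {"openness": 0.71, "conscientiousness": 0.35, "extraversion": 0.64}
text_scores = {"openness": 0.58, "conscientiousness": 0.36, "extraversion": 0.64}
print(format_instabilities(pdf_scores, text_scores))
# {'openness': (0.71, 0.58), 'conscientiousness': (0.35, 0.36)}
```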
- Drawing, as a skill, is closely tied to many creative fields, and it is a unique practice for every individual. Drawing has been shown to improve cognitive and communicative abilities, such as visual communication, problem-solving skills, students' academic achievement, awareness of and attention to surrounding details, and sharpened analytical skills. Drawing also stimulates both sides of the brain and improves the peripheral skills of writing, 3-D spatial recognition, critical thinking, and brainstorming. People are often exposed to drawing as children, drawing their families, their houses, animals, and, most notably, their imaginative ideas. These skills develop naturally over time to some extent; however, while drawing is a basic skill in concept, its mastery requires extensive practice and can be significantly affected by an individual's self-efficacy. Sketchtivity is an AI tool developed by Texas A&M University to facilitate the growth of drawing skills and track students' performance. Sketching skill development depends in part on students' self-efficacy associated with their drawing abilities. Gauging individuals' drawing self-efficacy is critical to understanding the impact of drawing practice with this novel instrument, especially in contrast to traditional practice methods. It may also …
- The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem. Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊, Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi†, Gustavo Stolovitzky†, Mahtab Mirmomeni†, Stefan Harrer†. *These authors contributed equally to this work. †Corresponding authors: rkhalaf@us.ibm.com, rosen@il.ibm.com, gustavo@us.ibm.com, mahtabm@au1.ibm.com, sharrer@au.ibm.com. ◊Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section. J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in the USA, Israel, and Australia. Introduction: This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend has reached the medical domain, with applications ranging from cancer diagnosis [1] to the development of brain-machine interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands that participants be expert data …
- This work-in-progress research paper stems from a larger project in which we are developing and gathering validity evidence for an instrument to measure undergraduate students' perceptions of support in science, technology, engineering, and mathematics (STEM). The refinement of our instrument serves to extend, operationalize, and empirically test the model of co-curricular support (MCCS). The MCCS is a conceptual framework of student support that explains how a student's interactions with the professional, academic, and social systems within a college could influence their success more broadly in an undergraduate STEM degree program. Our goal is to create an instrument that functions diagnostically, helping colleges effectively allocate resources for the various forms of financial, physical, and human capital support provided to undergraduate students in STEM. While testing the validity of our newly developed instrument, an analysis of the data revealed differences in perceived support between College of Engineering (COE) and College of Science (COS) students. In this work-in-progress paper, we examine these differences at one institution, using descriptive statistics and Welch's t-tests to identify trends and patterns of support among different student groups.
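For readers unfamiliar with the analysis mentioned above, the sketch below shows what such a group comparison might look like. It is purely illustrative: the survey scores and group sizes are invented, not the study's data. Welch's t-test is the unequal-variances variant of the two-sample t-test, appropriate when the two groups being compared differ in size and spread.

```python
# Illustrative Welch's t-test comparing perceived-support scores between two
# student groups (made-up numbers; not data from the study described above).
from scipy.stats import ttest_ind

# Hypothetical 7-point Likert-scale scores on a perceived-support construct.
coe_scores = [5.1, 4.8, 5.6, 4.2, 5.9, 4.7, 5.3, 4.9, 5.5, 4.4]  # College of Engineering
cos_scores = [4.3, 4.9, 3.8, 4.6, 4.1, 5.0, 3.9, 4.4]            # College of Science

# equal_var=False selects Welch's t-test, which does not assume the two
# groups share a common variance (sensible when group sizes differ).
t_stat, p_value = ttest_ind(coe_scores, cos_scores, equal_var=False)

print(f"Welch's t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value (e.g., below 0.05) would indicate a difference in mean
# perceived support between the two colleges.
```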
- This paper is an initial report on our fair AI design project by a small research team made up of anthropologists and computer scientists. Our collaborative project was developed in response to the recent debates on AI's ethical and social issues (Elish and boyd 2018). We share the understanding that "numbers don't speak for themselves"; rather, data enters into research projects already "fully cooked" (D'Ignazio and Klein 2020). Therefore, we take an anthropological approach to observing, recording, understanding, and reflecting upon the process of machine learning algorithm design, from the first steps of choosing and coding datasets for training and building algorithms. We tease apart the encoding of social-cultural paradigms in the generation and use of datasets in algorithm design and testing. By doing so, we rediscover the human in data, challenging the methodological and social assumptions in data use and then adjusting the model and parameters of our algorithms. This paper centers on tracing the social trajectory of the Correctional Offender Management Profiling for Alternative Sanctions, known as the COMPAS dataset. This dataset contains records of over 10,000 criminal defendants in Broward County, Florida, USA. Since its publication, it has become a benchmark dataset in …