Abstract Qualitative coding, or content analysis, is more than just labeling text: it is a reflexive interpretive practice that shapes research questions, refines theoretical insights, and illuminates subtle social dynamics. As large language models (LLMs) become increasingly adept at nuanced language tasks, questions arise about whether—and how—they can assist in large-scale coding without eroding the interpretive depth that distinguishes qualitative analysis from traditional machine learning and other quantitative approaches to natural language processing. In this paper, we present a hybrid approach that preserves hermeneutic value while incorporating LLMs to scale the application of codes to large data sets that are impractical for manual coding. Our workflow retains the traditional cycle of codebook development and refinement, adding an iterative step to adapt definitions for machine comprehension, before ultimately replacing manual with automated text categorization. We demonstrate how to rewrite code descriptions for LLM-interpretation, as well as how structured prompts and prompting the model to explain its coding decisions (chain-of-thought) can substantially improve fidelity. Empirically, our case study of socio-historical codes highlights the promise of frontier AI language models to reliably interpret paragraph-long passages representative of a humanistic study. Throughout, we emphasize ethical and practical considerations, preserving space for critical reflection, and the ongoing need for human researchers’ interpretive leadership. These strategies can guide both traditional and computational scholars aiming to harness automation effectively and responsibly—maintaining the creative, reflexive rigor of qualitative coding while capitalizing on the efficiency afforded by LLMs.
Putting Tools in Their Place: The Role of Time and Perspective in Human-AI Collaboration for Qualitative Analysis
Large datasets or 'big data' corpora are typically the domain of quantitative scholars, who work with computational tools to derive numerical and descriptive insights. However, recent work asks how computational tools and other technologies, such as AI, can support qualitative scholars in developing deep and complex insights from large amounts of data. Addressing this question, Jiang et al. found that qualitative scholars are generally opposed to incorporating AI in their practices of data analysis. In this paper, we provide nuance to these earlier findings, showing that the stage of qualitative analysis matters for how scholars believe AI can and should be used. Through interviews with 15 CSCW and HCI qualitative researchers, we explore how AI can be included throughout different stages of qualitative analysis. We find that qualitative scholars are amenable to working with AI in diverse ways, such as for data exploration and coding, as long as it assists rather than automates their analytic work practice. Based on our analysis, we discuss how incorporating AI into qualitative research can shift some analytic practices, and how designing for human-AI collaboration in qualitative analysis necessitates considering tradeoffs in scale, abstraction, and task delegation.
- Award ID(s): 1764089
- PAR ID: 10601889
- Publisher / Repository: Association for Computing Machinery (ACM)
- Date Published:
- Journal Name: Proceedings of the ACM on Human-Computer Interaction
- Volume: 5
- Issue: CSCW2
- ISSN: 2573-0142
- Format(s): Medium: X Size: p. 1-25
- Size(s): p. 1-25
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field.
-
Activists, journalists, and scholars have long raised critical questions about the relationship between diversity, representation, and structural exclusions in data-intensive tools and services. We build on work mapping the emergent landscape of corporate AI ethics to center one outcome of these conversations: the incorporation of diversity and inclusion in corporate AI ethics activities. Using interpretive document analysis and analytic tools from the values in design field, we examine how diversity and inclusion work is articulated in public-facing AI ethics documentation produced by three companies that create application and services layer AI infrastructure: Google, Microsoft, and Salesforce. We find that as these documents make diversity and inclusion more tractable to engineers and technical clients, they reveal a drift away from civil rights justifications that resonates with the “managerialization of diversity” by corporations in the mid-1980s. The focus on technical artifacts, such as diverse and inclusive datasets, and the replacement of equity with fairness make ethical work more actionable for everyday practitioners. Yet, they appear divorced from broader DEI initiatives and relevant subject matter experts that could provide needed context to nuanced decisions around how to operationalize these values and new solutions. Finally, diversity and inclusion, as configured by engineering logic, positions firms not as “ethics owners” but as ethics allocators; while these companies claim expertise on AI ethics, the responsibility of defining who diversity and inclusion are meant to protect and where it is relevant is pushed downstream to their customers.
-
Multiple methods have been used to study how social values and ethics are implicated in technology design and use, including empirical qualitative studies of technologists’ work. Recently, more experimental approaches such as design fiction explore these themes through fictional worldbuilding. This paper combines these approaches by adapting design fictions as a form of memoing, a qualitative analysis technique. The paper uses design fiction memos to analyze and reflect on ethnographic interviews and observational data about how user experience (UX) professionals at large technology companies engage with values and ethical issues in their work. The design fictions help explore and articulate themes about the values work practices and relationships of power that UX professionals grapple with. Through these fictions, the paper contributes a case study showing how design fiction can be used for qualitative analysis, and provides insights into the role of organizational and power dynamics in UX professionals’ values work.
-
This essay draws on qualitative social science to propose a critical intellectual infrastructure for data science of social phenomena. Qualitative sensibilities (interpretivism, abductive reasoning, and reflexivity in particular) could address methodological problems that have emerged in data science and help extend the frontiers of social knowledge. First, an interpretivist lens, which is concerned with the construction of meaning in a given context, can enable the deeper insights that are requisite to understanding high-level behavioral patterns from digital trace data. Without such contextual insights, researchers often misinterpret what they find in large-scale analysis. Second, abductive reasoning (the process of using observations to generate a new explanation, grounded in prior assumptions about the world) is common in data science, but its application often is not systematized. Incorporating norms and practices from qualitative traditions for executing, describing, and evaluating the application of abduction would allow for greater transparency and accountability. Finally, data scientists would benefit from increased reflexivity, which is the process of evaluating how researchers’ own assumptions, experiences, and relationships influence their research. Studies demonstrate such aspects of a researcher’s experience that typically are unmentioned in quantitative traditions can influence research findings. Qualitative researchers have long faced these same concerns, and their training in how to deconstruct and document personal and intellectual starting points could prove instructive for data scientists. We believe these and other qualitative sensibilities have tremendous potential to facilitate the production of data science research that is more meaningful, reliable, and ethical.
