skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.
Attention:The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 7:00 AM ET to 7:30 AM ET on Friday, April 24 due to maintenance. We apologize for the inconvenience.


Title: Artificial intelligence in qualitative analysis: a practical guide and reflections based on results from using GPT to analyze interview data in a substance use program
Language-based text provide valuable insights into people’s lived experiences. While traditional qualitative analysis is used to capture these nuances, new paradigms are needed to scale qualitative research effectively. Artificial intelligence presents an unprecedented opportunity to expand the sale of analysis for obtaining such nuances. The study tests the application of GPT-4—a large language modeling—in qualitative data analysis using an existing set of text data derived from 60 qualitative interviews. Specifically, the study provides a practical guide for social and behavioral researchers, illustrating core elements and key processes, demonstrating its reliability by comparing GPT-generated codes with researchers’ codes, and evaluating its capacity for theory-driven qualitative analysis. The study followed a three-step approach: (1) prompt engineering, (2) reliability assessment by comparison of GPT-generated codes with researchers’ codes, and (3) evaluation of theory-driven thematic analysis on psychological constructs. The study underscores the utility of GPT’s capabilities in coding and analyzing text data with established qualitative methods while highlighting the need for qualitative expertise to guide GPT applications. Recommendations for further exploration are also discussed.  more » « less
Award ID(s):
2449011
PAR ID:
10590730
Author(s) / Creator(s):
;
Publisher / Repository:
Springer Nature
Date Published:
Journal Name:
Quality & Quantity
ISSN:
0033-5177
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Qualitative coding, or content analysis, is more than just labeling text: it is a reflexive interpretive practice that shapes research questions, refines theoretical insights, and illuminates subtle social dynamics. As large language models (LLMs) become increasingly adept at nuanced language tasks, questions arise about whether—and how—they can assist in large-scale coding without eroding the interpretive depth that distinguishes qualitative analysis from traditional machine learning and other quantitative approaches to natural language processing. In this paper, we present a hybrid approach that preserves hermeneutic value while incorporating LLMs to scale the application of codes to large data sets that are impractical for manual coding. Our workflow retains the traditional cycle of codebook development and refinement, adding an iterative step to adapt definitions for machine comprehension, before ultimately replacing manual with automated text categorization. We demonstrate how to rewrite code descriptions for LLM-interpretation, as well as how structured prompts and prompting the model to explain its coding decisions (chain-of-thought) can substantially improve fidelity. Empirically, our case study of socio-historical codes highlights the promise of frontier AI language models to reliably interpret paragraph-long passages representative of a humanistic study. Throughout, we emphasize ethical and practical considerations, preserving space for critical reflection, and the ongoing need for human researchers’ interpretive leadership. These strategies can guide both traditional and computational scholars aiming to harness automation effectively and responsibly—maintaining the creative, reflexive rigor of qualitative coding while capitalizing on the efficiency afforded by LLMs. 
    more » « less
  2. Large Language Models (LLMs) have gained attention in research and industry, aiming to streamline processes and enhance text analysis performance. Thematic Analysis (TA), a prevalent qualitative method for analyzing interview content, often requires at least two human experts to review and analyze data. This study demonstrates the feasibility of LLM-Assisted Thematic Analysis (LATA) using GPT-4 and Gemini. Specifically, we conducted semi-structured interviews with 14 researchers to gather insights on their experiences generating and analyzing Online Social Network (OSN) communications datasets. Following Braun and Clarke's six-phase TA framework with an inductive approach, we initially analyzed our interview transcripts with human experts. Subsequently, we iteratively designed prompts to guide LLMs through a similar process. We compare and discuss the manually analyzed outcomes with responses generated by LLMs and achieve a cosine similarity score up to 0.76, demonstrating a promising prospect for LATA. Additionally, the study delves into researchers' experiences navigating the complexities of collecting and analyzing OSN data, offering recommendations for future research and application designers. 
    more » « less
  3. This study explores the application of artificial intelligence (AI) in qualitative research, specifically examining how large language models (LLMs) can be utilized to code qualitative data and identify relationships among coder-defined themes. The approach is particularly useful for cases where researchers have previously-identified themes and hypotheses but lack the resources to code a large corpus of data manually. We outline a multi-step methodological framework grounded in qualitative research traditions, whereby researchers first conduct manual coding using a grounded theory approach (Charmaz, 2006; Glaser & Strauss, 1967) on a subset of the data. The resulting codes are then applied to the remaining data using a model-assisted process that integrates natural language processing, AI-based text classification (Noah et al., 2024), and topic identification. Lastly, this is followed by statistical analyses to test hypotheses and expected patterns, providing a robust approach to ensure reliability and accuracy. We illustrate this process through the systematic application of locally-run AI for coding interview transcripts related to graduate students’ experiences in four Ph.D. programs at a large research university. We demonstrate how AI can improve the efficiency, consistency, and scalability of qualitative research without sacrificing confidentiality. This study highlights the potential for AI to enhance qualitative research processes while addressing challenges related to nuance and interpretation. 
    more » « less
  4. This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, exploring which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies — Zero-shot, Few-shot, and Few-shot with contextual information — as well as the use of embeddings. We do so in the context of qualitatively coding three distinct educational datasets: Algebra I semi-personalized tutoring session transcripts, student observations in a game-based learning environment, and debugging behaviours in an introductory programming course. We evaluated the performance of each approach based on its inter-rater agreement with human coders and explored how different methods vary in effectiveness depending on a construct’s degree of clarity, concreteness, objectivity, granularity, and specificity. Our findings suggest that while GPT-4 can code a broad range of constructs, no single method consistently outperforms the others, and the selection of a particular method should be tailored to the specific properties of the construct and context being analyzed. We also found that GPT-4 has the most difficulty with the same constructs than human coders find more difficult to reach inter-rater reliability on. 
    more » « less
  5. Does prompting a large language model (LLM) like GPT-3 with explanations improve in-context learning? We study this question on two NLP tasks that involve reasoning over text, namely question answering and natural language inference. We test the performance of four LLMs on three textual reasoning datasets using prompts that include explanations in multiple different styles. For these tasks, we find that including explanations in the prompts for OPT, GPT-3 (davinci), and InstructGPT (text-davinci-001) only yields small to moderate accuracy improvements over standard few-show learning. However, text-davinci-002 is able to benefit more substantially. We further show that explanations generated by the LLMs may not entail the models' predictions nor be factually grounded in the input, even on simple tasks with extractive explanations. However, these flawed explanations can still be useful as a way to verify LLMs' predictions post-hoc. Through analysis in our three settings, we show that explanations judged by humans to be good---logically consistent with the input and the prediction---more likely cooccur with accurate predictions. Following these observations, we train calibrators using automatically extracted scores that assess the reliability of explanations, allowing us to improve performance post-hoc across all of our datasets. 
    more » « less