skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Understanding Graduate School Through AI: A Scalable Approach to Thematic Coding
This study explores the application of artificial intelligence (AI) in qualitative research, specifically examining how large language models (LLMs) can be utilized to code qualitative data and identify relationships among coder-defined themes. The approach is particularly useful for cases where researchers have previously-identified themes and hypotheses but lack the resources to code a large corpus of data manually. We outline a multi-step methodological framework grounded in qualitative research traditions, whereby researchers first conduct manual coding using a grounded theory approach (Charmaz, 2006; Glaser & Strauss, 1967) on a subset of the data. The resulting codes are then applied to the remaining data using a model-assisted process that integrates natural language processing, AI-based text classification (Noah et al., 2024), and topic identification. Lastly, this is followed by statistical analyses to test hypotheses and expected patterns, providing a robust approach to ensure reliability and accuracy. We illustrate this process through the systematic application of locally-run AI for coding interview transcripts related to graduate students’ experiences in four Ph.D. programs at a large research university. We demonstrate how AI can improve the efficiency, consistency, and scalability of qualitative research without sacrificing confidentiality. This study highlights the potential for AI to enhance qualitative research processes while addressing challenges related to nuance and interpretation.  more » « less
Award ID(s):
1954923
PAR ID:
10662589
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Sage Journals
Date Published:
Journal Name:
International Journal of Qualitative Methods
Volume:
24
ISSN:
1609-4069
Subject(s) / Keyword(s):
artificial intelligence, large language models, qualitative analysis, thematic analysis, statistical validation, graduate student experiences
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Qualitative coding, or content analysis, is more than just labeling text: it is a reflexive interpretive practice that shapes research questions, refines theoretical insights, and illuminates subtle social dynamics. As large language models (LLMs) become increasingly adept at nuanced language tasks, questions arise about whether—and how—they can assist in large-scale coding without eroding the interpretive depth that distinguishes qualitative analysis from traditional machine learning and other quantitative approaches to natural language processing. In this paper, we present a hybrid approach that preserves hermeneutic value while incorporating LLMs to scale the application of codes to large data sets that are impractical for manual coding. Our workflow retains the traditional cycle of codebook development and refinement, adding an iterative step to adapt definitions for machine comprehension, before ultimately replacing manual with automated text categorization. We demonstrate how to rewrite code descriptions for LLM-interpretation, as well as how structured prompts and prompting the model to explain its coding decisions (chain-of-thought) can substantially improve fidelity. Empirically, our case study of socio-historical codes highlights the promise of frontier AI language models to reliably interpret paragraph-long passages representative of a humanistic study. Throughout, we emphasize ethical and practical considerations, preserving space for critical reflection, and the ongoing need for human researchers’ interpretive leadership. These strategies can guide both traditional and computational scholars aiming to harness automation effectively and responsibly—maintaining the creative, reflexive rigor of qualitative coding while capitalizing on the efficiency afforded by LLMs. 
    more » « less
  2. Language-based text provide valuable insights into people’s lived experiences. While traditional qualitative analysis is used to capture these nuances, new paradigms are needed to scale qualitative research effectively. Artificial intelligence presents an unprecedented opportunity to expand the sale of analysis for obtaining such nuances. The study tests the application of GPT-4—a large language modeling—in qualitative data analysis using an existing set of text data derived from 60 qualitative interviews. Specifically, the study provides a practical guide for social and behavioral researchers, illustrating core elements and key processes, demonstrating its reliability by comparing GPT-generated codes with researchers’ codes, and evaluating its capacity for theory-driven qualitative analysis. The study followed a three-step approach: (1) prompt engineering, (2) reliability assessment by comparison of GPT-generated codes with researchers’ codes, and (3) evaluation of theory-driven thematic analysis on psychological constructs. The study underscores the utility of GPT’s capabilities in coding and analyzing text data with established qualitative methods while highlighting the need for qualitative expertise to guide GPT applications. Recommendations for further exploration are also discussed. 
    more » « less
  3. We explore the possibility of using natural language processing (NLP) and generative artificial intelligence (GAI) to streamline the process of thematic analysis (TA) for qualitative research. We followed traditional TA phases to demonstrate areas of alignment and discordance between (a) steps one might take with NLP and GAI and (b) traditional thematic analysis. Using a case study, we illustrate the application of this workflow to a real-world dataset. We start with processes involved in data analysis and translate those into analogous steps in a workflow that uses NLP and GAI. We then discuss the potential benefits and limitations of these NLP and GAI techniques, highlighting points of convergence and divergence with thematic analysis. Then, we highlight the importance of the central role of researchers during the process of NLP and GAI-assisted thematic analysis. Finally, we conclude with a discussion of the implications of this approach for qualitative research and suggestions for future work. Researchers who are interested in AI-assisted methods can benefit from the roadmap we provide in this study to understand the current landscape of NLP and GAI models for qualitative research. 
    more » « less
  4. Fields in the social sciences, such as education research, have started to expand the use of computer-based research methods to supplement traditional research approaches. Natural language processing techniques, such as topic modeling, may support qualitative data analysis by providing early categories that researchers may interpret and refine. This study contributes to this body of research and answers the following research questions: (RQ1) What is the relative coverage of the latent Dirichlet allocation (LDA) topic model and human coding in terms of the breadth of the topics/themes extracted from the text collection? (RQ2) What is the relative depth or level of detail among identified topics using LDA topic models and human coding approaches? A dataset of student reflections was qualitatively analyzed using LDA topic modeling and human coding approaches, and the results were compared. The findings suggest that topic models can provide reliable coverage and depth of themes present in a textual collection comparable to human coding but require manual interpretation of topics. The breadth and depth of human coding output is heavily dependent on the expertise of coders and the size of the collection; these factors are better handled in the topic modeling approach. 
    more » « less
  5. Despite the elevated importance of Data Science in Statistics, there exists limited research investigating how students learn the computing concepts and skills necessary for carrying out data science tasks. Computer Science educators have investigated how students debug their own code and how students reason through foreign code. While these studies illuminate different aspects of students’ programming behavior or conceptual understanding, a method has yet to be employed that can shed light on students’ learning processes. This type of inquiry necessitates qualitative methods, which allow for a holistic description of the skills a student uses throughout the computing code they produce, the organization of these descriptions into themes, and a comparison of the emergent themes across students or across time. In this article we share how to conceptualize and carry out the qualitative coding process with students’ computing code. Drawing on the Block Model to frame our analysis, we explore two types of research questions which could be posed about students’ learning. Supplementary materials for this article are available online. 
    more » « less