Title: Leveraging Generative Text Models and Natural Language Processing to Perform Traditional Thematic Data Analysis
We explore the possibility of using natural language processing (NLP) and generative artificial intelligence (GAI) to streamline the process of thematic analysis (TA) for qualitative research. We followed traditional TA phases to demonstrate areas of alignment and discordance between (a) steps one might take with NLP and GAI and (b) traditional thematic analysis. Using a case study, we illustrate the application of this workflow to a real-world dataset. We start with processes involved in data analysis and translate those into analogous steps in a workflow that uses NLP and GAI. We then discuss the potential benefits and limitations of these NLP and GAI techniques, highlighting points of convergence and divergence with thematic analysis. Then, we highlight the importance of the central role of researchers during the process of NLP and GAI-assisted thematic analysis. Finally, we conclude with a discussion of the implications of this approach for qualitative research and suggestions for future work. Researchers who are interested in AI-assisted methods can benefit from the roadmap we provide in this study to understand the current landscape of NLP and GAI models for qualitative research.
Award ID(s):
2113631
PAR ID:
10657634
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
SAGE Publications
Date Published:
Journal Name:
International Journal of Qualitative Methods
Volume:
24
ISSN:
1609-4069
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Large Language Models (LLMs) have gained attention in research and industry, aiming to streamline processes and enhance text analysis performance. Thematic Analysis (TA), a prevalent qualitative method for analyzing interview content, often requires at least two human experts to review and analyze data. This study demonstrates the feasibility of LLM-Assisted Thematic Analysis (LATA) using GPT-4 and Gemini. Specifically, we conducted semi-structured interviews with 14 researchers to gather insights on their experiences generating and analyzing Online Social Network (OSN) communications datasets. Following Braun and Clarke's six-phase TA framework with an inductive approach, we initially analyzed our interview transcripts with human experts. Subsequently, we iteratively designed prompts to guide LLMs through a similar process. We compare the manually analyzed outcomes with the responses generated by LLMs and achieve a cosine similarity score of up to 0.76, demonstrating a promising prospect for LATA. Additionally, the study delves into researchers' experiences navigating the complexities of collecting and analyzing OSN data, offering recommendations for future research and application designers.
  2. Thematic Analysis (TA) is a fundamental method in healthcare research for analyzing transcript data, but it is resource-intensive and difficult to scale for large, complex datasets. This study investigates the potential of large language models (LLMs) to augment the inductive TA process in high-stakes healthcare settings. Focusing on interview transcripts from parents of children with Anomalous Aortic Origin of a Coronary Artery (AAOCA), a rare congenital heart disease, we propose an LLM-Enhanced Thematic Analysis (LLM-TA) pipeline. Our pipeline integrates an affordable state-of-the-art LLM (GPT-4o mini), LangChain, and prompt engineering with chunking techniques to analyze nine detailed transcripts following the inductive TA framework. We evaluate the LLM-generated themes against human-generated results using thematic similarity metrics, LLM-assisted assessments, and expert reviews. Results demonstrate that our pipeline outperforms existing LLM-assisted TA methods significantly. While the pipeline alone has not yet reached human-level quality in inductive TA, it shows great potential to improve scalability, efficiency, and accuracy while reducing analyst workload when working collaboratively with domain experts. We provide practical recommendations for incorporating LLMs into high-stakes TA workflows and emphasize the importance of close collaboration with domain experts to address challenges related to real-world applicability and dataset complexity. 
  3. Language-based text provides valuable insights into people’s lived experiences. While traditional qualitative analysis is used to capture these nuances, new paradigms are needed to scale qualitative research effectively. Artificial intelligence presents an unprecedented opportunity to expand the scale of analysis for obtaining such nuances. The study tests the application of GPT-4, a large language model, in qualitative data analysis using an existing set of text data derived from 60 qualitative interviews. Specifically, the study provides a practical guide for social and behavioral researchers, illustrating core elements and key processes, demonstrating its reliability by comparing GPT-generated codes with researchers’ codes, and evaluating its capacity for theory-driven qualitative analysis. The study followed a three-step approach: (1) prompt engineering, (2) reliability assessment by comparison of GPT-generated codes with researchers’ codes, and (3) evaluation of theory-driven thematic analysis on psychological constructs. The study underscores the utility of GPT’s capabilities in coding and analyzing text data with established qualitative methods while highlighting the need for qualitative expertise to guide GPT applications. Recommendations for further exploration are also discussed.
  4. The emergence of generative artificial intelligence (GAI) has started to introduce a fundamental reexamination of established teaching methods. These GAI systems offer a chance for both educators and students to reevaluate their academic endeavors. Reevaluation of current practices is particularly pertinent in assessment within engineering instruction, where advanced generative text algorithms are proficient in addressing intricate challenges like those found in engineering courses. While this juncture presents a moment to revisit general assessment methods, the actual response of faculty to the incorporation of GAI in their evaluative techniques remains unclear. To investigate this, we have initiated a study delving into the mental constructs that engineering faculty hold about evaluation, focusing on their evolving attitudes and responses to GAI, as reported in the Fall of 2023. Adopting a long-term data-gathering strategy, we conducted a series of surveys, interviews, and recordings targeting the evaluative decision-making processes of a varied group of engineering educators across the United States. This paper presents the data collection process, our participants’ demographics, our data analysis plan, and initial findings based on the participants’ backgrounds, followed by our future work and potential implications. The analysis of the collected data will utilize qualitative thematic analysis in the next step of our study. Once we complete our study, we believe our findings will sketch the early stages of this emerging paradigm shift in the assessment of undergraduate engineering education, offering a novel perspective on the discourse surrounding evaluation strategies in the field. These insights are vital for stakeholders such as policymakers, educational leaders, and instructors, as they have significant ramifications for policy development, curriculum planning, and the broader dialogue on integrating GAI into educational evaluation. 
  5. In recent years, there has been a florescence of cross-cultural research using ethnographic and qualitative data. This cutting-edge work confronts a range of significant methodological challenges, but has not yet addressed how thematic analysis can be modified for use in cross-cultural ethnography. Thematic analysis is widely used in qualitative and mixed-methods research, yet is not currently well-adapted to cross-cultural ethnographic designs. We build on existing thematic analysis techniques to discuss a method to inductively identify metathemes (defined here as themes that occur across cultures). Identifying metathemes in cross-cultural research is important because metathemes enable researchers to use systematic comparisons to identify significant patterns in cross-cultural datasets and to describe those patterns in rich, contextually-specific ways. We demonstrate this method with data from a collaborative cross-cultural ethnographic research project (exploring weight-related stigma) that used the same sampling frame, interview protocol, and analytic process in four cross-cultural research sites in Samoa, Paraguay, Japan, and the United States. Detecting metathemes that transcend data collected in different languages, cultures, and sites, we discuss the benefits and challenges of qualitative metatheme analysis. 
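
Item 1 in the list above reports a cosine similarity score of up to 0.76 between manually analyzed outcomes and LLM-generated responses. The following is a minimal sketch of what such a score measures, assuming simple bag-of-words count vectors. The abstract does not state how the study represented its texts, so the tokenizer, the function names, and the two example strings are hypothetical and for illustration only.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count word tokens (illustrative tokenizer)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors.

    Returns 0.0 when either vector is empty, since the angle is undefined.
    """
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical theme descriptions, not taken from the study.
human_theme = "challenges collecting social network data"
llm_theme = "difficulties collecting social network data"

score = cosine_similarity(bag_of_words(human_theme), bag_of_words(llm_theme))
print(f"cosine similarity: {score:.2f}")  # prints "cosine similarity: 0.80"
```

The two example strings share four of five tokens, so the score is 4 / (√5 · √5) = 0.80. A sentence-embedding representation would likely give different values for the same pair, but the formula applied to the vectors is the same.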