

This content will become publicly available on October 17, 2026

Title: Scale, Engage, or Both?: Potential and Perils of Applying Large Language Models in Interview and Conversation-Based Research
An increasing number of studies apply tools powered by large language models (LLMs) to interview and conversation-based research, one of the most commonly used research methods in CSCW. This panel invites the CSCW community to critically debate the role of LLMs in reshaping interview-based methods. We aim to explore how these tools might (1) address persistent challenges in conversation-based research, such as limited scalability and participant engagement, (2) introduce novel methodological possibilities, and (3) surface additional practical, technical, and ethical concerns. The panel discussion will be grounded in the panelists' prior experience applying LLMs to their own interview and conversation-based research. We ask whether LLMs offer unique advantages for enhancing interview research, beyond automating certain aspects of the research process. Through this discussion, we encourage researchers to reflect on how applying LLM tools may require rethinking research design, conversational protocols, and ethical practices.
Award ID(s):
2009003
PAR ID:
10654085
Author(s) / Creator(s):
Publisher / Repository:
ACM
Date Published:
Page Range / eLocation ID:
95 to 98
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Abstract: The general public and scientific community alike are abuzz over the release of ChatGPT and GPT-4. Among the many concerns raised about the emergence and widespread use of tools based on large language models (LLMs) is their potential to propagate biases and inequities. We hope to open a conversation within the environmental data science community to encourage the circumspect and responsible use of LLMs. Here, we pose a series of questions aimed at fostering discussion and initiating a larger dialogue. To improve literacy on these tools, we provide background information on the LLMs that underpin tools like ChatGPT. We identify key areas in research and teaching in environmental data science where these tools may be applied, and discuss limitations to their use and points of concern. We also discuss ethical considerations surrounding the use of LLMs to ensure that, as environmental data scientists, researchers, and instructors, we can make well-considered and informed choices about engagement with these tools. Our goal is to spark forward-looking discussion and research on how, as a community, we can responsibly integrate generative AI technologies into our work.
  2. The rapid development and deployment of generative AI technologies create a design challenge: how to proactively understand the implications of productizing and deploying these new technologies, especially their potential negative consequences. This is especially concerning in CSCW applications, where AI agents can introduce misunderstandings or even misdirections for the people interacting with them. In this panel, researchers from academia and industry will reflect on their experiences with ideas, methods, and processes that enable designers to proactively shape the responsible design of genAI in collaborative applications. The panelists represent a range of approaches, including speculative fiction, design activities, design toolkits, and process guides. We hope that the panel encourages a discussion in the CSCW community around techniques we can put into practice today to enable the responsible design of genAI.
  3. Purpose: This conversation presents reflections from five prominent disaster scholars and practitioners on the opportunities and challenges associated with research following disasters, and explores the importance of ethics in disaster research. Design/methodology/approach: This paper is based on conversations that took place on the Disasters: Deconstructed Podcast livestream on 11 June 2021. Findings: The prominent themes in this conversation include ethical approaches to research; how we, as disaster researchers and practitioners, collaborate, engage, and cooperate; and whose voices are centred in a post-disaster research context. Originality/value: The conversation contributes to ongoing discussions around the conduct and practice of disaster research.
  4. Although Large Language Models (LLMs) succeed in human-guided conversations such as instruction following and question answering, the potential of LLM-guided conversations, in which LLMs direct the discourse and steer the conversation's objectives, remains largely untapped. In this study, we explore the LLM-guided conversation paradigm. Specifically, we first characterize LLM-guided conversation by three fundamental properties: (i) goal navigation; (ii) context management; and (iii) empathetic engagement, and propose GuideLLM as a general framework for LLM-guided conversation. We then implement an autobiography interviewing environment, a common practice in reminiscence therapy, as one demonstration of GuideLLM. In this environment, various techniques are integrated with GuideLLM to enhance the autonomy of LLMs, such as a Verbalized Interview Protocol (VIP) and Memory Graph Extrapolation (MGE) for goal navigation, and therapy strategies for empathetic engagement. We compare GuideLLM with baseline LLMs, such as GPT-4-turbo and GPT-4o, in terms of interviewing quality, conversation quality, and autobiography generation quality. Experimental results, encompassing both LLM-as-a-judge evaluations and human subject experiments involving 45 participants, indicate that GuideLLM significantly outperforms baseline LLMs on the autobiography interviewing task.
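    To make the paradigm concrete, a minimal sketch of an LLM-guided interview loop appears below, illustrating the three properties named in the abstract: goal navigation via an ordered topic protocol, context management via a running message history, and empathetic engagement via the system prompt. It assumes an OpenAI-style chat API; the names INTERVIEW_GOALS, SYSTEM_PROMPT, and run_guided_interview are illustrative assumptions, not the GuideLLM implementation.

```python
# Hypothetical sketch of an LLM-guided autobiography interview loop.
# This is NOT the GuideLLM codebase; it only illustrates the three
# properties described above (goal navigation, context management,
# empathetic engagement) with the OpenAI chat completions API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Goal navigation: an ordered protocol of life-story topics to cover.
INTERVIEW_GOALS = ["childhood", "education", "career", "family", "reflections"]

# Empathetic engagement: the system prompt sets the interviewer's tone.
SYSTEM_PROMPT = (
    "You are an empathetic autobiography interviewer. Ask one open-ended "
    "question at a time about the current topic, acknowledge the "
    "participant's feelings, and gently steer back if the talk drifts."
)

def run_guided_interview(turns_per_goal: int = 3) -> list[dict]:
    # Context management: the message history doubles as interview memory.
    history = [{"role": "system", "content": SYSTEM_PROMPT}]
    for goal in INTERVIEW_GOALS:
        history.append({"role": "system", "content": f"Current topic: {goal}"})
        for _ in range(turns_per_goal):
            reply = client.chat.completions.create(
                model="gpt-4o", messages=history
            )
            question = reply.choices[0].message.content
            print(f"Interviewer: {question}")
            answer = input("Participant: ")  # human subject responds
            history.append({"role": "assistant", "content": question})
            history.append({"role": "user", "content": answer})
    return history  # raw transcript for later autobiography generation
```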
  5. Few studies have compared Large Language Models (LLMs) to traditional Machine Learning (ML)-based automated scoring methods in terms of accuracy, ethics, and economics. Using a corpus of 1,000 expert-scored and interview-validated scientific explanations derived from the ACORNS instrument, this study employed three LLMs and the ML-based scoring engine EvoGrader. We measured scoring reliability (percentage agreement, kappa, precision, recall, F1) and processing time, and explored contextual factors such as ethics and cost. Results showed that, with very basic prompt engineering, ChatGPT-4o achieved the highest performance among the LLMs. Proprietary LLMs outperformed open-weight LLMs for most concepts. GPT-4o achieved robust scoring, but it was less accurate than EvoGrader (~500 additional scoring errors). Ethical concerns over data ownership, reliability, and replicability over time were limitations of the LLMs. EvoGrader offered superior accuracy, reliability, and replicability, but its development required a large, high-quality, human-scored corpus, domain expertise, and restricted assessment items. These findings highlight the range of considerations that should be weighed when choosing between LLM and ML scoring in science education. Despite impressive LLM advances, ML approaches may remain valuable in some contexts, particularly those prioritizing precision, reliability, replicability, privacy, and controlled implementation.
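    For readers unfamiliar with the reliability metrics named above, the sketch below computes percentage agreement, Cohen's kappa, precision, recall, and F1 between expert and machine scores using scikit-learn. The score lists are invented for illustration and do not reproduce the ACORNS corpus or EvoGrader outputs.

```python
# Hypothetical sketch of the reliability metrics named in the abstract,
# computed between expert scores and machine scores with scikit-learn.
from sklearn.metrics import (
    accuracy_score,
    cohen_kappa_score,
    precision_recall_fscore_support,
)

# Invented binary scores (1 = target concept present) for ten responses.
expert_scores = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
machine_scores = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Percentage agreement is the proportion of exact matches (accuracy).
agreement = accuracy_score(expert_scores, machine_scores)
# Cohen's kappa corrects that agreement for chance.
kappa = cohen_kappa_score(expert_scores, machine_scores)
# Precision, recall, and F1 for the "concept present" class.
precision, recall, f1, _ = precision_recall_fscore_support(
    expert_scores, machine_scores, average="binary"
)

print(f"agreement={agreement:.2f} kappa={kappa:.2f} "
      f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```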