
This content will become publicly available on May 1, 2025

Title: Can AI serve as a substitute for human subjects in software engineering research?
Research within sociotechnical domains, such as Software Engineering, fundamentally requires the human perspective. Nevertheless, traditional qualitative data collection methods suffer from difficulties in participant recruitment, scaling, and labor intensity. This vision paper proposes a novel approach to qualitative data collection in software engineering research by harnessing the capabilities of artificial intelligence (AI), especially large language models (LLMs) like ChatGPT and multimodal foundation models. We explore the potential of AI-generated synthetic text as an alternative source of qualitative data, discussing how LLMs can replicate human responses and behaviors in research settings. We discuss AI applications in emulating humans in interviews, focus groups, surveys, observational studies, and user evaluations. We discuss open problems and research opportunities to implement this vision. In the future, an integrated approach where both AI and human-generated data coexist will likely yield the most effective outcomes.
Award ID(s):
2303042 2236198 2303043 2235601
Journal Name:
Automated Software Engineering
Sponsoring Org:
National Science Foundation
More Like this
  1. Changing Electrical and Computer Engineering Department Culture from the Bottom Up: Action Plans Generated from Faculty Interviews
     In a collaborative effort between a Revolutionizing Engineering and Computer Science Departments (RED) National Science Foundation grant awarded to an electrical and computer engineering department (ECpE) and a broader, university-wide ADVANCE program, ECpE faculty were invited to participate in focus groups to evaluate the culture of their department, further department goals, and facilitate long-term planning. Forty-four ECpE faculty members from a large Midwestern university participated in these focus groups, which specifically addressed departmental support and challenges, distribution of resources, faculty workload, career/family balance, mentoring, faculty professional development, productivity, recruitment, and diversity. Faculty were interviewed in groups according to rank, and issues important to particular subcategories of faculty (e.g., rank, gender) were noted. Data were analyzed by a social scientist using the full transcript of each interview/focus group and the NVivo 12 qualitative research software. She presented the written report to the entire faculty. Based on the results of the focus groups, the ECpE department developed an action plan with six main thrusts for improving departmental culture and encouraging departmental change and transformation:
       1. Department Interactions – Encourage open dialogue and consider department retreats. Academic areas should be held accountable for the working environment and encouraged to discuss department-related issues.
       2. Mentoring, Promotion, and Evaluation – Continue mentoring junior faculty. Improve the clarity of P&T operational documents and seek faculty input on the evaluation system.
       3. Teaching Loads – Investigate teaching assistant (TA) allocation models and explore models for teaching loads. Develop a TA performance evaluation system and return TA support to the levels of the 2010 timeframe. Improvements to teaching evaluations should account for differential workloads; the department should also clarify expectations for senior advising and hire more faculty for undergraduate-heavy areas.
       4. Diversity, Equity, and Inclusion – Enact an explicit focus on diversity in hiring. Review departmental policies on inclusive teaching and learning environments.
       5. Building – Communicate with upper administration about the need for a new building. Explore possibilities for collaborating with Computer Science on a joint building.
       6. Support Staff – Increase communication with the department regarding new service delivery models. Request additional support for Human Resources, communications, and finance. Recognize staff excellence at the annual department banquet and through college/university awards.
  2. In graph machine learning, data collection, sharing, and analysis often involve multiple parties, each of which may require different levels of data security and privacy. Preserving privacy is therefore essential for protecting sensitive information. In the era of big data, the relationships among data entities have become unprecedentedly complex, and more applications use advanced data structures (i.e., graphs) that can represent network structures and their associated attribute information. To date, many graph-based AI models (e.g., graph neural networks) have been proposed for tasks in various domains, such as computer vision and natural language processing. In this paper, we review privacy-preserving techniques for graph machine learning, systematically covering related work from the data aspect to the computational aspect. We first review methods for generating privacy-preserving graph data. We then describe methods for transmitting privacy-preserved information (e.g., graph model parameters) to realize optimization-based computation when data sharing among multiple parties is risky or impossible. In addition to discussing relevant theoretical methodology and software tools, we discuss current challenges and highlight several possible future research opportunities for privacy-preserving graph machine learning. Finally, we envision a unified and comprehensive secure graph machine learning system.
  3. This research explores a novel human-in-the-loop approach that goes beyond traditional prompt engineering approaches to harness Large Language Models (LLMs) with chain-of-thought prompting for grading middle school students’ short answer formative assessments in science and generating useful feedback. While recent efforts have successfully applied LLMs and generative AI to automatically grade assignments in secondary classrooms, the focus has primarily been on providing scores for mathematical and programming problems with little work targeting the generation of actionable insight from the student responses. This paper addresses these limitations by exploring a human-in-the-loop approach to make the process more intuitive and more effective. By incorporating the expertise of educators, this approach seeks to bridge the gap between automated assessment and meaningful educational support in the context of science education for middle school students. We have conducted a preliminary user study, which suggests that (1) co-created models improve the performance of formative feedback generation, and (2) educator insight can be integrated at multiple steps in the process to inform what goes into the model and what comes out. Our findings suggest that in-context learning and human-in-the-loop approaches may provide a scalable approach to automated grading, where the performance of the automated LLM-based grader continually improves over time, while also providing actionable feedback that can support students’ open-ended science learning. 
  4. This paper assesses trending AI foundation models, especially emerging computer vision foundation models, and their performance in natural landscape feature segmentation. While the term "foundation model" has quickly garnered interest in the geospatial domain, its definition remains vague. Hence, this paper first introduces AI foundation models and their defining characteristics. Building on the tremendous success of Large Language Models (LLMs) as foundation models for language tasks, this paper discusses the challenges of building foundation models for geospatial artificial intelligence (GeoAI) vision tasks. To evaluate the performance of large AI vision models, especially Meta's Segment Anything Model (SAM), we implemented several instance segmentation pipelines that minimize changes to SAM in order to leverage its power as a foundation model. A series of prompt strategies was developed to test SAM's theoretical upper bound of predictive accuracy, its zero-shot performance, and its domain adaptability through fine-tuning. The analysis used two permafrost feature datasets, ice-wedge polygons and retrogressive thaw slumps, because (1) these landform features are more challenging to segment than man-made features due to their complicated formation mechanisms, diverse forms, and vague boundaries; and (2) their presence and changes are important indicators of Arctic warming and climate change. The results show that, although promising, SAM still has room for improvement in supporting AI-augmented terrain mapping. The spatial and domain generalizability of this finding is further validated using a more general dataset, EuroCrops, for agricultural field mapping. Finally, we discuss future research directions that strengthen SAM's applicability in challenging geospatial domains.

  5. Artificial Intelligence (AI) is a transformative force in communication and messaging strategy, with the potential to disrupt traditional approaches. Large language models (LLMs), a form of AI, are capable of generating high-quality, humanlike text. We investigate the persuasive quality of AI-generated messages to understand how AI could impact public health messaging. Specifically, through a series of studies designed to characterize and evaluate generative AI in developing public health messages, we analyze COVID-19 pro-vaccination messages generated by GPT-3, a state-of-the-art instantiation of a large language model. Study 1 is a systematic evaluation of GPT-3's ability to generate pro-vaccination messages. Study 2 then examined people's perceptions of curated GPT-3-generated messages compared with human-authored messages released by the CDC (Centers for Disease Control and Prevention), finding that GPT-3 messages were perceived as more effective and as stronger arguments, and evoked more positive attitudes, than CDC messages. Finally, Study 3 assessed the role of source labels on perceived quality, finding that while participants preferred AI-generated messages, they expressed a dispreference for messages that were labeled as AI-generated. The results suggest that, with human supervision, AI can be used to create effective public health messages, but that individuals prefer their public health messages to come from human institutions rather than AI sources. We propose best practices for assessing the generative outputs of large language models in future social science research, and ways health professionals can use AI systems to augment public health messaging.
