Human actions or lack thereof contribute to a large majority of cybersecurity incidents. Traditionally, when looking for advice on cybersecurity questions, people have turned to search engines or social sites like Reddit. The rapid adoption of chatbot technologies is offering a potentially more direct way of getting similar advice. Initial research suggests, however, that while chatbot answers to common cybersecurity questions tend to be fairly accurate, they may not be very effective as they often fall short on other desired qualities such as understandability, actionability, or motivational power. Research in this area thus far has been limited to the evaluation by researchers themselves on a small number of synthetic questions. This article reports on what we believe to be the first in situ evaluation of a cybersecurity Question Answering (QA) assistant. We also evaluate a prompt engineered to help the cybersecurity QA assistant generate more effective answers. The study involved a 10-day deployment of a cybersecurity QA assistant in the form of a Chrome extension. Collectively, participants (N=51) evaluated answers generated by the assistant to over 1,000 cybersecurity questions they submitted as part of their regular day-to-day activities. The results suggest that a majority of participants found the assistant useful and often took actions based on the answers they received. In particular, the study indicates that prompting successfully improved the effectiveness of answers and, in particular, the likelihood that users follow their recommendations (fraction of participants who actually followed the advice was 0.514 with prompting vs. 0.402 without prompting, p=4.61E-04), an impact on people’s actual behavior. We provide a detailed analysis of data collected in this study, discuss their implications, and outline next steps in the development and deployment of effective cybersecurity QA assistants that offer the promise of changing actual user behavior and of reducing human-related security incidents.
more »
« less
This content will become publicly available on February 24, 2026
Can a Cybersecurity Question Answering Assistant Help Change User Behavior? An In Situ Study
Human actions or lack thereof contribute to a large majority of cybersecurity incidents. Traditionally, when looking for advice on cybersecurity questions, people have turned to search engines or social sites like Reddit. The rapid adoption of chatbot technologies is offering a potentially more direct way of getting similar advice. Initial research suggests, however, that while chatbot answers to common cybersecurity questions tend to be fairly accurate, they may not be very effective as they often fall short on other desired qualities such as understandability, actionability, or motivational power. Research in this area thus far has been limited to the evaluation by researchers themselves on a small number of synthetic questions. This article reports on what we believe to be the first in situ evaluation of a cybersecurity Question Answering (QA) assistant. We also evaluate a prompt engineered to help the cybersecurity QA assistant generate more effective answers. The study involved a 10-day deployment of a cybersecurity QA assistant in the form of a Chrome extension. Collectively, participants (N=51) evaluated answers generated by the assistant to over 1,000 cybersecurity questions they submitted as part of their regular day-to-day activities. The results suggest that a majority of participants found the assistant useful and often took actions based on the answers they received. In particular, the study indicates that prompting successfully improved the effectiveness of answers and, in particular, the likelihood that users follow their recommendations (fraction ofparticipants who actually followed the advice was 0.514 with prompting vs. 0.402 without prompting, p=4.61E-04), an impacton people’s actual behavior. We provide a detailed analysis of data collected in this study, discuss their implications, and outline next steps in the development and deployment of effective cybersecurity QA assistants that offer the promise of changing actual user behavior and of reducing human-related security incidents.
more »
« less
- Award ID(s):
- 1914486
- PAR ID:
- 10598526
- Publisher / Repository:
- Symposium on Usable Security and Privacy (USEC) 2025 - NDSS Symposium
- Date Published:
- ISBN:
- 979-8-9919276-5-9
- Subject(s) / Keyword(s):
- Cybersecurity Question Answering In-Situ Pilot LLM Protection Motivation Theory Answer Effectiveness
- Format(s):
- Medium: X
- Location:
- San Diego, CA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Building a skilled cybersecurity workforce is paramount to building a safer digital world. However, the diverse skill set, constantly emerging vulnerabilities, and deployment of new cyber threats make learning cybersecurity challenging. Traditional education methods struggle to cope with cybersecurity's rapidly evolving landscape and keep students engaged and motivated. Different studies on students' behaviors show that an interactive mode of education by engaging through a question-answering system or dialoguing is one of the most effective learning methodologies. There is a strong need to create advanced AI-enabled education tools to promote interactive learning in cybersecurity. Unfortunately, there are no publicly available standard question-answer datasets to build such systems for students and novice learners to learn cybersecurity concepts, tools, and techniques. The education course material and online question banks are unstructured and need to be validated and updated by domain experts, which is tedious when done manually. In this paper, we propose CyberGen, a novel unification of large language models (LLMs) and knowledge graphs (KG) to generate the questions and answers for cybersecurity automatically. Augmenting the structured knowledge from knowledge graphs in prompts improves factual reasoning and reduces hallucinations in LLMs. We used the knowledge triples from cybersecurity knowledge graphs (AISecKG) to design prompts for ChatGPT and generate questions and answers using different prompting techniques. Our question-answer dataset, CyberQ, contains around 4k pairs of questions and answers. The domain expert manually evaluated the random samples for consistency and correctness. We train the generative model using the CyberQ dataset for question answering task.more » « less
-
Human users are often the weakest link in cybersecurity, with a large percentage of security breaches attributed to some kind of human error. When confronted with everyday cybersecurity questions - or any other question for that matter, users tend to turn to their search engines, online forums, and, recently, chatbots. We report on a study on the effectiveness of answers generated by two popular chatbots to an initial set of questions related to typical cybersecurity challenges faced by users (e.g., phishing, use of VPN, multi-factor authentication). The study does not only look at the accuracy of the answers generated by the chatbots but also at whether these answers are understandable, whether they are likely to motivate users to follow any provided recommendations, and whether these recommendations are actionable. Surprisingly enough, this initial study suggests that state-of-the-art chatbots are already reasonably good at providing accurate answers to common cybersecurity questions. Yet the study also suggests that the chatbots are not very effective when it comes to generating answers that are relevant, actionable, and, most importantly, likely to motivate users to heed their recommendations. The study proceeds with the design and evaluation of prompt engineering techniques intended to improve the effectiveness of answers generated by the chatbots. Initial results suggest that it is possible to improve the effectiveness of answers and, in particular, their likelihood of motivating users to heed recommendations, and their ability to act upon these recommendations without diminishing their accuracy. We discuss the implications of these initial results and plans for future work in this area.more » « less
-
A flaw in QA evaluation is that annotations often only provide one gold answer. Thus, model predictions semantically equivalent to the answer but superficially different are considered incorrect. This work explores mining alias entities from knowledge bases and using them as additional gold answers (i.e., equivalent answers). We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers. We analyse three QA benchmarks: Natural Questions, TriviaQA and SQuAD. Answer expansion increases the exact match score on all datasets for evaluation, while incorporating it helps model training over real-world datasets. We ensure the additional answers are valid through a human post hoc evaluation.more » « less
-
Naturally-occurring information-seeking questions often contain questionable assumptions -- assumptions that are false or unverifiable. Questions containing questionable assumptions are challenging because they require a distinct answer strategy that deviates from typical answers to information-seeking questions. For instance, the question "When did Marie Curie discover Uranium?" cannot be answered as a typical when question without addressing the false assumption "Marie Curie discovered Uranium". In this work, we propose (QA)2 (Question Answering with Questionable Assumptions), an open-domain evaluation dataset consisting of naturally-occurring search engine queries that may or may not contain questionable assumptions. To be successful on (QA)2, systems must be able to detect questionable assumptions and also be able to produce adequate responses for both typical information-seeking questions and ones with questionable assumptions. We find that current models do struggle with handling questionable assumptions -- the best performing model achieves 59% human rater acceptability on abstractive QA with (QA)2 questions, leaving substantial headroom for progress.more » « less
An official website of the United States government
