Title: Evolution and differentiation of the cybersecurity communities in three social question and answer sites: A mixed-methods analysis
Cybersecurity affects us all in our daily lives. New knowledge on best practices, new vulnerabilities, and timely fixes for cybersecurity issues is growing super-linearly and is spread across numerous, heterogeneous sources. Because of that, community contribution-based question and answer sites have become clearinghouses for cybersecurity-related inquiries, as they have for many other topics. Historically, Stack Overflow has been the most popular platform for different kinds of technical questions, including cybersecurity questions. That has been changing, however, with the advent of Security Stack Exchange, a site specifically designed for cybersecurity-related questions and answers. More recently, some cybersecurity-related subreddits on Reddit have become hubs for cybersecurity-related questions and discussions. The availability of multiple overlapping communities has created a complex terrain to navigate for someone looking for an answer to a cybersecurity question. In this paper, we investigate how and why people choose among three prominent, overlapping question and answer communities for their cybersecurity knowledge needs. We aggregated several consecutive years of cybersecurity-related questions from Stack Overflow, Security Stack Exchange, and Reddit, and performed statistical, linguistic, and longitudinal analyses. To triangulate the results, we also conducted user surveys. We found that user behavior differs across the three communities in most cases. Likewise, the cybersecurity-related questions asked on the three sites differ: they are more technical on Security Stack Exchange and Stack Overflow, and more subjective and personal on Reddit. Moreover, there appears to have been a differentiation of the communities along the same lines, accompanied by overall popularity trends suggestive of Stack Overflow’s decline and Security Stack Exchange’s rise within the cybersecurity community. Reddit is addressing the more subjective, discussion-type needs of the lay community, and is growing rapidly.
Award ID(s):
1840191
NSF-PAR ID:
10393465
Author(s) / Creator(s):
; ;
Editor(s):
Haldorai, Anandakumar
Date Published:
Journal Name:
PLOS ONE
Volume:
16
Issue:
12
ISSN:
1932-6203
Page Range / eLocation ID:
e0261954
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Directed graphs have been widely used in Community Question Answering services (CQAs) to model asymmetric relationships among different types of nodes in CQA graphs, e.g., question, answer, user. Asymmetric transitivity is an essential property of directed graphs, since it can play an important role in downstream graph inference and analysis. Question difficulty and user expertise follow the characteristic of asymmetric transitivity. Maintaining such properties, while reducing the graph to a lower dimensional vector embedding space, has been the focus of much recent research. In this paper, we tackle the challenge of directed graph embedding with asymmetric transitivity preservation and then leverage the proposed embedding method to solve a fundamental task in CQAs: how to appropriately route and assign newly posted questions to users with suitable expertise and interest. The technique incorporates graph hierarchy and reachability information naturally by relying on a nonlinear transformation that operates on the core reachability and implicit hierarchy within such graphs. Subsequently, the methodology leverages a factorization-based approach to generate two embedding vectors for each node within the graph, to capture the asymmetric transitivity. Extensive experiments show that our framework consistently and significantly outperforms the state-of-the-art baselines on three diverse real-world tasks: link prediction, question difficulty estimation, and expert finding in online forums like Stack Exchange. In particular, our framework can support inductive embedding learning for newly posted questions (unseen nodes during training), and therefore can properly route and assign these kinds of questions to experts in CQAs.
  2. The paper presents results from a pilot questionnaire-based study on ten Stack Overflow (SO) questions. Eleven developers were tasked with determining if the SO question sentiment was positive, negative or neutral. The results from the questionnaire indicate that developers mostly rated the sentiment of SO questions as neutral, stating that they received little or no emotional feedback from the questions. Tools that were designed to analyze Software Engineering related texts (SentiStrength-SE, SentiCR, and Senti4SD) were on average more closely aligned with developer ratings for a majority of the questions than general purpose tools for detecting SO question sentiment. We discuss cases where tools and developer sentiment differ along with implications of the results. Overall, the sentiment tool output on the question title and body is more aligned with the developer rating than the output on the title alone. Since SO is a very common medium of technical exchange, we also report that adding code snippets, short titles, and multiple tags were the top three features developers prefer in SO questions in order for them to be answered quickly.
  3. Large-scale quantitative analyses have shown that individuals frequently talk to each other about similar things in different online spaces. Why do these overlapping communities exist? We provide an answer grounded in the analysis of 20 interviews with active participants in clusters of highly related subreddits. Within a broad topical area, there is a diversity of benefits an online community can confer. These include (a) specific information and discussion, (b) socialization with similar others, and (c) attention from the largest possible audience. A single community cannot meet all three needs. Our findings suggest that topical areas within an online community platform tend to become populated by groups of specialized communities with diverse sizes, topical boundaries, and rules. Compared with any single community, such systems of overlapping communities are able to provide a greater range of benefits.
  4. The ARQMath Lab at CLEF 2020 considers the problem of finding answers to new mathematical questions among posted answers on a community question answering site (Math Stack Exchange). Queries are question postings held out from the test collection, each containing both text and at least one formula. We expect this to be a challenging task, as both math and text may be needed to find relevant answer posts. While several models have been proposed for text question answering, math question answering is in an earlier stage of development. To advance math-aware search and mathematical question answering systems, we will create a standard test collection for researchers to use for benchmarking. ARQMath will also include a formula retrieval sub-task: individual formulas from question posts are used to locate formulas in earlier answer posts, with relevance determined by narrative fields created based on the original question. We will use these narrative fields to explore diverse information needs for formula search (e.g., alternative notation, applications in specific fields or definition). 
  5. Schmorrow, D. ; Fidopiastis, C. (Ed.)
    As security measures to protect against cyberattacks increase, hackers have begun to target the weakest link in the cybersecurity chain: people. Such attacks are categorized as Social Engineering and rely on the manipulation and deception of people rather than technical security flaws [4]. This study attempts to examine the relationship between people and their vulnerability to Social Engineering attacks by posing the following questions: (1) what relationship, if any, exists between personality traits and Social Engineering vulnerability, and (2) what relationship, if any, exists between personality traits and the speed at which an individual makes cybersecurity-related decisions. To answer these questions, 79 undergraduate students at the University of Hawaii were surveyed to measure their personality traits and cybersecurity awareness. The survey results indicated that there was no significant correlation between the measured personality traits and measured vulnerability. The relationship between different personality traits and the elapsed time to complete the survey was slightly stronger; however, it was still not statistically significant overall.