skip to main content


Title: Building Knowledge Graphs from Unstructured Texts: Applications and Impact Analyses in Cybersecurity Education
Knowledge graphs gained popularity in recent years and have been useful for concept visualization and contextual information retrieval in various applications. However, constructing a knowledge graph by scraping long and complex unstructured texts for a new domain in the absence of a well-defined ontology or an existing labeled entity-relation dataset is difficult. Domains such as cybersecurity education can harness knowledge graphs to create a student-focused interactive and learning environment to teach cybersecurity. Learning cybersecurity involves gaining the knowledge of different attack and defense techniques, system setup and solving multi-facet complex real-world challenges that demand adaptive learning strategies and cognitive engagement. However, there are no standard datasets for the cybersecurity education domain. In this research work, we present a bottom-up approach to curate entity-relation pairs and construct knowledge graphs and question-answering models for cybersecurity education. To evaluate the impact of our new learning paradigm, we conducted surveys and interviews with students after each project to find the usefulness of bot and the knowledge graphs. Our results show that students found these tools informative for learning the core concepts and they used knowledge graphs as a visual reference to cross check the progress that helped them complete the project tasks.  more » « less
Award ID(s):
2114789
NSF-PAR ID:
10401615
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Information
Volume:
13
Issue:
11
ISSN:
2078-2489
Page Range / eLocation ID:
526
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Agrawal, Garima (Ed.)
    Cybersecurity education is exceptionally challenging as it involves learning the complex attacks; tools and developing critical problem-solving skills to defend the systems. For a student or novice researcher in the cybersecurity domain, there is a need to design an adaptive learning strategy that can break complex tasks and concepts into simple representations. An AI-enabled automated cybersecurity education system can improve cognitive engagement and active learning. Knowledge graphs (KG) provide a visual representation in a graph that can reason and interpret from the underlying data, making them suitable for use in education and interactive learning. However, there are no publicly available datasets for the cybersecurity education domain to build such systems. The data is present as unstructured educational course material, Wiki pages, capture the flag (CTF) writeups, etc. Creating knowledge graphs from unstructured text is challenging without an ontology or annotated dataset. However, data annotation for cybersecurity needs domain experts. To address these gaps, we made three contributions in this paper. First, we propose an ontology for the cybersecurity education domain for students and novice learners. Second, we develop AISecKG, a triple dataset with cybersecurity-related entities and relations as defined by the ontology. This dataset can be used to construct knowledge graphs to teach cybersecurity and promote cognitive learning. It can also be used to build downstream applications like recommendation systems or self-learning question-answering systems for students. The dataset would also help identify malicious named entities and their probable impact. Third, using this dataset, we show a downstream application to extract custom-named entities from texts and educational material on cybersecurity. 
    more » « less
  2. Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools, and create methods for continuous integration of new information extracted from text. 
    more » « less
  3. Today there is a significant amount of fake cybersecurity related intelligence on the internet. To filter out such information, we build a system to capture the provenance information and represent it along with the captured Cyber Threat Intelligence (CTI). In the cybersecurity domain, such CTI is stored in Cybersecurity Knowledge Graphs (CKG). We enhance the exiting CKG model to incorporate intelligence provenance and fuse provenance graphs with CKG. This process includes modifying traditional approaches to entity and relation extraction. CTI data is considered vital in securing our cyberspace. Knowledge graphs containing CTI information along with its provenance can provide expertise to dependent Artificial Intelligence (AI) systems and human analysts. 
    more » « less
  4. Knowledge graphs (KGs) are of great importance in various artificial intelligence systems, such as question answering, relation extraction, and recommendation. Nevertheless, most real-world KGs are highly incomplete, with many missing relations between entities. To discover new triples (i.e., head entity, relation, tail entity), many KG completion algorithms have been proposed in recent years. However, a vast majority of existing studies often require a large number of training triples for each relation, which contradicts the fact that the frequency distribution of relations in KGs often follows a long tail distribution, meaning a majority of relations have only very few triples. Meanwhile, since most existing large-scale KGs are constructed automatically by extracting information from crowd-sourcing data using heuristic algorithms, plenty of errors could be inevitably incorporated due to the lack of human verification, which greatly reduces the performance for KG completion. To tackle the aforementioned issues, in this paper, we study a novel problem of error-aware few-shot KG completion and present a principled KG completion framework REFORM. Specifically, we formulate the problem under the few-shot learning framework, and our goal is to accumulate meta-knowledge across different meta-tasks and generalize the accumulated knowledge to the meta-test task for error-aware few-shot KG completion. To address the associated challenges resulting from insufficient training samples and inevitable errors, we propose three essential modules neighbor encoder, cross-relation aggregation, and error mitigation in each meta-task. Extensive experiments on three widely used KG datasets demonstrate the superiority of the proposed framework REFORM over competitive baseline methods. 
    more » « less
  5. Hands-on practice is a critical component of cybersecurity education. Most of the existing hands-on exercises or labs materials are usually managed in a problem-centric fashion, while it lacks a coherent way to manage existing labs and provide productive lab exercising plans for cybersecurity learners. With the advantages of big data and natural language processing (NLP) technologies, constructing a large knowledge graph and mining concepts from unstructured text becomes possible, which motivated us to construct a machine learning based lab exercising plan for cybersecurity education. In the research presented by this paper, we have constructed a knowledge graph in the cybersecurity domain using NLP technologies including machine learning based word embedding and hyperlink-based concept mining. We then utilized the knowledge graph during the regular learning process based on the following approaches: 1. We constructed a web-based front-end to visualize the knowledge graph, which allows students to browse and search cybersecurity-related concepts and the corresponding interdependence relations; 2. We created a personalized knowledge graph for each student based on their learning progress and status; 3.We built a personalized lab recommendation system by suggesting more relevant labs based on students’ past learning history to maximize their learning outcomes. To measure the effectiveness of the proposed solution, we have conducted a use case study and collected survey data from a graduate-level cybersecurity class. Our study shows that, by leveraging the knowledge graph for the cybersecurity area study, students tend to benefit more and show more interests in cybersecurity area. 
    more » « less