Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources that we are using to train and test cybersecurity entity models using the spaCy framework and exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools, and create methods for continuous integration of new information extracted from text.
more »
« less
Combating Fake Cyber Threat Intelligence using Provenance in Cybersecurity Knowledge Graphs
Today there is a significant amount of fake cybersecurity related intelligence on the internet. To filter out such information, we build a system to capture the provenance information and represent it along with the captured Cyber Threat Intelligence (CTI). In the cybersecurity domain, such CTI is stored in Cybersecurity Knowledge Graphs (CKG). We enhance the exiting CKG model to incorporate intelligence provenance and fuse provenance graphs with CKG. This process includes modifying traditional approaches to entity and relation extraction. CTI data is considered vital in securing our cyberspace. Knowledge graphs containing CTI information along with its provenance can provide expertise to dependent Artificial Intelligence (AI) systems and human analysts.
more »
« less
- Award ID(s):
- 2133190
- PAR ID:
- 10321535
- Date Published:
- Journal Name:
- 2021 IEEE International Conference on Big Data (Big Data)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Security engineers and researchers use their disparate knowledge and discretion to identify malware present in a system. Sometimes, they may also use previously extracted knowledge and available Cyber Threat Intelligence (CTI) about known attacks to establish a pattern. To aid in this process, they need knowledge about malware behavior mapped to the available CTI. Such mappings enrich our representations and also helps verify the information. In this paper, we describe how we retrieve malware samples and execute them in a local system. The tracked malware behavior is represented in our Cybersecurity Knowledge Graph (CKG), so that a security professional can reason with behavioral information present in the graph and draw parallels with that information. We also merge the behavioral information with knowledge extracted from the text in CTI sources like technical reports and blogs about the same malware to improve the reasoning capabilities of our CKG significantly.more » « less
-
Cyber-defense systems are being developed to automatically ingest Cyber Threat Intelligence (CTI) that contains semi-structured data and/or text to populate knowledge graphs. A potential risk is that fake CTI can be generated and spread through Open-Source Intelligence (OSINT) communities or on the Web to effect a data poisoning attack on these systems. Adversaries can use fake CTI examples as training input to subvert cyber defense systems, forcing the model to learn incorrect inputs to serve their malicious needs. In this paper, we automatically generate fake CTI text descriptions using transformers. We show that given an initial prompt sentence, a public language model like GPT-2 with fine-tuning, can generate plausible CTI text with the ability of corrupting cyber-defense systems. We utilize the generated fake CTI text to perform a data poisoning attack on a Cybersecurity Knowledge Graph (CKG) and a cybersecurity corpus. The poisoning attack introduced adverse impacts such as returning incorrect reasoning outputs, representation poisoning, and corruption of other dependent AI-based cyber defense systems. We evaluate with traditional approaches and conduct a human evaluation study with cybersecurity professionals and threat hunters. Based on the study, professional threat hunters were equally likely to consider our fake generated CTI as true.more » « less
-
null (Ed.)The implementation of Internet of Things (IoT) devices in medical environments, has introduced a growing list of security vulnerabilities and threats. The lack of an extensible big data resource that captures medical device vulnerabilities limits the use of Artificial Intelligence (AI) based cyber defense systems in capturing, detecting, and preventing known and future attacks. We describe a system that generates a repository of Cyber Threat Intelligence (CTI) about various medical devices and their known vulnerabilities from sources such as manufacturer and ICS-CERT vulnerability alerts. We augment the intelligence repository with data sources such as Wikidata and public medical databases. The combined resources are integrated with threat intelligence in our Cybersecurity Knowledge Graph (CKG) from previous research. The augmented graph embeddings are useful in querying relevant information and can help in various AI assisted cybersecurity tasks. Given the integration of multiple resources, we found the augmented CKG produced higher quality graph representations. The augmented CKG produced a 31% increase in the Mean Average Precision (MAP) value, computed over an information retrieval task.more » « less
-
Despite significant contributions to various aspects of cybersecurity, cyber-attacks remain on the unfortunate rise. Increasingly, internationally recognized entities such as the National Science Foundation and National Science & Technology Council have noted Artificial Intelligence can help analyze billions of log files, Dark Web data, malware, and other data sources to help execute fundamental cybersecurity tasks. Our objective for the 1st Workshop on Artificial Intelligence-enabled Cybersecurity Analytics (half-day; co-located with ACM KDD) was to gather academic and practitioners to contribute recent work pertaining to AI-enabled cybersecurity analytics. We composed an outstanding, inter-disciplinary Program Committee with significant expertise in various aspects of AI-enabled Cybersecurity Analytics to evaluate the submitted work. Significant contributions to the half-day workshop were made in the areas of CTI, vulnerability assessment, and malware analysis.more » « less