skip to main content


Title: LINKING EXPLOITS FROM THE DARK WEB TO KNOWN VULNERABILITIES FOR PROACTIVE CYBER THREAT INTELLIGENCE: AN ATTENTION-BASED DEEP STRUCTURED SEMANTIC MODEL1
Black hat hackers use malicious exploits to circumvent security controls and take advantage of system vulnerabilities worldwide, costing the global economy over $450 billion annually. While many organizations are increasingly turning to cyber threat intelligence (CTI) to help prioritize their vulnerabilities, extant CTI processes are often criticized as being reactive to known exploits. One promising data source that can help develop proactive CTI is the vast and ever-evolving Dark Web. In this study, we adopted the computational design science paradigm to design a novel deep learning (DL)-based exploit-vulnerability attention deep structured semantic model (EVA-DSSM) that includes bidirectional processing and attention mechanisms to automatically link exploits from the Dark Web to vulnerabilities. We also devised a novel device vulnerability severity metric (DVSM) that incorporates the exploit post date and vulnerability severity to help cybersecurity professionals with their device prioritization and risk management efforts. We rigorously evaluated the EVA-DSSM against state-of-the-art non-DL and DL-based methods for short text matching on 52,590 exploit-vulnerability linkages across four testbeds: web application, remote, local, and denial of service. Results of these evaluations indicate that the proposed EVA-DSSM achieves precision at 1 scores 20% - 41% higher than non-DL approaches and 4% - 10% higher than DL-based approaches. We demonstrated the EVA-DSSM’s and DVSM’s practical utility with two CTI case studies: openly accessible systems in the top eight U.S. hospitals and over 20,000 Supervisory Control and Data Acquisition (SCADA) systems worldwide. A complementary user evaluation of the case study results indicated that 45 cybersecurity professionals found the EVA-DSSM and DVSM results more useful for exploit-vulnerability linking and risk prioritization activities than those produced by prevailing approaches. Given the rising cost of cyberattacks, the EVA-DSSM and DVSM have important implications for analysts in security operations centers, incident response teams, and cybersecurity vendors.  more » « less
Award ID(s):
1917117 1921485 1303362
NSF-PAR ID:
10336823
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
MIS quarterly
Volume:
46
Issue:
2
ISSN:
0276-7783
Page Range / eLocation ID:
911-946
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Black hat hackers use malicious exploits to circumvent security controls and take advantage of system vulnerabilities worldwide, costing the global economy over $450 billion annually. While many organizations are increasingly turning to cyber threat intelligence (CTI) to help prioritize their vulnerabilities, extant CTI processes are often criticized as being reactive to known exploits. One promising data source that can help develop proactive CTI is the vast and ever-evolving Dark Web. In this study, we adopted the computational design science paradigm to design a novel deep learning (DL)-based exploit-vulnerability attention deep structured semantic model (EVA-DSSM) that includes bidirectional processing and attention mechanisms to automatically link exploits from the Dark Web to vulnerabilities. We also devised a novel device vulnerability severity metric (DVSM) that incorporates the exploit post date and vulnerability severity to help cybersecurity professionals with their device prioritization and risk management efforts. We rigorously evaluated the EVA-DSSM against state-of-the-art non-DL and DL-based methods for short text matching on 52,590 exploit-vulnerability linkages across four testbeds: web application, remote, local, and denial of service. Results of these evaluations indicate that the proposed EVA-DSSM achieves precision at 1 scores 20%-41% higher than non-DL approaches and 4%-10% higher than DL-based approaches. We demonstrated the EVA-DSSM's and DVSM's practical utility with two CTI case studies: openly accessible systems in the top eight U.S. hospitals and over 20,000 Supervisory Control and Data Acquisition (SCADA) systems worldwide. A complementary user evaluation of the case study results indicated that 45 cybersecurity professionals found the EVA-DSSM and DVSM results more useful for exploit-vulnerability linking and risk prioritization activities than those produced by prevailing approaches. Given the rising cost of cyberattacks, the EVA-DSSM and DVSM have important implications for analysts in security operations centers, incident response teams, and cybersecurity vendors. 
    more » « less
  2. Industrial control systems (ICS) include systems that control industrial processes in critical infrastructure such as electric grids, nuclear power plants, manufacturing plans, water treatment systems, pharmaceutical plants, and building automation systems. ICS represent complex systems that contain an abundance of unique devices all of which may hold different types of software, including applications, firmware and operating systems. Due to their ability to control physical infrastructure, ICS have more and more become targets of cyber-attacks, increasing the risk of serious damage, negative financial impact, disruption to business operations, disruption to communities, and even the loss of life. Ethical hacking represents one way to test the security of ICS. Ethical hacking consists of using a cyber-attacker's perspective and a variety of cybersecurity tools to actively discover vulnerabilities and entry points for potential cyber-attacks. However, ICS ethical hacking represents a difficult task due to the wide variety of devices found on ICS networks. Most ethical hackers do not hold expertise or knowledge about ICS hardware, device computing elements, protocols, vulnerabilities found on these elements, and exploits used to exploit these vulnerabilities. Effective approaches are needed to reduce the complexity of ICS ethical hacking tasks. In this study, we use ontology modeling, a knowledge representation approach in artificial intelligence (AI), to model data that represent ethical hacking tasks of building automation systems. With ontology modeling, information is stored and represented in the form of semantic graphs that express individuals, their properties, and the relations between multiple individuals. Data are drawn from sources such as the National Vulnerability Database, ExploitDB, Common Weakness Enumeration (CWE), the Common Attack Pattern and Enumeration Classification (CAPEC), and others. We show, through semantic queries, how the ontology model can automatically link together entities such as software names and versions of ICS software, vulnerabilities found on those software instances, vulnerabilities found on the protocols used by the software, exploits found on those vulnerabilities, weaknesses that represent those vulnerabilities, and attacks that can exploit those weaknesses. The ontology modeling of ICS ethical hacking and the semantic queries run over the model can reduce the complexity of ICS hacking tasks. 
    more » « less
  3. Cyber threat intelligence (CTI) is an actionable information or insight an organization uses to understand potential vulnerabilities it does have and threats it is facing. One important CTI for proactive cyber defense is exploit type with possible values system, web, network, website or Mobile. This study compares the performance of machine learning algorithms in predicating exploit types using form posts in the dark web, which is a semi- structured dataset collected from dark web. The study uses the CRISP data science approach. The results of the study show that machine learning algorithms which are function-based including support vector machine and deep-learning using artificial neural network are more accurate than those algorithms which are based on tree including Random Forest and Decision-Tree for CTI discovery from semi-structured dataset. Future research will include the use of high-performance computing and advanced deep-learning algorithms. 
    more » « less
  4. null (Ed.)
    The implementation of Internet of Things (IoT) devices in medical environments, has introduced a growing list of security vulnerabilities and threats. The lack of an extensible big data resource that captures medical device vulnerabilities limits the use of Artificial Intelligence (AI) based cyber defense systems in capturing, detecting, and preventing known and future attacks. We describe a system that generates a repository of Cyber Threat Intelligence (CTI) about various medical devices and their known vulnerabilities from sources such as manufacturer and ICS-CERT vulnerability alerts. We augment the intelligence repository with data sources such as Wikidata and public medical databases. The combined resources are integrated with threat intelligence in our Cybersecurity Knowledge Graph (CKG) from previous research. The augmented graph embeddings are useful in querying relevant information and can help in various AI assisted cybersecurity tasks. Given the integration of multiple resources, we found the augmented CKG produced higher quality graph representations. The augmented CKG produced a 31% increase in the Mean Average Precision (MAP) value, computed over an information retrieval task. 
    more » « less
  5. Operating systems play a crucial role in computer systems, serving as the fundamental infrastructure that supports a wide range of applications and services. However, they are also prime targets for malicious actors seeking to exploit vulnerabilities and compromise system security. This is a crucial area that requires active research; however, OS vulnerabilities have not been actively studied in recent years. Therefore, we conduct a comprehensive analysis of OS vulnerabilities, aiming to enhance the understanding of their trends, severity, and common weaknesses. Our research methodology encompasses data preparation, sampling of vulnerable OS categories and versions, and an in-depth analysis of trends, severity levels, and types of OS vulnerabilities. We scrape the high-level data from reliable and recognized sources to generate two refined OS vulnerability datasets: one for OS categories and another for OS versions. Our study reveals the susceptibility of popular operating systems such as Windows, Windows Server, Debian Linux, and Mac OS. Specifically, Windows 10, Windows 11, Android (v11.0, v12.0, v13.0), Windows Server 2012, Debian Linux (v10.0, v11.0), Fedora 37, and HarmonyOS 2, are identified as the most vulnerable OS versions in recent years (2021–2022). Notably, these vulnerabilities exhibit a high severity, with maximum CVSS scores falling into the 7–8 and 9–10 range. Common vulnerability types, including CWE-119, CWE-20, CWE-200, and CWE-787, are prevalent in these OSs and require specific attention from OS vendors. The findings on trends, severity, and types of OS vulnerabilities from this research will serve as a valuable resource for vendors, security professionals, and end-users, empowering them to enhance OS security measures, prioritize vulnerability management efforts, and make informed decisions to mitigate risks associated with these vulnerabilities. 
    more » « less