Detecting Malicious Browser Extensions by Combining Machine Learning and Feature Engineering
As the popularity of the internet continues to grow, along with the use of web browsers and browser extensions, the threat of malicious browser extensions has increased, demanding an effective way to detect these extensions and, in turn, prevent their installation. These extensions compromise private user information (including usernames and passwords) and can also compromise the user’s computer by delivering Trojans and other malicious software. This paper presents a method that combines machine learning and feature engineering to detect malicious browser extensions. By analyzing the static code of a browser extension and extracting features from it, the method uses a machine learning algorithm to predict whether the extension is malicious or benign. Four machine learning algorithms (SVM, RF, KNN, and XGBoost) were tested on a dataset that we collected for this study. Their detection performance is discussed in terms of several performance metrics.
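The pipeline described in the abstract (static features extracted from extension code, then a classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not list the features used, so the patterns in `FEATURE_PATTERNS` are hypothetical, the toy training samples are invented, and a small hand-written k-nearest-neighbour vote stands in for the SVM, RF, KNN, and XGBoost models evaluated in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical static features (NOT the paper's feature set): counts of
# code patterns that a feature-engineering step might treat as suspicious.
FEATURE_PATTERNS = {
    "eval_calls": re.compile(r"\beval\s*\("),
    "cookie_access": re.compile(r"chrome\.cookies|document\.cookie"),
    "network_requests": re.compile(r"XMLHttpRequest|\bfetch\s*\("),
    "script_injection": re.compile(r"executeScript"),
}


def extract_features(source):
    """Return one count per pattern, in the order of FEATURE_PATTERNS."""
    return [float(len(p.findall(source))) for p in FEATURE_PATTERNS.values()]


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy samples, only to make the sketch runnable.
TRAINING_SAMPLES = [
    ("document.getElementById('btn').addEventListener('click', show);", "benign"),
    ("fetch('/api/settings').then(r => r.json());", "benign"),
    ("console.log('loaded');", "benign"),
    ("eval(atob(payload)); var c = document.cookie; new XMLHttpRequest();", "malicious"),
    ("chrome.cookies.getAll({}, send); eval(code); fetch(url); "
     "chrome.tabs.executeScript(id, {code: x});", "malicious"),
    ("eval(a); eval(b); document.cookie; fetch(u);", "malicious"),
]
TRAIN = [(extract_features(src), label) for src, label in TRAINING_SAMPLES]

unknown = "var s = document.cookie; eval(decode(s)); fetch(server);"
print(knn_predict(TRAIN, extract_features(unknown)))
```

In practice the feature vectors would be passed to library implementations of the classifiers named in the abstract, and the choice of features would come from analysis of real extensions rather than a fixed pattern list.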
- Award ID(s): 2150145
- PAR ID: 10412991
- Editor(s): Latifi, S.
- Date Published:
- Journal Name: Advances in intelligent systems and computing
- Volume: 1445
- ISSN: 2194-5365
- Page Range / eLocation ID: 105-113
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Software vulnerabilities have become a serious problem as new applications emerge that contain potentially vulnerable or malicious code that can compromise the system. The growing volume and complexity of source code have created a need for vulnerability detection methods that can identify malicious code before systems fall prey to cyberattacks. Manual review of source code requires extensive time and resources, and existing static code analyzers are unable to properly detect vulnerable code. Thus, artificial intelligence techniques, mainly deep learning models, have gained traction for detecting source code vulnerabilities. A systematic review is carried out to explore and understand the various deep learning methods employed for the task and their efficacy as prediction models. Additionally, each method and its characteristics are summarized, and its implementation on specific data sets and its evaluation are discussed.
-
In recent years, there has been a notable increase in the prevalence of malicious websites, leading to a majority of cyber-attacks and data breaches. Malicious websites often incorporate JavaScript code to execute attacks on web browsers. Despite existing methodologies documented in the literature, the analysis and detection of malicious JavaScript pose significant challenges due to the dynamic nature of JavaScript and the use of advanced evasion techniques. These challenges motivate the need for an innovative and efficient approach to comprehensively analyze the code to identify its malicious intent. In this paper, we introduce a monitoring approach for analyzing JavaScript code, which can capture all of the code’s features at runtime. Our method leverages the security reference monitor technique to mediate JavaScript security-sensitive executions, including function calls and property accesses. Therefore, the proposed method can capture behaviors at runtime regardless of how the code is written, even with recent advanced evasion techniques like WebAssembly diversification. We have implemented our approach as a JavaScript dynamic analysis framework called JSMBox in a Chromium-based browser extension. Our experiments demonstrated that JSMBox is capable of effectively countering sophisticated evasion techniques found in modern malicious JavaScript code, including WebAssembly diversification. We have also evaluated the framework’s ability to classify malicious behaviors based on a large-scale raw dataset comprising about 20,000 malicious and benign webpages. Our developed tool automatically launches the browser to execute these webpages, records JavaScript code execution events, and captures their execution frequency as extracted features. We have tested the extracted dataset with various machine-learning models, yielding promising experimental results that confirm the effectiveness of our approach and achieve a high accuracy rate.
-
Existing malicious code detection techniques demand the integration of multiple tools to detect different malware patterns, often suffering from high misclassification rates. Therefore, malicious code detection techniques could be enhanced by adopting advanced, more automated approaches to achieve high accuracy and a low misclassification rate. The goal of this study is to aid security analysts in detecting malicious packages by empirically studying the effectiveness of Large Language Models (LLMs) in detecting malicious code. We present SocketAI, a malicious code review workflow to detect malicious code. To evaluate the effectiveness of SocketAI, we leverage a benchmark dataset of 5,115 npm packages, of which 2,180 packages have malicious code. We conducted a baseline comparison of GPT-3 and GPT-4 models with the state-of-the-art CodeQL static analysis tool, using 39 custom CodeQL rules developed in prior research to detect malicious JavaScript code. We also compare the effectiveness of static analysis as a pre-screener with the SocketAI workflow, measuring the number of files that need to be analyzed and the associated costs. Additionally, we performed a qualitative study to understand the types of malicious packages detected or missed by our workflow. Our baseline comparison demonstrates a 16% and 9% improvement over static analysis in precision and F1 scores, respectively. GPT-4 achieves higher accuracy with 99% precision and 97% F1 scores, while GPT-3 offers a more cost-effective balance at 91% precision and 94% F1 scores. Prescreening files with a static analyzer reduces the number of files requiring LLM analysis by 77.9% and decreases costs by 60.9% for GPT-3 and 76.1% for GPT-4. Our qualitative analysis identified data theft, execution of arbitrary code, and suspicious domain categories as the top detected malicious packages.
-
Password managers provide significant security benefits to users. However, malicious client-side scripts and browser extensions can steal passwords after the manager has autofilled them into the web page. In this paper, we extend prior work by Stock and Johns, showing how password autofill can be hardened to prevent these local attacks. We implement our design in the Firefox browser and conduct experiments demonstrating that our defense successfully protects passwords from XSS attacks and malicious extensions. We also show that our implementation is compatible with 97% of the Alexa top 1000 websites. Next, we generalize our design, creating a second defense that prevents recently discovered local attacks against the FIDO2 protocols. We implement this second defense into Firefox, demonstrating that it protects the FIDO2 protocol against XSS attacks and malicious extensions. This defense is compatible with all websites, though it does require a small change (2–3 lines) to web servers implementing FIDO2.

