Abstract The Internet has become a vital part of our daily lives, serving as a hub for global connectivity and a facilitator for seamless communication and information exchange. However, the rise of malicious domains presents a serious challenge, undermining the reliability of the Internet and posing risks to user safety. These malicious activities exploit the Domain Name System (DNS) to deceive users, leading to harmful activities such as spreading drive-by-download malware, operating botnets, creating phishing sites, and sending spam. In response to this growing threat, the application of Machine Learning (ML) techniques has proven to be highly effective. These methods excel in quickly and accurately detecting, classifying, and analyzing such threats. This paper explores the latest developments in using transfer learning for the classification of malicious domains, with a focus on image visualization as a key methodological approach. Our proposed solution has achieved a remarkable testing accuracy rate of 98.67%, demonstrating its effectiveness in detecting and classifying malicious domains.
more »
« less
Thriving on chaos: Proactive detection of command and control domains in internet of things‐scale botnets using DRIFT
Abstract In this paper, we introduce DRIFT, a system for detecting command and control (C2) domain names in Internet of Things–scale botnets. Using an intrinsic feature of malicious domain name queries prior to their registration (perhaps due to clock drift), we devise a difference‐based lightweight feature for malicious C2 domain name detection. Using NXDomain query and response of a popular malware, we establish the effectiveness of our detector with 99% accuracy and as early as more than 48 hours before they are registered. Our technique serves as a tool of detection where other techniques relying on entropy or domain generating algorithms reversing are impractical.
more »
« less
- Award ID(s):
- 1809000
- PAR ID:
- 10073645
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Transactions on Emerging Telecommunications Technologies
- Volume:
- 30
- Issue:
- 4
- ISSN:
- 2161-3915
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Phishing websites remain a persistent security threat. Thus far, machine learning approaches appear to have the best potential as defenses. But, there are two main concerns with existing machine learning approaches for phishing detection. The first is the large number of training features used and the lack of validating arguments for these feature choices. The second concern is the type of datasets used in the literature that are inadvertently biased with respect to the features based on the website URL or content. To address these concerns, we put forward the intuition that the domain name of phishing websites is the tell-tale sign of phishing and holds the key to successful phishing detection. Accordingly, we design features that model the relationships, visual as well as statistical, of the domain name to the key elements of a phishing website, which are used to snare the end-users. The main value of our feature design is that, to bypass detection, an attacker will find it very difficult to tamper with the visual content of the phishing website without arousing the suspicion of the end user. Our feature set ensures that there is minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97%, on sample dataset. Compared to the state-of-the-art work, our per data instance classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs as they are likely to be biased towards specific datasets. We show the robustness of our learning algorithm by testing on unknown live phishing URLs and achieve a high detection accuracy of 99.7%.more » « less
-
Abstract In recent years, deep learning gained proliferating popularity in the cybersecurity application domain, since when being compared to traditional machine learning methods, it usually involves less human efforts, produces better results, and provides better generalizability. However, the imbalanced data issue is very common in cybersecurity, which can substantially deteriorate the performance of the deep learning models. This paper introduces a transfer learning based method to tackle the imbalanced data issue in cybersecurity using return-oriented programming payload detection as a case study. We achieved 0.0290 average false positive rate, 0.9705 average F1 score and 0.9521 average detection rate on 3 different target domain programs using 2 different source domain programs, with 0 benign training data sample in the target domain. The performance improvement compared to the baseline is a trade-off between false positive rate and detection rate. Using our approach, the total number of false positives is reduced by 23.16%, and as a trade-off, the number of detected malicious samples decreases by 0.68%.more » « less
-
Malware detection plays a vital role in computer security. Modern machine learning approaches have been centered around domain knowledge for extracting malicious features. However, many potential features can be used, and it is time consuming and difficult to manually identify the best features, especially given the diverse nature of malware. In this paper, we propose Neurlux, a neural network for malware detection. Neurlux does not rely on any feature engineering, rather it learns automatically from dynamic analysis reports that detail behavioral information. Our model borrows ideas from the field of document classification, using word sequences present in the reports to predict if a report is from a malicious binary or not. We investigate the learned features of our model and show which components of the reports it tends to give the highest importance. Then, we evaluate our approach on two different datasets and report formats, showing that Neurlux improves on the state of the art and can effectively learn from the dynamic analysis reports. Furthermore, we show that our approach is portable to other malware analysis environments and generalizes to different datasets.more » « less
-
null (Ed.)We identify over a quarter of a million domains used by medium and large companies within the .com registry. We find that for around 7% of these companies very similar domain names have been registered with character changes that are intended to be indistinguishable at a casual glance. These domains would be suitable for use in Business Email Compromise frauds. Using historical registration and name server data we identify the timing, rate, and movement of these look-alike domains over a ten year period. This allows us to identify clusters of registrations which are quite clearly malicious and show how the criminals have moved their activity over time in response to countermeasures. Although the malicious activity peaked in 2016, there is still sufficient ongoing activity to cause concern.more » « less