

Title: Thriving on chaos: Proactive detection of command and control domains in internet of things‐scale botnets using DRIFT
Abstract

In this paper, we introduce DRIFT, a system for detecting command and control (C2) domain names in Internet of Things–scale botnets. Exploiting an intrinsic property of malicious domain names, namely that they are queried before they are registered (perhaps because of clock drift), we devise a lightweight, difference‐based feature for detecting malicious C2 domain names. Using the NXDomain queries and responses of a popular piece of malware, we show that our detector achieves 99% accuracy and flags domains as early as 48 hours or more before they are registered. Our technique provides a means of detection where techniques that rely on entropy or on reversing domain generation algorithms are impractical.

 
Award ID(s):
1809000
NSF-PAR ID:
10073645
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Transactions on Emerging Telecommunications Technologies
Volume:
30
Issue:
4
ISSN:
2161-3915
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Phishing websites remain a persistent security threat. Thus far, machine learning approaches appear to have the best potential as defenses. However, there are two main concerns with existing machine learning approaches for phishing detection. The first is the large number of training features used and the lack of validating arguments for these feature choices. The second is that the datasets used in the literature are inadvertently biased with respect to features based on the website URL or content. To address these concerns, we put forward the intuition that the domain name of a phishing website is the tell-tale sign of phishing and holds the key to successful phishing detection. Accordingly, we design features that model the relationships, visual as well as statistical, of the domain name to the key elements of a phishing website, which are used to snare end users. The main value of our feature design is that, to bypass detection, an attacker will find it very difficult to tamper with the visual content of the phishing website without arousing the suspicion of the end user. Our feature set ensures that there is minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97% on a sample dataset. Compared to the state-of-the-art work, our per-instance classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs, as they are likely to be biased towards specific datasets. We show the robustness of our learning algorithm by testing on unknown live phishing URLs and achieve a high detection accuracy of 99.7%.
  2. Malware detection plays a vital role in computer security. Modern machine learning approaches have been centered around domain knowledge for extracting malicious features. However, many potential features can be used, and it is time-consuming and difficult to manually identify the best features, especially given the diverse nature of malware. In this paper, we propose Neurlux, a neural network for malware detection. Neurlux does not rely on any feature engineering; rather, it learns automatically from dynamic analysis reports that detail behavioral information. Our model borrows ideas from the field of document classification, using word sequences present in the reports to predict whether a report is from a malicious binary. We investigate the learned features of our model and show which components of the reports it tends to give the highest importance. Then, we evaluate our approach on two different datasets and report formats, showing that Neurlux improves on the state of the art and can effectively learn from the dynamic analysis reports. Furthermore, we show that our approach is portable to other malware analysis environments and generalizes to different datasets.
  3. In recent years, deep learning has rapidly gained popularity in the cybersecurity application domain because, compared to traditional machine learning methods, it usually involves less human effort, produces better results, and provides better generalizability. However, the imbalanced data issue is very common in cybersecurity and can substantially degrade the performance of deep learning models. This paper introduces a transfer learning based method to tackle the imbalanced data issue in cybersecurity, using return-oriented programming payload detection as a case study. We achieved a 0.0290 average false positive rate, a 0.9705 average F1 score, and a 0.9521 average detection rate on 3 different target domain programs using 2 different source domain programs, with zero benign training data samples in the target domain. The performance improvement compared to the baseline is a trade-off between false positive rate and detection rate. Using our approach, the total number of false positives is reduced by 23.16%, and as a trade-off, the number of detected malicious samples decreases by 0.68%.

     
  4. This paper presents a novel, highly distinctive and robust local surface feature descriptor. Our descriptor is predicated on a simple observation: instead of describing the points in the vicinity of a feature point relative to a reference frame at the feature point, all points in the region describe the feature point relative to their own frames. Isometry invariance is a byproduct of this construction. Our descriptor is derived relative to the extended convolution, a generalization of the standard convolution that allows the filter to adaptively transform as it passes over the domain. As such, we name our descriptor the Extended Convolution Histogram of Orientations (ECHO). It exhibits superior performance compared to popular surface descriptors in both feature matching and shape correspondence experiments. In particular, the ECHO descriptor is highly stable under near‐isometric deformations and remains distinctive under significant levels of noise, tessellation, complex deformations and the kinds of interference commonly found in real data.

     
  5. Cheap commercial off-the-shelf (COTS) First-Person View (FPV) drones have become widely available to consumers in recent years. Unfortunately, they also provide low-cost attack opportunities to malicious users. Thus, effective methods to detect the presence of unknown and non-cooperating drones within a restricted area are in high demand. Approaches that detect drones based on their emitted video stream have been proposed, but have not yet been shown to work against similar benign traffic, such as that generated by wireless security cameras. Most importantly, these approaches have not been studied in the context of detecting new, unprofiled drone types. In this work, we propose a novel drone detection framework that leverages specific patterns in the video traffic transmitted by drones. The patterns consist of repetitive synchronization packets (which we call pivots), which we use as features for a machine learning classifier. We show that our framework can achieve up to 99% detection accuracy over an encrypted WiFi channel using only 170 packets originating from the drone within an 820 ms time period. Our framework is able to identify drone transmissions even among very similar WiFi transmissions (such as video streams originating from security cameras) as well as in noisy scenarios with background traffic. Furthermore, the design of our pivot features enables the classifier to detect unprofiled drones on which it has never been trained, and the classifier is refined using a novel feature selection strategy that selects the features with the discriminative power to detect new, unprofiled drones.

     
