
Title: Using Dirichlet Marked Hawkes Processes for Insider Threat Detection
Malicious insiders cause significant loss to organizations. Because malicious activities make up an extremely small fraction of insider activity, insider threats are hard to detect. In this article, we present a Dirichlet Marked Hawkes Process (DMHP) to detect malicious activities from insiders in real time. DMHP combines the Dirichlet process and marked Hawkes processes to model the sequence of user activities. The Dirichlet process can detect an unbounded number of user modes (patterns) from an unbounded stream of user activities. For each detected user mode, one set of marked Hawkes processes models user activities using both time and activity type (e.g., WWW visit or send email) information, so that different user modes are modeled by different sets of marked Hawkes processes. To achieve real-time malicious insider activity detection, the likelihood of the most recent activity calculated by DMHP is adopted as a score to measure the maliciousness of the activity. Since the majority of user activities are benign, activities with low likelihoods are labeled as malicious. Experimental results on two datasets show the effectiveness of DMHP.
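The scoring idea in the abstract (a low likelihood for the most recent activity suggests maliciousness) can be illustrated with a small sketch. This is not the DMHP model itself. It assumes a single user mode, a univariate Hawkes process with an exponential kernel, a fixed categorical distribution over activity types, and hand-picked parameters `mu`, `alpha`, and `beta`. All function names are hypothetical.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Intensity mu + alpha * sum over past events of exp(-beta * (t - t_i)).

    The exponential kernel is an assumption made for this sketch.
    """
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in history if ti < t)

def event_log_likelihood(t, history, mu, alpha, beta):
    """Log density of the next event falling at time t, given earlier event times.

    Equals log(intensity at t) minus the integral of the intensity
    from the last observed event (or time 0) up to t.
    """
    last = history[-1] if history else 0.0
    integral = mu * (t - last) + (alpha / beta) * sum(
        math.exp(-beta * (last - ti)) - math.exp(-beta * (t - ti)) for ti in history
    )
    return math.log(hawkes_intensity(t, history, mu, alpha, beta)) - integral

def activity_score(t, mark, history, mark_probs, mu, alpha, beta):
    """Score = time log-likelihood + log probability of the activity type (mark).

    mark_probs is an assumed fixed categorical distribution over activity types.
    """
    return event_log_likelihood(t, history, mu, alpha, beta) + math.log(mark_probs[mark])

# Hypothetical parameters and history, chosen only for illustration.
history = [1.0, 1.5, 2.0]
mark_probs = {"www": 0.7, "email": 0.25, "usb": 0.05}
params = dict(mu=0.1, alpha=0.5, beta=1.0)

usual = activity_score(2.1, "www", history, mark_probs, **params)
unusual = activity_score(50.0, "usb", history, mark_probs, **params)
print(usual > unusual)  # the rare activity after a long gap scores lower
```

In practice a threshold on this score would separate flagged from unflagged activities. How that threshold is chosen, and how parameters and user modes are inferred, is left to the full model described in the article.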
Journal Name: Digital Threats: Research and Practice
Sponsoring Org: National Science Foundation
More Like this
  1. How to cluster event sequences generated via different point processes is an interesting and important problem in statistical machine learning. To solve this problem, we propose and discuss an effective model-based clustering method based on a novel Dirichlet mixture model of a special but significant type of point processes — Hawkes process. The proposed model generates the event sequences with different clusters from the Hawkes processes with different parameters, and uses a Dirichlet distribution as the prior distribution of the clusters. We prove the identifiability of our mixture model and propose an effective variational Bayesian inference algorithm to learn our model. An adaptive inner iteration allocation strategy is designed to accelerate the convergence of our algorithm. Moreover, we investigate the sample complexity and the computational complexity of our learning algorithm in depth. Experiments on both synthetic and real-world data show that the clustering method based on our model can learn structural triggering patterns hidden in asynchronous event sequences robustly and achieve superior performance on clustering purity and consistency compared to existing methods.
  2. The massively available data about user engagement with online information service systems provides a gold mine about users' latent intents. It calls for quantitative user behavior modeling. In this paper, we study the problem by looking into users' sequential interactive behaviors. Inspired by the concepts of episodic memory and semantic memory in cognitive psychology, which describe how users' behaviors are differently influenced by past experience, we propose a Long- and Short-term Hawkes Process model. It models the short-term dependency between users' actions within a period of time via a multi-dimensional Hawkes process and the long-term dependency between actions across different periods of time via a one dimensional Hawkes process. Experiments on two real-world user activity log datasets (one from an e-commerce website and one from a MOOC website) demonstrate the effectiveness of our model in capturing the temporal dependency between actions in a sequence of user behaviors. It directly leads to improved accuracy in predicting the type and the time of the next action. Interestingly, the inferred dependency between actions in a sequence sheds light on the underlying user intent behind direct observations and provides insights for downstream applications.
  3. As organizations drastically expand their usage of collaborative systems and multi-user applications during this period of mass remote work, it is crucial to understand and manage the risks that such platforms may introduce. Improperly or carelessly deployed and configured systems hide security threats that can impact not only a single organization, but the whole economy. Cloud-based architecture is used in many collaborative systems, such as audio/video conferencing, collaborative document sharing/editing, distance learning and others. Therefore, it is important to understand that safety risks can be triggered by attacks on remote servers and that confidential information might be compromised. In this paper, we present an AI-powered application that aims to constantly introspect multiple virtual servers in order to detect malicious activities based on their anomalous behavior. Once a suspicious process is detected, the application notifies the system administrator about the potential threat in real time. The developed software is able to detect user-space keyloggers, rootkits, process hiding and other intrusion artifacts via agent-less operation, by operating directly from the host machine. Remote memory introspection means no software to install and no notice to malware to evacuate or destroy data. Experiments conducted on more than twenty different types of malicious applications provide evidence of high detection accuracy.
  4. Student procrastination and cramming for deadlines are major challenges in online learning environments, with negative educational and well-being side effects. Modeling student activities in continuous time and predicting their next study time are important problems that can help in creating personalized timely interventions to mitigate these challenges. However, previous attempts at dynamic modeling of student procrastination suffer from major issues: they are unable to predict the next activity times, cannot deal with missing activity history, are not personalized, and disregard important course properties, such as assignment deadlines, that are essential in explaining the cramming behavior. To resolve these problems, we introduce a new personalized stimuli-sensitive Hawkes process model (SSHP), by jointly modeling all student-assignment pairs and utilizing their similarities, to predict students’ next activity times even when there are no historical observations. Unlike regular point processes that assume a constant external triggering effect from the environment, we model three dynamic types of external stimuli, according to assignment availabilities, assignment deadlines, and each student’s time management habits. Our experiments on two synthetic datasets and two real-world datasets show superior performance in future activity prediction compared with state-of-the-art models. Moreover, we show that our model achieves a flexible and accurate parameterization of activity intensities in students.
  5. The extreme bandwidth and performance of 5G mobile networks change the way we develop and utilize digital services. Within a few years, 5G will not only touch technology and applications, but dramatically change the economy, our society and individual life. One of the emerging technologies that enables the evolution to 5G by bringing cloud capabilities near to the end users is Edge Computing, also known as Multi-Access Edge Computing (MEC), which will become pertinent to the evolution of 5G. This evolution also entails growth in the threat landscape and increased privacy concerns in different application areas; hence, security and privacy play a central role in the evolution towards 5G. Since MEC applications are instantiated in the virtualized infrastructure, in this paper we present a distributed application that aims to constantly introspect multiple virtual machines (VMs) in order to detect malicious activities based on their anomalous behavior. Once suspicious processes are detected, our IDS notifies the system administrator about the potential threat in real time. The developed software is able to detect keyloggers, rootkits, trojans, process hiding and other intrusion artifacts via agent-less operation, by operating remotely or directly from the host machine. Remote memory introspection means no software to install and no notice to malware to evacuate or destroy data. Experimental results of remote VMI on more than 50 different pieces of malicious code demonstrate an average anomaly detection rate close to 97%. We have established a wide testbed environment connecting the networks of two universities, Kyushu Institute of Technology and The City College of New York, through a secure GRE tunnel. Experiments conducted on this testbed deliver high response time of the proposed system.