Title: Using Dirichlet Marked Hawkes Processes for Insider Threat Detection
Malicious insiders cause significant losses to organizations. Because malicious activities from insiders are extremely rare, insider threats are hard to detect. In this article, we present a Dirichlet Marked Hawkes Process (DMHP) to detect malicious insider activities in real time. DMHP combines the Dirichlet process and marked Hawkes processes to model the sequence of user activities. The Dirichlet process can discover an unbounded number of user modes (patterns) from an unbounded stream of user activities. For each detected user mode, one set of marked Hawkes processes models user activities using time and activity type (e.g., WWW visit or send email) information, so that different user modes are modeled by different sets of marked Hawkes processes. To achieve real-time detection, the likelihood of the most recent activity, as calculated by DMHP, is used as a score that measures the maliciousness of that activity. Since the majority of user activities are benign, activities with low likelihoods are labeled as malicious. Experimental results on two datasets show the effectiveness of DMHP.
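The likelihood-based scoring described in the abstract can be illustrated with a minimal sketch. It is not the DMHP model itself: it assumes a single user mode (no Dirichlet process), a univariate Hawkes process with an exponential kernel, and activity types drawn from a fixed categorical distribution. The function names and the parameters `mu`, `alpha`, `beta`, and `mark_probs` are hypothetical and chosen only for illustration.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity: mu + sum over past events of alpha*beta*exp(-beta*(t - t_i))."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti))
                    for ti in history if ti < t)

def compensator(t_prev, t, history, mu, alpha, beta):
    """Integral of the intensity over (t_prev, t], given events at or before t_prev."""
    excited = sum(alpha * (math.exp(-beta * (t_prev - ti)) - math.exp(-beta * (t - ti)))
                  for ti in history if ti <= t_prev)
    return mu * (t - t_prev) + excited

def activity_log_likelihood(t, mark, history, mark_probs, mu, alpha, beta):
    """Log-likelihood of the most recent activity (time t, type mark) given past event times.

    Assumption: the mark is independent of time and follows mark_probs.
    """
    t_prev = history[-1] if history else 0.0  # observation window assumed to start at 0
    log_time = (math.log(hawkes_intensity(t, history, mu, alpha, beta))
                - compensator(t_prev, t, history, mu, alpha, beta))
    return log_time + math.log(mark_probs[mark])

# Hypothetical parameters and activity-type probabilities.
params = dict(mu=0.1, alpha=0.5, beta=1.0)
mark_probs = {"www": 0.7, "email": 0.3}
history = [0.0, 0.1, 0.2]  # timestamps of earlier activities

# An activity that continues a burst scores higher than one after a long silence.
score_burst = activity_log_likelihood(0.3, "www", history, mark_probs, **params)
score_late = activity_log_likelihood(50.0, "www", history, mark_probs, **params)
print(score_burst > score_late)
```

Under this sketch, an activity would be flagged when its log-likelihood falls below a chosen threshold; how that threshold is set is left open here.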
Award ID(s):
2103829
PAR ID:
10317796
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Digital Threats: Research and Practice
Volume:
3
Issue:
1
ISSN:
2692-1626
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. How to cluster event sequences generated via different point processes is an interesting and important problem in statistical machine learning. To solve this problem, we propose and discuss an effective model-based clustering method based on a novel Dirichlet mixture model of a special but significant type of point process, the Hawkes process. The proposed model generates the event sequences in different clusters from Hawkes processes with different parameters, and uses a Dirichlet distribution as the prior distribution of the clusters. We prove the identifiability of our mixture model and propose an effective variational Bayesian inference algorithm to learn our model. An adaptive inner iteration allocation strategy is designed to accelerate the convergence of our algorithm. Moreover, we investigate the sample complexity and the computational complexity of our learning algorithm in depth. Experiments on both synthetic and real-world data show that the clustering method based on our model can robustly learn structural triggering patterns hidden in asynchronous event sequences and achieve superior performance on clustering purity and consistency compared to existing methods.
  2. The massively available data about user engagement with online information service systems provides a gold mine of information about users' latent intents. It calls for quantitative user behavior modeling. In this paper, we study the problem by looking into users' sequential interactive behaviors. Inspired by the concepts of episodic memory and semantic memory in cognitive psychology, which describe how users' behaviors are differently influenced by past experience, we propose a Long- and Short-term Hawkes Process model. It models the short-term dependency between users' actions within a period of time via a multi-dimensional Hawkes process and the long-term dependency between actions across different periods of time via a one-dimensional Hawkes process. Experiments on two real-world user activity log datasets (one from an e-commerce website and one from a MOOC website) demonstrate the effectiveness of our model in capturing the temporal dependency between actions in a sequence of user behaviors. This directly leads to improved accuracy in predicting the type and the time of the next action. Interestingly, the inferred dependency between actions in a sequence sheds light on the underlying user intent behind direct observations and provides insights for downstream applications.
  3. In the context of insiders, preventive security measures have a high likelihood of failing because insiders ought to have sufficient privileges to perform their jobs. Instead, in this paper, we propose to treat the insider threat by a detective measure that holds an insider accountable in case of violations. However, to enable accountability, we need to create causal models that support reasoning about the causality of a violation. Current security models (e.g., attack trees) do not allow that. Still, they are a useful source for creating causal models. In this paper, we discuss the value added by causal models in the security context. Then, we capture the interaction between attack trees and causal models by proposing an automated approach to extract the latter from the former. Our approach considers insider-specific attack classes such as collusion attacks and causal-model-specific properties like preemption relations. We present an evaluation of the resulting causal models’ validity and effectiveness, in addition to the efficiency of the extraction process.
  4. Hawkes processes have been shown to be efficient in modeling bursty sequences in a variety of applications, such as finance and social network activity analysis. Traditionally, these models parameterize each process independently and assume that the history of each point process can be fully observed. Such models could, however, be inefficient or even prohibited in certain real-world applications, such as in the field of education, where such assumptions are violated. Motivated by the problem of detecting and predicting student procrastination in Massive Open Online Courses (MOOCs) with missing and partially observed data, in this work, we propose a novel personalized Hawkes process model (RCHawkes-Gamma) that discovers meaningful student behavior clusters by jointly learning all partially observed processes simultaneously, without relying on auxiliary features. Our experiments on both synthetic and real-world education datasets show that RCHawkes-Gamma can effectively recover student clusters and their temporal procrastination dynamics, resulting in better predictive performance of future student activities. Our further analyses of the learned parameters and their association with student delays show that the discovered student clusters unveil meaningful representations of various procrastination behaviors in students.
  5. As organizations drastically expand their usage of collaborative systems and multi-user applications during this period of mass remote work, it is crucial to understand and manage the risks that such platforms may introduce. Improperly or carelessly deployed and configured systems hide security threats that can impact not only a single organization, but the whole economy. Cloud-based architecture is used in many collaborative systems, such as audio/video conferencing, collaborative document sharing/editing, distance learning and others. Therefore, it is important to understand that safety risks can be triggered by attacks on remote servers and that confidential information might be compromised. In this paper, we present an AI-powered application that aims to constantly introspect multiple virtual servers in order to detect malicious activities based on their anomalous behavior. Once a suspicious process is detected, the application notifies the system administrator of the potential threat in real time. The software can detect user-space keyloggers, rootkits, process hiding and other intrusion artifacts through agentless operation, running directly from the host machine. Remote memory introspection means there is no software to install and no warning that would prompt malware to evacuate or destroy data. Experiments on more than twenty different types of malicious applications provide evidence of high detection accuracy.