How to cluster event sequences generated via different point processes is an interesting and important problem in statistical machine learning. To solve this problem, we propose and discuss an effective model-based clustering method based on a novel Dirichlet mixture model of a special but significant type of point process, the Hawkes process. The proposed model generates event sequences belonging to different clusters from Hawkes processes with different parameters, and uses a Dirichlet distribution as the prior distribution of the clusters. We prove the identifiability of our mixture model and propose an effective variational Bayesian inference algorithm to learn our model. An adaptive inner iteration allocation strategy is designed to accelerate the convergence of our algorithm. Moreover, we investigate the sample complexity and the computational complexity of our learning algorithm in depth. Experiments on both synthetic and real-world data show that the clustering method based on our model can robustly learn structural triggering patterns hidden in asynchronous event sequences and achieve superior clustering purity and consistency compared to existing methods.
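As a rough illustration of the model-based idea above, the sketch below scores one event sequence under several candidate Hawkes processes and picks the best-scoring cluster. It is not the variational Bayesian algorithm of the abstract: it assumes a univariate process with an exponential kernel, fixed (already learned) parameters, and a hard assignment. The names `hawkes_loglik` and `assign_cluster` and the parameter layout `(mu, alpha, beta)` are hypothetical.

```python
import math


def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood of sorted event times on [0, T] under a univariate
    Hawkes process with intensity
        lambda(t) = mu + alpha * sum_{t_j < t} exp(-beta * (t - t_j)).
    The exponential kernel is an assumption made for this sketch."""
    loglik = 0.0
    decay_sum = 0.0  # sum over earlier events of exp(-beta * (t_i - t_j))
    prev = None
    for t in times:
        if prev is not None:
            # Standard recursion: fold the previous event into the decayed sum.
            decay_sum = math.exp(-beta * (t - prev)) * (decay_sum + 1.0)
        loglik += math.log(mu + alpha * decay_sum)
        prev = t
    # Compensator: integral of the intensity over [0, T].
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - t)) for t in times
    )
    return loglik - compensator


def assign_cluster(times, T, clusters, weights):
    """Return the index of the cluster maximizing
    log(weight) + log-likelihood (a hard, MAP-style assignment).
    clusters: list of (mu, alpha, beta); weights: mixing proportions."""
    scores = [
        math.log(w) + hawkes_loglik(times, T, mu, alpha, beta)
        for (mu, alpha, beta), w in zip(clusters, weights)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a full mixture model the mixing weights would themselves carry a Dirichlet prior and be inferred jointly with the per-cluster parameters; here they are simply passed in.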
Using Dirichlet Marked Hawkes Processes for Insider Threat Detection
Malicious insiders cause significant losses to organizations. Because malicious insider activities are extremely rare, insider threats are hard to detect. In this article, we present a Dirichlet Marked Hawkes Process (DMHP) to detect malicious insider activities in real time. DMHP combines the Dirichlet process and marked Hawkes processes to model the sequence of user activities. The Dirichlet process can detect an unbounded number of user modes (patterns) from an unbounded stream of user activities. For each detected user mode, one set of marked Hawkes processes models user activities using time and activity type (e.g., WWW visit or send email) information, so that different user modes are modeled by different sets of marked Hawkes processes. To achieve real-time detection of malicious insider activity, the likelihood of the most recent activity calculated by DMHP is adopted as a score that measures the maliciousness of the activity. Since the majority of user activities are benign, activities with low likelihoods are labeled as malicious. Experimental results on two datasets show the effectiveness of DMHP.
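The scoring step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the DMHP implementation: it treats each activity type as one dimension of a multivariate Hawkes process with an exponential kernel, uses a single fixed user mode, and omits the Dirichlet process entirely. The function names, the parameter layout (`mu`, `alpha`, `beta`), and the threshold are all hypothetical.

```python
import math


def log_score(history, new_event, mu, alpha, beta):
    """Log-likelihood of the most recent activity given the history.

    history:   list of (time, mark) pairs, sorted by time
    new_event: (time, mark) of the activity being scored
    mu[k]:       baseline intensity of activity type k
    alpha[j][k]: excitation of type k caused by a past event of type j
    beta:        decay rate of the assumed exponential kernel
    """
    t_new, k_new = new_event
    t_prev = history[-1][0] if history else 0.0
    # Intensity of the observed activity type at the moment it occurs.
    lam = mu[k_new] + sum(
        alpha[m][k_new] * math.exp(-beta * (t_new - t)) for t, m in history
    )
    # Integral of the total intensity (all types) over (t_prev, t_new].
    comp = sum(mu) * (t_new - t_prev)
    for t, m in history:
        comp += (sum(alpha[m]) / beta) * (
            math.exp(-beta * (t_prev - t)) - math.exp(-beta * (t_new - t))
        )
    return math.log(lam) - comp


def is_suspicious(history, new_event, mu, alpha, beta, threshold):
    """Flag the activity when its log-likelihood falls below a chosen
    threshold (the threshold value is an assumption, not from the source)."""
    return log_score(history, new_event, mu, alpha, beta) < threshold
```

Under this sketch, an activity type with a very small baseline and little excitation from recent events receives a low score, which matches the intuition that unlikely activities are the ones labeled as malicious.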
- Award ID(s): 2103829
- PAR ID: 10317796
- Date Published:
- Journal Name: Digital Threats: Research and Practice
- Volume: 3
- Issue: 1
- ISSN: 2692-1626
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The massively available data about user engagement with online information service systems provides a gold mine of information about users' latent intents. It calls for quantitative user behavior modeling. In this paper, we study the problem by looking into users' sequential interactive behaviors. Inspired by the concepts of episodic memory and semantic memory in cognitive psychology, which describe how users' behaviors are differently influenced by past experience, we propose a Long- and Short-term Hawkes Process model. It models the short-term dependency between users' actions within a period of time via a multi-dimensional Hawkes process and the long-term dependency between actions across different periods of time via a one-dimensional Hawkes process. Experiments on two real-world user activity log datasets (one from an e-commerce website and one from a MOOC website) demonstrate the effectiveness of our model in capturing the temporal dependency between actions in a sequence of user behaviors. This directly leads to improved accuracy in predicting the type and the time of the next action. Interestingly, the inferred dependency between actions in a sequence sheds light on the underlying user intent behind direct observations and provides insights for downstream applications.
- In the context of insiders, preventive security measures have a high likelihood of failing because insiders must have sufficient privileges to perform their jobs. Instead, in this paper, we propose to treat the insider threat with a detective measure that holds an insider accountable in case of violations. However, to enable accountability, we need to create causal models that support reasoning about the causality of a violation. Current security models (e.g., attack trees) do not allow that. Still, they are a useful source for creating causal models. In this paper, we discuss the value added by causal models in the security context. Then, we capture the interaction between attack trees and causal models by proposing an automated approach to extract the latter from the former. Our approach considers insider-specific attack classes such as collusion attacks and causal-model-specific properties like preemption relations. We present an evaluation of the resulting causal models’ validity and effectiveness, in addition to the efficiency of the extraction process.
- As organizations drastically expand their usage of collaborative systems and multi-user applications during this period of mass remote work, it is crucial to understand and manage the risks that such platforms may introduce. Improperly or carelessly deployed and configured systems hide security threats that can impact not only a single organization, but the whole economy. Cloud-based architecture is used in many collaborative systems, such as audio/video conferencing, collaborative document sharing/editing, distance learning and others. Therefore, it is important to understand that security risks can be triggered by attacks on remote servers and that confidential information might be compromised. In this paper, we present an AI-powered application that constantly introspects multiple virtual servers in order to detect malicious activities based on their anomalous behavior. Once suspicious processes are detected, the application notifies the system administrator about the potential threat in real time. The developed software is able to detect user-space keyloggers, rootkits, process hiding and other intrusion artifacts via agent-less operation, by operating directly from the host machine. Remote memory introspection means there is no software to install and no notice that would prompt malware to evacuate or destroy data. Experiments conducted on more than twenty different types of malicious applications provide evidence of high detection accuracy.
- This article presents a Hawkes process model with Markovian baseline intensities for high-frequency order book data modeling. We classified intraday order book trading events into a range of categories based on their order types and the price change after their arrivals. In order to capture the stimulating effects between multiple types of order book events, we use a multivariate Hawkes process to model the self- and mutually-exciting event arrivals. We also integrate Markovian baseline intensities into the event arrival dynamics, by including the impacts of the order book liquidity state and a time factor on the baseline intensity. A regression-based non-parametric estimation procedure is adopted to estimate the model parameters in our Hawkes+Markovian model. To eliminate redundant model parameters, LASSO regularization is incorporated into the estimation procedure. In addition, a model selection method based on the Akaike Information Criterion is applied to evaluate the effect of each part of the proposed model. An implementation example based on real LOB data is provided. Through the example we studied the empirical shapes of the Hawkes excitement functions, the effects of liquidity as well as time factors, the LASSO variable selection, and the explanatory power of the Hawkes and Markovian elements for the dynamics of the order book.