Mining Multivariate Discrete Event Sequences for Knowledge Discovery and Anomaly Detection
Modern physical systems deploy large numbers of sensors to record, at different timestamps, the status of different system components via measurements such as temperature, pressure, and speed, as well as each component's categorical state. Depending on the measurement values, there are two kinds of sequences: continuous and discrete. For continuous sequences, there is a host of state-of-the-art algorithms for anomaly detection based on time-series analysis, but there is a lack of effective methodologies tailored specifically to discrete event sequences. This paper proposes an analytics framework for discrete event sequences for knowledge discovery and anomaly detection. During the training phase, the framework extracts pairwise relationships among discrete event sequences using a neural machine translation model, viewing each discrete event sequence as a "natural language". The relationship between two sequences is quantified by how well one discrete event sequence is "translated" into the other. These pairwise relationships are aggregated into a multivariate relationship graph that clusters the structural knowledge of the underlying system and essentially discovers the hidden relationships among discrete sequences. This graph quantifies system behavior during normal operation. During testing, if one or more pairwise relationships are violated, an anomaly is detected. The proposed framework is evaluated on …
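The detection step described in the abstract can be illustrated with a minimal sketch. It assumes the pairwise translation scores have already been produced by some trained model; the translation model itself is not shown. The sensor names, the `min_score` and `tolerance` parameters, and the rule for deciding that a relationship is "violated" are all hypothetical choices made for illustration, not the authors' method.

```python
from typing import Dict, List, Tuple

# (source sequence, target sequence); the sensor names used below are hypothetical.
Pair = Tuple[str, str]


def build_relationship_graph(train_scores: Dict[Pair, float],
                             min_score: float) -> Dict[Pair, float]:
    """Keep only pairs whose training-time translation score is at least min_score.

    The surviving pairs are the edges of the relationship graph.
    """
    return {pair: s for pair, s in train_scores.items() if s >= min_score}


def violated_pairs(graph: Dict[Pair, float],
                   test_scores: Dict[Pair, float],
                   tolerance: float) -> List[Pair]:
    """Return edges whose test score dropped more than `tolerance` below training.

    Assumption: a pair missing from test_scores is treated as score 0.0.
    """
    broken = []
    for pair, train_score in graph.items():
        test_score = test_scores.get(pair, 0.0)
        if train_score - test_score > tolerance:
            broken.append(pair)
    return sorted(broken)


def is_anomalous(graph: Dict[Pair, float],
                 test_scores: Dict[Pair, float],
                 tolerance: float) -> bool:
    """Flag an anomaly if one or more pairwise relationships are violated."""
    return len(violated_pairs(graph, test_scores, tolerance)) > 0
```

A fixed score drop is only one possible violation rule; a per-edge threshold learned from validation data would fit the same interface.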
NSF-PAR ID:
10206152
Journal Name:
Proceedings of the 50th IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2020
Volume:
1
Page Range or eLocation-ID:
552 to 563
National Science Foundation
##### More Like this
1. Abstract Background

Protein–protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner. This is time consuming.

Results

In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with $M$ proteins can be transformed into a much simpler problem: finding a number inside a sorted array of length $M$. This pre-screening process narrows down the …

Conclusions

The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples of four species, DHL-PPI is shown to be superior or competitive when compared to the other state-of-the-art methods in terms of precision, recall or F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from $O(M^2)$ to $O(M \log M)$ for performing an all-against-all PPI prediction for a database with $M$ proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.
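The idea of treating each binary hash code as a number and searching a sorted array can be sketched as follows. This is only an illustration of the lookup step: the deep hashing model is not shown, the protein names and bit strings are made up, and exact-match lookup is an assumption about what the pre-screening search does. Sorting $M$ codes costs $O(M \log M)$ and each binary search costs $O(\log M)$, so $M$ queries stay within $O(M \log M)$, compared with $O(M^2)$ for comparing every pair directly.

```python
import bisect
from typing import Dict, List, Tuple


def code_to_int(bits: str) -> int:
    """Interpret a binary hash code such as '1010' as an integer."""
    return int(bits, 2)


def build_index(codes: Dict[str, str]) -> Tuple[List[int], List[str]]:
    """Sort proteins by the integer value of their hash code (O(M log M)).

    Returns two parallel lists: sorted integer keys and the matching protein names.
    """
    pairs = sorted((code_to_int(bits), name) for name, bits in codes.items())
    keys = [key for key, _ in pairs]
    names = [name for _, name in pairs]
    return keys, names


def lookup(keys: List[int], names: List[str], query_bits: str) -> List[str]:
    """Find all proteins whose code equals the query code via binary search (O(log M))."""
    query = code_to_int(query_bits)
    lo = bisect.bisect_left(keys, query)
    hi = bisect.bisect_right(keys, query)
    return names[lo:hi]
```

For example, with hypothetical codes `{"P1": "1010", "P2": "0011", "P3": "1010"}`, building the index once and calling `lookup(keys, names, "1010")` returns the two proteins that share that code.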

2. The marine-based West Antarctic Ice Sheet (WAIS) is currently retreating due to shifting wind-driven oceanic currents that transport warm waters toward the ice margin, resulting in ice shelf thinning and accelerated mass loss of the WAIS. Previous results from geologic drilling on Antarctica’s continental margins show significant variability in marine-based ice sheet extent during the late Neogene and Quaternary. Numerical models indicate a fundamental role for oceanic heat in controlling this variability over at least the past 20 My. Although evidence for past ice sheet variability has been collected in marginal settings, sedimentologic sequences from the outer continental shelf are required to evaluate the extent of past ice sheet variability and the associated oceanic forcings and feedbacks. International Ocean Discovery Program Expedition 374 drilled a latitudinal and depth transect of five drill sites from the outer continental shelf to rise in the eastern Ross Sea to resolve the relationship between climatic and oceanic change and WAIS evolution through the Neogene and Quaternary. This location was selected because numerical ice sheet models indicate that this sector of Antarctica is highly sensitive to changes in ocean heat flux. The expedition was designed for optimal data-model integration and will enable an improved understanding …