
Title: Mining Multivariate Discrete Event Sequences for Knowledge Discovery and Anomaly Detection
Modern physical systems deploy large numbers of sensors to record, at different time-stamps, the status of different system components via measurements such as temperature, pressure, and speed, as well as each component's categorical state. Depending on the measurement values, there are two kinds of sequences: continuous and discrete. For continuous sequences, there is a host of state-of-the-art algorithms for anomaly detection based on time-series analysis, but there is a lack of effective methodologies that are tailored specifically to discrete event sequences. This paper proposes an analytics framework for discrete event sequences for knowledge discovery and anomaly detection. During the training phase, the framework extracts pairwise relationships among discrete event sequences using a neural machine translation model by viewing each discrete event sequence as a "natural language". The relationship between sequences is quantified by how well one discrete event sequence is "translated" into another sequence. These pairwise relationships among sequences are aggregated into a multivariate relationship graph that clusters the structural knowledge of the underlying system and essentially discovers the hidden relationships among discrete sequences. This graph quantifies system behavior during normal operation. During testing, if one or more pairwise relationships are violated, an anomaly is detected. The proposed framework is evaluated on two real-world datasets: a proprietary dataset collected from a physical plant, where it is shown to be effective in extracting sensor pairwise relationships for knowledge discovery and anomaly detection, and a public hard disk drive dataset, where its ability to effectively predict upcoming disk failures is illustrated.
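The train-then-test logic described in the abstract can be illustrated with a minimal sketch. It does not train a translation model; it assumes each ordered sensor pair already has a translation-quality score in [0, 1] from training and from a test window. The function names, the `min_score` cutoff, and the `tolerance` margin are hypothetical choices for illustration, not values or methods taken from the paper.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sensor, target sensor)


def build_relationship_graph(train_scores: Dict[Pair, float],
                             min_score: float = 0.8) -> Dict[Pair, float]:
    """Keep only pairs that translate well during normal operation.

    min_score is an assumed cutoff, not a value from the paper.
    """
    return {pair: s for pair, s in train_scores.items() if s >= min_score}


def violated_pairs(graph: Dict[Pair, float],
                   test_scores: Dict[Pair, float],
                   tolerance: float = 0.2) -> List[Pair]:
    """Return graph edges whose test score dropped by more than tolerance."""
    broken = []
    for pair, train_s in graph.items():
        test_s = test_scores.get(pair)
        if test_s is not None and train_s - test_s > tolerance:
            broken.append(pair)
    return sorted(broken)


def is_anomalous(graph: Dict[Pair, float],
                 test_scores: Dict[Pair, float],
                 tolerance: float = 0.2) -> bool:
    """Flag an anomaly if one or more pairwise relationships are violated."""
    return len(violated_pairs(graph, test_scores, tolerance)) > 0


# Made-up scores for three hypothetical sensors.
train_scores = {("valve", "pump"): 0.95, ("pump", "fan"): 0.90, ("valve", "fan"): 0.30}
test_scores = {("valve", "pump"): 0.93, ("pump", "fan"): 0.40, ("valve", "fan"): 0.10}

graph = build_relationship_graph(train_scores)
broken = violated_pairs(graph, test_scores)
```

In this toy data only the pump-to-fan relationship is both strong during training and degraded at test time, so it is the single edge reported as violated.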
Award ID(s):
1838022
NSF-PAR ID:
10206152
Journal Name:
Proceedings of the 50th IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2020
Volume:
1
Page Range or eLocation-ID:
552 to 563
Sponsoring Org:
National Science Foundation
More Like this
  1. Background

    Protein–protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner. This is time consuming.

    Results

    In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with M proteins can be transformed into a much simpler problem: to find a number inside a sorted array of length M. This pre-screening process narrows down the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship.

    Conclusions

    The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples of four species, DHL-PPI is shown to be superior or competitive when compared to the other state-of-the-art methods in terms of precision, recall or F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from $O(M^2)$ to $O(M\log M)$ for performing an all-against-all PPI prediction for a database with M proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.

  2. The marine-based West Antarctic Ice Sheet (WAIS) is currently retreating due to shifting wind-driven oceanic currents that transport warm waters toward the ice margin, resulting in ice shelf thinning and accelerated mass loss of the WAIS. Previous results from geologic drilling on Antarctica’s continental margins show significant variability in marine-based ice sheet extent during the late Neogene and Quaternary. Numerical models indicate a fundamental role for oceanic heat in controlling this variability over at least the past 20 My. Although evidence for past ice sheet variability has been collected in marginal settings, sedimentologic sequences from the outer continental shelf are required to evaluate the extent of past ice sheet variability and the associated oceanic forcings and feedbacks. International Ocean Discovery Program Expedition 374 drilled a latitudinal and depth transect of five drill sites from the outer continental shelf to rise in the eastern Ross Sea to resolve the relationship between climatic and oceanic change and WAIS evolution through the Neogene and Quaternary. This location was selected because numerical ice sheet models indicate that this sector of Antarctica is highly sensitive to changes in ocean heat flux. The expedition was designed for optimal data-model integration and will enable an improved understanding of the sensitivity of Antarctic Ice Sheet (AIS) mass balance during warmer-than-present climates (e.g., the Pleistocene “super interglacials,” the mid-Pliocene, and the late early to middle Miocene).
The principal goals of Expedition 374 were to • Evaluate the contribution of West Antarctica to far-field ice volume and sea level estimates; • Reconstruct ice-proximal atmospheric and oceanic temperatures to identify past polar amplification and assess its forcings and feedbacks; • Assess the role of oceanic forcing (e.g., sea level and temperature) on AIS stability/instability; • Identify the sensitivity of the AIS to Earth’s orbital configuration under a variety of climate boundary conditions; and • Reconstruct eastern Ross Sea paleobathymetry to examine relationships between seafloor geometry, ice sheet stability/instability, and global climate. To achieve these objectives, we will • Use data and models to reconcile intervals of maximum Neogene and Quaternary Antarctic ice advance with far-field records of eustatic sea level change; • Reconstruct past changes in oceanic and atmospheric temperatures using a multiproxy approach; • Reconstruct Neogene and Quaternary sea ice margin fluctuations in datable marine continental slope and rise records and correlate these records to existing inner continental shelf records; • Examine relationships among WAIS stability/instability, Earth’s orbital configuration, oceanic temperature and circulation, and atmospheric pCO2; and • Constrain the timing of Ross Sea continental shelf overdeepening and assess its impact on Neogene and Quaternary ice dynamics. Expedition 374 was carried out from January to March 2018, departing from Lyttelton, New Zealand. We recovered 1292.70 m of high-quality cores from five sites spanning the early Miocene to late Quaternary. Three sites were cored on the continental shelf (Sites U1521, U1522, and U1523). At Site U1521, we cored a 650 m thick sequence of interbedded diamictite, mudstone, and diatomite, penetrating the Ross Sea seismic Unconformity RSU4. 
The depositional reconstructions of past glacial and open-marine conditions at this site will provide unprecedented insight into environmental change on the Antarctic continental shelf during the early and middle Miocene. At Site U1522, we cored a discontinuous upper Miocene to Pleistocene sequence of glacial and glaciomarine strata from the outer shelf, with the primary objective to penetrate and date seismic Unconformity RSU3, which is interpreted to represent the first major continental shelf–wide expansion and coalescing of marine-based ice streams from both East and West Antarctica. At Site U1523, we cored a sediment drift located beneath the westerly flowing Antarctic Slope Current (ASC). Cores from this site will provide a record of the changing vigor of the ASC through time. Such a reconstruction will enable testing of the hypothesis that changes in the vigor of the ASC represent a key control on regulating heat flux onto the continental shelf, resulting in the ASC playing a fundamental role in ice sheet mass balance. We also cored two sites on the continental slope and rise. At Site U1524, we cored a Plio–Pleistocene sedimentary sequence on the continental rise on the levee of the Hillary Canyon, which is one of the largest conduits of Antarctic Bottom Water delivery from the Antarctic continental shelf into the abyssal ocean. Drilling at Site U1524 was intended to penetrate into middle Miocene and older strata but was initially interrupted by drifting sea ice that forced us to abandon coring in Hole U1524A at 399.5 m drilling depth below seafloor (DSF). We moved to a nearby alternate site on the continental slope (U1525) to core a single hole with a record complementary to the upper part of the section recovered at Site U1524. We returned to Site U1524 3 days later, after the sea ice cleared. We then cored Hole U1524C with the rotary core barrel with the intention of reaching the target depth of 1000 m DSF. 
However, we were forced to terminate Hole U1524C at 441.9 m DSF due to a mechanical failure with the vessel that resulted in termination of all drilling operations and a return to Lyttelton 16 days earlier than scheduled. The loss of 39% of our operational days significantly impacted our ability to achieve all Expedition 374 objectives as originally planned. In particular, we were not able to obtain the deeper time record of the middle Miocene on the continental rise or abyssal sequences that would have provided a continuous and contemporaneous archive to the high-quality (but discontinuous) record from Site U1521 on the continental shelf. The mechanical failure also meant we could not recover sediment cores from proposed Site RSCR-19A, which was targeted to obtain a high-fidelity, continuous record of upper Neogene and Quaternary pelagic/hemipelagic sedimentation. Despite our failure to recover a shelf-to-rise transect for the Miocene, a continental shelf-to-rise transect for the Pliocene to Pleistocene interval is possible through comparison of the high-quality records from Site U1522 with those from Site U1525 and legacy cores from the Antarctic Geological Drilling Project (ANDRILL).
  3. In successful enterprise attacks, adversaries often need to gain access to additional machines beyond their initial point of compromise, a set of internal movements known as lateral movement. We present Hopper, a system for detecting lateral movement based on commonly available enterprise logs. Hopper constructs a graph of login activity among internal machines and then identifies suspicious sequences of logins that correspond to lateral movement. To understand the larger context of each login, Hopper employs an inference algorithm to identify the broader path(s) of movement that each login belongs to and the causal user responsible for performing a path's logins. Hopper then leverages this path inference algorithm, in conjunction with a set of detection rules and a new anomaly scoring algorithm, to surface the login paths most likely to reflect lateral movement. On a 15-month enterprise dataset consisting of over 780 million internal logins, Hopper achieves a 94.5% detection rate across over 300 realistic attack scenarios, including one red team attack, while generating an average of < 9 alerts per day. In contrast, to detect the same number of attacks, prior state-of-the-art systems would need to generate nearly 8× as many false positives.
  4. The marine-based West Antarctic Ice Sheet (WAIS) is currently locally retreating because of shifting wind-driven oceanic currents that transport warm waters toward the ice margin, resulting in ice shelf thinning and accelerated mass loss. Previous results from geologic drilling on Antarctica’s continental margins show significant variability in ice sheet extent during the late Neogene and Quaternary. Climate and ice sheet models indicate a fundamental role for oceanic heat in controlling ice sheet variability over at least the past 20 My. Although evidence for past ice sheet variability is available from ice-proximal marine settings, sedimentary sequences from the continental shelf and rise are required to evaluate the extent of past ice sheet variability and the associated forcings and feedbacks. International Ocean Discovery Program Expedition 374 drilled a latitudinal and depth transect of five sites from the outer continental shelf to rise in the central Ross Sea to resolve Neogene and Quaternary relationships between climatic and oceanic change and WAIS evolution. The Ross Sea was targeted because numerical ice sheet models indicate that this sector of Antarctica responds sensitively to changes in ocean heat flux. Expedition 374 was designed for optimal data-model integration to enable an improved understanding of Antarctic Ice Sheet (AIS) mass balance during warmer-than-present climates (e.g., the Pleistocene “super interglacials,” the mid-Pliocene, and the Miocene Climatic Optimum). The principal goals of Expedition 374 were to: 1. Evaluate the contribution of West Antarctica to far-field ice volume and sea level estimates; 2. Reconstruct ice-proximal oceanic and atmospheric temperatures to quantify past polar amplification; 3. Assess the role of oceanic forcing (e.g., temperature and sea level) on AIS variability; 4. Identify the sensitivity of the AIS to Earth’s orbital configuration under a variety of climate boundary conditions; and 5. 
Reconstruct Ross Sea paleobathymetry to examine relationships between seafloor geometry, ice sheet variability, and global climate. To achieve these objectives, postcruise studies will: 1. Use data and models to reconcile intervals of maximum Neogene and Quaternary ice advance and retreat with far-field records of eustatic sea level; 2. Reconstruct past changes in oceanic and atmospheric temperatures using a multiproxy approach; 3. Reconstruct Neogene and Quaternary sea ice margin fluctuations and correlate these records to existing inner continental shelf records; 4. Examine relationships among WAIS variability, Earth’s orbital configuration, oceanic temperature and circulation, and atmospheric pCO2; and 5. Constrain the timing of Ross Sea continental shelf overdeepening and assess its impact on Neogene and Quaternary ice dynamics. Expedition 374 departed from Lyttelton, New Zealand, in January 2018 and returned in March 2018. We recovered 1292.70 m of high-quality core from five sites spanning the early Miocene to late Quaternary. Three sites were cored on the continental shelf (Sites U1521, U1522, and U1523). At Site U1521, we cored a 650 m thick sequence of interbedded diamictite and diatom-rich mudstone penetrating seismic Ross Sea Unconformity 4 (RSU4). The depositional reconstructions of past glacial and open-marine conditions at this site will provide unprecedented insight into environmental change on the Antarctic continental shelf during the late early and middle Miocene. At Site U1522, we cored a discontinuous late Miocene to Pleistocene sequence of glacial and glaciomarine strata from the outer shelf with the primary objective of penetrating and dating RSU3, which is interpreted to reflect the first continental shelf–wide expansion of East and West Antarctic ice streams. 
Site U1523, located on the outer continental shelf, targeted a sediment drift beneath the westward-flowing Antarctic Slope Current (ASC) to test the hypothesis that changes in ASC vigor regulate ocean heat flux onto the continental shelf and thus ice sheet mass balance. We also cored two sites on the continental rise and slope. At Site U1524, we recovered a Plio–Pleistocene sedimentary sequence from the levee of the Hillary Canyon, one of the largest conduits of Antarctic Bottom Water from the continental shelf to the abyssal ocean. Site U1524 was designed to penetrate into middle Miocene and older strata, but coring was initially interrupted by drifting sea ice that forced us to abandon coring in Hole U1524A at 399.5 m drilling depth below seafloor (DSF). We moved to a nearby alternate site on the continental slope (Site U1525) to core a single hole designed to complement the record at Site U1524. We returned to Site U1524 after the sea ice cleared and cored Hole U1524C with the rotary core barrel system with the intention of reaching the target depth of 1000 m DSF. However, we were forced to terminate Hole U1524C at 441.9 m DSF because of a mechanical failure with the vessel that resulted in termination of all drilling operations and forced us to return to Lyttelton 16 days earlier than scheduled. The loss of 39% of our operational days significantly impacted our ability to achieve all Expedition 374 objectives. In particular, we were not able to recover continuous middle Miocene sequences from the continental rise designed to complement the discontinuous record from continental shelf Site U1521. The mechanical failure also meant we could not recover cores from proposed Site RSCR-19A, which was targeted to obtain a high-fidelity, continuous record of upper Neogene and Quaternary pelagic/hemipelagic sedimentation. 
Despite our failure to recover a continental shelf-to-rise Miocene transect, records from Sites U1522, U1524, and U1525 and legacy cores from the Antarctic Geological Drilling Project (ANDRILL) can be integrated to develop a shelf-to-rise Plio–Pleistocene transect.
  5. International Ocean Discovery Program (IODP) Expedition 357 successfully cored an east–west transect across the southern wall of Atlantis Massif on the western flank of the Mid-Atlantic Ridge (MAR) to study the links between serpentinization processes and microbial activity in the shallow subsurface of highly altered ultramafic and mafic sequences that have been uplifted to the seafloor along a major detachment fault zone. The primary goals of this expedition were to (1) examine the role of serpentinization in driving hydrothermal systems, sustaining microbial communities, and sequestering carbon; (2) characterize the tectonomagmatic processes that lead to lithospheric heterogeneities and detachment faulting; and (3) assess how abiotic and biotic processes change with variations in rock type and progressive exposure on the seafloor. To accomplish these objectives, we developed a coring and sampling strategy centered on the use of seabed drills—the first time that such systems have been used in the scientific ocean drilling programs. This technology was chosen in the hope of achieving high recovery of the carbonate cap sequences and intact contact and deformation relationships. The expedition plans also included several engineering developments to assess geochemical parameters during drilling; sample bottom water before, during, and after drilling; supply synthetic tracers during drilling for contamination assessment; acquire in situ electrical resistivity and magnetic susceptibility measurements for assessing fractures, fluid flow, and extent of serpentinization; and seal boreholes to provide opportunities for future experiments. Expedition 359 was designed to address changes in sea level and currents, along with monsoon evolution in the Indian Ocean. The Maldives archipelago holds a unique and mostly unread Indian Ocean archive of the evolving Cenozoic icehouse world. 
Cores from eight drill sites in the Inner Sea of the Maldives provide the tropical marine record that is key for better understanding the effects of this global evolution in the Indo-Pacific realm. In addition, the bank geometries of the carbonate archipelago provide a physical record of changing sea level and ocean currents. The bank growth occurs in pulses of aggradation and progradation that are controlled by sea level fluctuations during the early and middle Miocene, including the mid-Miocene Climate Optimum. A dramatic shift in development of the carbonate edifice from a sea level–controlled to a predominantly current-controlled system appears to be directly linked to the evolving Indian monsoon. This phase led to a twofold configuration of bank development: bank growth continued in some parts of the edifice, whereas in other places, banks drowned. Drowning steps seem to coincide with onset and intensification of the monsoon-related current system and subsequent deposition of contourite fans and large-scale sediment drifts. As such, the drift deposits will provide a continuous record of Indian monsoon development in the region of the Maldives. A major focus of Expedition 359 was to date precisely the onset of the current system. This goal was successfully completed during the expedition. The second important outcome of Expedition 359 was groundtruthing the hypothesis that the dramatic, pronounced change in style of the carbonate platform sequence stacking was caused by a combination of relative sea level fluctuations and ocean current system changes. These questions are directly addressed by the shipboard scientific data. In addition, Expedition 359 cores will provide a complete Neogene δ13C record of the platform and platform margin sediments and a comparison with pelagic records over the same time period. 
This comparison will allow assessment of the extent to which platform carbonates record changes in the global carbon cycle and whether changes in the carbon isotopic composition of organic and inorganic components covary and the implications this has on the deep-time record. This determination is important because such records are the only type that exists in deep time.
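The pre-screening and verification steps described in the DHL-PPI entry (item 1 in the list above) can be sketched in a few lines. The sketch treats each binary hash code as an integer, locates a query in a sorted array by binary search, and then confirms candidates by Hamming distance. How the real system picks candidates around the match position is not stated in the abstract, so the `window` parameter, the `max_dist` threshold, the function names, and the 8-bit codes are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")


def prescreen(sorted_codes: List[int], query: int, window: int = 2) -> List[int]:
    """Binary-search the sorted codes and return entries near the query.

    Taking a fixed window around the insertion point is an assumed
    candidate-selection rule, not the published one.
    """
    pos = bisect_left(sorted_codes, query)
    lo = max(0, pos - window)
    hi = min(len(sorted_codes), pos + window)
    return sorted_codes[lo:hi]


def verify(candidates: List[int], query: int, max_dist: int = 1) -> List[int]:
    """Keep candidates within max_dist bits of the query (assumed threshold)."""
    return [c for c in candidates if hamming(c, query) <= max_dist]


# Toy database of 8-bit hash codes; sorting once costs O(M log M).
codes = sorted([0b10110010, 0b10110011, 0b00001111, 0b11110000, 0b10100010])
query = 0b10110010

candidates = prescreen(codes, query)   # each lookup is O(log M)
matches = verify(candidates, query)
```

Sorting the database once and answering each of the M queries with a logarithmic-time lookup is consistent with the $O(M\log M)$ all-against-all cost quoted in that abstract.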