skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Friday, December 13 until 2:00 AM ET on Saturday, December 14 due to maintenance. We apologize for the inconvenience.


Title: Using Event Log Timing Information to Assist Process Scenario Discoveries
Event logs contain abundant information, such as activity names, time stamps, activity executors, etc. However, much of existing trace clustering research has been focused on applying activity names to assist process scenarios discovery. In addition, many existing trace clustering algorithms commonly used in the literature, such as k-means clustering approach, require prior knowledge about the number of process scenarios existed in the log, which sometimes are not known aprior. This paper presents a two-phase approach that obtains timing information from event logs and uses the information to assist process scenario discoveries without requiring any prior knowledge about process scenarios. We use five real-life event logs to compare the performance of the proposed two-phase approach for process scenario discoveries with the commonly used k-means clustering approach in terms of model’s harmonic mean of the weighted average fitness and precision, i.e., the F1 score. The experiment data shows that (1) the process scenario models obtained with the additional timing information have both higher fitness and precision scores than the models obtained without the timing information; (2) the two-phase approach not only removes the need for prior information related to k, but also results in a comparable F1 score compared to the optimal k-means approach with the optimal k obtained through exhaustive search.  more » « less
Award ID(s):
1952247 1952225
PAR ID:
10311280
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2020 IEEE Third International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Process mining is a technique for extracting process models from event logs. Event logs contain abundant information related to an event such as the timestamp of the event, the actions that triggers the event, etc. Much of existing process mining research has been focused on discoveries of process models behind event logs. How to uncover the timing constraints from event logs that are associated with the discovered process models is not well-studied. In this paper, we present an approach that extends existing process mining techniques to not only mine but also integrate timing constraints with process models discovered and constructed by existing process mining algorithms. The approach contains three major steps, i.e., first, for a given process model constructed by an existing process mining algorithm and represented as a workflow net, extract a time dependent set for each transition in the workflow net model. Second, based on the time dependent sets, develop an algorithm to extract timing constraints from event logs for each transition in the model. Third, extend the original workflow net into a time Petri net where the discovered timing constraints are associated with their corresponding transitions. A real-life road traffic fine management process scenario is used as a case study to show how timing constraints in the fine management process can be discovered from event logs with our approach. 
    more » « less
  2. Process Mining is a technique for extracting process models from event logs. Event logs contain abundant explicit information related to events, such as the timestamp and the actions that trigger the event. Much of the existing process mining research has focused on discovering the process models behind these event logs. However, Process Mining relies on the assumption that these event logs contain accurate representations of an ideal set of processes. These ideal sets of processes imply that the information contained within the log represents what is really happening in a given environment. However, many of these event logs might contain noisy, infrequent, missing, or false process information that is generally classified as outliers. Extending beyond process discovery, there are many research efforts towards cleaning the event logs to deal with these outliers. In this paper, we present an approach that uses hidden Markov models to filter out outliers from event logs prior to applying any process discovery algorithms. Our proposed filtering approach can detect outlier behavior, and consequently, help process discovery algorithms return models that better reflect the real processes within an organization. Furthermore, we show that this filtering method outperforms two commonly used filtering approaches, namely the Matrix Filter approach and the Anomaly Free Automation approach for both artificial event logs and real-life event logs. 
    more » « less
  3. Villazón-Terrazas, B. (Ed.)
    Given the ubiquity of unstructured biomedical data, significant obstacles still remain in achieving accurate and fast access to online biomedical content. Accompanying semantic annotations with a growing volume biomedical content on the internet is critical to enhancing search engines’ context-aware indexing, improving search speed and retrieval accuracy. We propose a novel methodology for annotation recommendation in the biomedical content authoring environment by introducing the socio-technical approach where users can get recommendations from each other for accurate and high quality semantic annotations. We performed experiments to record the system level performance with and without socio-technical features in three scenarios of different context to evaluate the proposed socio-technical approach. At a system level, we achieved 89.98% precision, 89.61% recall, and an 89.45% F1-score for semantic annotation recollection. Similarly, a high accuracy of 90% is achieved with the socio-technical approach compared to without, which obtains 73% accuracy. However almost equable precision, recall, and F1- score of 90% is gained by scenario-1 and scenario-2, whereas scenario-3 achieved relatively less precision, recall and F1-score of 88%. We conclude that our proposed socio-technical approach produces proficient annotation recommendations that could be helpful for various uses ranging from context-aware indexing to retrieval accuracy. 
    more » « less
  4. Deep neural network clustering is superior to the conventional clustering methods due to deep feature extraction and nonlinear dimensionality reduction. Nevertheless, deep neural network leads to a rough representation regarding the inherent relationship of the data points. Therefore, it is still difficult for deep neural network to exploit the effective structure for direct clustering. To address this issue,we propose a robust embedded deep K-means clustering (REDKC) method. The proposed RED-KC approach utilizes the δ-norm metric to constrain the feature mapping process of the auto-encoder network, so that data are mapped to a latent feature space, which is more conducive to the robust clustering. Compared to the existing auto-encoder networks with the fixed prior, the proposed RED-KC is adaptive during the process of feature mapping. More importantly, the proposed RED-KC embeds the clustering process with the autoencoder network, such that deep feature extraction and clustering can be performed simultaneously. Accordingly, a direct and efficient clustering could be obtained within only one step to avoid the inconvenience of multiple separate stages, namely, losing pivotal information and correlation. Consequently, extensive experiments are provided to validate the effectiveness of the proposed approach. 
    more » « less
  5. Radio channel propagation models for the millimeter wave (mmWave) spectrum are extremely important for planning future 5G wireless communication systems. Transmitted radio signals are received as clusters of multipath rays. Identifying these clusters provides better spatial and temporal characteristics of the mmWave channel. This paper deals with the clustering process and its validation across a wide range of frequencies in the mmWave spectrum below 100 GHz. By way of simulations, we show that in outdoor communication scenarios clustering of received rays is influenced by the frequency of the transmitted signal. This demonstrates the sparse characteristic of the mmWave spectrum (i.e., we obtain a lower number of rays at the receiver for the same urban scenario). We use the well-known k-means clustering algorithm to group arriving rays at the receiver. The accuracy of this partitioning is studied with both cluster validity indices (CVIs) and score fusion techniques. Finally, we analyze how the clustering solution changes with narrower-beam antennas, and we provide a comparison of the cluster characteristics for different types of antennas. 
    more » « less