Title: Using Event Log Timing Information to Assist Process Scenario Discoveries
Event logs contain abundant information, such as activity names, timestamps, and activity executors. However, much of the existing trace clustering research has focused on applying activity names to assist process scenario discovery. In addition, many trace clustering algorithms commonly used in the literature, such as the k-means clustering approach, require prior knowledge of the number of process scenarios in the log, which is sometimes not known a priori. This paper presents a two-phase approach that obtains timing information from event logs and uses it to assist process scenario discovery without requiring any prior knowledge about the process scenarios. We use five real-life event logs to compare the performance of the proposed two-phase approach with the commonly used k-means clustering approach in terms of the model's harmonic mean of the weighted-average fitness and precision, i.e., the F1 score. The experimental data show that (1) the process scenario models obtained with the additional timing information have both higher fitness and higher precision scores than the models obtained without it; and (2) the two-phase approach not only removes the need for prior knowledge of k, but also yields an F1 score comparable to that of the k-means approach with the optimal k obtained through exhaustive search.
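To make the evaluation metric and the timing information concrete: the F1 score above is the harmonic mean of fitness and precision, and total trace duration is one example of the kind of per-trace timing feature such an approach can derive. A minimal sketch in Python, with an illustrative log format and placeholder numbers (not the paper's data or algorithm):

```python
from datetime import datetime

# Hypothetical event log: case id -> list of (activity, timestamp) pairs.
# Format and values are illustrative, not taken from the paper.
log = {
    "case1": [("register", "2020-01-01 09:00"), ("decide", "2020-01-01 09:30")],
    "case2": [("register", "2020-01-02 10:00"), ("decide", "2020-01-02 12:00")],
}

def trace_duration_minutes(events):
    """A simple timing feature: elapsed minutes from first to last event."""
    fmt = "%Y-%m-%d %H:%M"
    ts = [datetime.strptime(t, fmt) for _, t in events]
    return (max(ts) - min(ts)).total_seconds() / 60

durations = {case: trace_duration_minutes(ev) for case, ev in log.items()}
print(durations)  # {'case1': 30.0, 'case2': 120.0}

def f1(fitness, precision):
    """F1 = harmonic mean of weighted-average fitness and precision."""
    return 2 * fitness * precision / (fitness + precision) if fitness + precision else 0.0
```

For example, a model with fitness 0.90 and precision 0.80 scores `f1(0.90, 0.80) ≈ 0.847`.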
Award ID(s):
1952247 1952225
NSF-PAR ID:
10311280
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2020 IEEE Third International Conference on Artificial Intelligence and Knowledge Engineering (AIKE)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Process mining is a technique for extracting process models from event logs. Event logs contain abundant information related to each event, such as its timestamp and the actions that trigger it. Much of the existing process mining research has focused on discovering the process models behind event logs; how to uncover the timing constraints associated with the discovered process models is not well studied. In this paper, we present an approach that extends existing process mining techniques to not only mine but also integrate timing constraints with process models discovered and constructed by existing process mining algorithms. The approach contains three major steps: first, for a given process model constructed by an existing process mining algorithm and represented as a workflow net, extract a time-dependent set for each transition in the workflow net model; second, based on the time-dependent sets, extract timing constraints from the event log for each transition in the model; third, extend the original workflow net into a time Petri net in which the discovered timing constraints are associated with their corresponding transitions. A real-life road traffic fine management process is used as a case study to show how timing constraints in the fine management process can be discovered from event logs with our approach.
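A minimal sketch of the second step, mining a [min, max] timing interval per transition from an event log. For illustration it assumes each transition's time-dependent set is simply the preceding event in the same trace, which is a simplification of the paper's construction; the log format is hypothetical:

```python
from collections import defaultdict

# Each event is (case_id, activity, timestamp-in-minutes); the predecessor of
# a transition is taken to be the previous event in the same trace.
def mine_timing_constraints(log):
    traces = defaultdict(list)
    for case, activity, ts in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case].append((activity, ts))
    deltas = defaultdict(list)
    for events in traces.values():
        for (_prev_act, prev_t), (act, t) in zip(events, events[1:]):
            deltas[act].append(t - prev_t)  # elapsed time before this transition fires
    # The [min, max] interval observed for each transition across all traces.
    return {act: (min(d), max(d)) for act, d in deltas.items()}

log = [
    ("c1", "create_fine", 0), ("c1", "send_fine", 10),
    ("c2", "create_fine", 0), ("c2", "send_fine", 25),
]
print(mine_timing_constraints(log))  # {'send_fine': (10, 25)}
```

The resulting intervals are exactly the kind of constraint that can annotate the corresponding transitions when the workflow net is extended into a time Petri net.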
  2. In this paper, we present a multiple concurrent occupant identification approach through footstep-induced floor vibration sensing. Identification of human occupants is useful in a variety of indoor smart structure scenarios, with applications in building security, space allocation, and healthcare. Existing approaches leverage sensing modalities such as vision, acoustic, RF, and wearables, but are limited due to deployment constraints such as line-of-sight requirements, sensitivity to noise, dense sensor deployment, and requiring each walker to wear/carry a device. To overcome these restrictions, we use footstep-induced structural vibration sensing. Footstep-induced signals contain information about the occupants' unique gait characteristics, and propagate through the structural medium, which enables sparse and passive identification of indoor occupants. The primary research challenge is that multiple-person footstep-induced vibration responses are a mixture of structurally-codependent overlapping individual responses with unknown timing, spectral content, and mixing ratios. As such, it is difficult to determine which part of the signal corresponds to each occupant. We overcome this challenge through a recursive sparse representation approach based on cosine distance that identifies each occupant in a footstep event in the order that their signals are generated, reconstructs their portion of the signal, and removes it from the mixed response. By leveraging sparse representation, our approach can simultaneously identify and separate mixed/overlapping responses, and the use of the cosine distance error function reduces the influence of structural codependency on the multiple walkers' signals. In this way, we isolate and identify each of the multiple occupants' footstep responses. 
We evaluate our approach by conducting real-world walking experiments with three concurrent walkers and achieve an average F1 score for identifying all persons of 0.89 (1.3x baseline improvement), and with a 10-person "hybrid" dataset (simulated combination of single-walker real-world data), we identify 2, 3, and 4 concurrent walkers with a trace-level accuracy of 100%, 93%, and 73%, respectively, and observe as much as a 2.9x error reduction over a naive baseline approach. 
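The recursive identify, reconstruct, and remove loop described above can be sketched as follows. The cosine-distance matching and the subtraction of each identified walker's contribution are the core idea; the toy templates and mixed signal are stand-ins for real footstep-induced vibration features:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def identify_walkers(mixed, templates, n_walkers):
    """Greedily peel walkers off the mixed response, closest match first."""
    residual = list(mixed)
    found = []
    for _ in range(n_walkers):
        name, tpl = min(templates.items(),
                        key=lambda kv: cosine_distance(residual, kv[1]))
        found.append(name)
        # Reconstruct this walker's portion as the projection of the residual
        # onto the template, then remove it from the mixed response.
        scale = (sum(r * t for r, t in zip(residual, tpl))
                 / sum(t * t for t in tpl))
        residual = [r - scale * t for r, t in zip(residual, tpl)]
    return found

templates = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
mixed = [2.0, 0.5, 0.0]  # mostly alice's signal, plus some of bob's
print(identify_walkers(mixed, templates, 2))  # ['alice', 'bob']
```

Using cosine distance rather than Euclidean error makes the match insensitive to each walker's mixing ratio, which mirrors the paper's motivation for that error function.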
  3. Villazón-Terrazas, B. (Ed.)
    Given the ubiquity of unstructured biomedical data, significant obstacles remain to achieving accurate and fast access to online biomedical content. Accompanying the growing volume of biomedical content on the internet with semantic annotations is critical to enhancing search engines' context-aware indexing and improving search speed and retrieval accuracy. We propose a novel methodology for annotation recommendation in the biomedical content authoring environment by introducing a socio-technical approach in which users can get recommendations from each other for accurate, high-quality semantic annotations. We performed experiments recording system-level performance with and without the socio-technical features in three scenarios of different context. At the system level, we achieved 89.98% precision, 89.61% recall, and an 89.45% F1-score for semantic annotation recommendation. Similarly, a high accuracy of 90% is achieved with the socio-technical approach, compared with 73% without it. Scenarios 1 and 2 attain nearly equal precision, recall, and F1-scores of about 90%, whereas scenario 3 achieves slightly lower precision, recall, and F1-scores of 88%. We conclude that our proposed socio-technical approach produces proficient annotation recommendations that could be helpful for uses ranging from context-aware indexing to improved retrieval accuracy.
  4.
    By modelling how the probability distributions of individuals' states evolve as new information flows through a network, belief propagation has broad applicability, ranging from image correction to virus propagation to social networks. Yet its existing implementations are confined largely to small Bayesian networks, putting applications of the algorithm to large-scale graphs out of reach. To promote its broader adoption, we enable belief propagation for both small- and large-scale graphs using GPU processing. We explore a host of optimizations, including a new simple yet extensible input format that enables belief propagation to operate at massive scale, along with significant workload-processing updates and meticulous memory management, allowing our implementation to outperform prior work in terms of raw execution time and input size on a single machine. Using a suite of parallelization technologies and techniques on a diverse set of graphs, we demonstrate that our implementations can efficiently process even massive networks, achieving up to nearly 121x speedups over our optimized single-threaded control implementations while supporting graphs of over ten million nodes, in contrast to previous works' support for thousands of nodes using CPU-based multi-core and host solutions. To assist in choosing the optimal implementation for a given graph, we provide a promising method that uses a random forest classifier and graph metadata, achieving a nearly 95% F1-score in our initial benchmarking; it is portable to different GPU architectures, achieving an F1-score of over 72% and a speedup of nearly 183x versus our control in the new environment.
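For readers unfamiliar with the underlying algorithm, a toy sum-product message update on a two-node binary graph looks like the sketch below. The potentials are purely illustrative; the paper's contribution is running such updates at massive scale on GPUs:

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def message(unary_i, pairwise, incoming):
    """m_{i->j}(x_j) = sum_{x_i} unary_i(x_i) * pairwise(x_i, x_j) * prod(incoming at x_i)."""
    out = []
    for xj in range(2):
        total = 0.0
        for xi in range(2):
            prod = unary_i[xi] * pairwise[xi][xj]
            for m in incoming:
                prod *= m[xi]
            total += prod
        out.append(total)
    return normalize(out)

unary_a = [0.9, 0.1]                 # node A strongly prefers state 0
unary_b = [0.5, 0.5]                 # node B is uncertain on its own
pairwise = [[0.8, 0.2], [0.2, 0.8]]  # neighbouring nodes prefer to agree

m_ab = message(unary_a, pairwise, [])            # A has no other neighbours
belief_b = normalize([unary_b[x] * m_ab[x] for x in range(2)])
print([round(b, 3) for b in belief_b])  # -> [0.74, 0.26]
```

Because every message depends only on a node's unary potential and its other incoming messages, updates across the graph are embarrassingly parallel, which is what makes the GPU formulation attractive.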
  5. Process mining is a technique for extracting process models from event logs. Event logs contain abundant explicit information related to events, such as the timestamp and the actions that trigger the event. Much of the existing process mining research has focused on discovering the process models behind these event logs. However, process mining relies on the assumption that the event logs accurately represent an ideal set of processes, i.e., that the information contained in the log reflects what is really happening in a given environment. In practice, many event logs contain noisy, infrequent, missing, or false process information, generally classified as outliers. Extending beyond process discovery, there have been many research efforts toward cleaning event logs to deal with these outliers. In this paper, we present an approach that uses hidden Markov models to filter outliers from event logs prior to applying any process discovery algorithm. Our filtering approach can detect outlier behavior and, consequently, help process discovery algorithms return models that better reflect the real processes within an organization. Furthermore, we show that this filtering method outperforms two commonly used filtering approaches, namely the Matrix Filter approach and the Anomaly Free Automation approach, on both artificial and real-life event logs.
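The filtering idea can be sketched as scoring each trace under an HMM with the forward algorithm and discarding low-likelihood traces as outliers. The tiny hand-set two-state model and the threshold below are illustrative; the paper's approach learns its model from the log itself:

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs) under an HMM, computed with the forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[sp] * trans[sp][s] for sp in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

# Illustrative two-state model over activities "a" and "b".
start = [1.0, 0.0]
trans = [[0.1, 0.9], [0.9, 0.1]]                      # states tend to alternate
emit = [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}]   # state 0 emits "a", state 1 emits "b"

traces = [["a", "b"], ["b", "b"]]  # the second trace is atypical under this model
kept = [t for t in traces if forward_likelihood(t, start, trans, emit) > 0.1]
print(kept)  # [['a', 'b']]
```

Traces scoring below the threshold are treated as outlier behavior and removed before any process discovery algorithm sees the log.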