skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, October 10 until 2:00 AM ET on Friday, October 11 due to maintenance. We apologize for the inconvenience.


Title: NTP-Miner: Nonoverlapping Three-Way Sequential Pattern Mining
Nonoverlapping sequential pattern mining is an important type of sequential pattern mining (SPM) with gap constraints, which not only can reveal interesting patterns to users but also can effectively reduce the search space using the Apriori (anti-monotonicity) property. However, the existing algorithms do not focus on attributes of interest to users, meaning that existing methods may discover many frequent patterns that are redundant. To solve this problem, this article proposes a task called nonoverlapping three-way sequential pattern (NTP) mining, where attributes are categorized according to three levels of interest: strong, medium, and weak interest. NTP mining can effectively avoid mining redundant patterns since the NTPs are composed of strong and medium interest items. Moreover, NTPs can avoid serious deviations (the occurrence is significantly different from its pattern) since gap constraints cannot match with strong interest patterns. To mine NTPs, an effective algorithm is put forward, called NTP-Miner, which applies two main steps: support (frequency occurrence) calculation and candidate pattern generation. To calculate the support of an NTP, depth-first and backtracking strategies are adopted, which do not require creating a whole Nettree structure, meaning that many redundant nodes and parent–child relationships do not need to be created. Hence, time and space efficiency is improved. To generate candidate patterns while reducing their number, NTP-Miner employs a pattern join strategy and only mines patterns of strong and medium interest. Experimental results on stock market and protein datasets show that NTP-Miner not only is more efficient than other competitive approaches but can also help users find more valuable patterns. More importantly, NTP mining has achieved better performance than other competitive methods in clustering tasks. Algorithms and data are available at: https://github.com/wuc567/Pattern-Mining/tree/master/NTP-Miner .  more » « less
Award ID(s):
1763452 1828181
NSF-PAR ID:
10314147
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Knowledge Discovery from Data
Volume:
16
Issue:
3
ISSN:
1556-4681
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Risk patterns are crucial in biomedical research and have served as an important factor in precision health and disease prevention. Despite recent development in parallel and high-performance computing, existing risk pattern mining methods still struggle with problems caused by large-scale datasets, such as redundant candidate generation, inability to discover long significant patterns, and prolonged post pattern filtering. In this article, we propose a novel dynamic tree structure, Risk Hierarchical Pattern Tree (RHPTree), and a top-down search method, RHPSearch, which are capable of efficiently analyzing a large volume of data and overcoming the limitations of previous works. The dynamic nature of the RHPTree avoids costly tree reconstruction for the iterative search process and dataset updates. We also introduce two specialized search methods, the extended target search (RHPSearch-TS) and the parallel search approach (RHPSearch-SD), to further speed up the retrieval of certain items of interest. Experiments on both UCI machine learning datasets and sampled datasets of the Simons Foundation Autism Research Initiative (SFARI)—Simon’s Simplex Collection (SSC) datasets demonstrate that our method is not only faster but also more effective in identifying comprehensive long risk patterns than existing works. Moreover, the proposed new tree structure is generic and applicable to other pattern mining problems. 
    more » « less
  2. null (Ed.)
    Modern Internet of Things ( IoT ) applications generate massive amounts of time-stamped data, much of it in the form of discrete, symbolic sequences. In this work, we present a new system called TOP that deTects Outlier Patterns from these sequences. To solve the fundamental limitation of existing pattern mining semantics that miss outlier patterns hidden inside of larger frequent patterns, TOP offers new pattern semantics based on contextual patterns that distinguish the independent occurrence of a pattern from its occurrence as part of its super-pattern. We present efficient algorithms for the mining of this new class of contextual patterns. In particular, in contrast to the bottom-up strategy for state-of-the-art pattern mining techniques, our top-down Reduce strategy piggy backs pattern detection with the detection of the context in which a pattern occurs. Our approach achieves linear time complexity in the length of the input sequence. Effective optimization techniques such as context-driven search space pruning and inverted index-based outlier pattern detection are also proposed to further speed up contextual pattern mining. Our experimental evaluation demonstrates the effectiveness of TOP at capturing meaningful outlier patterns in several real-world IoT use cases. We also demonstrate the efficiency of TOP, showing it to be up to 2 orders of magnitude faster than adapting state-of-the-art mining to produce this new class of contextual outlier patterns, allowing us to scale outlier pattern mining to large sequence datasets. 
    more » « less
  3. In blockchain and cryptocurrency, miners participate in a proof-of-work-based distributed consensus protocol to find and generate a valid block, process transactions, and earn the corresponding reward. Because cryptocurrency is designed to adapt to the dynamic miner network size, a miner's participation affects the block difficulty which sets the expected amount of work to find a valid block. We study the dependency between the mining power control and the block difficulty and study a rational miner utilizing such dependency to dynamically control its mining power over a longer horizon than just the impending block. More specifically, we introduce I-O Mining strategy where a miner takes advantage of the block difficulty adjustment rule and toggles between mining with full power and power off between the difficulty adjustments. In I-O Mining, the miner influences the block difficulty and mines only when the difficulty is low, gaming and violating the design integrity of the mining protocol for its profit gain. We analyze the I-O Mining's incentive/profit gain over the static-mining strategies and its negative impact on the rest of the blockchain mining network in the block/transaction scalability. Our results show that I-O Mining becomes even more effective and profitable as there are greater competitions for mining and the reward and the cost difference becomes smaller, which are the trends in cryptocurrencies. 
    more » « less
  4. Abstract

    A major goal of community ecology is to identify the patterns of species associations and the processes that shape them. Arboreal ants are extremely diverse and abundant, making them an interesting and valuable group for tackling this issue. Numerous studies have used observational data of species co‐occurrence patterns to infer underlying assembly processes, but the complexity of these communities has resulted in few solid conclusions. This study takes advantage of an observational dataset that is unusually well‐structured with respect to habitat attributes (tree species, tree sizes, and vegetation structure), to disentangle different factors influencing community organization. In particular, this study assesses the potential role of interspecific competition and habitat selection on the distribution patterns of an arboreal ant community by incorporating habitat attributes into the co‐occurrence analyses. These findings are then contrasted against species traits, to explore functional explanations for the identified community patterns. We ran a suite of null models, first accounting only for the species incidence in the community and later incorporating habitat attributes in the null models. We performed analyses with all the species in the community and then with only the most common species using both a matrix‐level approach and a pairwise‐level approach. The co‐occurrence patterns did not differ from randomness in the matrix‐level approach accounting for all ant species in the community. However, a segregated pattern was detected for the most common ant species. Moreover, with the pairwise approach, we found a significant number of negative and positive pairs of species associations. Most of the segregated associations appear to be explained by competitive interactions between species, not habitat affiliations. This was supported by comparisons of species traits for significantly associated pairs. These results suggest that competition is the most important influence on the distribution patterns of arboreal ants within the focal community. Habitat attributes, in contrast, showed no significant influence on the matrix‐wide results and affected only a few associations. In addition, the segregated pairs shared more biological characteristic in common than the aggregated and random ones.

     
    more » « less
  5. Mining algorithms for relationship-based access control policies produce policies composed of relationship-based patterns that justify the input authorizations according to a given system graph. The correct functioning of a policy mining algorithm is typically tested based on experimental evaluations, in each of which the miner is presented with a set of authorizations and a system graph, and is expected to produce the corresponding ground truth policy. In this paper, we propose formal properties that must exist between the system graph and the ground truth policy in an evaluation test so that the miner is challenged to produce the exact ground truth policy. We show that failure to verify these properties in the experiment leads to inadequate evaluation, i.e., not truly testing whether the miner can handle the complexity of the ground truth policy. We also argue that following these properties would provide a computational advantage in the evaluations. We propose algorithms to identify and correct violations of these properties in system graphs. We also present our observations regarding these properties and their enforcement using a set of experimental studies. 
    more » « less