skip to main content


Search for: All records

Creators/Authors contains: "Long, Cheng"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Finding frequent subgraph patterns in a big graph is an important problem with many applications such as classifying chemical compounds and building indexes to speed up graph queries. Since this problem is NP-hard, some recent parallel systems have been developed to accelerate the mining. However, they often have a huge memory cost, very long running time, suboptimal load balancing, and possibly inaccurate results. In this paper, we propose an efficient system called T-FSM for parallel mining of frequent subgraph patterns in a big graph. T-FSM adopts a novel task-based execution engine design to ensure high concurrency, bounded memory consumption, and effective load balancing. It also supports a new anti-monotonic frequentness measure called Fraction-Score, which is more accurate than the widely used MNI measure. Our experiments show that T-FSM is orders of magnitude faster than SOTA systems for frequent subgraph pattern mining. Our system code has been released at https://github.com/lyuheng/T-FSM. 
    more » « less
    Free, publicly-accessible full text available May 26, 2024
  2. Quasi-cliques are a type of dense subgraphs that generalize the notion of cliques, important for applications such as community/module detection in various social and biological networks. However, the existing quasi-clique definition and algorithms are only applicable to undirected graphs. In this paper, we generalize the concept of quasi-cliques to directed graphs by proposing $(\gamma_1, \gamma_2)$-quasi-cliques which have density requirements in both inbound and outbound directions of each vertex in a quasi-clique subgraph. An efficient recursive algorithm is proposed to find maximal $(\gamma_1, \gamma_2)$-quasi-cliques which integrates many effective pruning rules that are validated by ablation studies. We also study the finding of top-$k$ large quasi-cliques directly by bootstrapping the search from more compact quasi-cliques, to scale the mining to larger networks. The algorithms are parallelized with effective load balancing, and we demonstrate that they can scale up effectively with the number of CPU cores. 
    more » « less
  3. Given a data matrix 𝐷, a submatrix 𝑆 of 𝐷 is an order-preserving submatrix (OPSM) if there is a permutation of the columns of 𝑆, under which the entry values of each row in 𝑆 are strictly increasing. OPSM mining is widely used in real-life applications such as identifying coexpressed genes and finding customers with similar preference. However, noise is ubiquitous in real data matrices due to variable experimental conditions and measurement errors, which makes conventional OPSM mining algorithms inapplicable. No previous work on OPSM has ever considered uncertain value intervals using the well-established possible world semantics. We establish two different definitions of significant OPSMs based on the possible world semantics: (1) expected support-based and (2) probabilistic frequentness-based. An optimized dynamic programming approach is proposed to compute the probability that a row supports a particular column permutation, with a closed-form formula derived to efficiently handle the special case of uniform value distribution and an accurate cubic spline approximation approach that works well with any uncertain value distributions. To efficiently check the probabilistic frequentness, several effective pruning rules are designed to efficiently prune insignificant OPSMs; two approximation techniques based on the Poisson and Gaussian distributions, respectively, are proposed for further speedup. These techniques are integrated into our two OPSM mining algorithms, based on prefix-projection and Apriori, respectively. We further parallelize our prefix-projection-based mining algorithm using PrefixFPM, a recently proposed framework for parallel frequent pattern mining, and we achieve a good speedup with the number of CPU cores. Extensive experiments on real microarray data demonstrate that the OPSMs found by our algorithms have a much higher quality than those found by existing approaches. 
    more » « less
  4. null (Ed.)
    In 3D nematic liquid crystals, disclination lines have a range of geometric structures. Locally, they may resemble +1/2 or −1/2 defects in 2D nematic phases, or they may have 3D twist. Here, we analyze the structure in terms of the director deformation modes around the disclination, as well as the nematic order tensor inside the disclination core. Based on this analysis, we construct a vector to represent the orientation of the disclination, as well as tensors to represent higher-order structure. We apply this method to simulations of a 3D disclination arch, and determine how the structure changes along the contour length. We then use this geometric analysis to investigate three types of forces acting on a disclination: Peach–Koehler forces due to external stress, interaction forces between disclination lines, and active forces. These results apply to the motion of disclination lines in both conventional and active liquid crystals. 
    more » « less
  5. Pregel-like systems are popular for iterative graph processing thanks to their user-friendly vertex-centric programming model. However, existing Pregel-like systems only adopt a naïve checkpointing approach for fault tolerance, which saves a large amount of data about the state of computation and signi!cantly degrades the failure-free execution performance. Advanced fault tolerance/recovery techniques are left unexplored in the context of Pregel-like systems. This paper proposes a non-invasive lightweight checkpointing (LWCP) scheme which minimizes the data saved to each checkpoint, and additional data required for recovery are generated online from the saved data. This improvement results in 10x speedup in checkpointing, and an integration of it with a recently proposed log-based recovery approach can further speed up recovery when failure occurs. Extensive experiments veri!ed that our proposed LWCP techniques are able to signi!cantly improve the performance of both checkpointing and recovery in a Pregel-like system. 
    more » « less