skip to main content


This content will become publicly available on May 13, 2025

Title: Top-L Most Influential Community Detection Over Social Networks
In many real-world applications such as social network analysis and online marketing/advertising, community detection is a fundamental task to identify communities (subgraphs) in social networks with high structural cohesiveness. While previous works focus on detecting communities alone, they do not consider the collective influences of users in these communities on other user nodes in social networks. Inspired by this, in this paper, we investigate the influence propagation from some seed communities and their influential effects that result in the influenced communities. We propose a novel problem, named Top-L most Influential Community DEtection (TopL-ICDE) over social networks, which aims to retrieve top-L seed communities with the highest influences, having high structural cohesiveness, and containing user-specified query keywords. To efficiently tackle the TopL-ICDE problem, we design effective pruning strategies to filter out false alarms of seed communities and propose an effective index mechanism to facilitate efficient Top-L community retrieval. We develop an efficient TopL-ICDE answering algorithm by traversing the index and applying our proposed pruning strategies. We also formulate and tackle a variant of TopL-ICDE, named diversified top-L most influential community detection (DTopL-ICDE), which returns a set of L diversified communities with the highest diversity score (i.e., collaborative influences by L communities). We prove that DTopL-ICDE is NP-hard, and propose an efficient greedy algorithm with our designed diversity score pruning. Through extensive experiments, we verify the efficiency and effectiveness of our proposed TopL-ICDE and DTopL-ICDE approaches over real/synthetic social networks under various parameter settings.  more » « less
Award ID(s):
2217104
PAR ID:
10533500
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
Subject(s) / Keyword(s):
Top-L Most Influential Community Detection Diversified Top-L Most Influential Community Detection
Format(s):
Medium: X
Location:
Utrecht, Netherlands
Sponsoring Org:
National Science Foundation
More Like this
  1. Decision-making on networks can be explained by both homophily and social influences. While homophily drives the formation of communities with similar characteristics, social influences occur both within and between communities. Social influences can be reasoned through role theory, which indicates that the influences among individuals depending on their roles and the behavior of interest. To operationalize these social science theories, we empirically identify the homophilous communities and use the community structures to capture such “roles”, affecting particular decision-making processes. We propose a generative model named the Stochastic Block influences Model and jointly analyzed both network formation and behavioral influences within and between different empirically-identified communities. To evaluate the performance and demonstrate the interpretability of our method, we study the adoption decisions for a microfinance product in Indian villages. We show that although individuals tend to form links within communities, there are strongly positive and negative social influences between communities, supporting the weak ties theory. Moreover, communities with shared characteristics are associated with positive influences. In contrast, communities that do not overlap are associated with negative influences. Our framework facilitates the quantification of the influences underlying decision communities and is thus a helpful tool for driving information diffusion, viral marketing, and technology adoption. 
    more » « less
  2. In many real-world applications such as social network analysis and online advertising/marketing, one of the most important and popular problems is called influence maximization (IM), which finds a set of k seed users that maximize the expected number of influenced user nodes. In practice, however, maximizing the number of influenced nodes may be far from satisfactory for real applications such as opinion promotion and collective buying. In this paper, we explore the importance of stability and triangles in social networks, and formulate a novel problem in the influence spread scenario, named triangular stability maximization , over social networks, and generalize it to a general triangle influence maximization problem, which is proved to be NP-hard. We develop an efficient reverse influence sampling (RIS) based framework for the triangle IM with theoretical guarantees. To enable unbiased estimators, it demands probabilistic sampling of triangles, that is, sampling triangles according to their probabilities. We propose an edge-based triple sampling approach, which is exactly equivalent to probabilistic sampling and avoids costly triangle enumeration and materialization. We also design several pruning and reduction techniques, as well as a cost-model-guided heuristic algorithm. Extensive experiments and a case study over real-world graphs confirm the effectiveness of our proposed algorithms and the superiority of triangular stability maximization and triangle influence maximization. 
    more » « less
  3. Illicit drug trafficking via social media sites such as Instagram have become a severe problem, thus drawing a great deal of attention from law enforcement and public health agencies. How to identify illicit drug dealers from social media data has remained a technical challenge for the following reasons. On the one hand, the available data are limited because of privacy concerns with crawling social media sites; on the other hand, the diversity of drug dealing patterns makes it difficult to reliably distinguish drug dealers from common drug users. Unlike existing methods that focus on posting-based detection, we propose to tackle the problem of illicit drug dealer identification by constructing a large-scale multimodal dataset named Identifying Drug Dealers on Instagram (IDDIG). Nearly 4,000 user accounts, of which more than 1,400 are drug dealers, have been collected from Instagram with multiple data sources including post comments, post images, homepage bio, and homepage images. We then design a quadruple-based multimodal fusion method to combine the multiple data sources associated with each user account for drug dealer identification. Experimental results on the constructed IDDIG dataset demonstrate the effectiveness of the proposed method in identifying drug dealers (almost 95% accuracy). Moreover, we have developed a hashtag-based community detection technique for discovering evolving patterns, especially those related to geography and drug types. 
    more » « less
  4. We consider the community search problem defined upon a large graph G: given a query vertex q in G, to find as output all the densely connected subgraphs of G, each of which contains the query v. As an online, query-dependent variant of the well-known community detection problem, community search enables personalized community discovery that has found widely varying applications in real-world, large-scale graphs. In this paper, we study the community search problem in the truss-based model aimed at discovering all dense and cohesive k-truss communities to which the query vertex q belongs. We introduce a novel equivalence relation, k-truss equivalence, to model the intrinsic density and cohesiveness of edges in k-truss communities. Consequently, all the edges of G can be partitioned to a series of k-truss equivalence classes that constitute a space-efficient, truss-preserving index structure, EquiTruss. Community search can be henceforth addressed directly upon EquiTruss without repeated, time-demanding accesses to the original graph, G, which proves to be theoretically optimal. In addition, EquiTruss can be efficiently updated in a dynamic fashion when G evolves with edge insertion and deletion. Experimental studies in real-world, large-scale graphs validate the efficiency and effectiveness of EquiTruss, which has achieved at least an order of magnitude speedup in community search over the state-of-the-art method, TCP-Index. 
    more » « less
  5. Quasi-cliques are a type of dense subgraphs that generalize the notion of cliques, important for applications such as community/module detection in various social and biological networks. However, the existing quasi-clique definition and algorithms are only applicable to undirected graphs. In this paper, we generalize the concept of quasi-cliques to directed graphs by proposing $(\gamma_1, \gamma_2)$-quasi-cliques which have density requirements in both inbound and outbound directions of each vertex in a quasi-clique subgraph. An efficient recursive algorithm is proposed to find maximal $(\gamma_1, \gamma_2)$-quasi-cliques which integrates many effective pruning rules that are validated by ablation studies. We also study the finding of top-$k$ large quasi-cliques directly by bootstrapping the search from more compact quasi-cliques, to scale the mining to larger networks. The algorithms are parallelized with effective load balancing, and we demonstrate that they can scale up effectively with the number of CPU cores. 
    more » « less