Title: Identifying Illicit Drug Dealers on Instagram with Large-scale Multimodal Data Fusion
Illicit drug trafficking via social media sites such as Instagram has become a severe problem, drawing a great deal of attention from law enforcement and public health agencies. Identifying illicit drug dealers from social media data remains a technical challenge for two reasons. First, the available data are limited because of privacy concerns with crawling social media sites; second, the diversity of drug dealing patterns makes it difficult to reliably distinguish drug dealers from common drug users. Unlike existing methods that focus on posting-based detection, we propose to tackle the problem of illicit drug dealer identification by constructing a large-scale multimodal dataset named Identifying Drug Dealers on Instagram (IDDIG). Nearly 4,000 user accounts, of which more than 1,400 are drug dealers, have been collected from Instagram with multiple data sources including post comments, post images, homepage bio, and homepage images. We then design a quadruple-based multimodal fusion method to combine the multiple data sources associated with each user account for drug dealer identification. Experimental results on the constructed IDDIG dataset demonstrate the effectiveness of the proposed method in identifying drug dealers (almost 95% accuracy). Moreover, we have developed a hashtag-based community detection technique for discovering evolving patterns, especially those related to geography and drug types.
Award ID(s):
2203261 2027127 2209814 2203262 2217239
PAR ID:
10319616
Journal Name:
ACM Transactions on Intelligent Systems and Technology
Volume:
12
Issue:
5
ISSN:
2157-6904
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Instagram, one of the most popular social media platforms among youth, has recently come under scrutiny for potentially being harmful to the safety and well-being of our younger generations. Automated approaches for risk detection may be one way to help mitigate some of these risks if such algorithms are both accurate and contextual to the types of online harms youth face on social media platforms. However, the imminent switch by Instagram to end-to-end encryption for private conversations will limit the type of data that will be available to the platform to detect and mitigate such risks. In this paper, we investigate which indicators are most helpful in automatically detecting risk in Instagram private conversations, with an eye on high-level metadata, which will still be available in the scenario of end-to-end encryption. Toward this end, we collected Instagram data from 172 youth (ages 13-21) and asked them to identify private message conversations that made them feel uncomfortable or unsafe. Our participants risk-flagged 28,725 conversations that contained 4,181,970 direct messages, including textual posts and images. Based on this rich and multimodal dataset, we tested multiple feature sets (metadata, linguistic cues, and image features) and trained classifiers to detect risky conversations. Overall, we found that the metadata features (e.g., conversation length, a proxy for participant engagement) were the best predictors of risky conversations. However, for distinguishing between risk types, the different linguistic and media cues were the best predictors. Based on our findings, we provide design implications for AI risk detection systems in the presence of end-to-end encryption. More broadly, our work contributes to the literature on adolescent online safety by moving toward more robust solutions for risk detection that directly take into account the lived risk experiences of youth.
  2. Photo sharing has become increasingly easy with the rise of social media. Social networking sites (SNSs), such as Instagram and Facebook, are well known for their image-sharing capabilities. However, this raises concerns about photo privacy, such as who may see the images of a user who is included in a post. Photo privacy settings offer detailed and more secure ways to share a user’s photos; however, this requires SNS users to understand these settings. To better grasp users’ understanding of photo privacy settings, we conducted a structured interview with Instagram users. We found that users were aware of the majority of the privacy settings asked about and that they accurately perceived their photo privacy safety based on their knowledge of photo privacy settings.
  3. Over the last decade, research has revealed the high prevalence of cyberbullying among youth and raised serious concerns in society. Information on the social media platforms where cyberbullying is most prevalent (e.g., Instagram, Facebook, Twitter) is inherently multi-modal, yet most existing work on cyberbullying identification has focused solely on building generic classification models that rely exclusively on text analysis of online social media sessions (e.g., posts). Despite their empirical success, these efforts ignore the multi-modal information manifested in social media data (e.g., image, video, user profile, time, and location), and thus fail to offer a comprehensive understanding of cyberbullying. Conventionally, when information from different modalities is presented together, it often reveals complementary insights about the application domain and facilitates better learning performance. In this paper, we study the novel problem of cyberbullying detection within a multi-modal context by exploiting social media data in a collaborative way. This task, however, is challenging due to the complex combination of both cross-modal correlations among various modalities and structural dependencies between different social media sessions, and the diverse attribute information of different modalities. To address these challenges, we propose XBully, a novel cyberbullying detection framework, that first reformulates multi-modal social media data as a heterogeneous network and then aims to learn node embedding representations upon it. Extensive experimental evaluations on real-world multi-modal social media datasets show that the XBully framework is superior to the state-of-the-art cyberbullying detection models. 
  4. In this work, we present a case study on an Instagram Data Donation (IGDD) project, which is a user study and web-based platform for youth (ages 13-21) to donate and annotate their Instagram data with the goal of improving adolescent online safety. We employed human-centered design principles to create an ecologically valid dataset that will be utilized to provide insights from teens’ private social media interactions and train machine learning models to detect online risks. Our work provides practical insights and implications for Human-Computer Interaction (HCI) researchers who collect and study social media data to address sensitive problems relating to societal good.
  5. The urban greenway is an emerging form of urban landscape offering multifaceted benefits to public health, the economy, and ecology. However, the usage and user experiences of greenways are often challenging to measure because it is costly to survey such large areas. Based on online postings from Instagram in 2017, this paper used Computer Vision (CV) technology to analyze and compare how the general public uses two typical greenway parks, The High Line in New York City and the Atlanta Beltline in Atlanta. Face and object detection analyses were conducted to infer user composition, activities, and key experiences. We presented the temporal patterns of Instagram postings as well as the group gatherings, smiling, and representative objects detected from photos. Our results show high user engagement levels for both parks, while teens are significantly underrepresented. The High Line had more group activities and was more active during weekdays than the Atlanta Beltline. A stronger sense of escape and more physical activity were found on the Atlanta Beltline. In summary, social media images such as those from Instagram can provide strong empirical evidence for urban greenway usage when combined with artificial intelligence technologies, which can support the future practice of landscape architecture and urban design.