In multimodal machine learning, effectively handling missing modalities is crucial for improving performance in downstream tasks, such as those in medical contexts where data may be incomplete. Although some attempts have been made to retrieve embeddings for missing modalities, two main bottlenecks remain: (1) the need to consider both intra- and inter-modal context, and (2) the cost of embedding selection, where embeddings often lack modality-specific knowledge. To address these bottlenecks, the authors propose MoE-Retriever, a novel framework inspired by Sparse Mixture of Experts (SMoE). MoE-Retriever defines a supporting group for intra-modal inputs (samples that commonly lack the target modality) by selecting samples with complementary modality combinations for the target modality. This group is integrated with inter-modal inputs from different modalities of the same sample, establishing both intra- and inter-modal contexts. These inputs are processed by Multi-Head Attention to generate context-aware embeddings, which serve as inputs to the SMoE Router that automatically selects the most relevant experts (embedding candidates). Comprehensive experiments on both medical and general multimodal datasets demonstrate the robustness and generalizability of MoE-Retriever, marking a significant step forward in embedding retrieval methods for incomplete multimodal data.
Inconsistent Matters: A Knowledge-Guided Dual-Consistency Network for Multi-Modal Rumor Detection
Rumor spreaders are increasingly utilizing multimedia content to attract the attention and trust of news consumers. Though quite a few rumor detection models have exploited multi-modal data, they seldom consider the inconsistent semantics between images and texts, and rarely spot the inconsistency between the post contents and background knowledge. In addition, they commonly assume the completeness of multiple modalities and thus are incapable of handling missing modalities in real-life scenarios. Motivated by the intuition that rumors in social media are more likely to have inconsistent semantics, a novel Knowledge-guided Dual-consistency Network is proposed to detect rumors with multimedia contents. It uses two consistency detection subnetworks to capture the inconsistency at the cross-modal level and the content-knowledge level simultaneously. It also enables robust multi-modal representation learning under different missing visual modality conditions, using a special token to discriminate between posts with visual modality and posts without visual modality. Extensive experiments on three public real-world multimedia datasets demonstrate that our framework can outperform the state-of-the-art baselines under both complete and incomplete modality conditions.
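The abstract above mentions a special token that distinguishes posts with a visual modality from posts without one. The sketch below illustrates that general idea only; it is not the authors' implementation. The names `MISSING_VISUAL_TOKEN` and `build_multimodal_input`, the fixed placeholder values, and the use of plain concatenation are all assumptions made for illustration (in a trained model the token would typically be a learned vector).

```python
from typing import List, Optional

# Hypothetical placeholder standing in for a learned "no image" token.
# Its values and its dimension (3) are illustrative assumptions.
MISSING_VISUAL_TOKEN: List[float] = [0.0, 0.0, 1.0]


def build_multimodal_input(
    text_vec: List[float],
    image_vec: Optional[List[float]] = None,
) -> List[float]:
    """Concatenate text and visual features.

    When the post has no image, the special token is substituted for the
    visual features, so the downstream model always receives an input of
    the same length and can tell the two cases apart.
    """
    visual = MISSING_VISUAL_TOKEN if image_vec is None else image_vec
    if len(visual) != len(MISSING_VISUAL_TOKEN):
        raise ValueError("visual features must match the token dimension")
    return list(text_vec) + list(visual)


# A post with text only, and a post with both text and an image.
text_only = build_multimodal_input([0.1, 0.2])
text_and_image = build_multimodal_input([0.1, 0.2], [0.5, 0.5, 0.5])
```

Both calls return vectors of the same length, which is what lets a single model handle complete and incomplete posts.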
- Award ID(s): 2008155
- PAR ID: 10477822
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Transactions on Knowledge and Data Engineering
- Volume: 35
- Issue: 12
- ISSN: 1041-4347
- Page Range / eLocation ID: 12736 to 12749
- Subject(s) / Keyword(s): Feature Extraction; Visualization; Social Networking Online; Semantics; Fake News; Electronic Mail; Data Mining; Multi Modal Learning; Rumor Detection; Social Media Analysis; Social Media; Real World Datasets; Visual Modality; Multimodal Learning; Post Content; Multimedia Content; Multimodal Representation; Special Token; Data Visualization; Types Of Information; Visual Information; Visual Features; Text Data; Generative Adversarial Networks; Content Knowledge; Textual Information; Graph Convolutional Network; Entity Pairs; Largest Distance; Twitter Dataset; Text Modality; Text Representation; Textual Features; Missing Patterns; Confidence Interval; CI; Representation Of Entities; Data Instances
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Although studies investigated cyber-rumoring before the pandemic, little research has been undertaken to study rumors and rumor corrections during the COVID-19 (coronavirus disease 2019) pandemic. Drawing on prior studies about how online stories become viral, this study will fill that gap by investigating the retransmission of COVID-19 rumors and corrective messages on Sina Weibo, the largest and most popular microblogging site in China. This study examines the impact of rumor types, content attributes (including frames, emotion, and rationality), and source characteristics (including follower size and source identity) to show how they affect the likelihood of a COVID-19 rumor and its correction being shared. By exploring the retransmission of rumors and their corrections in Chinese social media, this study will not only advance scholarly understanding but also reveal how corrective messages can be crafted to debunk cyber-rumors in particular cultural contexts.
-
In multimodal machine learning, effectively addressing the missing modality scenario is crucial for improving performance in downstream tasks such as in medical contexts where data may be incomplete. Although some attempts have been made to effectively retrieve embeddings for missing modalities, two main bottlenecks remain: the consideration of both intra- and inter-modal context, and the cost of embedding selection, where embeddings often lack modality-specific knowledge. In response, we propose MoE-Retriever, a novel framework inspired by the design principles of Sparse Mixture of Experts (SMoE). First, MoE-Retriever samples the relevant data from modality combinations, using a so-called supporting group to construct intra-modal inputs while incorporating inter-modal inputs. These inputs are then processed by Multi-Head Attention, after which the SMoE Router automatically selects the most relevant expert, i.e., the embedding candidate to be retrieved. Comprehensive experiments on both medical and general multimodal datasets demonstrate the robustness and generalizability of MoE-Retriever, marking a significant step forward in embedding retrieval methods for incomplete multimodal data.
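The routing step described above, in which a router selects the most relevant expert as the embedding to retrieve, can be sketched with generic top-k softmax gating. This is a minimal illustration of sparse mixture-of-experts routing in general, not the MoE-Retriever code: the function names, the dot-product scoring, and the toy vectors are assumptions, and the context-aware query that would come from Multi-Head Attention is simply passed in as a plain list.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(
    query: List[float], expert_keys: List[List[float]], k: int = 1
) -> List[Tuple[int, float]]:
    """Score each expert by dot product with the query and keep the top k.

    Returns (expert index, weight) pairs whose weights are renormalized
    to sum to 1 over the selected experts.
    """
    logits = [sum(q * w for q, w in zip(query, key)) for key in expert_keys]
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def retrieve_embedding(
    query: List[float],
    expert_keys: List[List[float]],
    expert_embeddings: List[List[float]],
    k: int = 1,
) -> List[float]:
    """Weighted sum of the embeddings held by the selected experts."""
    out = [0.0] * len(expert_embeddings[0])
    for idx, weight in route_top_k(query, expert_keys, k):
        for d, value in enumerate(expert_embeddings[idx]):
            out[d] += weight * value
    return out


# Toy example: three hypothetical experts, each holding one candidate embedding.
keys = [[0.0, 1.0], [2.0, 0.0], [1.0, 0.0]]
candidates = [[1.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
picked = route_top_k([1.0, 0.0], keys, k=1)
retrieved = retrieve_embedding([1.0, 0.0], keys, candidates, k=1)
```

With `k=1` the router returns a single expert with weight 1, so the retrieved embedding is exactly that expert's candidate; a larger `k` blends several candidates by their renormalized gate weights.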
-
Social media have emerged as increasingly popular means and environments for information gathering and propagation. This vigorous growth of social media has contributed not only to a pandemic (fast-spreading and far-reaching) of rumors and misinformation, but also to an urgent need for text-based rumor detection strategies. To speed up the detection of misinformation, traditional rumor detection methods based on hand-crafted feature selection need to be replaced by automatic artificial intelligence (AI) approaches. AI decision-making systems are required to provide explanations in order to assure users of their trustworthiness. Inspired by the thriving development of generative adversarial networks (GANs) on text applications, we propose a GAN-based layered model for rumor detection with explanations. To demonstrate the universality of the proposed approach, we show its benefits on a case study of gene classification with mutation detection. Like rumor detection, gene classification can also be formulated as a text-based classification problem. Unlike fake news detection, which needs a previously collected verified news database, our model provides explanations in rumor detection based on tweet-level texts only, without referring to a verified news database. The layered structure of both generative and discriminative models contributes to the outstanding performance. The layered generators produce rumors by intelligently inserting controversial information in non-rumors, and force the layered discriminators to detect detailed glitches and deduce exactly which parts of the sentence are problematic. On average, in the rumor detection task, our proposed model outperforms state-of-the-art baselines on the PHEME dataset by 26.85% in terms of macro-F1. The excellent performance of our model for textual sequences is also demonstrated by the gene mutation case study, on which it achieves a 72.69% macro-F1 score.
-
Over the last decade, research has revealed the high prevalence of cyberbullying among youth and raised serious concerns in society. Information on the social media platforms where cyberbullying is most prevalent (e.g., Instagram, Facebook, Twitter) is inherently multi-modal, yet most existing work on cyberbullying identification has focused solely on building generic classification models that rely exclusively on text analysis of online social media sessions (e.g., posts). Despite their empirical success, these efforts ignore the multi-modal information manifested in social media data (e.g., image, video, user profile, time, and location), and thus fail to offer a comprehensive understanding of cyberbullying. Conventionally, when information from different modalities is presented together, it often reveals complementary insights about the application domain and facilitates better learning performance. In this paper, we study the novel problem of cyberbullying detection within a multi-modal context by exploiting social media data in a collaborative way. This task, however, is challenging due to the complex combination of cross-modal correlations among various modalities, structural dependencies between different social media sessions, and the diverse attribute information of different modalities. To address these challenges, we propose XBully, a novel cyberbullying detection framework that first reformulates multi-modal social media data as a heterogeneous network and then aims to learn node embedding representations upon it. Extensive experimental evaluations on real-world multi-modal social media datasets show that the XBully framework is superior to the state-of-the-art cyberbullying detection models.

