Deepfakes are created with generative deep learning techniques, often with the intent to create mistrust in society, manipulate public opinion and political decisions, or serve other malicious purposes such as blackmail, scamming, and even cyberstalking. Because a realistic deepfake may involve manipulation of audio, video, or both, it is important to explore whether deepfakes can be detected through the inability of generative algorithms to synchronize the audio and visual modalities. Prevailing high-performing methods detect deepfakes from either audio or video cues, while a few ensemble the predictions made on both modalities without inspecting the relationship between audio and video cues. Deepfake detection using joint audiovisual representation learning has not been explored much. This paper therefore proposes a unified multimodal framework, Multimodaltrace, which extracts learned channels from the audio and visual modalities, mixes them independently in an IntrAmodality Mixer Layer (IAML), and processes them jointly in IntErModality Mixer Layers (IEML), from which they are fed to a multilabel classification head. Empirical results show the effectiveness of the proposed framework, which achieves a state-of-the-art accuracy of 92.9% on the FakeAVCeleb dataset. Cross-dataset evaluation of the proposed framework on the World Leaders and Presidential Deepfake Detection Datasets gives accuracies of 83.61% and 70%, respectively. The study also provides insights into how the model focuses on different parts of the audio and visual features through integrated gradient analysis.
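The data flow described in the abstract (per-modality mixing, then joint mixing, then a multilabel head) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the random weights, the token shapes, mean pooling, and the two label names `audio_fake` and `video_fake` are all assumptions made purely for illustration.

```python
import math
import random


def mix(tokens, weights):
    """Token mixing: each output token is a weighted sum of the input tokens."""
    dim = len(tokens[0])
    return [
        [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights (a real model would learn these)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(audio_tokens, video_tokens, rng):
    # Stage 1 (hypothetical stand-in for intra-modality mixing):
    # each modality's tokens are mixed only among themselves.
    n_a, n_v = len(audio_tokens), len(video_tokens)
    audio = mix(audio_tokens, random_matrix(n_a, n_a, rng))
    video = mix(video_tokens, random_matrix(n_v, n_v, rng))

    # Stage 2 (hypothetical stand-in for inter-modality mixing):
    # tokens from both modalities are mixed together.
    joint = audio + video
    joint = mix(joint, random_matrix(len(joint), len(joint), rng))

    # Multilabel head: mean-pool the tokens, then apply one independent
    # sigmoid per label, so both labels can be active at the same time.
    dim = len(joint[0])
    pooled = [sum(tok[d] for tok in joint) / len(joint) for d in range(dim)]
    head = random_matrix(2, dim, rng)
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in head]
    return {"audio_fake": sigmoid(logits[0]), "video_fake": sigmoid(logits[1])}


rng = random.Random(0)
audio_tokens = [[rng.random() for _ in range(4)] for _ in range(3)]  # 3 audio tokens
video_tokens = [[rng.random() for _ in range(4)] for _ in range(5)]  # 5 video tokens
scores = forward(audio_tokens, video_tokens, rng)
```

Because the head uses independent sigmoids rather than a softmax, a sample can be scored as manipulated in the audio, the video, both, or neither, which is what distinguishes a multilabel head from a single multiclass one.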
HolisticDFD: Infusing Spatiotemporal Transformer Embeddings for Deepfake Detection
Deepfakes, or synthetic audiovisual media developed with the intent to deceive, are growing increasingly prevalent. Existing methods, employed independently as images/patches or jointly as tubelets, have, up to this point, typically focused on spatial or spatiotemporal inconsistencies. However, the evolving nature of deepfakes demands a holistic approach. Inspecting a given multimedia sample to validate its authenticity without adding significant computational overhead has, to date, not been fully explored in the literature. In addition, no work has been done on the impact of different inconsistency dimensions in a single framework. This paper tackles the deepfake detection problem holistically. HolisticDFD, a novel, transformer-based deepfake detection method that is both lightweight and compact, intelligently combines embeddings from the spatial, temporal, and spatiotemporal dimensions to separate deepfakes from bona fide videos. The proposed system achieves 0.926 AUC on the DFDC dataset using just 3% of the parameters used by state-of-the-art detectors. An evaluation against other datasets shows the efficacy of the proposed framework, and an ablation study shows that the performance of the system gradually improves as embeddings with different data representations are combined. An implementation of the proposed model is available at: https://github.com/smileslab/deepfake-detection/.
- Award ID(s): 1815724
- PAR ID: 10427320
- Date Published:
- Journal Name: Information Sciences
- ISSN: 0020-0255
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
A deepfake is content or material that is synthetically generated or manipulated using artificial intelligence (AI) methods to be passed off as real, and it can include audio, video, image, and text synthesis. The key difference between manual editing and deepfakes is that deepfakes are AI generated or AI manipulated and closely resemble authentic artifacts. In some cases, deepfakes can be fabricated using AI-generated content in its entirety. Deepfakes have started to have a major impact on society, with more generation mechanisms emerging every day. This article contributes to understanding the landscape of deepfakes and their detection and generation methods. We evaluate various categories of deepfakes, especially in audio. The purpose of this survey is to provide readers with a deeper understanding of (1) different deepfake categories; (2) how they could be created and detected; (3) more specifically, how audio deepfakes are created and detected, which is the main focus of this paper. We found that generative adversarial networks (GANs), convolutional neural networks (CNNs), and deep neural networks (DNNs) are common ways of creating and detecting deepfakes. In our evaluation of over 150 methods, we found that the majority of the focus is on video deepfakes and, in particular, the generation of video deepfakes. We found that for text deepfakes, there are more generation methods but very few robust methods for detection, including fake news detection, which has become a controversial area of research because of the potential heavy overlaps with human generation of fake content. Our study reveals a clear need to research audio deepfakes, particularly the detection of audio deepfakes. Compared with existing survey papers, which mostly focus on video and image deepfakes, this survey takes a different perspective and mainly focuses on audio deepfakes, which most existing surveys overlook. This article's most important contribution is to critically analyze and provide a unique source of audio deepfake research, mostly ranging from 2016 to 2021. To the best of our knowledge, this is the first survey focusing on audio deepfake generation and detection in English.
-
Easy access to audio-visual content on social media, combined with the availability of modern tools such as TensorFlow or Keras, open-source trained models, economical computing infrastructure, and the rapid evolution of deep-learning (DL) methods, has heralded a new and frightening trend. In particular, the advent of easily available and ready-to-use Generative Adversarial Networks (GANs) has made it possible to generate deepfakes, media partially or completely fabricated with the intent to deceive, in order to disseminate disinformation and revenge porn, perpetrate financial fraud and other hoaxes, and disrupt government functioning. Existing surveys have mainly focused on the detection of deepfake images and videos; this paper provides a comprehensive review and detailed analysis of existing tools and machine learning (ML) based approaches for deepfake generation, and the methodologies used to detect such manipulations in both audio and video. For each category of deepfake, we discuss information related to manipulation approaches, current public datasets, and key standards for evaluating the performance of deepfake detection techniques, along with their results. Additionally, we discuss open challenges and enumerate future directions to guide researchers on issues that need to be considered in order to improve the domains of both deepfake generation and detection. This work is expected to assist readers in understanding how deepfakes are created and detected, along with their current limitations and where future research may lead.
-
Deepfake technology presents a significant challenge to cybersecurity. These highly sophisticated AI-generated manipulations can compromise sensitive information and erode public trust, privacy, and security. This has led to broader societal impacts, including decreased trust and confidence in digital communications. This paper will discuss public knowledge, understanding, and perception of AI-generated deepfakes, as measured through an online survey of people's ability to identify deepfake video, audio, and images. The findings will highlight the public's knowledge and perception of deepfakes, the risks that deepfake media presents, and the vulnerabilities to detection and prevention. This awareness will lead to stronger defense strategies that will ultimately enhance deepfake detection technology and strengthen overall cybersecurity measures, effectively mitigating exploitation risks and safeguarding personal and organizational interests.
-
The evolving landscape of manipulated media, including the threat of deepfakes, has made information verification a daunting challenge for journalists. Technologists have developed tools to detect deepfakes, but these tools can sometimes yield inaccurate results, raising concerns about inadvertently disseminating manipulated content as authentic news. This study examines the impact of unreliable deepfake detection tools on information verification. We conducted role-playing exercises with 24 US journalists, immersing them in complex breaking-news scenarios where determining authenticity was challenging. Through these exercises, we explored questions regarding journalists’ investigative processes, use of a deepfake detection tool, and decisions on when and what to publish. Our findings reveal that journalists are diligent in verifying information, but sometimes rely too heavily on results from deepfake detection tools. We argue for more cautious release of such tools, accompanied by proper training for users to mitigate the risk of unintentionally propagating manipulated content as real news.

