skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Audio deepfakes: A survey
A deepfake is content or material that is synthetically generated or manipulated using artificial intelligence (AI) methods, to be passed off as real and can include audio, video, image, and text synthesis. The key difference between manual editing and deepfakes is that deepfakes are AI generated or AI manipulated and closely resemble authentic artifacts. In some cases, deepfakes can be fabricated using AI-generated content in its entirety. Deepfakes have started to have a major impact on society with more generation mechanisms emerging everyday. This article makes a contribution in understanding the landscape of deepfakes, and their detection and generation methods. We evaluate various categories of deepfakes especially in audio. The purpose of this survey is to provide readers with a deeper understanding of (1) different deepfake categories; (2) how they could be created and detected; (3) more specifically, how audio deepfakes are created and detected in more detail, which is the main focus of this paper. We found that generative adversarial networks (GANs), convolutional neural networks (CNNs), and deep neural networks (DNNs) are common ways of creating and detecting deepfakes. In our evaluation of over 150 methods, we found that the majority of the focus is on video deepfakes, and, in particular, the generation of video deepfakes. We found that for text deepfakes, there are more generation methods but very few robust methods for detection, including fake news detection, which has become a controversial area of research because of the potential heavy overlaps with human generation of fake content. Our study reveals a clear need to research audio deepfakes and particularly detection of audio deepfakes. This survey has been conducted with a different perspective, compared to existing survey papers that mostly focus on just video and image deepfakes. This survey mainly focuses on audio deepfakes that are overlooked in most of the existing surveys. This article's most important contribution is to critically analyze and provide a unique source of audio deepfake research, mostly ranging from 2016 to 2021. To the best of our knowledge, this is the first survey focusing on audio deepfakes generation and detection in English.  more » « less
Award ID(s):
2210011
PAR ID:
10415111
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Frontiers in Big Data
Volume:
5
ISSN:
2624-909X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deepfake technology presents a significant challenge to cybersecurity. These highly sophisticated AI-generated manipulations can compromise sensitive information and erode public trust, privacy, and security. This has led to broader societal impacts, including decreased trust and confidence in digital communications. This paper will discuss public knowledge, understanding, and perception of AI-generated deepfakes, which was obtained through an online survey that measured people's ability to identify video, audio, and images of deepfakes. The findings will highlight the public's knowledge and perception of deepfakes, the risks that deepfake media presents, and the vulnerabilities to detection and prevention. This awareness will lead to stronger defense strategies and enhanced cybersecurity measures that will ultimately enhance deepfake detection technology and strengthen overall cybersecurity measures that will effectively mitigate exploitation risks and safeguard personal and organizational interests. 
    more » « less
  2. The increasing reach of deepfakes raises practical questions about people’s ability to detect false videos online. How vulnerable are people to deepfake videos? What technologies can help improve detection? Previous experiments that measure human deepfake detection historically omit a number of conditions that can exist in typical browsing conditions. Here, we operationalized four such conditions (low prevalence, brief presentation, low video quality, and divided attention), and found in a series of online experiments that all conditions lowered detection relative to baseline, suggesting that the current literature underestimates people’s susceptibility to deepfakes. Next, we examined how AI assistance could be integrated into the human decision process. We found that a model that exposes deepfakes by amplifying artifacts increases detection rates, and also leads to higher rates of incorporating AI feedback and higher final confidence than text-based prompts. Overall, this suggests that visual indicators that cause distortions on fake videos may be effective at mitigating the impact of falsified video. 
    more » « less
  3. Easy access to audio-visual content on social media, combined with the availability of modern tools such as Tensorflow or Keras, and open-source trained models, along with economical computing infrastructure, and the rapid evolution of deep-learning (DL) methods have heralded a new and frightening trend. Particularly, the advent of easily available and ready to use Generative Adversarial Networks (GANs), have made it possible to generate deepfakes media partially or completely fabricated with the intent to deceive to disseminate disinformation and revenge porn, to perpetrate financial frauds and other hoaxes, and to disrupt government functioning. Existing surveys have mainly focused on the detection of deepfake images and videos; this paper provides a comprehensive review and detailed analysis of existing tools and machine learning (ML) based approaches for deepfake generation, and the methodologies used to detect such manipulations in both audio and video. For each category of deepfake, we discuss information related to manipulation approaches, current public datasets, and key standards for the evaluation of the performance of deepfake detection techniques, along with their results. Additionally, we also discuss open challenges and enumerate future directions to guide researchers on issues which need to be considered in order to improve the domains of both deepfake generation and detection. This work is expected to assist readers in understanding how deepfakes are created and detected, along with their current limitations and where future research may lead. 
    more » « less
  4. Cochlear implants (CIs) allow deaf and hard-ofhearing individuals to use audio devices, such as phones or voice assistants. However, the advent of increasingly sophisticated synthetic audio (i.e., deepfakes) potentially threatens these users. Yet, this population’s susceptibility to such attacks is unclear. In this paper, we perform the first study of the impact of audio deepfakes on CI populations. We examine the use of CI-simulated audio within deepfake detectors. Based on these results, we conduct a user study with 35 CI users and 87 hearing persons (HPs) to determine differences in how CI users perceive deepfake audio. We show that CI users can, similarly to HPs, identify text-to-speech generated deepfakes. Yet, they perform substantially worse for voice conversion deepfake generation algorithms, achieving only 67% correct audio classification. We also evaluate how detection models trained on a CI-simulated audio compare to CI users and investigate if they can effectively act as proxies for CI users. This work begins an investigation into the intersection between adversarial audio and CI users to identify and mitigate threats against this marginalized group. 
    more » « less
  5. Deepfakes represent the generation of synthetic/fake images or videos using deep neural networks. As the techniques used for the generation of deepfakes are improving, the threats including social media disinformation, defamation, impersonation, and fraud are becoming more prevalent. The existing deepfakes detection models, including those that use convolution neural networks, do not generalize well when subjected to multiple deepfakes generation techniques and cross-corpora setting. Therefore, there is a need for the development of effective and efficient deepfakes detection methods. To explicitly model part-whole hierarchical relationships by using groups of neurons to encode visual entities and learn the relationships between real and fake artifacts, we propose a novel deep learning model efficient-capsule network (E-Cap Net) for classifying the facial images generated through different deepfakes generative techniques. More specifically, we introduce a low-cost max-feature-map (MFM) activation function in each primary capsule of our proposed E-Cap Net. The use of MFM activation enables our E-Cap Net to become light and robust as it suppresses the low activation neurons in each primary capsule. Performance of our approach is evaluated on two standard, largescale and diverse datasets i.e., Diverse Fake Face Dataset (DFFD) and FaceForensics++ (FF++), and also on the World Leaders Dataset (WLRD). Moreover, we also performed a cross-corpora evaluation to show the generalizability of our method for reliable deepfakes detection. The AUC of 99.99% on DFFD, 99.52% on FF++, and 98.31% on WLRD datasets indicate the effectiveness of our method for detecting the manipulated facial images generated via different deepfakes techniques. 
    more » « less