Title: Deterring Deepfake Attacks with an Electrical Network Frequency Fingerprints Approach
With the fast development of Fifth-/Sixth-Generation (5G/6G) communications and the Internet of Video Things (IoVT), a broad range of mega-scale data applications has emerged (e.g., all-weather, all-time video). These network-based applications depend heavily on reliable, secure, and real-time audio and/or video streams (AVSs), which consequently become a target for attackers. While modern Artificial Intelligence (AI) technology is integrated into many multimedia applications to enhance them, the development of Generative Adversarial Networks (GANs) has also led to deepfake attacks that manipulate audio or video streams to mimic any targeted person. Deepfake attacks are highly disturbing and can mislead the public, raising further challenges in policy, technology, social, and legal aspects. Instead of engaging in an endless AI arms race of “fighting fire with fire”, where new Deep Learning (DL) algorithms keep making fake AVSs more realistic, this paper proposes a novel approach that detects deepfaked AVS data by using the Electrical Network Frequency (ENF) signals embedded in the AVS data as a fingerprint. Under low Signal-to-Noise Ratio (SNR) conditions, Short-Time Fourier Transform (STFT) and Multiple Signal Classification (MUSIC) spectrum estimation techniques are investigated to detect the Instantaneous Frequency (IF) of interest. For reliable authentication, the ENF signal embedded through an artificial power source in a noisy environment is enhanced using the spectral combination technique and a Robust Filtering Algorithm (RFA). The proposed signal estimation workflow was deployed on a continuous audio/video input for resilience against frame manipulation attacks. A Singular Spectrum Analysis (SSA) approach was selected to minimize the false positive rate of signal correlations. Extensive experimental analysis of reliable edge-based ENF estimation in deepfaked multimedia recordings is provided to address the need to distinguish artificially altered media content.
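The STFT-based step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `estimate_enf` and `enf_similarity`, the 60 Hz nominal frequency, the 1 Hz search band, the 4 s window, and the use of a plain Pearson correlation as the matching score are all assumptions made for illustration. The MUSIC, RFA, and SSA stages are not shown.

```python
import numpy as np
from scipy.signal import stft


def estimate_enf(signal, fs, nominal=60.0, band=1.0, win_sec=4.0):
    """Return (times, enf): one dominant-frequency estimate near `nominal` per STFT frame."""
    nperseg = int(win_sec * fs)
    # Zero-pad each frame so the frequency grid is finer than 1 / win_sec.
    nfft = 4 * nperseg
    freqs, times, spec = stft(signal, fs=fs, nperseg=nperseg,
                              noverlap=nperseg // 2, nfft=nfft)
    mag = np.abs(spec)
    step = freqs[1] - freqs[0]
    # Only search the narrow band around the nominal grid frequency.
    idx = np.where((freqs >= nominal - band) & (freqs <= nominal + band))[0]
    enf = np.empty(mag.shape[1])
    for t in range(mag.shape[1]):
        k = idx[np.argmax(mag[idx, t])]
        # Quadratic interpolation around the peak bin refines the estimate.
        a, b, c = mag[k - 1, t], mag[k, t], mag[k + 1, t]
        denom = a - 2.0 * b + c
        delta = 0.5 * (a - c) / denom if denom != 0 else 0.0
        enf[t] = freqs[k] + delta * step
    return times, enf


def enf_similarity(enf_a, enf_b):
    """Pearson correlation between two ENF traces, truncated to equal length."""
    n = min(len(enf_a), len(enf_b))
    return float(np.corrcoef(enf_a[:n], enf_b[:n])[0, 1])
```

In a setting like the one described, `estimate_enf` would be run on both the recording under test and a trusted reference, and a low `enf_similarity` score would flag the recording for closer inspection. The threshold for that decision is application-specific and is not specified here.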
Award ID(s):
2039342
NSF-PAR ID:
10359339
Journal Name:
Future Internet
Volume:
14
Issue:
5
ISSN:
1999-5903
Page Range / eLocation ID:
125
Sponsoring Org:
National Science Foundation
More Like this
  1. Rapid advances in the deployment of the Internet of Video Things (IoVT) in modern smart cities have enabled secure infrastructures with minimal human intervention. However, attacks on audio-video inputs affect the reliability of large-scale multimedia surveillance systems, as attackers are able to manipulate the perception of live events. For example, deepfake audio/video attacks and frame duplication attacks can cause significant security breaches. This paper proposes Lightweight Environmental Fingerprint Consensus (LEFC)-based detection of compromised smart cameras in edge surveillance systems. LEFC is a partially decentralized authentication mechanism that leverages Electrical Network Frequency (ENF) as an environmental fingerprint together with distributed ledger technology (DLT). An ENF signal carries randomly fluctuating spatio-temporal signatures, which enable digital media authentication. With the proposed DLT consensus mechanism, named Proof-of-ENF (PoENF), as a backbone, LEFC can estimate and authenticate the media recording and detect Byzantine nodes controlled by the perpetrator. The experimental evaluation shows the feasibility and effectiveness of the proposed LEFC scheme in a distributed Byzantine network environment.
  2. Wysocki, Bryant T. ; Holt, James ; Blowers, Misty (Ed.)
    The information era has been propelled by abundant digital media content delivered through broadcasting technologies. Among information providers, social media remains a popular channel for the wide distribution of digital content. Along with accessibility and reach, social media platforms are also a major venue for spreading misinformation, since the data is not curated by trusted authorities. With many malicious participants involved, artificially generated media or strategically altered content could damage the integrity of targeted organizations. Popular content generation tools like DeepFake have allowed perpetrators to create realistic media content by manipulating the targeted subject with a fake identity or actions. Media metadata, such as time and location information, are altered to create a false perception of real events. In this work, we propose a Decentralized Electrical Network Frequency (ENF)-based Media Authentication (DEMA) system to verify the metadata information and the integrity of digital multimedia. Leveraging the environmental ENF fingerprint captured by digital media recorders, altered media content is detected by checking the ENF consistency against the time and location of recording, along with its spatial consistency throughout the captured frames. A decentralized and hierarchical ENF map is created as a reference database for time and location verification. For digital media uploaded to a broadcasting service, the proposed DEMA system correlates the underlying ENF fingerprint with the stored ENF map to authenticate the media metadata. When the media metadata is intact, the embedded ENF in the recording is compared with a reference ENF based on the time of recording, and a correlation-based metric is used to evaluate the media authenticity. When metadata is missing, the frames are divided spatially to compare the ENF consistency throughout the recording.
  3. A deepfake is content or material that is synthetically generated or manipulated using artificial intelligence (AI) methods so that it can be passed off as real; it can include audio, video, image, and text synthesis. The key difference between manual editing and deepfakes is that deepfakes are AI generated or AI manipulated and closely resemble authentic artifacts. In some cases, deepfakes can be fabricated entirely from AI-generated content. Deepfakes have started to have a major impact on society, with more generation mechanisms emerging every day. This article contributes to understanding the landscape of deepfakes and their detection and generation methods. We evaluate various categories of deepfakes, especially in audio. The purpose of this survey is to provide readers with a deeper understanding of (1) different deepfake categories; (2) how they could be created and detected; and (3) how audio deepfakes in particular are created and detected, which is the main focus of this paper. We found that generative adversarial networks (GANs), convolutional neural networks (CNNs), and deep neural networks (DNNs) are common ways of creating and detecting deepfakes. In our evaluation of over 150 methods, we found that the majority of the focus is on video deepfakes and, in particular, on their generation. We found that for text deepfakes, there are more generation methods but very few robust methods for detection, including fake news detection, which has become a controversial area of research because of the potential heavy overlap with human generation of fake content. Our study reveals a clear need for research on audio deepfakes, particularly on their detection. Unlike existing surveys, which mostly focus on video and image deepfakes, this survey focuses mainly on audio deepfakes, which most of those surveys overlook. This article's most important contribution is to critically analyze and provide a unique source of audio deepfake research, mostly ranging from 2016 to 2021. To the best of our knowledge, this is the first survey in English focusing on audio deepfake generation and detection.
  4. Intellectual and Developmental Disabilities (IDDs), such as Down syndrome, Fragile X syndrome, Rett syndrome, and autism spectrum disorder, usually manifest at birth or in early childhood. IDDs are characterized by significant impairment in intellectual and adaptive functioning, and both genetic and environmental factors underpin IDD biology. Molecular and genetic stratification of IDDs remains challenging, mainly due to overlapping factors and comorbidity. Advances in high-throughput sequencing, imaging, and tools to record behavioral data at scale have greatly enhanced our understanding of the molecular, cellular, structural, and environmental basis of some IDDs. Fueled by the “big data” revolution, artificial intelligence (AI) and machine learning (ML) technologies have brought a paradigm shift to computational biology. The ML-driven approach to clinical diagnosis has the potential to augment classical methods that rely on symptoms and external observations, and thereby to advance personalized treatment plans. Therefore, integrative analyses and applications of ML technology have a direct bearing on discoveries in IDDs. The application of ML to IDDs can potentially improve screening and early diagnosis, advance our understanding of the complexity of comorbidity, and accelerate the identification of biomarkers for clinical research and drug development. For more than five decades, the IDDRC network has supported a nexus of investigators at centers across the USA, all striving to understand the interplay between various factors underlying IDDs. In this review, we introduced fast-increasing multi-modal data types and highlighted example studies that employed ML technologies to illuminate factors and biological mechanisms underlying IDDs, as well as recent advances in ML technologies and their applications to IDDs and other neurological diseases. We discussed various molecular, clinical, and environmental data collection modes, including genetic, imaging, phenotypical, and behavioral data types, along with multiple repositories that store and share such data. Furthermore, we outlined some fundamental concepts of machine learning algorithms and presented our opinion on specific gaps that will need to be filled to accomplish, for example, reliable implementation of ML-based diagnosis technology in IDD clinics. We anticipate that this review will guide researchers in formulating AI- and ML-based approaches to investigate IDDs and related conditions.
  5. Jean-Jacques Rousseau ; Bill Kapralos ; Henrik I. Christensen ; Michael Jenkin ; Cheng-Lin (Ed.)
    Exponential growth in the use of smart speakers (SSs) for the automation of homes, offices, and vehicles has brought a revolution of convenience to our lives. However, these SSs are susceptible to a variety of spoofing attacks, known/seen and unknown/unseen, created using cutting-edge AI generative algorithms. These realistic attacks can deceive the automatic speaker verification (ASV) engines of these SSs, creating a large potential for fraud using these devices. This vulnerability highlights the need for effective countermeasures capable of reliably detecting known and unknown spoofing attacks. This paper presents a novel end-to-end deep learning model, AEXANet, to effectively detect multiple types of physical- and logical-access attacks, both known and unknown. The proposed countermeasure learns low-level cues by analyzing raw audio, uses a dense convolutional network to propagate diversified raw waveform features, and strengthens feature propagation. The system employs a maximum feature map activation function, which improves performance against unseen spoofing attacks while making the model more efficient, enabling its use in real-time applications. An extensive evaluation of our model was performed on the ASVspoof 2019 PA and LA datasets, along with TTS and VC samples, separately containing both seen and unseen attacks. Moreover, a cross-corpus evaluation using the ASVspoof 2019 and ASVspoof 2015 datasets was also performed. Experimental results show the reliability of our method for voice spoofing detection.
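The maximum feature map activation mentioned in the last entry can be illustrated with a short sketch. It shows the common formulation, in which the channel dimension is split into two halves and the element-wise maximum is kept. Whether AEXANet uses exactly this variant is an assumption, and the function name `max_feature_map` is hypothetical.

```python
import numpy as np


def max_feature_map(x, axis=1):
    """Split `x` into two halves along `axis` and keep the element-wise maximum.

    The output has half as many channels as the input, which is why this
    activation also reduces the size of the layers that follow it.
    """
    if x.shape[axis] % 2 != 0:
        raise ValueError("the split dimension must have an even size")
    first, second = np.split(x, 2, axis=axis)
    return np.maximum(first, second)


# One sample with four channels: halves are [1, 5] and [3, 2].
features = np.array([[1.0, 5.0, 3.0, 2.0]])
print(max_feature_map(features))  # [[3. 5.]]
```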