

This content will become publicly available on March 29, 2026

Title: Exploring AI in Vishing: Threats and Countermeasures
This study examines how artificial intelligence (AI) can enable voice phishing (vishing) attacks, with particular emphasis on deepfake technologies and AI-driven voice synthesis. It analyzes the strategies used by cybercriminals, assesses the effectiveness of present defenses, and identifies the difficulties of detecting and preventing such attacks. The results show an urgent need for sophisticated detection systems and preventive measures to combat the increasing complexity of vishing strategies. Future directions include the creation of cooperative policy frameworks to control the misuse of AI and of easily accessible solutions for small enterprises.
Award ID(s):
1754054
PAR ID:
10623600
Author(s) / Creator(s):
Publisher / Repository:
The 2025 ADMI Symposium.
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Speech alignment is the phenomenon whereby talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages speak with voice-activated, artificially intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines the effects of participants' age (older adults, 53–81 years old, vs. younger adults, 18–39 years old) and gender (female and male) on the degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple's Siri) productions. Degree of alignment was assessed holistically via a perceptual-ratings AXB task by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on the humanness and gender of the model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additionally, other gender-mediated differences were observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (OA vs. YA). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production in both human-human and human-device interaction.
  2. The rapid proliferation of ChatGPT has incited debates regarding its impact on human writing. Amid concerns about declining writing standards, this study investigates the role of ChatGPT in facilitating writing, especially among language learners. Using a case study approach, it examines the experiences of Kailing, a doctoral student who integrates ChatGPT throughout her writing process. The study employs activity theory as a lens for understanding writing with generative AI tools; the data analyzed include semi-structured interviews, writing samples, and GPT logs. Results indicate that Kailing effectively collaborates with ChatGPT across various writing stages while preserving her distinct authorial voice and agency. This underscores the potential of AI tools such as ChatGPT to enhance writing for language learners without overshadowing individual authenticity. The case study offers a critical exploration of how ChatGPT is utilized in the writing process and how a student's authentic voice is preserved when engaging with the tool.
  3. With the advent of automated speaker verification (ASV) systems comes an equal and opposite development: malicious actors may seek to use voice spoofing attacks to fool those same systems. Various countermeasures have been proposed to detect these spoofing attacks, but current offerings in this arena fall short of a unified and generalized approach applicable in real-world scenarios. For this reason, defensive measures for ASV systems produced in the last 6-7 years need to be classified, and qualitative and quantitative comparisons of state-of-the-art (SOTA) countermeasures should be performed to assess the effectiveness of these systems against real-world attacks. Hence, in this work, we conduct a review of the literature on spoofing detection using hand-crafted features, deep learning, and end-to-end spoofing countermeasure solutions to detect logical access attacks, such as speech synthesis and voice conversion, and physical access attacks, i.e., replay attacks. Additionally, we review integrated and unified solutions to voice spoofing evaluation and speaker verification, as well as adversarial and anti-forensic attacks on both voice countermeasures and ASV systems. In an extensive experimental analysis, the limitations and challenges of existing spoofing countermeasures are presented, the performance of these countermeasures on several datasets is reported, and cross-corpus evaluations are performed, something that is nearly absent in the existing literature, in order to assess the generalizability of existing solutions. For the experiments, we employ the ASVspoof2019, ASVspoof2021, and VSDC datasets along with GMM, SVM, CNN, and CNN-GRU classifiers. For reproducibility of the results, the code of the testbed can be found at our GitHub repository (https://github.com/smileslab/Comparative-Analysis-Voice-Spoofing).
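The GMM classifier mentioned above is the classical baseline for spoofing countermeasures. As a minimal sketch (not the paper's actual testbed; the synthetic features stand in for real acoustic features such as MFCCs extracted from ASVspoof utterances), the core idea is to fit one Gaussian mixture to bona fide speech features and one to spoofed speech features, then score each trial by the log-likelihood ratio between the two models:

```python
# Sketch of a GMM-based spoofing countermeasure (illustrative only).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder per-frame features: 13-dimensional vectors, with spoofed
# speech deliberately shifted so the two classes are separable.
bona_fide_train = rng.normal(loc=0.0, scale=1.0, size=(500, 13))
spoofed_train = rng.normal(loc=2.0, scale=1.0, size=(500, 13))

gmm_bona = GaussianMixture(n_components=4, random_state=0).fit(bona_fide_train)
gmm_spoof = GaussianMixture(n_components=4, random_state=0).fit(spoofed_train)

def llr_score(utterance_frames):
    """Average per-frame log-likelihood ratio; positive favours bona fide."""
    return gmm_bona.score(utterance_frames) - gmm_spoof.score(utterance_frames)

genuine_trial = rng.normal(0.0, 1.0, size=(100, 13))
spoof_trial = rng.normal(2.0, 1.0, size=(100, 13))
```

A real system would threshold `llr_score` to accept or reject, with the threshold tuned on a development set (e.g., to the equal error rate); the cross-corpus problem the abstract highlights arises because a threshold and model fit on one dataset often transfer poorly to another.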
  4. Automatic Speech Recognition (ASR) systems are widely used in various online transcription services and personal digital assistants. Emerging lines of research have demonstrated that ASR systems are vulnerable to hidden voice commands, i.e., audio that can be recognized by ASRs but not by humans. Such attacks, however, often either highly depend on white-box knowledge of a specific machine learning model or require special hardware to construct the adversarial audio. This paper proposes a new model-agnostic and easily-constructed attack, called CommanderGabble, which uses fast speech to camouflage voice commands. Both humans and ASR systems often misinterpret fast speech, and such misinterpretation can be exploited to launch hidden voice command attacks. Specifically, by carefully manipulating the phonetic structure of a target voice command, ASRs can be caused to derive a hidden meaning from the manipulated, high-speed version. We implement the discovered attacks both over-the-wire and over-the-air, and conduct a suite of experiments to demonstrate their efficacy against 7 practical ASR systems. Our experimental results show that the over-the-wire attacks can disguise as many as 96 out of 100 tested voice commands into adversarial ones, and that the over-the-air attacks are consistently successful for all 18 chosen commands in multiple real-world scenarios. 
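The core mechanism above, time-compressing a command so machines still parse it while humans misinterpret it, can be illustrated with a toy transform (an assumption for illustration, not CommanderGabble's actual pipeline; naive decimation also raises pitch, whereas a practical attack would use pitch-preserving time-stretching such as a phase vocoder and careful phonetic manipulation):

```python
# Toy time-compression of a waveform (illustrative, not the paper's method).
import numpy as np

def speed_up(waveform, factor):
    """Naively resample so playback is `factor` times faster."""
    n = len(waveform)
    new_positions = np.arange(0, n, factor)  # fractional sample positions
    return np.interp(new_positions, np.arange(n), waveform)

sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)  # 1-second stand-in for a spoken command
fast = speed_up(tone, 2.0)          # ~0.5-second, double-speed version
```

The attack succeeds when the sped-up audio stays within the ASR's tolerance (speech recognizers are trained on varied speaking rates) while falling outside a human listener's, which is why the paper reports separate over-the-wire and over-the-air success rates.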
  5. More and more, humans are engaging with voice-activated artificially intelligent (voice-AI) systems that have names (e.g., Alexa), apparent genders, and even emotional expression; they are in many ways a growing 'social' presence. But to what extent do people display sociolinguistic attitudes, developed from human-human interaction, toward these disembodied text-to-speech (TTS) voices? And how might these attitudes vary based on the cognitive traits of the individual user? The current study addresses these questions, testing native English speakers' judgments on 6 traits (intelligent, likeable, attractive, professional, human-like, and age) for a naturally-produced female human voice and the US-English default Amazon Alexa voice. Following exposure to the voices, participants completed these ratings for each speaker, as well as the Autism Quotient (AQ) survey, to assess individual differences in cognitive processing style. Results show differences in individuals' ratings of the likeability and human-likeness of the human and AI talkers based on AQ score. Results suggest that humans transfer social assessment of human voices to voice-AI, but that the way they do so is mediated by their own cognitive characteristics.