Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately voice interfaces also create new opportunities for exploitation. Specifically any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.
more »
« less
2MA: Verifying Voice Commands via Two Microphone Authentication
Voice controlled interfaces have vastly improved the usability of many devices (e.g., headless IoT systems). Unfortunately, the lack of authentication for these interfaces has also introduced command injection vulnerabilities - whether via compromised IoT devices, television ads or simply malicious nearby neighbors, causing such devices to perform unauthenticated sensitive commands is relatively easy. We address these weaknesses with Two Microphone Authentication (2MA), which takes advantage of the presence of multiple ambient and personal devices operating in the same area. We develop an embodiment of 2MA that combines approximate localization through Direction of Arrival (DOA) techniques with Robust Audio Hashes (RSHs). Our results show that our 2MA system can localize a source to within a narrow physical cone (< 30◦) with zero false positives, eliminate replay attacks and prevent the injection of inaudible/hidden commands. As such, we dramatically increase the difficulty for an adversary to carry out such attacks and demonstrate that 2MA is an effective means of authenticating and localizing voice commands.
more »
« less
- Award ID(s):
- 1702879
- PAR ID:
- 10091981
- Date Published:
- Journal Name:
- Proceedings of the 2018 on Asia Conference on Computer and Communications Security
- Page Range / eLocation ID:
- 89 to 100
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Voice controlled interactive smart speakers, such as Google Home, Amazon Echo, and Apple HomePod are becoming commonplace in today's homes. These devices listen continually for the user commands, that are triggered by special keywords, such as "Alexa" and "Hey Siri". Recent research has shown that these devices are vulnerable to attacks through malicious voice commands from nearby devices. The commands can be sent easily during unoccupied periods, so that the user may be unaware of such attacks. We present EchoSafe, a user-friendly sonar-based defense against these attacks. When the user sends a critical command to the smart speaker, EchoSafe sends an audio pulse followed by post processing to determine if the user is present in the room. We can detect the user's presence during critical commands with 93.13% accuracy, and our solution can be extended to defend against other attack scenarios, as well.more » « less
-
A smart home involves a variety of entities, such as IoT devices, automation applications, humans, voice assistants, and companion apps. These entities interact in the same physical environment, which can yield undesirable and even hazardous results, called IoT interaction threats. Existing work on interaction threats is limited to considering automation apps, ignoring other IoT control channels, such as voice commands, companion apps, and physical operations. Second, it becomes increasingly common that a smart home utilizes multiple IoT platforms, each of which has a partial view of device states and may issue conflicting commands. Third, compared to detecting interaction threats, their handling is much less studied. Prior work uses generic handling policies, which are unlikely to fit all homes. We present IoTMediator, which provides accurate threat detection and threat-tailored handling in multi-platform multi-control-channel homes. Our evaluation in two real-world homes demonstrates that IoTMediator significantly outperforms prior state-of-the-art work.more » « less
-
A Continuous Articulatory Gesture Based Liveness Detection for Voice Authentication on Smart DevicesVoice biometrics is drawing increasing attention to user authentication on smart devices. However, voice biometrics is vulnerable to replay attacks, where adversaries try to spoof voice authentication systems using pre-recorded voice samples collected from genuine users. To this end, we propose VoiceGesture, a liveness detection solution for voice authentication on smart devices such as smartphones and smart speakers. With audio hardware advances on smart devices, VoiceGesture leverages built-in speaker and microphone pairs on smart devices as Doppler Radar to sense articulatory gestures for liveness detection during voice authentication. The experiments with 21 participants and different smart devices show that VoiceGesture achieves over 99% and around 98% detection accuracy for text-dependent and text-independent liveness detection, respectively. Moreover, VoiceGesture is robust to different device placements, low audio sampling frequency, and supports medium range liveness detection on smart speakers in various use scenarios, including smart homes and smart vehicles.more » « less
-
The popularity of smart home devices has led to an increase in security incidents happening in smart homes. A key measure to avoid such incidents is to authenticate users before they can interact with smart devices. However, current methods often require additional hardware. This article proposes STATION, a gesture-based authentication system, an effective gesture-based authentication method built on top of the voice interfaces already available in these smart home devices, without adding new hardware. STATION uses a gesture processing pipeline that identifies Doppler-existing frames and detects the direction of arrival of Reflection to authenticate users in low SNR environments and at longer distances. Furthermore, regarding the nature of gesture-based authentication, this system also supports detecting user liveness, preventing replay and synthesis attacks from remote attackers. The evaluation of STATION shows high accuracy with a false acceptance rate (FAR) of 0.08% and false rejection rate (FRR) of 3.10% for users within 1.5 m of the device.more » « less
An official website of the United States government

