
Title: Trade-offs in Augmented Reality User Interfaces for Controlling a Smart Environment
Smart devices and Internet of Things (IoT) technologies are replacing or being incorporated into traditional devices at a growing pace. The use of digital interfaces to interact with these devices has become a common occurrence in homes, workspaces, and various industries around the world. The most common interfaces for these connected devices are mobile apps or voice control via intelligent virtual assistants. However, with augmented reality (AR) becoming more popular and accessible among consumers, there are new opportunities for spatial user interfaces to seamlessly bridge the gap between digital and physical affordances. In this paper, we present a human-subjects study evaluating and comparing four user interfaces for smart connected environments: gaze input, hand gestures, voice input, and a mobile app. We assessed participants’ user experience, usability, task load, completion time, and preferences. Our results show multiple trade-offs between these interfaces across these measures. In particular, we found that gaze input shows great potential for future use cases, while both gaze input and hand gestures suffer from limited familiarity among users compared to voice input and mobile apps.
Award ID(s):
1852002 1800961
Publication Date:
NSF-PAR ID:
10323634
Journal Name:
Proc. of the ACM Symposium on Spatial User Interaction (SUI ’21)
Page Range or eLocation-ID:
1 to 11
Sponsoring Org:
National Science Foundation
More Like this
  1. Voice assistants embodied in smart speakers (e.g., Amazon Echo, Google Home) enable conversational interaction that does not necessarily rely on expertise with mobile or desktop computing. Hence, these voice assistants offer new opportunities to different populations, including individuals who are not interested in or able to use traditional computing devices such as computers and smartphones. To understand how older adults who use technology infrequently perceive and use these voice assistants, we conducted a three-week field deployment of the Amazon Echo Dot in the homes of seven older adults. Participants described increased confidence using digital technology and found the conversational voice interfaces easy to use. While some types of usage dropped over the three-week period (e.g., playing music), we observed consistent usage for finding online information. Given that much of this information was health-related, this finding emphasizes the need to revisit concerns about credibility of information with this new interaction medium. Although features to support memory (e.g., setting timers, reminders) were initially perceived as useful, the actual usage was unexpectedly low due to reliability concerns. We discuss how these findings apply to other user groups along with design implications and recommendations for future work on voice user interfaces.
  2. Depression is a common mental illness characterized by sadness and a lack of interest or pleasure. The DSM-5 lists nine symptoms, at least five of which must be present during the same two-week period to meet the diagnostic criteria for depression. Nevertheless, health care professionals commonly assess and monitor depressive symptoms through face-to-face questionnaires, which are time-consuming and expensive. Smart homes, on the other hand, can monitor householders’ health through connected devices such as smartphones, wearables, cameras, and voice assistants. Although depression monitoring in smart homes is commonly oriented toward older adults, depression affects people of all ages. While only an expert can diagnose a depressive disorder, questionnaires such as the PHQ-9 help spot depressive symptomatology as a pre-diagnosis. This paper therefore proposes a three-step framework: the first step administers the nine questions to the end user through Alexa or a gamified HMI; a fuzzy logic decision system then selects among three actions based on the nine responses; the final step carries out the selected action: continue monitoring through Alexa and the HMI, suggest a specialist referral, or mandate a specialist referral.
  3. Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately, voice interfaces also create new opportunities for exploitation. Specifically, any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc.) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc.). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low-frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.
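As a rough illustration of the decision step in the depression-screening abstract above, the following is a minimal crisp (non-fuzzy) sketch that maps nine PHQ-9 item scores to the three described actions. The function name, action strings, and score bands are illustrative assumptions (the bands follow the standard published PHQ-9 severity cutoffs), not details taken from the paper's fuzzy logic system.

```python
def phq9_action(responses):
    """Map nine PHQ-9 item scores (each 0-3) to one of three actions.

    A crisp stand-in for the paper's fuzzy decision system; the score
    bands follow the standard PHQ-9 severity cutoffs (assumption), and
    the action names mirror the three actions named in the abstract.
    """
    if len(responses) != 9 or any(not 0 <= r <= 3 for r in responses):
        raise ValueError("expected nine item scores, each in the range 0-3")
    total = sum(responses)          # PHQ-9 total score: 0-27
    if total <= 9:                  # none/minimal to mild symptoms
        return "continue monitoring"
    if total <= 14:                 # moderate symptoms
        return "suggest specialist referral"
    return "mandatory specialist referral"  # moderately severe to severe
```

A fuzzy variant would replace the hard cutoffs with overlapping membership functions over the total (and possibly per-item) scores, which is presumably what the paper's decision system does.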
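The sub-bass over-excitation idea in the last abstract above can be sketched as a simple spectral-energy check: compare the energy below the human voice range against total energy. This is a minimal illustration only; the 80 Hz cutoff and 5% threshold are assumed values, not the paper's, and the actual detector is more sophisticated.

```python
import numpy as np

def sub_bass_ratio(samples, sample_rate, cutoff_hz=80.0):
    """Fraction of spectral energy below cutoff_hz.

    Human voices carry little energy below roughly 80 Hz, while many
    electronic speakers exhibit "sub-bass over-excitation" there.
    The cutoff is an illustrative assumption.
    """
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return spectrum[freqs < cutoff_hz].sum() / total

def looks_electronic(samples, sample_rate, threshold=0.05):
    """Flag audio whose sub-bass energy fraction exceeds threshold (assumed)."""
    return sub_bass_ratio(samples, sample_rate) > threshold
```

For example, a 440 Hz tone (within the voice band) yields a near-zero ratio, while a 50 Hz tone concentrates nearly all of its energy below the cutoff and is flagged.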