
Title: Auditory cognition and perception of action video game players

A training method to improve speech hearing in noise has proven elusive, with most methods failing to transfer to untrained tasks. One common approach to identify potentially viable training paradigms is to make use of cross-sectional designs. For instance, the consistent finding that people who choose to avidly engage with action video games as part of their normal life also show enhanced performance on non-game visual tasks has been used as a foundation to test the causal impact of such game play via true experiments (e.g., in more translational designs). However, little work has examined the association between action video game play and untrained auditory tasks, which would speak to the possible utility of using such games to improve speech hearing in noise. To examine this possibility, 80 participants with mixed action video game experience were tested on a visual reaction time task that has reliably shown superior performance in action video game players (AVGPs) compared to non-players (≤ 5 h/week across game categories) and multi-genre video game players (> 5 h/week across game categories). Auditory cognition and perception were tested using auditory reaction time and two speech-in-noise tasks. Performance of AVGPs on the visual task replicated previous positive findings. However, no significant benefit of action video game play was found on the auditory tasks. We suggest that, while AVGPs interact meaningfully with a rich visual environment during play, they may not interact with the games’ auditory environment. These results suggest that far transfer learning during action video game play is modality-specific and that an acoustically relevant auditory environment may be needed to improve auditory probabilistic thinking.

Publication Date:
Journal Name:
Scientific Reports
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Cybersecurity competitions are exciting for the game participants; however, the excitement and educational value do not necessarily transfer to audiences because audiences may not be experts in the field. To improve the audiences’ comprehension and engagement levels at these events, we have proposed a virtual commentator architecture for cybersecurity competitions. Based on the architecture, we have developed a virtual animated agent that serves as a commentator in cybersecurity competitions. This virtual commentator can interact with audiences with facial expressions and the corresponding hand gestures. The commentator can provide several types of feedback including causal, congratulatory, deleterious, assistive, background, and motivational responses. In addition, when producing speech, the lips, tongue, and jaw provide visual cues that complement auditory cues. The virtual commentator is flexible enough to be employed in the Collegiate Cyber Defense Competitions environment. Our preliminary results demonstrate the architecture can generate phonemes with timestamps and behavioral tags. These timestamps and tags provide solid building blocks for implementing desired responsive behaviors.
  2. In this paper, we propose a deep multi-Task learning model based on Adversarial-and-COoperative nets (TACO). The goal is to use an adversarial-and-cooperative strategy to decouple the task-common and task-specific knowledge, facilitating the fine-grained knowledge sharing among tasks. TACO accommodates multiple game players, i.e., feature extractors, domain discriminator, and tri-classifiers. They play the MinMax games adversarially and cooperatively to distill the task-common and task-specific features, while respecting their discriminative structures. Moreover, it adopts a divide-and-combine strategy to leverage the decoupled multi-view information to further improve the generalization performance of the model. The experimental results show that our proposed method significantly outperforms the state-of-the-art algorithms on the benchmark datasets in both multi-task learning and semi-supervised domain adaptation scenarios.
  3. We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay, such as in financial or military simulations and computer games. During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well. After that, the algorithm has to produce a strategy that has low exploitability. Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly. For the stochastic game setting, we propose using the distribution of state-action value functions induced by a belief distribution over possible environments. We compare the performance of various exploration strategies for this task, including generalizations of Thompson sampling and Bayes-UCB to this new setting. These two consistently outperform other strategies.
  4. Speech emotion recognition (SER) is a challenging task due to the limited availability of real-world labeled datasets. Since it is easier to find unlabeled data, the use of self-supervised learning (SSL) has become an attractive alternative. This study proposes new pre-text tasks for SSL to improve SER. While our target application is SER, the proposed pre-text tasks include audio-visual formulations, leveraging the relationship between acoustic and facial features. Our proposed approach introduces three new unimodal and multimodal pre-text tasks that are carefully designed to learn better representations for predicting emotional cues from speech. Task 1 predicts energy variations (high or low) from a speech sequence. Task 2 uses speech features to predict facial activation (high or low) based on facial landmark movements. Task 3 performs a multi-class emotion recognition task on emotional labels obtained from combinations of action units (AUs) detected across a video sequence. We pre-train a network with 60.92 hours of unlabeled data, fine-tuning the model for the downstream SER task. The results on the CREMA-D dataset show that the model pre-trained on the proposed domain-specific pre-text tasks significantly improves the precision (up to 5.1%), recall (up to 4.5%), and F1-scores (up to 4.9%) of our SER system.
  5. Collaborative mixed reality games enable shared social experiences, in which players interact with the physical and virtual game environment, and with other players in real-time. Recent advances in technology open a range of opportunities for designing new and innovative collaborative mixed reality games, but also raise questions around design, technical requirements, immersion, safety, and player experience. This workshop seeks to bring together researchers, designers, practitioners, and players to identify the most pressing challenges that need to be addressed in the next decade, discuss opportunities to overcome these challenges, and highlight lessons learned from past designs of such games. Participants will present their ideas, assemble and discuss a collection of related papers, outline a unifying research agenda, and engage in an outdoor game ideation and prototyping session. We anticipate that the CSCW community can contribute to designing the next generation of collaborative mixed reality games and technologies and to support the growth of research and development in this exciting and emerging area.