Title: The unobtrusive group interaction (UGI) corpus
Studying group dynamics requires fine-grained spatial and temporal understanding of human behavior. Social psychologists studying human interaction patterns in face-to-face group meetings often find themselves struggling with huge volumes of data that require many hours of tedious manual coding. There are only a few publicly available multi-modal datasets of face-to-face group meetings that enable the development of automated methods to study verbal and non-verbal human behavior. In this paper, we present a new, publicly available multi-modal dataset for studying group dynamics that differs from previous datasets in its use of ceiling-mounted, unobtrusive depth sensors. These can be used for fine-grained analysis of head and body pose and gestures, without raising concerns about participants' privacy or inhibiting their behavior. The dataset is complemented by synchronized, time-stamped meeting transcripts that allow analysis of spoken content. The dataset comprises 22 group meetings in which participants perform a standard collaborative group task designed to measure leadership and productivity. Participants' post-task questionnaires, including demographic information, are also provided as part of the dataset. We demonstrate the utility of the dataset for analyzing perceived leadership, contribution, and performance by presenting results of multi-modal analysis using our sensor-fusion algorithms, which are designed to automatically understand audio-visual interactions.
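To illustrate how the synchronized, time-stamped transcripts might be combined with the ceiling-mounted depth streams, here is a minimal Python sketch that aligns each utterance to its nearest depth frame. The field names (`start`, `text`) and the flat lists of timestamps are assumptions made for illustration only, not the UGI corpus's actual file layout.

```python
# Minimal sketch (hypothetical field names): align time-stamped transcript
# utterances with ceiling-mounted depth-sensor frames by nearest timestamp.
from bisect import bisect_left

def align_utterances_to_frames(utterances, frame_times):
    """utterances: list of dicts with 'start' (seconds) and 'text';
    frame_times: sorted list of depth-frame timestamps (seconds).
    Returns (frame_index, utterance) pairs."""
    aligned = []
    for utt in utterances:
        i = bisect_left(frame_times, utt["start"])
        # Clamp to the valid range and pick the closer of the two neighboring frames.
        if i > 0 and (i == len(frame_times) or
                      utt["start"] - frame_times[i - 1] < frame_times[i] - utt["start"]):
            i -= 1
        aligned.append((i, utt))
    return aligned

# Example usage with made-up timestamps:
frames = [0.0, 0.033, 0.066, 0.100]
utts = [{"start": 0.05, "text": "Let's rank the items."}]
print(align_utterances_to_frames(utts, frames))  # [(2, {...})]
```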
Award ID(s):
1631674
NSF-PAR ID:
10107379
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 10th ACM Multimedia Systems Conference
Page Range / eLocation ID:
249 to 254
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Humans convey their intentions through both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to consider not only the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN), which models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
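As a rough sketch of the core idea of shifting word representations with accompanying nonverbal cues (in the spirit of RAVEN, but not the authors' exact architecture), the gated shift below modifies a word embedding using concatenated visual and acoustic features; the dimensions are placeholders.

```python
# Illustrative sketch (not the published RAVEN architecture): shift a word
# embedding by a learned, gated function of the accompanying nonverbal features.
import torch
import torch.nn as nn

class NonverbalShift(nn.Module):
    def __init__(self, word_dim=300, visual_dim=35, acoustic_dim=74):
        super().__init__()
        # Gate conditioned on the word plus its nonverbal context.
        self.gate = nn.Linear(word_dim + visual_dim + acoustic_dim, word_dim)
        # Shift vector computed from the nonverbal cues alone.
        self.shift = nn.Linear(visual_dim + acoustic_dim, word_dim)

    def forward(self, word, visual, acoustic):
        # word: (batch, word_dim); visual/acoustic: per-word nonverbal summaries.
        g = torch.sigmoid(self.gate(torch.cat([word, visual, acoustic], dim=-1)))
        return word + g * self.shift(torch.cat([visual, acoustic], dim=-1))

# Example with random tensors standing in for real features.
m = NonverbalShift()
shifted = m(torch.randn(2, 300), torch.randn(2, 35), torch.randn(2, 74))
print(shifted.shape)  # torch.Size([2, 300])
```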
  2. Group meetings can suffer from serious problems that undermine performance, including bias, "groupthink", fear of speaking, and unfocused discussion. To better understand these issues, propose interventions, and thus improve team performance, we need to study human dynamics in group meetings. However, this process currently depends heavily on manual coding and video cameras. Manual coding is tedious, inaccurate, and subjective, while active video cameras can affect the natural behavior of meeting participants. Here, we present a smart meeting room that combines microphones and unobtrusive ceiling-mounted Time-of-Flight (ToF) sensors to understand group dynamics in team meetings. We automatically process the multimodal sensor outputs with signal, image, and natural language processing algorithms to estimate participant head pose, visual focus of attention (VFOA), non-verbal speech patterns, and discussion content. We derive metrics from these automatic estimates and correlate them with user-reported rankings of emergent group leaders and major contributors to produce accurate predictors. We validate our algorithms and report results on a new dataset of 36 individuals across 10 groups performing a lunar survival task, collected in the multimodal-sensor-enabled smart room.
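As a simple illustration of correlating an automatically derived metric with user-reported rankings (the paper's actual metrics and statistics may differ), the sketch below computes a Spearman rank correlation between a hypothetical speaking-time share and peer-reported leadership ranks; all values are made up.

```python
# Hedged sketch: correlate a derived metric (each participant's speaking-time
# share) with reported leadership rankings. Data below are illustrative only.
from scipy.stats import spearmanr

speaking_share = [0.41, 0.22, 0.25, 0.12]  # fraction of meeting time spent speaking
leader_rank    = [1, 3, 2, 4]              # 1 = ranked most leader-like by peers

# A higher speaking share should correspond to a better (lower) rank, so a
# strong negative rank correlation suggests the metric is a useful predictor.
rho, p = spearmanr(speaking_share, leader_rank)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```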
  3. Context is of fundamental importance to both human and machine vision; e.g., an object in the air is more likely to be an airplane than a pig. The rich notion of context incorporates several aspects including physics rules, statistical co-occurrences, and relative object sizes, among others. While previous work has focused on crowd-sourced out-of-context photographs from the web to study scene context, controlling the nature and extent of contextual violations has been a daunting task. Here we introduce a diverse, synthetic Out-of-Context Dataset (OCD) with fine-grained control over scene context. By leveraging a 3D simulation engine, we systematically control the gravity, object co-occurrences and relative sizes across 36 object categories in a virtual household environment. We conducted a series of experiments to gain insights into the impact of contextual cues on both human and machine vision using OCD. We conducted psychophysics experiments to establish a human benchmark for out-of-context recognition, and then compared it with state-of-the-art computer vision models to quantify the gap between the two. We propose a context-aware recognition transformer model, fusing object and contextual information via multi-head attention. Our model captures useful information for contextual reasoning, enabling human-level performance and better robustness in out-of-context conditions compared to baseline models across OCD and other out-of-context datasets. All source code and data are publicly available at https://github.com/kreimanlab/WhenPigsFlyContext
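A minimal sketch of fusing an object feature with scene-context features via multi-head attention, in the spirit of the context-aware recognition transformer described above; the shapes, dimensions, and classifier head are assumptions rather than the released model.

```python
# Rough sketch (assumed shapes, not the released model): fuse a target-object
# feature with surrounding-context features using multi-head attention, then
# classify the object.
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4, num_classes=36):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, obj_feat, ctx_feats):
        # obj_feat: (batch, 1, dim) target-object feature used as the query;
        # ctx_feats: (batch, n_ctx, dim) context features used as keys/values.
        fused, _ = self.attn(obj_feat, ctx_feats, ctx_feats)
        return self.classifier(fused.squeeze(1))

model = ContextFusion()
logits = model(torch.randn(8, 1, 256), torch.randn(8, 10, 256))
print(logits.shape)  # torch.Size([8, 36])
```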
  4. This design-focused practice paper presents a case study describing how a training program developed for academic contexts was adapted for use with engineers working in industry. The underlying curriculum is from the NSF-funded CyberAmbassadors program, which developed training in communication, teamwork and leadership skills for participants from academic and research settings. For the case study described here, one module from the CyberAmbassadors project was adapted for engineers working in private industry: “Teaming Up: Effective Group and Meeting Management.” The key objectives were to increase knowledge and practical skills within the company’s engineering organization, focusing specifically on time management as it relates to project and product delivery. We were also interested in examining the results of translating curricula designed for an academic setting into a corporate setting. Training participants were all from the dedicated engineering department of a US-based location of an international company that provides financial services. The original curriculum was designed for live, in-person training, but was adapted for virtual delivery after the company adopted a 100% remote workforce in response to the COVID-19 pandemic. The training was conducted in four phases: (1) train-the-trainer to create internal evangelists; (2) train management to build buy-in and provide sponsorship; (3) phased rollout of training to individual members of the engineering department, contemporaneous with (4) specific and intentional opportunities to apply the skills in normal business activities including Joint Architecture Design (JAD) sessions. Effectiveness was measured through surveys at the engineering management level (before, during, and after training), and through direct discussions with engineering teams who were tracked for four weeks after the training. A number of cultural shifts within the company were observed as direct and indirect outcomes of this training. These include the creation and standardization of a template for meeting agendas; a “grassroots” effort to spread the knowledge and best practices from trained individuals to untrained individuals through informal, peer-to-peer interactions; individuals at varying levels of company hierarchy publicly expressing that they would not attend meetings unless an appropriate agenda was provided in advance; and requests for additional training by management who wanted to increase performance in their employees. As a result of this adaptation from academic to industry training contexts, several key curricular innovations were added back to the original CyberAmbassadors corpus. Examples include a reinterpretation of the separate-but-equal leadership roles within meetings, and the elevation of timekeeper to a controlling leadership role within a meeting. This case study offers valuable lessons on translating training from academic/research settings to industry, including a description of how the “business case” was developed in order to gain approval for the training and sponsorship from management. Future work includes adapting additional material from the CyberAmbassadors program for applications in a business context, and the continued formal and informal propagation of the current material within the company.
  5. In this work, we propose a video-based transfer learning approach for predicting problem outcomes of students working with an intelligent tutoring system (ITS). By analyzing a student's face and gestures, our method predicts the outcome of a student answering a problem in an ITS from a video feed. Our work is motivated by the reasoning that the ability to predict such outcomes enables tutoring systems to adjust interventions, such as hints and encouragement, and to ultimately yield improved student learning. We collected a large labeled dataset of student interactions with an intelligent online math tutor consisting of 68 sessions, where 54 individual students solved 2,749 problems. We will release this dataset publicly upon publication of this paper. It will be available at https://www.cs.bu.edu/faculty/betke/research/learning/. Working with this dataset, our transfer-learning challenge was to design a representation in the source domain of pictures obtained “in the wild” for the task of facial expression analysis, and to transfer this learned representation to the task of human behavior prediction in the domain of webcam videos of students in a classroom environment. We developed a novel facial affect representation and a user-personalized training scheme that unlocks the potential of this representation. We designed several variants of a recurrent neural network that models the temporal structure of video sequences of students solving math problems. Our final model, named ATL-BP for Affect Transfer Learning for Behavior Prediction, achieves a relative increase in mean F-score of 50% over the state-of-the-art method on this new dataset.
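A simplified sketch of a recurrent model over per-frame facial-affect features predicting a binary problem outcome, in the spirit of (but not identical to) the ATL-BP variants described above; the feature dimension and clip length are assumptions.

```python
# Simplified sketch (not the published ATL-BP model): a GRU over per-frame
# facial-affect features that predicts whether the student answers correctly.
import torch
import torch.nn as nn

class OutcomePredictor(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames):
        # frames: (batch, time, feat_dim) affect features from a webcam video clip.
        _, h = self.rnn(frames)                  # h: (1, batch, hidden), last hidden state
        return torch.sigmoid(self.head(h[-1]))   # probability of a correct answer

model = OutcomePredictor()
probs = model(torch.randn(4, 90, 128))  # 4 clips, 90 frames each
print(probs.shape)  # torch.Size([4, 1])
```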