
Title: Multicraft: A Multimodal Interface for Supporting and Studying Learning in Minecraft
In this paper, we present work on bringing multimodal interaction to Minecraft. The platform, Multicraft, incorporates speech-based input, eye tracking, and natural language understanding to facilitate more equitable gameplay in Minecraft. We tested the platform with elementary, middle school, and college students through a collection of studies. Students found each of the provided modalities to be a compelling way to play Minecraft. Additionally, we discuss the ways that these different types of multimodal data can be used to identify the meaningful spatial reasoning practices that students demonstrate while playing Minecraft. Collectively, this paper emphasizes the opportunity to bridge a multimodal interface with a means for collecting rich data that can better support diverse learners in non-traditional learning environments.
Journal Name:
HCI in Games: Serious and Immersive Games. HCII 2021. Lecture Notes in Computer Science
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents an expansion to the Abstract Meaning Representation (AMR) annotation schema that captures fine-grained semantically and pragmatically derived spatial information in grounded corpora. We describe a new lexical category conceptualization and set of spatial annotation tools built in the context of a multimodal corpus consisting of 185 3D structure-building dialogues between a human architect and human builder in Minecraft. Minecraft provides a particularly beneficial spatial relation-elicitation environment because it automatically tracks locations and orientations of objects and avatars in the space according to an absolute Cartesian coordinate system. Through a two-step process of sentence-level and document-level annotation designed to capture implicit information, we leverage these coordinates and bearings in the AMRs in combination with spatial framework annotation to ground the spatial language in the dialogues to absolute space.
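    The absolute coordinates and bearings described above can ground relative spatial terms computationally. The following sketch illustrates the idea under stated assumptions: the function names, the yaw convention (0 degrees facing +z, increasing clockwise), and the coarse term boundaries are our illustrative choices, not the paper's annotation tooling.

    ```python
    # Hedged sketch: mapping absolute Minecraft-style coordinates onto a
    # relative spatial term. Yaw convention and term boundaries are assumptions.
    import math

    def relative_bearing(speaker_pos, speaker_yaw_deg, target_pos):
        """Bearing of target from speaker, relative to the speaker's facing.

        Positions are (x, z) pairs on the horizontal plane; yaw 0 is assumed
        to face +z, increasing clockwise. Result is normalized to [-180, 180).
        """
        dx = target_pos[0] - speaker_pos[0]
        dz = target_pos[1] - speaker_pos[1]
        absolute = math.degrees(math.atan2(dx, dz))  # 0 deg along the +z axis
        return (absolute - speaker_yaw_deg + 180) % 360 - 180

    def describe(bearing):
        """Map a relative bearing onto a coarse spatial term."""
        if -45 <= bearing <= 45:
            return "in front of"
        if 45 < bearing <= 135:
            return "right of"
        if -135 <= bearing < -45:
            return "left of"
        return "behind"

    # Builder at the origin facing +z; a block two units to the builder's right:
    print(describe(relative_bearing((0, 0), 0.0, (2, 0))))  # prints "right of"
    ```

    A document-level annotation pass could compare such computed relations against the relations speakers actually assert, surfacing the implicit spatial frames the dialogue relies on.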
  2. Spatial reasoning is an important skillset that is malleable to training interventions. One possible context for intervention is the popular video game Minecraft. Minecraft encourages users to engage in spatial manipulation of 3D objects. However, few papers have chronicled any in-game practices that might evidence spatial reasoning, or how we might study its development through the game. In this paper, we report on 11 middle school students’ spatial reasoning practices while playing Minecraft. We use audio and video data of student gameplay to delineate five in-game practices that align with spatial reasoning. We expand on a student case study to explicate these practices. The identified practices may be beneficial for studying spatial reasoning development in game-based environments and contribute to a growing body of research on ways games support development of important and transferable skills.
  3. The overall goal of our research is to develop a system of intelligent multimodal affective pedagogical agents that are effective for different types of learners (Adamo et al., 2021). While most of the research on pedagogical agents tends to focus on the cognitive aspects of online learning and instruction, this project explores the less-studied role of affective (or emotional) factors. We aim to design believable animated agents that can convey realistic, natural emotions through speech, facial expressions, and body gestures, and that can react to students’ detected emotional states with emotional intelligence. Within the context of this goal, the specific objective of the work reported in the paper was to examine the extent to which the agents’ facial micro-expressions affect students’ perception of the agents’ emotions and their naturalness. Micro-expressions are very brief facial expressions that occur when a person either deliberately or unconsciously conceals an emotion being felt (Ekman & Friesen, 1969). Our assumption is that if the animated agents display facial micro-expressions in addition to macro-expressions, they will convey higher expressive richness and naturalness to the viewer, as “the agents can possess two emotional streams, one based on interaction with the viewer and the other based on their own internal state, or situation” (Queiroz et al., 2014, p. 2). The work reported in the paper involved two studies with human subjects. The objectives of the first study were to examine whether people can recognize micro-expressions (in isolation) in animated agents, and whether there are differences in recognition based on the agent’s visual style (e.g., stylized versus realistic).
The objectives of the second study were to investigate whether people can recognize the animated agents’ micro-expressions when integrated with macro-expressions; the extent to which the presence of combined micro- and macro-expressions affects the perceived expressivity and naturalness of the animated agents; the extent to which exaggerating the micro-expressions (e.g., increasing the amplitude of the animated facial displacements) affects emotion recognition and perceived agent naturalness and emotional expressivity; and whether there are differences based on the agent’s design characteristics. In the first study, 15 participants watched eight micro-expression animations representing four different emotions (happiness, sadness, fear, surprise). Four animations featured a stylized agent and four a realistic agent. For each animation, subjects were asked to identify the agent’s emotion conveyed by the micro-expression. In the second study, 234 participants watched three sets of eight animation clips (24 clips in total, 12 clips per agent). For each agent, four animations featured the character performing macro-expressions only, four featured macro- plus micro-expressions without exaggeration, and four featured macro- plus micro-expressions with exaggeration. Participants were asked to recognize the true emotion of the agent and to rate the emotional expressivity and naturalness of the agent in each clip using a 5-point Likert scale. We have collected all the data and completed the statistical analysis. Findings and discussion, implications for research and practice, and suggestions for future work will be reported in the full paper.

References
Adamo, N., Benes, B., Mayer, R., Lei, X., Meyer, Z., & Lawson, A. (2021). Multimodal Affective Pedagogical Agents for Different Types of Learners. In: Russo, D., Ahram, T., Karwowski, W., Di Bucchianico, G., & Taiar, R. (Eds.), Intelligent Human Systems Integration 2021. IHSI 2021. Advances in Intelligent Systems and Computing, 1322. Springer, Cham.
Ekman, P., & Friesen, W. V. (1969, February). Nonverbal leakage and clues to deception. Psychiatry, 32(1), 88–106.
Queiroz, R. B., Musse, S. R., & Badler, N. I. (2014). Investigating Macroexpressions and Microexpressions in Computer Graphics Animated Faces. Presence, 23(2), 191–208.

  4. Šķilters, J.; Newcombe, N.; Uttal, D. (Eds.)
    As excitement for Minecraft continues to grow, we consider its potential to function as an engaging environment for practicing and studying spatial reasoning. To support this exposition, we describe a glimpse of our current analysis of spatial reasoning skills in Minecraft. Twenty university students participated in a laboratory study that asked them to recreate three existing buildings in Minecraft. Screen captures of user actions, together with eye tracking data, helped us identify ways that students utilize perspective taking, constructing mental representations, building and place-marking, and error checking. These findings provide an initial impetus for further studies of the types of spatial skills that students may exhibit while playing Minecraft. They also introduce questions about how the design of Minecraft activities may promote, or inhibit, the use of certain spatial skills.
  5. In this paper, we demonstrate how machine learning could be used to quickly assess a student’s multimodal representational thinking. Multimodal representational thinking is the complex construct that encodes how students form conceptual, perceptual, graphical, or mathematical symbols in their mind. Augmented reality (AR) technology is adopted to diversify students’ representations. The AR setup utilized a low-cost, high-resolution thermal camera attached to a smartphone, which allows students to explore the unseen world of thermodynamics. Ninth-grade students (N = 314) engaged in a prediction–observation–explanation (POE) inquiry cycle scaffolded to leverage the augmented observation provided by the aforementioned device. The objective is to investigate how machine learning could expedite the automated assessment of multimodal representational thinking of heat energy. Two automated text classification methods were adopted to decode the different mental representations students used to explain their haptic perception, thermal imaging, and graph data collected in the lab. Since current automated assessment in science education rarely considers multilabel classification, we resorted to a state-of-the-art deep learning technique, bidirectional encoder representations from transformers (BERT). The BERT model classified open-ended responses into appropriate categories with higher precision than the traditional machine learning method. The satisfactory accuracy of deep learning in assigning multiple labels is revolutionary in processing qualitative data. A complex student construct, such as multimodal representational thinking, is rarely mutually exclusive. The study avails a convenient technique to analyze qualitative data that does not satisfy the mutual-exclusiveness assumption. Implications and future studies are discussed.
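    The multilabel setup described above differs from ordinary multiclass classification in that a single response can receive several labels at once. A minimal sketch of that decision rule, with hypothetical per-label scores standing in for a model's sigmoid outputs (the label names and threshold are our illustrative assumptions, not the paper's coding scheme):

    ```python
    # Minimal multilabel assignment sketch. The scores stand in for per-label
    # sigmoid outputs of a classifier such as BERT; labels are illustrative.
    LABELS = ["conceptual", "perceptual", "graphical", "mathematical"]

    def assign_labels(scores, threshold=0.5):
        """Return every label whose score clears the threshold.

        Unlike multiclass (argmax) classification, a response may receive
        several labels at once, or none, because the categories are not
        mutually exclusive.
        """
        return [label for label, s in zip(LABELS, scores) if s >= threshold]

    # A response that mixes perceptual and graphical reasoning:
    print(assign_labels([0.12, 0.81, 0.67, 0.05]))  # ['perceptual', 'graphical']
    ```

    Thresholding each label independently is what lets the assessment respect the non-mutually-exclusive nature of the construct; an argmax over the same scores would force exactly one category per response.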