<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Understanding Multi-user, Handheld Mixed Reality for Group-based MR Games</title></titleStmt>
			<publicationStmt>
				<publisher>ACM</publisher>
				<date>04/17/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10545746</idno>
					<idno type="doi">10.1145/3653688</idno>
					<title level='j'>Proceedings of the ACM on Human-Computer Interaction</title>
<idno>2573-0142</idno>
<biblScope unit="volume">8</biblScope>
<biblScope unit="issue">CSCW1</biblScope>					

					<author>Carlos Augusto Bautista_Isaza</author><author>Daniel Enriquez</author><author>Hayoun Moon</author><author>Myounghoon Jeon</author><author>Sang Won Lee</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<p>Research has identified applications of handheld-based VR, which utilizes handheld displays or mobile devices, for developing systems that involve users in mixed reality (MR) without the need for head-worn displays (HWDs). Such systems can potentially accommodate large groups of users participating in MR. However, we lack an understanding of how group sizes and interaction methods affect the user experience. In this paper, we aim to advance our understanding of handheld-based MR in the context of multiplayer, co-located games. We conducted a study (N = 38) to understand how user experiences vary by group size (2, 4, and 8) and interaction method (proximity-based or pointing-based). For our experiment, we implemented a multiuser experience for up to ten users. We found that proximity-based interaction that encouraged dynamic movement positively affected social presence and physical/temporal workload. In bigger group settings, participants felt less challenged and less positive. Individuals had varying preferences for group size and interaction type. The findings of the study will advance our understanding of the design space for handheld-based MR in terms of group sizes and interaction schemes. To make our contributions explicit, we conclude our paper with design implications that can inform user experience design in handheld-based mixed reality contexts.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Virtual reality (VR) experiences are becoming popular among consumers due to technological advancements and the increasing affordability of VR head-worn devices (HWDs). While VR HWDs can offer high levels of immersion to users when experiencing virtual environments (VEs), they are poorly suited to some users. For example, children <ref type="bibr">[9]</ref> and those who experience HWD-induced headaches or nausea <ref type="bibr">[33]</ref> are discouraged from wearing HWDs. In addition, people who wear makeup <ref type="bibr">[34]</ref>, have coarsely textured hair <ref type="bibr">[38]</ref>, or wear prescription glasses <ref type="bibr">[57]</ref> may not be willing to wear HWDs due to the discomfort or inconvenience that would result. Furthermore, certain activities that involve multiple dynamically moving users in a co-located space, such as recreational games, may not be compatible with the use of VR HWDs due to the risk of collision or potential injury <ref type="bibr">[39,</ref><ref type="bibr">52,</ref><ref type="bibr">60]</ref>.</p><p>In response to these challenges, handheld VR (using handheld devices as windows to view VR worlds) has been used as a solution for making VR content approachable to broader ranges of users <ref type="bibr">[20,</ref><ref type="bibr">24,</ref><ref type="bibr">68]</ref>. This research thread effectively broadens the options available to accommodate users for whom VR HWDs are not an option, offering a new avenue to design for multiple users in VR and explore how expansive and dynamic this alternative VR experience can be. 
While previous works in handheld VR alluded to the potential of scalability in these systems <ref type="bibr">[16,</ref><ref type="bibr">20,</ref><ref type="bibr">24,</ref><ref type="bibr">25]</ref>, larger-scale systems involving more than two users have yet to be studied.</p><p>While group size effects have been a recurring topic at CSCW <ref type="bibr">[44,</ref><ref type="bibr">58,</ref><ref type="bibr">61,</ref><ref type="bibr">65]</ref>, we do not have a clear understanding of how group size in MR can affect the user experience. Understanding the effects of group size is significant because it changes the rewarding mechanisms and cognitive load that each individual can have <ref type="bibr">[50,</ref><ref type="bibr">69]</ref>. Group size is a particularly interesting topic in handheld VR, as users can maintain situational awareness in the real and virtual worlds simultaneously, turning any VR application into something approaching a mixed-reality (MR) experience <ref type="bibr">[46]</ref>. This awareness, which is unavailable to HWD users, can invite more active, dynamic, and physical interaction, affording a sport-like game experience or socially constructed learning. For example, such systems will allow for novel types of group-based MR content, such as room-scale competitive games where tens of children run around holding tablets, or MR-based learning, where teachers teach a group of students with virtual content in social settings. Lastly, interaction design should also be considered in multiuser settings, as it can affect how groups of people interact with each other and use interactive systems <ref type="bibr">[5,</ref><ref type="bibr">19,</ref><ref type="bibr">59]</ref>.</p><p>This paper aims to investigate how group sizes and interaction methods can affect the overall user experience in a handheld VR game setting. 
We defined two research questions to assess the effects of group size and interaction methods on users' experiences in handheld, co-located MR experiences:</p><p>&#8226; (RQ1) How does the group size in a competitive game affect users' experiences in handheld MR? &#8226; (RQ2) How do different types of interaction methods (proximity-based vs. pointing-based) affect users' experiences in handheld MR?</p><p>To answer these questions, we implemented MOMIS, a handheld MR environment and competitive game for up to 10 players. The object of the game is simple: claim more balloons in the virtual environment than the opposing team. The context of a group-based recreational game allows us to explore various types of interaction, incorporating both cooperative and competitive aspects of interactive systems. In addition, unlike alternative activities, such as team-based tasks, learning environments, or interactive demos, playing a game gives participants an immediate and explicit goal (to win the game), effectively motivating them to explore the physical space. We used the system to conduct a user study in which we varied group sizes within the game (1:1, 2:2, and 4:4) and the degree to which participants actively move, encouraged by two distinct interaction types: Poke (proximity-based interaction) and Shoot (pointing interaction at a distance).</p><p>Our results indicate that interaction types can affect social presence and perceived workload. In addition, we found that participants had a level of situational awareness in the physical world sufficient to avoid physical contact with other participants even in the largest and most dynamic setting (4:4, Poke). However, individuals diverged in their preferences for game settings. Some participants prioritized group size preferences (smaller or larger), while others gave more weight to interaction type preferences (Poke or Shoot). 
Overall, participants felt positive about the dynamics that the game enabled in VR.</p><p>The findings of the study will advance our understanding of the design space of handheld MR in terms of group sizes and interaction schemes. To make our contributions explicit, we conclude our paper with design implications that can inform user experience design in handheld MR.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORK</head><p>We identified three distinct areas that motivated our research: (1) the investigation of group size in co-located groupware; (2) co-located, handheld MR; and (3) the use of mobile devices to enable asymmetric, co-located VR systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">The Effects of Group Size and Interaction Methods in Groupware</head><p>Existing literature on collaborative MR largely avoids delving into larger-scale collaboration involving more than two people. Researchers have given attention to the potential for large-scale collaboration, especially in the context of handheld MR; for instance, Rogers et al. identified a knowledge gap concerning multiplayer VR games with more than two users <ref type="bibr">[56]</ref>. Similarly, Dagan et al. suggested "enable any number of players" as part of their design recommendations for mobile AR <ref type="bibr">[14]</ref>. However, existing works that involve handheld VR have typically validated their studies with just one or two mobile device users during their evaluations <ref type="bibr">[10,</ref><ref type="bibr">16,</ref><ref type="bibr">20,</ref><ref type="bibr">24]</ref>. Authors also explicitly mention scaling interactions to accommodate larger groups as a direction to take up in future work. In practice, however, researchers tend to neglect the possibility of iterating larger-scale versions of the same applications. Consequently, we still lack an understanding of the effects of group size in co-located MR settings, despite the fact that many collaborative, handheld MR works can accommodate more than two users <ref type="bibr">[18]</ref>.</p><p>Investigating the effects of group size has traditionally been a prevalent topic at CSCW, especially when groupware technologies support novel, co-located collaboration <ref type="bibr">[44,</ref><ref type="bibr">61,</ref><ref type="bibr">65]</ref>. Such studies have been particularly plentiful for technologies that encourage spatial interaction, such as tabletop displays or shared large-scale displays, and many researchers have also investigated these technologies' impacts on task performance. Ryall et al. 
found that depending on the group size, people exhibited varying work strategies to achieve the same collaborative goal <ref type="bibr">[58]</ref>. For instance, group size significantly influenced strategies for organizing shared resources <ref type="bibr">[58]</ref>. Tang et al. discovered that closely shared perspectives were more important than avoiding disembodied visualizations of users in three-way remote communication <ref type="bibr">[65]</ref>. Oftentimes, researchers study interaction methods or technological configurations that may exert an additional influence on interaction alongside group size effects. Researchers found that group size interacted with other design factors in groupware: how information was presented <ref type="bibr">[59]</ref>, how displays were configured <ref type="bibr">[19]</ref>, and the sizes of target objects <ref type="bibr">[43]</ref>. These results suggest that designers must consider adjustments to their interaction schemes as group sizes increase. In our study, we aim to test whether the interaction between group size and interaction method (proximity- or pointing-based) can influence the user experience.</p><p>Lastly, the kinds of movements a game affords users for interaction can greatly affect the user experience <ref type="bibr">[45]</ref>. In the case of "Brick", a collaborative, handheld AR game, several insights were gained for designing room-scale collaborative games <ref type="bibr">[10]</ref>. The researchers suggested that the interaction method should align with users' existing mental models for familiar devices. On the other hand, multiplayer interaction in AR offers a broad design space for physical proximity and exertion. Designers can consciously decide whether to emphasize these values or not. Another study proposed "encouraging touch through proxemics" for playful, co-located interaction <ref type="bibr">[14]</ref>. 
In this regard, we chose to study the most intuitive interaction for the touch screen on a mobile device (Shoot, pointing-based) and another interaction type that can promote physical proximity and exertion (Poke, proximity-based). Our work further explores the design possibilities enabled by users' even more rapid and active room-scale movement by incorporating competitiveness into our game design.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Handheld MR for Co-located Collaboration</head><p>Mobile device-only, co-located collaboration has been frequently explored in AR settings <ref type="bibr">[17,</ref><ref type="bibr">29,</ref><ref type="bibr">30,</ref><ref type="bibr">42,</ref><ref type="bibr">53,</ref><ref type="bibr">55]</ref>. Feng et al. conducted a survey of co-located AR collaboration between 2012 and 2022 and found that handheld AR is frequently used owing to its flexibility and in spite of its limitations (e.g., a narrow field of view and hands being occupied); half of the surveyed papers utilized handheld displays <ref type="bibr">[18]</ref>. However, using exclusively mobile devices for VR has not been frequently observed. The distinction between handheld AR and handheld VR should be based on whether virtual objects are positioned within the physical context or within a virtual environment (VE) <ref type="bibr">[41]</ref>. Using a mobile device's camera, on-screen visuals can represent the physical world in mobile AR with virtual objects overlaid, while the content on the screen can be entirely virtual in handheld VR <ref type="foot">1</ref>. When users view VR content on mobile phones, the limited field of view available on small screens results in limited immersiveness, but increased awareness of the physical space and the co-presence of other users, given that users can see beyond the mobile device screens. In one study, the authors even referred to the mode of using monoscopic VR on mobile phones as "AR mode," despite the on-screen visuals being entirely virtual <ref type="bibr">[47]</ref>. The authors' perception of handheld VR might have been equivalent to AR due to the visual access to other actors and the physical environment available when using mobile devices to view VR content. Hence, the nature of handheld VR makes it challenging to distinguish between mobile VR and mobile AR. M&#252;ller et al. 
conducted a comparison of handheld VR and handheld AR in co-located and remote collaborative setups, finding no statistically significant differences between the two conditions in terms of task performance, preference, and workload <ref type="bibr">[46]</ref>. The study did find individual differences; half of the participants demonstrated a preference for VR, while the remaining half expressed a preference for AR.</p><p>Compared to the long-standing and extensive history of literature in collaborative mobile AR <ref type="bibr">[18,</ref><ref type="bibr">54,</ref><ref type="bibr">62]</ref>, we have a limited understanding of the counterpart in handheld VR <ref type="bibr">[46]</ref>. This gap suggests the existence of an underexplored space for larger-scale symmetric collaboration in VR using mobile devices. The ubiquity and adaptability of mobile phones and tablets facilitates access for many groups who are already familiar with the devices, as well as for those who cannot or would prefer not to wear an HWD; for example, individuals might be discouraged from wearing an HWD due to concerns about their health <ref type="bibr">[9,</ref><ref type="bibr">33]</ref> or personal appearance <ref type="bibr">[34,</ref><ref type="bibr">57]</ref>. While symmetric collaboration among HWD users outperformed asymmetric (HWD/mobile) and mobile-only conditions in object manipulation tasks, there was no significant difference in social presence <ref type="bibr">[23]</ref>. This suggests that in contexts in which social engagement is central to the user experience (e.g., entertainment, learning, games), handheld collaborative VR may be an effective option. In our work, we explore the potential of handheld VR as an inclusive and social alternative to VR HWD-based collaboration.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Using Mobile Devices for Asymmetric Collaboration in VR</head><p>Many previous works have used mobile devices as an alternative to make VR accessible by allowing non-HWD users to view VR through these devices. The typical approach is to enable asymmetrical collaboration among HWD and non-HWD users, meaning that users use two distinct types of devices. ShareVR is one such example where a non-HWD user uses a mobile device that acts as a second viewport to the virtual environment (VE) <ref type="bibr">[24]</ref>. The VE is also projected onto the floor from a top-down perspective, offering non-HWD users an extended physical representation of the VE. In VR Invite, Freiwald et al. developed a similar system to include older adults who may experience severe nausea when wearing VR HWDs <ref type="bibr">[20]</ref>. TransceiVR took a different approach, allowing HWD users and non-HWD users to collaborate in co-located conditions. The system mirrored HWD users' perspectives and gave non-HWD users the ability to navigate their view temporally-that is, to rewind their perspectives <ref type="bibr">[68]</ref>. WebTransceiVR extended TransceiVR by enabling online participants to see local users' views, essentially acting as cameramen <ref type="bibr">[36]</ref>. Similarly, XRDirector used mobile devices to control camera movement in VR-based movie production, where VR HWD actors animate 3D characters <ref type="bibr">[47]</ref>. Drey et al. conducted an evaluation of asymmetric collaboration in the context of learning, revealing that the use of mobile devices can lead to an increased cognitive load for teachers, yet also result in comparable learning outcomes <ref type="bibr">[16]</ref>. FaceDisplay took a unique approach to sharing an HWD user's view with non-HWD users by mounting touch screens on the surface of a VR HWD. 
The displays on the surface of the VR HWD allowed people around the HWD user to see what the user saw and interact with the user <ref type="bibr">[25]</ref>. Asymmetrical collaboration has also been applied to numerous commercial games, enhancing cooperative multiplayer experiences with unique asymmetric roles <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref>. While these approaches vary, all of these works share the goal of making VR more inclusive of users who are unwilling or unable to wear HWDs. Our work aims to explore symmetric co-located collaboration using mobile devices exclusively, where users' roles shift from being secondary viewers to primary actors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">MOMIS: SYSTEM OVERVIEW</head><p>To explore handheld MR with support for multiple users, we created MOMIS, which stands for Mobile-based MR for Informal Settings: a virtual environment (VE) platform that uses motion-tracked, six-degree-of-freedom (6DOF) mobile devices. The platform is designed to allow people to interact with a VE through a mobile device similarly to HWD users. A mobile user can look into the VE by using a mobile device that acts as a "window" into the virtual world, similarly to previous works <ref type="bibr">[20,</ref><ref type="bibr">24]</ref>. With this window, the mobile user can see the VE from the perspective that corresponds to their own position and orientation in the physical world. In this section, we present the technical and design details of the system. The platform was developed and improved over multiple iterations; the choices made may be useful for future researchers and practitioners creating handheld mixed reality systems. Figure <ref type="figure">1</ref> shows what the system looks like when multiple users are using the system at the same time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Platform Specification</head><p>3.1.1 Hardware Setup. In this section, we introduce the hardware setup we used to enable 6DOF motion-tracked tablets. A VR-ready desktop computer runs SteamVR <ref type="bibr">[64]</ref> and updates the positions of VR HWDs (if used) and tablet-mounted VIVE <ref type="bibr">[13]</ref> trackers. Although we did not include VR HWD users in our study, asymmetric interaction is also feasible in MOMIS, as in previous works <ref type="bibr">[20,</ref><ref type="bibr">24]</ref>. The size of the virtual-physical environment in MOMIS is limited by the specifications of VIVE base stations, which support areas up to 10 meters by 10 meters <ref type="bibr">[13]</ref>. We placed the VIVE base stations two meters above the ground to achieve better angles for tracking, using dedicated tripods. To track the exact positions of the tablets, a VIVE tracker is mounted on top of each device. While we initially attached the VIVE trackers to the backs of tablets using Velcro tape, during the pilot study, we noticed that tablets could occlude the trackers' markers depending on how users oriented their tablets, especially when users oriented their tablets to interact with virtual objects at ground level. We therefore moved the tracker to the top of the tablet using a commodity tablet mount to minimize occlusions by mobile devices or other users, especially given the number of simultaneous users. We fabricated wooden attachments using laser cutters to mount each tracker to a tablet, as seen in Figure <ref type="figure">2</ref> (right). The VIVE base stations updated each VIVE tracker's position and orientation in real time.</p><p>The computer used to run the system was the Dell Alienware Aurora R12 <ref type="bibr">[6]</ref> equipped with an 11th-generation Intel Core i7 11700F CPU running at 2.5 GHz and an NVIDIA GeForce RTX 3080 with 16 GB of memory. 
It concurrently ran the system at 90 frames per second and updated SteamVR at the same frequency. We chose Samsung Galaxy A7 Lite (8.7", 32 GB) tablets for their light weight and computing power; after testing a few different tablet devices, we found that holding a tablet for an extended period of time can be physically exhausting.</p><p>3.1.2 Software Setup. The VIVE tracker mounted on a tablet dictates the position and orientation of the tablet user's viewing perspective in the VE. The position and orientation of the tracker relative to the center of the tablet were calculated and coded to correctly orient the tablet user's perspective. As users move while holding tablets, their positions in the VE are updated accordingly. To interact with others inside the VE, a user's position relative to others in the physical environment should match their virtual position. Therefore, to avoid desynchronizing users' virtual and physical locations relative to other users, locomotion methods that do not correspond to physical movements (e.g., teleportation) were not used. Because the VE necessarily corresponds to the physical environment where the system is installed, the virtual objects that users can interact with must be placed within the tracked and physically accessible area in the room. In MOMIS, we used a virtual space measuring 6 meters by 6 meters to match the scale of the development lab space and the scale of the user study space.</p><p>We developed a standalone mobile app in Unity <ref type="bibr">[66]</ref>. On launch, the application presents users with two consecutive configuration screens: one where they can specify their tracker numbers, which are written on the physical devices (Figure <ref type="figure">4</ref>-Right), and one to choose an interaction type (Figure <ref type="figure">4</ref>-Left). 
After the brief configuration process, users can enter the VE and interact with the main experience.</p><p>To handle networking, MOMIS uses Photon Unity Networking (PUN) <ref type="bibr">[51]</ref> to transmit information to the VR-ready computer via Wi-Fi to synchronize users and interactions inside the shared VE, and to update each tracker's position and orientation. The Photon server is instantiated by the host user (a research team member) on the VR-ready computer. The host user runs SteamVR, which concurrently updates each tracker's position and orientation and shares them with all devices, allowing mobile device users to see the VE from where they stand. Interactions are shared through remote procedure calls using PUN. Users participating in the experience are visualized as tablet-shaped boxes so that users can see each other's locations through the screen as well.</p></div>
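<div xmlns="http://www.tei-c.org/ns/1.0"><p>The tracker-to-tablet calibration described in the software setup can be illustrated with a minimal sketch. It is written in Python for readability and is not the Unity implementation used in MOMIS. The offset values, the function names, and the axis convention (y pointing up, rotations as 3 x 3 matrices) are all assumptions made for illustration.</p>
<eg><![CDATA[
```python
import math


def rot_y(deg):
    """Rotation matrix for a yaw of `deg` degrees about the vertical (y) axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Hypothetical calibration: the tablet centre sits 0.12 m below the tracker
# along the tracker's local y axis, with no rotational offset.
TRACKER_TO_TABLET_POS = [0.0, -0.12, 0.0]
TRACKER_TO_TABLET_ROT = rot_y(0.0)


def tablet_pose(tracker_pos, tracker_rot):
    """Derive the tablet-centre pose (the virtual camera) from a tracker pose."""
    # Express the fixed local offset in world coordinates, then translate.
    offset_world = mat_vec(tracker_rot, TRACKER_TO_TABLET_POS)
    pos = [p + o for p, o in zip(tracker_pos, offset_world)]
    # Compose the tracker orientation with the fixed rotational offset.
    rot = mat_mul(tracker_rot, TRACKER_TO_TABLET_ROT)
    return pos, rot
```
]]></eg>
<p>Because the offset is fixed by the physical mount, a calibration of this kind only needs to be measured once per mount design; the per-frame work is a single rigid-body composition per tracker.</p></div>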
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">MR Content: Competitive Multiplayer Game Experience</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Game Design.</head><p>We considered the following design goals in creating MR content for the MOMIS platform:</p><p>&#8226; Interaction must be intuitive; a novice user should be able to instantly learn how to use the system. &#8226; The MR content must involve multiplayer components, such as collaboration or competition among users. &#8226; The content needs to accommodate varying numbers of participants to allow us to investigate the effects of group size. &#8226; The MR content needs to encourage users to dynamically and actively move within the virtual-physical space.</p><p>The principles specified above consider usability, practical needs in running a user study, the multiuser nature of the system, and the research questions we aim to answer. We sought to develop multiuser interactive content that is simple, yet engaging enough to create meaningful experiences through which to answer our questions. With these principles in mind, we decided to make a simple, team-based game with differing interaction methods. It is worth noting that MOMIS is a platform, and the MR content presented in this section is just an instrument for our study. MOMIS could also be used for other types of content in other contexts.</p><p>In the game, players form two teams (red and blue) and compete against each other to accumulate points by interacting with virtual objects; more specifically, by changing the colors of virtual balloons. When the game begins, all the balloons are white. When a player interacts with a balloon, its color changes to match the player's team color. Teams earn one point each time a balloon changes to the player's team color. For example, if a player on the red team interacts with a white balloon, the white balloon will turn red, and the red team will score one point. Even after a team has claimed a balloon, the opposing team can still reclaim the balloon and change its color again. 
For example, the blue team can still earn a point by turning a red balloon blue. The game is thus continuous and competitive, as balloons can change colors repeatedly within the two-minute time limit. The experience requires users to move inside the virtual-physical world while interacting inside the VE. While the game is simple and straightforward, its outcome can depend on how well players take others into account: other players' locations, their anticipated behaviors, and collision avoidance for safety. This cognitive effort required of players is relevant to many theoretical constructs in collaborative system design (e.g., workspace awareness <ref type="bibr">[26]</ref>, spatiality <ref type="bibr">[49]</ref>, nonverbal communication <ref type="bibr">[11]</ref>, territoriality <ref type="bibr">[61]</ref>, and feedthrough <ref type="bibr">[15]</ref>).</p><p>Other players inside the experience are represented by avatars with matching team colors. The avatars resemble floating flat boxes, patterned after the tablets and their orientations (note the red, tablet-shaped box in Figure <ref type="figure">5</ref>-(Right)). A small cube on the back of the box represents the position of the tablet's user. More realistic avatars with legs, arms, torsos, or heads would need to display additional information accurately to avoid interfering with users' VE experiences. We did not attempt to visualize a full-body avatar because the assumed locations of players' limbs inside the VE (in the absence of tracking data) may not be accurate; this may be an interesting topic for future research.</p></div>
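<div xmlns="http://www.tei-c.org/ns/1.0"><p>The scoring rule described above can be summarized in a short sketch. It is an illustrative Python model rather than the game's Unity code, and the class and method names are hypothetical. It also assumes that interacting with a balloon that already shows one's own team color changes nothing and scores nothing, which follows from the rule that points are earned only when a balloon changes color.</p>
<eg><![CDATA[
```python
WHITE, RED, BLUE = "white", "red", "blue"


class BalloonGame:
    """Minimal model of the balloon-claiming rules (names are hypothetical)."""

    def __init__(self, n_balloons):
        # All balloons start white and neither team has any points.
        self.balloons = [WHITE] * n_balloons
        self.scores = {RED: 0, BLUE: 0}

    def interact(self, team, index):
        """Apply one Poke or Shoot hit; return True if the balloon changed color."""
        if self.balloons[index] == team:
            # Assumption: re-hitting a balloon of one's own color has no effect.
            return False
        # White or opposing-color balloon: recolor it and award one point.
        self.balloons[index] = team
        self.scores[team] += 1
        return True
```
]]></eg>
<p>Under these rules the total score is not bounded by the number of balloons, since a contested balloon can yield points to both teams repeatedly until the time limit expires.</p></div>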
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2">Interaction Methods: Poke vs. Shoot.</head><p>One of the research questions that we set out to answer is how the extent to which users dynamically move affects the user experience: social presence, engagement, workload, and even safety issues (e.g., physical contact). To address these considerations, we developed two methods of interacting with virtual objects, Poke and Shoot, which differ in the required action and distance. For Poke, users directly make contact with a virtual object using their devices, as seen in Figure <ref type="figure">3</ref>-Left. This method uses a simple collision detection algorithm. For Shoot, tapping the screen shoots a ball-shaped projectile or "bullet" capable of traveling up to two meters from the center of the tablet in the VE, as seen in Figure <ref type="figure">3</ref>-Right.</p><p>The trajectory of the projectile in the Shoot interaction is a straight line orthogonal to the tablet plane at the time the user touches the screen. The projectile's velocity was chosen to reflect a typical walking speed, such that players cannot save time by shooting balloons from a distance instead of walking to them. Initially, we tested ray casting as a third interaction method, analogous to the Poke method but using an infinitely long rod attached to the back of the player's device in the virtual space. However, we realized that this made the game dynamics too chaotic; a player could potentially stand in one spot and continuously spin to find balloons, which would make the player dizzy. The speed and distance constraints in Shoot allowed users to interact with virtual objects at a short distance, but required them to adjust to the projectile's trajectory. In both interaction types, a successful interaction (i.e., one that changes a balloon's color) triggered a beep as an audio cue. 
In the case of Shoot, firing a projectile triggered a beep audio cue as well.</p><p>During the pilot study, we confirmed that the differences between Poke and Shoot affected how participants moved in the physical environment (see the supplementary videos). In the case of Poke, navigation and interaction require tablets (and, by extension, their users) to move more or less constantly. Meanwhile, Shoot encourages users to locate target balloons, then stand still to aim, fire, and confirm the color change, resulting in less movement overall. These two interaction types allowed us to investigate the effects of people moving dynamically in handheld MR.</p></div>
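<div xmlns="http://www.tei-c.org/ns/1.0"><p>The geometry of the two interaction methods can be sketched as follows. This is an illustrative Python approximation, not the collision code used in MOMIS. The two-meter range and the straight-line path along the tablet normal come from the description above; the speed value of 1.4 m/s, the spherical balloon model, the tablet reach radius, and all function names are assumptions, and the projectile's own radius is ignored.</p>
<eg><![CDATA[
```python
import math

MAX_RANGE_M = 2.0            # maximum travel distance stated in the text
PROJECTILE_SPEED_MPS = 1.4   # assumption: a commonly cited walking speed


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def projectile_position(origin, normal, t):
    """Position t seconds after firing along the tablet normal, capped at range."""
    d = min(PROJECTILE_SPEED_MPS * t, MAX_RANGE_M)
    n = normalize(normal)
    return [o + d * c for o, c in zip(origin, n)]


def shoot_hits(origin, normal, balloon_center, balloon_radius):
    """True if the straight path (at most 2 m long) passes through the balloon."""
    n = normalize(normal)
    to_balloon = [b - o for b, o in zip(balloon_center, origin)]
    # Distance along the path to the point closest to the balloon centre,
    # clamped to the segment the projectile can actually travel.
    along = sum(a * b for a, b in zip(to_balloon, n))
    along = max(0.0, min(along, MAX_RANGE_M))
    closest = [o + along * c for o, c in zip(origin, n)]
    return math.dist(closest, balloon_center) <= balloon_radius


def poke_hits(tablet_center, balloon_center, balloon_radius, tablet_reach=0.1):
    """Sphere-overlap test standing in for the 'simple collision detection'."""
    return math.dist(tablet_center, balloon_center) <= balloon_radius + tablet_reach
```
]]></eg>
<p>With the assumed speed, a projectile needs roughly 1.4 seconds to cover the full two meters, which is consistent with the stated intent that shooting from a distance should not be faster than walking to the balloon.</p></div>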
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3">User Interface.</head><p>The tablets run an Android application that renders a Unity scene. Apart from displaying the VE, the application also displays three UI modules: lobby configuration screens, a scoreboard, and a team color indicator. Before users join a game, they must choose either the Poke or Shoot interaction type (Figure <ref type="figure">4</ref>-Left). After choosing an interaction type, users link their trackers to the system using the tracker's ID number, as shown in Figure <ref type="figure">4</ref>-Right.</p><p>Upon entering the experience, users see the VE and can explore it with their tablets, as seen in Figure <ref type="figure">5</ref>-(Left). The scoreboard and team color indicators appear at the edge of the screen in translucent colors to minimize the occlusion of the VE. Thus, users can see which team they are on and what score each team has achieved.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">USER STUDY</head><p>To explore the affordances and limitations of co-located, multiuser, handheld MR experiences, we designed the following user study. For reference, our research questions are presented once more below.</p><p>&#8226; RQ1: How does the group size in a competitive game affect users' experiences in handheld MR?</p><p>&#8226; RQ2: How do different types of interaction methods (proximity-based vs. pointing-based) affect users' experiences in handheld MR?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Study Design</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1">Independent Variables.</head><p>To explore the effects of different experience conditions on the platform's affordances and user experience, we designed a 2 &#215; 3 within-subjects study with two independent variables: Interaction Type (the method of interacting with virtual objects) and Group Size (how many participants were playing the game at once). Each participant repeated the game six times, once in each of the six possible conditions. The two interaction type conditions were Poke (proximity-based interaction) and Shoot (pointing-based interaction), as outlined in &#167; 3.2.2. The three group size conditions were 1:1 (small), 2:2 (medium), and 4:4 (large). We scaled the group sizes by doubling, as opposed to linearly increasing the size, to explore a bigger scale without forcing participants to undergo an excessive number of trials. We did not include device type as a factor (VR HWDs vs. mobile devices) because VR HWDs are not viable options for certain populations, such as children (due to age restrictions) and individuals who experience motion sickness. Previous research has already compared VR headsets with mobile devices in a one-on-one setting, demonstrating a higher level of presence <ref type="bibr">[24]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2">Dependent Variables.</head><p>To measure participants' game experiences with different interaction types and group sizes, we wrote a post-game questionnaire asking about perceived presence, social presence, engagement, and workload for each experience condition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226; [Presence]</head><p>We used a subset (three items) of Slater, Usoh, and Steed's presence questionnaire (SUS) <ref type="bibr">[63]</ref> on (1) the sense of "being there" in the virtual environment, (2) the extent of perceiving the virtual environment as the dominant reality, and (3) the extent of perceiving the virtual environment as visiting a "place" rather than viewing images. We calculated a presence score by counting the number of 6 or 7 responses (on a seven-point scale) to the three questions, resulting in a number in the range [0, 3], as suggested by the original authors <ref type="bibr">[63]</ref>.</p><p>&#8226; [Social Presence and Engagement] We used the social presence module and the in-game module for social presence and engagement, respectively, from the Game Experience Questionnaire <ref type="bibr">[31]</ref>. Among the three components within the social presence module, we used only six questions (all on a five-point scale) from the behavioral involvement component <ref type="bibr">[24]</ref>. For engagement, we used two questions per component to measure each of the six components, yielding a total of twelve questions organized into six subscales: Competence, Flow, Positive Affect, Negative Affect, Tension, and Challenge. We discarded the Sensory and Imaginative Immersion components, as our goal was not to measure the quality of the game's plot or its visual appeal; indeed, the game had no storytelling component at all.</p><p>&#8226; [Perceived Workload] We used NASA-TLX <ref type="bibr">[27]</ref> to measure perceived workload in terms of Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration, with values in the range [0, 100].</p><p>&#8226; [Perceived Connectivity] We investigated whether the increased group size and dynamic interaction could negatively impact perceived connectivity for various reasons; for instance, the tracking program on the desktop computer may slow down as the group size increases, or the tablet-mounted VIVE trackers may be occluded more in bigger groups or when players move more actively. We asked participants if they observed a loss of connection on their own devices and on others' devices on a scale from 0 (Never) to 4 (Always).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226; [Preference]</head><p>We asked participants to rank the six conditions in terms of preference, from Rank 1 (most enjoyable) to Rank 6 (least enjoyable).</p><p>We observed participants during the experiment to see if there were any potential collision risks.</p><p>Participants also left open-ended feedback on what components made the experiences enjoyable or frustrating, and how these experiences compared with their previous VR experiences.</p></div>
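<div xmlns="http://www.tei-c.org/ns/1.0"><p>The presence score described above (the number of 6 or 7 responses across the three SUS items) can be sketched as follows. The function name and the input validation are illustrative assumptions, not part of the questionnaire or of the study's analysis scripts.</p><p><![CDATA[
```python
def sus_presence_score(responses, threshold=6):
    """Count how many of the three 7-point SUS items were rated 6 or 7.

    responses -- the three item ratings for one participant in one condition
    Returns an integer in the range [0, 3].
    """
    if len(responses) != 3:
        raise ValueError("expected exactly three item responses")
    if any(r not in range(1, 8) for r in responses):
        raise ValueError("responses must be integers from 1 to 7")
    # A response counts toward the score only if it is "high" (6 or 7).
    return sum(1 for r in responses if r >= threshold)
```
]]></p><p>For example, ratings of 7, 6, and 5 yield a score of 2, whereas ratings of 5, 5, and 5 yield a score of 0 even though their mean is above the scale midpoint. The count therefore captures only strong agreement.</p></div>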
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Procedure</head><p>Participants were recruited in groups of eight per session, as the large group size condition required four players on each of the two teams. Participants were briefed on the purpose and content of the experiment and confirmed their willingness to participate. Then, participants signed a consent form approved by the university's Institutional Review Board and were asked to complete a demographic questionnaire. Participants were assigned tracker IDs in the order of their arrival. If participants came in pairs, they were assigned consecutive numbers so that they could play one-on-one games with each other. Before beginning the experiment, participants were trained to use both interaction methods (Poke and Shoot) until they felt comfortable with them. Participants then experienced six different conditions with a mixture of two interaction methods and three group sizes: (a </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Participants</head><p>We recruited 40 participants through online advertisements in university communities. Two participants, who arrived late, could not participate in all six sessions, so their data were excluded from the following analysis to maintain the within-subjects nature of the study. Those two participants were replaced by members of the research team in the sessions that took place before their arrival, so all sessions nonetheless had the correct participant counts. The remaining 38 participants (26 male, 12 female) had an average age of 25.79 (std. dev. = 5.01). Among the 38 participants, 27 had prior experience with VR, and 26 had prior experience with multiplayer games. Each participant received a $10 electronic gift card as compensation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Data Analysis</head><p>This section describes how we analyzed the quantitative and qualitative data. In the quantitative study, scores for Social Presence, Engagement, and Workload were analyzed using a 2 &#215; 3 within-subjects experiment design (for interaction type and group size). We performed Shapiro-Wilk tests to determine if the data were normally distributed for all of the dependent variables. The data did not appear to be normally distributed, so we applied the nonparametric aligned rank transform (ART) process <ref type="bibr">[71]</ref> and then ran a repeated-measures ANOVA for each dependent variable on the transformed values, using paired-samples t tests for post hoc analysis after a significant ANOVA. We used a Bonferroni correction to control for the multiple comparisons in this analysis. For example, a p-value of less than &#120572; = .05/3 = .017 would be considered significant in a post hoc analysis of group size (1:1, 2:2, 4:4). In the specific case of presence, using the SUS presence questionnaire, we followed the procedure suggested by the authors to count the number of "6" and "7" responses to the three 7-point scale items. Therefore, the presence score was a value in the range [0, 3] <ref type="bibr">[63]</ref>. We used this score to perform a repeated-measures ANOVA on the ART-transformed values. We checked all the interactions between the two factors; none was statistically significant for any metric, so we do not present the interaction term in the result tables. Instead, we present only the average and standard deviation per level of the two factors (Table <ref type="table">2</ref>) and the statistical test results (Table <ref type="table">1</ref>).</p><p>For the qualitative portion of the study, participants were asked to provide comments in the form of responses to open-ended questions. 
We used two questions for this purpose: (a) "What components made you enjoy the experience and what made you feel frustrated?" and (b) "If you had any previous VR experience, how would you compare this experience (e.g. VR headsets)?" We ran a thematic analysis and identified the most relevant topics and quotes to complement our quantitative results. Additionally, the researchers running the study conducted an informal observational study; we describe the most relevant observations in a separate section.</p></div>
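<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Bonferroni threshold quoted above (.05/3 = .017 for the three pairwise group size comparisons) follows from dividing the family-wise significance level by the number of comparisons. A minimal sketch of that arithmetic is given below; the function names are hypothetical and the code is not taken from the study's analysis scripts.</p><p><![CDATA[
```python
from itertools import combinations


def bonferroni_threshold(levels, family_alpha=0.05):
    """Return (per-comparison alpha, list of pairwise comparisons).

    levels       -- the levels of one factor, e.g. the three group sizes
    family_alpha -- the family-wise significance level to be preserved
    """
    pairs = list(combinations(levels, 2))
    if not pairs:
        raise ValueError("need at least two levels to compare")
    # Bonferroni: split the family-wise alpha evenly across comparisons.
    return family_alpha / len(pairs), pairs


def is_significant(p_value, threshold):
    """A post hoc comparison is significant if p falls below the threshold."""
    return p_value < threshold
```
]]></p><p>For the three group sizes (1:1, 2:2, 4:4) there are three pairwise comparisons, so the per-comparison threshold is about .017. For a factor with only two levels, such as Interaction Type, there is a single comparison and the threshold stays at .05.</p></div>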
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">RESULTS</head><p>In this section, we outline our results in terms of each dependent variable: Presence, Social Presence, Engagement, Workload, Perceived Connectivity, and Preference. We also present qualitative results from our own observations and the responses to open-ended items from the exit survey. The results of significance tests for each combination of a dependent and independent variable are summarized in Table <ref type="table">1</ref>. Averages and standard deviations are provided in Table <ref type="table">2</ref>, in the Appendix.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Presence</head><p>No factors had significant effects on Presence. Interaction Type showed no significant effect. The results for group size (&#119865; (2, 74) = 2.90, &#119901; = 0.061) and for the interaction between group size and interaction type (&#119865; (2, 74) = 2.57, &#119901; = 0.084) did not reach significance either, although they suggest that further research may be warranted.</p><p>In our open-ended feedback, the majority of participants did not make any comments about immersiveness or presence in comparison to VR HWDs. Some participants (8/38) who had used VR HWDs noted that VR HWDs are "more immersive" (P4, P10, P14) than mobile devices. A few participants (3/38) reported that they felt a high level of presence, stating that they were immersed in the game, or that the experience was comparable to (or better than) using VR HWDs.</p><p>&#8226; (P1) "The games were very engaging. I almost forgot that I was in VR."</p><p>&#8226; (P22) "I thought this was just as interactive as using a VR headset, which I was surprised about."</p><p>&#8226; (P29) "Most fun I've had in a VR game."</p><p>While it seems that only a few participants considered MOMIS impressive compared to alternative systems, the participants discovered other benefits in handheld MR, which we discuss later.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Social Presence</head><p>We found significant differences in social presence by interaction type. The result for Poke (&#119872; = 2.97, &#119878;&#119863; = 0.82) was significantly higher than that for Shoot (&#119872; = 2.66, &#119878;&#119863; = 0.93, &#119865; (1, 37) = 17.86, &#119901; &lt; .001). The result indicates that using Poke may be more effective in fostering the sense of being with others. We could not find evidence that group size affected participants' perceptions of social presence. Social presence in this context is not directly comparable with social presence in games or other VEs. In games or VEs, other users are visualized as avatars or game characters. In the case of MOMIS, users see each other in the co-located space, and the VEs do not have full-body avatars as in other VR content. Therefore, even when they have to focus on on-screen content, the participants should have been able to maintain awareness of other participants through their peripheral vision, if not by looking directly at them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Engagement</head><p>The effect of group size on the Challenge component ("I felt challenged, I had to put a lot of effort into it") was statistically significant (&#119865; (2, 74) = 2.65, &#119901; &lt; 0.001). The post hoc analysis indicated that the 1:1 group size setup (&#119872; = 2.84, &#119878;&#119863; = 0.90) created the perception of a greater challenge than did the 4:4 setup (&#119872; = 2.31, &#119878;&#119863; = 0.94, &#119865; (1, 74) = 18.46, &#119901; &lt; .001). No other pairs exhibited statistically significant differences. We anticipated that having more people could influence Engagement-Challenge in either direction; people may feel more challenged when there are too many people in the room, or less challenged when the game is collaborative. This result indicates that group size can modulate the extent to which users find gameplay challenging.</p><p>Group size also had effects on Engagement-Positive Affect and Engagement-Negative Affect. For the positive affect component of the GEQ in-game module ("I felt content, I felt good"), group size had a statistically significant effect (&#119865; (2, 74) = 7.40, &#119901; &lt; .01). The post hoc analysis revealed that people felt more positive in the 1:1 setup (&#119872; = 2.97, &#119878;&#119863; = 0.78) than in the 2:2 setup (&#119872; = 2.71, &#119878;&#119863; = 0.80, &#119865; (1, 74) = 8.80, &#119901; &lt; .01) and the 4:4 setup (&#119872; = 2.68, &#119878;&#119863; = 0.76, &#119865; (1, 74) = 13.01, &#119901; &lt; .001). For the negative affect component of the GEQ in-game module ("I felt bored, I found it tiresome"), group size again had a statistically significant effect (&#119865; (2, 74) = 4.57, &#119901; &lt; .05). 
The post hoc analysis revealed that people felt more negative in the 2:2 setup (&#119872; = 0.88, &#119878;&#119863; = 0.85) than in the 4:4 setup (&#119872; = 0.67, &#119878;&#119863; = 0.84, &#119865; (1, 74) = 8.59, &#119901; = .0045).</p><p>Meanwhile, Interaction Type had a significant effect on Engagement-Tension. The Shoot interaction type (&#119872; = 0.88, &#119878;&#119863; = 0.98) produced more tension ("I felt frustrated, I felt irritable") than the Poke interaction type (&#119872; = 0.72, &#119878;&#119863; = 0.88). However, the average score was relatively low (below 1, which corresponds to "slightly"). We did not identify any other significant effect on Engagement.</p><p>In the participants' qualitative feedback, we found that most participants described being engaged with the game and the study, for different reasons. In the open-ended feedback, we noticed that many participants (15/38) loved the experience of playing with other people, although some (13/38) did not enjoy the experience for the same reason.</p><p>&#8226; (P27) "Multiplayer activity and actual physical movements were enjoyable."</p><p>&#8226; (P34) "Way fun! Would love to play again! Enjoyed the ease of the way to gain points."</p><p>We discuss how widely people's preferences vary depending on the group size in greater detail in &#167; 5.6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Workload</head><p>Workload, as measured by NASA-TLX, allows us to understand how the two factors (interaction type and group size) affect the workload that participants experienced on six dimensions (mental demand, physical demand, temporal demand, performance, effort, and frustration). Below, we report the dimensions for which we identified statistically significant effects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.1">Perceived effort is alleviated in a bigger group.</head><p>We found significant differences in the effort scores of the different group sizes (&#119865; (2, 74) = 5.15, &#119901; &lt; .01). The average score for the 1:1 group size (&#119872; = 72.75, &#119878;&#119863; = 21.40) was significantly higher than that for the 4:4 group size (&#119872; = 65.79, &#119878;&#119863; = 20.48, &#119865; (1, 74) = 10.13, &#119901; &lt; .01). The result is consistent with the result for Engagement-Challenge reported in the previous section. We did not identify any significant differences in other dimensions by group size.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.2">The design of the Shoot interaction may be mentally demanding and frustrating.</head><p>Meanwhile, the interaction type produced different types of workloads. First, we found the Shoot interaction type (&#119872; = 56.39, &#119878;&#119863; = 22.95) to be significantly higher in terms of mental demand (Mental Demand: "How mentally demanding was the task?") than the Poke interaction type (&#119872; = 50.63, &#119878;&#119863; = 23.67, &#119865; (1, 37) = 10.71, &#119901; &lt; .01). In addition, we found that participants rated the frustration level (Workload-Frustration: "How insecure, discouraged, irritated, stressed, and annoyed were you?") of the Shoot interaction type (&#119872; = 40.10, &#119878;&#119863; = 25.06) as significantly higher than that of the Poke interaction type (&#119872; = 32.58, &#119878;&#119863; = 20.69, &#119865; (1, 37) = 18.73, &#119901; &lt; .001). However, the overall frustration score was lower than approximately 40 in all conditions. In the qualitative results, we found that this might have been related to the nature of the device being a viewing device and a pointing device simultaneously. While some people loved the Shoot interaction mode as is (P37: "It is really intense. I liked the shoot part the most"), some people (10/38) found the projectile's trajectory difficult to understand.</p><p>&#8226; (P14) "I felt the shooting mechanics were a bit off that made it a little frustrating."</p><p>&#8226; (P23) "I enjoyed it but it was kind of tiring holding the tablet especially during the shoot games because I had to hold and aim."</p><p>Because participants had to aim correctly by holding the tablet at an angle orthogonal to the target, we determined that a pointing interaction ought to have visual aids, such as crosshairs or ray casting. Some other participants (5/38) pointed out how slow the projectile was. 
The slow projectile speed might have amplified the frustration of the Shoot interaction, since aiming was already challenging on its own. Our design choices were based on the anticipation that near-instant speed and visual aids could decrease the amount of movement needed to interact with virtual objects. However, this type of design can increase the mental workload in such an experience.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.3">The Poke interaction may be physically and temporally demanding and frustrating.</head><p>Poke, in turn, had an impact on the perceived workloads of physical and temporal demand. On the NASA-TLX, users rated the Poke interaction type (&#119872; = 65.29, &#119878;&#119863; = 24.34) as significantly more physically demanding (Physical Demand: "How physically demanding was the task?") than the Shoot interaction type (&#119872; = 53.26, &#119878;&#119863; = 24.95, &#119865; (1, 37) = 20.55, &#119901; &lt; .001). Similarly, in the temporal demand dimension of the NASA-TLX (Temporal Demand: "How hurried or rushed was the pace of the task?"), we found the Poke interaction type (&#119872; = 71.30, &#119878;&#119863; = 23.62) rated significantly higher than the Shoot interaction type (&#119872; = 65.16, &#119878;&#119863; = 24.27, &#119865; (1, 37) = 8.17, &#119901; &lt; .01).</p><p>It seems that people understood the game to be physically and temporally demanding by nature. No participants complained that playing in the Poke interaction type was specifically demanding. In fact, a few participants specifically mentioned a preference for the Poke interaction mode.</p><p>&#8226; (P11) "Poke was fun, more than shooting. It was frustrating not being able to shoot fast as I wish."</p><p>&#8226; (P29) "I found poke the most fun as I had to move my tablet around more in the VR space and as such found it more immersive."</p><p>&#8226; (P21) "I really enjoyed just running around and playing competitive games."</p><p>P21's comment offered a good summary of the affordances that MOMIS can provide for novel types of VR content. We also include a supplementary video in which readers can see how dynamically the participants walk and move during the game in Poke mode (see Supplementary Material).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5">Perceived Connectivity</head><p>We asked the participants about perceived connectivity to see if the group size and the interaction type had an impact on the performance of the system. Perceived connectivity did not differ significantly across conditions, either by group size or by interaction type. On the item that states "I observed a loss of connection on my device (e.g., screen freeze, delay): 0 (Never) to 4 (Always)", the median response was 1 (&#119872; = 0.94, &#119878;&#119863; = 1.01). This result indicates that a majority of users rarely perceived significant delays or disconnections, and we could not find any evidence that connectivity would be degraded by a particular interaction type or group size.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.6">Preference</head><p>We asked all the participants to rank the six conditions in order from most enjoyable to least enjoyable. The two most preferred settings (those most frequently assigned Rank 1) were 1:1 Poke and 4:4 Shoot, which were ranked first by 11 out of 38 and 10 out of 38 participants, respectively. The distribution of ranks for each condition is shown in Figure <ref type="figure">8</ref> (left). The horizontal bar chart shows how many times each condition was ranked within the top 3 by shifting all the bars to match the borderline between Rank 3 and Rank 4 across the conditions. Figure <ref type="figure">8</ref> (right) further shows how each individual chose Rank 1 and Rank 6 conditions using a heatmap. For example, a total of 10 participants picked 4:4 Shoot as their first choice, and 6 of those 10 (the darkest cell in the matrix) picked 1:1 Poke as their least preferred condition. We noticed that those who preferred a smaller scale tended to pick the 4:4 setup as their least preferred condition, and vice versa. We discuss this trend further in the Discussion section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.7">Observation and Notable Findings</head><p>The authors continuously observed the participants' behavior during the study, as well as after the study (while reviewing the recording). We outline notable findings from our observations, as well as open-ended comments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.7.1">The participants played dynamically but safely.</head><p>We observed that some participants moved dynamically during the game. We believe that this was possible because they had visual access to both the virtual world and the physical world. To gain an edge in competition, a few participants, especially while using the Poke interaction type, would run (see the supplementary video). However, while there were a few close calls, we saw no physical contact between participants. It seems that the participants understood the level of awareness needed to navigate the virtual world in relation to the physical world. One of the unique advantages of MOMIS is that users can have awareness in both the VE and the physical world, in contrast to HWD-based VR, where users are immersed in the VE to the exclusion of the physical world. P35's comment offers a good summary of this benefit.</p><p>&#8226; (P35) "This is different because of the concrete spatial aspect, where you and others are in the same physical space. It required an increased awareness of actual surroundings while still immersed in the VR world."</p><p>Being able to move dynamically in the game was commonly pointed out as a unique component of MOMIS. The following are examples of responses to the item "How would you compare this experience with your previous VR experience of using VR headsets?" that exemplify how participants perceived MOMIS differently from VR HWDs.</p><p>&#8226; (P4) "I think that headsets are more immersive simply because visually you can see nothing but the environment you are thrown into. However, for group activities I think this approach is better because with headsets it would be too clunky and I think participants would be more injury prone"</p><p>&#8226; (P7) "I would say that VR headsets would make me feel unsafe while playing the game since I [was] afraid of hitting the wall or someone else while can't seeing others around me. 
[...]"</p><p>&#8226; (P27) "I think VR headsets are more engaging. But for this type of activity, it can be risky because you cannot see other people."</p><p>Their awareness of both environments also affected their behaviors. Some virtual objects were close to the perimeter of the physical environment. When poking these objects, we observed participants treating their devices' views as secondary to the physical world, using their physical spatial memory to avoid collisions with the walls. This required them to break their immersion in the application temporarily.</p><p>While they were able to move quickly, no one complained about nausea or motion sickness except one participant (P4, see below). This was another difference that some participants pointed out in the open-ended question.</p><p>&#8226; (P1) "I usually don't like wearing a VR headset because of its weight. However, the tablet was so light that I almost forgot I was holding it. I also liked that I could use the wide space without motion sickness."</p><p>&#8226; (P18) "Downside of VR is to make me feel dizzy but headset makes me dizzier but it doesn't."</p><p>&#8226; (P4) "I felt motion sickness due to the lagging. Not tall-people-friendly enough."</p><p>Overall, the result suggests that MOMIS can be an inclusive VR option to accommodate people who experience nausea.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.7.2">The limited field of view and weight.</head><p>A few participants (3/38) reported that the limited field of view is a significant downside that might have impacted immersiveness for them. P36's comment below summarizes this point.</p><p>&#8226; (P36) "Seemed like a smaller scale version of a VR headset experience, and was fun and immersive despite that, just not quite as immersive as real VR games"</p><p>There were a few opinions on the weight of the tablet. While one participant reported that the tablets were "so light" (P1), other participants reported that the tablet was "a bit heavy to hold [with] single hand" (P9) and "[made] my arms feel tired sometimes." (P7). Given these mixed reports and the short duration of the experience, we do not have evidence to conclude that a tablet can be considered a lightweight option.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">DISCUSSION</head><p>Our work examines the impact of group size and interaction type on the design and implementation of multiuser, co-located MR experiences. We explored the system's design affordances using mobile devices with variance in terms of group sizes and interaction methods. Particularly, we documented our design choices through the iterative development process, including environment, avatars, virtual objects, and audiovisual feedback. The user experience was evaluated with multiple relevant metrics useful for MR, groupware, and games: Presence, Social Presence, Engagement, and Workload. Our prototype revealed variance in some of these metrics depending on the group size and interaction type. What follows is a brief summary of our findings.</p><p>&#8226; Increasing group size alleviated perceived challenge, mental demand, and perceived effort needed, but reduced positive affect.</p><p>&#8226; Proximity-based interaction can facilitate social presence compared to pointing-based interaction.</p><p>&#8226; Proximity-based interaction was found to be more physically and temporally demanding than pointing-based interaction.</p><p>&#8226; Pointing-based interaction was found to be more mentally demanding and to cause more frustration.</p><p>&#8226; Participants were divided on their most preferred setups, with the two most popular setups being one that allowed for the most active movement (Poke, 1:1) and another that needed the least active movement (Shoot, 4:4).</p><p>We present and discuss the results organized by the system contribution and two research questions: Group Size (RQ1) and Interaction Type (RQ2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">(RQ1) Varying Group Size Can Modulate Challenge and Users' Affect.</head><p>The group size significantly affected a subscale of GEQ, Challenge ("I felt challenged, I had to put a lot of effort into it"). 
In particular, the participants felt that playing games in the 1:1 setting was more challenging than in a 4:4 setup. We had a similar result from the Workload-Effort measurement as well; participants felt they had to put more effort into the 1:1 group setup compared to the 4:4 group. Being in a team makes it a group effort, which might have contributed to distributing efforts and challenges to the team and alleviating individual pressure. The diffusion of responsibility was shown to reduce cognitive effort and facilitate communication with those who shared responsibilities <ref type="bibr">[50,</ref><ref type="bibr">69]</ref>. In addition, given that the area of play and the number of balloons were controlled (held constant) across conditions, the perceived individual territory in larger-group settings may reduce the perceived cognitive load <ref type="bibr">[61]</ref>.</p><p>In the meantime, Positive Affect (GEQ) was significantly higher for 1:1 settings than 2:2 and 4:4 settings, suggesting a more positive gaming experience from playing 1:1 games compared to other setups. Challenges and Positive Affect may be correlated to some extent, especially in the context of game design, as pointed out by game researchers: what manifests through challenges embedded in playing games, such as failures, can encapsulate positive and negative emotion <ref type="bibr">[32,</ref><ref type="bibr">35]</ref>. This trend was also discovered in sports <ref type="bibr">[69]</ref>; enjoyment decreased as team size increased. Another work showed that in a gamified learning setting, larger group sizes reduce students' engagement, as well as their effort and motivation, as individual contributions become less recognizable in larger groups <ref type="bibr">[5]</ref>. Overall, the result suggests that increasing group size can diminish perceived challenge, but can also reduce the positive affect that gameplay with more individual agency elicits. 
Group size had no significant effect on Presence, Social Presence, and other subscales of Workload and Engagement.</p><p>Accounting for individual differences, we observed a bimodal distribution of people in terms of their most preferred setup (Rank 1) regarding group size: a sizable group of users (15/38) ranked a 4:4 game setup (Poke: 5, Shoot: 10) as their most preferred setup, whereas a similar number (14/38) ranked a 1:1 game setup (Poke: 11, Shoot: 3) as their first choice. Naturally, fewer participants preferred the 2:2 setup the most (9/38). Interestingly, a majority of those who ranked 1:1 setups as their first choice ranked 4:4 setups as their least desired (11/14) (see Figure <ref type="figure">??</ref>). Similarly, most of those who ranked 4:4 setups as their first choice ranked 1:1 setups as least desired (10/15). While we initially speculated about a "the-more-the-merrier" effect appearing in multiuser mixed reality, our study reveals that individuals may have different preferences regarding group size.</p><p>More research is needed to study the relationship between the number of players and gamers' experience and individual differences, as pointed out in previous works <ref type="bibr">[28,</ref><ref type="bibr">70]</ref>. One potential avenue for exploration in this regard is individuals' personality traits and skill levels (competence); research has found that physical capability and personality scores were able to predict the favorite game mode from among cooperative and competitive modes with 89.3% accuracy <ref type="bibr">[48]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">(RQ2) Interaction types affect social presence and types of workload in a game</head><p>The goal of RQ2 was to understand how the extent to which the participants dynamically move affects the user experience. We varied participants' movement by incorporating two interaction methods: Poke (proximity-based interaction) and Shoot (pointing-based interaction). Based on our observations, the two interaction types encouraged participants to play the game differently.</p><p>We discovered participants felt more social presence (GEQ) when they played with the Poke interaction type than with the Shoot interaction type. If our study had been conducted in HWD-based VR, we could have interpreted this sensation as participants feeling that they were together with others in the VE <ref type="bibr">[12]</ref>. However, we believe this effect came from having visibility and audibility in a co-located space <ref type="bibr">[11,</ref><ref type="bibr">21]</ref>. We extend the literature that discusses the design space for physical proximity and contact <ref type="bibr">[10]</ref> with the information that interaction design that encourages physical proximity and contact can promote social presence among other users in handheld MR.</p><p>In addition, we found that each interaction method can influence different types of workload. Our results indicate that participants felt more time pressure and more physical fatigue using the Poke interaction. This result again is aligned with our observation that participants constantly navigated the space in the Poke condition. At the same time, it can serve as a manipulation check as far as the Poke interaction method's effectiveness in nudging the participants to move more dynamically relative to the other condition. Another component that may contribute to the physical demand is the need to constantly carry a mobile device. 
This result is also consistent with previous work; non-HWD users who preferred large displays (e.g., TVs) over mobile devices reported the convenience of virtual movement <ref type="bibr">[20,</ref><ref type="bibr">24]</ref>. Therefore, it is important to understand the extent to which designers would like to make a mobile VR experience physical and apply appropriate interaction methods depending on the target population, space, target experience duration, and context.</p><p>Lastly, using the Shoot interaction type increased perceived mental demand and tension/frustration. Aiming at a target in the Shoot method certainly involves a perception and cognition task, and this may be more demanding than moving a tablet toward a target. The result that Shoot elicited less social presence than Poke while imposing greater mental demands is aligned with previous works that found that social presence was negatively correlated with mental demand <ref type="bibr">[8,</ref><ref type="bibr">67]</ref>. In addition, previous works noted that pointing-based interactions could be inconspicuous, less efficiently facilitating awareness of what a user's current action is <ref type="bibr">[10]</ref>. Still, a considerable number of people (17/38) picked a condition with the Shoot interaction type as their most preferred setup.</p><p>Interestingly, the two most popular setups were two extreme options: 1:1/Poke (&#119899; = 11) and 4:4/Shoot (&#119899; = 10), the former being physical and strenuous, and the latter being cooperative and slow-paced. In light of the results we presented in the previous section, those who preferred 4:4/Shoot might have preferred the Shoot method because it is slow-paced and preferred the 4:4 setup because the setup alleviates perceived challenges and efforts. In contrast, those who ranked 1:1/Poke first might enjoy the challenge and the effort demanded by the game under proximity-based interaction and the spatial freedom they have in a 1:1 setting. 
Previous works also showed a divide in users' preferences between cooperative and competitive modes for game-based rehabilitation, which could be accounted for by age and personality <ref type="bibr">[48]</ref>. This finding again suggests the bimodal nature of people's preferences, which can be a challenge in designing mixed-reality content that is inclusive.</p><p>Therefore, understanding the nature of the target population and the user experience that the designers want in collaborative VR will influence the choice of interaction methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Applications of Larger-Group, handheld VR</head><p>Our study expands the literature of asymmetric collaborative VR, which mostly supports a dyad of an HWD user and a non-HWD user <ref type="bibr">[24,</ref><ref type="bibr">25,</ref><ref type="bibr">68]</ref> by developing MOMIS, a handheld MR platform, and running an experimental study that involves larger groups of non-HWD users (up to 10 people in our case) in a safe and physically active MR experience. Not only does our study show that it is possible to develop engaging, fun, and physically active MR experiences using handheld technologies in co-located environments, but we also account for the effect of group sizes and interaction types on various aspects of the user experience. We believe that our study's findings will also apply to AR settings, as the difference between AR and VR is negligible in task performance <ref type="bibr">[46]</ref>; rather, which setting to choose from among AR and VR seemingly depends on the nature of the virtual content and the physical environment (e.g., cluttered space vs. open space).</p><p>Our study reinforces the growing literature on making the MR experience social with co-located, multiuser VR experiences and can motivate new types of MR content that involve broader ranges of groups. Handheld MR can be useful for contexts in which users must be aware of both virtual and physical environments, including other actors and physical objects. In this work, we explored the multiplayer game context, which has already been frequently studied relative to multiuser, handheld MR <ref type="bibr">[10,</ref><ref type="bibr">14,</ref><ref type="bibr">30,</ref><ref type="bibr">37,</ref><ref type="bibr">56]</ref>. 
Our findings can be applied to other contexts where gamification is applied, such as gamified learning environments, which typically include competitive components <ref type="bibr">[40]</ref>.</p><p>As we reviewed in 6.1 and 6.2, the findings from this study are consistent with previous works that did not involve games and competition. We argue that this is because there exists a common structure in handheld MR: multiple users interacting with shared virtual objects. For example, this structure can create a competition-like relationship when multiple users are trying to interact with the same object. In this setting, proximity-based interaction will be more conspicuous than pointing-based interaction, promoting greater workspace awareness <ref type="bibr">[26]</ref>. In addition, if the group size is larger in a collaborative setting, the diffusion of labor from a larger group size can alleviate users' perceived individual workloads, effectively lowering the mental barrier for novices.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">DESIGN RECOMMENDATIONS FOR HANDHELD MR</head><p>The study results provide useful information to designers and developers (7.1, 7.2, 7.3). Furthermore, we document design insights that we gained from our iterative development process (7.4, 7.5). We present these as design recommendations below for those who aim to design multiuser, handheld MR experiences that avoid physical contact.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">Leverage Users' Awareness of Both Virtual and Physical Worlds in Handheld MR</head><p>When designing an environment in a handheld MR setting, it is important to leverage users' awareness of both the virtual and physical worlds. Such awareness can effectively facilitate interactive user experiences that involve large groups of people moving dynamically. Users may not necessarily need full-body avatars of other users, as they can see each other beyond the mobile device screen.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Choose Interaction Methods Considering the Context and Goals</head><p>Based on our findings, UX designers can choose interaction methods in handheld VR settings to shape the user experience. On one end, they can create a competitive and physically demanding setting using proximity-based interaction methods, inviting intensive movement. Designers should also consider potential exertion associated with physically active interaction methods. On the other end, they can design less physically demanding pointing-based interactions. In addition, if the goal of the UX includes facilitating social presence, employing proximity-based interaction is more desirable, as it encourages users to monitor both the VE and their physical surroundings. In contrast, pointing-based interaction can be subtle, reducing workspace awareness <ref type="bibr">[10]</ref>. Therefore, UX designers can consider the context (space, nature of content, target duration, etc.) and the goal of an application in choosing which interaction methods to use.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.3">Understand the Tension of Group-based Interaction</head><p>Group size in the range under 10 (from 1:1 to 4:4) has varying impacts on engagement and workload in handheld MR. In general, running VR content with a larger group can alleviate the perceived challenges and workloads that come from competitive aspects of games or gamified UX, but there is more potential to elicit positive affect in a smaller group. In addition, participants' preferences can be divisive; there will be a group of people who prefer competitive, dynamic, and individual settings, while another group will prefer a cooperative, slow-paced environment <ref type="bibr">[48]</ref>. One can take advantage of asymmetric roles, offering both dynamic and stationary roles in handheld VR to accommodate users with distinctive preferences <ref type="bibr">[56]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.4">Use Naturally Spatialized Auditory Feedback for Room-scale Interaction</head><p>While this was not part of the central research question, many users found it useful to have audio feedback when interacting with objects. Typically, a VR HWD sound system is for the user only, not others. In our design, we observed that providing sharp audio cues when interacting not only helped users identify whether they had succeeded in accomplishing a task, but also raised awareness of where other people were through sounds coming from their positions. Therefore, designers can include auditory displays to compensate for limited immersion and facilitate social presence and situational awareness.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.5">Account for Performance Differences among Mobile Devices</head><p>Another practical insight we gained from the development of MOMIS is that the VR content developed for high-powered, VR-ready computers may not be rendered reliably on relatively low-spec tablets. During development, depending on the age and model of the mobile device in use, the mobile application sometimes ran slowly, with considerable delays in the environment and a reduced frame rate. As a result, the research team made a mobile-only VE with fewer moving parts, fewer computationally intensive animations, a low-polygon environment, and few script updates. Thanks to this design, most people did not experience slowdowns on the tablet devices. Rendering content models with different fidelity and the same layout should be considered for translating VR content made for VR HWDs or supporting asymmetric interaction between VR HWDs and mobile devices.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">LIMITATIONS</head><p>Our findings in this study may primarily apply to the specific context in which we developed our content: games that incorporate competitive and cooperative elements, or which otherwise have some characteristics in common (e.g., gamified, group-based learning). The proposed environment created a fast-paced and exciting user experience, allowing us to explore perhaps the most dynamic scenarios in which multiple users can consume an interactive virtual environment and interact with each other. However, different contexts with varying collaborative natures (e.g., work contexts) may have other factors influencing the results, including verbal communication and shared artifacts central to collaboration. Therefore, the study's findings will help us understand the design space of handheld mixed reality (MR), where people roam to experience VR content.</p><p>One component we did not consider was inclusivity as part of our experiment design and evaluation criteria (e.g., gender differences, including children, and diversifying group formation). While we believe that MOMIS, a multiuser, handheld MR, will be more effective than alternatives in including diverse types of users due to the device type used, we did not consider this factor in recruitment and metrics that we measured. For example, the studied participants were mostly young male adults (26 males out of 38, 18-39 age range with an average age of 25.8), meaning people outside this demographic group, such as children, teenagers, females, and middle-aged or older adults, were not properly represented. We believe that the best way to evaluate the inclusivity of the system is to conduct a field study, which we plan to carry out. 
We will discuss our plan to deploy the system in informal learning settings in &#167;9.</p><p>This paper lacks an in-depth exploration of user behaviors and does not reveal the internal mechanisms and psychological processes of users' behavioral changes in response to the group size. This limitation is inherent in the study's quantitative design, which focused on evaluating the system using standard VR and gaming metrics. To address this weakness, we intend to conduct an observation-based field study with follow-up interviews as part of our future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">CONCLUSION AND FUTURE WORK</head><p>Our work explores the experience of handheld mixed reality (MR) in multiuser competitive games. We developed MOMIS to conduct the experiment, which allows users to use mobile devices as windows into a virtual world to adapt to mixed reality, and which facilitates the use of social MR environments by including those who cannot wear head-mounted displays. This paper investigates the effect of group size (2, 4, and 8) and interaction methods (proximity-based and pointing-based) on the user experience. We found that proximity-based interactions positively impacted social presence and physical/temporal workload, while participants felt less challenged and motivated in larger group settings. Furthermore, individuals had varying preferences for group size and interaction type. This study informs user experience design in handheld mixed reality. It contributes to our understanding of the handheld MR design space regarding group size and interaction scenarios.</p><p>Our system offers the possibility of integrating non-HWD users into virtual environments (VEs) on a larger scale than just two users. It provides a platform for such users to actively participate, rather than being excluded. We plan to use MOMIS in informal learning settings to see if it can be applied to children's STEM learning materials. Informal learning settings like science museums are ideal places for us to see the inclusive and social benefits of MOMIS as children and their family members visit to enjoy collective learning experiences. We are currently developing STEM materials that will be displayed in MOMIS in collaboration with a local science museum (redacted for anonymity) to exhibit the work.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Proc. ACM Hum.-Comput. Interact., Vol. 8, No. CSCW1, Article 197. Publication date: April 2024.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1"><p>In the context of this paper, we categorize early mobile phone-based stereoscopic HWD devices (e.g., Google Cardboard, which has been discontinued) <ref type="bibr">[7]</ref> as a type of VR HWD. Mobile VR (or handheld VR) here refers to cases where a monoscopic view is displayed on handheld mobile devices. In the literature, the term "hand-held displays" (HHD) is often used to refer to mobile AR/VR [18, 22, 62].</p></note>
		</body>
		</text>
</TEI>
