<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Watch Out! Modelling Pedestrians with Egocentric Distractions</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>10/16/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10223008</idno>
					<idno type="doi">10.1145/3424636.3426910</idno>
					<title level='j'>ACM SIGGRAPH Motion Interaction and Games</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Melissa Kremer</author><author>Brandon Haworth</author><author>Mubbasir Kapadia</author><author>Petros Faloutsos</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The use of mobile devices is one of the most commonly observed families of distracted behaviours exhibited by pedestrians in urban environments. We develop an event-driven behaviour tree model for distracted pedestrians that includes initiating mobile device use as well as terminating or pausing mobile device use based on internal or external cues to refocus attention. We present a simple, probabilistic attention model for such pedestrians. The proposed model is not meant to be complete. It primarily focuses on computing the probability that a distracted agent looks up, based on the agent's individual characteristics and the elements in their environment. We condition the potentially attention-grabbing elements in the environment on distraction-specific egocentric fields for visual attention. We also propose an oriented ellipse model for capturing the effects of cognitively fuzzy goals during distracted navigation. Our model is simple and intuitively parameterized, and thus can be easily edited and extended.
CCS CONCEPTS: Computing methodologies → Agent / discrete models; Procedural animation.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>The use of mobile devices, such as cell phones, while walking is a commonly observed behaviour among pedestrians in urban settings. Whether texting, talking, or using other mobile apps, the interaction between a pedestrian and a mobile device is often considered a distraction, in the sense that the ability of the pedestrian to perceive and react to the environment is reduced. It is known that using a mobile device affects the distracted pedestrian's movement as well as focus, and can lead to serious accidents, such as tripping or walking into the path of a vehicle. At the same time, it affects nearby pedestrians, who may be forced to slow down or alter their trajectory. It is therefore timely and important for computer games and urban simulations to model and account for distracted pedestrians.</p><p>This paper builds on previous work <ref type="bibr">[Kremer et al. 2020]</ref> and presents a way of thinking about the problem of distraction modelling in synthetic crowds and a corresponding framework to build distracted agents. There are three key issues that one must address to incorporate distractions in a simulated environment: (a) when pedestrians decide to use their mobile devices, (b) how they behave when they are distracted, and (c) when they refocus their attention to their environment. Each one of these issues can be quite complex and is dependent on many variables, as they relate to cognition, vision, and attention. For instance, whether a pedestrian will answer or ignore a ringing cell phone depends on the pedestrian's personality, age, location, whether they have company, and many other circumstances. We propose a simple, editable, and extensible framework that one can use to author distracted pedestrians, and which offers common-sense starting points and intuitive parameters. 
Most of the important elements of this framework are modelled with conditional probabilities which can be set by the user based on intuition or experimentation, or derived from data and observations. In this paper, we exemplify the proposed framework by focusing on a few common distracted behaviours related to the use of mobile devices while walking: (a) taking a phone call, (b) reading, and (c) texting.</p><p>At the heart of our model is an agent-centric representation and abstraction of an agent's surrounding area, using egocentric fields that encode or condition probabilities that an agent's attention may be attracted by a specific area in his or her environment. Egocentric fields are a very effective and versatile way of encoding multiple layers of dynamic or static information centered on the agents or elements of the environment. A wide range of intuitive and powerful operations on these fields, such as filtering and convolution, can be used to combine and process them to support specific decision processes. An interesting example of the effectiveness of egocentric fields in crowd simulation is <ref type="bibr">[Kapadia et al. 2009]</ref>.</p><p>Our contributions can be summarized as follows:</p><p>&#8226; We blend the use of parametric and event-driven interactive behaviour trees to model state interactions between pedestrians and their devices. &#8226; We ground the steering model parameters which are affected by different forms of distractions in the literature on human behaviour during distraction. &#8226; We model navigation error (lateral deviation) identified in distraction studies by representing an agent's goal as a parameterized elliptical region. &#8226; We model visual attention using Gaussian models in polar coordinates, and use them for both the steering and the attention refocusing models. 
&#8226; We model the conditions for refocusing attention based on intrinsic parameters and extrinsic parameters conditioned on the visual attention model.</p><p>It is important to clarify that we do not propose a new steering method, but rather an egocentric behavioral layer that affects the steering abilities and parameters of an agent, and which can be combined with most of the standard steering approaches. In simple terms, this layer decides when and for how long a pedestrian temporarily acts more as a moving obstacle rather than a steering agent. The effect of the distracting behaviors on the sensory and locomotion abilities of the pedestrian is informed by the related literature which is based on actual observations, and thus reflects ground truth. The proposed intuitive parameterization can be easily adjusted as more studies become available in the future, or to fit a specific context.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">RELATED WORKS</head><p>In this section we first present a short overview of crowd simulation techniques, and then we review the relevant literature in attention modelling and distracted pedestrians studies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Crowd Simulation</head><p>A wide array of techniques has been used in crowd simulation. Many of these are outlined in <ref type="bibr">[Thalmann and Musse 2012]</ref>. The Boids model proposed by Reynolds <ref type="bibr">[Reynolds 1987]</ref> calculated velocities based on simple rules to perform agent navigation and obstacle avoidance, and was capable of demonstrating flock cohesion behaviour. More recently, anticipatory and synthetic vision based models have been proposed <ref type="bibr">[Best et al. 2016;</ref><ref type="bibr">Ond&#345;ej et al. 2010;</ref><ref type="bibr">Paris et al. 2007;</ref><ref type="bibr">Van Den Berg et al. 2011]</ref>. Hybrid models have also been developed <ref type="bibr">[Snape et al. 2011]</ref>. Predictive force-based models use current trajectories to compute future collision-free paths for some time window. One such model is the Predictive Avoidance Model (PAM), which uses a piecewise predictive force to anticipate future collisions well in advance so that the effort needed to make avoidance maneuvers can be minimized <ref type="bibr">[Karamouzas et al. 2009]</ref>. Another method used a probabilistic model to improve the flexibility of predictions and the number of factors taken into account <ref type="bibr">[Wolinski et al. 2016]</ref>. There is a large body of work on behavioural modelling for crowds that focuses on deliberate and complex behaviours, such as those seen in a train station <ref type="bibr">[Shao and Terzopoulos 2005]</ref>. Most of this work is outside the scope of this paper, since we focus on the low-level effects of egocentric and individual distractions.</p><p>In general, most of the prior work in crowd simulation focuses on normative agents with perfect visual abilities, and does not consider the effects of egocentric distractions, such as mobile devices, on the individual agent.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Gaze and Attention Modelling</head><p>A wide variety of approaches and techniques have been proposed for modelling vision and gaze in simulations, as examined in <ref type="bibr">[Bruce et al. 2015]</ref>. This is indicative of how complicated the human vision system is and the difficulty of modelling it. VR-based studies have recently been shown to provide valuable insights into how people use gaze to navigate their environment <ref type="bibr">[Berton et al. 2019;</ref><ref type="bibr">Berton et al. 2020;</ref><ref type="bibr">Huang and Terzopoulos 2020;</ref><ref type="bibr">Lynch et al. 2018]</ref>. One study showed that gaze is typically directed primarily towards neighbours that have the highest collision risk, based on the distance to closest approach and the time to closest approach <ref type="bibr">[Meerhoff et al. 2018]</ref>. A similar study tracked users' eye gaze without a VR setup, using a monitor and a joystick instead <ref type="bibr">[Berton et al. 2018]</ref>.</p><p>Visual attention without explicit gaze behaviour has been modelled to some degree within crowd simulations <ref type="bibr">[Kuffner and Latombe 1999;</ref><ref type="bibr">Peters and O'Sullivan 2002]</ref>. Methods have been proposed that use environmental stimuli to trigger gaze redirection for characters based on a set of rules <ref type="bibr">[Hill 1999;</ref><ref type="bibr">Khullar and Badler 2001]</ref>.</p><p>Conversely, environment-centric saliency maps have been utilized to draw agent gaze to the most noticeable objects <ref type="bibr">[Itti et al. 1998]</ref>. In other works, gaze has been modelled in face-to-face scenarios for more personal situations <ref type="bibr">[Bailly et al. 2010]</ref>. Gaze has also been modelled in interactive environments with real users <ref type="bibr">[Kokkinara et al. 2011;</ref><ref type="bibr">Narang et al. 2016]</ref>. 
One approach <ref type="bibr">[Grillon and Thalmann 2009]</ref> attempted to simulate gaze in crowds for aesthetics, although it did not alter the trajectories of agents or investigate the effects of distraction on crowds. We are not aware of prior work in this area that models visual attention for distracted behaviours within our context.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Distracted Pedestrian Analysis</head><p>Previous studies on distraction in the real world have shown various interesting effects of walking while distracted. For example, secondary activities can be visual, motor, or cognitive distractions, or a combination of these. The type of distracting activity largely determines its task cost, with cognitive activities being the most distracting, followed by motor and then visual <ref type="bibr">[Tian et al. 2018]</ref>. Another study suggests that gait is affected by cognitive and visual demand but not gross motor demand, although it did not examine or draw conclusions about the effects of fine motor demand such as swiping and tapping finger movements <ref type="bibr">[Prupetkaew et al. 2019]</ref>. A study on driving while talking on the phone showed that participants talking on the phone recalled as little as half as much visual information as they did when not talking on the phone, and that hands-free phone calls were just as distracting as holding a cellphone while talking <ref type="bibr">[Strayer et al. 2011]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">METHODOLOGY</head><p>In this section, we discuss the details of our mobile device-centric distraction model. This includes models for initiating mobile device use, the impact of distraction on steering, and reacting to internal and external cues to refocus. At the heart of our model are event-driven interactive behaviour trees which drive both agent distractions and mobile device events <ref type="bibr">[Kapadia et al. 2015a,b]</ref>. That is, in this model, the objects of distraction (mobile devices) are separate autonomous agents which produce events that can be consumed at any time during simulation by pedestrians in possession of these devices. In addition to this behavioural level of modelling, we propose a ground-projected visual attention model. This model ties distraction types to changes in perception at the level of steering, such as changes in the Field of View (FOV) or neighbour distance. This model also serves as the basis for the proposed refocus approach, in which agents can become attentive again because of their local surroundings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Distractions</head><p>We implement three types of distraction related to mobile devices, which are prevalent in everyday modern life, particularly in urban settings. These are taking a phone call, reading, and texting on a mobile device while navigating.</p><p>At a high level, these behaviours are managed and triggered through two types of behaviour trees. A mobile device agent has a parametric behaviour tree which drives the use of the device. This behaviour tree triggers events based on distraction-specific probabilities set by the animator. These are $p_c$, $p_r$, and $p_t$ for phone call, reading, and texting respectively, such that $p_c + p_r + p_t + p_n = 1$.</p><p>Here $p_n = 1 - (p_c + p_r + p_t)$ is simply the probability that no mobile device interaction will begin during a time-step. At each simulation time-step $t \in T$, a trial in a multinomial distribution is performed, such that the number of trials is $n = T f_s$, where $f_s$ is the frequency of simulation updates. In this way, an animator may choose values for $p_c$, $p_r$, and $p_t$ such that a pedestrian agent will likely become distracted over some period. Once a particular distraction trial succeeds, a distraction-specific event is triggered.</p><p>The distraction event is consumed by the pedestrian agent's event-driven interactive behaviour tree. This causes a number of distraction-specific changes in the pedestrian agent's animation, steering, and visual attention model. In our proposed approach, animation is driven by a typical Inverse Kinematics (IK) approach with procedural end-effector targets that include noise and variation among the pedestrian agents to improve naturalness and heterogeneity of large scenarios. The impact on steering and attention is explained in the following sections.</p></div>
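<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-time-step multinomial trial described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the string labels for the outcomes, and the helper that estimates the chance of becoming distracted within a time window are assumptions introduced here.</p>
<ab><![CDATA[
```python
import random

# Hypothetical outcome labels; the text names only the probabilities
# p_c (call), p_r (reading), p_t (texting) and p_n (no interaction).
OUTCOMES = ("call", "read", "text", None)


def distraction_trial(p_c, p_r, p_t, rng):
    """Run one multinomial trial for a single simulation time-step.

    Returns "call", "read", "text", or None when no device use begins.
    """
    p_n = 1.0 - (p_c + p_r + p_t)
    if min(p_c, p_r, p_t) < 0.0 or p_n < 0.0:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    return rng.choices(OUTCOMES, weights=(p_c, p_r, p_t, p_n), k=1)[0]


def prob_distracted_within(p_c, p_r, p_t, duration_s, f_s):
    """Probability that at least one of the n = T * f_s trials succeeds."""
    n = int(duration_s * f_s)
    p_n = 1.0 - (p_c + p_r + p_t)
    return 1.0 - p_n ** n
```
]]></ab>
<p>An animator could use the second helper to pick per-step probabilities that make a distraction likely over a chosen period, since the trials run once per simulation update.</p></div>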
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Steering</head><p>The proposed approach includes several distraction-specific changes to steering that are based on the literature on human navigation during distractions. The primary parameters related to steering are preferred speed, waypoint undershoot/overshoot, waypoint lateral deviation, field of view (FOV), and neighbour distance. The FOV and neighbour distance parameters are detailed in Section 3.3 as they relate to the visual attention model. In our implementation we use PAM <ref type="bibr">[Karamouzas et al. 2009]</ref> as the underlying steering model. However, any steering model may be used as long as it uses the aforementioned parameters and a local goal waypoint as its current target.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Figure <ref type="figure">1</ref>: A bird's-eye view illustration of the proposed fuzzy goal model in a simple environment. We encode the possibility of lateral deviation and overshoot/undershoot as the lengths of the semi-minor $e_l$ and semi-major $e_o$ axes respectively. This forms an oriented ellipse around the current waypoint goal $g_w$ within which the new fuzzy goal $g_f$ is randomly chosen.</p><p>The preferred speed $s_p$ for a normative (not distracted) pedestrian agent can vary depending on the context of the scenario. For illustration purposes we use a default value of 1.3 m/s, which reflects the average human walking speed. The preferred walking speed during distraction may be a fraction of the agent's normative walking speed depending on the specifics of the distraction model. For the implementation of mobile device-based distractions in the proposed approach, we derive values from a review of the literature on distractions. For a pedestrian agent on a phone call, the preferred speed is set to 1.066 m/s (82% of the normative walking speed) <ref type="bibr">[Prupetkaew et al. 2019]</ref>. For a pedestrian agent reading, the preferred speed is set to 1.144 m/s (88%) <ref type="bibr">[Niederer et al. 2018;</ref><ref type="bibr">Schabrun et al. 2014]</ref>. For a distracted agent texting, the preferred speed is set to 1.04 m/s (80%) <ref type="bibr">[Agostini et al. 2015;</ref><ref type="bibr">Cha et al. 2015;</ref><ref type="bibr">Haga et al. 2015;</ref><ref type="bibr">Licence et al. 2015;</ref><ref type="bibr">Pizzamiglio et al. 2017;</ref><ref type="bibr">Plummer et al. 2015;</ref><ref type="bibr">Prupetkaew et al. 2019;</ref><ref type="bibr">Schabrun et al. 2014;</ref><ref type="bibr">Yu et al. 2015]</ref>. 
Further research may be needed to determine distributions of values for these parameters as they relate to age, sex, culture, etc.</p><p>During reading or texting behaviours, where vision is primarily focused downward and the visual horizon is short, navigating pedestrians must rely on a mental model of their local waypoint. To capture this fuzzy mental model we propose a parametric ellipse model of the local waypoint. This model encapsulates both lateral deviation and overshoot/undershoot caused by an inaccurate cognition of local waypoints. Lateral deviation means that someone who is distracted begins to deviate left or right from a straight path to their local waypoint, and it has been extensively studied in the related literature <ref type="bibr">[Jeon et al. 2016;</ref><ref type="bibr">Lamberg and Muratori 2012;</ref><ref type="bibr">Schabrun et al. 2014]</ref>. Overshoot/undershoot is an observed real-world behaviour of pedestrians who are so distracted that they forget to turn where they should or proceed past their destination. Both of these behaviours are modelled by sampling the proposed oriented parametric ellipse, as seen in Figure <ref type="figure">1</ref>. The semi-major axis, $e_o$, and the semi-minor axis, $e_l$, govern the degree of undershoot/overshoot and lateral deviation respectively. In practice, the semi-major and semi-minor axes of the ellipse are determined by the distance of the agent to their actual waypoint, $g_w$, when the fuzzy waypoint $g_f$ is chosen. However, an animator may specify a maximum length of the semi-major and semi-minor axes of the ellipse per agent, and at runtime the model will attempt to keep the ratio between them intact. To achieve this, the semi-minor axis is set to 10% of the distance between the agent and their current waypoint, up to the specified maximum. 
The semi-major axis length is set to 10% of this distance multiplied by the major-minor axis ratio derived from the user-specified maximums of the ellipse axes. We use the 10% value in our experiments because it corresponds to the average largest lateral deviation amount reported in the literature <ref type="bibr">[Jeon et al. 2016;</ref><ref type="bibr">Lamberg and Muratori 2012;</ref><ref type="bibr">Schabrun et al. 2014]</ref>. This adaptive approach also ensures that the generated fuzzy waypoint is never behind the agent. The ellipse representation is then sampled for a random point within its bounds, which we refer to as the fuzzy waypoint, $g_f$. Once a potential fuzzy waypoint is chosen, it is tested and adapted for reachability against the underlying environment model (navigation mesh, grid, obstacles, etc.). If an agent refocuses, i.e. is no longer distracted, their local waypoint returns to their original waypoint. If an agent reaches a fuzzy waypoint while distracted, the true waypoint is updated as normal. If the agent is still distracted, a new fuzzy waypoint is then computed for them, and so on.</p></div>
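<div xmlns="http://www.tei-c.org/ns/1.0"><p>One possible reading of the fuzzy waypoint construction is sketched below. Several details are assumptions made for illustration and are not stated in the text: the semi-major axis is aligned with the agent-to-waypoint direction, the ellipse is sampled uniformly, the semi-major axis is clamped together with the semi-minor axis so that the ratio is preserved, and the reachability test against the environment model is omitted. All function and parameter names are hypothetical.</p>
<ab><![CDATA[
```python
import math
import random


def fuzzy_waypoint(agent, waypoint, max_major, max_minor, rng, fraction=0.10):
    """Sample a fuzzy goal g_f inside an oriented ellipse centred on g_w.

    agent, waypoint: (x, y) positions on the ground plane.
    max_major, max_minor: animator-specified maximum semi-axis lengths.
    fraction: share of the agent-waypoint distance used for the semi-minor axis.
    """
    ax, ay = agent
    wx, wy = waypoint
    dx, dy = wx - ax, wy - ay
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return waypoint

    # Semi-minor axis e_l: 10% of the distance, up to the specified maximum.
    e_l = min(fraction * dist, max_minor)
    # Semi-major axis e_o: scaled so the animator's major/minor ratio is kept.
    e_o = e_l * (max_major / max_minor)

    # Unit vector towards the waypoint (assumed major-axis direction) and its normal.
    ux, uy = dx / dist, dy / dist
    nx, ny = -uy, ux

    # Uniform sample in the unit disk, stretched to the ellipse.
    radius = math.sqrt(rng.random())
    angle = 2.0 * math.pi * rng.random()
    along = e_o * radius * math.cos(angle)   # overshoot / undershoot
    across = e_l * radius * math.sin(angle)  # lateral deviation

    return (wx + along * ux + across * nx, wy + along * uy + across * ny)
```
]]></ab>
<p>In a full system the returned point would still need to be checked, and if necessary adapted, against the navigation mesh or obstacle representation before being handed to the steering model.</p></div>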
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Visual Attention</head><p>We propose a vision model for normative and distracted pedestrians that is designed to interface with both the underlying steering model and the proposed model for refocusing attention. Visual attention is a complex process which incorporates intrinsic and extrinsic factors into visual perception <ref type="bibr">[Tsotsos 2001]</ref>. To overcome this complexity, and for the sake of usability and performance, we make use of the fact that human visual perception clarity is concentrated in the centre of the visual field and becomes less focused towards the edges of the visual field. Additionally, people tend to focus more attention on visual cues that are central to their visual field and nearer to themselves <ref type="bibr">[Ikeda and Takeuchi 1975;</ref><ref type="bibr">Leibowitz and Appelle 1969]</ref>. In our model, we collapse the three-dimensional field of view formed by binocular vision into a flat model and consider horizontal vision, similar to how steering models perceive the environment. The proposed model represents the visual field in polar coordinates over a $200^\circ$ sweep. Within this visual field we represent the distribution of attention as two Gaussians, one half Gaussian in the $r$ dimension and one full Gaussian in the $\theta$ dimension of the polar field. These represent the fall-off of attention with respect to distance and the attenuation of attention with respect to angle (i.e. peripheral vision) respectively. The attenuation with respect to distance is defined as $\mathcal{N}(0, \sigma_r)$, where $r = 0$ is the centre of the head. The attenuation of attention with respect to angle is defined as $\mathcal{N}(0, \sigma_\theta)$, where $\theta = 0$ is aligned with the forward vector of the agent's head. In this way, the visual attention model for each distraction has two parameters $(\sigma_r, \sigma_\theta)$. 
These functions are convolved together to produce a single distribution of attention over the polar visual field. The relative values of the resultant field can then be transformed and stored in a single channel of a texture for fast lookup. In this paper, we normalize individual values in the field textures and sample single points of the field, using the nearest point on the object's (agent or obstacle) underlying collision representation. In this way, each sample of an individual pixel in the texture can be modelled by a Bernoulli distribution conditioned on its location in the field, $p(x \mid r, \theta)$. However, this method affords many approaches, such as integration over a probability density function (by normalizing pixel values by the sum of pixels in the field, then summing the pixels within the region of overlap between the field and the object). As well, a practitioner may choose different parameters or functions to convolve over $(r, \theta)$. The construction of each visual attention model presented in this paper is explained in the following paragraphs and can be seen in Figure <ref type="figure">2</ref>.</p><p>A pedestrian with normative vision during walking is often modelled with a neighbour distance <ref type="bibr">[Karamouzas et al. 2009]</ref>, planning horizon <ref type="bibr">[Singh et al. 2011]</ref>, or query radius <ref type="bibr">[Helbing and Molnar 1995]</ref> of between 5 and 15 m. This range of choices is supported by the literature on human-human collision interactions, with a bias toward lower values <ref type="bibr">[Cinelli and Patla 2008;</ref><ref type="bibr">Fajen and Warren 2003]</ref>. In this work, to support arbitrary models, our visual field is constrained to 10 m, the average query value for neighbouring objects used in steering models. 
The two parameters for normative vision of a pedestrian agent are then $\sigma_r = 5$ m and $\sigma_\theta = 30^\circ$, which utilizes the entirety of the field while placing the majority of attention within 5 m and within a $60^\circ$ sweep, corresponding to the reported central visual field horizontal range [Bhise 2011].</p><p>We assume that an agent talking on the phone is able to see as far as an undistracted agent, so $\sigma_r = 5$ m. However, studies have shown that while talking on the phone, visual information is often missed and peripheral vision shrinks by about 7-10% for large or small targets due to the cognitive load <ref type="bibr">[Maples et al. 2008]</ref>. Considering other agents as large targets, we shrink $\sigma_\theta$ by about 7% and set $\sigma_\theta = 28^\circ$. This means that the majority of the visual field is still used but attention is focused within 5 m and a $56^\circ$ sweep.</p><p>A reading agent is a more complicated case because the head is tilted downwards, so vision is significantly affected. A study on peripheral vision showed that when people focus on visual information, their ability to notice and process information in their peripheral vision diminishes <ref type="bibr">[Ikeda and Takeuchi 1975]</ref>. Further, the study shows that the peripheral information loss is correlated with the foveal load. However, the authors also found that the effect is reduced if the subjects have participated in the study before and are thus trained to notice things in their peripheral vision. Trained subjects had an average peripheral information uptake of about 76.5% under high foveal load when compared to normative participants with no visual load. For our experiments, we assume that cellphone users frequently use their cellphones while walking and are therefore trained to use their peripheral vision while walking, and we also assume that cellphone use is a high foveal load. 
Therefore we set $\sigma_\theta$ to 76.5% of normative, so $\sigma_\theta = 22.95^\circ$ for reading agents. Another study <ref type="bibr">[Schabrun et al. 2014]</ref> reports that people reading on a cell phone have an average head pitch angle of $29.22^\circ$ downward. The reported worldwide average height for adults aged 18 years in 1996 was 1.71 m for men and 1.59 m for women, so we used the average of these two values, 1.65 m, to calculate a focus distance of $\sigma_r = 2.95$ m on the ground in front of them <ref type="bibr">[Collaboration et al. 2016]</ref>. Therefore, attention is focused within 2.95 m in front of them and within a $46^\circ$ sweep.</p><p>Similar to a reading agent, a texting agent also has a reduced field of view due to looking down. However, texting also has an additional motor load and is perhaps more cognitively demanding than reading. We therefore estimate an additional 10% loss in peripheral vision information processing and set $\sigma_\theta$ to 66.5% of normative, so $\sigma_\theta = 19.95^\circ$ for texting agents. In addition, <ref type="bibr">[Schabrun et al. 2014]</ref> reports that texting pedestrians pitch their head downwards by a larger amount than reading pedestrians, about $31.80^\circ$, which we can again use with an average height of 1.65 m to set $\sigma_r = 2.66$ m. Attention is then focused within 2.66 m and a $40^\circ$ sweep.</p><p>A visual representation of each model and how its respective texture is created is shown in Figure <ref type="figure">2</ref>. In the first three columns, bright yellow areas represent the space where most attention is paid, while no attention is paid to dark blue areas.</p></div>
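<div xmlns="http://www.tei-c.org/ns/1.0"><p>The parameter choices in this section can be collected into a small sketch. The focus distance follows from simple trigonometry on the reported height and head pitch. The attention function is only one plausible reading of the model: it combines the radial and angular Gaussians as a product with a peak value of 1, which is an assumption made here, as are all names in the code; the paper describes the combination as a convolution stored in a texture.</p>
<ab><![CDATA[
```python
import math

# (sigma_r in metres, sigma_theta in degrees), as reported in the text.
ATTENTION_PARAMS = {
    "normative": (5.0, 30.0),
    "call": (5.0, 28.0),
    "read": (2.95, 22.95),
    "text": (2.66, 19.95),
}


def focus_distance(height_m, head_pitch_deg):
    """Ground distance at which a downward-pitched line of sight lands."""
    return height_m / math.tan(math.radians(head_pitch_deg))


def attention(r, theta_deg, sigma_r, sigma_theta_deg,
              max_range=10.0, fov_deg=200.0):
    """Relative attention in [0, 1] at polar position (r, theta).

    Assumption: the half Gaussian over r and the Gaussian over theta are
    multiplied, and the field is zero outside the 10 m / 200 degree extent.
    """
    if r < 0.0 or r > max_range or abs(theta_deg) > fov_deg / 2.0:
        return 0.0
    radial = math.exp(-0.5 * (r / sigma_r) ** 2)
    angular = math.exp(-0.5 * (theta_deg / sigma_theta_deg) ** 2)
    return radial * angular
```
]]></ab>
<p>With a height of 1.65 m, the reported head pitches of 29.22 and 31.80 degrees give focus distances close to the 2.95 m and 2.66 m used for reading and texting agents.</p></div>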
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Refocusing</head><p>We propose a refocusing model which produces one of two actions: refocus or continue. For a pedestrian agent which is currently distracted, a reaction module executes three possible Bernoulli trials at a frequency $f_r$. We use the value $f_r = 3$ Hz, an approximation of the inverse of the average human reaction time. However, this may be set to a lower or higher value per agent, producing less or more attentive pedestrians respectively.</p><p>The first trial represents the intrinsic properties of a pedestrian and tests whether the agent is going to refocus because of his or her circumstances and personality. This trial is conducted at every refocus update. The value of $\Pr(\text{refocus}) = p_i$ may represent whether a pedestrian agent is a person who takes important calls (e.g. a business person, on-call worker, etc.), or other elements of the agent's personality. This value may be set by the practitioner to suit scenario needs. As this value remains fixed, the Bernoulli trial at every reaction update is part of a Bernoulli process which can be modelled as a Binomial distribution $B(N_r, p_i)$, where $N_r = T_d f_r$ is the total number of reaction updates and $T_d$ is the duration of the distraction up to the time-step $t \in T$.</p><p>The second and third trials represent the extrinsic properties of local objects (obstacles, agents) relative to the pedestrian agent. Both of these trials are conditioned on the visual attention model and each is performed for each object in the FOV at every refocus update. The first of these is related to the static visual appeal of the object and tests whether the agent will notice the object and refocus his or her attention. The value of $\Pr(\text{refocus}) = p_e$ may represent the shape, colour, contrast, importance, and so on of an object. 
The probability that an agent will see the object because of its properties is conditioned on the visual attention field, giving $P(\text{refocus} \mid p_e, r, \theta)$. The third trial accounts for an object's relative speed, which is known to have particular importance for predicting collisions <ref type="bibr">[Karamouzas et al. 2014]</ref>, and thus we consider it separately in our context. The value $\Pr(\text{refocus}) = p_v$ represents whether an agent will refocus because of the movement of an object, and is defined relative to $v_{\max}$, the maximum velocity of all the objects in the scene. The probability that an agent will see the object because of its movement is also conditioned on the visual attention field, giving $P(\text{refocus} \mid p_v, r, \theta)$.</p><p>When a distracted activity is chosen, a random task time is selected. The maximum task time is parameterizable and can be specified by the practitioner. If an agent is interrupted while distracted and looks up, the agent will resume the task after another random wait time, provided that there is at least 1 second of task time remaining, to avoid oscillations.</p><p>When one of the trials succeeds, the associated agent switches to a normative visual attention model. In the reading and texting cases the agent also temporarily interrupts the distracting activity, looks up, and lowers the device, while in the talking case the agent, who is already looking up, continues to hold the device in a talking position.</p></div>
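<div xmlns="http://www.tei-c.org/ns/1.0"><p>A single refocus update, run at the frequency f_r, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: conditioning on the visual attention field is modelled here by multiplying each extrinsic probability by the relative attention value at the object's polar position, and the speed-based probability is taken to be the object's relative speed divided by v_max, which is a guess because the defining expression does not appear in this text. All names are hypothetical.</p>
<ab><![CDATA[
```python
import random


def speed_probability(relative_speed, v_max):
    """Assumed form of p_v: relative speed normalised by the scene maximum."""
    if v_max <= 0.0:
        return 0.0
    return min(abs(relative_speed) / v_max, 1.0)


def refocus_update(p_i, objects, attention_at, rng):
    """Run the intrinsic trial and the two extrinsic trials per visible object.

    p_i: intrinsic refocus probability of the agent.
    objects: iterable of (p_e, p_v, r, theta_deg) for objects in the FOV.
    attention_at: callable (r, theta_deg) giving attention in [0, 1].
    Returns True if any trial succeeds, i.e. the agent refocuses.
    """
    # Trial 1: intrinsic properties of the pedestrian.
    if rng.random() < p_i:
        return True
    for p_e, p_v, r, theta_deg in objects:
        weight = attention_at(r, theta_deg)
        # Trial 2: static visual appeal of the object.
        if rng.random() < p_e * weight:
            return True
        # Trial 3: movement of the object relative to the agent.
        if rng.random() < p_v * weight:
            return True
    return False


def prob_intrinsic_refocus(p_i, distraction_s, f_r=3.0):
    """Chance of at least one intrinsic success in N_r = T_d * f_r updates."""
    n_r = int(distraction_s * f_r)
    return 1.0 - (1.0 - p_i) ** n_r
```
]]></ab>
<p>Because the updates repeat several times per second, the last helper shows how even small values of the intrinsic probability accumulate quickly over the duration of a distraction.</p></div>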
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Crowd Authoring</head><p>In order to test our method with virtual crowds quickly and easily, we have implemented a simple and intuitive crowd authoring interface in Unity3D. For every test scene we set one or more goals around the scene, and crowd spawning areas. Crowd spawning areas are simple planes placed in the scene that represent the dimensions or areas within which agents should be spawned. The user drags the desired crowd spawning area and goal from the scene hierarchy into the floor and goal fields of the window respectively. A variety of options are then available to the user, including how many agents of each type to generate. Normative agents do not have cellphones and will not become distracted, while distracted agents may occasionally become distracted or undistracted according to their properties and the corresponding visual attention model. Opening up the distraction parameters foldout allows the user to quickly set desired probabilities and other parameters for distracted agents all in one place. Clicking the Create Agents button will spawn the desired number of agents with the desired parameters into the scene view and hierarchy, where their positions and other attributes can be further tweaked. The user is free to spawn multiple groups of agents with different parameters in the same or different spaces and with the same or different goals. A similar runtime tool has also been implemented that determines agent positions randomly at every runtime. When agents are generated, a free location is found by sampling the navmesh randomly within the spawn region. When a free location is found with appropriate distance to other agents, the agent is spawned. Obstacles can also be spawned with or without agents, and the navmesh is updated accordingly. 
An equally simple interface is used to set the parameters of the objects, such as the extrinsic probability, p e , that controls an object's visual appeal.</p></div>
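<div xmlns="http://www.tei-c.org/ns/1.0"><p>The spawning step described above (random sampling within the spawn region until a location sufficiently far from other agents is found) can be illustrated with a minimal rejection-sampling sketch. This is not the Unity3D implementation: the rectangular region, the function name spawn_agents, and the parameters min_dist and max_tries are assumptions introduced only for illustration, and a plain rectangle stands in for the navmesh.</p>
<p>
```python
import math
import random


def spawn_agents(n_agents, width, depth, min_dist, max_tries=1000, seed=0):
    """Place up to n_agents points in a width x depth rectangle by rejection sampling.

    Hypothetical stand-in for navmesh sampling: a candidate is accepted only
    if it is at least min_dist away from every agent spawned so far.
    """
    rng = random.Random(seed)
    positions = []
    for _ in range(n_agents):
        for _ in range(max_tries):
            candidate = (rng.uniform(0.0, width), rng.uniform(0.0, depth))
            if all(math.dist(candidate, p) >= min_dist for p in positions):
                positions.append(candidate)
                break  # this agent is placed; move on to the next one
    return positions


positions = spawn_agents(n_agents=20, width=10.0, depth=10.0, min_dist=0.5)
print(len(positions), "agents spawned")
```
</p>
<p>If the region is too crowded, an agent that cannot be placed within max_tries attempts is simply skipped in this sketch, so the returned list may be shorter than requested.</p></div>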
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Computational Performance</head><p>In this section we study the computational performance of our framework using a variant of the statistical approach proposed in <ref type="bibr">[Kapadia et al. 2011]</ref>. At each trial, an environment is populated with 100 obstacles randomly placed in a grid-like fashion, and then agents and their goals are randomly generated. We use this approach with an increasing number of distracted agents and measure the frame time in milliseconds. The result is illustrated in Figure <ref type="figure">3</ref>. The system is capable of handling more than 200 distracted agents without a significant increase in frame time. However, the frame time starts to increase as the number of agents approaches 500. It is worth noting, though, that our implementation has not yet been optimized.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Refocus Model Parameters</head><p>To demonstrate the effect of the intrinsic look up probability p i , we use the scenario space environment with a fixed number of 20 agents. The p i value is homogeneous among all agents, but we slowly increase this value from 0 to 1 and count the number of refocus events. The result is shown in Figure <ref type="figure">4</ref>. As expected, higher values of p i yield a higher number of refocus events. Interestingly, the talking distraction exhibits more look up behaviours than the other distracted activities. This makes sense because the visual field of these agents is much larger, so it is more likely that objects in the environment, represented in this process by the extrinsic probabilities p e and p v , will be noticed and trigger a look up behaviour.</p><p>The same procedure is used to demonstrate the effect of the extrinsic look up probability p e . For a group of homogeneous agents, we use the same p e factor for all obstacles, slowly increase it from 0 to 1, and count the number of refocus events. The result is shown in Figure <ref type="figure">5</ref>. Higher values of p e correlate with higher numbers of refocus events, as one would expect.</p><p>The impact on look up count plateaus after the p i and p e values have reached certain points. Since the probability checks run frequently, at three times per second, even small values of p i and p e are enough to trigger a look up behaviour, up to a point where further increasing the value has only a small impact.</p><p>Finally, we evaluate the effect of p v by comparing unidirectional and bidirectional flows. In a unidirectional flow scenario, the relative velocity between agents will be small since all agents are moving in the same direction. In a bidirectional flow, agents approaching from the opposite direction will have high relative velocities. 
The number of refocus events is shown in Figure <ref type="figure">6</ref>.</p><p>As expected, talking pedestrians exhibit a higher number of look up behaviours. Surprisingly, however, texting agents had slightly more look up behaviours than reading agents. A possible explanation is that texting agents move more slowly on average than reading agents (see Section 3.2), so external objects and agents may be in their field of view for longer, resulting in a higher chance of catching their attention and triggering a look up behaviour. Also, because texting agents move more slowly, they remain in the scene longer, which gives them more opportunities for look up triggers.</p></div>
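<div xmlns="http://www.tei-c.org/ns/1.0"><p>The plateau described above can be made concrete with a short calculation. As a simplifying assumption that is not part of the model itself, suppose each check is an independent trial with a fixed success probability p and ignore the conditioning on the visual attention field. The probability of at least one look up in a window of T seconds is then 1 - (1 - p)^(3T). The function name and the 10 second window below are hypothetical choices for illustration.</p>
<p>
```python
def prob_at_least_one_lookup(p, seconds, checks_per_second=3):
    """Probability that at least one of the repeated checks succeeds.

    Assumes independent trials with a constant per-check probability p,
    which is a simplification of the conditioned trials in the model.
    """
    n_checks = int(round(seconds * checks_per_second))
    return 1.0 - (1.0 - p) ** n_checks


# Sweep the per-check probability for an assumed 10 second window.
for p in (0.0, 0.05, 0.1, 0.2, 0.5, 1.0):
    print(f"p = {p:.2f}  P(look up within 10 s) = {prob_at_least_one_lookup(p, 10):.3f}")
```
</p>
<p>Under these assumptions, a per-check probability of 0.1 already gives a probability above 0.95 of looking up within 10 seconds (30 checks), so larger values of p can add only a little, which is consistent with the saturation observed in the look up counts.</p></div>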
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Visual Attention Model Parameters</head><p>To illustrate and compare results from the different vision models for each of the distraction types, we again use 20 agents in the scenario space environment. All parameters are homogeneous across agents; however, in each trial we restrict the attention models so that only one model can be active: normative, reading on a device, talking, or texting. We then record the refocus events, collisions, and average path length. Collisions are detected through Unity3D's built-in collision detection, where each agent is represented by a capsule collider. Thus we can compare how these measurements differ between the visual attention models for each activity type.</p><p>The results are summarized in Figure <ref type="figure">7</ref>.</p><p>As expected, normative and talking modalities had similar results for collisions, while texting and reading distractions resulted in more collisions. Unexpectedly, reading had the most collisions. Even though the visual field is slightly more reduced for texters than for readers, this result could be explained by the fact that readers also move more quickly than texters, resulting in more collisions. For look up events, talking produced the most, while texting and reading were roughly the same. This is expected: for talkers, looking up does not interrupt the distraction, so they tend to remain distracted for longer and therefore trigger more look up events. Path length was highest for readers and texters, with readers' paths being longer than texters'. This may be explained by the increased collision rate for readers, which causes them to be pushed away from the optimal path.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Fuzzy Goal Model Parameters</head><p>To evaluate the parameters used in the fuzzy goal model, we use the scenario space environment with a single agent. For this agent, random values for the maximum semi-major and semi-minor lengths of the fuzzy goal ellipse are generated along with a randomized goal. Here we draw the maximum semi-major and semi-minor axis lengths from [0, 1], but higher values are also possible. We then measure lateral deviation and overshoot/undershoot with respect to the actual waypoint. We also measure the path length. The results are shown in Figure <ref type="figure">8</ref>.</p><p>From this we can see that, in general, path length is higher when the maximum semi-minor axis length is high. This is expected, since the semi-minor axis length represents the amount of lateral deviation, which moves the agent away from optimal paths. Higher values of the semi-major axis length generally also increase path length, but to a lesser extent and with more noise. Overshooting or undershooting may not result in significant deviations from the optimal path. In particular, undershooting is unlikely to affect path length as much, because once agents reach their fuzzy goal they start heading to the next waypoint, which is often in the same direction.</p></div>
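<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the two measured quantities concrete, the following sketch samples a fuzzy goal inside an ellipse centred on the waypoint, with the semi-major axis along the direction of travel (overshoot or undershoot) and the semi-minor axis perpendicular to it (lateral deviation). Uniform sampling over the ellipse, the function name sample_fuzzy_goal, and its arguments are assumptions made for illustration; the sampling scheme actually used by the model may differ.</p>
<p>
```python
import math
import random


def sample_fuzzy_goal(waypoint, heading, semi_major, semi_minor, rng):
    """Return (fuzzy_goal, overshoot, lateral) for an ellipse oriented along heading.

    heading is the travel direction in radians. A positive overshoot lies
    beyond the waypoint, a negative one is an undershoot.
    """
    # Uniform sample in the unit disc, then stretch it to the ellipse.
    radius = math.sqrt(rng.random())
    angle = rng.uniform(0.0, 2.0 * math.pi)
    overshoot = semi_major * radius * math.cos(angle)  # along the heading
    lateral = semi_minor * radius * math.sin(angle)    # perpendicular to it
    hx, hy = math.cos(heading), math.sin(heading)
    goal = (waypoint[0] + overshoot * hx - lateral * hy,
            waypoint[1] + overshoot * hy + lateral * hx)
    return goal, overshoot, lateral


rng = random.Random(42)
goal, overshoot, lateral = sample_fuzzy_goal((2.0, 3.0), 0.7, 1.0, 0.5, rng)
print("fuzzy goal:", goal, "overshoot:", overshoot, "lateral:", lateral)
```
</p>
<p>Setting both axis lengths to zero recovers the exact waypoint, which corresponds to an agent with no goal fuzziness.</p></div>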
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Qualitative Evaluation</head><p>We qualitatively evaluate the proposed framework using the following common and indicative scenarios: (a) two oncoming agents with obstacles, and (b) two groups of agents crossing a corridor from opposite sides.</p><p>In the oncoming obstacle scenario, two agents start at opposite sides and move along potentially crossing paths in the presence of obstacles. Figure <ref type="figure">9</ref> shows the resulting trajectories for the three different distractions with both agents performing the same distracted activity. Normative and talking models exhibit similar trajectories, while the reading agents show deviations from the optimal path. The trajectory of the texting agent that started on the left side clearly shows that the agent bumped into the first obstacle it encountered.</p><p>In the crossing groups scenario, two groups totalling 50 agents walk through a narrow corridor from opposite directions. The results for this scenario are depicted in Figure <ref type="figure">10</ref>. For the texting and reading cases, distraction tends to occur less in the middle where the two groups meet, because the p v factor triggers look up behaviours, while the talking on the phone distraction is not interrupted. Figure <ref type="figure">11</ref> shows the traces for the same scenario with twenty agents distracted in all three different ways.</p><p>For reference, Figure <ref type="figure">12</ref> shows the visual field of a single distracted agent from this scenario before and after a look up behaviour is triggered by the presence of another agent.</p><p>In Figure <ref type="figure">13</ref> we show a snapshot from an urban scene where pedestrians, including distracted ones, cross at an intersection. We refer the reader to the accompanying video for animated versions of these experiments, which show the resulting behaviours more clearly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">DISCUSSION</head><p>The proposed framework for authoring distracted pedestrians and the associated visual attention model are specifically demonstrated on egocentric distractions related to the use of mobile devices. A number of intuitive parameters are available to the user to author specific situations, personalities, and variations. For example, very cautious agents may become distracted less often or may look up more often. Outgoing or less cautious agents may send and receive more texts and calls, and so on. In particular, the attention model, which is based on egocentric fields and related operations, can be customized or extended to support agents with different sensory abilities, such as limited peripheral vision and other vision impairments. Furthermore, it could be combined with more sophisticated models of agent vision that have been reported in the computer graphics and computer vision literature.</p><p>Our work focuses on individual (egocentric) rather than emerging group behaviours. Nevertheless, comparing the resulting traces with real-world crowd data may offer additional insights. We plan to address this issue in the future with a carefully designed study.</p><p>In the future, we aim to address non-egocentric distractions and deliberate behaviours that may impede an agent's navigation abilities, such as window shopping, reading an advertisement on a board, looking at an interesting object or an accident, talking to a friend, etc. It would be interesting to investigate whether projecting variants of the proposed attention models along the direction of gaze would be sufficient to accommodate such behaviours, rather than resorting to a 3D representation of the field of view.</p><p>Exploring the range of the distracted agents' parameters, as well as determining and providing default values from real data, are important future tasks. 
We also plan to investigate how other individual characteristics of an agent, such as age, gender, mobility and sensory abilities, may affect the agent's behaviour towards egocentric distractions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">CONCLUSION</head><p>We have presented an approach and a framework for modelling egocentric distractions related to the use of mobile devices for crowds in everyday casual navigation tasks. Our approach provides an intuitive set of parameters that can be used to customize these behaviours for each agent. The parameters of the model and their default values are grounded in studies of distracted pedestrians, and thus reflect ground truth.</p><p>The proposed attention model can be easily extended to model pedestrians with varying sensory and physical abilities. We believe that the inclusion of a wide variety of non-normative pedestrian behaviours and attributes will push realism and fidelity in crowd animations to a higher level that more accurately reflects the diversity we see in our daily lives.</p></div></body>
		</text>
</TEI>
