<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Remote Telemanipulation with Adapting Viewpoints in Visually Complex Environments</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2019 June</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10104548</idno>
					<idno type="doi">10.15607/RSS.2019.XV.068</idno>
					<title level='j'>Robotics: Science and Systems XV</title>
					<author>Daniel Rakita</author><author>Bilge Mutlu</author><author>Michael Gleicher</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[In this paper, we introduce a novel method to support remote telemanipulation tasks in complex environments by providing operators with an enhanced view of the task environment. Our method features a novel viewpoint adjustment algorithm designed to automatically mitigate occlusions caused by workspace geometry, supports visual exploration to provide operators with situation awareness in the remote environment, and mediates context-specific visual challenges by making viewpoint adjustments based on sparse input from the user. Our method builds on the dynamic camera telemanipulation viewing paradigm, where a user controls a manipulation robot, and a camera-in-hand robot alongside the manipulation robot servos to provide a sufficient view of the remote environment. We discuss the real-time motion optimization formulation used to arbitrate the various objectives in our shared-control-based method, particularly highlighting how our occlusion avoidance and viewpoint adaptation approaches fit within this framework. We present results from an empirical evaluation of our proposed occlusion avoidance approach as well as a user study that compares our telemanipulation shared-control method against alternative telemanipulation approaches. We discuss the implications of our work for future shared-control research and robotics applications.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>From an early age, people develop an innate ability to adapt their viewpoints to plan and coordinate manipulations within their environments <ref type="bibr">[40]</ref>. People shift how they look at an object throughout a grasping action <ref type="bibr">[34]</ref>, scan their environments to plan future actions <ref type="bibr">[17]</ref>, and naturally adjust their viewpoints to look over and around occlusions when handling items in visually cluttered settings <ref type="bibr">[23]</ref>. The tight coupling between manipulation and viewpoint contributes to people's adeptness in executing tasks in complex day-to-day environments, such as when crouching to look in a cabinet below a sink to adjust a valve, moving the head to look around in a cluttered cabinet, or looking up to secure a light bulb into a ceiling fixture.</p><p>While much work in remote telemanipulation has focused on the control aspects of the problem <ref type="bibr">[27]</ref>, such as studying effects of time delays <ref type="bibr">[26]</ref>, impedance <ref type="bibr">[11]</ref>, and stability <ref type="bibr">[19]</ref>, little is known about how the operator's viewpoint should adapt given environmental and task considerations. In fact, many telemanipulation systems utilize static cameras, where viewpoints are immutable, or use an end-effector camera on the manipulator itself, where the viewpoint and manipulation points are locked together and cannot adapt separately given the task at hand. Recent work has shown the efficacy of moving the camera to continuously adjust the viewpoint on-the-fly for a remote operator <ref type="bibr">[1,</ref><ref type="bibr">25,</ref><ref type="bibr">30]</ref>, leading to telemanipulation performance and perceptual benefits over an array of static cameras and an end-effector camera <ref type="bibr">[30]</ref>. 
However, it remains unclear how the viewpoint should be adapted to afford effective manipulation in visually complex environments, i.e., environments where occlusions are likely to occur, where operators may need to look around to obtain situation awareness and plan future actions, or where specific viewpoints may be necessary given the semantics of the task.</p><p>In this paper, we introduce a telemanipulation method in which the viewpoint continuously adapts over time to better serve manipulation in response to current environment or task conditions. Consider a teleoperator remotely preparing a meal for a family member. Our method coordinates the operator's manipulations with their viewpoints by allowing the user to visually explore the environment to look for cooking oil, automatically providing a sufficient view to reach into a drawer to get a measuring cup, and accepting viewpoint modifications from the operator to check whether the oil has been filled up to the measurement line.</p><p>To adapt viewpoints in real time, our method builds on the dynamic camera telemanipulation viewing paradigm, where a user controls a manipulation robot, and a camera robot servos with a camera-in-hand to provide a view of the manipulation to the operator <ref type="bibr">[30]</ref> (Figure <ref type="figure">1</ref>). We adopt the control interface presented in this prior work, where the user fluidly controls the translations and rotations of the manipulation robot's end-effector using a motion controller, and the system automatically adjusts the control frame on the fly such that inputs can be made with respect to what is currently seen on screen. In contrast to Rakita et al.
<ref type="bibr">[30]</ref>, our work considers how the two robots should coordinate given environmental and task considerations, both in the mathematical formulation, by considering the two robots within a single motion optimization, and in the interface design, by updating both robots simultaneously.</p><p>To determine how the viewpoint should be adapted given environment and task considerations, we draw inspiration from how people adjust their viewpoints in complex, human-centered environments. We analyzed a video dataset of people completing tasks in their own kitchens, recorded through head-mounted cameras, and identified three distinct viewpoint adaptation behaviors during manipulation (explained in &#167;III). We structure the real-time control of the camera and manipulation arms as a shared-control method that reduces the user's cognitive load while still accepting sparse manual input for user-directed viewpoint shifts, and we formulate this shared control as a real-time optimization problem so that the two arms are coordinated at every update (outlined in &#167;V).</p><p>In &#167;VI, we present two evaluations that show the efficacy of our proposed methods. Our first evaluation assesses the performance of our automatic occlusion avoidance method, outlined in &#167;IV-A, and shows that the algorithm reliably finds effective viewpoints in the presence of visual occlusions or obfuscations.
Our second evaluation features a user study that compares our methods against alternative telemanipulation approaches.</p><p>Our contributions in this work include (1) a model of how viewpoints should adapt in complex environments to support effective telemanipulation, informed by how people adapt their viewpoints in such environments; (2) a set of motion optimization methods and a control interface that support these viewpoint adaptation types; and (3) empirical evaluations that provide insight into the performance and efficacy of the proposed methods. <ref type="foot">1</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORKS</head><p>Our approach builds on ideas from active vision and visual servoing, and is influenced by work in computer graphics.</p><p>Active Vision-The control of viewpoint for robotics applications is often termed active vision; see Chen et al. <ref type="bibr">[7]</ref> or Bajcsy et al. <ref type="bibr">[4]</ref> for surveys. Methods reason about posing cameras for numerous applications, such as object search <ref type="bibr">[35]</ref>, object modeling <ref type="bibr">[5,</ref><ref type="bibr">8]</ref>, robot grasp planning <ref type="bibr">[24]</ref>, object tracking <ref type="bibr">[6]</ref>, and surveillance <ref type="bibr">[36]</ref>. Our work shares goals with active-vision surveillance methods, such as maximizing visual coverage and avoiding occlusions <ref type="bibr">[2,</ref><ref type="bibr">36]</ref>. However, this body of work uses mobile-robot platforms and static cameras to survey a wide search area, while our work focuses on viewing a workspace for teleoperation using a camera-in-hand robot. Work in laparoscopic robotic surgery allows surgeons to control a flexible robot camera to obtain a sufficient view of the procedure area <ref type="bibr">[21]</ref>. Our work similarly seeks to stream back a sufficient view of a workspace. However, because manually controlling both the camera and the manipulation tool may require an expert user, such as a surgeon, we explore automated camera movement to support novice users.</p><p>Another relevant aspect of active vision considers how to move the viewpoint to gain more information about the scene, termed the "next view" problem. The problem has a long history (see Connolly <ref type="bibr">[12]</ref> for an early example or Zhang et al. <ref type="bibr">[41]</ref> for a recent example that considers occlusion).
Our work considers how to find improved views for human viewers, rather than views that add information for 3D reconstruction.</p><p>Recent work considers how to choose viewpoints for robots to perform tasks. For example, Saran et al. <ref type="bibr">[33]</ref> describe methods for determining the most useful viewpoint for performing actions by observing the differences between successes and failures, while Rosman et al. <ref type="bibr">[32]</ref> plan sensor locations based on simulations. In contrast, our work chooses viewpoints for human viewers based on real-time information about the scene.</p><p>Visual Servoing-Visual servoing is a robot-control paradigm in which a robot moves based on visual feedback (see Corke <ref type="bibr">[13]</ref> for a full introduction). In our work, the camera moves based on both what it sees and a geometric model of the manipulation robot. Similar to eye-in-hand visual servoing systems (e.g., Wilson et al. <ref type="bibr">[39]</ref>), the camera robot provides a view using an end-effector-mounted camera.</p><p>Animation &amp; Graphics-Computer graphics and animation applications consider the problem of automatic camera control (see Christie et al. <ref type="bibr">[10]</ref> for a survey). Gleicher and Witkin <ref type="bibr">[18]</ref> introduced the idea of adjusting viewpoint position and orientation based on controls in the image plane. Our work uses this idea of mapping visual goals to camera movements. Virtual camera methods have been developed to avoid visual occlusions with objects in the scene <ref type="bibr">[9]</ref>. Our visual-occlusion-avoidance method differs from these approaches, which benefit from a full geometric model of the whole environment and a camera that is free to move anywhere in the scene rather than being constrained by the motion of a robot.
In data visualization, many works have considered how to choose viewpoints that best enable viewers to see a data set [e.g., <ref type="bibr">22,</ref><ref type="bibr">37,</ref><ref type="bibr">38]</ref>, but again, these methods require complete geometry. Galvane <ref type="bibr">[16]</ref> reviews many approaches that automatically move a camera around a virtual scene to achieve various goals. Our work draws on this research on automatic camera control, as we dynamically move a physical camera to improve visibility during remote telemanipulation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. VIEWPOINT ADAPTATION TYPES</head><p>Our work aims to support effective remote telemanipulation, even in visually complex environments, by adapting viewpoints to task and environment considerations. This aim raises a key question: how should the viewpoint adapt to the environment and the task?</p><p>To explore this question, we drew inspiration from how people adjust their viewpoints in complex, human-centered environments. We analyzed the Epic Kitchens video dataset <ref type="bibr">[14]</ref>, which consists of people completing tasks in their own kitchens, recorded through head-mounted cameras. Our goal was to identify the primary ways in which people adjust their viewpoints in day-to-day life to perform effective manipulations, and to support these viewpoint adaptation types in telemanipulation.</p><p>The video dataset was independently coded by a trained coder in a two-pass process. On the first pass, the coder watched the videos and took notes on any patterns connecting viewpoint changes, manipulations, and the environment. The notes were reviewed after viewing, and related concepts were clustered into higher-level categories. On the second pass, the coder watched the dataset videos again, specifically watching for viewpoint change patterns that further defined or separated the categories referenced in the notes from the first pass.</p><p>Upon completion of the second pass, three central viewpoint adaptation types were identified as high-level categories that cover most of the ways that viewpoints change to support manipulation given the environment. These viewpoint adaptation types are:</p><p>(1) Geometrically dictated viewpoint adaptations. These adaptations describe any viewpoint change that is influenced by the workspace geometry.
Examples from the dataset included shifting the viewpoint up to retrieve a spice from the top shelf of a cabinet, shifting the viewpoint to the side to see around a cereal box to grasp a coffee mug on the other side of the table, and looking down into a drawer to grasp a fork.</p><p>(2) Semantically dictated viewpoint adaptations. These adaptations describe any viewpoint change that is associated with the semantics of a given task. An example was viewing a toaster from above when making toast in order to see the slots to place the bread. This viewpoint selection involved more than just geometric reasoning; while there are many geometrically unoccluded views around the sides of the toaster, these views would be insufficient given the semantics of this particular task.</p><p>(3) Visual explorations. These adaptations describe any viewpoint change that involves looking around the environment to plan future actions. Examples from the dataset included looking around to find the lid for a pan and searching the countertop to find the next ingredient when making dinner.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. TECHNICAL OVERVIEW</head><p>In this section, we provide a high-level description of how we incorporate the three viewpoint adaptation types into our remote telemanipulation method. &#167;V provides detailed mathematical treatments of these descriptions, including how these concepts fit within an overall real-time motion optimization framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Geometrically Dictated Viewpoint Adaptations</head><p>As explained in &#167;III, geometrically dictated viewpoint adaptations occur whenever the workspace geometry influences the set of viewpoints that would support an effective manipulation. For example, when reaching into an open drawer, the concave geometry of the drawer limits the set of appropriate viewpoints to those from above the drawer, as opposed to those from the side or beneath it.</p><p>A key technical problem we address in this work is how to determine an effective viewpoint that is robust to potentially complex environment geometry. Our premise is that, because the camera is carried by the camera-in-hand robot arm, the system should recognize when the visual target point cannot currently be seen, so that the camera robot can react by moving the camera to a pose from which the visual target can be seen again. We construct a differential adjustment algorithm that, at each update, seeks a new viewpoint estimated to move incrementally closer to resolving such visual occlusions or obfuscations.</p><p>Our solution is based on the observation that, regardless of how complex the environment geometry is, we always know of one occlusion-free path: the free-space path the manipulator took to reach its current configuration. Thus, if the end-effector cannot be seen, the main strategy of our algorithm is to incrementally servo the camera robot to align the camera with the manipulation arm's approach direction until the end-effector can be seen again. We note that, for a given environment geometry, there may be other occlusion-free paths that would afford clear views of the end-effector. In our current work, we allow the user to manually nudge the camera toward those alternate viewpoints based on their understanding of the task and personal preference.
In future work, we will explore real-time mapping, geometric processing, and data-driven approaches to finding such alternate viewpoints automatically.</p><p>Using the observation presented above, our viewpoint search algorithm is structured as follows (as illustrated in Figure <ref type="figure">2</ref>): (1) Consider the robot's "manipulation vector," i.e., the vector that points forward along the robot's final wrist joint. We impose an upper bound on the allowable angle between the viewpoint vector and the manipulation vector. This bound can be thought of as the radius r<hi rend="subscript">o</hi> of an outer cone, emanating from behind the end-effector, within which the camera must be placed. If the end-effector can be seen at update t, increase r<hi rend="subscript">o</hi> by some increment, capping this value at some maximum.</p><p>(2) If the end-effector cannot be seen at update t, decrease r<hi rend="subscript">o</hi> by some increment, down to a minimum value of r<hi rend="subscript">i</hi> (the inner cone radius). We impose this minimum because the end-effector itself is opaque, so viewpoints too closely aligned with the manipulation vector will be occluded by the end-effector.</p><p>(3) If the end-effector still cannot be seen when r<hi rend="subscript">o</hi> = r<hi rend="subscript">i</hi>, move the camera closer to the end-effector.</p><p>This adjustment process repeats at each update, either providing more slack on the outer visibility cone radius r<hi rend="subscript">o</hi> when the end-effector can be seen, or squeezing the viewpoint angle inward and bringing the camera closer to the end-effector until it can be seen. We provide an evaluation of this viewpoint search algorithm in &#167;VI-B.</p><figure><figDesc>Fig. <ref type="figure">2</ref>. Illustration of our geometric viewpoint adaptation algorithm. (a) The robot's manipulation vector is used as a proxy for an approach direction. (b) If the end-effector cannot be seen from the camera, (c) the radius of the outer visibility cone is decreased so that the view aligns more with the manipulation vector.</figDesc></figure></div>
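<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three-step cone search above can be summarized as a per-update rule. The following Python sketch is illustrative only: the class name ConeSearchState, the step sizes, and the floor on camera distance (min_dist) are hypothetical choices not specified in the text, and the visibility flag is assumed to come from whatever detector reports whether the end-effector is seen (for example, the ArUco-marker check described in &#167;VI-A). Moving the camera closer is modeled here as shrinking a goal camera-to-target distance, which the camera distance objective in &#167;V-B would then track.</p><ab>
```python
from dataclasses import dataclass


@dataclass
class ConeSearchState:
    """Hypothetical container for the search variables (angles in radians)."""
    r_o: float        # outer visibility cone radius (max allowed view angle)
    r_i: float        # inner cone radius: floor for r_o (the gripper is opaque)
    r_max: float      # cap on r_o
    goal_dist: float  # desired camera-to-end-effector distance
    min_dist: float   # hypothetical floor on goal_dist


def update_cone(state, visible, step=0.02, dist_step=0.01):
    """Apply one update of the cone search and return the modified state."""
    if visible:
        # (1) Clear view: widen the outer cone to give the optimizer more slack.
        state.r_o = min(state.r_o + step, state.r_max)
    elif state.r_o > state.r_i:
        # (2) Occluded: squeeze the allowed view angle toward the manipulation
        #     vector, but never below the inner cone radius.
        state.r_o = max(state.r_o - step, state.r_i)
    else:
        # (3) Still occluded with r_o == r_i: bring the camera closer.
        state.goal_dist = max(state.goal_dist - dist_step, state.min_dist)
    return state
```
</ab><p>In this sketch, a run of occluded updates first drives r<hi rend="subscript">o</hi> down to r<hi rend="subscript">i</hi> and only then starts reducing the goal distance, mirroring the ordering of steps (2) and (3).</p></div>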
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Semantically Dictated Viewpoint Adaptations</head><p>Semantically dictated viewpoint adaptations describe any context-specific viewpoint changes. We handle these adaptations by allowing the user to provide sparse manual inputs to the system that specify how they want the viewpoint to be adapted. Specifically, the user can provide a directional input, represented in camera space, indicating how they want to adjust the camera position. Because the camera automatically points at the end-effector visual target, camera-space directional inputs result in orbital rotations about the visual target point. The directional input can also be made toward or away from the visual target point; thus, these manual modifications can move the camera in to provide more detail or out to provide more context, depending on the task.</p><p>When the user provides manual directional inputs to the system, we automatically override the geometric search process discussed above and increase the radius r<hi rend="subscript">o</hi>. This approach ensures that the user has adequate flexibility when they want to control the viewpoint. When the user stops providing manual inputs, the geometric visual search automatically resumes.</p></div>
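<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the camera-space input concrete, the Python sketch below assumes the camera orientation is available as a 3x3 camera-to-world rotation matrix. It rotates the user's directional input into the world frame, scales it, adds it to the previous camera position, and pauses the geometric search while widening the outer cone. The helper names, the default sensitivity scale lam (standing in for the scalar of &#167;V-B), and the cone increment are hypothetical. Because the look-at behavior keeps the camera aimed at the visual target, a lateral nudge computed this way traces an orbit around the target, while a nudge along the camera's viewing axis moves it in or out.</p><ab>
```python
def camera_to_world(rot, v):
    """Rotate a camera-frame vector into the world frame.

    rot is a 3x3 camera-to-world rotation matrix given as a list of rows.
    """
    return [sum(rot[r][k] * v[k] for k in range(3)) for r in range(3)]


def nudged_camera_goal(prev_cam_pos, rot, g_cam, lam=0.05):
    """Desired camera position after applying the user's directional input.

    prev_cam_pos: camera position at the previous update (world frame)
    g_cam: directional input in the camera's local frame ([0, 0, 0] if none)
    lam: hypothetical sensitivity scale
    """
    g_world = camera_to_world(rot, g_cam)
    return [c + lam * gw for c, gw in zip(prev_cam_pos, g_world)]


def manual_override(r_o, r_max, g_cam, step=0.02):
    """Return (new r_o, search_paused): widen the cone while the user steers."""
    if any(abs(x) > 0.0 for x in g_cam):
        return min(r_o + step, r_max), True
    # No input: leave r_o alone and let the geometric search resume.
    return r_o, False
```
</ab></div>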
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Visual Explorations</head><p>Visual explorations describe any viewpoint changes that involve looking around and surveying the environment to plan future actions. We handle visual explorations in our telemanipulation method by allowing the user to manually switch to a visual exploration mode, wherein they can naturally adjust the camera's look-at point around the environment. Various automatic aspects of the method are maintained while in visual exploration mode, such as keeping the camera upright and avoiding collisions. When the user decides to exit visual exploration mode, the camera robot smoothly transitions back to automatically looking at the manipulation robot's end-effector.</p></div>
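<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Python sketch of the mode switch described above, assuming the visual target is held as a 3D point that the look-at objective tracks. The function name and the blending fraction alpha are hypothetical; the text states only that the camera smoothly transitions back, so exponential smoothing toward the end-effector is one possible choice, not necessarily the one used in our system.</p><ab>
```python
def next_visual_target(exploring, current, explore_point, ee_point, alpha=0.1):
    """Return the visual target for the next update.

    exploring: True while the user is in visual exploration mode
    current: visual target used at the previous update
    explore_point: look-at point chosen by the user with the motion controller
    ee_point: manipulation robot's end-effector position (the default target)
    alpha: hypothetical fraction of the remaining gap closed per update
    """
    if exploring:
        return list(explore_point)
    # After exiting exploration, ease the target back toward the end-effector
    # instead of snapping to it, so the camera turns smoothly.
    return [c + alpha * (e - c) for c, e in zip(current, ee_point)]
```
</ab></div>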
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. TECHNICAL DETAILS</head><p>This section provides technical details for the optimization and shared-control solutions outlined above.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Motion Optimization Framework</head><p>To realize the shared-control method for the manipulation and camera arms described throughout this work, many motion qualities must be consistently maintained in the control loop. For example, the manipulation robot should follow end-effector pose goals specified by the user, the camera should look at the visual target point, and the manipulation robot and camera robot should not collide.</p><p>To accommodate all of these sub-goals in real time, we use an optimization-based inverse kinematics (IK) solver that handles trade-offs between different objectives on the fly (see Aristidou et al. <ref type="bibr">[3]</ref> for a review of IK methods). At each update, the solver calculates joint angles for each robot that exhibit these desired features. We note that this is a generalized IK formulation, because we reason over kinematic goals other than just end-effector position and orientation goals.</p><p>Our method uses the RelaxedIK solver to achieve real-time optimization performance <ref type="bibr">[29]</ref>.
The solver uses a flexible non-linear optimization framework to handle IK problems that dynamically trade off between multiple objectives, and it produces per-update motions that accurately follow end-effector pose goals without sacrificing motion feasibility.</p><p>The IK problem is formulated as a constrained optimization:</p><p><formula notation="TeX">\Theta^{*} = \arg\min_{\Theta} f(\Theta) \quad \text{s.t.} \quad c_i(\Theta) \geq 0, \quad c_e(\Theta) = 0, \quad l_i \leq \Theta_i \leq u_i, \; \forall i</formula></p><p>Here, &#920; is the n-vector of robot joint values (n is the number of degrees of freedom); c<hi rend="subscript">i</hi>(&#920;) is a set of inequality constraints; c<hi rend="subscript">e</hi>(&#920;) is a set of equality constraints; l<hi rend="subscript">i</hi> and u<hi rend="subscript">i</hi> define the lower and upper bounds for the robot's joints; and f is a scalar objective function.</p><p>Throughout this work, we consider both robot arms together within a single optimization, i.e., the state vector &#920; concatenates the joint degrees of freedom of both arms so that the objectives and constraints can consider both arms together. In the prior automatic dynamic camera method by Rakita et al. <ref type="bibr">[30]</ref>, the manipulation arm and camera arm ran under two separate optimization instances, so the camera arm could only react to the actions of the manipulation arm, and the manipulation arm had no awareness of the camera arm. Because our current work optimizes over both arms in a single procedure, each arm is aware of the other's motion priorities, and the two can plan together. This formulation enables more coordinated behavior, such as the manipulation arm moving its elbow down to clear visual space for the camera.</p><p>Our optimization formulation involves twelve objective terms and two constraints.
The objective terms encode the following kinematic goals: (1) match the end-effector position goal on the manipulation arm; (2) match the end-effector orientation goal on the manipulation arm; (3) minimize joint velocity of the full state vector; (4) minimize joint acceleration of the full state vector; (5) minimize joint jerk of the full state vector; (6) avoid collisions between the arms and modeled environment features; (7) keep the camera upright; (8) avoid occlusions caused by the manipulation robot; (9) point the camera toward the visual target ("look-at" objective); (10) keep the camera position between the inner and outer visibility cones (outlined in &#167;IV-A); (11) match the desired goal distance between the camera and visual target; and (12) follow the user's manual camera translation inputs (if provided). The two constraints clamp joint velocities at each update and avoid kinematic singularities for each arm, respectively. Our implementations of objectives 1-8 and both constraints follow prior work <ref type="bibr">[29,</ref><ref type="bibr">30]</ref>; the next section details how we incorporate our viewpoint adaptation types as objectives 9-12.</p></div>
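<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Python sketch below illustrates the single-optimization setup: one state vector holding both arms' joints (6 for the UR5 and 7 for the Sawyer in the prototype of &#167;VI-A), a per-joint inequality constraint of the form c<hi rend="subscript">i</hi>(&#920;) &#8805; 0 that clamps per-update joint motion, the joint bounds, and an objective formed as a weighted sum of per-term losses. The weighted-sum form reflects our reading of RelaxedIK <ref type="bibr">[29]</ref> and should be treated as an assumption; the function names and max_step are hypothetical, and the singularity-avoidance constraint is omitted.</p><ab>
```python
UR5_DOF = 6     # manipulation arm
SAWYER_DOF = 7  # camera arm


def concat_state(theta_m, theta_c):
    """Stack both arms' joint values into a single state vector Theta."""
    return list(theta_m) + list(theta_c)


def split_state(theta, n_m=UR5_DOF):
    """Recover (Theta_m, Theta_c) from the joint state vector."""
    return theta[:n_m], theta[n_m:]


def velocity_constraints(theta, theta_prev, max_step):
    """One inequality constraint per joint, satisfied when the value is >= 0."""
    return [max_step - abs(a - b) for a, b in zip(theta, theta_prev)]


def within_bounds(theta, lower, upper):
    """Check that every joint value lies within [lower_j, upper_j]."""
    return all(u >= x >= l for x, l, u in zip(theta, lower, upper))


def weighted_objective(theta, terms, weights):
    """Scalar objective f(Theta) as a weighted sum of per-term losses."""
    return sum(w * term(theta) for term, w in zip(terms, weights))
```
</ab><p>Because every term receives the full 13-element vector, an objective such as the look-at term can be lowered by moving either arm, which is what lets the manipulation arm make room for the camera.</p></div>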
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Incorporating Viewpoint Adaptations into RelaxedIK</head><p>In this section, we highlight how we incorporate our viewpoint adaptation types, outlined in &#167;IV, into the RelaxedIK optimization framework. We use the same Groove loss function introduced in previous work <ref type="bibr">[29,</ref><ref type="bibr">30]</ref> and specify the loss function parameters for each additional term. Because these terms are incorporated within a larger optimization framework with other objectives built in, other features, such as avoiding occlusions caused by the manipulation robot, are automatically exhibited alongside these viewpoint adaptations.</p><p>Geometrically Dictated Viewpoint Adaptations. We incorporate geometrically dictated viewpoint adaptations into the RelaxedIK optimization framework using three objective terms: two that encourage the camera position to lie between the inner and outer visibility cones outlined in &#167;IV-A, and one that brings the camera closer when the search method deems it necessary.</p><p>The outer and inner visibility cone terms are:</p><p>Here, &#920;<hi rend="subscript">m</hi> and &#920;<hi rend="subscript">c</hi> refer to the degrees of freedom of &#920; corresponding to the manipulation robot and camera robot, respectively; FK(.) refers to a function that returns the rotation frame of the end-effector given a joint configuration and the forward kinematics model of the arm; and &#923;(.) refers to a function that returns the "forward" vector of the input rotation frame. These terms use Groove loss parameters of t = -3.0, d = 60.0, c = 1e14, f = 0.00001, g = 10.0, which encourage both terms to be less than zero.</p><p>The camera distance objective is:</p><p>Here, FK(.) is a function that returns the position of the end-effector given a joint configuration and the forward kinematics model of the arm, and d refers to a goal distance.
This term uses loss function parameters t = 0.0, d = 2.0, c = 0.5, f = 35.0, g = 2.0, which pull the objective term output to zero.</p><p>Semantically Dictated Viewpoint Adaptations. Semantically dictated viewpoint adaptations are handled in our method by allowing the user to provide a sparse directional input that dictates where the camera should move. As an objective term, this is supported by pulling the camera toward a new location at each update using the following term:</p><p>Here, c is the camera location at the previous update and g is the directional input specified by the user (represented in the camera's local frame). The magnitude of the directional input vector g, together with a scalar &#955;, sets the sensitivity of the manual inputs. When the user is not providing manual inputs, g is set to [0, 0, 0]<hi rend="superscript">T</hi>. This term uses loss function parameters t = 0.0, d = 2.0, c = 0.5, f = 35.0, g = 2.0, which pull the objective term output to zero.</p><p>Visual Exploration. Visual exploration is supported in our shared-control method by allowing the user to manually move the visual target, and is formulated as the objective term:</p><p>Here, t denotes the visual target point, v denotes the viewpoint vector pointing out of the front of the camera's focal point, dis(., .) is a function that returns the orthogonal distance between its point and line-segment arguments, respectively, and &#947; is a large scalar used to cast the line segment out along v. By default, the visual target point t is set to the end-effector point on the manipulation robot. However, when users enter visual exploration mode, they are able to move the visual target point t around by rotating a motion controller. This term uses loss function parameters t = 0.0, d = 2.0, c = 0.1, f = 10.0, g = 2.0, which pull the objective term output to zero.</p></div>
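<div xmlns="http://www.tei-c.org/ns/1.0"><p>To show how the quoted parameter sets shape each term, the Python sketch below assumes the Groove loss takes the form used in the public RelaxedIK code, &#8722;exp(&#8722;(x &#8722; t)^d / (2c&#178;)) + f&#183;(x &#8722; t)^g; this form is our assumption about <ref type="bibr">[29,</ref><ref type="bibr">30]</ref> rather than something stated in this section. It also sketches the look-at term as the distance from the visual target t to a segment that starts at the camera position and extends &#947; units along the viewing direction v; the segment's start point, the function names, and the default &#947; are hypothetical.</p><ab>
```python
import math


def groove_loss(x, t, d, c, f, g):
    """Assumed Groove loss: a dip centered at t plus a polynomial wall.

    Combined with a very large c, a large even exponent d (e.g., 60) turns the
    dip into a wide, flat-bottomed basin. Inputs far from t can overflow a
    float for large d or g; the values used here stay well inside range.
    """
    return -math.exp(-((x - t) ** d) / (2.0 * c ** 2)) + f * (x - t) ** g


# Parameter sets quoted in the text, as (t, d, c, f, g).
CONE_PARAMS = (-3.0, 60.0, 1e14, 0.00001, 10.0)  # encourages x below zero
DIST_PARAMS = (0.0, 2.0, 0.5, 35.0, 2.0)         # pulls x toward zero
LOOKAT_PARAMS = (0.0, 2.0, 0.1, 10.0, 2.0)       # pulls x toward zero


def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b (3D lists)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(x * x for x in ab)
    s = 0.0
    if denom > 0.0:
        # Project p onto the segment, clamping to its endpoints.
        s = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + s * x for ai, x in zip(a, ab)]
    return math.dist(p, closest)


def look_at_term(target, cam_pos, view_dir, gamma=100.0):
    """Hypothetical look-at term: target's distance to the camera's view ray."""
    far = [cp + gamma * v for cp, v in zip(cam_pos, view_dir)]
    return point_segment_distance(target, cam_pos, far)
```
</ab><p>Using a segment rather than an infinite line means that a camera facing directly away from the target is still penalized, since the closest point on the segment is then the camera position itself.</p></div>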
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. EVALUATIONS</head><p>We carried out two forms of evaluation to demonstrate the effectiveness of our dynamic camera shared-control method for remote telemanipulation. Below, we outline our prototype system and discuss the designs and findings of our evaluations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Prototype Details</head><p>We instantiated our shared-control camera method in a system, described below, designed to provide sufficient performance and safety to demonstrate its benefits in a user study.</p><p>Teleoperation Interface-In our system, we used the mimicry-control interface presented by Rakita et al. <ref type="bibr">[28]</ref> to control the manipulation robot for remote teleoperation. This interface was shown to allow novice users to control a robot arm with full 6-DOF Cartesian control more effectively than other interfaces. We used HTC Vive motion controllers as the motion input devices to capture user input at 80 Hz. One controller moved the manipulation robot, while the other allowed for camera translation adjustments using sparse motion controls.</p><p>Robots-Our system used a 6-DOF Universal Robots UR5 robot as the manipulation robot and a 7-DOF Rethink Robotics Sawyer robot as the camera robot to match the system used in prior work <ref type="bibr">[30]</ref>, which served as one of our comparison cases.</p><p>We used the RelaxedIK solver<ref type="foot">2</ref>, into which we incorporated our additional camera objectives and viewpoint adaptation types. To achieve sufficient real-time performance with both the manipulation arm and camera arm degrees of freedom in a single optimization, we used a version of RelaxedIK implemented in the Julia programming language, which is substantially faster than its Python alternative. Solutions are returned in 8 ms (125 Hz) on an HP Pavilion laptop with a 2.6 GHz Intel Core i7-6700HQ CPU and 32 GB of RAM.
The optimization used the SLSQP solver provided by NLopt, and gradients were computed for the solver with forward-mode automatic differentiation via the ForwardDiff Julia package <ref type="bibr">[31]</ref>.</p><p>Our system used two Logitech 930e webcams attached to the camera arm's end-effector; one streamed high-definition video over USB to a large-screen monitor, while the other looked for ArUco markers to assess whether the manipulation arm's end-effector could be seen.</p></div>
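<div xmlns="http://www.tei-c.org/ns/1.0"><p>The forward-mode automatic differentiation mentioned above can be illustrated with dual numbers, which carry a derivative alongside each value. The Python sketch below is not the ForwardDiff API; the class and helper names are hypothetical, and the two-link planar arm is a toy function chosen only to show the mechanics. Each evaluation of f with one joint seeded to 1.0 yields one partial derivative, so this sketch takes one pass per degree of freedom (ForwardDiff can evaluate several partials per pass through its chunk mode, which the sketch does not attempt).</p><ab>
```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value together with its derivative along one seeded direction."""
    val: float
    dot: float = 0.0

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)

    __rmul__ = __mul__


def dcos(x):
    """cos lifted to dual numbers: d/dx cos(x) = -sin(x)."""
    return Dual(math.cos(x.val), -math.sin(x.val) * x.dot)


def gradient(f, theta):
    """Forward-mode gradient: one evaluation of f per input dimension."""
    grad = []
    for j in range(len(theta)):
        seeded = [Dual(v, 1.0 if k == j else 0.0) for k, v in enumerate(theta)]
        grad.append(f(seeded).dot)
    return grad


def ee_x(theta):
    """Toy example: x-coordinate of a planar two-link arm with unit links."""
    t1, t2 = theta
    return dcos(t1) + dcos(t1 + t2)
```
</ab></div>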
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Assessing Geometrically-dictated Viewpoint Adjustments</head><p>In our first evaluation, we designed a testbed of tasks to assess how well our proposed viewpoint adjustment algorithm performs in the presence of occlusions or obfuscations. Following the procedure outlined in &#167;IV-A, the camera-in-hand robot started in a random configuration in simulation, with the goal of finding a clear view of the manipulation robot's end-effector as quickly as possible.</p><p>Tasks-We designed three tasks for our testbed, each requiring the camera to find the robot's end-effector: (1) as it took something out of the bottom shelf of a refrigerator, requiring a forward view at the level of the bottom of the refrigerator; (2) as it screwed a light bulb into a ceiling fixture, requiring a view from below; and (3) as it reached into a box, requiring a view from above.</p><p>Procedure-For each task, our evaluation started the camera robot in 500 random initial configurations. An initial configuration was deemed acceptable if neither robot started in collision with the other robot or with the environment. The manipulation robot started in the same configuration for all 500 trials of a task, maintaining the same end-effector pose throughout each trial and moving its other joints only if redundancy was present and the optimization deemed it necessary, as outlined in &#167;V. 
A trial was deemed successful only if (a) a viewpoint not blocked by objects in the environment or by the manipulation arm itself, as determined through ray-casting in the simulated scene, was found in less than ten seconds; and (b) no collisions occurred between the two robots or with statically modeled environment objects.</p><p>Robots-We used two robot pairs for this evaluation: (1) the UR5 as the manipulation arm and the Sawyer as the camera robot, and (2) two 7-DOF Jaco arms.</p><p>Results-Our results, summarized in Table <ref type="table">I</ref>, show that our geometric viewpoint adaptation algorithm found a clear viewpoint on all tasks for both robot pairs in almost all cases. The failures occurred when the random initial configuration placed the robots very close to a collision state.</p></div>
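<div xmlns="http://www.tei-c.org/ns/1.0"><p>Success criterion (a) relies on ray-casting in the simulated scene. As an illustration only, the Python sketch below approximates occluders as spheres. This is an assumption: the simulator's actual geometry is not described here. The sketch tests whether the line segment from the camera to the end-effector passes through any sphere, then applies both success criteria to hypothetical trial records. The names Sphere, segment_blocked, trial_succeeded, and success_rate are hypothetical.</p><ab><![CDATA[
```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Sphere:
    """Hypothetical occluder approximation (the real scene geometry may differ)."""
    center: Vec3
    radius: float


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def segment_blocked(camera: Vec3, target: Vec3, obstacles: Sequence[Sphere]) -> bool:
    """True if the straight line of sight camera -> target passes through any sphere."""
    d = _sub(target, camera)
    dd = _dot(d, d)
    for s in obstacles:
        # Closest point on the segment to the sphere center, with t clamped to [0, 1].
        t = 0.0 if dd == 0.0 else max(0.0, min(1.0, _dot(_sub(s.center, camera), d) / dd))
        closest = (camera[0] + t * d[0], camera[1] + t * d[1], camera[2] + t * d[2])
        diff = _sub(closest, s.center)
        if _dot(diff, diff) < s.radius ** 2:
            return True
    return False


def trial_succeeded(time_to_clear_view: Optional[float], collided: bool,
                    limit_s: float = 10.0) -> bool:
    """Criteria (a) and (b): clear view in under limit_s seconds and no collisions.
    time_to_clear_view is None when no clear view was ever found."""
    return (time_to_clear_view is not None
            and time_to_clear_view < limit_s
            and not collided)


def success_rate(trials: Iterable[Tuple[Optional[float], bool]]) -> float:
    """trials: (time_to_clear_view, collided) pairs, one per random start."""
    trials = list(trials)
    return sum(trial_succeeded(t, c) for t, c in trials) / len(trials)
```
]]></ab><p>Links of the manipulation arm can be added as occluders in the same way, but any shape representing the end-effector target itself should be excluded, since the line of sight always ends inside it.</p></div>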
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. User Study</head><p>In this section, we present a user study in which we compared our telemanipulation shared-control method against alternative approaches.</p><p>Hypotheses-We hypothesized that (H.1.) our telemanipulation system, which considers geometric, semantic, and exploration aspects of viewpoint-manipulation coordination, would significantly improve performance and perceptual outcomes over a remote telemanipulation alternative that uses a moving camera without these considerations; and (H.2.) a state-of-the-art co-located telemanipulation system would outperform our remote telemanipulation method on performance and user experience, because participants could use depth perception when viewing the environment and could more effectively coordinate viewpoint and manipulation by moving their own heads.</p><p>Experimental Design-To test our hypotheses, we designed a 3 &#215; 1 within-participants experiment in which participants completed the in-home tasks outlined below using three control paradigms in a counterbalanced order: mimicry-control (MC), autonomous dynamic camera (ADC), and viewpoint shared-control manipulation system (VSMS).</p><p>(1) Mimicry-control. The user stood behind the robot and guided it with their own hand motions <ref type="bibr">[28]</ref>, and the robot used an optimization-based motion retargeting solution to mimic the user's hand pose in real time. Because the user is co-located with the robot in its workspace, this condition shows how people perform our experimental tasks when they can use their own stereo vision (including depth perception) and control their own viewpoint with their heads.</p><p>(2) Autonomous dynamic camera. This paradigm followed prior work by Rakita et al. 
<ref type="bibr">[30]</ref>, which showed the importance of using a moving camera; the moving camera outperformed other viewing alternatives for remote manipulation, such as an array of static cameras and an end-effector camera. However, this system did not consider environment geometry, handle manual viewpoint shifts given task context, or afford visual exploration.</p><p>(3) Viewpoint shared-control manipulation system. This paradigm used the methods discussed throughout this work.</p><p>Study Setup-To simulate a remote teleoperation setting, the camera robot was placed next to the manipulation robot, and physical dividers separated the participants from the robot workspace. Users controlled the robots based on what they saw on the screen. The experimenter sat next to the participants.</p><p>Study Tasks-To support the generalizability of our findings to a wide range of telemanipulation tasks, we developed three tasks that followed a home-care scenario in which participants would log in to a telemanipulation system to care for a friend or family member:</p><p>(1) Sock Sorting. Users picked two pairs of white socks out of a bin that also contained black socks and placed them into another bin. This task involved both geometric viewpoint reasoning, i.e., maintaining a viewpoint from above with a clear view into the bin, and semantic viewpoint reasoning, i.e., viewing the correct part of the bin to locate the white socks.</p><p>(2) Table Preparation. Users set the table by retrieving dinner items from a four-cube (2 &#215; 2) organizer with compartments measuring 12 &#215; 12 &#215; 12. Participants retrieved a plate from the top-left compartment, a fork from the top-right compartment, and a spoon from the bottom-left compartment. The forks and spoons were placed upright in cups on their respective shelves. This task also involved geometric viewpoint reasoning, i.e., 
maintaining a view of the end-effector as it reached into the compartments; visual exploration, i.e., surveying the items on the shelves; and semantic viewpoint reasoning, i.e., refining the viewpoint within the compartments to specify a proper grasp.</p><p>(3) Pill Organization. Users picked up a pill bottle and poured a small pill into each of three containers: a bowl, a cup, and a real pill tray. The containers were chosen to increase in difficulty, so that skill could be assessed along a gradient of difficulty within a single task. This task involved semantic viewpoint reasoning to obtain a sufficient view of the pouring motion. A variant of this task was also used in prior work <ref type="bibr">[30]</ref>, allowing us to compare our results with those of that work.</p><p>Study Procedure-A male experimenter obtained informed consent and described the study. Participants then viewed a training video on the robot-control approach and the motion controller. Next, for each condition, participants (1) received ten minutes of training on that telemanipulation condition through videos and an interactive training session, (2) performed the three tasks outlined in &#167;VI-C using that condition, and (3) filled out a questionnaire about that condition. This process repeated until all conditions were completed, with short breaks between tasks while the experimenter reset the robot to its initial configuration and set up the workspace for the next task. Upon completion, participants responded to a demographics survey and received compensation.</p><p>Measures-To assess performance, we measured task completion time over five tasks: sock sorting, table preparation, and pill organization with each of the three containers. Participants had a maximum of five minutes for each task. 
To measure participant perceptions, we administered a questionnaire based on prior research on measuring user preferences and teamwork with a robot <ref type="bibr">[15,</ref><ref type="bibr">20]</ref>, including scales on goal understanding, trust, ease of use, robot intelligence, fluency, and predictability (Table <ref type="table">II</ref>), each rated on a seven-point scale.</p><p>Results-We analyzed data from all measures using one-way repeated-measures analyses of variance (ANOVA) with control method as the within-participants variable. Figure <ref type="figure">VI-C</ref> shows data and test results from all objective and subjective measures. Our analyses provided full support for both hypotheses.</p><p>Discussion-Our results support H.1, showing that our telemanipulation system significantly improved results over the autonomous dynamic camera method on all tasks and on many perceptual measures. We observed wide variance in the ADC condition results across all tasks. Because the camera in the ADC condition only moved in response to the motion of the manipulation robot, without considering the task or environment, its motion and the resulting viewpoints could differ substantially across participants, even for the same task. Because our VSMS condition considered the task and environment geometry, viewpoint quality depended far less on chance, contributing to improved results with lower variance.</p><p>We expected mimicry-control to outperform our method on all tasks and perceptual measures, but we observed significantly better results only on the table preparation task and a marginal effect on the tray pill organization task. We believe that incorporating depth perception into our method will further close the performance gap between remote and co-located telemanipulation on these tasks.</p></div>
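<div xmlns="http://www.tei-c.org/ns/1.0"><p>The analysis above is a one-way repeated-measures ANOVA with control method as the within-participants factor. For readers unfamiliar with it, the sketch below computes the F statistic in plain Python. Each row holds one participant's scores under the three conditions. This is a textbook computation, not the authors' analysis script. The function and result names are hypothetical, and the sketch applies no sphericity correction.</p><ab><![CDATA[
```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RMAnovaResult:
    f: float
    df_condition: int
    df_error: int
    ss_condition: float
    ss_subject: float
    ss_error: float


def rm_anova_oneway(data: Sequence[Sequence[float]]) -> RMAnovaResult:
    """One-way repeated-measures ANOVA (no sphericity correction).

    data[i][j] is participant i's score under condition j (e.g., MC, ADC, VSMS);
    the design must be complete: every participant has a score in every condition.
    """
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    # Removing between-participant variance is what distinguishes this from a
    # between-groups one-way ANOVA: stable individual differences leave the error term.
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_err = ss_total - ss_cond - ss_subj
    df_cond = k - 1
    df_err = (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_err / df_err)
    return RMAnovaResult(f, df_cond, df_err, ss_cond, ss_subj, ss_err)
```
]]></ab><p>With made-up completion times [[10, 12, 14], [11, 13, 18], [9, 11, 10]] (three participants, three conditions; not study data), the sketch returns F(2, 4) = 4.0. A p-value follows from the F distribution with these degrees of freedom, for example via scipy.stats.f.sf if SciPy is available.</p></div>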
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VII. GENERAL DISCUSSION</head><p>In this paper, we presented a remote telemanipulation shared-control method in which the viewpoint adapts to support effective task execution in complex environments. We introduced a novel viewpoint adjustment algorithm designed to automatically mitigate occlusions caused by workspace geometry, and showed how our method addresses visual exploration and context-specific visual challenges. In this section, we outline the limitations of our current methods and discuss how our results could apply to a wide range of robotics applications.</p><p>Limitations-Our method has limitations that suggest future extensions. First, our method does not afford depth perception to users of the on-screen interface. We will explore techniques to elicit depth effects, such as motion parallax or stereo vision, and compare them against mimicry-control, in which users rely on their own depth perception while manipulating.</p><p>Second, our shared-control camera method improves the remote user's view but does not use the resulting awareness to ease manipulation. We plan to explore ways to use the rich, unoccluded data stream to supplement the control algorithm on the manipulation robot. For instance, while our motion optimization framework affords collision avoidance between the two arms and with static objects modeled ahead of time in the environment, both robots can still collide with dynamic objects. We will explore ways of providing dynamic collision avoidance given the clear external view of the manipulation point and other parts of the environment.</p><p>Our geometric occlusion avoidance algorithm also has known limitations. For instance, the forward "manipulation" vector may not be an accurate proxy for the robot's approach direction in all cases, especially for robots with a flexible wrist. 
This limitation could be mitigated by estimating the actual approach direction from the end-effector positions over a window of recent time steps. We will investigate such alternatives, as well as ways of incorporating mapping, additional geometric sensing, and data-driven techniques for finding effective viewpoints.</p><p>Conclusion-Our work highlights the potential of using a moving camera that considers the task and environment as part of a robot manipulation system. Our results indicate that an external viewpoint able to coordinate with the manipulation point, subject to the environment and task, plays an integral role in manipulation performance. This finding could apply not only to telemanipulation systems but also to fully autonomous systems, where adaptable viewpoints could influence the quality of learned grasping or manipulation policies. We plan to investigate the possible benefits of this viewing paradigm in real-time telemanipulation, shared-control, and supervisory-control settings for applications such as remote home care, telenursing, and nuclear materials handling, and to explore how the methods discussed in this work can inform fully autonomous motion and task policies.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>Open-source code for the proposed methods is available at https://github.com/uwgraphics/relaxed_ik.</p></note>
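<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Limitations discussion proposes estimating the approach direction from end-effector positions over a window of time. Below is a minimal Python sketch of one way to do this. It assumes the estimate is simply the normalized displacement between the oldest and newest positions in a fixed-size window. When the end-effector has barely moved, it falls back to a supplied vector such as the forward manipulation vector. The class name, window size, and travel threshold are hypothetical choices, not values from this work.</p><ab><![CDATA[
```python
import math
from collections import deque
from typing import Deque, Optional, Tuple

Vec3 = Tuple[float, float, float]


class ApproachDirectionEstimator:
    """Hypothetical sketch: estimate the end-effector approach direction from its
    recent positions instead of a fixed forward axis on the tool."""

    def __init__(self, window: int = 10, min_travel: float = 1e-3):
        if window < 2:
            raise ValueError("window must hold at least two positions")
        # deque with maxlen drops the oldest position automatically.
        self._positions: Deque[Vec3] = deque(maxlen=window)
        self._min_travel = min_travel

    def update(self, position: Vec3) -> None:
        """Record the end-effector position for the current time step."""
        self._positions.append(position)

    def direction(self, fallback: Optional[Vec3] = None) -> Optional[Vec3]:
        """Unit vector from the oldest to the newest position in the window.

        Returns `fallback` (e.g., the forward "manipulation" vector) when fewer
        than two positions are stored or the end-effector moved less than
        min_travel, since the displacement is then dominated by noise."""
        if len(self._positions) < 2:
            return fallback
        first, last = self._positions[0], self._positions[-1]
        d = tuple(b - a for a, b in zip(first, last))
        norm = math.sqrt(sum(c * c for c in d))
        if norm < self._min_travel:
            return fallback
        return tuple(c / norm for c in d)
```
]]></ab><p>Using only the window endpoints keeps the estimate cheap; a least-squares line fit over all positions in the window would be less sensitive to jitter at the endpoints, at slightly higher cost.</p></div>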
		</body>
		</text>
</TEI>
