Title: Cueing Sequential 6DoF Rigid-Body Transformations in Augmented Reality
Augmented reality (AR) has been used to guide users in multi-step tasks, providing information about the current step (cueing) or future steps (precueing). However, existing work exploring cueing and precueing a series of rigid-body transformations requiring rotation has only examined one-degree-of-freedom (DoF) rotations, alone or in conjunction with 3DoF translations. In contrast, we address sequential tasks involving 3DoF rotations and 3DoF translations. We built a testbed to compare two types of visualizations for cueing and precueing steps. In each step, a user picks up an object, rotates it in 3D while translating it in 3D, and deposits it in a target 6DoF pose. Action-based visualizations show the actions needed to carry out a step, and goal-based visualizations show the desired end state of a step. We conducted a user study to evaluate these visualizations and the efficacy of precueing. Participants performed better with goal-based visualizations than with action-based ones, and most effectively with goal-based visualizations aligned with the Euler axis. However, only a few of our participants benefited from precues, most likely because of the cognitive load of 3D rotations.
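By Euler's rotation theorem, any change of orientation can be achieved by a single rotation about one axis, which is what a goal-based cue aligned with the Euler axis visualizes. Below is a minimal sketch of how that axis and angle might be computed from a current and goal orientation; it is illustrative only, not the paper's implementation, and all names are our own.

```python
# Sketch: the Euler (rotation) axis and angle between a current and goal
# orientation -- the axis a goal-based cue could be aligned with.
# Illustrative assumption, not the paper's code.
import numpy as np
from scipy.spatial.transform import Rotation as R

def euler_axis_angle(r_current: R, r_goal: R):
    """Axis and angle of the single rotation taking r_current to r_goal."""
    r_rel = r_goal * r_current.inv()   # relative rotation in the world frame
    rotvec = r_rel.as_rotvec()         # axis * angle, in radians
    angle = np.linalg.norm(rotvec)
    axis = rotvec / angle if angle > 1e-9 else np.array([0.0, 0.0, 1.0])
    return axis, angle

# Example: the cue for a 90-degree rotation about a tilted axis.
current = R.identity()
goal = R.from_rotvec(np.deg2rad(90) * np.array([1.0, 1.0, 0.0]) / np.sqrt(2))
axis, angle = euler_axis_angle(current, goal)
print(axis, np.rad2deg(angle))   # -> [0.707 0.707 0.   ] 90.0
```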
Award ID(s):
2037101
PAR ID:
10482129
Publisher / Repository:
IEEE
Date Published:
2023
Journal Name:
Proceedings of the 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)
ISBN:
979-8-3503-2838-7
Page Range / eLocation ID:
356 to 365
Format(s):
Medium: X
Location:
Sydney, Australia
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a prototype virtual reality user interface for robot teleoperation that supports high-level specification of 3D object positions and orientations in remote assembly tasks. Users interact with virtual replicas of task objects. They asynchronously assign multiple goals in the form of 6DoF destination poses without needing to be familiar with specific robots and their capabilities, and manage and monitor the execution of these goals. The user interface employs two different spatiotemporal visualizations for assigned goals: one represents all goals within the user’s workspace (Aggregated View), while the other depicts each goal within a separate world in miniature (Timeline View). We conducted a user study of the interface without the robot system to compare how these visualizations affect user efficiency and task load. The results show that while the Aggregated View helped the participants finish the task faster, the participants preferred the Timeline View. 
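The interface above revolves around asynchronously assigned 6DoF goals that users can manage and monitor. A hypothetical sketch of the kind of goal queue such a system might maintain follows; the structure and names are our assumptions, not the authors' code.

```python
# Hypothetical sketch of a queue of asynchronously assigned 6DoF goals,
# as a VR teleoperation interface like the one above might maintain.
# Not the authors' implementation.
from dataclasses import dataclass, field
from enum import Enum, auto

class GoalState(Enum):
    PENDING = auto()
    EXECUTING = auto()
    DONE = auto()

@dataclass
class Goal6DoF:
    object_id: str
    position: tuple       # (x, y, z) in the shared task frame
    orientation: tuple    # unit quaternion (x, y, z, w)
    state: GoalState = GoalState.PENDING

@dataclass
class GoalQueue:
    goals: list = field(default_factory=list)

    def assign(self, goal: Goal6DoF):
        """Asynchronous assignment: the user never waits on the robot."""
        self.goals.append(goal)

    def next_pending(self):
        return next((g for g in self.goals if g.state is GoalState.PENDING), None)
```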
  2. We assess the accuracy of Structure-from-Motion/Multiview stereo (SM) terrain models acquired ad hoc or without high-resolution ground control, to evaluate their use as a base for inexpensive 3D bedrock geologic mapping. Our focus is on techniques that can be used in field projects without heavy or expensive equipment, and without placing ground control in logistically challenging sites (e.g., steep cliff faces or remote settings). We use a Terrestrial Light Detection and Ranging (LiDAR) survey as the basis for comparing two types of SM models: (1) models developed from images acquired during a chartered airplane flight, with ground control referenced to natural objects located on Google Earth scenes; and (2) models from drone flights, with a georeference established solely from camera positions located by conventional, differentially corrected Global Navigation Satellite System (GNSS) receivers. We find that all our SM models are indistinguishable in scale from the LiDAR reference model. The SM models do, however, show rigid-body translations and rotations, with translations generally within the 1–5 m size of the natural objects used for ground control, the resolution of the GNSS receivers, or both. The rigid-body rotations can be attributed to a poor imaging plan, which can be avoided with survey planning. Analyses of point densities in the various models show a limitation of Terrestrial LiDAR point clouds as a mapping base: resolution falls off rapidly with distance. In contrast, SM models are characterized by relatively uniform point densities controlled by camera optics, the number of images, and the distance from the target. This uniform density is the product of the Multiview stereo step in SM processing, which fills areas between key points, and it is important for bedrock geologic mapping because it affords direct interpretation on a point cloud at a relatively uniform scale throughout a model. Our results indicate that these simple methods allow SM models to be constructed with accuracy in the range of conventional GNSS and with resolutions at the submeter, even centimeter, scale, depending on data acquisition parameters. Thus, SM models can, and should, serve as a base for high-resolution geologic mapping, particularly in steep terrain where conventional techniques fail. Our SM models appear to provide accurate visualizations of geologic features over km scales, allowing detailed geologic mapping in 3D with relative accuracy at the decimeter or centimeter level and absolute positioning within the 2–5 m precision of GNSS: a geometric precision that will allow unprecedented new studies of any geologic system where geometry is the fundamental data.
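One standard way to quantify the rigid-body translation and rotation between an SM model and a LiDAR reference is a least-squares (Kabsch) fit over matched points. The sketch below shows that computation; it is an assumption about method, not the authors' actual comparison workflow.

```python
# Sketch: least-squares (Kabsch) fit of the rigid-body offset between an
# SM model and a LiDAR reference, given matched points. Illustrative only.
import numpy as np

def rigid_fit(src: np.ndarray, dst: np.ndarray):
    """Best-fit rotation R and translation t with dst_i ~ R @ src_i + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflection
    Rmat = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - Rmat @ src_c
    return Rmat, t

# The offset is then summarized by |t| (metres) and the rotation angle
# arccos((trace(R) - 1) / 2).
```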
  3. For flexible goal-directed behavior, prioritizing and selecting a specific action among multiple candidates is often important. Working memory has long been assumed to play a role in prioritization and planning, bridging cross-temporal contingencies during action selection. However, studies of working memory have mostly focused on memory for single components of an action plan, such as a rule or a stimulus, rather than on the management of all of these elements during planning. It is therefore not known how post-encoding prioritization and selection operate on the entire profile of representations for prospective actions. Here, we assessed how such control processes unfold over action representations, highlighting the role of conjunctive representations that nonlinearly integrate task-relevant features during the maintenance and prioritization of action plans. On each trial, participants prepared two independent rule-based actions simultaneously and were then retro-cued to select one as their response. Before the start of the trial, one rule-based action was randomly designated high priority by a cue indicating that it was more likely to be tested. We found that both full action plans were maintained as conjunctive representations during action preparation, regardless of priority. During output selection, however, the conjunctive representation of the high-priority action plan was more strongly enhanced and more readily selected as an output. Further, the strength of the high-priority conjunctive representation was associated with behavioral interference when the low-priority action was tested. Thus, multiple alternative upcoming actions were maintained as integrated representations and served as the target of post-encoding attentional selection mechanisms that prioritize and select an action from within working memory.
  4. Reasoning about 3D objects based on 2D images is challenging due to variations in appearance caused by viewing the object from different orientations. Tasks such as object classification are invariant to 3D rotations, while others, such as pose estimation, are equivariant. However, imposing equivariance as a model constraint is typically not possible with 2D image input, because we do not have an a priori model of how the image changes under out-of-plane object rotations. The only SO(3)-equivariant models that currently exist require point-cloud or voxel input rather than 2D images. In this paper, we propose a novel architecture based on icosahedral group convolutions that reasons in SO(3) by learning a projection of the input image onto an icosahedron. The resulting model is approximately equivariant to rotation in SO(3). We apply this model to object pose estimation and shape classification tasks and find that it outperforms reasonable baselines.
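Operationally, SO(3) equivariance means that rotating the input rotates the output in lockstep: f(Rx) = R f(x). The toy check below illustrates the property on a point-cloud centroid; the paper's model operates on 2D images and is only approximately equivariant, so this is a conceptual sketch, not the proposed architecture.

```python
# Toy check of SO(3) equivariance, f(R x) = R f(x), using a point-cloud
# centroid as a trivially equivariant f. Conceptual sketch only.
import numpy as np
from scipy.spatial.transform import Rotation

def f(points: np.ndarray) -> np.ndarray:
    """Centroid of a point cloud: an equivariant (vector-valued) feature."""
    return points.mean(axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))               # a random point cloud
Rmat = Rotation.random(random_state=0).as_matrix()

lhs = f(x @ Rmat.T)                         # rotate input, then apply f
rhs = Rmat @ f(x)                           # apply f, then rotate output
assert np.allclose(lhs, rhs)
```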
  5. Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose: if the desired place pose changes, then the place action should transform accordingly. A recently proposed pick and place framework known as Transporter Net (Zeng et al., 2021) captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all of them. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. We evaluate the new model empirically and show that it is much more sample-efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.
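The symmetry at issue can be stated concretely: applying a planar rigid transform g to the scene should map the optimal pick pose p to the composition g∘p. The small sketch below shows that composition; it is illustrative only, not Transporter Net or Equivariant Transporter Net code.

```python
# Sketch of planar pick-and-place symmetry: moving the scene by g moves the
# optimal pick pose to g composed with the old pose. Illustrative only.
import numpy as np

def se2(theta: float, tx: float, ty: float) -> np.ndarray:
    """Homogeneous 3x3 matrix for a planar rotation plus translation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

# Optimal pick pose (position + heading) learned for some object placement.
pick = se2(np.deg2rad(30), 0.40, 0.10)

# The object is moved by g; equivariance says the new optimal pick is just
# g @ pick -- no new demonstrations or relearning required.
g = se2(np.deg2rad(90), -0.20, 0.05)
pick_after = g @ pick
```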