Mobile Augmented Reality (AR) offers a powerful way to provide spatially aware guidance for real-world applications. In many cases, these applications involve the configuration of a camera or articulated subject, asking users to navigate several spatial degrees of freedom (DOF) at once. Most guidance for such tasks relies on decomposing available DOF into subspaces that can be more easily mapped to simple 1D or 2D visualizations. Unfortunately, different factorizations of the same motion often map to very different visual feedback, and finding the factorization that best matches a user’s intuition can be difficult. We propose an interactive approach that infers rotational degrees of freedom from short user demonstrations. Users select one or two DOFs at a time by demonstrating a small range of motion, which we use to learn a rotational frame that best aligns with user control of the object. We show that deriving visual feedback from this learned rotational frame leads to improved task completion times on 6DOF guidance tasks compared to standard default reference frames used in most mixed reality applications.
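As a rough illustration of the demonstration step described above, the sketch below (a construction of mine, not the paper's code) fits a single dominant rotation axis to a short sequence of logged orientation quaternions; the function name and logging format are assumptions.

```python
# Illustrative sketch (not the paper's implementation): estimating a single
# rotational DOF from a short user demonstration, assuming the demo is
# logged as a sequence of orientation quaternions.
import numpy as np
from scipy.spatial.transform import Rotation as R

def infer_rotation_axis(quats):
    """Fit the dominant rotation axis from consecutive orientation samples."""
    rots = R.from_quat(quats)                 # demo orientations, shape (N, 4)
    # Relative rotations between consecutive samples, as rotation vectors
    # (axis * angle), so larger motions weigh the axis estimate more heavily.
    deltas = (rots[:-1].inv() * rots[1:]).as_rotvec()
    # Flip vectors into a consistent hemisphere before averaging.
    ref = deltas[np.argmax(np.linalg.norm(deltas, axis=1))]
    deltas[np.einsum('ij,j->i', deltas, ref) < 0] *= -1
    axis = deltas.sum(axis=0)
    return axis / np.linalg.norm(axis)        # unit axis of the learned DOF
```

Averaging the consecutive rotation deltas weights larger motions more heavily, which matches the intuition that the user's deliberate demonstration motion should dominate the inferred DOF.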
Hierarchical motion perception as causal inference
Abstract: Motion can only be defined relative to a reference frame; yet it remains unclear which reference frame guides perception. A century of psychophysical studies has produced conflicting evidence: retinotopic, egocentric, world-centric, or even object-centric. We introduce a hierarchical Bayesian model mapping retinal velocities to perceived velocities. Our model mirrors the structure of the world, in which visual elements move within causally connected reference frames. Friction renders velocities in these reference frames mostly stationary, formalized by an additional delta component (at zero) in the prior. Inverting this model automatically segments visual inputs into groups, and groups into supergroups, progressively inferring structured reference frames, and perceives motion in the appropriate reference frame. Critical model predictions are supported by two experiments, and fitting our model to the data allows us to infer the subjective set of reference frames used by individual observers. Our model provides a quantitative normative justification for key Gestalt principles, providing inspiration for building better models of visual processing in general.
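To make the delta-at-zero prior concrete, here is a minimal worked example (my illustration, not the authors' model code): a spike-and-slab prior over a single reference-frame velocity, inverted with Bayes' rule to give the probability that the frame is stationary. All parameter values are made up.

```python
# Minimal worked example of the "friction" prior: a delta component at zero
# velocity plus a broad Gaussian slab. Given one noisy velocity measurement,
# Bayes' rule yields the probability that the frame is actually stationary.
import numpy as np
from scipy.stats import norm

def p_stationary(v_obs, pi0=0.7, sigma_prior=5.0, sigma_noise=1.0):
    """Posterior probability that the true velocity is exactly 0 (deg/s)."""
    # Evidence under the delta component: measurement is pure noise around 0.
    ev_still = pi0 * norm.pdf(v_obs, 0.0, sigma_noise)
    # Evidence under the slab: prior and noise variances add.
    ev_moving = (1 - pi0) * norm.pdf(v_obs, 0.0,
                                     np.hypot(sigma_prior, sigma_noise))
    return ev_still / (ev_still + ev_moving)

print(p_stationary(0.3))   # near-zero measurement -> likely stationary
print(p_stationary(4.0))   # large measurement -> likely moving
```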
- Award ID(s): 2143440
- PAR ID: 10584773
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Nature Communications
- Volume: 16
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract: Although modern global geometric reference frames (GRFs) such as the International Terrestrial Reference Frame (ITRF) can be used anywhere on Earth, regional reference frames (RRFs) are still used to densify geodetic control and optimize solutions for continental-scale areas and national purposes. Such RRFs can be formed by densifying the ITRF, utilizing GPS / GNSS stations common to both the ITRF and the RRF. It is possible to attach a RRF to a GRF by ensuring that some or all of the coefficients of the trajectory models in the RRF are ‘inherited’ from the trajectory models that define the GRF. This can be done on an epoch-by-epoch basis, or (our preference) via transformations that operate simultaneously in space and time. This paper documents inconsistencies in the densification of ITRF that arise when the common stations’ trajectory models ignore periodic displacements. This results in periodic coordinate biases in the RRF. We describe a generalized procedure to minimize this inconsistency when realizing any RRF aligned to the ITRF or any other ‘primary’ frame. We show the method used to realize the Argentine national frame Posiciones Geodésicas Argentinas (POSGAR) and discuss our results. Discrepancies in the periodic motion amplitudes in the vertical were reduced from 4 mm to less than 1 mm for multiple stations after applying our technique. We also propose adopting object-oriented programming terminology to describe the relationship between different reference frames, such as a regional and a global frame. This terminology assists in describing and understanding the hierarchy in geodetic reference frames.
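As a hedged sketch of the kind of station trajectory model at issue, the snippet below fits a position time series with a linear term plus annual and semiannual periodic terms; dropping the periodic columns from the fit is exactly the omission that produces the periodic coordinate biases described above. Function names and default periods are illustrative assumptions.

```python
# Sketch of a station trajectory model: linear motion plus annual and
# semiannual sin/cos terms, fit by ordinary least squares. Comparing the
# residuals of fits with periods=(1.0, 0.5) versus periods=() exposes the
# mm-level seasonal signal that an alignment ignoring periodicity absorbs.
import numpy as np

def fit_trajectory(t, x, periods=(1.0, 0.5)):      # t in decimal years
    """Least-squares fit of x(t) = x0 + v*t + sum of sin/cos per period."""
    cols = [np.ones_like(t), t]
    for P in periods:
        w = 2 * np.pi / P
        cols += [np.cos(w * t), np.sin(w * t)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    return coef, A @ coef                          # parameters, model values
```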
Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, severely limiting their performance and generalization capabilities. In this work, we overcome the limited data challenge by adapting pre-trained video diffusion models trained on internet-scale datasets to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.
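For context on the event side, the sketch below shows a common EVFI preprocessing step, packing asynchronous events into a fixed-size spatio-temporal voxel grid; this is a standard conditioning representation, assumed here for illustration rather than taken from the paper.

```python
# Illustrative preprocessing sketch (assumed, not from the paper): spread
# asynchronous event polarities bilinearly across a small number of temporal
# slices, yielding a dense tensor a learned interpolation model can consume.
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, H, W, bins=5):
    """xs, ys: int pixel coords; ts: timestamps; ps: polarities in {-1, +1}."""
    grid = np.zeros((bins, H, W), dtype=np.float32)
    # Normalize timestamps into [0, bins - 1].
    t = (ts - ts.min()) / max(ts.max() - ts.min(), 1e-9) * (bins - 1)
    lo = np.floor(t).astype(int)
    frac = t - lo
    # Each event contributes to its two nearest temporal slices.
    for b, w in ((lo, 1 - frac), (np.minimum(lo + 1, bins - 1), frac)):
        np.add.at(grid, (b, ys, xs), w * ps)
    return grid
```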
Motivated by the success of fractional-pixel motion in video coding, we explore the design of motion estimation with fractional-voxel resolution for compression of color attributes of dynamic 3D point clouds. Our proposed block-based fractional-voxel motion estimation scheme takes into account the fundamental differences between point clouds and videos, i.e., the irregularity of the distribution of voxels within a frame and across frames. We show that motion compensation can benefit from the higher-resolution reference and more accurate displacements provided by fractional precision. Our proposed scheme significantly outperforms comparable methods that use only integer motion. The proposed scheme can be combined with, and adds sizeable gains to, state-of-the-art systems that use transforms such as the Region Adaptive Graph Fourier Transform and the Region Adaptive Haar Transform.
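The sketch below illustrates the fractional-precision idea on the prediction side (my toy construction, not the proposed codec): a color attribute is predicted by trilinearly blending the eight integer voxels around a fractionally displaced position, renormalizing over occupied voxels to cope with the irregular voxel distribution the abstract highlights.

```python
# Toy sketch: motion compensation of a point-cloud attribute at
# fractional-voxel precision. The reference frame is a sparse dict
# {(i, j, k): attribute}; the predictor trilinearly blends the eight
# integer voxels around the fractionally displaced position.
import numpy as np
from itertools import product

def predict_attr(ref, voxel, mv):
    """Trilinear prediction at voxel + fractional motion vector mv."""
    p = np.asarray(voxel, float) + np.asarray(mv, float)
    base = np.floor(p).astype(int)
    f = p - base
    pred, wsum = 0.0, 0.0
    for d in product((0, 1), repeat=3):
        w = np.prod([f[i] if d[i] else 1 - f[i] for i in range(3)])
        nb = tuple(base + d)
        if nb in ref:                  # the voxel grid is sparse/irregular,
            pred += w * ref[nb]        # so renormalize over occupied voxels
            wsum += w
    return pred / wsum if wsum > 0 else None
```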
The robotics community continually strives to create robots that are deployable in real-world environments. Often, robots are expected to interact with human groups. To achieve this goal, we introduce a new method, the Robot-Centric Group Estimation Model (RoboGEM), which enables robots to detect groups of people. Much of the work reported in the literature focuses on dyadic interactions, leaving a gap in our understanding of how to build robots that can effectively team with larger groups of people. Moreover, many current methods rely on exocentric vision, where cameras and sensors are placed externally in the environment, rather than onboard the robot. Consequently, these methods are impractical for robots in unstructured, human-centric environments, which are novel and unpredictable. Furthermore, the majority of work on group perception is supervised, which can inhibit performance in real-world settings. RoboGEM addresses these gaps by being able to predict social groups solely from an egocentric perspective using color and depth (RGB-D) data. To achieve group predictions, RoboGEM leverages joint motion and proximity estimations. We evaluated RoboGEM against a challenging, egocentric, real-world dataset where both pedestrians and the robot are in motion simultaneously, and show RoboGEM outperformed two state-of-the-art supervised methods in detection accuracy by up to 30%, with a lower miss rate. Our work will be helpful to the robotics community, and serve as a milestone to building unsupervised systems that will enable robots to work with human groups in real-world environments.
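In the spirit of RoboGEM's joint motion-and-proximity cues, here is a minimal grouping sketch (not the released system): tracked pedestrians in the robot's egocentric frame are linked into one group when they are both near each other and moving coherently. The thresholds and tracking input are assumptions.

```python
# Minimal sketch of proximity-plus-motion grouping (thresholds made up for
# illustration): pairs of tracks that are close and moving coherently are
# merged into the same group with a small union-find structure.
import numpy as np

def group_pedestrians(pos, vel, d_max=1.5, v_max=0.5):
    """pos, vel: (N, 2) arrays in meters and m/s. Returns group labels."""
    n = len(pos)
    labels = list(range(n))
    def find(i):
        while labels[i] != i:
            labels[i] = labels[labels[i]]; i = labels[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(pos[i] - pos[j]) < d_max
            coherent = np.linalg.norm(vel[i] - vel[j]) < v_max
            if close and coherent:
                labels[find(i)] = find(j)          # union the two tracks
    return [find(i) for i in range(n)]
```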