We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot’s proprioception. While the computer vision community has built powerful visual 3D perception algorithms, cluttered environments with heavy occlusions can limit the visibility of objects of interest. However, observed motion of partially occluded objects can imply physical interactions took place, such as contact with a robot or the environment. These inferred contacts can supplement the visible geometry with "physible geometry," which best explains the observed object motion through physics. Vysics uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, Physics Learning Library (PLL), to infer the "physible" geometry from the trajectory through implicit contact dynamics optimization. The visible and "physible" geometries jointly factor into optimizing a signed distance function (SDF) to represent the object shape. Vysics does not require pretraining, nor tactile or force sensors. Compared with vision-only methods, Vysics yields object models with higher geometric accuracy and better dynamics prediction in experiments where the object interacts with the robot and the environment under heavy occlusion.
more »
« less
Inferring the existence of objects from their physical interactions
A fully occluded object cannot be perceived directly, but we can still infer its existence from the effect it has on the motion and behavior of other, visible objects. Here we report the results of a behavioral experiment designed to elicit these sorts of inferences and quantify their re- liability. Our experiment leverages videos of real-world objects interacting under real-world physics (specifically, interrupted pendulum motion). We propose a preliminary model for how the mind might efficiently infer the position and number of occluded objects simply from the effect they have on the visible physics of a scene.
more »
« less
- Award ID(s):
- 2121102
- PAR ID:
- 10466904
- Publisher / Repository:
- Proceedings of the Cognitive Computational Neuroscience conference
- Date Published:
- Subject(s) / Keyword(s):
- intuitive physics, pendula, psychology, cognitive science
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
When navigating in urban environments, many of the objects that need to be tracked and avoided are heavily occluded. Planning and tracking using these partial scans can be challenging. The aim of this work is to learn to complete these partial point clouds, giving us a full understanding of the object's geometry using only partial observations. Previous methods achieve this with the help of complete, ground-truth annotations of the target objects, which are available only for simulated datasets. However, such ground truth is unavailable for real-world LiDAR data. In this work, we present a self-supervised point cloud completion algorithm, PointPnCNet, which is trained only on partial scans without assuming access to complete, ground-truth annotations. Our method achieves this via inpainting. We remove a portion of the input data and train the network to complete the missing region. As it is difficult to determine which regions were occluded in the initial cloud and which were synthetically removed, our network learns to complete the full cloud, including the missing regions in the initial partial cloud. We show that our method outperforms previous unsupervised and weakly-supervised methods on both the synthetic dataset, ShapeNet, and real-world LiDAR dataset, Semantic KITTI.more » « less
-
A large number of traffic collisions occur as a result of obstructed sight lines, such that even an advanced driver assistance system would be unable to prevent the crash. Recent work has proposed the use of around-the-corner radar systems to detect vehicles, pedestrians, and other road users in these occluded regions. Through comprehensive measurement, we show that these existing techniques cannot sense occluded moving objects in many important real-world scenarios. To solve this problem of limited coverage, we leverage multiple, curved reflectors to provide comprehensive coverage over the most important locations near an intersection. In scenarios where curved reflectors are insufficient, we evaluate the relative benefits of using additional flat planar surfaces. Using these techniques, we more than double the probability of detecting a vehicle near the intersection in three real urban locations, and enable NLoS radar sensing using an entirely new class of reflectors.more » « less
-
We present the design, implementation, and evaluation of RFusion, a robotic system that can search for and retrieve RFID-tagged items in line-of-sight, non-line-of-sight, and fully-occluded settings. RFusion consists of a robotic arm that has a camera and antenna strapped around its gripper. Our design introduces two key innovations: the first is a method that geometrically fuses RF and visual information to reduce uncertainty about the target object's location, even when the item is fully occluded. The second is a novel reinforcement-learning network that uses the fused RF-visual information to efficiently localize, maneuver toward, and grasp target items. We built an end-to-end prototype of RFusion and tested it in challenging real-world environments. Our evaluation demonstrates that RFusion localizes target items with centimeter-scale accuracy and achieves 96% success rate in retrieving fully occluded objects, even if they are under a pile. The system paves the way for novel robotic retrieval tasks in complex environments such as warehouses, manufacturing plants, and smart homes.more » « less
-
null (Ed.)We introduce a new inverse modeling method to interactively design crowd animations. Few works focus on providing succinct high-level and large-scale crowd motion modeling. Our methodology is to read in real or virtual agent trajectory data and automatically infer a set of parameterized crowd motion models. Then, components of the motion models can be mixed, matched, and altered enabling rapidly producing new crowd motions. Our results show novel animations using real-world data, using synthetic data, and imitating real-world scenarios. Moreover, by combining our method with our interactive crowd trajectory sketching tool, we can create complex spatio-temporal crowd animations in about a minute.more » « less
An official website of the United States government

