Creators/Authors contains: "Lin, Yunzhi"

  1. The success of 6-DoF grasp learning with point cloud input is tempered by the computational costs resulting from the unordered nature of point clouds and the pre-processing needed to reduce them to a manageable size. These properties lead to failure on small objects with low point cloud cardinality. Instead of point clouds, this manuscript explores grasp generation directly from RGB-D image input. The approach, called Keypoint-GraspNet (KGN), operates in perception space by detecting projected gripper keypoints in the image, then recovering their SE(3) poses with a PnP algorithm. The network is trained on a synthetic dataset derived from primitive shape objects with known continuous grasp families. Trained with only single-object synthetic data, Keypoint-GraspNet achieves superior results on our single-object dataset, performs comparably to state-of-the-art baselines on a multi-object test set, and outperforms the most competitive baseline on small objects. Keypoint-GraspNet is more than 3x faster than the tested point cloud methods. Robot experiments show a high success rate, demonstrating KGN's practical potential.
  2. We present a parallelized optimization method based on fast Neural Radiance Fields (NeRF) for estimating the 6-DoF pose of a camera with respect to an object or scene. Given a single observed RGB image of the target, we can predict the translation and rotation of the camera by minimizing the residual between pixels rendered from a fast NeRF model and pixels in the observed image. We integrate a momentum-based camera extrinsic optimization procedure into Instant Neural Graphics Primitives, a recent, exceptionally fast NeRF implementation. By introducing parallel Monte Carlo sampling into the pose estimation task, our method overcomes local minima and improves efficiency in a more extensive search space. We also show the importance of adopting a more robust pixel-based loss function to reduce error. Experiments demonstrate that our method can achieve improved generalization and robustness on both synthetic and real-world benchmarks.
  3. Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, a single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric (27.6% higher than the MobilePose single-stage approach and 7.1% higher than the related two-stage approach).
  4. We propose a single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category. Our method takes as input the previous and current frames from a monocular RGB video, as well as predictions from the previous frame, to predict the bounding cuboid and 6-DoF pose (up to scale). Internally, a deep network predicts distributions over object keypoints (vertices of the bounding cuboid) in image coordinates, after which a novel probabilistic filtering process integrates across estimates before computing the final pose using PnP. Our framework allows the system to take previous uncertainties into consideration when predicting the current frame, resulting in predictions that are more accurate and stable than those of single-frame methods. Extensive experiments show that our method outperforms existing approaches on the challenging Objectron benchmark of annotated object videos. We also demonstrate the usability of our work in an augmented reality setting.
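
The last abstract above describes combining keypoint distributions across frames before solving for pose. The abstract does not specify the filtering process, so the following is only a generic illustration of the underlying idea, not the authors' method: a minimal sketch of inverse-variance (Kalman-style) fusion of one 2D keypoint, assuming independent Gaussian estimates with isotropic variance. The names `Keypoint2D` and `fuse`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Keypoint2D:
    """A 2D keypoint estimate with an isotropic variance (pixels squared)."""
    x: float
    y: float
    var: float


def fuse(prior: Keypoint2D, measurement: Keypoint2D) -> Keypoint2D:
    """Fuse two independent Gaussian estimates by inverse-variance weighting.

    Illustrative assumption only: the abstract does not state the filter used.
    """
    if prior.var <= 0 or measurement.var <= 0:
        raise ValueError("variances must be positive")
    # Weight given to the new measurement; larger when the prior is uncertain.
    gain = prior.var / (prior.var + measurement.var)
    x = prior.x + gain * (measurement.x - prior.x)
    y = prior.y + gain * (measurement.y - prior.y)
    # The fused variance is smaller than either input variance.
    var = (1.0 - gain) * prior.var
    return Keypoint2D(x, y, var)


# Hypothetical values: a confident estimate carried over from the previous
# frame and a noisier detection in the current frame.
previous = Keypoint2D(x=100.0, y=50.0, var=4.0)
current = Keypoint2D(x=104.0, y=54.0, var=12.0)
fused = fuse(previous, current)
print(fused)
```

The fused estimate stays closer to the more certain input, which is one way a tracker can carry uncertainty from earlier frames into the current prediction. In a full pipeline, the fused 2D keypoints would then be passed to a PnP solver to recover the pose.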