Accurate pose estimation is often a requirement for robust robotic grasping and manipulation of objects placed in cluttered, tight environments, such as a shelf with multiple objects. When deep learning approaches are employed to perform this task, they typically require a large amount of training data. However, obtaining precise 6-degrees-of-freedom ground-truth poses can be prohibitively expensive. This work therefore proposes an architecture and a training process to address this issue. More precisely, we present a weak object detector that enables localizing objects and estimating their 6D poses in cluttered and occluded scenes. To minimize the human labor required for annotations, the proposed detector is trained with a combination of synthetic images and a few weakly annotated real images (as few as 10 images per object), for which a human provides only a list of the objects present in each image (no time-consuming annotations, such as bounding boxes, segmentation masks and object poses). To close the gap between real and synthetic images, we use multiple domain classifiers trained adversarially. During the inference phase, the resulting class-specific heatmaps of the weak detector are used to guide the search for the 6D poses of objects. Our proposed approach is evaluated on several publicly ...
This content will become publicly available on November 8, 2022
IMU-Assisted Learning of Single-View Rolling Shutter Correction
Rolling shutter distortion is highly undesirable for photography and computer vision algorithms (e.g., visual SLAM) because pixels can be captured at different times and poses. In this paper, we propose a deep neural network to predict depth and row-wise pose from a single image for rolling shutter correction. Our contribution in this work is to incorporate inertial measurement unit (IMU) data into the pose refinement process, which, compared to the state of the art, greatly enhances the pose prediction. The improved accuracy and robustness make it possible for numerous vision algorithms to use imagery captured by rolling shutter cameras and produce highly accurate results. We also extend a dataset to include real rolling shutter images, IMU data, depth maps, camera poses, and corresponding global shutter images for rolling shutter correction training. We demonstrate the efficacy of the proposed method by evaluating the performance of the Direct Sparse Odometry (DSO) algorithm on rolling shutter imagery corrected using the proposed approach. Results show marked improvements of the DSO algorithm over using uncorrected imagery, validating the proposed approach.
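To illustrate why a rolling shutter image needs a pose per row, the following minimal sketch computes a capture time for each row and linearly interpolates a camera translation across the readout. It is not the network described in the abstract: the constant readout rate, the linear motion model, and the function names `row_capture_times` and `row_wise_translations` are all assumptions made for illustration.

```python
def row_capture_times(t_start, readout_time, num_rows):
    """Capture time of each row, assuming a constant row readout rate.

    readout_time is the (assumed) time between the first and last row.
    """
    if num_rows < 2:
        return [t_start] * num_rows
    line_delay = readout_time / (num_rows - 1)
    return [t_start + r * line_delay for r in range(num_rows)]


def row_wise_translations(p_start, p_end, num_rows):
    """Linearly interpolate a 3D camera translation for every image row.

    p_start and p_end are the (assumed known) translations at the first
    and last row; real motion is generally not linear.
    """
    if num_rows < 2:
        return [tuple(p_start)] * num_rows
    result = []
    for r in range(num_rows):
        alpha = r / (num_rows - 1)  # 0 at the first row, 1 at the last
        result.append(tuple(a + alpha * (b - a) for a, b in zip(p_start, p_end)))
    return result


times = row_capture_times(0.0, 0.04, 5)
poses = row_wise_translations((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), 5)
```

In this toy setup each row gets its own timestamp and translation, which is the quantity a correction method would use to warp that row to a common global shutter pose.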
- Journal Name: Conference on Robot Learning
- Sponsoring Org: National Science Foundation
More Like this
Airborne Validation of ICESat-2 ATLAS Data over Crevassed Surfaces and Other Complex Glacial Environments: Results from Experiments of Laser Altimeter and Kinematic GPS Data Collection from a Helicopter over a Surging Arctic Glacier (Negribreen, Svalbard)
The topic of this paper is the airborne evaluation of ICESat-2 Advanced Topographic Laser Altimeter System (ATLAS) measurement capabilities and surface-height determination over crevassed glacial terrain, with a focus on the geodetical accuracy of geophysical data collected from a helicopter. To obtain surface heights over crevassed and otherwise complex ice surfaces, ICESat-2 data are analyzed using the density-dimension algorithm for ice surfaces (DDA-ice), which yields surface heights at the nominal 0.7 m along-track spacing of ATLAS data. As the result of an ongoing surge, Negribreen, Svalbard, provided an ideal situation for the validation objectives in 2018 and 2019, because many different crevasse types and morphologically complex ice surfaces existed in close proximity. Airborne geophysical data, including laser altimeter data (profilometer data at 905 nm wavelength), differential Global Positioning System (GPS) data, Inertial Measurement Unit (IMU) data, on-board time-lapse imagery and photographs, were collected during two campaigns in the summers of 2018 and 2019. The airborne experiment setup, geodetical correction and data processing steps are described here. To date, there is relatively little knowledge of the geodetical accuracy that can be obtained from kinematic data collection from a helicopter. Our study finds that (1) Kinematic GPS data collection with correction in post-processing yields higher accuracies than ...
Aerial images provide important situational awareness for responding to natural disasters such as hurricanes. They are well-suited for providing information for damage estimation and localization (DEL); i.e., characterizing the type and spatial extent of damage following a disaster. Despite recent advances in sensing and unmanned aerial systems technology, much of post-disaster aerial imagery is still taken by handheld DSLR cameras from small, manned, fixed-wing aircraft. However, these handheld cameras lack IMU information, and images are taken opportunistically post-event by operators. As such, DEL from such imagery is still a highly manual and time-consuming process. We propose an approach to both detect damage in aerial images and localize it in world coordinates, with specific focus on detecting and localizing flooding. The approach is based on using structure from motion to relate image coordinates to world coordinates via a projective transformation, using class activation mapping to detect the extent of damage in an image, and applying the projective transformation to localize damage in world coordinates. We evaluate the performance of our approach on post-event data from the 2016 Louisiana floods, and find that our approach achieves a precision of 88%. Given this high precision using limited data, we argue that this ...
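The last step described above, applying a projective transformation to map detected damage pixels into world coordinates, can be sketched as follows. This is only an illustration under stated assumptions: the 3x3 matrix `H` is taken as given (in the abstract it would come from structure from motion), the binary `mask` stands in for a thresholded class activation map, and the function names are hypothetical.

```python
def apply_homography(H, x, y):
    """Map an image point (x, y) to world coordinates with a 3x3 homography H."""
    xw = H[0][0] * x + H[0][1] * y + H[0][2]
    yw = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to get Euclidean coordinates.
    return xw / w, yw / w


def localize_mask(H, mask):
    """Return world coordinates for every nonzero pixel of a binary mask.

    mask is a list of rows; the column index is x and the row index is y.
    """
    return [
        apply_homography(H, col, row)
        for row, line in enumerate(mask)
        for col, value in enumerate(line)
        if value
    ]


# Hypothetical homography: scale by 2 and shift by (10, 20).
H = [[2, 0, 10], [0, 2, 20], [0, 0, 1]]
mask = [[0, 1], [1, 0]]
world_points = localize_mask(H, mask)
```

With a real homography the bottom row is generally not `[0, 0, 1]`, which is why the division by the homogeneous coordinate `w` is kept explicit.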
Monocular depth predictors are typically trained on large-scale training sets which are naturally biased w.r.t. the distribution of camera poses. As a result, trained predictors fail to make reliable depth predictions for testing examples captured under uncommon camera poses. To address this issue, we propose two novel techniques that exploit the camera pose during training and prediction. First, we introduce a simple perspective-aware data augmentation that synthesizes new training examples with more diverse views by perturbing the existing ones in a geometrically consistent manner. Second, we propose a conditional model that exploits the per-image camera pose as prior knowledge by encoding it as a part of the input. We show that jointly applying the two methods improves depth prediction on images captured under uncommon and even never-before-seen camera poses. We show that our methods improve performance when applied to a range of different predictor architectures. Lastly, we show that explicitly encoding the camera pose distribution improves the generalization performance of a synthetically trained depth predictor when evaluated on real images.
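One simple way to feed a per-image camera pose to a predictor as part of its input is to broadcast each pose parameter into a constant image-sized channel and append it to the image channels. The abstract does not say how the pose is encoded, so the sketch below is an assumption, not the authors' encoding; the function name `append_pose_channels` and the two-parameter pose (pitch, height) are hypothetical.

```python
def append_pose_channels(image, pose):
    """Append one constant channel per pose parameter to an image.

    image is a list of channels, each a list of rows of floats (C x H x W).
    pose is a sequence of scalars, e.g. (pitch_radians, camera_height_m).
    """
    if not image or not image[0]:
        raise ValueError("image must have at least one non-empty channel")
    height = len(image[0])
    width = len(image[0][0])
    # Each pose value becomes an H x W plane filled with that value.
    pose_channels = [
        [[float(value)] * width for _ in range(height)] for value in pose
    ]
    return list(image) + pose_channels


# A 3-channel, 2x2 dummy image and an assumed (pitch, height) pose.
rgb = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
conditioned = append_pose_channels(rgb, (0.1, 1.5))
```

The resulting input has five channels, so a network consuming it would need its first layer sized accordingly.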
The decomposition of light transport into direct and global components, diffuse and specular interreflections, and subsurface scattering allows for new visualizations of light in everyday scenes. In particular, indirect light contains a myriad of information about the complex appearance of materials useful for computer vision and inverse rendering applications. In this paper, we present a new imaging technique that captures and analyzes components of indirect light via light transport using a synchronized projector-camera system. The rectified system illuminates the scene with epipolar planes corresponding to projector rows, and we vary two key parameters to capture plane-to-ray light transport between projector row and camera pixel: (1) the offset between projector row and camera row in the rolling shutter, and (2) the exposure of the camera row. We describe how this synchronized rolling shutter performs illumination multiplexing, and develop a nonlinear optimization algorithm to demultiplex the resulting 3D light transport operator. Using our system, we are able to capture live short- and long-range indirect light transport, disambiguate subsurface scattering, diffuse and specular interreflections, and distinguish materials according to their subsurface scattering properties. In particular, we show the utility of indirect imaging for capturing and analyzing the hidden structure of veins in human skin.