Title: IMU-Assisted Learning of Single-View Rolling Shutter Correction
Rolling shutter distortion is highly undesirable for photography and computer vision algorithms (e.g., visual SLAM) because different pixels may be captured at different times and from different poses. In this paper, we propose a deep neural network that predicts depth and row-wise pose from a single image for rolling shutter correction. Our contribution is to incorporate inertial measurement unit (IMU) data into the pose refinement process, which greatly enhances pose prediction compared to the state of the art. The improved accuracy and robustness make it possible for numerous vision algorithms to use imagery captured by rolling shutter cameras and produce highly accurate results. We also extend a dataset to include real rolling shutter images, IMU data, depth maps, camera poses, and corresponding global shutter images for rolling shutter correction training. We demonstrate the efficacy of the proposed method by evaluating the performance of the Direct Sparse Odometry (DSO) algorithm on rolling shutter imagery corrected using the proposed approach. Results show marked improvements of the DSO algorithm over using uncorrected imagery, validating the proposed approach.
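The row-wise nature of the problem can be illustrated with a minimal sketch. This is not the network described in the abstract; it only shows, under stated assumptions, why each image row has its own capture time and how gyroscope samples could supply a per-row rotation estimate. The function names, the uniform readout model, and the small-angle integration are all assumptions made for illustration.

```python
import numpy as np

def row_timestamps(t_first_row, readout_time, height):
    """Capture time of each row, assuming rows are read out at a uniform rate."""
    return t_first_row + readout_time * np.arange(height) / height

def integrate_gyro(gyro_t, gyro_w):
    """Cumulative rotation vector (rad) from angular velocity samples (rad/s).

    Uses trapezoidal integration and a small-angle approximation, i.e. rotation
    vectors are treated as additive over the short readout interval.
    """
    dt = np.diff(gyro_t)
    increments = 0.5 * (gyro_w[1:] + gyro_w[:-1]) * dt[:, None]
    return np.vstack([np.zeros(3), np.cumsum(increments, axis=0)])

def row_rotation_vectors(row_t, gyro_t, gyro_w):
    """Rotation vector of every row relative to the first row."""
    theta = integrate_gyro(gyro_t, gyro_w)
    # Linearly interpolate each axis of the integrated rotation at the row times.
    per_row = np.stack(
        [np.interp(row_t, gyro_t, theta[:, k]) for k in range(3)], axis=1
    )
    return per_row - per_row[0]

# Hypothetical example: 480 rows read out over 30 ms while the camera
# rotates at a constant 1 rad/s about its optical (z) axis.
gyro_t = np.linspace(0.0, 0.05, 11)
gyro_w = np.tile([0.0, 0.0, 1.0], (11, 1))
row_t = row_timestamps(0.0, 0.03, 480)
rot = row_rotation_vectors(row_t, gyro_t, gyro_w)
```

In this toy setting the last row is rotated by roughly 0.03 rad relative to the first, which is the kind of intra-frame pose variation that rolling shutter correction has to account for.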
Award ID(s):
1637875
NSF-PAR ID:
10297594
Journal Name:
Conference on Robot Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We propose a continuous-time spline-based formulation for visual-inertial odometry (VIO). Specifically, we model the poses as a cubic spline, whose temporal derivatives are used to synthesize linear acceleration and angular velocity, which are compared to the measurements from the inertial measurement unit (IMU) for optimal state estimation. The spline boundary conditions create constraints between the camera and the IMU, with which we formulate VIO as a constrained nonlinear optimization problem. Continuous-time pose representation makes it possible to address many VIO challenges, e.g., rolling shutter distortion and sensors that may lack synchronization. We conduct experiments on two publicly available datasets that demonstrate the state-of-the-art accuracy and real-time computational efficiency of our method. 
  2. Visual body signals are designated body poses that deliver an application-specific message. Such signals are widely used for fast message communication in sports (signaling by umpires and referees), transportation (naval officers and aircraft marshallers), and construction (signaling by riggers and crane operators), to list a few examples. Automatic interpretation of such signals can help maintain safer operations in these industries, help in record-keeping for auditing or accident investigation purposes, and function as a score-keeper in sports. When automation of these signals is desired, it is traditionally performed from a viewer's perspective by running computer vision algorithms on camera feeds. However, computer-vision-based approaches suffer from performance deterioration under lighting variations, occlusions, and similar conditions, may face resolution limitations, and can be challenging to install. Our work, ViSig, breaks with tradition by instead deploying on-body sensors for signal interpretation. Our key innovation is the fusion of ultra-wideband (UWB) sensors for capturing on-body distance measurements, inertial sensors (IMU) for capturing orientation of a few body segments, and photodiodes for finger signal recognition, enabling a robust interpretation of signals. By deploying only a small number of sensors, we show that body signals can be interpreted unambiguously in many different settings, including in games of Cricket, Baseball, and Football, and in operational safety use-cases such as crane operations and flag semaphores for maritime navigation, with > 90% accuracy. Overall, we have seen substantial promise in this approach and expect a large body of future follow-on work to start using UWB and IMU fused modalities for the more general human pose estimation problems. 
  3. Accurate pose estimation is often a requirement for robust robotic grasping and manipulation of objects placed in cluttered, tight environments, such as a shelf with multiple objects. When deep learning approaches are employed to perform this task, they typically require a large amount of training data. However, obtaining precise 6-degrees-of-freedom ground truth can be prohibitively expensive. This work therefore proposes an architecture and a training process to solve this issue. More precisely, we present a weak object detector that enables localizing objects and estimating their 6D poses in cluttered and occluded scenes. To minimize the human labor required for annotations, the proposed detector is trained with a combination of synthetic and a few weakly annotated real images (as few as 10 images per object), for which a human provides only a list of objects present in each image (no time-consuming annotations, such as bounding boxes, segmentation masks and object poses). To close the gap between real and synthetic images, we use multiple domain classifiers trained adversarially. During the inference phase, the resulting class-specific heatmaps of the weak detector are used to guide the search of 6D poses of objects. Our proposed approach is evaluated on several publicly available datasets for pose estimation. We also evaluated our model on classification and localization in unsupervised and semi-supervised settings. The results clearly indicate that this approach could provide an efficient way toward fully automating the training process of computer vision models used in robotics. 
  4. The topic of this paper is the airborne evaluation of ICESat-2 Advanced Topographic Laser Altimeter System (ATLAS) measurement capabilities and surface-height determination over crevassed glacial terrain, with a focus on the geodetical accuracy of geophysical data collected from a helicopter. To obtain surface heights over crevassed and otherwise complex ice surfaces, ICESat-2 data are analyzed using the density-dimension algorithm for ice surfaces (DDA-ice), which yields surface heights at the nominal 0.7 m along-track spacing of ATLAS data. As the result of an ongoing surge, Negribreen, Svalbard, provided an ideal situation for the validation objectives in 2018 and 2019, because many different crevasse types and morphologically complex ice surfaces existed in close proximity. Airborne geophysical data, including laser altimeter data (profilometer data at 905 nm wavelength), differential Global Positioning System (GPS), Inertial Measurement Unit (IMU) data, on-board time-lapse imagery and photographs, were collected during two campaigns in the summers of 2018 and 2019. Airborne experiment setup, geodetical correction and data processing steps are described here. To date, there is relatively little knowledge of the geodetical accuracy that can be obtained from kinematic data collection from a helicopter. Our study finds that (1) kinematic GPS data collection with correction in post-processing yields higher accuracies than Real-Time-Kinematic (RTK) data collection. (2) Processing of only the rover data using the Natural Resources Canada Spatial Reference System Precise Point Positioning (CSRS-PPP) software is sufficiently accurate for the sub-satellite validation purpose. (3) Distances between ICESat-2 ground tracks and airborne ground tracks were generally less than 25 m, while the distance between predicted and actual ICESat-2 ground track was on the order of 9 m, which allows direct comparison of ice-surface heights and spatial statistical characteristics of crevasses from the satellite and airborne measurements. (4) The Lasertech Universal Laser System (ULS), operated at up to 300 m above ground level, yields full return frequency (400 Hz) and 0.06–0.08 m on-ice along-track spacing of height measurements. (5) Cross-over differences of airborne laser altimeter data are −0.172 ± 2.564 m along straight paths, which implies a precision of approximately 2.6 m for ICESat-2 validation experiments in crevassed terrain. (6) In summary, the comparatively light-weight experiment setup of a suite of small survey equipment mounted on a Eurocopter (Helicopter AS-350) and kinematic GPS data analyzed in post-processing using CSRS-PPP leads to high-accuracy repeats of the ICESat-2 tracks. The technical results (1)–(6) indicate that direct comparison of ice-surface heights and crevasse depths from the ICESat-2 and airborne laser altimeter data is warranted. Numerical evaluation of height comparisons utilizes spatial surface roughness measures. The final result of the validation is that ICESat-2 ATLAS data, analyzed with the DDA-ice, facilitate surface-height determination over crevassed terrain, in good agreement with airborne data, including spatial characteristics, such as surface roughness, crevasse spacing and depth, which are key informants on the deformation and dynamics of a glacier during surge. 
  5. Long-range target detection in thermal infrared imagery is a challenging research problem due to the low resolution and limited detail captured by thermal sensors. The limited size and variability of thermal image datasets for small target detection are also a major constraint for the development of accurate and robust detection algorithms. To address both the sensor and data constraints, we propose a novel convolutional neural network (CNN) feature extraction architecture designed for small object detection in data-limited settings. More specifically, we focus on long-range ground-based thermal vehicle detection, but also show the effectiveness of the proposed algorithm on drone and satellite aerial imagery. The design of the proposed architecture is inspired by an analysis of popular object detectors as well as custom-designed networks. We find that restricted receptive fields (rather than more globalized features, as is the trend), along with less downsampling of feature maps and attenuated processing of fine-grained features, lead to greatly improved detection rates while mitigating the model’s capacity to overfit on small or poorly varied datasets. Our approach achieves state-of-the-art results on the Defense Systems Information Analysis Center (DSIAC) automated target recognition (ATR) and the Tiny Object Detection in Aerial Images (AI-TOD) datasets. 