skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: EV-FlowNet: Self-Supervised Optical Flow Estimation for Event-based Cameras
Event-based cameras have shown great promise in a variety of situations where frame based cameras suffer, such as high speed motions and high dynamic range scenes. However, developing algorithms for event measurements requires a new class of hand crafted algorithms. Deep learning has shown great success in providing model free solutions to many problems in the vision community, but existing networks have been developed with frame based images in mind, and there does not exist the wealth of labeled data for events as there does for images for supervised training. To these points, we present EV-FlowNet, a novel self-supervised deep learning pipeline for optical flow estimation for event based cameras. In particular, we introduce an image based representation of a given event stream, which is fed into a self-supervised neural network as the sole input. The corresponding grayscale images captured from the same camera at the same time as the events are then used as a supervisory signal to provide a loss function at training time, given the estimated flow from the network. We show that the resulting network is able to accurately predict optical flow from events only in a variety of different scenes, with performance competitive to image based networks. This method not only allows for accurate estimation of dense optical flow, but also provides a framework for the transfer of other self-supervised methods to the event-based domain.  more » « less
Award ID(s):
1703319
PAR ID:
10076460
Author(s) / Creator(s):
Date Published:
Journal Name:
Robotics: Science and Systems
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Event-based motion field estimation is an important task. However, current optical flow methods face challenges: learning-based approaches, often frame-based and relying on CNNs, lack cross-domain transferability, while model-based methods, though more robust, are less accurate. To address the limitations of optical flow estimation, recent works have focused on normal flow, which can be more reliably measured in regions with limited texture or strong edges. However, existing normal flow estimators are predominantly model-based and suffer from high errors. In this paper, we propose a novel supervised point-based method for normal flow estimation that overcomes the limitations of existing event learning-based approaches. Using a local point cloud encoder, our method directly estimates per-event normal flow from raw events, offering multiple unique advantages: 1) It produces temporally and spatially sharp predictions. 2) It supports more diverse data augmentation, such as random rotation, to improve robustness across various domains. 3) It naturally supports uncertainty quantification via ensemble inference, which benefits downstream tasks. 4) It enables training and inference on undistorted data in normalized camera coordinates, improving transferability across cameras. Extensive experiments demonstrate that our method achieves better and more consistent performance than state-of-the-art methods when transferred across different datasets. Leveraging this transferability, we train our model on the union of datasets and release it for public use. Finally, we introduce an egomotion solver based on a maximum-margin problem that uses normal flow and IMU to achieve strong performance in challenging scenarios. Codes are available at github.com/dhyuan99/VecKM flow. 
    more » « less
  2. Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning. The key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion, which is prohibitively expensive to run at scale. We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass. We estimate poses by first lifting frame-to-frame optical flow to 3D scene flow via differentiable rendering, preserving locality and shift-equivariance of the image processing backbone. SE(3) camera pose estimation is then performed via a weighted least-squares fit to the scene flow field. This formulation enables us to jointly supervise pose estimation and a generalizable neural scene representation via re-rendering the input video, and thus, train end-to-end and fully self-supervised on real-world video datasets. We demonstrate that our method performs robustly on diverse, real-world video, notably on sequences traditionally challenging to optimization-based pose estimation techniques. 
    more » « less
  3. Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning. The key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion, which is prohibitively expensive to run at scale. We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass. We estimate poses by first lifting frame-to-frame optical flow to 3D scene flow via differentiable rendering, preserving locality and shift-equivariance of the image processing backbone. SE(3) camera pose estimation is then performed via a weighted least-squares fit to the scene flow field. This formulation enables us to jointly supervise pose estimation and a generalizable neural scene representation via re-rendering the input video, and thus, train end-to-end and fully self-supervised on real-world video datasets. We demonstrate that our method performs robustly on diverse, real-world video, notably on sequences traditionally challenging to optimization-based pose estimation techniques. 
    more » « less
  4. Image stitching involves combining multiple images of the same scene captured from different viewpoints into a single image with an expanded field of view. While this technique has various applications in computer vision, traditional methods rely on the successive stitching of image pairs taken from multiple cameras. While this approach is effective for organized camera arrays, it can pose challenges for unstructured ones, especially when handling scene overlaps. This paper presents a deep learning-based approach for stitching images from large unstructured camera sets covering complex scenes. Our method processes images concurrently by using the SandFall algorithm to transform data from multiple cameras into a reduced fixed array, thereby minimizing data loss. A customized convolutional neural network then processes these data to produce the final image. By stitching images simultaneously, our method avoids the potential cascading errors seen in sequential pairwise stitching while offering improved time efficiency. In addition, we detail an unsupervised training method for the network utilizing metrics from Generative Adversarial Networks supplemented with supervised learning. Our testing revealed that the proposed approach operates in roughly ∼1/7th the time of many traditional methods on both CPU and GPU platforms, achieving results consistent with established methods. 
    more » « less
  5. Self-supervised depth estimation has recently demonstrated promising performance compared to the supervised methods on challenging indoor scenes. However, the majority of efforts mainly focus on exploiting photometric and geometric consistency via forward image warping and backward image warping, based on monocular videos or stereo image pairs. The influence of defocus blur to depth estimation is neglected, resulting in a limited performance for objects and scenes in out of focus. In this work, we propose the first framework for simultaneous depth estimation from a single image and image focal stacks using depth-from-defocus and depth-from-focus algorithms. The proposed network is able to learn optimal depth mapping from the information contained in the blur of a single image, generate a simulated image focal stack and all-in-focus image, and train a depth estimator from an image focal stack. In addition to the validation of our method on both synthetic NYUv2 dataset and real DSLR dataset, we also collect our own dataset using a DSLR camera and further verify on it. Experiments demonstrate that our system surpasses the state-of-the-art supervised depth estimation method over 4% in accuracy and achieves superb performance among the methods without direct supervision on the synthesized NYUv2 dataset, which has been rarely explored. 
    more » « less