skip to main content

Title: Computational Mirrors: Blind Inverse Light Transport by Deep Matrix Factorization
We recover a video of the motion taking place in a hidden scene by observing changes in indirect illumination in a nearby uncalibrated visible region. We solve this problem by factoring the observed video into a matrix product between the unknown hidden scene video and an unknown light transport matrix. This task is extremely ill-posed as any non-negative factorization will satisfy the data. Inspired by recent work on the Deep Image Prior, we parameterize the factor matrices using randomly initialized convolutional neural networks trained in a one-off manner, and show that this results in decompositions that reflect the true motion in the hidden scene.
; ; ; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
Advances in neural information processing systems
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two moving people, or the activity of a person in the hidden scene. We train two convolutional neural networks using data collected from 20 different scenes, and achieve an accuracy of 94% for both tasks in unseen test environments and real-time online settings. Unlike other passive non-line-of-sight methods, the technique does not rely on known occluders or controllable light sources, and generalizes to unknown rooms with no recalibration. We analyze the generalization and robustness of our method with both real and synthetic data, and study the effect of the scene parameters on the signal quality.
  2. This paper proposes a representational model for image pairs such as consecutive video frames that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1). The model couples the following two components: (1) the vector representations of local contents of images and (2) the matrix representations of local pixel displacements caused by the relative motions between the agent and the objects in the 3D scene. When the image frame undergoes changes due to local pixel displacements, the vectors are multiplied by the matrices that represent the local displacements. Thus the vector representation is equivariant as it varies according to the local displacements. Our experiments show that our model can learn Gabor-like filter pairs of quadrature phases. The profiles of the learned filters match those of simple cells in Macaque V1. Moreover, we demonstrate that the model can learn to infer local motions in either a supervised or unsupervised manner. With such a simple model, we achieve competitive results on optical flow estimation.
  3. Disentangling the sources of visual motion in a dynamic scene during self-movement or ego motion is important for autonomous navigation and tracking. In the dynamic image segments of a video frame containing independently moving objects, optic flow relative to the next frame is the sum of the motion fields generated due to camera and object motion. The traditional ego-motion estimation methods assume the scene to be static, and the recent deep learning-based methods do not separate pixel velocities into object- and ego-motion components. We propose a learning-based approach to predict both ego-motion parameters and object-motion field (OMF) from image sequences using a convolutional autoencoder while being robust to variations due to the unconstrained scene depth. This is achieved by: 1) training with continuous ego-motion constraints that allow solving for ego-motion parameters independently of depth and 2) learning a sparsely activated overcomplete ego-motion field (EMF) basis set, which eliminates the irrelevant components in both static and dynamic segments for the task of ego-motion estimation. In order to learn the EMF basis set, we propose a new differentiable sparsity penalty function that approximates the number of nonzero activations in the bottleneck layer of the autoencoder and enforces sparsity more effectively than L1-more »and L2-norm-based penalties. Unlike the existing direct ego-motion estimation methods, the predicted global EMF can be used to extract OMF directly by comparing it against the optic flow. Compared with the state-of-the-art baselines, the proposed model performs favorably on pixelwise object- and ego-motion estimation tasks when evaluated on real and synthetic data sets of dynamic scenes.« less
  4. Non-line-of-sight (NLOS) imaging is a rapidly advancing technology that provides asymmetric vision: seeing without being seen. Though limited in accuracy, resolution, and depth recovery compared to active methods, the capabilities of passive methods are especially surprising because they typically use only a single, inexpensive digital camera. One of the largest challenges in passive NLOS imaging is ambient background light, which limits the dynamic range of the measurement while carrying no useful information about the hidden part of the scene. In this work we propose a new reconstruction approach that uses an optimized linear transformation to balance the rejection of uninformative light with the retention of informative light, resulting in fast (video-rate) reconstructions of hidden scenes from photographs of a blank wall under high ambient light conditions.
  5. Abstract

    Non-Line-Of-Sight (NLOS) imaging aims at recovering the 3D geometry of objects that are hidden from the direct line of sight. One major challenge with this technique is the weak available multibounce signal limiting scene size, capture speed, and reconstruction quality. To overcome this obstacle, we introduce a multipixel time-of-flight non-line-of-sight imaging method combining specifically designed Single Photon Avalanche Diode (SPAD) array detectors with a fast reconstruction algorithm that captures and reconstructs live low-latency videos of non-line-of-sight scenes with natural non-retroreflective objects. We develop a model of the signal-to-noise-ratio of non-line-of-sight imaging and use it to devise a method that reconstructs the scene such that signal-to-noise-ratio, motion blur, angular resolution, and depth resolution are all independent of scene depth suggesting that reconstruction of very large scenes may be possible.