

Title: Unsupervised Non-Rigid Image Distortion Removal via Grid Deformation
Many computer vision tasks suffer when imaging through turbulent refractive media (e.g., air and water), because the refraction and scattering of light cause geometric distortion that has previously required either hand-crafted physical priors or supervised learning methods to remove. In this paper, we present a novel unsupervised network to recover the latent distortion-free image. The key idea is to model non-rigid distortions as deformable grids. Our network consists of a grid deformer that estimates the distortion field and an image generator that outputs the distortion-free image. By leveraging a positional encoding operator, we can simplify the network structure while maintaining fine spatial details in the recovered images. Our method does not need to be trained on labeled data and transfers well across turbulent image datasets with different types of distortion. Extensive experiments on both simulated and real-captured turbulent images demonstrate that our method can remove both air and water distortions without much customization.
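To make the grid-deformation idea concrete, here is a minimal PyTorch-style sketch, assuming a NeRF-style Fourier positional encoding and a displacement-field warp; the function names and interfaces are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch: model non-rigid distortion as a deformable sampling grid.
# The clean image (from an image generator) resampled through the deformed
# grid (from a grid deformer) should reproduce the distorted observation,
# which provides an unsupervised training signal.
import torch
import torch.nn.functional as F

def positional_encoding(coords, num_freqs=6):
    """Map coordinates in [-1, 1] to Fourier features (NeRF-style)."""
    feats = [coords]
    for k in range(num_freqs):
        for fn in (torch.sin, torch.cos):
            feats.append(fn((2.0 ** k) * torch.pi * coords))
    return torch.cat(feats, dim=-1)

def identity_grid(h, w):
    """Identity sampling grid in [-1, 1], shape (1, H, W, 2) as (x, y)."""
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=-1).unsqueeze(0)

def warp(clean, displacement):
    """Resample a clean image (N, C, H, W) through the deformed grid."""
    grid = identity_grid(clean.shape[2], clean.shape[3]) + displacement
    return F.grid_sample(clean, grid, mode="bilinear", align_corners=True)
```

Under this assumed setup, training would minimize a reconstruction loss between the warped generator output and the distorted observation, so no distortion-free labels are required.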
Award ID(s):
1948524 1909192
NSF-PAR ID:
10320131
Journal Name:
CVF/IEEE International Conference on Computer Vision
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The fluctuation of the water surface causes refractive distortions that severely degrade images of an underwater scene. Here, we present the distortion-guided network (DG-Net) for restoring distortion-free underwater images. The key idea is to use a distortion map to guide network training. The distortion map models the pixel displacement caused by water refraction. We first use a physically constrained convolutional network to estimate the distortion map from the refracted image. We then use a generative adversarial network guided by the distortion map to restore the sharp distortion-free image. Since the distortion map indicates correspondences between the distorted image and the distortion-free one, it guides the network to make better predictions. We evaluate our network on several real and synthetic underwater image datasets and show that it outperforms the state-of-the-art algorithms, especially in the presence of large distortions. We also show results on complex scenarios, including outdoor swimming pool images captured by a drone and indoor aquarium images taken by a cellphone camera.
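As a rough illustration of how a distortion map encodes correspondences (a hedged sketch, not DG-Net's actual implementation), a first-order "un-warp" of the refracted image can be written in PyTorch:

```python
import torch
import torch.nn.functional as F

def unwarp(distorted, d):
    """First-order inverse warp using an estimated distortion map.

    distorted: (N, C, H, W) refracted image.
    d: (N, H, W, 2) per-pixel displacement in the normalized [-1, 1]
       coordinates used by grid_sample.
    If distorted(p) ~= clean(p + d(p)), then sampling the distorted image
    at p - d(p) gives a rough estimate of clean(p).
    """
    n, _, h, w = distorted.shape
    ys = torch.linspace(-1.0, 1.0, h)
    xs = torch.linspace(-1.0, 1.0, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    identity = torch.stack([gx, gy], dim=-1).unsqueeze(0)
    return F.grid_sample(distorted, identity - d, mode="bilinear",
                         align_corners=True)
```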
  2. Mobile Augmented Reality (AR), which overlays digital content on the real-world scenes surrounding a user, is bringing immersive interactive experiences where the real and virtual worlds are tightly coupled. To enable seamless and precise AR experiences, an image recognition system that can accurately recognize the object in the camera view with low system latency is required. However, due to the pervasiveness and severity of image distortions, an effective and robust image recognition solution for “in the wild” mobile AR is still elusive. In this article, we present CollabAR, an edge-assisted system that provides distortion-tolerant image recognition for mobile AR with imperceptible system latency. CollabAR incorporates both distortion-tolerant and collaborative image recognition modules in its design. The former enables distortion-adaptive image recognition to improve the robustness against image distortions, while the latter exploits the spatial-temporal correlation among mobile AR users to improve recognition accuracy. Moreover, as it is difficult to collect a large-scale image distortion dataset, we propose a Cycle-Consistent Generative Adversarial Network-based data augmentation method to synthesize realistic image distortion. Our evaluation demonstrates that CollabAR achieves over 85% recognition accuracy for “in the wild” images with severe distortions, while reducing the end-to-end system latency to as low as 18.2 ms. 
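One way to picture the distortion-tolerant module is a classifier that detects the dominant distortion and routes the image to a matching recognition expert. This is a hedged sketch; the class and the expert labels are assumptions for illustration, not CollabAR's API.

```python
import torch
import torch.nn as nn

class DistortionAdaptiveRecognizer(nn.Module):
    """Route each image to a recognizer tuned for its dominant distortion."""

    def __init__(self, distortion_classifier, experts):
        # experts: dict mapping a distortion label to a recognition network,
        # e.g., {"pristine": ..., "blur": ..., "noise": ...} (illustrative).
        # The classifier's output indices are assumed to follow dict order.
        super().__init__()
        self.distortion_classifier = distortion_classifier
        self.experts = nn.ModuleDict(experts)
        self.labels = list(experts.keys())

    def forward(self, image):
        # Assumes batch size 1 for clarity.
        kind = self.labels[int(self.distortion_classifier(image).argmax(dim=1))]
        return self.experts[kind](image)
```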
  3. Mobile Augmented Reality (AR), which overlays digital content on the real-world scenes surrounding a user, is bringing immersive interactive experiences where the real and virtual worlds are tightly coupled. To enable seamless and precise AR experiences, an image recognition system that can accurately recognize the object in the camera view with low system latency is required. However, due to the pervasiveness and severity of image distortions, an effective and robust image recognition solution for mobile AR is still elusive. In this paper, we present CollabAR, an edge-assisted system that provides distortion-tolerant image recognition for mobile AR with imperceptible system latency. CollabAR incorporates both distortion-tolerant and collaborative image recognition modules in its design. The former enables distortion-adaptive image recognition to improve the robustness against image distortions, while the latter exploits the spatial-temporal correlation among mobile AR users to improve recognition accuracy. We implement CollabAR on four different commodity devices, and evaluate its performance on two multi-view image datasets. Our evaluation demonstrates that CollabAR achieves over 96% recognition accuracy for images with severe distortions, while reducing the end-to-end system latency to as low as 17.8 ms for commodity mobile devices.
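The collaborative step can be pictured with a simple stand-in, assumed here purely for illustration rather than CollabAR's actual aggregation scheme: average the class probabilities from spatially and temporally correlated views and take the top class.

```python
import torch

def fuse_multiview_predictions(logits_per_view):
    """Average class probabilities across correlated views; return top class."""
    probs = [torch.softmax(l, dim=-1) for l in logits_per_view]
    fused = torch.stack(probs).mean(dim=0)
    return int(fused.argmax())
```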
  4. Abstract

    Ever since the first image of a coral reef was captured in 1885, people worldwide have been accumulating images of coral reefscapes that document the historic conditions of reefs. However, these innumerable reefscape images suffer from perspective distortion, which reduces the apparent size of distant taxa, rendering the images unusable for quantitative analysis of reef conditions. Here we solve this century-long distortion problem by developing a novel computer-vision algorithm, ReScape, which removes the perspective distortion from reefscape images by transforming them into top-down views, making them usable for quantitative analysis of reef conditions. In doing so, we demonstrate the first-ever ecological application and extension of inverse-perspective mapping—a foundational technique used in the autonomous-driving industry. The ReScape algorithm is composed of seven functions that (1) calibrate the camera lens, (2) remove the inherent lens-induced image distortions, (3) detect the scene’s horizon line, (4) remove the camera-roll angle, (5) detect the transformable reef area, (6) detect the scene’s perspective geometry, and (7) apply brute-force inverse-perspective mapping. The performance of the ReScape algorithm was evaluated by transforming the perspective of 125 reefscape images. Eighty-five percent of the images had no processing errors, and of those, 95% were successfully transformed into top-down views. ReScape was validated by demonstrating that same-length transects, placed increasingly further from the camera, became the same length after transformation. The mission of the ReScape algorithm is to (i) unlock historical information about coral-reef conditions from previously unquantified periods and localities, (ii) enable citizen scientists and recreational photographers to contribute reefscape images to the scientific process, and (iii) provide a new survey technique that can rigorously assess relatively large areas of coral reefs, and other marine and even terrestrial ecosystems, worldwide. To facilitate this mission, we compiled the ReScape algorithm into a free, user-friendly App that does not require any coding experience. Equipped with the ReScape App, scientists can improve the management and prediction of the future of coral reefs by uncovering historical information from reefscape-image archives and by using reefscape images as a new, rapid survey method, opening a new era of coral-reef monitoring.
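The core of step (7), inverse-perspective mapping, can be sketched with standard OpenCV homography calls. The corner points below are placeholders; ReScape detects the transformable area and perspective geometry automatically.

```python
import cv2
import numpy as np

def top_down_view(image, src_corners, out_w=800, out_h=800):
    """Map a trapezoidal ground-plane region to a rectangular top-down view.

    src_corners: 4x2 array of the reef area's TL, TR, BR, BL image corners.
    """
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    H = cv2.getPerspectiveTransform(np.float32(src_corners), dst)
    return cv2.warpPerspective(image, H, (out_w, out_h))
```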

     
  5. Abstract

    Purpose

    Synthetic digital mammogram (SDM) is a 2D image generated from digital breast tomosynthesis (DBT) and used as a substitute for a full-field digital mammogram (FFDM) to reduce the radiation dose for breast cancer screening. A previous deep learning-based method used FFDM images as the ground truth and trained a single neural network to directly generate SDM images with appearances (e.g., intensity distribution, textures) similar to the FFDM images. However, the FFDM image has a different texture pattern from DBT. This difference can make training of the neural network unstable and result in high intensity distortion, which makes it hard to decrease intensity distortion and increase perceptual similarity (e.g., generate similar textures) at the same time. Clinically, radiologists want a 2D synthesized image that looks like an FFDM image while preserving local structures such as masses and microcalcifications (MCs) in DBT, because radiologists have long been trained to read FFDM images and local structures are important for diagnosis. In this study, we propose to use a deep convolutional neural network to learn the transformation that generates SDM from DBT.

    Method

    To decrease intensity distortion and increase perceptual similarity, a multi-scale cascaded network (MSCN) is proposed to generate low-frequency structures (e.g., intensity distribution) and high-frequency structures (e.g., textures) separately. The MSCN consists of two cascaded sub-networks: the first sub-network predicts the low-frequency part of the FFDM image; the second sub-network generates a full SDM image with textures similar to the FFDM image, based on the prediction of the first sub-network. The mean-squared error (MSE) objective function is used to train the first sub-network, termed the low-frequency network, to generate a low-frequency SDM image. A gradient-guided generative adversarial network objective function is used to train the second sub-network, termed the high-frequency network, to generate a full SDM image with textures similar to the FFDM image.
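A hedged PyTorch sketch of the two-stage cascade follows; the layer choices and a single-channel DBT input are simplifying assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MSCN(nn.Module):
    """Two cascaded sub-networks: low-frequency first, then full image."""

    def __init__(self):
        super().__init__()
        # Stage 1: predicts the low-frequency (intensity) structure.
        self.low_freq_net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))
        # Stage 2: adds high-frequency texture, conditioned on stage 1.
        self.high_freq_net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, dbt_input):
        low = self.low_freq_net(dbt_input)
        full = self.high_freq_net(torch.cat([dbt_input, low], dim=1))
        # Supervise `low` with MSE and `full` with the adversarial loss.
        return low, full
```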

    Results

    In total, 1646 cases with FFDM and DBT were retrospectively collected from the Hologic Selenia system for the training and validation datasets, and 145 cases with masses or MC clusters were independently collected from the Hologic Selenia system for the testing dataset. For comparison, a baseline network with the same architecture as the high-frequency network directly generates a full SDM image. Compared to the baseline method, the proposed MSCN improves the peak signal-to-noise ratio from 25.3 dB to 27.9 dB, improves the structural similarity from 0.703 to 0.724, and significantly increases the perceptual similarity.
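For reference, the peak signal-to-noise ratio figures above follow the standard definition; a minimal NumPy sketch, assuming images normalized to a peak value of 1.0:

```python
import numpy as np

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two same-shape images."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(test, dtype=np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```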

    Conclusions

    The proposed method can stabilize the training and generate SDM images with lower intensity distortion and higher perceptual similarity.

     