

Title: Comprehensive Underwater Object Tracking Benchmark Dataset and Underwater Image Enhancement With GAN
Abstract: Current state-of-the-art object tracking methods have largely benefited from the public availability of numerous benchmark datasets. However, the focus has been on open-air imagery and much less on underwater visual data. Inherent underwater distortions, such as color loss, poor contrast, and underexposure, caused by attenuation of light, refraction, and scattering, greatly affect the visual quality of underwater data, and as such, existing open-air trackers perform less effectively on such data. To help bridge this gap, this article proposes the first comprehensive underwater object tracking (UOT100) benchmark dataset to facilitate the development of tracking algorithms well-suited for underwater environments. The proposed dataset consists of 104 underwater video sequences and more than 74 000 annotated frames derived from both natural and artificial underwater videos, with a wide variety of distortions. We benchmark the performance of 20 state-of-the-art object tracking algorithms and further introduce a cascaded residual network model for underwater image enhancement to improve the tracking accuracy and success rate of trackers. Our experimental results demonstrate the shortcomings of existing tracking algorithms on underwater data and show how our generative adversarial network (GAN)-based enhancement model can be used to improve tracking performance. We also evaluate the visual quality of our model’s output against existing GAN-based methods using well-accepted quality metrics and demonstrate that our model yields better visual data.
Index Terms: Underwater benchmark dataset, underwater generative adversarial network (GAN), underwater image enhancement (UIE), underwater object tracking (UOT).
Award ID(s):
1942053
NSF-PAR ID:
10309916
Journal Name:
IEEE Journal of Oceanic Engineering
ISSN:
0364-9059
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In-situ visual observations of marine organisms are crucial to developing behavioural understandings and their relations to their surrounding ecosystem. Typically, these observations are collected via divers, tags, and remotely-operated or human-piloted vehicles. Recently, however, autonomous underwater vehicles equipped with cameras and embedded computers with GPU capabilities are being developed for a variety of applications, and in particular, can be used to supplement these existing data collection mechanisms where human operation or tags are more difficult. Existing approaches have focused on using fully-supervised tracking methods, but labelled data for many underwater species are severely lacking. Semi-supervised trackers may offer alternative tracking solutions because they require less data than fully-supervised counterparts. However, because there are no existing realistic underwater tracking datasets, the performance of semi-supervised tracking algorithms in the marine domain is not well understood. To better evaluate their performance and utility, in this paper we provide (1) a novel dataset specific to marine animals located at http://warp.whoi.edu/vmat/, (2) an evaluation of state-of-the-art semi-supervised algorithms in the context of underwater animal tracking, and (3) an evaluation of real-world performance through demonstrations using a semi-supervised algorithm on-board an autonomous underwater vehicle to track marine animals in the wild.
  2. The fluctuation of the water surface causes refractive distortions that severely degrade the image of an underwater scene. Here, we present the distortion-guided network (DG-Net) for restoring distortion-free underwater images. The key idea is to use a distortion map to guide network training. The distortion map models the pixel displacement caused by water refraction. We first use a physically constrained convolutional network to estimate the distortion map from the refracted image. We then use a generative adversarial network guided by the distortion map to restore the sharp distortion-free image. Since the distortion map indicates correspondences between the distorted image and the distortion-free one, it guides the network to make better predictions. We evaluate our network on several real and synthetic underwater image datasets and show that it outperforms the state-of-the-art algorithms, especially in the presence of large distortions. We also show results for complex scenarios, including outdoor swimming pool images captured by drone and indoor aquarium images taken by cellphone camera.
  3. Underwater image enhancement and turbidity removal (dehazing) is a very challenging problem, not only due to the sheer variety of environments where it is applicable, but also due to the lack of high-resolution, labelled image data. In this paper, we present a novel, two-step deep learning approach for underwater image dehazing and colour correction. In iDehaze, we leverage computer graphics to physically model light propagation in underwater conditions. Specifically, we construct a three-dimensional, photorealistic simulation of underwater environments, and use them to gather a large supervised training dataset. We then train a deep convolutional neural network to remove the haze in these images, then train a second network to transform the colour space of the dehazed images onto a target domain. Experiments demonstrate that our two-step iDehaze method is substantially more effective at producing high-quality underwater images, achieving state-of-the-art performance on multiple datasets. Code, data and benchmarks will be open sourced. 
  4. The ocean is a vast three-dimensional space that is poorly explored and understood, and harbors unobserved life and processes that are vital to ecosystem function. To fully interrogate the space, novel algorithms and robotic platforms are required to scale up observations. Locating animals of interest and extended visual observations in the water column are particularly challenging objectives. Towards that end, we present a novel Machine Learning-integrated Tracking (or ML-Tracking) algorithm for underwater vehicle control that builds on the class of algorithms known as tracking-by-detection. By coupling a multi-object detector (trained on in situ underwater image data), a 3D stereo tracker, and a supervisor module to oversee the mission, we show how ML-Tracking can create robust tracks needed for long duration observations, as well as enable fully automated acquisition of objects for targeted sampling. Using a remotely operated vehicle as a proxy for an autonomous underwater vehicle, we demonstrate continuous input from the ML-Tracking algorithm to the vehicle controller during a record, 5+ hr continuous observation of a midwater gelatinous animal known as a siphonophore. These efforts clearly demonstrate the potential that tracking-by-detection algorithms can have on exploration in unexplored environments and discovery of undiscovered life in our ocean.
  5. This paper presents an attention-based, deep learning framework that converts robot camera frames with dynamic content into static frames to more easily apply simultaneous localization and mapping (SLAM) algorithms. The vast majority of SLAM methods have difficulty in the presence of dynamic objects appearing in the environment and occluding the area being captured by the camera. Despite past attempts to deal with dynamic objects, challenges remain to reconstruct large, occluded areas with complex backgrounds. Our proposed Dynamic-GAN framework employs a generative adversarial network to remove dynamic objects from a scene and inpaint a static image free of dynamic objects. The Dynamic-GAN framework utilizes spatial-temporal transformers, and a novel spatial-temporal loss function. The evaluation of Dynamic-GAN was comprehensively conducted both quantitatively and qualitatively by testing it on benchmark datasets, and on a mobile robot in indoor navigation environments. As people appeared dynamically in close proximity to the robot, results showed that large, feature-rich occluded areas can be accurately reconstructed with our attention-based deep learning framework for dynamic object removal. Through experiments we demonstrate that our proposed algorithm has up to 25% better performance on average as compared to the standard benchmark algorithms. 