<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Towards Multi-Robot Shipwreck Mapping</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10296055</idno>
					<idno type="doi"></idno>
					<title level='j'>Advanced Marine Robotics Technical Committee Workshop on Active Perception at IEEE International Conference on Robotics and Automation (ICRA)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Marios Xanthidis</author><author>Bharat Joshi</author><author>Nare Karapetyan</author><author>Monika Roznere</author><author>Weihan Wang</author><author>James Johnson</author><author>Alberto Quattrini Li</author><author>Jesse Casana</author><author>Philippos Mordohai</author><author>Srihari Nelakuditi</author><author>Ioannis Rekleitis</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[This paper introduces ongoing work on a novel methodology for cooperative mapping of an underwater structure by a team of robots, focusing on accurate photorealistic mapping of shipwrecks. Submerged vessels are history capsules found all over the world; as such, it is important to capture their state through visual sensors. Existing work in the literature addresses the problem with a single expensive robot or with robots of similar capabilities that cooperate only loosely. The proposed methodology utilizes vision as the primary sensor. Two types of robots, termed distal and proximal observers, with distinct roles, operate around the structure. The first type keeps a distance from the wreck, providing a bird's-eye view of the wreck in sync with the pose of the vehicles of the other type. The second type operates near the wreck, mapping the exterior of the vessel in detail. Preliminary results illustrate the potential of the proposed strategy.
I. INTRODUCTION
This paper initiates the discussion on underwater structure mapping at different scales utilizing a team of cooperative robots. Underwater structure modeling is crucial for operating in different natural and man-made environments. These environments are diverse and include shipwrecks, oil rigs and hydroelectric dams, submerged historical sites, and cave systems. Operating in the underwater domain is dangerous, tedious, labor intensive, and physically exhausting for humans; fortunately, underwater robots can enable such operations. However, the underwater domain poses unique challenges, including absence of localization systems (e.g., GPS) and communication infrastructure (e.g., WiFi), challenging visibility conditions, and external forces (currents). This paper proposes a novel methodology for cooperative mapping of an underwater structure by a team of robots. In particular, this paper introduces and discusses the following research questions with some preliminary results on this ongoing work:
• RQ1: How to robustly achieve cooperative localization in the presence of occlusions?
• RQ2: How to fuse the different sources of information on-board in real time for reconstruction?
• RQ3: How should the co-robots cooperate for the mapping task?]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>&#8226; RQ4: How to efficiently and robustly use resource-limited communication channels to share information among the robots of a team and between the robots and the operator?</p><p>Utilizing an Autonomous Underwater Vehicle (AUV) to map an underwater structure presents the following dilemma. On one hand, if the AUV is close enough to map the observed details accurately, it misses the big picture, i.e., in which direction the largest unexplored part lies; moreover, incremental SLAM accumulates drift. On the other hand, if the AUV is far enough to sense a large portion of the structure, details are lost due to the underwater sensing conditions. The proposed approach utilizes AUVs in two distinct roles: proximal observers are AUVs that operate near the underwater structure (see Fig. <ref type="figure">1</ref>); distal observers are AUVs that operate at a distance, keeping a large portion of the structure together with the proximal observer(s) in their field of view (see Fig. <ref type="figure">3a</ref>). Central to the proposed approach is the ability of the distal observers to detect and localize the proximal observers via a cooperative localization framework <ref type="bibr">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>Wreck mapping has been studied using a variety of techniques all around the world. Photogrammetry produced mosaics from manually obtained images in Demesticha et al. <ref type="bibr">[3]</ref> and from ROV imagery in Nornes et al. <ref type="bibr">[4]</ref>, while the Arrows EU project provides an overview of robotic technology use <ref type="bibr">[5]</ref>. Menna et al. <ref type="bibr">[6]</ref> provide a comprehensive review of the techniques used. Mapping projects extend from Italy <ref type="bibr">[7]</ref>, Spain <ref type="bibr">[8]</ref>, Canada <ref type="bibr">[9]</ref>, Qatar <ref type="bibr">[10]</ref>, up to the Arctic <ref type="bibr">[11]</ref>, with the most famous wreck explorations being those of the Titanic <ref type="bibr">[12]</ref> and the Antikythera <ref type="bibr">[13]</ref> shipwrecks. Kurazume et al. <ref type="bibr">[14]</ref>, <ref type="bibr">[15]</ref> first introduced the idea of localizing two robots based on mutual observations, terming it a cooperative positioning system, although in the literature it is most often termed Cooperative Localization (CL) <ref type="bibr">[1]</ref>. Dieudonn&#233; et al. <ref type="bibr">[16]</ref> proved that CL with an arbitrary set of sensors is NP-hard, while Roumeliotis and Rekleitis <ref type="bibr">[17]</ref> provided a theoretical analysis of factors that affect error growth in a CL system. Further studies have examined the performance <ref type="bibr">[18]</ref>, the sensing modalities <ref type="bibr">[19]</ref>, and consistency <ref type="bibr">[20]</ref>, proposing decentralized solutions <ref type="bibr">[21]</ref> and integration with inertial sensors <ref type="bibr">[22]</ref>.</p><p>Active sensing methods for dense 3-D modeling have been reported in the literature, but none of them benefit from two or more tightly collaborating robots, such as the ones proposed here.
Several approaches require enumerating and simulating sensing from discrete pose hypotheses in 6-D <ref type="bibr">[23]</ref>-<ref type="bibr">[27]</ref>, incurring high computational cost; others are limited to 2-D slices of constant height or depth <ref type="bibr">[28]</ref>, <ref type="bibr">[29]</ref> or require coarse initial models <ref type="bibr">[30]</ref>, <ref type="bibr">[31]</ref>; and most operate on occupancy grids, drastically reducing the resolution of the reconstructed surfaces. Exploration strategies <ref type="bibr">[32]</ref>, <ref type="bibr">[33]</ref> that guide a robot towards frontier voxels without requiring sampling in pose space are closely related to our work, but they are limited to a single robot, rely on a prior map, and do not model uncertainty. Multi-robot 3-D reconstruction methods have been presented <ref type="bibr">[34]</ref>, but robots are assigned areas to map independently without tight cooperation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. THE PROPOSED SYSTEM</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Overview</head><p>The main idea of our approach is to have a team of co-robots collaborating with a human operator. There are two types of robots: proximal observers, which operate close to the structure in order to produce an accurate map, and distal observers, which stay at a distance, maintaining the global picture of the structure and the pose of the proximal observer. Currently, cooperative localization (see III-B), motion strategies for navigating near obstacles (see III-C), underwater state estimation <ref type="bibr">[35]</ref>, and photorealistic reconstruction (see III-E) have been explored, while the remaining components of the proposed approach are still under investigation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Cooperative Localization</head><p>First, we need to address the problem of cooperative localization: its solution provides the relative pose between the distal and proximal observers. We recently proposed a deep-learning-based framework for estimating the relative pose between the two robots <ref type="bibr">[2]</ref> from a single image.</p><p>To overcome the challenge of obtaining training data with accurate 6D poses, we utilize Unreal Engine 4 to generate a rendered dataset by projecting the 3D model of the swimming robot over underwater images. However, these rendered images differ from real underwater images due to the color loss and poor visibility of the underwater domain. Thus, we employ CycleGAN <ref type="bibr">[36]</ref>, an image-to-image translation network, to bridge the gap between rendered and real images, producing a synthetic dataset containing images that are closer in appearance to real underwater images. In effect, a CNN is trained on the synthetic dataset and then tested on real underwater images, as shown in Fig. <ref type="figure">2</ref>.</p><p>We modified YOLOv3 <ref type="bibr">[37]</ref> by adding a pose regression decoder, while the object detection decoder and backbone encoder remain unchanged. The pose regression decoder predicts the projected keypoint locations of the 8 corners associated with the 3D model of the robot in an image, along with their confidence scores. An image is divided into grid cells, and each image patch corresponding to a grid cell votes for the object detection box. Instead of using all the grid cells for 2D keypoint prediction, we select the grid cells that fall inside the object bounding box, thus focusing on regions that belong to the robot. From these candidate keypoint predictions, the most dependable 2D keypoint candidates for each 3D keypoint are selected to yield a set of 2D-to-3D correspondences.
More specifically, we select the 12 most confident keypoint predictions for each corner of the robot's 3D model. These selected 2D keypoints are used in the RANSAC-based PnP <ref type="bibr">[38]</ref> algorithm to obtain a robust 6D pose estimate.</p><p>The proposed framework has been tested in different environments (pool and ocean) and with different cameras, including that of an Aqua2 robot and several GoPro cameras, demonstrating its robustness with respect to variations in underwater environment, camera intrinsics, and color calibration (see Fig. <ref type="figure">3</ref>). Moreover, we demonstrate better accuracy in terms of translation and orientation error on a pool dataset compared to state-of-the-art methods <ref type="bibr">[39]</ref>.</p><p>Our pose estimation framework may occasionally fail to produce a correct pose estimate when the bounding box detection is inaccurate. Apart from these failure cases, the relative pose of the robot with respect to the observer can be determined accurately for each frame. We are currently fusing other sensor data, e.g., from the IMU and the pressure-based depth sensor, to detect discrepancies in the estimated pose across multiple frames and further improve the pose estimate.</p></div>
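<div xmlns="http://www.tei-c.org/ns/1.0"><p>The keypoint selection step described above can be illustrated with a minimal sketch. It assumes that each grid cell inside the detected bounding box contributes one candidate per corner as a (corner id, pixel location, confidence) tuple; this data layout and the function name select_correspondences are hypothetical and are not taken from the framework itself. Only the values 8 (corners) and 12 (candidates kept per corner) come from the text.</p>
<p><![CDATA[
```python
from collections import defaultdict

NUM_CORNERS = 8   # corners associated with the robot's 3D model (from the text)
TOP_K = 12        # most confident predictions kept per corner (from the text)


def select_correspondences(predictions, corners_3d, top_k=TOP_K):
    """Build 2D-to-3D correspondences from per-cell keypoint predictions.

    predictions: iterable of (corner_id, (u, v), confidence) tuples, one per
                 grid cell inside the bounding box and per corner (assumed layout).
    corners_3d:  list of (x, y, z) model corners indexed by corner_id.
    Returns a list of ((u, v), (x, y, z)) pairs.
    """
    by_corner = defaultdict(list)
    for corner_id, uv, conf in predictions:
        by_corner[corner_id].append((conf, uv))

    pairs = []
    for corner_id in sorted(by_corner):
        # Keep the top_k most confident 2D candidates for this corner.
        ranked = sorted(by_corner[corner_id], key=lambda c: c[0], reverse=True)
        for _conf, uv in ranked[:top_k]:
            pairs.append((uv, corners_3d[corner_id]))
    return pairs
```
]]></p>
<p>The resulting pairs would then be handed to a RANSAC-based PnP solver together with the camera intrinsics; OpenCV's solvePnPRansac is one commonly used implementation, although the solver actually used is not specified here.</p></div>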
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Proximal Observer: Active Exploration</head><p>The main goal of the proximal observer is to operate safely in proximity to the target structure and collect close observations. In the absence of prior information, the proximal observers will greedily explore the underwater structure. We plan to investigate two different directions:</p><p>1) Learning-based Exploration: We explore a deep learning framework that navigates the underwater robot to collect data in proximity to the structure. Recently, we proposed a CNN framework <ref type="bibr">[41]</ref> which is trained based on the way human divers collect data. In this framework, human operators annotated the training dataset by marking the best direction of motion and orientation to guide the autonomous underwater robot for shipwreck coverage. Although some limited unpredictable behavior is expected due to the nature of the technique, we speculate that such techniques will avoid any issues that might arise from state estimation uncertainty.</p><p>2) Optimization-based Exploration: A path-optimization-based technique is considered to navigate the robot near the structure while maximizing visibility and information gain. Past work introduced AquaNav <ref type="bibr">[42]</ref>, a robust underwater navigation framework utilizing a path optimization planner that allows extensions in the form of cost functions. We will investigate the addition of novel cost functions maximizing visibility of target structures, determined on-line by a CNN similarly to <ref type="bibr">[43]</ref>. Such a technique would require sufficient localization capabilities, which could be provided by a robust SLAM method such as SVIN2 <ref type="bibr">[35]</ref> or locally by dead reckoning. We expect that, although robust localization is needed, unlike the first approach, such a framework could ensure predictable and safe performance with completeness if paired with a high-level global planner.
Preliminary results in simulation already show strong potential for this approach, as can be observed in Fig. <ref type="figure">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Distal Observer: Active Positioning</head><p>The main objective of the distal observer is to track the proximal observers while simultaneously keeping a larger portion of the target structure in frame from a greater distance. To achieve such behavior, the distal observers must be able to visually recognize and localize the proximal observers, predict their motions, visually recognize relevant segments of the target structure, and plan motions that satisfy the desired tracking objectives.</p><p>1) Proximal Observers Tracking: Prior work has already successfully addressed this issue for the Aqua2 underwater robot <ref type="bibr">[2]</ref>. The method introduced a DNN technique, trained on realistic simulated data, that enabled an underwater robot using cameras to extract the relative position and orientation of another visible robot.</p><p>2) Structure Segment Recognition: Past research has already produced a CNN-based recognition method for corals <ref type="bibr">[43]</ref>. We plan to investigate similar ways to recognize human-made underwater structures, such as shipwrecks, by employing DNNs trained on annotated datasets.</p><p>3) Motion Prediction: The distal observer has to produce motions that successfully track the proximal observer; thus, the future positions of the tracked robot should be provided or extracted. This problem could be resolved either by utilizing a motion predictor combined with fast replanning or, when allowed by the underwater communication conditions, by sharing the plans of the proximal observers directly to inform motion planning.</p><p>4) Navigation and Motion Planning: Given the map, areas of interest of the target underwater structure represented as segments, and a predicted trajectory of the proximal observer, the distal observer has to produce a sequence of motions that satisfy our objectives.
Having already utilized AquaNav <ref type="bibr">[42]</ref> (a powerful autonomous underwater navigation package for safe operation in proximity), and similarly to the planning problem formulation of the proximal observers, we will focus on enhancing the optimization process with novel cost functions encouraging the visibility of the future positions of the tracked robot and the target structure at a desired distance.</p></div>
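<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the idea of a visibility-encouraging cost function concrete, the following is a purely illustrative sketch and not a cost function of AquaNav or of the proposed system, whose actual terms are still under investigation. It assumes a hypothetical quadratic penalty on the deviation from a desired viewing distance plus a quadratic penalty on the angle by which the tracked robot falls outside the camera's half field of view; the function name, weights, and parameters are all assumptions.</p>
<p><![CDATA[
```python
import math


def tracking_cost(observer_pos, observer_heading, target_pos,
                  desired_dist, half_fov, w_dist=1.0, w_fov=1.0):
    """Hypothetical cost term for one predicted target position.

    The cost is zero when the target lies at desired_dist from the observer
    and within half_fov (radians) of the observer's heading direction.
    """
    offset = [t - o for t, o in zip(target_pos, observer_pos)]
    dist = math.sqrt(sum(d * d for d in offset))
    head_norm = math.sqrt(sum(h * h for h in observer_heading))
    if dist == 0.0 or head_norm == 0.0:
        return float("inf")  # degenerate configuration, treated as infeasible

    # Angle between the viewing direction and the direction to the target.
    cos_angle = sum(d * h for d, h in zip(offset, observer_heading)) / (dist * head_norm)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))

    fov_violation = max(0.0, angle - half_fov)
    return w_dist * (dist - desired_dist) ** 2 + w_fov * fov_violation ** 2
```
]]></p>
<p>In an optimization-based planner, a term of this kind would be summed over the predicted future positions of the proximal observer along the candidate trajectory, alongside the planner's existing safety and smoothness costs.</p></div>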
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Photo-realistic reconstruction</head><p>Our goal in this project is to obtain photorealistic 3D models useful to the robots and their operators. Even though 3D point clouds may be sufficient for obstacle avoidance, they are not well suited for tasks such as visibility estimation or detection of geometric features such as openings and locations of high curvature. They are also a poor form of visualization for users to appreciate and understand underwater structures. We therefore aim to represent surfaces using triangular meshes.</p><p>We propose a representation comprising a set of keypoints detected on the left (reference) image of the first stereo pair and reconstructed in 3-D via triangulation after they are matched on the right image. The keypoints are connected on the images to form triangles, which are then lifted to 3-D after their vertices have been reconstructed. The proposed processing steps are as follows (see also Fig. <ref type="figure">5</ref>, which shows a sketch of the approach).</p><p>1) Extract keypoints from the first image of the object and reconstruct them in 3-D. Harris corners or any other feature detection mechanism can be used here.</p><p>2) Apply 2D Delaunay triangulation on the keypoints on the image to obtain a mesh.</p><p>3) Select as active keypoint the one with the smallest expected reconstruction error still above the tolerance.</p><p>4) Compute the Next-Best View (NBV) for the active keypoint using the approach of Freundlich et al. <ref type="bibr">[44]</ref>, <ref type="bibr">[45]</ref>.</p><p>5) Update position estimates and error covariance matrices for all visible keypoints based on new images and label as "explained" those with acceptable expected error.</p><p>6) Detect new keypoints in newly observed parts of the scene.</p><p>7) Insert as new keypoints the vertices that were generated due to triangle subdivision.</p><p>8) Attach new keypoints to the mesh and return to Step 3 unless all keypoints are explained.</p><p>
As the robot approaches a surface, triangles project onto larger areas in the image plane and new keypoints may be detected in their interior. When this occurs, the planarity of these triangles will be assessed and non-planar triangles will be subdivided to allow a tighter approximation of the surface. A classifier will be trained to make these decisions considering image appearance and reprojection errors computed by warping one image to another via the mesh. The output of this module will be a 3-D mesh comprising compact triangles, since their projections satisfy the Delaunay condition on some views, which will be texture-mapped by blending the input images with weights emphasizing fronto-parallel images observing the triangles at high resolution.</p></div>
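<div xmlns="http://www.tei-c.org/ns/1.0"><p>The control flow of Steps 3, 5, and 8 above can be sketched as follows. This is a simplified illustration under stated assumptions: each keypoint is reduced to a single scalar expected error rather than a full covariance matrix, and the NBV computation and measurement update of Steps 4 and 5 are replaced by a caller-supplied observe function. The names select_active_keypoint, refine, and observe are hypothetical.</p>
<p><![CDATA[
```python
def select_active_keypoint(expected_errors, tolerance):
    """Step 3: keypoint with the smallest expected error still above tolerance.

    expected_errors: dict mapping keypoint id to a scalar expected error.
    Returns None when every keypoint is already "explained".
    """
    unexplained = {k: e for k, e in expected_errors.items() if e > tolerance}
    if not unexplained:
        return None
    return min(unexplained, key=unexplained.get)


def refine(expected_errors, tolerance, observe, max_iters=100):
    """Repeat Steps 3 to 5 until all keypoints are explained (Step 8).

    observe(active, errors) stands in for moving to the next-best view and
    updating the estimates; it must return the updated error dictionary.
    """
    errors = dict(expected_errors)
    visited = []
    for _ in range(max_iters):
        active = select_active_keypoint(errors, tolerance)
        if active is None:
            break  # all keypoints are explained
        visited.append(active)
        errors = observe(active, errors)
    return visited, errors
```
]]></p>
<p>Detection of new keypoints and triangle subdivision (Steps 6 and 7) would add entries to the error dictionary between iterations; they are omitted here to keep the sketch short.</p></div>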
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F. Underwater Information Sharing</head><p>The underwater domain is a very harsh environment for communication, which is primarily based on acoustic devices characterized by very low data rate (tens of kilobits per second), relatively high packet loss, and distance-dependent performance <ref type="bibr">[46]</ref>-<ref type="bibr">[48]</ref>. This requires decisions over time on what information to share between robots and human operators.</p><p>We propose a hybrid representation comprising, on one hand, a set of 3-D keypoints and triangles, and, on the other hand, an octree-based occupancy grid <ref type="bibr">[49]</ref>. The keypoints and triangles capture the details of the surface with the highest precision that has been achieved so far, while the octree naturally allows the hierarchical visualization and transmission of the current state of the estimated model.</p><p>Initially, a coarse level of the octree with a few bits per voxel, indicating whether it is free, occupied, or unknown, will be transmitted. Information about voxels at finer levels will be transmitted subsequently, if their parent voxels have been subdivided. Additional information to give the operator a better view of the current surface estimate and the progress of the modeling process, such as 3-D reconstruction uncertainty within each voxel, will also be transmitted progressively. Finally, as the octree reaches a steady state and voxels at the finest resolution have been assigned stable labels, the list of keypoints and triangles will be transmitted.
Note that each voxel is marked with the timestamps of its last transmission and last update, so that it can be added back to the transmission list: a metric that quantifies the change will determine the importance of retransmission, which can be used by the optimization method proposed below.</p><p>We propose to optimize data transmission by performing a joint cross-layer optimization of the data representation at the application layer and the operations of the communication network.</p></div>
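<div xmlns="http://www.tei-c.org/ns/1.0"><p>The coarse-to-fine transmission of voxel labels described above can be sketched as a level-by-level traversal of the octree with a compact per-voxel encoding. The sketch below is an assumption-laden illustration: it uses exactly 2 bits per voxel for the three labels, a minimal Node class, and the hypothetical helper names labels_by_level and pack_labels. It does not model timestamps, uncertainty values, or retransmission priorities.</p>
<p><![CDATA[
```python
FREE, OCCUPIED, UNKNOWN = 0, 1, 2  # assumed 2-bit label encoding


class Node:
    """Minimal octree node: a label and either no children or eight."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def labels_by_level(root):
    """Yield one list of voxel labels per octree level, coarse to fine.

    Only voxels whose parents have been subdivided appear at finer levels.
    """
    level = [root]
    while level:
        yield [node.label for node in level]
        level = [child for node in level for child in node.children]


def pack_labels(labels):
    """Pack 2-bit labels into bytes, four labels per byte, low bits first."""
    out = bytearray((len(labels) + 3) // 4)
    for i, label in enumerate(labels):
        out[i // 4] |= (label & 0b11) << (2 * (i % 4))
    return bytes(out)
```
]]></p>
<p>Each packed level would form one message on the acoustic link, so the operator receives a usable coarse model after the first few bytes and refinements afterwards. With this encoding, a fully subdivided level of 8 voxels fits in 2 bytes.</p></div>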
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. DISCUSSION</head><p>Our research on underwater structure mapping and inspection requires advanced state estimation, active sensing, and communications and will enable more robust situational awareness, autonomy, and robot coordination. The main contributions include the use of co-robots in tight cooperation, which increases the robustness and efficiency of the system inspecting an underwater structure by assigning different roles to the robots; an explicit cross-layer optimization for communication that improves the communication channel utilization under very limited bandwidth; and a sensor data fusion approach that can be run in real time on-board the robot. These contributions will be deployed and demonstrated on inexpensive vehicles, such as the BlueROV2. The software developed to enable these operations will be open-sourced to facilitate research in the challenging field of marine robotics.</p><p>Taking a wider view, we hope that our work will reduce the cost and risk of having divers perform exploration and mapping tasks, possibly under adverse conditions. We explicitly plan to rely on low-cost platforms to reduce the cost of underwater robotic operations. Reducing the cost and risk would enable broader mapping efforts leading to the discovery of historically important, valuable, or otherwise interesting artifacts. Human divers will be engaged only after the robots have found promising evidence. Besides government entities and non-profit organizations, such as museums, the proposed research will benefit archaeologists, who can monitor whether a recovery is needed for preservation as mandated by the UNESCO Convention on the Protection of the Underwater Cultural Heritage <ref type="bibr">[50]</ref>.</p></div></body>
		</text>
</TEI>
