<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>SM/VIO: Robust Underwater State Estimation Switching Between Model-based and Visual Inertial Odometry</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>05/29/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10475069</idno>
					<idno type="doi">10.1109/ICRA48891.2023.10161407</idno>
					<title level='j'>Proceedings - IEEE International Conference on Robotics and Automation</title>
					<idno type="issn">1050-4729</idno>
					<author>Bharat Joshi</author><author>Hunter Damron</author><author>Sharmin Rahman</author><author>Ioannis Rekleitis</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[This paper addresses the robustness problem of visual-inertial state estimation for underwater operations. Underwater robots operating in a challenging environment are required to know their pose at all times. All vision-based localization schemes are prone to failure due to poor visibility conditions, color loss, and lack of features. The proposed approach utilizes a model of the robot's kinematics together with proprioceptive sensors to maintain the pose estimate during visual-inertial odometry (VIO) failures. Furthermore, the trajectories from successful VIO and the ones from the model-driven odometry are integrated in a coherent set that maintains a consistent pose at all times. Health-monitoring tracks the VIO process ensuring timely switches between the two estimators. Finally, loop closure is implemented on the overall trajectory. The resulting framework is a robust estimator switching between model-based and visual-inertial odometry (SM/VIO). Experimental results from numerous deployments of the Aqua2 vehicle demonstrate the robustness of our approach over coral reefs and a shipwreck.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>This paper proposes a novel framework for solving the robustness problem of state estimation underwater. Central to any autonomous operation is the ability of the robot to know where it is with respect to the environment, a task described under the general term of state estimation. Over the years, many different approaches have been proposed; however, state estimation underwater is a challenging problem that still remains open. Vision provides rich semantic information and, through place recognition, enables loop closures. Unfortunately, as demonstrated in recent work comparing numerous open-source packages for visual and visual/inertial state estimation <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref>, failures are frequent in an underwater environment for a variety of reasons. In contrast to above-water scenarios, GPS-based localization is impossible. In addition to the traditional difficulties of vision-based localization, the underwater environment is prone to rapid changes in lighting conditions, limited visibility, and loss of contrast and color information with depth. Light scattering from suspended plankton and other matter causes "snow effects" and blurring, while the incident angle at which light rays hit the surface of the water can change the visibility at different times of the day <ref type="bibr">[3]</ref>. Finally, as light travels to increasing depths, different parts of its spectrum are absorbed; red is the first color to appear black, eventually followed by orange, yellow, green, and blue <ref type="bibr">[4]</ref>, <ref type="bibr">[5]</ref>. In addition to all the above underwater-specific challenges, an unknown environment often presents areas with no visible landmarks. For example, in Fig. 
<ref type="figure">1</ref>, an Aqua2 <ref type="bibr">[6]</ref> Autonomous Underwater Vehicle (AUV) mapping the deck of a shipwreck reaches the starboard side, where the front cameras see only empty water with no features.</p><p>Visual-inertial odometry (VIO) has been used for state estimation in a multitude of environments, indoor and outdoor, and has even gained some traction in harsh environments such as underwater <ref type="bibr">[7]</ref>. While most VIO research focuses on improving accuracy, robustness is just as critical for autonomous operations. If VIO fails during deployment, the results could be catastrophic, leading to vehicle loss. As our earlier investigations showed <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref>, many vision-based approaches diverge, or outright fail, sometimes at random; however, deploying a vehicle underwater in autonomous mode requires that it return to base, or a collection point, during every deployment. It is very important for AUVs to be able to keep track of their pose, even with diminished accuracy, over the whole operation. We propose switching between VIO and a model-based estimator, addressing the accuracy and robustness of state estimation by identifying failure modes, generating robust predictors for estimator divergence/failure, and always producing a pose estimate.</p><p>The core of the proposed approach is a robust switching estimator framework, which always provides a realistic estimate reflecting the true state of the vehicle. First, the health of VIO <ref type="bibr">[8]</ref>, <ref type="bibr">[9]</ref> is monitored by tracking the number of features detected, their spatial distribution, their quality, and their temporal continuity. Utilizing the measures described above, when an estimator starts diverging, and before it fails completely, an alternative estimator is introduced based on sensor inputs robust to underwater environment changes. 
For example, a model-based estimator <ref type="bibr">[10]</ref>, <ref type="bibr">[11]</ref> is used for controlling the Aqua2 vehicles, combining the inertial and water depth signals together with the flipper configuration and velocity <ref type="bibr">[12]</ref>, <ref type="bibr">[13]</ref>; when the visual/inertial input deteriorates, the proposed system switches to the model-based estimator until the visual/inertial estimates are valid again. The choice of switching-based, loosely coupled fusion of odometry estimates ensures flexibility in choosing both the VIO and the conservative estimator in a modular fashion. The two estimators switch back and forth based on the health status of the VIO estimator. Finally, a loop-closure framework ensures the consistent improvement of the combined estimator. Our main contribution is a robust switching-based state estimation framework, termed Robust Switching Model-based/Visual Inertial Odometry (SM/VIO), capable of keeping track of an AUV's pose even when VIO fails. This allows the AUV to carry out underlying tasks such as path planning, coverage, and performing motion patterns while maintaining a steady pose, and to relocalize when revisiting previous areas. Extensive experiments over different terrains validate the contribution of the proposed robust switching estimator framework in maintaining a realistic pose of the AUV at all times. In contrast, state-of-the-art VIO algorithms <ref type="bibr">[7]</ref>, <ref type="bibr">[14]</ref>-<ref type="bibr">[17]</ref> result in a much higher error or even complete failure.</p></div>
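The switching behavior described in the introduction can be sketched as follows. This is a minimal, hypothetical illustration (the function name, the 1-D pose, and the argument layout are ours, not the authors' implementation): when the VIO health check fails, the pose keeps advancing via model-based increments seeded from the last trusted pose, so an estimate is always produced.

```python
def switching_estimator(health_flags, vio_poses, pe_deltas):
    """Always return a pose: trust VIO while healthy, otherwise
    dead-reckon with model-based (PE) increments seeded from the
    last trusted pose (1-D toy example)."""
    poses, last = [], 0.0
    for healthy, vio, delta in zip(health_flags, vio_poses, pe_deltas):
        if healthy:
            last = vio       # VIO is the preferred estimator
        else:
            last += delta    # propagate the model-based increment
        poses.append(last)
    return poses
```

With two failed steps in the middle of a run, the estimate keeps advancing instead of freezing or diverging, mirroring the uninterrupted-pose requirement stated above.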
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>In recent years, a plethora of open-source packages addressing the problem of vision-based state estimation has appeared <ref type="bibr">[8]</ref>, <ref type="bibr">[9]</ref>, <ref type="bibr">[16]</ref>, <ref type="bibr">[18]</ref>-<ref type="bibr">[26]</ref>. Quattrini Li et al. <ref type="bibr">[1]</ref> compared several packages on a variety of datasets to measure their performance in different environments. Extending the above comparison with a focus on the underwater domain, Joshi et al. <ref type="bibr">[2]</ref> investigated the performance of VIO packages. The above comparisons demonstrated that many packages require special motions <ref type="bibr">[19]</ref>, work only for a limited number of images <ref type="bibr">[27]</ref>, <ref type="bibr">[28]</ref>, or are strictly offline <ref type="bibr">[29]</ref>. Furthermore, intermittent failures were observed, the most common explanation being the random nature of the RANSAC technique <ref type="bibr">[30]</ref> utilized by most of them. The underwater state estimation approach SVIn2 by Rahman et al. <ref type="bibr">[7]</ref> demonstrated improved accuracy and robustness; however, it did not provide any assurances for uninterrupted estimates, which is the focus of this paper.</p><p>Utilizing an AUV to explore an underwater environment has gained popularity over the years. Sonar and stereo estimation for object modeling has been proposed in <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref>. Nornes et al. <ref type="bibr">[33]</ref> acquired stereo images utilizing an ROV off the coast of Trondheim Harbour, Norway. In <ref type="bibr">[34]</ref>, a deep-water ROV was used to map, survey, sample, and excavate a shipwreck area. Sedlazeck et al. <ref type="bibr">[35]</ref> reconstructed a shipwreck in 3D by pre-processing images collected by an ROV and applying a Structure-from-Motion-based algorithm. 
The images used for testing such an algorithm contained some structure and a large amount of background where only water was visible. Submerged structures were reconstructed in 3D in <ref type="bibr">[36]</ref>. Finally, recent work by Nisar et al. <ref type="bibr">[37]</ref> proposed the use of a model-based estimator to calculate external forces in addition to the pose of aerial vehicles, ignoring failure modes of VIO. In all previous work, when the state estimation failed there was no recovery. In contrast, the proposed SM/VIO approach for underwater environments addresses VIO failure, and the AUV can continue operations until reaching another feature-rich area.</p><p>Switching estimators (also called observers) have been applied in few mobile robotics applications and not, to our knowledge, on an AUV. Liu <ref type="bibr">[38]</ref> presented a generic approach for non-linear systems. Suzuki et al. <ref type="bibr">[39]</ref> utilized a switching observer to model ground properties together with the robot's kinematics. Manderson et al. <ref type="bibr">[40]</ref> utilized a model estimator in conjunction with Direct Sparse Odometry <ref type="bibr">[41]</ref>, but without monitoring estimator health, switching between estimators, or merging the two trajectories into one.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. THE PROPOSED SYSTEM</head><p>a) Overview: The proposed approach (SM/VIO) utilizes a model-based estimator, termed the primitive estimator (PE), which uses the water depth sensor, the IMU, and the motor commands to propagate the state of the AUV forward when the visual-inertial estimator fails; see Fig. <ref type="figure">3</ref>(a) for an estimate from the PE. It is worth noting that the AUV uses the same model to navigate; as such, the PE estimate of the lawnmower pattern in Fig. <ref type="figure">3</ref>(a) follows the commanded pattern exactly, but does not correspond to the actual trajectory, which was affected by external forces (e.g., water currents). When VIO is consistent, it is the preferred estimator, having higher accuracy due to the exteroceptive sensors (vision and acoustic). Key to the proposed approach is a health-monitor process that tracks the performance of VIO over time and informs a decision for switching between VIO and PE; see Fig. <ref type="figure">3(c</ref>) for the switching estimator trajectory, where the switch points are marked green. When the VIO restarts tracking successfully, the health monitor informs the switch from the PE to the VIO. Throughout this process a consistent pose is maintained. More specifically, when the VIO fails, the PE is initialized with the last accurate pose from VIO, and when the VIO restarts, the last pose of the PE is utilized. Finally, during VIO-controlled operations, loop closure is performed, also optimizing the PE-produced trajectories; the complete framework is outlined in Fig. <ref type="figure">2</ref>. Following the approach of Joshi et al. <ref type="bibr">[42]</ref>, the stable 3D features are tracked and their positions are updated after every loop closure, thus resulting in a consistent point cloud. Next, we discuss the individual components of SM/VIO.</p><p>The target vehicle is the Aqua2 AUV <ref type="bibr">[6]</ref>, an amphibious hexapod robot. 
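The primitive-estimator propagation outlined in the overview (depth sensor, IMU attitude, and commanded velocities) can be sketched as follows. This is a hedged dead-reckoning sketch under our own naming (`quat_to_rot`, `pe_step`), not the authors' code: the commanded body velocity is rotated into the world frame by the IMU attitude, and the absolute depth measurement overrides the integrated z.

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion [qw, qx, qy, qz]."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def pe_step(p, q, v_x, v_z, depth, dt):
    """One primitive-estimator update (hypothetical sketch): integrate
    the commanded body velocity rotated into the world frame, then
    overwrite z with the absolute water-depth measurement."""
    p = p + quat_to_rot(q) @ np.array([v_x, 0.0, v_z]) * dt
    p[2] = depth  # the pressure sensor gives an absolute depth
    return p
```

Because only commanded velocities are integrated, this estimate drifts with unmodeled external forces, which motivates the scaling step described later.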
Underwater, Aqua2 utilizes the motion from six flippers, each actuated independently by an electric motor. The robot's pose is described by the vector x = [^W p_I^T, ^W q_I^T]^T, where ^W p_I = [x, y, z]^T represents the position of the robot in the world frame and ^W q_I = [q_w, q_x, q_y, q_z] is the quaternion representing the robot's attitude. Aqua2 vehicles are equipped with three cameras, an IMU, and a water pressure sensor.</p><p>b) Primitive Estimator: The primitive estimator maintains a local copy of the robot's pose [^W p_I, ^W q_I], which is updated at a rate of 100 Hz (Fig. <ref type="figure">2</ref>: overview of the switching estimator). The IMU provides an absolute measurement of ^W q_I. The velocity of the robot is estimated from the forward speed command v_x and the heave (up/down) command v_z sent to the Aqua2 during field trials. These commands are used by Aqua2 to perform motion primitives and control the flipper motion. Since the same commands are used for Aqua2 control and PE predictions, the resulting PE trajectories will look perfectly aligned with the desired motion primitives; this is a drawback of using only the PE prediction. At each time step t, the position is updated by ^W p_I(t+1) = ^W p_I(t) + ^W R_I [v_x, 0, v_z]^T Δt,</p><p>where ^W R_I is the rotation matrix corresponding to ^W q_I. Because the water pressure sensor provides an absolute measurement of depth, this measurement is used instead of the above estimate for z. Moreover, the forward velocity estimates are correct only up to scale, depending on external forces (e.g., ocean currents) and the accumulation of acceleration measurement errors. Hence, before integrating the PE trajectory into the robust switching estimator, we scale the PE trajectory using the scaling factor between the PE and the VIO trajectory, as explained later. c) SVIn2 Review: We use a VIO system that fuses information from visual, inertial, water pressure (depth), and acoustic (sonar) sensors, presented in Rahman et al. 
<ref type="bibr">[7]</ref>-<ref type="bibr">[9]</ref>, termed SVIn2. More specifically, SVIn2 estimates the state of the robot by jointly minimizing the reprojection error and the IMU error, with the addition of the sonar error and the water depth error. SVIn2 performs non-linear optimization over a sliding window of keyframes, using a reprojection error and IMU error term formulation similar to Leutenegger et al. <ref type="bibr">[14]</ref>. The depth error term is calculated as the difference between the AUV's position along the z direction and the water depth measurement provided by the pressure sensor.</p><p>Loop closing and relocalization are achieved using the binary bag-of-words place recognition module DBoW2 <ref type="bibr">[43]</ref>. The loop-closure module maintains a pose graph with odometry edges between successive keyframes; a loop-closure odometry edge is added between the current keyframe and a loop-closure candidate when they have enough descriptor matches and pass PnP-RANSAC-based geometric verification. For a complete description, please refer to <ref type="bibr">[7]</ref>.</p><p>d) Health Monitoring: As described in earlier studies <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref>, estimators often diverge or outright fail even in conditions where they were working before; intermittent failures are much more challenging in the field. Robustness measures and divergence predictors are crucial in detecting imminent failures. To monitor the health of the vision-based state estimator, we evaluate the following criteria hierarchically, checking the most important criterion first; the VIO is considered untrustworthy when:</p><p>1) Keyframe detection: if a keyframe has not been detected for kf_wait_time seconds, the VIO is deemed to have failed. 
The only exception is when the system is stationary (zero velocity).</p><p>2) The number of triangulated 3D keypoints with feature detections in the current keyframe is less than a specified threshold, min_kps. We found that min_kps values between 10 and 20 worked well.</p><p>3) The number of feature detections per quadrant in the current keyframe is less than a specified threshold, min_kps_per_quadrant. This accounts for situations where a high number of features is robustly detected in a small area; see Fig. <ref type="figure">3(e-f</ref>), where the bottom two quadrants contain all the features. The quadrant criterion is applied only if the total number of feature detections is less than 10 &#215; min_kps_per_quadrant.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4)</head><p>The ratio of new keypoints to the total keypoints is more than 0.75. The newly triangulated points are those that were not observable previously. 5) The ratio of keypoints whose feature detector response is below the average feature detector response of the current keyframe to the total keypoints is more than 0.85. The choice of a high threshold for this ratio is motivated by the fact that the hierarchically more important criteria have already been satisfied; hence, this criterion has less importance overall. Note that the choice of the above parameters is flexible. For instance, the minimum number of tracked keypoints should be higher than the minimum number of points required for relative camera pose estimation using epipolar geometry. Thus, these parameters should only be taken as a reference. During our experiments, we found that slightly changing the parameters does not greatly affect the performance of the switching estimator; the parameters were selected through experimental verification.</p><p>e) Integration of SVIn2 and Primitive Estimator results: Utilizing the framework described in Rahman et al. <ref type="bibr">[8]</ref>, <ref type="bibr">[9]</ref>, the graph-SLAM formulation, based on the Ceres package <ref type="bibr">[44]</ref>, is augmented to consider estimates from multiple observers, thus maintaining the history of the estimates and enabling loop closures.</p><p>We denote the poses of SVIn2 and PE as ^W T_sv and ^W T_pe, respectively, representing them as homogeneous 4 &#215; 4 transformation matrices. The goal of the integration process is to provide a robust switching estimator pose ^W T_ro which matches ^W T_sv locally when SVIn2 is running properly, and matches ^W T_pe locally when SVIn2 is reporting failure. To find the scaling factor between SVIn2 and PE, we compute the ratio of the two trajectory lengths when both estimators are tracking well. 
More specifically, we compute the relative distance travelled, as estimated by PE and SVIn2, between successive keyframes at times t and t+1, and compute the scaling factor s as s = ||^W p_sv(t+1) - ^W p_sv(t)|| / ||^W p_pe(t+1) - ^W p_pe(t)||.</p><p>The scaling factor keeps updating over time whenever SVIn2 is tracking, to account for any changes in external factors. For the sake of convenience, we assume that the PE pose ^W T_pe is appropriately scaled by the scaling factor s to match the SVIn2 scale. Initially, when SVIn2 starts tracking, ^W T_ro is equivalent to ^W T_sv. When SVIn2 fails, we record the robust estimator pose ^W T_ro^st and the primitive estimator pose ^W T_pe^st at the switching time st. While the PE is working normally, we compute the relative displacement of the current PE pose with respect to the PE pose at the time of switching as (^W T_pe^st)^-1 · ^W T_pe. This local displacement is then applied to the robust estimator pose as ^W T_ro = ^W T_ro^st · (^W T_pe^st)^-1 · ^W T_pe (Eq. 3), ensuring that the robust estimator pose tracks the PE pose locally during this time.</p><p>It should be noted that ^W T_ro^st · (^W T_pe^st)^-1 remains constant until SVIn2 starts tracking again.</p><p>Similarly, when switching from the primitive estimator back to SVIn2, the robust estimator tracks the local displacement from SVIn2 as ^W T_ro = ^W T_ro^st · (^W T_sv^st)^-1 · ^W T_sv (Eq. 4), with ^W T_sv^st remaining constant until the next switch to PE occurs.</p><p>We make sure that the robust estimator tracks PE locally when SVIn2 fails and tracks SVIn2 again when it recovers, as VIO is the preferred estimator for maintaining a robust, uninterrupted pose estimate. As SVIn2 is capable of maintaining an accurate estimate in the presence of brief failures of visual tracking by relying on inertial data, it is not desirable to switch between SVIn2 and PE back and forth frequently, as this introduces additional noise. 
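The relative-displacement composition referenced above as Eqs. 3 and 4, together with the keyframe-to-keyframe scaling factor, can be sketched with 4 &#215; 4 homogeneous transforms. This is a minimal sketch with hypothetical names, not the SM/VIO code; `T_src` stands for whichever source (PE or SVIn2) is active after the switch.

```python
import numpy as np

def scale_factor(p_sv_prev, p_sv, p_pe_prev, p_pe):
    """Ratio of distance travelled between successive keyframes,
    used to bring the PE trajectory to SVIn2 scale."""
    return np.linalg.norm(p_sv - p_sv_prev) / np.linalg.norm(p_pe - p_pe_prev)

def switched_pose(T_ro_st, T_src_st, T_src):
    """Robust pose after a switch: apply the local displacement of the
    active source since switching time st to the robust pose recorded
    at the switch:  T_ro = T_ro_st @ inv(T_src_st) @ T_src."""
    return T_ro_st @ np.linalg.inv(T_src_st) @ T_src
```

Note that the product `T_ro_st @ inv(T_src_st)` stays constant between switches, matching the paper's observation, so it can be cached once per switch.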
To reduce frequent switching between estimators, we wait for a small number of successive tracking failures (or recoveries) before switching from SVIn2 to PE and vice versa.</p><p>When the VIO frontend cannot detect or track enough keypoints to initialize a new keyframe, no keyframe information is generated. In this case, if we do not receive any keyframe information from SVIn2 for kf_wait_time (a parameter, generally set between 1 and 3 s), we switch to the primitive estimator. Furthermore, we need to introduce these keyframes into the pose graph differently from regular keyframes, as they only contain the odometry information from the PE. These keyframes cannot be used for loop closure because they do not possess the keyframe image, the features, or the 3D keypoints (used for geometric verification with PnP RANSAC) associated with them. It is worth noting that even if the SVIn2 health status is bad, we can use the keyframes originating from SVIn2 for loop closure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. EXPERIMENTS</head><p>a) Datasets: The Aqua2 AUV has been deployed in a variety of challenging environments, including shipwrecks, see Fig. <ref type="figure">4(a)</ref>; areas with sand and coral heads, see Fig. <ref type="figure">4(b)</ref>; and coral reefs, see Fig. <ref type="figure">4(c</ref>). During each deployment, Aqua2 performs predefined trajectory patterns while using the odometry information from the PE. We have tested our approach on the following datasets:</p><p>&#8226; lawnmower over shipwreck: The Aqua2 AUV performing a lawnmower pattern over the Stavronikita shipwreck, Barbados. During operations around shipwrecks, a common challenge is the lack of features when the wreck is out of the field of view; for example, while mapping the superstructure, the AUV can move over the side of the wreck (see Fig. <ref type="figure">3(e-f</ref>)), thus facing the open water with no reference. 
Since VIO is not able to track while facing open water, the AUV's pose cannot be estimated correctly without using the PE. We obtain the ground-truth trajectory for the section with the shipwreck in view using COLMAP <ref type="bibr">[45]</ref>, with scale enforced using the rig constraints.</p><p>&#8226; squares over coral reef: The Aqua2 AUV performed square patterns over an area with sand and coral heads, Barbados; see Fig. <ref type="figure">4(b)</ref>. During operations over coral reefs, drop-offs present conditions similar to wrecks, where the vehicle faces blue water or a sandy patch. In addition, patches of sand present feature-impoverished areas where VIO fails.</p><p>&#8226; lawnmower over coral reef: The Aqua2 performed a lawnmower pattern over a coral reef, Barbados; see Fig. <ref type="figure">4(c</ref>). During operations, the VIO was able to track the whole trajectory successfully. This dataset was later artificially degraded to simulate loss of visual tracking. Utilizing the consistent track produced by VIO as ground truth, a quantitative study of the switching estimator is presented. It is worth mentioning that COLMAP was not able to register images during strips with fast rotation; hence, it was not used for ground truth.</p><p>b) Trajectory Estimation: Trajectories were produced with the PE, SVIn2, and the proposed SM/VIO estimators. Figure <ref type="figure">5</ref> presents the resulting trajectories for the three datasets. In all cases the PE trajectory (blue dash-dotted line) accurately traced the requested pattern, as the primitive estimator was also used to guide the robot. The VIO (SVIn2) (red dash-dotted line) diverged when visual tracking failed. 
Finally, the proposed SM/VIO estimator (solid red and blue line, with green diamonds marking the switching of estimators) consistently tracked the pose of the AUV.</p><p>The shipwreck lawnmower dataset presents a very challenging scenario: the AUV swims over the deck, and VIO consistently tracks the feature-rich clutter (Fig. <ref type="figure">3(d)</ref>); then the AUV approaches the sides of the wreck, and the number of features is reduced (Fig. <ref type="figure">3</ref>(e)) until it goes over the side (Fig. <ref type="figure">3(f)</ref>) and faces blue water. As the detected features are drastically reduced, the VIO continues forward, moving further away from the true position. It is worth noting that several loop closures kept the VIO estimate close to the wreck structure, but in the wrong area. The proposed framework switched to the PE upon loss of visual tracking, as can be seen from the green diamond signifying the switch in Fig. <ref type="figure">5(a)</ref>. COLMAP was able to register images in the sections with the shipwreck in view.</p><p>In the reef square dataset the AUV performed three squares over an area with some coral heads and a large sandy patch (Fig. <ref type="figure">4</ref>(b)).</p><p>The shipwreck lawnmower dataset was used for comparison with well-known VIO packages <ref type="bibr">[9]</ref>, <ref type="bibr">[14]</ref>-<ref type="bibr">[16]</ref>. The ground truth is obtained using COLMAP <ref type="bibr">[45]</ref>, which was able to track images with the shipwreck in view, as it does not require continuous tracking. We compared the performance of the various VIO algorithms against the COLMAP baseline using the root mean squared absolute trajectory error (ATE) metric after SE(3) alignment. As can be seen in Table II, the proposed estimator maintained a pose estimate and exhibited the lowest RMSE; in contrast, the other algorithms deviated after losing track. 
OpenVINS <ref type="bibr">[16]</ref> was not able to recover after losing track when the shipwreck went out of view, resulting in a very high error. It is worth noting that all the VIO algorithms lost track when the robot approached the side of the wreck, facing blue water. </p></div>
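The evaluation metric described above can be sketched as a minimal RMSE-ATE computation after a closed-form SE(3) alignment (Kabsch/Umeyama without scale) of corresponding positions. This is our illustrative sketch under the stated assumption of already-associated position arrays, not the authors' evaluation script.

```python
import numpy as np

def ate_rmse(est, gt):
    """RMSE of translation error after a closed-form SE(3) alignment
    (no scale) of estimated to ground-truth positions.
    est, gt: (N, 3) arrays of corresponding positions."""
    mu_e, mu_g = est.mean(0), gt.mean(0)
    H = (est - mu_e).T @ (gt - mu_g)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                        # rotation aligning est to gt
    t = mu_g - R @ mu_e
    err = gt - (est @ R.T + t)                # residuals after alignment
    return np.sqrt((err ** 2).sum(axis=1).mean())
```

An exactly rotated and translated copy of a trajectory yields an ATE of (numerically) zero, which makes the alignment step easy to sanity-check.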
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. CONCLUSION</head><p>The presented estimator robustly tracked an AUV even when traveling through blue water or over a featureless sandy patch. The proposed system uses an Aqua2 vehicle <ref type="bibr">[6]</ref> and the SVIn2 <ref type="bibr">[9]</ref> VIO approach; however, any AUV with a well-understood motion model can be utilized together with any accurate VIO package. Recent deep learning based inertial odometry approaches <ref type="bibr">[46]</ref>-<ref type="bibr">[48]</ref> can also serve as a conservative alternative estimator. An evaluation of visual features for the underwater domain <ref type="bibr">[49]</ref>-<ref type="bibr">[51]</ref> will contribute additional information to the VIO health monitor.</p><p>Future use of the proposed approach will be to combine it with coral classification algorithms <ref type="bibr">[52]</ref>, <ref type="bibr">[53]</ref> in order to extract accurate coral counts over trajectories <ref type="bibr">[54]</ref> and models of the underlying reef geometry, and for mapping underwater structures <ref type="bibr">[55]</ref>. We are currently working on extending the Aqua2 vehicle operations inside underwater caves. The challenging lighting conditions in conjunction with the extreme environment require the localization abilities of the vehicle to be robust even when one of the sensors fails temporarily.</p></div>
		</body>
		</text>
</TEI>
