<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Proprioceptive Invariant Robot State Estimation</title></titleStmt>
			<publicationStmt>
				<publisher>arXiv</publisher>
				<date>11/07/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10565191</idno>
					<idno type="doi"></idno>
					
					<author>Tzu-Yuan Lin</author><author>Tingjun Li</author><author>Wenzhe Tong</author><author>Maani Ghaffari</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[This paper reports on developing a real-time invariant proprioceptive robot state estimation framework called DRIFT. A didactic introduction to invariant Kalman filtering is provided to make this cutting-edge symmetry-preserving approach accessible to a broader range of robotics applications. Furthermore, this work dives into the development of a proprioceptive state estimation framework for dead reckoning that only consumes data from an onboard inertial measurement unit and kinematics of the robot, with two optional modules, a contact estimator and a gyro filter for low-cost robots, enabling a significant capability on a variety of robotics platforms to track the robot's state over long trajectories in the absence of perceptual data. Extensive real-world experiments using a legged robot, an indoor wheeled robot, a field robot, and a full-size vehicle, as well as simulation results with a marine robot, are provided to understand the limits of DRIFT.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Autonomous robots can greatly benefit humanity by taking over dangerous and tedious jobs, such as extraplanetary exploration, rescue missions in disaster scenes, warehouse logistics, and daily home assistance for elders <ref type="bibr">[1]</ref>. In order to accomplish the above tasks, autonomous robots must be able to navigate reliably in various environments and maintain stability. Modern motion planning and control algorithms rely heavily on accurate estimations of the robot's states, i.e., orientation, position, and velocity. Subsequently, accurate state estimation is a key capability in robot autonomy.</p><p>To handle noisy and biased data, the fusion of different sensory data is necessary and is the standard practice in robotics <ref type="bibr">[2]</ref>. Exteroceptive sensors such as cameras and LiDARs provide perceptual information that can assist with global localization and navigation. However, these types of sensors often operate at a low update rate  and are sensitive to perceptually degraded scenarios such as illumination changes. On the other hand, proprioceptive sensors such as Inertial Measurement Units (IMU) and joint encoders operate regardless of external factors such as lighting conditions and often at high frequencies (100-1000 Hz or higher). The highfrequency information is often important for agile navigation in challenging terrains. In addition, being agnostic to illumination changes, proprioceptive sensors can offer robust sensing modalities in extreme or featureless environments such as dense fog or sand storms. As such, designing an accurate and real-time proprioceptive state estimation algorithm is desirable.</p><p>Funding for M. Ghaffari was in part provided by NSF Award No. 2118818. This work was also partly supported by the Amazon Research Awards. (Corresponding author: Tzu-Yuan Lin.)</p><p>The authors are with the University of Michigan, Ann Arbor, MI 48109, USA. {tzuyuan, tingjunl, wenzhet, maanigj}@umich.edu. Fig. <ref type="figure">1</ref>. Estimated trajectory from DRIFT overlapped with the satellite image at the University of Michigan North Campus. A Clearpath Robotics Husky robot was driven on the sidewalk for 55 minutes, with a total path of around 3 kilometers. Consuming proprioceptive measurements only, DRIFT can produce highly accurate estimations for long-horizon operations. This experiment demonstrates the potential of DRIFT to be a reliable odometry system in perceptually degraded situations.</p><p>Contrary to Simultaneous Localization and Mapping (SLAM) systems, which focus on achieving globally consistent estimations, this proprioceptive estimator aims at obtaining highaccuracy local estimates without relying on perceptual inputs, serving as a reliable state tracking source for perception and control systems in visually challenging environments <ref type="bibr">[3]</ref><ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref>.</p><p>To develop proprioceptive state estimators, one often deals with nonlinear robot dynamics and measurement models. The extended Kalman Filter (EKF) <ref type="bibr">[2,</ref><ref type="bibr">[7]</ref><ref type="bibr">[8]</ref><ref type="bibr">[9]</ref> can be easily modified and has been implemented on various robotic platforms <ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref>. To address the linearization error induced in the EKF framework, many researchers have looked into leveraging symmetry <ref type="bibr">[13]</ref> in the filtering framework. In particular, Barrau and Bonnabel <ref type="bibr">[14,</ref><ref type="bibr">15]</ref> developed an invariant observer, coined Invariant extended Kalman Filter (InEKF), by modeling the state evolution on matrix Lie Groups and tracking the error on the corresponding Lie algebra. The theory of the invariant observer design is founded on estimation error being invariant under the action of a matrix Lie Group. Precisely saying, they show that if the propagation model satisfies the group affine property <ref type="bibr">[14]</ref>, it induces a log-linear property of the error, i.e., the nonlinear error can be exactly recovered from the time-varying linear differential equation. The InEKF has been shown to achieve better convergence and consistency in many applications, including SLAM <ref type="bibr">[16]</ref><ref type="bibr">[17]</ref><ref type="bibr">[18]</ref>, legged robot state estimation <ref type="bibr">[13,</ref><ref type="bibr">[19]</ref><ref type="bibr">[20]</ref><ref type="bibr">[21]</ref>, visual-inertial odometry <ref type="bibr">[22,</ref><ref type="bibr">23]</ref>, underwater navigation <ref type="bibr">[24]</ref> and aerial robot navigation <ref type="bibr">[25,</ref><ref type="bibr">26]</ref>.</p><p>Current implementations of the InEKF remain tailored for specific applications, and modifying existing libraries for different robotics applications remains cumbersome. In this work, we propose DRIFT: Dead Reckoning In Field Time, a real-time symmetry-preserving state estimation library that is directly applicable to various robotic platforms, including legged robots, indoor and outdoor wheeled robots, full-size vehicles, and marine robots.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Contributions</head><p>This work evolves from our prior work <ref type="bibr">[27]</ref>, in which we developed the contact estimation module for legged robot state estimation. In this work, we propose a unified framework that works for various robotic platforms. We additionally propose the gyro filter module for low-cost robots. Furthermore, we release our new real-time software, which supports various robots by default and allows easy expansion with new modalities. Lastly, we conduct extensive field experiments to assess the limits of the proposed framework on a variety of platforms, including a full-size offroad vehicle.</p><p>In particular, this work has the following contributions.</p><p>i. An open-source symmetry-preserving robot state estimation library that works in real-time on a variety of robotic platforms. The software can be found in <ref type="url">https://github.com/UMich-CURLY/drift</ref>. ii. A didactic introduction to invariant Kalman filtering. iii. A proprioceptive state estimation framework for dead reckoning that only consumes data from an onboard Inertial Measurement Unit (IMU) and kinematics for the robot, with two optional modules, a contact estimator and a gyro filter for low-cost robots. iv. Real-world experiments on legged robots, indoor and outdoor wheeled robots, full-size vehicles, and simulation results on marine robots to verify the proposed framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Outline</head><p>The remainder of this article is organized as follows. Sec. II reviews the literature on different state estimation methods. Sec. III provides some necessary backgrounds for developing an invariant filtering algorithm. Then, a didactic introduction to the invariant Kalman filtering is laid out in Sec. IV. Sec. V details DRIFT's framework and methodology. Sec. VI presents extensive evaluations using different robotic platforms. Limitations and future work ideas are discussed in Sec. VII, and Sec. VIII concludes the article.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. LITERATURE REVIEW</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Symmetry in Robot State Estimation</head><p>Symmetry in Lie Groups has been explored since the 1970s <ref type="bibr">[28]</ref><ref type="bibr">[29]</ref><ref type="bibr">[30]</ref><ref type="bibr">[31]</ref>. Despite some early efforts on state estimation using Lie Groups <ref type="bibr">[32,</ref><ref type="bibr">33]</ref>, the standard extended Kalman filter remained the mainstream method <ref type="bibr">[2,</ref><ref type="bibr">7,</ref><ref type="bibr">9]</ref> for decades due to its simplicity. With the standard EKF, the non-linear state dynamics and the measurement model are repeatedly linearized using the Taylor series at the current estimate <ref type="bibr">[2,</ref><ref type="bibr">34]</ref>. Although intuitive and easy to implement, the standard EKF can suffer from degrading performance when the dynamics become highly nonlinear, and bad initialization of states can cause the filter to diverge. In addition, because the linearization is evaluated at the current estimate, unobservable states can lead to incorrect linearization, causing spurious correlations.</p><p>The standard EKF evolves the state on Euclidean spaces; however, in robotics applications, the state often evolves on a manifold. Alternatively, the state evolution can be modeled using Lie groups, such as quaternion <ref type="bibr">[35,</ref><ref type="bibr">36]</ref> and SO(3) <ref type="bibr">[37,</ref><ref type="bibr">38]</ref>. One of the early symmetry-preserving observers was proposed by Aghannan and Rouchon <ref type="bibr">[39]</ref>, where they designed an invariant observer for Lagrangian systems that is invariant under changes of the configuration coordinate. While inspiring, this method requires heavy modeling of the Lagrangian dynamics and might not be easily transferable to complex robot dynamics.</p><p>Contrary to the standard EKF, which tracks the state directly, the error-state (or indirect) extended Kalman Filter (ErEKF) <ref type="bibr">[40]</ref><ref type="bibr">[41]</ref><ref type="bibr">[42]</ref><ref type="bibr">[43]</ref> tracks the state errors and their evolution. By tracking the error, the dynamics in ErEKF become linear under small error assumptions <ref type="bibr">[40]</ref>. However, despite the near-linear dynamics, the ErEKF still parameterizes the state in Euclidean spaces. To be more specific, the attitude is often represented using Euler angles. These representations contain singularities (i.e., do not cover the entire rotation manifold), and the filter can consequently be trapped in the famous Gimbal lock. The multiplicative (or quaternion) EKF (MEKF) <ref type="bibr">[41,</ref><ref type="bibr">43,</ref><ref type="bibr">44]</ref> addresses this problem by tracking the rotation using quaternions and updating the state using multiplicative actions. The orientation error is modeled as a 3-vector on the tangent space of SO(3) to maintain the nonsingular covariance <ref type="bibr">[41]</ref>. The MEKF has been successfully implemented on many platforms, including aerial vehicles <ref type="bibr">[45]</ref><ref type="bibr">[46]</ref><ref type="bibr">[47]</ref>, legged robots <ref type="bibr">[48,</ref><ref type="bibr">49]</ref>, and marine robots <ref type="bibr">[50]</ref>. Nevertheless, the linearized error propagation of the MEKF still depends on the estimated states, which can degrade the filter performance.</p><p>In invariant observer design, the estimated error is invariant under the action of a matrix Lie group <ref type="bibr">[51]</ref>. This resulted in developing the InEKF <ref type="bibr">[14,</ref><ref type="bibr">15,</ref><ref type="bibr">52,</ref><ref type="bibr">53]</ref> where the state is represented using a matrix Lie group, and the error is tracked in the corresponding Lie algebra. If the system satisfies the "group-affine" property, the estimation error follows a "loglinear" dynamics in the Lie algebra <ref type="bibr">[14,</ref><ref type="bibr">15,</ref><ref type="bibr">53]</ref>. Namely, the system linearization is independent of the state estimates. Because of this invariance property, the InEKF can achieve stronger convergence on many state estimation applications, including SLAM <ref type="bibr">[16]</ref><ref type="bibr">[17]</ref><ref type="bibr">[18]</ref>, legged robot <ref type="bibr">[19]</ref><ref type="bibr">[20]</ref><ref type="bibr">[21]</ref><ref type="bibr">54]</ref>, visualinertial odometry <ref type="bibr">[22,</ref><ref type="bibr">23]</ref>, underwater robots <ref type="bibr">[24]</ref> and aerial vehicles <ref type="bibr">[25,</ref><ref type="bibr">26,</ref><ref type="bibr">55]</ref>. In addition to filtering, some researchers also investigated the potential of symmetry-preserving structures in optimization-based smoothing frameworks <ref type="bibr">[56]</ref><ref type="bibr">[57]</ref><ref type="bibr">[58]</ref><ref type="bibr">[59]</ref>. The invariant smoother framework can perform better than InEKF at the cost of higher computational time <ref type="bibr">[56,</ref><ref type="bibr">58]</ref>.</p><p>More recently, a generalization of the InEKF framework, the Equivariant Filter (EqF), has been proposed by van Goor et al. <ref type="bibr">[60]</ref> and Mahony and Trumpf <ref type="bibr">[61]</ref>. In EqF, the state evolves on a homogeneous space (smooth manifolds with transitive Lie group actions). By finding an appropriate lift, the system dynamics can be realized on symmetry Lie groups. The correction, on the other hand, is derived by linearizing the error kinematics with respect to a fixed origin on the homogeneous space <ref type="bibr">[60]</ref><ref type="bibr">[61]</ref><ref type="bibr">[62]</ref>. Because the EqF does not require the system to be modeled explicitly on Lie groups, it relaxes the group-affine constraint posed in the InEKF and works with any system with equivariant properties. In fact, it can be reduced to the InEKF when the corresponding state evolves on matrix Lie groups and satisfies the group-affine property <ref type="bibr">[61,</ref><ref type="bibr">62]</ref>. Moreover, since the state is modeled on homogeneous spaces, EqF opens up the opportunity to estimate the input bias without breaking the symmetry of the system <ref type="bibr">[63,</ref><ref type="bibr">64]</ref>, which is not feasible in the invariant framework <ref type="bibr">[15,</ref><ref type="bibr">19]</ref>. The EqF demonstrates great potential in different applications <ref type="bibr">[65]</ref><ref type="bibr">[66]</ref><ref type="bibr">[67]</ref> and can be integrated into DRIFT in the future.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Legged Robots</head><p>The controller of legged robots often requires accurate and high-frequency state estimations to maintain the robot's stability. This can be achieved by fusing multiple onboard sensors such as IMU, joint encoders, and cameras.</p><p>1) State Estimation: One simple way to obtain the state of a legged robot is through leg kinematics, known as leg odometry <ref type="bibr">[68]</ref>. By assuming the contact foot to be static in the world frame, one can infer the robot's state using forward kinematic functions and encoder measurements. Leg odometry is simple and has been successfully implemented on many early legged robotic platforms, including CMU Amhler <ref type="bibr">[68]</ref>, RHex hexapod <ref type="bibr">[69]</ref>, and MARLO <ref type="bibr">[70]</ref>. However, it often performs poorly due to noisy encoder data, inaccuracies in kinematic modeling, and foot slip <ref type="bibr">[68]</ref>. This results in noisy velocity and unbounded position and orientation estimates. To address this, Lin et al. <ref type="bibr">[69]</ref> used prior terrain information and three non-colinear contact feet to estimate the instantaneous robot's pose through kinematics. Although achieving better accuracy, this algorithm cannot operate when two or more feet are in the aerial phase. As a result, it significantly hampers any agile movements of the robot.</p><p>Modern legged robots are often equipped with extra sensors like IMU, camera, and LiDAR. Lin et al. <ref type="bibr">[71]</ref> fused leg kinematics with IMU measurements using an EKF for the RHex hexapod. They showed that by fusing the two measurements, the algorithm outperforms methods that only use one modality in isolation. In addition to IMU and leg odometry, researchers also explored fusing different sensor modalities within an EKF framework, including fusing leg odometry with a magnetometer and GPS <ref type="bibr">[12]</ref>, and leg odometry with optical flow <ref type="bibr">[72]</ref>. By fusing multiple sensor measurements, the above methods improve the estimation accuracy of pure leg odometry. However, they still require heavy modeling of the robot's dynamics, which can be cumbersome and inaccurate.</p><p>Bloesch et al. <ref type="bibr">[73]</ref> proposed observability-constrained ErEKF to fuse IMU and kinematic measurements. In this framework, the process model does not depend on the robot or the gait motion. Instead, the readings from an inertial sensor are integrated to obtain the prediction (also known as the IMU strapdown modeling). Hence, the need for complicated dynamic modeling is circumvented <ref type="bibr">[40]</ref>. Another innovation of this work is the state augmentation of the contact foot position. This augmented position serves as a fixed reference position in the world frame. Therefore, the mismatch between new kinematic measurements and the predicted contact state can then be used as the correction model. IMU strapdown has become a common approach for modern state estimators <ref type="bibr">[3,</ref><ref type="bibr">48,</ref><ref type="bibr">74,</ref><ref type="bibr">75]</ref>. Rotella et al. <ref type="bibr">[49]</ref> includes the foot orientation measurements from forward kinematics in an ErEKF for humanoid robots. However, this method only works when the foot does not rotate during the contact phase.</p><p>In the context of optimization-based methods instead of filtering, Hartley et al. <ref type="bibr">[76]</ref> proposed a factor graph formulation to fuse leg odometry with IMU measurements on a bipedal Cassie robot. They introduced new forward kinematics and pre-integrated contact factors to include additional constraints posed by the non-slip contact assumption. Hartley et al. <ref type="bibr">[3]</ref> further incorporated visual information from semi-direct visual odometry (SVO) <ref type="bibr">[77]</ref> to form a visual-inertial-legged smoother. Instead of using SVO as an additional module, Wisth et al. <ref type="bibr">[78]</ref> proposed VILENS (Visual Inertial LEgged Navigation System), which tightly integrates visual features from an RGB-D camera into the cost function. VILENS is implemented on an ANYmal robot and showed improved performance over previous methods. Built upon VILENS, the same group introduced the foot velocity integration factor in <ref type="bibr">[79]</ref>. By integrating the foot velocity instead of position, a slow-varying velocity bias can be inferred from vision to account for foot slip events. Kim et al. <ref type="bibr">[80]</ref> exploited the body velocity estimation from a camera and dropped the need for the contact estimation module. Their method was implemented on an MIT mini cheetah and shown to achieve superior performance on slippery terrains. When only proprioceptive sensors were presented, Kim et al. <ref type="bibr">[58]</ref> proposed a sliprejection mechanism in addition to a proprioceptive smoother framework using an MIT mini cheetah. This method improved the robustness of state estimation even in the absence of camera information. Smoothing approaches have the privilege of leveraging past information during optimization. However, this comes at the cost of higher computational efficiency than filtering, especially during long-horizon experiments, where the past factors could grow unboundedly. Moreover, the above methods did not take advantage of the existing symmetry structure present in the problem, which improves the performance and consistency of the estimator.</p><p>Hartley et al. <ref type="bibr">[19]</ref> introduced the InEKF to the legged robot community by designing a contact-aided InEKF filtering. Similar to the work of Bloesch et al. <ref type="bibr">[73]</ref>, the IMU strapdown model was used for propagation, and the contact states were augmented for correction. However, in this work, the states were tracked on the Lie group SE k (3) <ref type="bibr">[53]</ref>, and the error dynamics evolved in the corresponding Lie algebra. Hartley et al. <ref type="bibr">[19]</ref> showed that the contact-inertial process model satisfies the group-affine property <ref type="bibr">[15]</ref>, and the resulting linearization matrix is independent of the estimated state. The contactaided InEKF demonstrated state-of-the-art performance on a bipedal Cassie robot and has since been modified for different applications <ref type="bibr">[20,</ref><ref type="bibr">21,</ref><ref type="bibr">54,</ref><ref type="bibr">81]</ref>.</p><p>Existing open-sourced libraries legged robot state estimation, include Pronto proposed by Camurri et al. <ref type="bibr">[10]</ref>, EKFbased framework for both exteroceptive and proprioceptive sensor fusion. The framework supports multiple bipedal and quadruped robots. However, Pronto is based on the traditional EKF and the state-dependent linearization could lead to sub-optimal results. Another example is Cerberus proposed by Yang et al. <ref type="bibr">[82]</ref>, which is a factor graph framework for visual-inertial-leg odometry. One key feature of Cerberus is the ability to estimate kinematic parameters online. However, Cerberus relies on visual information for the kinematic parameters estimation, which can lead to degraded performance in visually challenging environments. Moreover, Cerberus is reported to operate at 20 Hz, which might be insufficient for some control algorithms. In addition to the above two libraries, Hartley et al. <ref type="bibr">[19]</ref> also released their source code for contactaided InEKF for Cassie. In DRIFT, we adopt the software by Hartley et al. <ref type="bibr">[19]</ref>, and our previous work <ref type="bibr">[20]</ref>, and enable modular extensions to different modalities.</p><p>2) Contact Estimation: Reliable contact estimation that accurately captures the zero velocity events on foot is of crucial importance for legged robot state estimators. Modelbased approaches segment the touchdown event of robot legs or prosthetic legs by thresholding of the estimated Ground Reaction Force (GRF) from the general equation of motion <ref type="bibr">[83]</ref><ref type="bibr">[84]</ref><ref type="bibr">[85]</ref>. Although this method can detect touchdown events, the estimated GRF is often noisy and unreliable, especially for lightweight robots. De Luca et al. <ref type="bibr">[86]</ref>, Haddadin et al. <ref type="bibr">[87]</ref> proposed a Generalized Momentum (GM) method for detecting contact events on robot manipulators. This GM-based method is, in fact, a filtered version of the work of Focchi et al. <ref type="bibr">[83]</ref>. Although GM-based methods mitigate the noise problem in GRF estimation, an empirical threshold on the cutoff frequency is still required.</p><p>Hwangbo et al. <ref type="bibr">[88]</ref> used a probabilistic representation of the contact state and a Hidden Markov Model (HMM) to fuse the dynamics and kinematics for contact estimation by adopting a Monte-Carlo sampling algorithm to compute the transition model and verifying against GM-based methods. Jenelten et al. <ref type="bibr">[89]</ref> expanded the HMM method and focused on slippage detection. They demonstrate ANYmal <ref type="bibr">[90]</ref>, a quadruped robot, walking stably on slippery ground. The above two methods aim to detect contact as early as possible for the controller to maintain stability; however, we aim to detect contact intervals for state estimation on various terrains.</p><p>Bledt et al. <ref type="bibr">[91]</ref> leveraged the GM-based methods and the probabilistic representation of contact states by using a Kalman filter to fuse the gait phase scheduler information from the controller with the GRF and demonstrate that estimated contacts can assist the controller in reducing the bouncing event upon touchdown. However, this method assumes the leg phase to be periodic as it uses the gait scheduler information in the prediction step of the Kalman filter. It could experience a loss in performance when the phase is heavily violated as the robot interacts with uneven terrains.</p><p>Camurri et al. <ref type="bibr">[92]</ref> used logistic regression to learn the GRF threshold for contact detection. This work compares against heuristic-based thresholding on GRF using a base state estimator. The result shows that the logistic regression classifier can double the performance of the state estimator. However, compared to deep learning methods, the performance of a logistic regression classifier gets saturated as the amount of data increases <ref type="bibr">[93,</ref><ref type="bibr">94]</ref>. Moreover, this method requires a specific training procedure for different gait, loading conditions of the robot, and individual terrain properties.</p><p>Rotella et al. <ref type="bibr">[95]</ref> used a fuzzy C-means clustering for the probability of contacts in all six end-effector degrees of freedom. They integrate the contact estimator with a base state estimator and show their approach performs considerably better than implementations purely based on measured normal force. However, this method assumes contact wrench sensors and additional IMU are available at each end-effector. Furthermore, this method was only tested in simulation. Its performance on real robots remains unknown. Piperakis et al. <ref type="bibr">[96]</ref> proposed an unsupervised learning method for humanoid gait phase estimation by employing Gaussian Mixture Models (GMMs) for clustering and comparing to the ground-truth data and leg odometry. However, this work also assumes the availability of wrench/force sensors at each end-effector, and the clustering result is affected by the gait and data density.</p><p>The above methods either assume the availability of wrench/force sensors or are restricted by the nature of simple regression and are thus unable to generalize to different scenarios. In contrast, our work proposes a multi-modal deep learning-based contact estimator that does not require contact sensors and can generalize well to different gaits and terrain properties. Moreover, as more data becomes available, the network performance can be improved.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Wheeled Robots</head><p>Wheeled robots and autonomous vehicles state estimation problems often treat odometer measurements and dynamics constraints as given, which increases the accuracy and robustness of the localization systems <ref type="bibr">[97,</ref><ref type="bibr">98]</ref>. However, such methods require high computational power to process image flow. The IMU-based algorithms are more universal and approachable for wheeled robots, which can contribute to emergency events detection such as collision and slippage <ref type="bibr">[99,</ref><ref type="bibr">100]</ref>.</p><p>For outdoor wheeled robots, GPS plays an important role in exteroceptive localization which uses multiple satellites to obtain the robot's position in the earth coordinates. Never-theless, the GPS accuracy is limited by radio signal transmission time, and the slow-down effect of water vapor and other particles in the atmosphere counts for the propagation delay <ref type="bibr">[101]</ref>. The occlusions and reflections of and between the buildings, namely multipath fading, will also reduce the GPS localization accuracy. To overcome these limitations, beaconbased methods, such as RFID <ref type="bibr">[102,</ref><ref type="bibr">103]</ref>, QR-code <ref type="bibr">[104,</ref><ref type="bibr">105]</ref>, WiFi or Bluetooth <ref type="bibr">[106]</ref><ref type="bibr">[107]</ref><ref type="bibr">[108]</ref><ref type="bibr">[109]</ref>, and UWB <ref type="bibr">[110,</ref><ref type="bibr">111]</ref>, are used for urban and indoor environments localization.</p><p>In terms of proprioceptive state estimation for four-wheeled chassis robots, typically autonomous vehicles, the measurements are IMU readings, steering wheel angle, and wheel angular velocity, which is the most common sensor configuration for vehicle velocity estimation <ref type="bibr">[112,</ref><ref type="bibr">113]</ref>. Nonlinear observer (NLO), nonlinear unknown input observer (NUIO), and reduced nonlinear observer (RNLO) are proposed for vehicle lateral and longitudinal velocity estimation by Imsland et al. <ref type="bibr">[112,</ref><ref type="bibr">114]</ref>, Guo et al. <ref type="bibr">[115]</ref>. Wenzel et al. <ref type="bibr">[116]</ref> invented the dual extended Kalman filter (DEKF) for combined estimation of vehicle states and vehicle parameters.</p><p>Mobile robot filtering-based localization performance can be increased by augmenting uncertain parameters into the state <ref type="bibr">[117]</ref>. Nevertheless, with model-based filters such as the Kalman filter, the performance highly depends on how accurately the stochastic model can capture the underlying physical process. Kwon et al. <ref type="bibr">[118]</ref> proposed a combined Kalman filter-perturbation estimator (CKF) for wheeled robots, which does not require any uncertainty model except for the noise statistics. The perturbation estimator generates equivalent perturbations based on the nominal state equation and action model of the robot, which is simultaneously corrected by the estimates of perturbation. The perturbation estimator's integral control module leads to a decrease in the localization error.</p><p>A dead-reckoning system has an unavoidable drift over time because, first, the integration of sensors' random walk and, secondly, inaccuracies of the system modeling for the action and observation models. Welte et al. <ref type="bibr">[119]</ref> presented a deadreckoning model that can fuse all common sensors and then merge the redundant ones, increasing the robustness compared with the traditional Zero Velocity Update (ZUPT) method. The work also proposed a method to compensate for the systematic sensor and observation model errors, which relies on the Rauch-Tung-Striebel smoothing method to estimate the robot state and further calibrate the system model parameters.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Marine Robots</head><p>Underwater environments are often featureless with limited visibility, and objects underwater are frequently subject to unexpected external forces from ocean currents or surges. This poses extra challenges for Autonomous Underwater Vehicles (AUVs) state estimation. To make matters worse, some common sensors for ground robots, such as joint encoders and GPS, cannot be directly applied to AUVs. However, many widely used modern underwater localization systems depend on accurate dead reckoning solutions <ref type="bibr">[120]</ref>. We position dead reckoning algorithms such as DRIFT as a submodule of a more complex localization system. Comprehensive reviews on underwater state estimators can be found in <ref type="bibr">[120]</ref><ref type="bibr">[121]</ref><ref type="bibr">[122]</ref>.</p><p>There are several specialized sensors for underwater navigation. Because the water pressure can roughly translate to the depth below sea level, a pressure sensor can be used to obtain depth measurements. In addition to pressure sensing, acoustic sensors such as Doppler Velocity Logs (DVL) <ref type="bibr">[123]</ref> also play a big role in underwater navigation. A DVL can provide ground-referenced velocity measurements. This is achieved by emitting directional acoustic beams to the seabed and using the Doppler effect on the reflected acoustic signals. Unlike the above sensors that are mounted directly on the robot, transponder systems <ref type="bibr">[124]</ref>, such as a Long Baseline system (LBL), use floating beacons to provide positioning measurements from triangulation. Although transponder systems allow direct positioning measurements, they require additional beacons to be deployed, limiting the operation range of AUVs.</p><p>EKF has been implemented on multiple AUVs due to its simplicity <ref type="bibr">[11,</ref><ref type="bibr">125,</ref><ref type="bibr">126]</ref>. Karras et al. <ref type="bibr">[127]</ref> proposed a Dual Unscented Kalman Filter (DUKF) framework to fuse measurements from a laser sensor and an IMU. The filter jointly estimated the 6D pose of the robot and its dynamic parameters. The DUKF was still sensitive to initialization, and a specific closed-loop control scheme was required during initialization. Instead of online optimization of the dynamic parameters, Hegrenaes and Hallingstad <ref type="bibr">[128]</ref> formulated a more accurate vehicle model in a model-aided EKF framework. The kinetic model was integrated to provide velocity measurements for filter correction. Sea current velocity was treated as a slowly varying bias and was estimated along with the robot's state. The method was evaluated using a HUGIN 4500 AUV on various field tests. The IMU strapdown modeling was also adopted for marine robotic applications to circumvent complex modeling of the vehicle. Using an ErEKF framework, Miller et al. <ref type="bibr">[129]</ref> fused multiple sensors, including an IMU, an LBL, a DVL, a pressure sensor, and an attitude sensor for UAV state estimation. Additional calibration parameters, such as the speed of sound in seawater, were also estimated to reduce systematic errors. This framework was shown to be robust to sporadic sensor failures. However, a prolonged failure of the DVL or LBL can degrade the performance, and the requirement of LBL still limits the operation range.</p><p>Potokar et al. <ref type="bibr">[24]</ref> used an InEKF to estimate the 6D pose of an AUV. The velocity readings from a DVL, and depth measurements from a pressure sensor were fused with the IMU strapdown model. Because the depth measurements do not obey the left-invariant formulation, which we will discuss later in Sec. III, pseudo measurements with infinity covariances were introduced to allow the incorporation of such measurements. This framework was evaluated in a simulation environment and was shown to have better convergence over the MEKF. The algorithm was implemented in Python and was demonstrated to have real-time performance. DRIFT is a unified state estimator that directly supports multiple robots. Modular implementation in C++ enables faster computation on resourced-constrained devices and allows easy expansion with new modalities. In addition, for convenience, we provide a ROS wrapper for real-time communication on robots.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PRELIMINARIES AND NOTATION</head><p>Matrix Lie groups <ref type="bibr">[2,</ref><ref type="bibr">130,</ref><ref type="bibr">131]</ref> provide natural (exponential) coordinates that exploits symmetries of the space <ref type="bibr">[61,</ref><ref type="bibr">[132]</ref><ref type="bibr">[133]</ref><ref type="bibr">[134]</ref><ref type="bibr">[135]</ref><ref type="bibr">[136]</ref>. State estimation is the problem of determining a robot's position, orientation, and velocity that are vital for robot control <ref type="bibr">[2]</ref>. An interesting class of state estimators that can be run at high frequency, e.g., 2 kHz, are based on InEKF <ref type="bibr">[14,</ref><ref type="bibr">15,</ref><ref type="bibr">19,</ref><ref type="bibr">53]</ref>. The theory of invariant observer design is based on the estimation error being invariant under the action of a matrix Lie group. The fundamental result is that by correct parametrization of the error variable, a wide range of nonlinear problems can lead to log-linear error dynamics <ref type="bibr">[14,</ref><ref type="bibr">51,</ref><ref type="bibr">53]</ref>.</p><p>Consider a deterministic Linear Time-Invariant (LTI) process model &#7819; = Ax + Bu. Let x be an estimate of x, i.e., &#7819; = Ax + Bu. Define the error e := x -x. Then &#279; = &#7819; -&#7819; = A(x -x) = Ae is an autonomous differential equation. Given an initial condition e(0) = e 0 , we can solve for the error at any time via e(t) = exp(At)e 0 . In other words, error propagation is independent of the system trajectory, i.e., state estimate. The Invariant EKF <ref type="bibr">[14]</ref> generalizes this observation to linear-exponential systems, e.g., rigid body systems, using the framework of matrix Lie groups.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Process Dynamics on Lie Groups</head><p>A process dynamics evolving on the Lie group G, for state</p><p>Xt denotes an estimate of the state. The state estimation error is defined using right or left multiplication of X -1 t . Definition 1 (Left and Right Invariant Error). The right-and left-invariant errors between two trajectories X t and Xt are:</p><p>where L &#8712; G is an arbitrary element of the group.</p><p>Theorem 1 (Autonomous Error Dynamics <ref type="bibr">[14]</ref>). A system is group affine if the dynamics, f ut (&#8226;), satisfies:</p><p>for all t &gt; 0 and X 1 , X 2 &#8712; G. Furthermore, if this condition is satisfied, the right-and left-invariant error dynamics are trajectory-independent and satisfy:</p><p>I denotes the group identity element (I n for n &#215; n identity matrix). Define A t to be a dimg &#215; dimg matrix, where g is the Lie algebra (tangent space at the identity) of G, satisfying</p><p>For all t &#8805; 0, let &#958; t &#8712; R dimg be the solution of the linear differential equation</p><p>Theorem 2 (Log-Linear Property of the Error <ref type="bibr">[14]</ref>). Consider the right-invariant error, &#951; t , between two trajectories (possibly far apart). For arbitrary initial error &#958; 0 &#8712; R dimg , if &#951; 0 = exp(&#958; 0 ), then for all t &#8805; 0, &#951; t = exp(&#958; t ); that is, the nonlinear estimation error &#951; t can be exactly recovered from the time-varying linear differential equation.</p><p>For an example of the difference with Euler angle parametrization, see Hartley et al. <ref type="bibr">[19,</ref><ref type="bibr">Sec. 4</ref>].</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Associated Noisy System</head><p>A noisy process dynamics evolving on the Lie group take the following form.</p><p>where w &#8743; t &#8712; g is a continuous white noise whose covariance matrix is denoted by Q t . Then the equivalent noisy error dynamics can be shown to be</p><p>where for every X &#8712; G, the adjoint action, Ad X : g &#8594; g, is a Lie algebra isomorphism that enables change of frames (matrix similarity).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Invariant Observation Model</head><p>During the correction step of a Kalman filter, the error is updated using incoming sensor measurements. If observations take a particular form, then the linearized observation model and the innovation will also be autonomous. This happens when the measurement, Y t k , can be written as either</p><p>b is a constant vector and V t k is a vector of Gaussian noise.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. INVARIANT KALMAN FILTERING</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Left-Invariant Extended Kalman Filter</head><p>We now derive the Left-Invariant Extended Kalman Filter (LI-EKF). In particular, the filter tracks the state mean and covariance as parameters and corrects the error via a "linear" update rule. As we will see, the innovation is in the Lie algebra, and the update rule follows the group operation for integration.</p><p>1) Propagation: In the propagation (or prediction) step, the mean can be directly computed using the process model. However, for the covariance propagation, we use the leftinvariant log-linear error dynamics. As such, the state's mean evolves on the group while the covariance is tracked in the Lie algebra.</p><p>2) Update: In the update (or correction) step, we use &#951; = exp(&#958;) &#8776; I + &#958; &#8743; and neglect the higher order terms to derive the linear update error equation as follows.</p><p>To arrange all terms according to the vector form of &#958;, define the measurement Jacobian, H, such that H&#958; = &#958; &#8743; b . Then it follows that</p><p>where Nk := X-1</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3) LI-EKF Result:</head><p>To summarize, we have the following two steps (as usual for a Kalman filter).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) LI-EKF Propagation:</head><p>2) LI-EKF Update:</p><p>where</p><p>Given these equations, once we know A l t and H matrices, we can implement the LI-EKF.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Right-Invariant Extended Kalman Filter</head><p>Next, we derive the Right-Invariant Extended Kalman Filter (RI-EKF) similarly.</p><p>1) Propagation: In the propagation step, the mean can be directly computed using the process model. However, for the covariance propagation, we use the right-invariant log-linear error dynamics. As such, the state's mean evolves on the group while the covariance is tracked in the Lie algebra.</p><p>2) Update: We use &#951; = exp(&#958;) &#8776; I + &#958; &#8743; and neglect the higher order terms to derive the linear update error equation as follows.</p><p>To arrange all terms according to the vector form of &#958;, define the measurement Jacobian, H, such that H&#958; = -&#958; &#8743; b . Then it follows that</p><p>where Nk := Xt k Cov(V k ) XT t k .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3) RI-EKF Result:</head><p>To summarize, we have the following two steps.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) RI-EKF Propagation:</head><p>2) RI-EKF Update:</p><p>where</p><p>Given these equations, once we know A r t and H matrices, we can implement the RI-EKF.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Switching Between Left and Right-Invariant Errors</head><p>We can switch between the left and right error forms through the use of the adjoint map. Remark 1. This transformation is exact, which means that we can easily switch between the covariance of the left and right invariant errors using P r t = Ad Xt P l t Ad T Xt .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. DRIFT: A SYMMETRY-PRESERVING ROBOT STATE ESTIMATION LIBRARY</head><p>DRIFT can be used as a standalone C++ program, and we additionally provide an optional Robot Operating System (ROS) wrapper for easy communication between programs. The library can also be easily expanded with new sensor modalities. In addition, we provide two additional modules, a contact estimator and a gyro filter, to enable the proposed framework to be applied to low-cost robotic platforms. Fig. <ref type="figure">2</ref> shows the architecture implemented in this work. The proposed framework can be seamlessly extended to other existing and emerging symmetry-preserving filters in the future.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. State Representation</head><p>In most robotics applications, we are interested in tracking the orientation, velocity, and position of the robot with respect to a fixed world frame. We denote R t &#8712; SO(3), the rotation matrix that takes the coordinate frame from the body to the world, v t &#8712; R 3 , and p t &#8712; R 3 as the body velocity and position in the world frame. Following Barrau <ref type="bibr">[53]</ref>, we can define our state as a direct isometries group X t &#8712; SE l+2 (3):</p><p>where d lt &#8712; R 3 is the augmented vectors, e.g., foot contact position in the world frame for the legged robot or landmark positions. In the minimal setup for wheeled or marine robots, the navigation state simply reduces to SE 2 (3).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. IMU Propagation</head><p>An IMU measures the angular velocity and the linear acceleration in the body frame. By integrating the measurements from an IMU, we can predict the robot's motion, enabling filter propagation without dealing with complicated dynamics <ref type="bibr">[19,</ref><ref type="bibr">40]</ref>.</p><p>In DRIFT, we model the IMU as corrupted by additive white Gaussian noise.</p><p>where, GP represents a Gaussian process and &#948;(t -t &#8242; ) is the Dirac delta function. With this, we define the input u t to the system as: u t := &#969;T t , &#227;T t T .</p><p>1) Continuous Dynamics: We can write down the continuous dynamic process model:</p><p>) Here, (&#8226;) &#215; denotes a 3 &#215; 3 skew-symmetric matrix, and g is the known gravity vector.</p><p>For legged robots, we assume the foot contact position in the world frame remains constant during the same contact period. As a result, the dynamics is only affected by the rotated white Gaussian noise:</p><p>where qt is the encoder measurements, and h R (q t ) is the orientation of the contact frame in the IMU (body) frame calculated from forward kinematics. For the sake of readability, we derive the propagation model with one contact augmentation in the following derivation. The propagation model for robots without foot contacts can be easily deduced by removing the corresponding column and row for the contact position.</p><p>Collecting the above equations into the matrix form and separating the noise term, we can obtain the deterministic dynamic function f ut (&#8226;):</p><p>We can verify that f ut (&#8226;) satisfies (3) and thus is group-affine, and from Theorem 1, the right-invariant error dynamics is given by (4). Using g ut (exp(&#958;)) := (A t &#958;) &#8743; + O(&#8741;&#958;&#8741; 2 ) from Theorem 1 and following the derivation from Sec. IV-B, we can obtain the state-independent linearized error dynamics:</p><p>In summary, with f ut (&#8226;), A r t , and Q t = Cov(w t ), the IMU propagation can then be computed using <ref type="bibr">(17)</ref>.</p><p>2) Integration: In practice, IMU provides discrete samplings of the continuous inputs. We assume the inputs to be constant between two timestamps and perform the propagation by integrating between time t k and t k+1 :</p><p>The exponential map and the integration for SO(3) has closedform solutions <ref type="bibr">[19,</ref><ref type="bibr">43]</ref>, leading to</p><p>with</p><p>We can propagate the covariance using a discretized state transition matrix &#934; r = exp(A r &#8710;t) and</p><p>3) Bias Augmentation: In practice, IMU is often corrupted by slow-varying biases. To compensate for this, augmenting the biases as additional states in the system is common. However, no existing matrix Lie group can describe this additional state with the pose without violating the groupaffine property <ref type="bibr">[53]</ref>. Adding IMU biases breaks the symmetry and makes the linearization matrix dependent on the estimated states <ref type="bibr">[15,</ref><ref type="bibr">19]</ref>. Nevertheless, this imperfect InEKF still outperforms traditional approaches like EKF <ref type="bibr">[19]</ref>.</p><p>To account for the biases, we model the IMU as being corrupted by additive low-frequency signals and white Gaussian noise:</p><p>The system's state then becomes a tuple of the state defined in <ref type="bibr">(21)</ref> and the augmented biases:</p><p>The new augmented state has a right-invariant error in the form of: e r t = ( Xt X -1 t , &#952;t -&#952; t ) := (&#951; t , &#950; t ). With the bias augmented, we can obtain the continuous system dynamics:</p><p>The contact position is not affected by biases. Therefore, for legged robots, we have the same contact dynamics as <ref type="bibr">(24)</ref>. Since the biases are slow-varying signals, we can model their dynamics using random walks. That is, white Gaussian noises govern the dynamics as follows.</p><p>With the new continuous dynamics, we can obtain our linearized error dynamics following Section IV-B:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Velocity Correction</head><p>Body velocity measurements provide a generic correction model that can work on any robotic platform. Specifically, the filer requires the ground-referenced body velocity <ref type="bibr">[21,</ref><ref type="bibr">24]</ref>. For some robotic applications, ground-referenced body velocity can be directly measured from the sensors, such as DVL for marine robots. However, accurate body velocity measurement is not always readily available. For wheeled robots with no lateral movements, one practical solution is to use the encoder measurements along with the nonholonomic constraints (i.e., velocity constraints that cannot be integrated) to form pseudo measurements for correction:</p><p>where qr and ql are angular velocity readings from the wheel encoders and r is the wheel radius. The nonholonomic constraints provide the pseudo measurements in the y and z components under the assumption that the robot does not drift side-way nor jump up vertically. 1) Correction Model: In DRIFT, the velocity correction model is implemented independently of the velocity source. This allows the library to be generic and to be applied to any robot with any source of velocity measurements. Here, we again assume the velocity measurements to be corrupted by white Gaussian noise:</p><p>The body velocity is measured in the body frame. As a result, a right-invariant measurement model is used here.</p><p>We proceed to find H as follows.</p><p>Then, we can perform the update step using <ref type="bibr">(18)</ref>, where after eliminating zeros H = 0 1,3 I 0 , and</p><p>2) Observability Analysis: Theorem 2 reveals the log-linear property of the error dynamics. Consequently, one can perform linear observability analysis using the linear error dynamic matrix <ref type="bibr">[53]</ref>, which, in our case, is time-invariant and nilpotent (with a degree of three):</p><p>Accordingly, the observability matrix becomes:</p><p>The roll and pitch are observable, as well as the velocity. However, since gravity only contains a value in the z axis, the yaw angle is not observable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Contact Correction</head><p>Foot contacts provide additional constraints for legged robot state estimation by assuming the contact foot velocity in the world frame to be zero over the contact period. This allows the estimator to observe the body velocity from the forward kinematics models. In fact, this has become a core module for many modern legged robot state estimation algorithms <ref type="bibr">[10,</ref><ref type="bibr">19,</ref><ref type="bibr">73]</ref>. In DRIFT, we integrate the contact-inertial process model for legged robots proposed by Hartley et al. <ref type="bibr">[19]</ref>. We assume the encoder measurements to be affected by zero-mean white Gaussian noise:</p><p>1) Contact State Augmentation: When a contact point is detected, we append it to the state using the forward kinematics and the current position estimate:</p><p>where h p (&#8226;) is the forward kinematics function that maps the encoder measurements to the foot position in the body frame. The corresponding covariance can be augmented using</p><p>The augmented contact state remains in the state throughout the contact phase. When the contact constraint breaks (i.e., when the foot is lifted), the contact state is marginalized via</p><p>2) Correction Model: Once the foot enters the contact phase, the augmented contact position follows <ref type="bibr">(24)</ref> in the propagation step. When a new encoder measurement is obtained, the right-invariant correction model is given by</p><p>The linearized measurement matrix H and the noise matrix N are as follows.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Contact Estimator</head><p>The contact correction model provides a means to correct the predicted state reliably. However, obtaining reliable contact estimation is often challenging, and false contact detection can introduce additional biases into the system. Moreover, for some low-cost legged robots, foot contact sensors are often not readily available. In the spirit of making the proposed framework more generalizable to different robotic platforms, we propose an additional contact detection module for robots without dedicated contact sensors <ref type="bibr">[27]</ref>.</p><p>The proposed contact estimator is a self-supervised lightweight neural network that only takes measurements from an IMU and joint encoders as input. The neural network can operate at 830 Hz on an NVIDIA Jetson AGX Xavier, which can be easily attached to existing robots.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Contact State Representation:</head><p>To design the network, we define our contact state C as a vector of binary values:</p><p>for bipedal robots where c l &#8712; {0, 1} for each leg l, with 1 indicate a firm contact and 0 as no contact.</p><p>Depending on the robot's motion, the contact states can have a finite set of combinations, from all feet on the ground to all in the air. As a result, we can address the problem using a classification framework. That is, for each timestamp, we estimate which foot configurations are posed by the robot. In order to follow typical classification pipelines, we map our contact state vector c to an integer S by treating c as a binary value and using binary-to-decimal conversion as the function. For quadruped robot, S &#8712; {0, 1, . . . , 15}, and for bipedal, S &#8712; {0, 1, 2, 3}.</p><p>2) Input Data:</p><p>The contact estimator depends solely on proprioceptive measurements, including joint encoders, IMU, and kinematics. For a synchronized time n, we concatenated the above measurements as z n = q n qn a n &#969; n d ln &#7691;ln . In order for the network to incorporate temporal information, we append previous w -1 measurements to the current data:</p><p>. . . z T n T . Therefore, at each time n, the network takes in a 2D matrix D n and estimates the current contact configuration S n .</p><p>3) Network Structure: The network comprises 2 convolutional blocks and 3 fully connected layers. Each convolutional block consists of 2 one-dimensional convolutions and a one-dimensional max pooling. We choose a one-dimensional kernel to improve efficiency and reduce memory usage. For nonlinearity, we employ ReLU as the activation function. To prevent the network from overfitting, we add a dropout mechanism at the end of the second convolutional block.</p><p>With the deep features extracted from the convolutional blocks, the fully connected layers convert them into the contact configuration S n . Again, to prevent the network from overfitting, we employ dropout mechanisms for the first 2 fully connected layers. Finally, the cross-entropy loss is applied for the classification task:</p><p>where P j is the probability output from the network of state j, and P i is the probability of the ground truth state.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F. Gyro Filter</head><p>Low-cost IMUs can deteriorate performance. This is especially worse for states that are not observable, such as the yaw angle. To mitigate this issue, we propose an additional gyro filter that fuses angular velocity measurements from different sources, such as additional IMUs or kinematic models. The gyro filter is a linear Kalman filter, with its state defined as:</p><p>where &#969; is a 3-vector representing the angular velocity in the base IMU frame, and b g denotes the corresponding biases.</p><p>1) Propagation: In the propagation step, we assume the measurement biases to be slowing-varying quantities, which remain the same between two timestamps. As a result, we can formulate an integration model by taking the differences between two consecutive measurements and canceling out the biases as:</p><p>Here, &#969;&#945; denotes the angular velocity measurements from one sensing source, P k is the filter covariance, and Q k is the process noise covariance.</p><p>2) Correction: A second source of angular velocity measurements, &#969;&#946; , can be used to correct the filter using a linear measurement model:</p><p>The corresponding H matrix is I 3&#215;3 I 3&#215;3 if the measurement is biased, and I 3&#215;3 0 3&#215;3 if the measurement is unbiased. With this model, the filter can be corrected following the standard EKF correction algorithm <ref type="bibr">[2]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. EXPERIMENTAL RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Contact Estimation</head><p>To train the proposed contact estimator, we create opensource contact data sets using an MIT Mini Cheetah robot <ref type="bibr">[137]</ref>. We record all proprioceptive sensor measurements, including joint encoders data, foot position and velocities, IMU measurements, and estimated joint torques from an MIT controller <ref type="bibr">[138]</ref>. All data are upsampled to 1000 Hz to match the IMU frequency. The data sets are created over 8 different terrains, including asphalt roads, concrete, forest, grass, middle pebbles, small pebbles, rock road, and sidewalks. In addition to the above terrains, several sequences of the robot walking in the air (i.e., not having contact with the ground while operating the same gait.) are recorded to provide negative examples to the network. We collect around 1,000,000 data points and reserve some sequences for state estimation tests. The rest of the data sets are separated into testing, validation, and training sets with the ratio of 15%, 15%, and 70%. Fig. <ref type="figure">3</ref> shows examples of the different terrain types in the contact data sets.</p><p>The ground truth labels are generated using an offline algorithm, which takes the robot's foot height in the hip frame as input. The algorithm applies a low-pass filter and uses future and past data points around the current time stamp to extract local minima and maxima. Moreover, we observe a bouncing effect on the robot's foot upon touchdown after inspecting slow-motion videos of Mini Cheetah's walking patterns. The bouncing results in a sudden change of foot height in the signal, causing false positives which can be removed by applying the low-pass filter.</p><p>We present the accuracy, false positive rate (FPR), and false negative rate (FNR) of the proposed method against other baselines. For baselines, we implement a model-based approach <ref type="bibr">[83]</ref><ref type="bibr">[84]</ref><ref type="bibr">[85]</ref>, where the estimated ground reaction force (GRF) is computed via the general equation of motion with TABLE I ACCURACY COMPARISON AGAINST BASELINES. Sequence Method % Accuracy % False Positive Rate % False Negative Rate Leg RF Leg LF Leg RH Leg LH Leg Avg Leg Avg Leg Avg Concrete Test Sequence GRF Thresholding 73.43 70.02 71.69 70.04 71.30 37.07 13.24 Gait Cycle 85.66 84.98 84.68 85.11 85.11 22.95 0.00 Proposed Method 98.34 97.87 97.95 98.56 98.18 1.45 2.51 Grass Test Sequence GRF Thresholding 82.55 78.93 84.62 82.48 82.14 26.87 0.63 Gait Cycle 92.41 92.38 91.04 90.55 91.59 10.95 3.53 Proposed Method 98.08 97.57 97.73 97.73 97.78 2.35 1.98 Forest Test Sequence GRF Thresholding 80.99 80.09 82.75 83.24 81.77 26.54 1.84 Gait Cycle 83.03 82.56 84.44 84.28 83.58 24.71 0.08 Proposed Method 97.05 96.62 97.24 97.40 97.08 2.82 3.12</p><p>a low-pass filter. A fixed threshold is set to the estimated GRF to detect the contact events. We would like to highlight that because there is no direct access to the motor current on the Mini Cheetah robot, we use the torque command from the controller to approximate the actual torque on the actuators. We denote this method as GRF thresholding. In addition, we also obtain the gait cycle command from the MIT controller <ref type="bibr">[138]</ref> to serve as a second contact estimation baseline.</p><p>Table <ref type="table">I</ref> lists the accuracy, FPR, and FNR of the compared methods on three test sequences. We can see that the proposed method achieves the highest accuracy across all sequences and the GRF thresholding has the worst performance. Although the baselines obtain slightly lower FNRs, our method maintains significantly lower FPRs. Lower FPR is crucial for state estimation tasks as false positive contact events can introduce biased measurements into the system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Legged Robot</head><p>We present the state estimation results using DRIFT with the proposed contact estimator for the Mini Cheetah robot. The robot walks on an outdoor grassy field, where we use a motion capture system to obtain the ground truth pose. We deploy the proposed contact estimator, as well as the two baseline contact estimation methods with DRIFT. Fig. <ref type="figure">4</ref> shows the estimated trajectory. Qualitatively, DRIFT with the proposed contact estimator produces a trajectory closer to the ground truth. In addition, the two baseline methods produce extra height (z) drifts. We argue this is the result of the high false positive rates of the baseline methods. This experiment confirms DRIFT's support for accurate legged robot state estimation. Moreover, the contact estimation module allows robots without contact sensors, such as the MIT Mini Cheetah, to obtain reliable contact estimation results, which are essential to contact-aided state estimation algorithms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Indoor Wheeled Robot</head><p>For indoor applications, we deploy DRIFT on a Fetch robot. A picture of the Fetch robot is shown in Fig. <ref type="figure">5</ref>. Fetch is equipped with a low-cost IMU and wheel encoders on each wheel, which we use as the input to DRIFT. We collected data sets in a lab environment, where Fetch drove on a polished concrete surface for several sequences. We obtained the ground truth poses from a motion capture system. We additionally implement a MEKF <ref type="bibr">[43]</ref> to serve as a baseline algorithm. Both DRIFT and MEKF take only IMU and encoder measurements as input and estimate the 6D pose of the robot. The lowcost IMU on Fetch can cause the unobservable yaw angle to degrade and further reduce the accuracy of the overall state estimation. As a result, we implemented an additional gyro filter described in Sec. V-F to fuse the yaw angular velocity measurement from the IMU with the encoder measurements.</p><p>Fig. <ref type="figure">6</ref> shows a top-down view of the estimated trajectory on the 01_square sequence. We observe that because of the inaccurate yaw angular velocity measurements, both DRIFT and MEKF experience significant drift after turning. However, with the help of the gyro filter, we can obtain a much better Fig. <ref type="figure">4</ref>. The estimated trajectories from DRIFT using different contact estimation methods on the Mini Cheetah data set. The robot walks on a grassy field with a motion capture system, which is used for ground truth capturing. With the proposed contact estimator, DRIFT produces the best trajectory. Contrarily, the two baseline methods introduce significant drift in the height (z) axis. This figure is generated with the aid of the Python package evo <ref type="bibr">[139]</ref>.</p><p>TABLE II THE RMSE OF RELATIVE POSE ERROR (RPE) FOR FETCH. WE REPORT THE RPE IN THE UNIT OF DRIFT PER METER. 01&#729;square 02&#729;um 03&#729;eight 04&#729;random 05&#729;right&#729;turn 06&#729;left&#729;turn 07&#729;origin Avg Std Trajectory 10.59 14.15 11.61 38.91 15.64 22.29 11.26 17.78 N/A Length (m) Duration (sec) 43.720 49.62 34.62 99.06 39.72 51.71 46.68 52.16 N/A MEKF [43] Trans. (m/m) 0.1012 0.0634 0.0855 0.0990 0.0853 0.0435 0.1126 0.0844 0.0182 Rot. (&#176;/m) 1.0555 5.5752 3.8387 6.9809 3.9911 1.6304 2.4502 3.6460 1.1016 DRIFT Trans. (m/m) 0.0688 0.0587 0.0629 0.0863 0.0857 0.0436 0.0786 0.0692 0.0156 Rot. (&#176;/m) 1.9766 5.3704 3.6134 6.6857 3.6606 1.5580 2.4741 3.6198 1.8587 DRIFT Trans. (m/m) 0.0565 0.0585 0.0566 0.0801 0.0613 0.0438 0.0558 0.0590 0.0127 (Gyro Filter) Rot. (&#176;/m) 1.5723 5.3865 3.8561 6.7959 1.9193 1.9246 3.4866 3.5631 1.2172 trajectory estimation. In addition, we notice that the estimated trajectory from DRIFT is much smoother than MEKF.</p><p>Since DRIFT is an odometry framework instead of a full SLAM system, we follow Sturm et al. <ref type="bibr">[140]</ref> and report the root-mean-square error (RMSE) of the Relative Pose Error (RPE), in the unit of drift per meter in Table <ref type="table">II</ref>. DRIFT performs consistently better than the baseline. With the aid of the gyro filter, the position estimation is further improved, which likely results from a more accurate yaw angle estimation. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Outdoor Wheeled Robot</head><p>We deploy DRIFT on a Clearpath Robotics Husky robot. Husky is a four-wheeled differential-drive robot, as shown in Fig. <ref type="figure">1</ref>. The robot is equipped with two VectorNav VN-100 IMUs and wheel encoders on each side. We conducted both short and long-horizon experiments using the Husky robot.</p><p>TABLE III THE RMSE OF RELATIVE POSE ERROR (RPE) FOR HUSKY SHORT-HORIZON EXPERIMENT. WE REPORT THE RPE IN THE UNIT OF DRIFT PER METER. 1&#729;rectangle 2&#729;batman 3&#729;broken&#729;cat 4&#729;um 5&#729;type&#729;z 6&#729;rocker 7&#729;peace 8&#729;cheetah 9&#729;m Avg Std Trajectory 20.56 38.47 90.43 39.11 59.89 39.46 40.56 50.37 63.72 49.17 N/A Length (m) Duration (sec) 49.78 77.92 154.57 67.09 90.48 67.49 66.50 92.01 100.07 85.10 N/A MEKF [43] Trans. (m/m) 0.0477 0.0812 0.1139 0.0625 0.0652 0.0680 0.0749 0.0829 0.0755 0.0747 0.0384 Rot. (&#176;/m) 1.3738 4.8355 2.5033 1.6061 1.9539 1.6396 1.4865 1.6416 1.3962 2.0485 1.0650 DRIFT Trans. (m/m) 0.0482 0.0781 0.0881 0.0564 0.0678 0.0627 0.0728 0.0817 0.0755 0.0701 0.0351 Rot. (&#176;/m) 1.0152 4.8559 1.7097 1.3381 1.3089 1.3126 1.1606 1.6321 0.8657 1.6888 1.2657 The short-horizon data were collected on an outdoor grassy field with a motion capture system, which we used to obtain ground truth poses. For the long-horizon experiment, we drove the robot on the sidewalk for 55 minutes. The total trajectory path is around 3 kilometers. Since no motion capture system is possible for such large-scale experiments, we used the GPS signals as a proxy for the ground truth trajectory. To better assess the performance of DRIFT, we again used a MEKF <ref type="bibr">[43]</ref> as the baseline algorithm.</p><p>Table <ref type="table">III</ref> reports the RPE of the short-horizon experiment. On average, DRIFT performs better than the MEKF. This is expected as both the state-transition matrix A and linearized measurement matrix H of the InEKF are independent of the estimated states, while the corresponding matrices in the MEKF depend on the estimated states.</p><p>The benefit of using InEKF becomes even more obvious in long-horizon operations. Fig. <ref type="figure">1</ref> and <ref type="figure">7</ref> show the bird's-eye view of the estimated trajectory for the long-horizon experiment. We can see DRIFT performs significantly better than MEKF, with the final drift in the xy plane to be 71.41 m and 166.40 m, respectively. We highlight that DRIFT uses proprioceptive data only. This demonstrates DRIFT's capability of being a reliable odometry source for visual-SLAM systems in perceptually degraded environments, even for long-horizon operations.</p><p>We also compare the estimated velocity in the world frame  with the ground truth velocity in Fig. <ref type="figure">8</ref>. The ground truth velocity is obtained by differentiating the ground truth pose from the motion capture system. We see the estimated velocity converges to the true velocity, which agrees with the observability analysis in Sec. V.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Off-road Vehicle</head><p>DRIFT also supports full-size vehicles with a single shaft encoder. We evaluate DRIFT on data gathered with the support of Neya Systems from their full-size military test vehicle, the Polaris MRZR. The data was collected with support from Neya Systems at two of their off-road test facilities. As shown in Fig. <ref type="figure">9</ref>, the test facility is a featureless forest area with leafy ground. In this experiment, we obtain GPS signals as approximations to the ground truth positions.</p><p>Compared to a motion capture system, GPS provides less accurate positional signals. As a result, we compute the final drift and drift percentage in Table IV instead of the RPE. We define the drift percentage as Final Drift/Trajectory Length. From the table, we see that DRIFT achieves low drift percentages across the three sequences and outperforms the MEKF. Fig. <ref type="figure">10</ref> shows the bird's-eye view of two off-road sequences, with trajectory lengths of 1.48 km and 1.26 km. DRIFT performs significantly better than MEKF in both sequences. This experiment demonstrates that with only an IMU and shaft encoder as inputs, DRIFT can produce highly accurate odometry results for full-size vehicles in off-road environments. These off-road environments are often featureless or with repetitive features. Since DRIFT only relies on proprioceptive measurements, it can serve as a reliable odometry source for perception systems in such scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F. Marine Robot</head><p>For marine robots, ground-reference velocity can be obtained via a DVL. The DVL is an acoustic sensor that emits directional acoustic beams to the seabed. The seabedreferenced velocity is then computed using the Doppler effect of the reflected acoustic signals. This enables us to correct the predicted state using the velocity correction method described in Sec. V-C.</p><p>To verify the efficacy of the proposed framework on marine robots, we set up experiments in an open-sourced marine simulation software, Stonefish <ref type="bibr">[141]</ref>. An IQUA Robotics Girona500 Autonomous Underwater Vehicle (AUV) was simulated with an IMU and a DVL attached to the robot, as shown in Fig. <ref type="figure">11</ref>. The sensor data is corrupted by unbiased Gaussian noises. We used the noise parameters from the VectorNav VN-100 IMU manual to assign the simulation noise parameters. Specifically, we applied &#963; = &#961;&#8226; &#8730; f s to recover the standard deviation, &#963;, from the noise density, &#961;, and operating frequency, f s . Table V lists the noise parameters used in the simulation. We set the IMU and DVL frequencies to 200 Hz and 20 Hz.</p><p>Fig. <ref type="figure">11</ref> shows the estimated trajectories from the underwater data set. Qualitatively, we see both DRIFT and MEKF perform well on this sequence. In particular, DRIFT performs slightly better in position estimation in the x and y axes, as well as the roll and pitch estimation. Moreover, the roll and pitch estimations converge to the true values after a perturbation around 36 sec, which agrees with our observability analysis in Sec. V-C.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>G. Runtime Analysis</head><p>We perform runtime evaluations using a personal laptop with an Intel i5-11400H CPU and an NVIDIA Jetson AGX Xavier (CPU). The Jetson AGX Xavier is commonly used by many robotic platforms. We record the processing time for every propagation and correction method in the InEKF filter and the gyro filter, as presented in Table <ref type="table">VI</ref>. DRIFT can operate at an extremely high frequency using CPU-only computation, even on the resourced-constrained Jetson AGX</p><p>TABLE IV FINAL DRIFT AND DRIFT PERCENTAGE OF THE OFF-ROAD VEHICLE DATA SET, WHERE THE FULL-SIZE VEHICLE OPERATES IN A FOREST AREA FOR MORE THAN 1 KILOMETER. THE DRIFT PERCENTAGE IS COMPUTED BY (FINAL DRIFT)/(TRAJECTORY LENGTH). 1&#729;follow 2&#729;leader&#729;follow 3&#729;long&#729;run Avg Trajectory 1481.9 1264.0 1785.4 1510.4 Length (m) Duration (sec) 317.0 424.7 605.8 449.2 MEKF [43] Final Drift (m) 187.4 29.6 392.1 203.0 Drift Percentage (%) 12.7 2.3 22.0 12.3 DRIFT Final Drift (m) 30.2 25.8 97.3 51.1 Drift Percentage (%) 2.0 2.0 5.5 3.18 TABLE V NOISE PARAMETERS USED IN THE UNDERWATER SIMULATION. Measurement Type Noise Parameter Angular Velocity 0.0035 &#176;/sec/ &#8730; Hz Linear Acceleration 0.0014 m/sec/ &#8730; Hz DVL Sensor 0.02626 m/sec Xavier. For the optional contact estimator, the inference speed on an NVIDIA RTX 3090 GPU is approximately 1100 Hz, and the inference speed on a Jetson AGX Xavier (GPU) is around 830 Hz after TensorRT optimization. VII. DISCUSSION AND FUTURE WORK One limitation of DRIFT is the assumption of nonholonomic constraints. These constraints can be detached from the robot's actual behavior. Learning such constraints provides a way to</p><p>TABLE VI RUNTIME EVALUATION OVER DATASETS ON A LAPTOP CPU AND AN NVIDIA JETSON AGX XAVIER. FOR THE JETSON AGX XAVIER, ALL COMPUTATIONS ARE OPERATED ON THE CPU ONLY. i5-11400H AGX Xavier (CPU) Unit: &#181;s mean std mean std InEKF propagation 11.33 4.00 18.35 4.19 propagation with contact 10.32 4.76 22.56 7.21 velocity correction 9.91 4.80 18.46 6.66 contact correction 17.46 9.78 29.39 13.07 Gyro Filter propagation 2.57 3.46 3.96 2.28 correction 2.85 2.89 4.64 4.40</p><p>use sensory inputs instead of assumptions <ref type="bibr">[27,</ref><ref type="bibr">142,</ref><ref type="bibr">143]</ref>. Moreover, the nonholonomic constraints are violated when the robot drifts. Slip detection and friction estimation are challenging and necessary tasks for robot state estimation <ref type="bibr">[99]</ref>.</p><p>Instead of relying on the nonholonomic assumption, marine robots can obtain ground-referenced velocity measurements from a DVL. However, it is worth noticing that a reliable velocity measurement requires a sufficient number of emitted beams to have a line of sights on the seabed, known as bottomlock <ref type="bibr">[123]</ref>. If the robot operates at an extreme roll and pitch angle, or if the robot is too far away from the seabed, the bottom-lock might be lost. A typical DVL can obtain bottomlock for ranges up to 500 m, depending on the frequency of the sensor <ref type="bibr">[144]</ref>. In this work, however, we assumed the DVL can always obtain the bottom-lock.</p><p>Lastly, DRIFT is a real-time dead-reckoning system. While dead-reckoning might be sufficient for some applications, in many cases it can not replace a localization or SLAM system, where global consistency is the objective. As a result, we position DRIFT as an odometry module of a perception system or a dead-reckoning system for applications that do not require global consistency. A natural next step will then be to incorporate DRIFT into a visual SLAM system for real-time globally consistent state estimation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VIII. CONCLUSION</head><p>We developed DRIFT, an open-source, real-time proprioceptive state estimator for multiple robotic platforms. DRIFT is based on the invariant Kalman filtering, leading to significantly better consistency in robot state estimation tasks. Optionally, DRIFT provides two additional modules, a contact estimator and a gyro filter, to assist operations on low-cost robots. We evaluate DRIFT using various robotic platforms with realworld experiments, including a legged robot, an indoor service robot, a field robot, and a full-size vehicle. Additionally, we report simulation results of an underwater vehicle. These experiments verify DRIFT's capability of being a reliable odometry source, even for long-horizon operations. Future directions include extending the framework to support a more general equivariant filter <ref type="bibr">[60,</ref><ref type="bibr">64]</ref>, broader incorporation with symmetry-preserving perception and control algorithms <ref type="bibr">[13]</ref>, and incorporating DRIFT into a more comprehensive localization system for global consistency.</p></div></body>
		</text>
</TEI>
