<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Learning IMU Bias with Diffusion Model</title></titleStmt>
			<publicationStmt>
				<publisher>2025 IEEE International Conference on Robotics and Automation (ICRA)</publisher>
				<date>05/19/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10625680</idno>
					<idno type="doi"></idno>
					
					<author>S Zhou</author><author>S Katragadda</author><author>G Huang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Motion sensing and tracking with IMU data is essential for spatial intelligence, but it is challenging due to the presence of time-varying stochastic bias. IMU bias is affected by various factors such as temperature and vibration, making it highly complex and difficult to model analytically. Recent data-driven approaches using deep learning have shown promise in predicting bias from IMU readings. However, these methods often treat the task as a regression problem, overlooking the stochastic nature of bias. In contrast, we model bias, conditioned on IMU readings, as a probabilistic distribution and design a conditional diffusion model to approximate this distribution. Through this approach, we achieve improved performance and make predictions that align more closely with the known behavior of bias. We experimentally validate that the proposed diffusion model achieves more accurate bias prediction than regression-based methods, with both direct and indirect supervision, confirming that our probabilistic modeling approach is effective.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>3D motion tracking is essential to endow mobile devices and autonomous vehicles with spatial intelligence. Due to recent advancements in MEMS sensing technology, 6-axis IMUs measuring angular velocity and linear acceleration have become ubiquitous, making it possible to estimate 3D motion for sensor platforms at the edge with compact size, minimal weight, and low power consumption and cost (SWaP-C). However, naive integration of IMU measurements to provide 3D odometry (i.e., position, rotation and velocity) or dead reckoning, without aiding sources such as GPS and vision, is often unreliable and diverges within a very short period of time. Better solutions for inertial-only odometry (IOO) than naive inertial integration are urgently needed in practice. For example, in hand tracking for mobile AR/VR applications, highly dynamic hands can easily move out of the tracking camera's field of view (FOV), leaving only IMU data available to keep motion tracking alive.</p><p>If IMU measurements were clean and noise free, then naive inertial integration would solve the IOO problem. The reality is far harsher, primarily due to the time-varying stochastic biases that significantly corrupt the inertial signals. As such, finding a better IOO solution almost inevitably requires a better estimate of the IMU bias, which is precisely what this paper seeks to address. IMU bias represents an offset of the output from the input value and encompasses many different types of bias parameters such as in-run bias stability, turn-on bias repeatability, and bias over temperature. Many unforeseeable factors such as temperature and vibrations can affect the IMU bias, which makes it impossible to model it correctly <ref type="bibr">[1]</ref>, although simplified but useful models such as random walk are widely used in practice <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>.<note place="foot">This work was partially supported by the University of Delaware (UD) College of Engineering, NSF (SCH-2014264, IIS-2410019), Google ARCore, and Meta Reality Labs. The authors are with the Robot Perception and Navigation Group (RPNG), University of Delaware, Newark, DE 19716, USA. Email: {shzhou, saimouli, ghuang}@udel.edu</note></p><p>With the emergence of deep learning, there have been attempts to model IMU bias in a data-driven manner with neural networks <ref type="bibr">[4]</ref>. These approaches have demonstrated the possibility of regressing bias from IMU readings and subsequently integrating the IMU data to estimate motion with reasonable accuracy over short periods. In particular, one may use a differentiable integration module to integrate IMU readings with the predicted bias removed, and compare the resulting motion with the ground truth <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref>. However, this cannot guarantee that the predicted correction to the IMU readings is the actual bias, because there exist other corrections that achieve the same or even better integrated motion while being very different from the true bias. When the supervision is provided indirectly through the integrated motion, the network can learn to make these spurious predictions instead of the real bias. Such predictions may not generalize to new data, because the learned correction is not an intrinsic property of the IMU, as bias is. Alternatively, one can directly use ground truth bias for supervision <ref type="bibr">[7]</ref>. This method has so far only been shown to work when integrated with a camera in an optimization-based VINS system. As we show in the experiments, the performance of this method is inferior to that of indirectly supervised methods. 
Both approaches assume a single true bias value for a given IMU reading, framing the problem as a regression task.</p><p>In this paper, we propose to model the IMU bias naturally as a probability distribution conditioned on the inertial reading, instead of a fixed value. This formulation, combined with direct supervision, allows for more accurate and faithful bias prediction. To model this complex distribution, we leverage diffusion models, which have shown promising results in capturing distributions with high uncertainty in tasks such as action planning <ref type="bibr">[8]</ref> and human trajectory prediction <ref type="bibr">[9]</ref>. In particular, we design a conditional diffusion model that takes features extracted from the IMU reading as an additional condition code to approximate the underlying IMU-conditioned bias distribution. The IOO with the proposed diffusion model is shown to outperform the regression-based approaches (with both direct and indirect supervision). Additionally, our predicted bias closely resembles the ground truth in terms of magnitude and variation patterns, showing superior accuracy and generalization.</p><p>In summary, the main contributions of this paper include:</p><p>&#8226; We, for the first time, design a lightweight diffusion model to learn IMU bias for IOO in a data-driven manner, by naturally modeling bias as a probability distribution conditioned on inertial measurements.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>Many IOO methods exist and can be categorized into model-free and model-based approaches, depending on whether or not the IMU bias is explicitly modeled.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Model-free Method</head><p>Early work explored leveraging motion patterns, with primary applications in Pedestrian Dead Reckoning (PDR) scenarios. Heuristic algorithms such as step counting with step length estimation <ref type="bibr">[10]</ref> and stationary period detection with zero velocity update (ZUPT) <ref type="bibr">[11]</ref> were explored. A system combining multiple heuristic algorithms on a mobile phone is presented in <ref type="bibr">[12]</ref>. In recent years, there have been attempts to use deep neural networks to regress the motion from IMU readings end-to-end <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref>, <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>, <ref type="bibr">[18]</ref>, <ref type="bibr">[19]</ref>. These methods show promising results in PDR scenarios, surpassing the classical methods. Positional displacement and velocity have been explored as targets for network prediction. Some work leverages the equivariance in the IMU readings as a way to enable self-supervised learning <ref type="bibr">[20]</ref> or boost performance <ref type="bibr">[21]</ref>, further pushing the limits of this approach. However, these methods still implicitly rely on motion patterns: essentially, they use deep learning to capture motion patterns in a data-driven fashion. Notably, <ref type="bibr">[19]</ref> shows that such end-to-end learning can work in a drone-racing scenario, though only when training and testing are on the same trajectory. In this case, the high-speed drone motion for a particular trajectory becomes a complex motion pattern. This shows that deep learning can learn non-trivial motion patterns, yet it still cannot break the fundamental reliance on patterned motion. In this work, we consider the general scenario without a patterned-motion assumption. 
In this scenario, model-free methods show inferior performance because they struggle to find a motion pattern.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Model-based Method</head><p>Model-based methods aim to estimate the bias from IMU readings, remove it, and then integrate the corrected readings to obtain motion estimates. Early analysis of IMU bias shows that many factors, such as temperature, vibration and impacts, affect it <ref type="bibr">[1]</ref>. However, the compound effect is hard to capture with an analytical model. Popular in systems that combine an IMU with other sensors, the random walk model <ref type="bibr">[22]</ref> is a simplified choice for bias modeling. It models the bias evolution as a Brownian noise process. However, such a model has limited accuracy and cannot be used without other sensors. It also typically requires collecting a long period of stationary IMU readings for offline calibration to obtain the model parameters.</p><p>Recent deep learning methods offer a new way to model bias. Since the end goal is to remove the bias, this approach is also referred to as denoising. Since bias is not directly available as data, some approaches use indirect supervision from integrated motion, leveraging a differentiable integration process. The first work <ref type="bibr">[23]</ref> estimates gyro bias only, with integrated rotation as training data. <ref type="bibr">[5]</ref> proposes to use supervision from pre-integration terms to regress bias. Recent work <ref type="bibr">[6]</ref> uses integrated motion to regress both bias and its uncertainty, achieving state-of-the-art results on a few datasets. However, indirect supervision suffers from a misalignment between the training target and the network output. Since multiple IMU readings can produce the same integrated result, supervising with the integrated result cannot guarantee that the network learns the actual bias instead of predicting other signals. Our experiments show that methods trained with indirect supervision make spurious predictions that are very different from the actual bias. 
This may hurt generalization, since other signals might not generalize to new data even for the same IMU.</p><p>Close to our work, <ref type="bibr">[7]</ref> proposes to use direct supervision from bias for training. However, it only demonstrates the performance when fusing the bias prediction with vision in a joint factor graph system. As our experiments show, such direct supervision under a regression setting has limited accuracy. Our method follows the deep learning approach for bias modeling using direct bias supervision. However, different from all the work mentioned above, we deviate from the regression formulation and treat the bias given IMU readings as a conditional probability distribution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. INERTIAL-ONLY ODOMETRY</head><p>While inertial navigation systems (INS) aided by different exteroceptive sensors (such as vision and GPS) have been widely studied in the literature (e.g., see <ref type="bibr">[24]</ref>), IOO requires further investigation as aiding sensors can easily degrade or fail in practice. In this section, we will revisit the IOO problem from an INS perspective while focusing on the bias modeling challenges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Inertial Navigation</head><p>IOO shares the same IMU kinematics as INS to estimate motion (i.e., position, rotation and velocity) using IMU (accelerometer and gyroscope) measurements. Each accelerometer measures proper acceleration along only one axis, so accelerometers are usually found in groups of three orthogonal devices on a single low-cost MEMS chip. However, low-cost accelerometer measurements are far from ideal and are corrupted by noise and bias:</p><p>where I G q &#175; is the unit quaternion that represents the rotation from the global frame of reference {G} to the IMU frame {I} (i.e., corresponding to the rotation matrix C( I G q &#175; )), G a is the true acceleration of the IMU in the global frame {G}, G g is the gravitational acceleration expressed in {G}, n a &#8764; N (0, N a ) is zero-mean white Gaussian noise, and b a is the bias, which changes over time. Like the accelerometer, the gyroscope, which measures the angular velocity of the sensor, suffers from noise and bias and, sometimes, misalignment and scale errors.</p><p>Moreover, gyroscope measurements are also influenced by acceleration (i.e., 
g-sensitivity). This effect is negligible if its magnitude is within the range of the additive white noise n g , but it can be more significant in some low-cost MEMS hardware:</p><p>where T g is the shape matrix causing both misalignment and scale errors in the gyro measurements, T s is the g-sensitivity coefficient, n g &#8764; N (0, N g ) is zero-mean white Gaussian noise, and the bias b g is time-varying and random.</p><p>The INS kinematic model is given by <ref type="bibr">[22]</ref>:</p><p>where &#969; is the true rotational velocity of the IMU, and &#8486;(&#969;) is defined by:</p><p>Using the IMU measurements and assuming known bias models (e.g., random walk), 3D motion estimates can be obtained by integrating the above continuous-time kinematics.</p><p>Clearly, the quality of IMU data (affected by noise and bias) determines the motion accuracy.</p><p>Because of the (aided) INS observability properties <ref type="bibr">[25]</ref>, any method that tries to bypass bias modeling and directly predict global position or velocity fundamentally depends on the target motion pattern. As we focus on the general scenario without a prior motion pattern assumption, we restrict ourselves to estimating motion increments (i.e., odometry), and assume known initial conditions only if absolute motion is needed.</p></div>
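<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate how bias enters inertial integration, the kinematics above can be sketched in discrete time as follows. This is a minimal sketch rather than the integration module used in this paper: it assumes simple Euler integration, a rotation matrix R that maps the IMU frame to the global frame, a gravity vector of [0, 0, -9.81] m/s^2 in {G}, and it ignores the shape matrix T g and the g-sensitivity term T s. The function names so3_exp and integrate_imu are hypothetical.</p>
<p><![CDATA[
```python
import numpy as np

# Assumed gravity vector in the global frame {G} (z-axis pointing up).
GRAVITY = np.array([0.0, 0.0, -9.81])


def so3_exp(phi):
    """Rodrigues formula: map a rotation vector to a rotation matrix."""
    angle = np.linalg.norm(phi)
    if angle < 1e-12:
        return np.eye(3)
    axis = phi / angle
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)


def integrate_imu(gyro, accel, dt, b_g, b_a):
    """Euler-integrate bias-corrected IMU readings from identity/zero initial state.

    gyro, accel: arrays of shape (N, 3); b_g, b_a: bias estimates of shape (3,).
    Returns rotation R (IMU to global), velocity v and position p in {G}.
    """
    R, v, p = np.eye(3), np.zeros(3), np.zeros(3)
    for w_m, a_m in zip(gyro, accel):
        w = w_m - b_g                           # bias-corrected angular velocity
        a_global = R @ (a_m - b_a) + GRAVITY    # specific force -> acceleration in {G}
        p = p + v * dt + 0.5 * a_global * dt ** 2
        v = v + a_global * dt
        R = R @ so3_exp(w * dt)
    return R, v, p


# Stationary IMU for 1 s at 200 Hz with a constant accelerometer bias.
n, dt = 200, 0.005
true_b_a = np.array([0.1, 0.0, 0.0])
gyro = np.zeros((n, 3))
accel = np.tile(np.array([0.0, 0.0, 9.81]) + true_b_a, (n, 1))
_, _, p_corrected = integrate_imu(gyro, accel, dt, np.zeros(3), true_b_a)
_, _, p_raw = integrate_imu(gyro, accel, dt, np.zeros(3), np.zeros(3))
```
]]></p>
<p>In this toy case, removing the true bias keeps the stationary sensor at the origin, whereas leaving a bias of 0.1 m/s^2 uncorrected already produces about 0.05 m of position drift after one second, since the error grows quadratically with time.</p></div>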
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Modeling Bias</head><p>As is evident, it is critical to find biases for IOO from IMU measurements in order to be able to perform accurate inertial integration to estimate motion:</p><p>where f &#960; is some estimator. However, finding such an estimator is non-trivial because the bias is not deterministic. As an IMU is a physical electronic sensor, factors such as temperature, impacts, vibration, and quantization noise all affect it <ref type="bibr">[1]</ref>. These compound effects are complex and difficult to model. Moreover, many of these factors are time-varying, giving the bias a stochastic nature. The bias changes not only as the IMU operates but also after power cycles, further complicating model development. As such, it is almost impossible to analytically model the IMU bias.</p><p>While building an exact model is challenging, analysis of bias as a black-box signal using Power Spectral Density (PSD) <ref type="bibr">[26]</ref> and Allan variance analysis <ref type="bibr">[27]</ref> reveals certain bias characteristics, such as angle/velocity random walk, bias instability and rate ramp. These characteristics have become standard in the industry for IMU sensors <ref type="bibr">[28]</ref>, <ref type="bibr">[29]</ref>. However, utilizing all these characteristics to build an estimator is difficult because some of them, like bias instability, are defined only in the frequency domain, without a state-space equivalent.</p><p>As an approximation, in practice, a simplified model leveraging rate random walk is commonly used as the bias model <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>. 
Specifically, it assumes the bias dynamic model as:</p><p>To fit parameters &#951; g , &#951; a , a common approach is to collect a sequence of stationary IMU readings and fit them using Allan variance analysis, e.g., as demonstrated in Kalibr <ref type="bibr">[2]</ref>:</p><p>The initial values b g (0), b a (0) require extra heuristics to estimate, such as averaging stationary IMU readings and subtracting the expected stationary signal. Although this simple model captures the slow variations in bias, its accuracy is limited and it typically requires external sensors to aid inertial navigation. Additionally, the process involves two steps: first, estimating the dynamic parameters using specific IMU readings, and second, estimating the bias based on the dynamic model. The first calibration step is not only time-consuming but also restrictive, as it demands an extended period of stationary IMU readings.</p></div>
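<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the random walk bias model discussed above, the following sketch simulates a discrete-time random walk. It assumes the common discretization in which each step adds zero-mean Gaussian noise with standard deviation eta * sqrt(dt); the function name simulate_random_walk_bias and the numerical values of eta and dt are hypothetical and are not calibrated parameters from any dataset.</p>
<p><![CDATA[
```python
import numpy as np


def simulate_random_walk_bias(b0, eta, dt, num_steps, rng):
    """Simulate b[k+1] = b[k] + eta * sqrt(dt) * w[k], with w[k] ~ N(0, I).

    b0: initial bias (any length D); eta: random walk rate (assumed scalar);
    returns an array of shape (num_steps, D) with the bias after each step.
    """
    b0 = np.asarray(b0, dtype=float)
    steps = eta * np.sqrt(dt) * rng.standard_normal((num_steps, b0.size))
    return b0 + np.cumsum(steps, axis=0)


# One second of gyro bias at 200 Hz, starting from a hypothetical initial value.
rng = np.random.default_rng(0)
bias_path = simulate_random_walk_bias([0.01, -0.02, 0.005], eta=1e-3, dt=0.005,
                                      num_steps=200, rng=rng)
```
]]></p>
<p>Under this discretization the variance of the accumulated bias change grows linearly with elapsed time (eta^2 * t), which captures slow drift but none of the temperature- or vibration-dependent structure discussed above.</p></div>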
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. LEARNING BIAS FOR IOO</head><p>In this section, we design a deep neural network to represent the modeling function f &#960; (6), which can be trained end-to-end to predict the IMU bias. This is in contrast to the classical random walk model, which uses a hand-crafted two-step pipeline and assumes that a long period of static IMU readings is available. Deep learning models have been shown to generalize to unseen readings with good accuracy, and this success motivates us to take the approach of deep learning based bias modeling.</p><p>However, different from the literature, we do not treat it as a regression problem that assumes b g , b a are fixed values. Instead, we model them as a probability distribution p(b g , b a |&#969;, a). This probability distribution can be very complex, so a deep learning model is a good fit to estimate it.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Diffusion Model</head><p>Diffusion models <ref type="bibr">[30]</ref> are generative models that aim to represent data x 0 using a series of latent codes x 1 , . . . , x T through a forward and a reverse diffusion process. The forward process gradually adds noise to the data, encoding it into a structured latent space, while the reverse process decodes the latent code back into the original data. Once trained, the model allows us to sample a latent code x T from a simple distribution and generate the corresponding data x 0 by running the reverse diffusion process. The key strength of the diffusion model lies in its ability to model a highly complex distribution x 0 &#8764; q(x) by leveraging the multiple latent representations between x 1 and x T , which is why we use a diffusion model to learn the conditional bias distribution. The latent space is structured in such a way that x T follows a simple Gaussian distribution, making sampling straightforward.</p><p>The encoding process between two latent codes x t-1 , x t is performed by adding Gaussian noise.</p><p>where &#946; t is the hyperparameter that controls the amount of noise added at each step t, and T is the total number of diffusion steps. As t increases, the latent variable x t transitions towards pure Gaussian noise. At the core of the diffusion model is the denoiser network, described in Sec. IV-B, which aims to estimate the noise added at each step of the forward process. Given corrupted data x t and the step t, the network predicts the noise &#1013; t-1 added at the previous step:</p><p>The denoiser network is trained using the Mean Squared Error (MSE) loss on the noise:</p><p>This simple training loss function is equivalent to minimizing the evidence lower bound (ELBO) from a variational inference perspective, which allows the model to approximate the underlying distribution of the data x 0 . 
To generate a sample from the diffusion model, we first sample a latent code x T and then decode it back to the original data x 0 . x T follows a Gaussian distribution, as conceptually it is the result of adding Gaussian noise for T steps in the forward process. The sampling process begins with:</p><p>x T &#8764; N (0, I)</p><p>Next, we use the denoiser network to iteratively decode x t back to x t-1 as follows:</p><p>where &#946; t , &#963; t , &#947; t are fixed values. The parameter &#946; t is the same noise variable used in the forward process, while both &#946; t , &#963; t are hyperparameters of the diffusion model, controlling the noise schedule. The parameter &#947; t is a fixed function of &#946; t .</p><p>In our bias modeling, x 0 corresponds to the original bias (b g , b a ). The bias is the only required training data.</p><p>As we want to model the conditional probability distribution of bias given IMU readings, we introduce an additional feature vector c extracted from the IMU readings. This feature c serves as a conditional code to the denoiser network at each step t of the denoising process, so that we can model the conditional distribution:</p><p>The training and sampling steps remain the same.</p></div>
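<div xmlns="http://www.tei-c.org/ns/1.0"><p>The forward noising, the noise-prediction loss, and one reverse sampling step described above can be sketched as follows. This is a minimal sketch assuming the standard DDPM parameterization of <ref type="bibr">[30]</ref>, which may differ in notation from the equations of this paper: the cumulative product alpha_bar, the choice sigma_t = sqrt(beta_t), and gamma_t = beta_t / sqrt(1 - alpha_bar_t) are assumptions taken from that formulation, steps are indexed from 0, and all function names are hypothetical. In the conditional setting, eps_pred would be produced by the denoiser network given (x t , t, c).</p>
<p><![CDATA[
```python
import numpy as np


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise schedule beta_1..beta_T."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) in closed form; returns (x_t, noise used)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


def noise_mse_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))


def p_sample_step(x_t, t, eps_pred, betas, alpha_bar, rng):
    """One ancestral reverse step x_t -> x_{t-1} given the predicted noise."""
    gamma_t = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - gamma_t * eps_pred) / np.sqrt(1.0 - betas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    sigma_t = np.sqrt(betas[t])
    return mean + sigma_t * rng.standard_normal(x_t.shape)


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)

# Toy data standing in for a window of 6-D bias values (gyro + accelerometer).
rng = np.random.default_rng(0)
x0 = 0.01 * rng.standard_normal((200, 6))
x_noisy, eps = q_sample(x0, 500, alpha_bar, rng)
```
]]></p>
<p>With a perfect noise prediction, the first forward step is inverted exactly by p_sample_step at t = 0, which is a quick sanity check that the forward and reverse formulas are mutually consistent.</p></div>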
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Model Design</head><p>As shown in Fig. <ref type="figure">1</ref>, we design two models to implement &#1013; &#952; in equation 14: the IMU encoder and the denoiser network of the diffusion model. The IMU encoder extracts the feature code c and passes it to the denoiser network, which implements &#1013; &#952; (x t , t, c). We choose a Temporal Convolutional Network (TCN) as the IMU encoder because it effectively captures the temporal relations in sequential data. It is easy to train and deploy because it mainly consists of convolutional layers <ref type="bibr">[31]</ref>. Previous deep learning based IOO work <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref> shows that it can extract useful information from IMU reading sequences.</p><p>The second component is the denoiser network for the diffusion model. It needs to fuse the conditional code c from the IMU encoder with the diffusion model latent code x t and the step number t, then process the fused code with its backbone to make a prediction, and optimize the loss function in equation 11. The internal structure is illustrated on the right of Fig. <ref type="figure">1</ref>.</p><p>The fusion is done with a single linear layer, as a simple design. Since each denoising step in a diffusion model corresponds to a specific noise level, the timestep information is critical. Thus, we apply a sinusoidal positional embedding to the step number t to provide a smooth, continuous representation of time, inspired by the design of the Transformer <ref type="bibr">[32]</ref>.</p><p>Deviating from U-Net <ref type="bibr">[33]</ref>, a popular design for diffusion models, we design a lightweight RNN-based network, because U-Net is computationally expensive with a large number of parameters, making it less suitable for real-time applications where efficiency is key. Our backbone consists of only a stacked GRU <ref type="bibr">[34]</ref> with two cells, followed by a linear layer. 
Despite the simple architecture, as demonstrated in Table <ref type="table">I</ref>, our network outperforms U-Net, offering better performance with a significantly smaller architecture.</p></div>
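<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sinusoidal embedding of the step number t mentioned above can be sketched as follows. This is a minimal sketch in the style of the Transformer positional encoding <ref type="bibr">[32]</ref>; the embedding dimension of 64, the maximum period of 10000, and the function name sinusoidal_embedding are assumptions for illustration and are not taken from the implementation of this paper.</p>
<p><![CDATA[
```python
import numpy as np


def sinusoidal_embedding(t, dim=64, max_period=10000.0):
    """Map diffusion step(s) t to a vector of sines and cosines.

    t: scalar or array of step indices; dim: embedding size (assumed even).
    Returns an array of shape t.shape + (dim,).
    """
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to roughly 1 / max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = np.asarray(t, dtype=float)[..., None] * freqs
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)


# Embeddings for all 1000 training steps.
step_embeddings = sinusoidal_embedding(np.arange(1000), dim=64)
```
]]></p>
<p>Nearby steps receive similar embeddings while distant steps remain distinguishable, which gives the denoiser a smooth signal for the current noise level before the linear fusion layer.</p></div>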
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Implementation Details</head><p>In practice, we process a window of IMU readings at once, instead of one-by-one, because the network needs context information from IMU readings. However, the window cannot be too long either, because the drift inevitably grows as the window gets larger, even with correction from the predicted bias. We choose a one-second window, inspired by the choice of <ref type="bibr">[17]</ref> and <ref type="bibr">[7]</ref>, striking a balance between capturing sufficient temporal information and maintaining the online performance of the system.</p><p>For network training, we allow overlap between IMU windows taken from the full IMU reading sequence, so that the network can see more IMU reading patterns. However, too much overlap yields very similar data, slowing down the training without clear benefit. In practice, we find 50% overlap to be a good choice.</p><p>The training uses the Adam optimizer <ref type="bibr">[35]</ref> with a learning rate of 3 &#215; 10 -5 , taking 6 hours on an NVIDIA A4500 GPU. The noise schedule is linearly spaced between &#946; 1 = 0.0001 and &#946; T = 0.02, with the model trained for 1000 steps, following the default setting in <ref type="bibr">[30]</ref>.</p><p>For sampling, we select DDIM <ref type="bibr">[36]</ref> to reduce the number of sampling steps while maintaining the performance. We use only 25 sampling steps for bias generation, in contrast to the typical 1000 steps required by standard DDPM sampling <ref type="bibr">[30]</ref>.</p></div>
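<div xmlns="http://www.tei-c.org/ns/1.0"><p>The windowing and step-selection choices above can be sketched as follows. This is a minimal sketch under stated assumptions: the IMU is taken to run at 200 Hz so that a one-second window holds 200 samples, the 25 sampling steps are taken as an evenly spaced subsequence of the 1000 training steps (one common way to choose DDIM steps, not necessarily the one used here), and the function names are hypothetical.</p>
<p><![CDATA[
```python
import numpy as np


def sliding_windows(imu, window_size, overlap=0.5):
    """Cut a sequence of shape (N, D) into overlapping windows.

    Returns an array of shape (num_windows, window_size, D).
    Assumes N >= window_size; a shorter sequence yields no window.
    """
    stride = max(1, int(round(window_size * (1.0 - overlap))))
    starts = range(0, len(imu) - window_size + 1, stride)
    return np.stack([imu[s:s + window_size] for s in starts])


def ddim_timesteps(num_train_steps=1000, num_sampling_steps=25):
    """Evenly spaced subsequence of training steps, in descending order."""
    stride = num_train_steps // num_sampling_steps
    return np.arange(0, num_train_steps, stride)[::-1]


# Five seconds of synthetic 6-axis IMU data at an assumed rate of 200 Hz.
imu = np.random.default_rng(0).standard_normal((1000, 6))
windows = sliding_windows(imu, window_size=200, overlap=0.5)
steps = ddim_timesteps()
```
]]></p>
<p>With 50% overlap, consecutive windows share half of their samples, so five seconds of data yield nine one-second training windows instead of five.</p></div>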
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Acquire Bias Ground Truth Data</head><p>To train the model, we require the ground truth bias at the IMU rate. Although the bias is not directly measured by the sensor, it is observable <ref type="bibr">[37]</ref> and can be recovered through joint optimization of IMU data and other sensor inputs. Many VINS systems, such as OpenVINS <ref type="bibr">[3]</ref>, OKVIS <ref type="bibr">[38]</ref> and VINS-Mono <ref type="bibr">[39]</ref>, provide reliable bias estimates as part of their state estimation. When additional sensors like LiDAR <ref type="bibr">[40]</ref> or an external motion capture system <ref type="bibr">[41]</ref> are available, the bias estimation can be further refined.</p><p>Although these bias estimates are typically provided at the frame rate, we can interpolate them to match the higher frequency of the IMU. Since IMU bias tends to change slowly over time, the interpolated values offer sufficient accuracy for use as supervision during training.</p><p>Empirically, we find that bias recovered through joint optimization and then interpolated to the IMU rate is of high quality. When the recovered bias is used to correct the IMU data, it results in better motion integration performance compared to bias predicted by deep learning models trained on integrated motion data. Therefore, the recovered bias can serve as an effective ground truth signal to guide our network towards better performance.</p><p>Moreover, we observe that the recovered bias is continuous and changes slowly, consistent with our prior understanding of IMU bias behavior. This further supports the validity of using the recovered bias as the ground truth for training the model.</p></div>
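<div xmlns="http://www.tei-c.org/ns/1.0"><p>The interpolation of frame-rate bias estimates to the IMU rate described above can be sketched as follows. This is a minimal sketch assuming per-axis linear interpolation with numpy.interp; the actual interpolation scheme is not specified in the text, and the function name interpolate_bias, the 20 Hz estimate rate, and the 200 Hz IMU rate are assumptions for illustration.</p>
<p><![CDATA[
```python
import numpy as np


def interpolate_bias(t_imu, t_est, bias_est):
    """Linearly interpolate bias estimates to IMU timestamps, axis by axis.

    t_imu: (N,) IMU timestamps; t_est: (M,) increasing estimate timestamps;
    bias_est: (M, D) bias estimates. Returns an array of shape (N, D).
    Timestamps outside [t_est[0], t_est[-1]] take the nearest end value.
    """
    bias_est = np.asarray(bias_est, dtype=float)
    columns = [np.interp(t_imu, t_est, bias_est[:, k])
               for k in range(bias_est.shape[1])]
    return np.stack(columns, axis=1)


# Bias estimates at an assumed 20 Hz frame rate, upsampled to an assumed 200 Hz.
t_est = np.array([0.0, 0.05, 0.10])
bias_est = 1e-3 * np.array([[0.0, 0.0, 0.0],
                            [1.0, 2.0, 3.0],
                            [2.0, 4.0, 6.0]])
t_imu = 0.005 * np.arange(21)
bias_imu_rate = interpolate_bias(t_imu, t_est, bias_est)
```
]]></p>
<p>Because the bias changes slowly, the piecewise-linear values between two estimates stay close to the true bias, which is what makes them usable as a supervision signal at the IMU rate.</p></div>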
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. EXPERIMENTAL RESULTS</head><p>We conduct our experiments on the EuRoC dataset <ref type="bibr">[41]</ref>, using the same training and testing splits as prior studies <ref type="bibr">[6]</ref>. For evaluation metrics, we use relative Positional RMSE (PRMSE in meters) for position and Relative Orientation Error (ROE in degrees) for orientation, consistent with the conventions established in <ref type="bibr">[6]</ref>.</p><p>We compare the performance with the following baselines:</p><p>&#8226; AirIMU <ref type="bibr">[6]</ref>, a recent work that predicts bias through indirect supervision using integrated motion. This method achieves state-of-the-art results on the EuRoC dataset, outperforming previous work that also uses indirect motion supervision <ref type="bibr">[23] [5]</ref>, as well as model-free methods <ref type="bibr">[17]</ref>.</p><p>As expected, the model-free approach, which relies on motion patterns, performs significantly worse on the EuRoC dataset. By comparing our method with AirIMU, we indirectly compare it with model-free methods as well.</p><p>&#8226; Random walk modeling baseline: For this baseline, we use the noise density and random walk rate parameters from the offline calibration results provided by the EuRoC dataset.</p><p>Since the random walk model treats bias as a stochastic process, its actual performance is difficult to evaluate directly. We therefore provide a strong baseline as the performance upper bound. We take the ground truth bias at the start of each IMU window and sample multiple bias changes according to the random walk model. The final bias is the sum of the initial ground truth bias and the sampled bias changes. After removing the sampled bias from the IMU readings and integrating the result, we select the best result for each window. It should be noted that this is not a practical algorithm, as it relies on the ground truth data to choose the optimal result. In our experiments, we sample bias changes 50 times per window.</p><p>
&#8226; Direct bias regression. This method follows an approach similar to <ref type="bibr">[7]</ref>, where the network directly regresses bias values using the ground truth as supervision. For a fair comparison, we use the same network architecture as our model, with minimal changes to the output layer to match the dimensions required for regression. We do not compare directly with the results from <ref type="bibr">[7]</ref> because they only show results using the predicted bias in a factor graph optimization framework with visual observations, and their code is not publicly available.</p><p>The results are presented in Table I. Since our model uses a probabilistic formulation, its predictions are samples from the learned IMU-conditioned bias distribution. Thus, the metrics reported in the table are averaged over 50 runs. Our model achieves the best position error and the second-best orientation error. Our result is better than the strong random walk baseline, demonstrating that our bias model is more accurate than the commonly used random walk model even in its best case. Compared with the direct regression baseline, our model achieves better performance with almost the same network. This shows that our probabilistic formulation captures the nature of the problem better than the regression formulation, thus leading to the improved performance.</p><p>In comparison to AirIMU, our model has better position error but worse orientation error, resulting in a similar overall performance. As AirIMU is better than the RNN direct regression baseline, which has a similar backbone of TCN and GRU, indirect supervision can offer better accuracy than direct supervision under the regression formulation. 
However, as we will show in Section V-A, indirect supervision methods may suffer from spurious prediction. Our method uses direct supervision while achieving performance similar to the indirect supervision method, combining the best of the two approaches. This is thanks to our probabilistic formulation implemented with a diffusion model, instead of the existing regression formulation.</p>
<table><head>TABLE I: Motion estimation result for 1-second window on EuRoC dataset (PRMSE in meters / ROE in degrees)</head>
<row role="label"><cell>Sequence</cell><cell>AirIMU</cell><cell>Ours (RNN)</cell><cell>Direct Regression (RNN)</cell><cell>Random Walk</cell><cell>Ours (UNet)</cell><cell>Direct Regression (UNet)</cell></row>
<row><cell>MH02</cell><cell>0.0234 / 0.0789</cell><cell>0.0225 / 0.0604</cell><cell>0.0246 / 0.1370</cell><cell>0.0615 / 0.1380</cell><cell>0.0227 / 0.0775</cell><cell>0.0343 / 0.1307</cell></row>
<row><cell>MH04</cell><cell>0.0415 / 0.0708</cell><cell>0.0410 / 0.0636</cell><cell>0.0437 / 0.1551</cell><cell>0.0657 / 0.1413</cell><cell>0.0408 / 0.0593</cell><cell>0.0462 / 0.1233</cell></row>
<row><cell>V103</cell><cell>0.0583 / 0.1884</cell><cell>0.0561 / 0.1931</cell><cell>0.0611 / 0.2369</cell><cell>0.0685 / 0.1762</cell><cell>0.0577 / 0.2185</cell><cell>0.0639 / 0.2574</cell></row>
<row><cell>V202</cell><cell>0.0851 / 0.2157</cell><cell>0.0703 / 0.2557</cell><cell>0.0777 / 0.3010</cell><cell>0.0813 / 0.1877</cell><cell>0.0703 / 0.2627</cell><cell>0.0664 / 0.4179</cell></row>
<row><cell>Average</cell><cell>0.0521 / 0.1385</cell><cell>0.0475 / 0.1432</cell><cell>0.0510 / 0.2075</cell><cell>0.0693 / 0.1608</cell><cell>0.0479 / 0.1545</cell><cell>0.0527 / 0.2323</cell></row>
</table>
<p>V101 is not tested as its ground truth accuracy is limited, as reported in [3].</p>
<p>Fig. <ref type="figure">2</ref>: Bias prediction result for our model and AirIMU in a one-second window.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Diffusion Model vs. Indirect Regression</head><p>In this experiment, we compare the bias predictions from our method with those from AirIMU <ref type="bibr">[6]</ref>. As mentioned earlier, using indirect supervision through integrated motion can result in spurious bias predictions, as the network may predict unrealistic bias values to optimize motion integration. While this is not an issue when the end goal is accurate integrated motion, it raises concerns about generalization. If the predicted correction does not correspond to the actual IMU bias, it may not capture intrinsic IMU properties and therefore may not generalize well to new data.</p><p>We validate this concern in the experiment. As an example, we randomly select a one-second IMU reading window and plot the predicted bias values alongside the ground truth. In Fig. <ref type="figure">2</ref>, our prediction matches the ground truth more closely in both magnitude and changing pattern. In contrast, AirIMU's predictions show abrupt changes, violating our prior knowledge of IMU bias.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Timing on Embedded Device</head><p>We evaluate the timing of our pipeline on an NVIDIA Jetson AGX Orin embedded device for both the U-Net architecture and the proposed lightweight RNN model. As shown in Table <ref type="table">II</ref>, the U-Net model has 42.8 million parameters and requires 170 ms for inference, whereas the RNN model is significantly smaller, with 2.2 million parameters, and achieves a faster inference time of 145 ms. This demonstrates the efficiency of the RNN model in terms of both model size and speed.</p><p>Low-speed real-world applications, such as agriculture and warehouse robots, can benefit from this inference time. For more demanding scenarios such as high-speed drones, further optimization is necessary.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSIONS AND FUTURE WORK</head><p>In this paper, we have introduced a conditional probability distribution formulation for IMU bias modeling. Based on this formulation, we have designed a conditional diffusion model to predict the bias from IMU readings and used it for inertial-only odometry (IOO). Compared with the classical random walk bias model and regression-based neural networks, our model shows better performance and more faithful predictions, as validated on the EuRoC dataset, which demonstrates the effectiveness of our probabilistic formulation. Although we treat the bias as an IMU-conditioned probability distribution, more work remains to leverage this distribution for better bias prediction, rather than taking one random sample as the output. Another direction for future work is to explore how to provide uncertainty for the prediction. Overall, we believe our new probabilistic formulation for IMU bias modeling opens up new opportunities to capture IMU bias and benefits the field of IOO.</p></div></body>
		</text>
</TEI>
