<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>A Visual Inertial Odometry Framework for 3D Points, Lines and Planes</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>09/27/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10344405</idno>
					<idno type="doi">10.1109/IROS51168.2021.9636526</idno>
					<title level='j'>2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Shenbagaraj Kannapiran</author><author>Jeroen van Baar</author><author>Spring Berman</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Recovering rigid registration between successive camera poses lies at the heart of 3D reconstruction, SLAM and visual odometry. Registration relies on the ability to compute discriminative 2D features in successive camera images for determining feature correspondences, which is very challenging in feature-poor environments, i.e. low-texture and/or low-light environments. In this paper, we aim to address the challenge of recovering rigid registration between successive camera poses in feature-poor environments in a Visual Inertial Odometry (VIO) setting. In addition to inertial sensing, we instrument a small aerial robot with an RGBD camera and propose a framework that unifies the incorporation of 3D geometric entities: points, lines, and planes. The tracked 3D geometric entities provide constraints in an Extended Kalman Filtering framework. We show that by directly exploiting 3D geometric entities, we can achieve improved registration. We demonstrate our approach on different texture-poor environments, with some containing only flat texture-less surfaces providing essentially no 2D features for tracking. In addition, we evaluate how the addition of different 3D geometric entities contributes to improved pose estimation by comparing an estimated pose trajectory to a ground truth pose trajectory obtained from a motion capture system. We consider computationally efficient methods for detecting 3D points, lines and planes, since our goal is to implement our approach on small mobile robots, such as drones.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Odometry and SLAM are critical for robots to navigate in previously unseen environments. Although various solutions are available <ref type="bibr">[1]</ref>, they rely on discriminative visual features in color camera images in order to determine robust correspondences. Texture-rich environments can provide the necessary visual features, and many encountered environments are texture-rich, e.g., outdoor environments. However, other environments may lack texture almost completely, e.g., stairwells and elevator shafts. In addition, even if the environment is texture-rich, low light levels or atmospheric conditions such as fog will negatively impact the ability to compute robust correspondences.</p><p>Without good visual features in camera images, robots will have to rely on other sensing modalities to provide 3D measurements for registration and pose estimation. A popular modality is LIDAR, which can provide a 360° point cloud around the robot. However, LIDAR sensors are both expensive and, more importantly, heavy.<note place="foot" n="2"><p>Jeroen van Baar is with Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA. jeroen@merl.com</p></note> In the case where the robot is an Unmanned Aerial Vehicle (UAV), the weight of a LIDAR sensor severely limits its feasibility as a payload. RGBD cameras, on the other hand, are inexpensive and lightweight, and thus a more desirable choice for obtaining 3D measurements. Although current 3D reconstruction and SLAM approaches incorporate RGBD cameras and utilize their depth information <ref type="bibr">[2]</ref>, these approaches still rely on visual features from the color camera images.</p><p>Additional sensors, such as inertial measurement units (IMUs), have been introduced to further assist 3D reconstruction and SLAM. A major drawback of IMUs is drift in their measurements, due to the integration of a slowly changing bias instability and high-frequency noise. 
Relying mostly on IMUs for registration, dead-reckoning accumulates errors over time and can lead to registration errors that are prohibitively large for accurate reconstruction and mapping. Visual Inertial Odometry (VIO) frameworks <ref type="bibr">[3]</ref> have been introduced to reduce or eliminate this drift. In addition to 2D point features, 2D line features have recently been considered for VIO <ref type="bibr">[4]</ref>. Assuming that the environment can be (locally) approximated as a Manhattan-world configuration, the approach relies on tracking 2D lines as features in the camera images.</p><p>We propose a VIO framework that can handle texture-poor environments and incorporates 3D geometric entities: 3D points, 3D lines and 3D planes from point clouds obtained with an RGBD camera. The 3D points themselves may be prone to noise in the point cloud data, and fitting 3D lines and planes with robustness to outliers (caused by this noise) helps to mitigate sensitivity to it. A major advantage of our proposed approach, due to its reliance on depth measurements rather than visual features, is its potential to operate effectively in environments with fog, low-light conditions, or man-made structures devoid of visual textures. Moreover, our framework is agnostic to the particular RGBD camera that is used to obtain the point clouds. Our focus is on developing an approach which can be implemented on UAV robots that can access relatively small spaces. This limits the UAV size, and hence the size and weight of its payload. Our contributions can be summarized as follows:</p><p>• Algorithms for detection and tracking of 3D points, 3D lines and 3D planes in 3D point clouds, including methods to determine world space estimates from tracked 3D entities in local camera frames; and</p><p>• A unified VIO framework that incorporates the tracked 3D points, 3D lines and 3D planes as geometric constraints in an Extended Kalman Filtering (EKF) framework. 
We next discuss related work, followed by a technical description of our approach in Section III, and discussion of experiments and results in Section IV. Section V concludes the paper and provides an outlook on future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>In this section, we give an overview of previous work related to 3D reconstruction, SLAM and visual odometry. Given the wealth of research in these fields, e.g., see <ref type="bibr">[1]</ref>, <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref>, a comprehensive overview is beyond the scope of this paper. Feature matching and determining correspondences lie at the heart of all these approaches. Several approaches use monocular RGB(D) cameras to determine features and correspondences. ORB-SLAM <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref> computes so-called ORB features for matching, providing sparse correspondence information. To ensure greater robustness to mismatches in 2D feature points, SLAM with points and planes <ref type="bibr">[2]</ref>, <ref type="bibr">[9]</ref> relies on both SIFT <ref type="bibr">[10]</ref> (or alternatively SURF <ref type="bibr">[11]</ref>) features and plane correspondences. CPA-SLAM <ref type="bibr">[12]</ref> incorporates a plane-based approach using expectation-maximization (EM) pose estimation and graph-based optimization for registration. More recent techniques use data-driven approaches, e.g., <ref type="bibr">[13]</ref>, but still rely on correspondences from visual data. An interesting data-driven registration approach samples correspondences <ref type="bibr">[14]</ref> in a deep learning framework. Although featureless approaches are being investigated, e.g., <ref type="bibr">[15]</ref>, these methods still rely on textures for template matching.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Methods for Low-Texture Environments</head><p>Classical methods rely on texture-rich environments for computing visual features which are discriminative for matching. Low-texture environments, however, cannot provide such features. Several approaches have been introduced for low-texture environments. Pop-up SLAM <ref type="bibr">[16]</ref> detects wall and ground planes in single images and formulates SLAM based on Pop-up 3D planes, while also incorporating sparse features when available. Newcombe et al. <ref type="bibr">[17]</ref> perform fast Iterative Closest Point (ICP) registration directly on depth scans acquired with an RGBD sensor. Although this method does not rely on features, scans between successive frames have to overlap significantly to avoid incorrect ICP solutions. The method proposed in <ref type="bibr">[18]</ref> uses range flow and sparse geometry features from depth maps directly to handle low-texture environments. The method in <ref type="bibr">[19]</ref> leverages point and line features in dynamic Manhattan worlds by performing dynamic object tracking, visual odometry, local mapping, and loop closing. End-to-end deep learning approaches aim to learn feature correspondences in depth maps <ref type="bibr">[20]</ref>. A major drawback of deep learning approaches is their reduced accuracy in novel environments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Inertial Odometry</head><p>Incorporating additional sensors besides cameras can further improve results. The authors in <ref type="bibr">[21]</ref> propose to combine odometry or inertial sensor measurements with ORB-SLAM to improve performance in low-texture environments. MonoSLAM <ref type="bibr">[22]</ref> introduced an Extended Kalman Filtering (EKF) approach to SLAM, including the use of a gyro sensor. Visual Inertial Odometry (VIO) <ref type="bibr">[3]</ref>, <ref type="bibr">[23]</ref> also uses an EKF framework to prevent drift by integrating constraints, i.e. 2D features, that are generated from visual information in successive frames. Constraints from 2D lines <ref type="bibr">[4]</ref> can be added by assuming that the scene is locally Manhattan-world. Similarly, there are other tightly coupled VIO techniques, such as <ref type="bibr">[24]</ref>, <ref type="bibr">[25]</ref>, <ref type="bibr">[26]</ref>, <ref type="bibr">[27]</ref>, <ref type="bibr">[28]</ref>, that exploit points and lines to reduce IMU drift and obtain optimized camera pose estimates.</p><p>To handle low-texture environments, we build on the VIO frameworks described in <ref type="bibr">[3]</ref>, <ref type="bibr">[4]</ref>, <ref type="bibr">[23]</ref> and propose to directly incorporate 3D geometric entities, i.e., 3D points, 3D lines and 3D planes, computed from the point cloud provided by an RGBD camera, into a single unified VIO framework. To the best of our knowledge, this is the first such proposed VIO framework.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. VIO WITH 3D GEOMETRIC ENTITIES</head><p>In our VIO framework, the visual information, which is incorporated as geometric constraints in an EKF, relies on tracking a set of 2D correspondences, i.e., points and lines. To compensate for the potential lack of texture, or visual information necessary to determine 2D correspondences, we directly utilize 3D information from the underlying geometry instead. We first give a brief overview of the notation and VIO framework using EKF and then describe our proposed approach in more detail.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Notation and Prerequisites</head><p>We use the following notation and definitions. We identify several coordinate spaces: the local coordinate frame I for the inertial sensor; the local coordinate frame C for the RGBD camera; and the world coordinate frame W. A (rigid) transform M from a space A to another space B is denoted by $^{B}_{A}M$. Rotation matrices are denoted by $R$, and $R_q$ is obtained from a quaternion vector $q$ by the operation $C(q)$.</p><p>The notation $\lfloor V \rfloor_{\times}$ denotes the $3 \times 3$ skew-symmetric matrix of some 3-vector $V$. An estimated quantity is decorated with a $\hat{\cdot}$ symbol, and an error quantity with $\tilde{\cdot}$, defining $\tilde{x} = x - \hat{x}$ for some quantity $x$ (e.g., a state vector).</p><p>The goal of VIO is to estimate IMU states in an EKF estimation framework, and "ground" the estimates according to visual features. The IMU state at time $k$ is defined as $X_{\mathrm{IMU},k} = [\,{}^{I}_{W}q^{T},\, b_{g}^{T},\, {}^{W}v_{I}^{T},\, b_{a}^{T},\, {}^{W}p_{I}^{T}\,]^{T}$, whose components are the rotation between the world and IMU frames, the gyro bias, the IMU velocity in the world frame, the accelerometer bias, and the IMU position in the world frame, respectively. For the purpose of grounding, the EKF state estimate at time $k$, $\hat{X}_{k}$, augments the IMU state with $N$ camera poses: $\hat{X}_{k} = [\,\hat{X}_{\mathrm{IMU},k}^{T},\, {}^{C_{1}}_{W}\hat{q}^{T},\, {}^{W}\hat{p}_{C_{1}}^{T},\, \ldots,\, {}^{C_{N}}_{W}\hat{q}^{T},\, {}^{W}\hat{p}_{C_{N}}^{T}\,]^{T}$ (note the $\hat{\cdot}$). In the EKF framework, state estimates are updated at each time step as follows: $\hat{X}_{k+1} = \hat{X}_{k} + \Delta X = \hat{X}_{k} + K_{k} r_{k}$, where $K_{k}$ denotes the matrix of Kalman gains, and $r_{k}$ is a residual of the form $r_{k} = H \tilde{X}_{k} + \mathrm{noise}$ (1). Here, $H$ is the measurement Jacobian matrix, and the noise is zero-mean Gaussian white noise, uncorrelated with the state vector $X$. The IMU error-state is $\tilde{X}_{\mathrm{IMU}} = [\,\delta\theta_{I}^{T},\, \tilde{b}_{g}^{T},\, {}^{W}\tilde{v}_{I}^{T},\, \tilde{b}_{a}^{T},\, {}^{W}\tilde{p}_{I}^{T}\,]^{T}$, with the corresponding EKF error-state $\tilde{X} = [\,\tilde{X}_{\mathrm{IMU}}^{T},\, \delta\theta_{C_{1}}^{T},\, {}^{W}\tilde{p}_{C_{1}}^{T},\, \ldots,\, \delta\theta_{C_{N}}^{T},\, {}^{W}\tilde{p}_{C_{N}}^{T}\,]^{T}$.</p><p>The vectors $\delta\theta_{C_{i}}^{T}$ are errors for the quaternions representing camera poses $i$. The EKF error-state is a $(15 + 6N)$-dimensional vector. 
For more details on the EKF framework, we refer the reader to <ref type="bibr">[23]</ref>.</p><p>The measurement model relies on the notion that static features are observed from multiple poses, which impose constraints on the IMU estimates: the pose estimates should be consistent with the observations of the (tracked) geometric entities. Given Eq. ( <ref type="formula">1</ref>), the objective is to determine the measurement Jacobian matrix H. We will now describe how to incorporate observations of 3D points, 3D lines and 3D planes into a single framework for VIO.</p></div>
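To make the notation concrete, the two operators used throughout, the quaternion-to-rotation map $C(q)$ and the skew-symmetric operator $\lfloor V \rfloor_{\times}$, can be sketched in NumPy. This is a minimal illustration; the function names are ours, not from the paper, and the quaternion is assumed in $[w, x, y, z]$ order:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]x such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def C(q):
    """Rotation matrix from a unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])
```

Both operators appear repeatedly in the measurement Jacobians of Sections III-B through III-D.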
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Points in 3D</head><p>Given a pair of successive RGBD images, labeled source and target, we compute Fast Point Feature Histogram (FPFH) features <ref type="bibr">[29]</ref> for the 3D points in the point cloud. The choice of FPFH is based on the required computational efficiency, given the necessity for real-time computation if deployed on an actual UAV. We then use the FPFH features to determine correspondences between 3D points in the source and target.</p><p>We denote the position of the 3D point $j$ observed in frame $i$ as $z^{(j)}_{i} = {}^{C_{i}}p_{j}$. Since $^{W}p_{j}$ is unknown, it is estimated by tracking 3D point correspondences. We denote this estimate as $^{W}\hat{p}_{j}$ and can obtain the measurement estimate $\hat{z}^{(j)}_{i} = {}^{C_{i}}_{W}\hat{R}_{q}\,({}^{W}\hat{p}_{j} - {}^{W}\hat{p}_{C_{i}})$. We can define the measurement residual as $r^{(j)}_{i} = z^{(j)}_{i} - \hat{z}^{(j)}_{i}$.</p><p>In order to incorporate 3D points in the VIO measurement model, we update the Jacobian of a measurement $z^{(j)}_{i}$ with respect to the state, cf. Eq. ( <ref type="formula">22</ref>) in <ref type="bibr">[23]</ref>, as follows: $H^{(j)}_{X_{i}} = [\,\lfloor \hat{z}^{(j)}_{i} \rfloor_{\times} \;\; -{}^{C_{i}}_{W}\hat{R}_{q}\,]$ (<ref type="formula">2</ref>).</p><p>The skew-symmetric matrix in Eq. ( <ref type="formula">2</ref>) is obtained using Lie theory <ref type="bibr">[30]</ref>. Given a matrix $R \in SO(3)$ and a function $y = f(R, x) = R \cdot x$, we compute $\partial y / \partial \omega = -R \lfloor x \rfloor_{\times}$, where $\omega$ represents the parameters of $\mathfrak{so}(3)$.</p><p>The Jacobian of $z^{(j)}_{i}$ with respect to feature position, cf. Eq. ( <ref type="formula">23</ref>) in <ref type="bibr">[23]</ref>, is ${}^{C_{i}}_{W}\hat{R}_{q}$. The residuals for all $M_{pt}$ observations of 3D point $j$ are then stacked.</p><p>Given tracked correspondences for 3D point $j$ in local coordinate frame C, the estimate $^{W}\hat{p}_{j}$ can be easily determined by taking the average over $^{W}\hat{p}^{\,i}_{j}$, $i \in \{1, \ldots, M_{pt}\}$, where $^{W}\hat{p}^{\,i}_{j} = ({}^{C_{i}}_{W}R_{q})^{-1}\,{}^{C_{i}}p_{j}$. We employ RANSAC to eliminate potentially incorrect correspondences.</p></div>
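As an illustration of the world-space estimate for a tracked 3D point, averaging the per-frame observations after outlier rejection, a minimal NumPy sketch follows. It uses a deterministic exhaustive consensus loop in place of RANSAC (reasonable for the small number of observations per track), and the function name, threshold value, and inclusion of the camera translation are our assumptions:

```python
import numpy as np

def estimate_world_point(obs, R_wc, t_wc, inlier_thresh=0.05):
    """Estimate a world-frame 3D point from M tracked observations.

    obs  : (M, 3) observations of point j in each local camera frame C_i
    R_wc : list of (3, 3) camera-to-world rotations
    t_wc : list of (3,) camera positions in the world frame (assumption)
    """
    # Transform each local observation into the world frame.
    world = np.array([R @ p + t for R, p, t in zip(R_wc, obs, t_wc)])
    best_inliers = np.ones(len(world), dtype=bool)
    best_count = 0
    # Exhaustive consensus: try each observation as the candidate model.
    for cand in world:
        inliers = np.linalg.norm(world - cand, axis=1) < inlier_thresh
        if inliers.sum() > best_count:
            best_count, best_inliers = inliers.sum(), inliers
    # Average the inlier observations, as in the text.
    return world[best_inliers].mean(axis=0)
```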
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Lines in 3D</head><p>In addition to 3D points, we aim to incorporate 3D lines. In the presence of noisy observations, fitting a line may mitigate some of the noise. Furthermore, some scenes may not provide many discriminative point features, e.g., if the geometry consists of mostly planar wall regions. Our goal is to incorporate 3D lines which are robust with respect to several observations. Various 3D line finding techniques have been introduced, e.g., <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref>, <ref type="bibr">[33]</ref>. To avoid an expensive search, we identify only those lines for which the points lie on a boundary between different depth values, as determined from the depth map. Although we could use the FPFH features computed for the 3D point correspondences, instead for each 3D point we analyze the normals in a small local neighborhood, and define a histogram-based feature similar to SIFT features <ref type="bibr">[10]</ref>. We group together all points with similar features and fit a 3D line $^{C_{i}}L_{j}$ to them. The line fitting procedure is based on eigen-analysis.</p><p>The authors in <ref type="bibr">[4]</ref> parameterize lines with start and end points. In the case of 3D lines, we would need to track these 3D points and update them if they become occluded or clipped by the camera field of view. To avoid having to determine such corresponding anchor points for a line between observations, we parameterize a line by its direction $d$ only. A line $j$ is then transformed by $z^{(j)}_{i} = {}^{C_{i}}d_{j} = {}^{C_{i}}_{W}R_{q}\,{}^{W}d_{j}$.</p><p>The Jacobians of $z^{(j)}_{i}$ with respect to the state and feature position are then $\lfloor {}^{C_{i}}_{W}\hat{R}_{q}\,{}^{W}\hat{d}_{j} \rfloor_{\times}$ and ${}^{C_{i}}_{W}\hat{R}_{q}$, analogous to the point case. The residuals for all $M_{ln}$ observations of 3D line $j$ are then stacked.</p><p>As for 3D points, the 3D line estimate $^{W}\hat{d}_{j}$ can be determined by averaging the observations in the world coordinate frame for the 3D line $j$. 
Tracking lines between different observations is performed by comparing line directions in the world coordinate frame, and designating those with directions within some threshold as corresponding lines. However, this may potentially designate lines with similar directions, but different 3D positions, as corresponding. To prevent this, for a given line $L_{j}$ we determine the 3D points which support the 3D line, denoted as $P_{L_{j}} = \{\, p : \mathrm{dist}(p, L_{j}) < \tau \,\}$, where $\tau$ is the distance threshold that determines the inliers. First, lines whose number of supporting points falls below a threshold are discarded, since we found empirically that they cannot be robustly matched. Second, for two lines $L_{j}, L_{k}$ we can use the associated $P_{L_{j}}$ and $P_{L_{k}}$ to calculate Euclidean distances. If the distance is within some threshold, lines $L_{j}, L_{k}$ are flagged as corresponding, and non-corresponding otherwise.</p></div>
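The eigen-analysis line fit and the computation of supporting points within distance τ can be sketched as follows. This is a simplified illustration; the function names and the default τ value are our assumptions:

```python
import numpy as np

def fit_line_direction(points):
    """Fit a 3D line via eigen-analysis (PCA): the direction is the
    eigenvector of the covariance with the largest eigenvalue, and the
    centroid is a point on the line."""
    centroid = points.mean(axis=0)
    cov = np.cov((points - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    d = eigvecs[:, np.argmax(eigvals)]
    return centroid, d / np.linalg.norm(d)

def supporting_points(points, centroid, d, tau=0.02):
    """Points within distance tau of the line through centroid along d."""
    rel = points - centroid
    # Point-to-line distance: norm of the component orthogonal to d.
    dist = np.linalg.norm(rel - np.outer(rel @ d, d), axis=1)
    return points[dist < tau]
```

Lines with too few supporting points would then be discarded before matching, as described above.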
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Planes in 3D</head><p>Finally, we also consider 3D plane entities for the VIO. Planes can further mitigate uncertainty due to noise for points and lines. The treatment of 3D planes in this framework follows that of lines. The main difference between lines and planes is that we parameterize the planes by their normal direction.</p><p>The Jacobians of $z^{(j)}_{i}$ with respect to the state and feature position are now $\lfloor {}^{C_{i}}_{W}\hat{R}_{q}\,{}^{W}\hat{n}_{j} \rfloor_{\times}$ and ${}^{C_{i}}_{W}\hat{R}_{q}$, where $n$ denotes the plane normal. The residuals for all $M_{pl}$ observations of 3D plane $j$ are then stacked. Tracking 3D plane correspondences across observations occurs in the same way as for 3D lines, including the calculation of supporting 3D points for the plane. Planes with a number of supporting points lower than a threshold are discarded for the same robustness reason as lines (see above).</p></div>
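Analogously to the line case, a plane normal can be recovered by eigen/SVD analysis as the direction of least point variance. This is a generic sketch of such a fit, not the paper's exact procedure:

```python
import numpy as np

def fit_plane_normal(points):
    """Fit a plane by SVD: the normal is the right-singular vector with
    the smallest singular value (direction of least variance)."""
    centroid = points.mean(axis=0)
    rel = points - centroid
    _, _, vt = np.linalg.svd(rel, full_matrices=False)
    n = vt[-1]  # smallest singular value is last (SVD returns descending)
    return centroid, n / np.linalg.norm(n)
```

Supporting points for the plane would be those within a distance threshold of it, mirroring the line case.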
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. Dimensionality Reduction</head><p>For a world coordinate frame feature $^{W}f$, i.e., a 3D point, line or plane, we have that $r^{(j)} \simeq H^{(j)}_{X}\tilde{X} + H^{(j)}_{f}\,{}^{W}\tilde{f} + \mathrm{noise}$. In order to formulate the residual in the form of Eq. ( <ref type="formula">1</ref>), we perform the same nullspace projection as in <ref type="bibr">[3]</ref>, resulting in the (stacked) residuals $r^{(j)}_{o}$ and Jacobian $H^{(j)}_{o}$ for a feature $j$. Prior to the nullspace projection, the Jacobian is a $(3M^{(j)} \times 3)$-dimensional matrix, and after projection, its dimensions are $((3M^{(j)} - 3) \times 3)$. Here $M^{(j)}$ denotes the number of observations for an entity $j$: a point, line or plane.</p><p>If we assume that $\forall j,\ M^{(j)} = M = M_{pt} = M_{ln} = M_{pl}$, then given $N_{pt}$ 3D points, $N_{ln}$ 3D lines and $N_{pl}$ 3D planes, the final stacked Jacobian matrix, combining all entities, is $((N_{pt} + N_{ln} + N_{pl})(3M - 3) \times 3)$-dimensional. It is evident that dimensionality reduction via QR decomposition is essential for reducing the computational complexity. We follow the same approach as in <ref type="bibr">[3]</ref> for reducing the dimensionality.</p></div>
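The two reductions, nullspace projection to remove the feature error (as in [3]) followed by QR compression, can be sketched generically in NumPy. Function names and tolerances are our assumptions; the state dimension is left generic:

```python
import numpy as np

def marginalize_feature(r, H_X, H_f):
    """Project the residual onto the left nullspace of H_f, eliminating
    the unknown feature error from r = H_X @ x_err + H_f @ f_err + noise."""
    U, s, _ = np.linalg.svd(H_f, full_matrices=True)
    rank = int((s > 1e-10).sum())
    A = U[:, rank:]            # basis of the left nullspace, (3M, 3M - rank)
    return A.T @ r, A.T @ H_X  # projected residual and state Jacobian

def compress(r_o, H_o):
    """QR-based dimensionality reduction of the stacked Jacobian."""
    Q, R = np.linalg.qr(H_o)
    return Q.T @ r_o, R
```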
<div xmlns="http://www.tei-c.org/ns/1.0"><head>F. EKF Updates</head><p>We now have the necessary information to compute the desired Kalman gains to perform an EKF update, i.e., determine the updated state and covariance estimates. In addition to augmenting the EKF state, the arrival of a new observation also augments the covariance matrix necessary for the EKF updates. In the next section, we perform an evaluation of our proposed framework.</p></div>
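The EKF update described here follows the standard form. A generic sketch (using the Joseph-form covariance update for numerical symmetry; this is an illustration, not the paper's implementation) might look like:

```python
import numpy as np

def ekf_update(X, P, r, H, R_noise):
    """Standard EKF update: Kalman gain, state correction, covariance.

    X : state estimate, P : state covariance, r : residual,
    H : measurement Jacobian, R_noise : measurement noise covariance.
    """
    S = H @ P @ H.T + R_noise          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    X_new = X + K @ r                  # state correction
    I_KH = np.eye(P.shape[0]) - K @ H
    # Joseph form keeps P symmetric positive semi-definite.
    P_new = I_KH @ P @ I_KH.T + K @ R_noise @ K.T
    return X_new, P_new
```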
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. EXPERIMENTS</head><p>We first provide a short description of the experimental hardware platform, and then describe the results of experiments we conducted to demonstrate the effectiveness of incorporating 3D points, lines and planes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Experimental Hardware Platform</head><p>We instrumented a UAV with an IMU and RGBD camera. The IMU is an Adafruit 9-DOF Absolute Orientation IMU Fusion Breakout - BNO055, providing IMU readings at 100 Hz. The RGBD camera is an Intel RealSense D435 that acquires data at 10 Hz<ref type="foot">foot_0</ref>. The experimental hardware is shown in Figure <ref type="figure">2</ref>. We employed the IMU-camera calibration proposed in <ref type="bibr">[34]</ref>, <ref type="bibr">[35]</ref>, which also provides estimates of the camera's intrinsic parameters. We used all the depth frames produced by the RGBD camera, around 150 frames for each experiment. The compute time for registration between adjacent frames was around 5 s using non-optimized Python code; we expect that C++ code with modest optimization could run at an interactive rate. Due to restrictions, the results in this paper were obtained by manually moving the experimental UAV platform.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Results</head><p>Quantitative Results. We want to evaluate how incorporating different 3D geometric entities improves the accuracy of registration for a given texture-poor scene. The scene is shown in Figure <ref type="figure">3(a)</ref>. To perform a quantitative comparison, we used an OptiTrack motion capture system <ref type="bibr">[36]</ref> to track six infrared reflective markers mounted on the UAV, using 34 cameras operating at 120 Hz. The motion capture system produces a pose trajectory, i.e., 3D locations and orientations, of the UAV, which we refer to as ground truth. The results are summarized in Table <ref type="table">I</ref>. Columns four and five report position and orientation RMS errors, respectively, between the ground truth and the position and orientation estimated by our framework. Each row indicates which 3D entities are incorporated, except for the first row, where we rely on the IMU only (dead-reckoning). The errors are largest when none of the 3D entities are used. The errors are nearly identical when only one of the entities is incorporated. When incorporating 3D points and lines, or 3D points and planes, the errors improve. However, omitting 3D points but incorporating 3D lines and planes results in an increased RMS error. A possible explanation is that the lines and planes do not sufficiently constrain all degrees of freedom in our framework, resulting in drift. Finally, the inclusion of all three entities gives the lowest RMS error. We point out that although these are RMS errors, the accumulation of errors as more input data are processed causes the estimated trajectory of the UAV to drift more. As is evident from Table <ref type="table">I</ref>, apart from the case when none of the entities are incorporated, the rotational RMS errors do not vary significantly among the different combinations. 
One explanation for this is the fact that the gyroscope provides more accurate measurements compared to the accelerometer. Qualitative Results. Figure <ref type="figure">3</ref> shows results of registration for two different environments. The first environment, the Box Scene shown in Fig. <ref type="figure">3a</ref>, was constructed by arranging several cardboard boxes and foam boards to create a challenging, feature-poor scene. We discussed quantitative results for this environment above. The registration result shown in Fig. <ref type="figure">3b</ref> is for 125 registered point clouds. The second environment is a Hallway Scene, depicted in Fig. <ref type="figure">3c</ref>. The result in Fig. <ref type="figure">3d</ref> is from 59 registered point clouds. Although the hallway scene is texture-rich, we only rely on 3D entities from the point cloud data. Both environments show qualitatively good registration results. Figure <ref type="figure">4</ref> shows an example of detected 3D points, lines and planes for the Box Scene (Fig. <ref type="figure">3a</ref>). In general, the number of detected lines and planes will be low. On the other hand, there may be an abundance of 3D points, which would create an imbalance and cause our framework to mainly be corrected by the 3D points. To avoid this imbalance, we limit the number of points for registration by dividing the point cloud of the observed scene into cells and only using a small number of points per cell. The cell size and number of points are hyperparameters that are determined by the user.</p><p>To verify whether the lack of texture in the Box Scene indeed hindered 2D visual feature tracking, we employed ORB-SLAM2 <ref type="bibr">[8]</ref> to recover the poses of the UAV and to register the point clouds. 
We found, however, that ORB-SLAM2 was not able to compute the necessary discriminative features for matching and failed entirely.</p><p>In general, the accuracy of our method hinges on the quality of the point clouds and the ability to extract 3D features from them.</p></div>
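The cell-based limiting of 3D points described above can be sketched as follows. The cell size and per-cell point budget are the user-chosen hyperparameters mentioned in the text; the values and names here are arbitrary illustrations:

```python
import numpy as np

def subsample_by_cells(points, cell_size=0.25, max_per_cell=3, seed=0):
    """Keep at most max_per_cell points per voxel cell, to balance the
    number of 3D points against the (typically few) lines and planes."""
    rng = np.random.default_rng(seed)
    cells = np.floor(points / cell_size).astype(int)
    # Assign each point a cell id via the unique cell rows.
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = np.asarray(inverse).ravel()
    kept = []
    for c in np.unique(inverse):
        idx = np.flatnonzero(inverse == c)
        if len(idx) > max_per_cell:
            idx = rng.choice(idx, max_per_cell, replace=False)
        kept.extend(idx)
    return points[np.sort(kept)]
```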
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. CONCLUSION</head><p>We have presented a unified framework to incorporate 3D points, lines and planes for Visual Inertial Odometry. Our approach is suitable for environments which lack the texture needed to compute 2D correspondences, on which most prior methods rely. Our experiments show that by incorporating all three primitives, we can increase the accuracy of registration between subsequent frames. The depth maps or point clouds acquired with an RGBD camera suffer from noise, and tracking 3D points may be sensitive to such noise. By additionally estimating 3D lines and 3D planes, this noise can be mitigated. To avoid having to estimate so-called anchor points for 3D lines and planes, we only consider the directions of these entities. We thus require 3D lines and planes in different directions to constrain all three dimensions if we cannot robustly track 3D points. To avoid an imbalance due to a relatively large number of points, we limit the number of points for registration by dividing the point cloud into cells and only using a small number of points per cell; the numbers of lines and planes tend to be low, and thus are not restricted.</p><p>However, in some parts of an environment there may not be enough information to estimate 3D lines and planes in different directions. In such cases, we can always fall back on dead reckoning with the IMU until the necessary conditions are met again, at the expense of some drift. For future work, we would like to implement our framework on an autonomous UAV and further demonstrate its usefulness in texture-poor environments.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The IMU and camera are integrated in the Intel RealSense D435i, but we did not have access to that model.</p></note>
		</body>
		</text>
</TEI>
