<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Learning to Control an Unstable System with One Minute of Data: Leveraging Gaussian Process Differentiation in Predictive Control</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10329341</idno>
					<idno type="doi">10.1109/IROS51168.2021.9636786</idno>
					<title level='j'>Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems</title>
<idno type="issn">2153-0858</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>IDJ Rodriguez</author><author>U Rosolia</author><author>AD Ames</author><author>Y Yue</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[We present a straightforward and efficient way to control unstable robotic systems using an estimated dynamics model. Specifically, we show how to exploit the differentiability of Gaussian Processes to create a state-dependent linearized approximation of the true continuous dynamics that can be integrated with model predictive control. Our approach is compatible with most Gaussian process approaches for system identification, and can learn an accurate model using modest amounts of training data. We validate our approach by learning the dynamics of an unstable system such as a segway with a 7-D state space and 2-D input space (using only one minute of data), and we show that the resulting controller is robust to unmodelled dynamics and disturbances, while state-of-the-art control methods based on nominal models can fail under small perturbations. Code is open sourced at https://github.com/learning-and-control/core.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>System identification is frequently used in robotics to mitigate model imperfections using measured input-output data <ref type="bibr">[1]</ref>- <ref type="bibr">[5]</ref>. Managing these modeling errors can be critical to achieving desired performance or guaranteeing safety. This problem is particularly challenging in systems with unstable dynamics since even small modeling errors can integrate over time without control inputs that can directly dampen them. For instance, we experimentally show that running state-ofthe-art model predictive control <ref type="bibr">[6]</ref> on a 7-D state space and 2-D input space segway system using a misidentified model can lead to unsafe and unstable behavior, as depicted in Figure <ref type="figure">1</ref>.</p><p>A useful system identification framework must balance computation time, accuracy and data efficiency. Furthermore, since data often cannot be collected for the entire state space of a real system, an estimate of model uncertainty is also useful to plan around gaps in the knowledge of the learned model. Because of these challenges, much of contemporary research has focused on learning residuals of an already welldeveloped nominal dynamics model <ref type="bibr">[7]</ref>- <ref type="bibr">[19]</ref>.</p><p>In this paper, we aim to learn the full dynamics models of unstable robotic systems. Our goal is to develop a straightforward and data efficient method for system identification that can be easily integrated with state-of-the-art control methods. We ground our approach in Gaussian processes (GPs), which are a popular method for learning dynamics models <ref type="bibr">[9]</ref>, <ref type="bibr">[11]</ref>- <ref type="bibr">[13]</ref>, <ref type="bibr">[18]</ref>, <ref type="bibr">[20]</ref>- <ref type="bibr">[23]</ref>. We leverage the differentiability of GPs <ref type="bibr">[24]</ref>, <ref type="bibr">[25]</ref> to train a discrete-time dynamics model from training data of the form ( x t , u t , x t+&#916;t ), while still recovering a state-dependent linearization of the dynamics that exploits the underlying continuous dynamics structure.</p><p>Learning a discrete-time linearizable dynamics has three key benefits. First, the approach can be very data efficient, as the differentiated GP model automatically infers the statedependent linearization at every state. This differs from other approaches where the continuous dynamics model is learned directly and then used with collocation for approximation <ref type="bibr">[26]</ref>. As shown in <ref type="bibr">[27]</ref>, learning the continuous dynamics rather than the discrete flow-map often requires higher sampling frequencies where measurement noise can become significant in practice. Second, one can use the estimated model with state-of-the-art model predictive control (MPC) methods <ref type="bibr">[6]</ref> for effective and computationally efficient control synthesis that can handle state and input constraints.</p><p>The final benefit is that the approach is generic and can be applied to many GP-based modeling approaches as a drop-in subroutine.</p><p>The idea of using GPs for MPC is not new, but prior work either required using computationally expensive procedures <ref type="bibr">[23]</ref>, or limited themselves to learning only residual models <ref type="bibr">[11]</ref>- <ref type="bibr">[13]</ref>, <ref type="bibr">[18]</ref>. 
Other prior work uses GPs within dynamic programming frameworks that do not take state and input constraints into account <ref type="bibr">[28]</ref>. Also, unlike reinforcement learning approaches that use Policy Gradient <ref type="bibr">[29]</ref> or Value Iteration <ref type="bibr">[30]</ref> with GPs, our method uses MPC to infer a policy. Learning the dynamics for MPC separates the policy from the dynamics and allows the same learned dynamics to be reused for different objectives.</p><p>We validate our approach by controlling unstable robotic systems, both in simulation and on a segway with a 7-D state space and 2-D input space. Whereas state-of-the-art control methods can fail to stabilize the segway under small model mismatch, we show that one can robustly stabilize it using our model trained on only one minute of data (see Figure <ref type="figure">1 above</ref>). These results showcase the practical potential of our approach to significantly reduce the effort required for accurate system identification in unstable robotic systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. A DIFFERENTIATION-BASED GAUSSIAN PROCESS MODEL FOR DYNAMICS ESTIMATION</head><p>In this section, we outline our approach for system identification using a differentiation-based Gaussian process model. We first describe and motivate learning a state-dependent linearized model in Section II-A. We then discuss Gaussian process preliminaries in Section II-B, and how to differentiate a GP to obtain a state-dependent linearized dynamics model in Section II-C.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. State-Dependent Linearized Dynamics Modeling</head><p>We consider the following dynamical system:</p><p>, and where f is unknown (not even the functional form). The above system is subject to the the following state and input constraints:</p><p>Our goal is to design a control policy &#960; : R n &#8594; R d which maps states to actions. In order to compute such a policy we will use historical data to estimate the full system dynamics f , which will be leveraged in a predictive control scheme. We are particularly interested in systems that are passively unstable (e.g., a segway).</p><p>We assume access to a dataset of M state-input pairs</p><p>, where T is the sampling period. Visually, on the pendulum, this would correspond to the left-most plot in Figure <ref type="figure">2</ref>. Furthermore, we assume the control action is applied using a sampling-and-hold strategy meaning that:</p><p>iT where the noise w a zero-mean Gaussian, i.e, w N(0, &#963; 2</p><p>).</p><p>Rather than performing motion planning directly on these discrete-time dynamics (as is common in the literature <ref type="bibr">[20]</ref>, <ref type="bibr">[31]</ref>), we instead use (3) and a state-dependent linear approximation of the dynamics around the state-input pair (x&#175;, u&#175;):</p><p>This linearization of the dynamics can be related to a linearization of the discrete flow map in (3) as follows:</p><p>x &#710;(t + Although these matrices alone are sufficient for use in our MPC controller, through the use of matrix logarithms, the local linear approximation of the dynamics can be computed:</p><p>As we shall see in Section III, having a state-dependent linearization is crucial for efficient integration with predictive control. In general, computing a state-dependent linearization with GPs in real-time can be challenging which is why most prior work resorts to approximating the GP using inducing inputs <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref>, a time-varying state-input independent model <ref type="bibr">[13]</ref> or learning the residual <ref type="bibr">[11]</ref>- <ref type="bibr">[13]</ref>. We will show in Section Section II-C how to solve for the matrices of the state-dependent linear dynamical approximation, (A, B, C), by taking derivatives of a Gaussian process dynamics model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Gaussian Process Preliminaries</head><p>A Gaussian process (GP) is the defined by a mean function &#181;(s) and positive semidefinite covariance function k(s, s ' ).</p><p>In this work we will primarily use two kernels: the Radial Basis Function (RBF) Kernel:  and the Periodic Kernel:</p><p>where &#963;, &#969; and ` are tunable parameters. We will also construct composite kernels by exploiting the fact that the product of kernels is a valid kernel, in order to encode geometric properties of the dynamical system. In particular,</p><p>f (x(&#964;), u(&#964;))d&#964;.</p><p>(</p><p>Zt t+&#948;t Fig. <ref type="figure">2</ref>: System identification results for a simulated pendulum. Left-to-Right: The dataset collected (selecting 30 initial points uniformly at random and integrating forward for 0.01s); Phase plot of estimated dynamics with dataset overlaid; Point-wise error between true and estimated dynamics. Phase plots are computed on a 100 &#215; 100 grid. We see that the error is small and captures the behavior of the system even in regions with few data points.</p><p>the Periodic Kernel is useful for modeling angular coordinates, whereas the RBF kernel is more suitable for Euclidean coordinates.</p><p>Samples of a GP take the form:</p><p>where function samples approximate the integral of the dynamics as follows:</p><p>Multi-dimensional outputs are predicted with an independent GP for each output. For a test input s = x &#732; u &#732;T , the mean and variance are computed as:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Differentiating a Gaussian Process</head><p>From <ref type="bibr">[24]</ref>, <ref type="bibr">[25]</ref>, we know that the derivative of a GP is another GP. For the following derivation it is sufficient for the kernel function to be differentiable with respect to the both of its parameters (which is true for both the RBF and Periodic kernels). For an input s we define the derivative of a GP as follows:</p><p>where h ' : R n+m &#8594; R n+m is the gradient of sampled function h. We derive the mean of the GP derivative as follows: <ref type="bibr">(12)</ref> This GP in ( <ref type="formula">9</ref>) is related to the linear approximation in (5) as follows:</p><p>These approximates are derived by noting that the GP approximates the integral in (3) which concludes the derivation. Graphically, this relationship can be seen in the center plot of Figure <ref type="figure">2</ref> where the data points are overlapped with the state-dependent continuous linear approximation we computed using Equation <ref type="bibr">(6)</ref>. Note that since we are training on M, a dataset of state input pairs, we are still learning the discrete time flow map shown in <ref type="bibr">(3)</ref>. A key aspect of our contribution is to re-interpret the learned dynamics in the context of (1) to directly infer a local linear approximation that is amenable to MPC. This allows our method to be retroactively applied to previous GP-based modeling work that learns a discrete transition model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. CONTROL DESIGN</head><p>We now describe our control strategy, which is based on state-of-the-art methods for model predictive control (MPC) <ref type="bibr">[6]</ref>. This approach works generically with the statedependent linearized model described in <ref type="bibr">(5)</ref>. First, we introduce a Finite Time Optimal Control Problem (FTOCP) which is based on a simplified Affine Time-Varying (ATV) model. We then present the proposed algorithm that at each time t solves an FTOCP where the ATV model is updated leveraging the state-dependent linearized model from <ref type="bibr">(5)</ref>. Our algorithm applies the first action of the planned trajectory to the system and the entire process is repeated at the next time step t + 1, yielding to a receding horizon strategy also referred to as model predictive control. ( <ref type="formula">9</ref>)</p><p>At time t and for system's state x( t ), we define the following finite time optimal control problem (FTOCP):</p><p>x k|t</p><p>where </p><p>which minimize the predicted cost while satisfying state and input constraints from (2). When the above optimal predicted trajectory is computed at time t, we have that x k|t denotes the predicted state of at time k. This notation will be useful later on when we are going to differentiate between the optimal state x k|t at time k predicted at time t and the optimal state x k|t+1 at time k predicted at time t + 1. In what follows, we use the optimal state-input sequences <ref type="bibr">(15)</ref> to synthesize a control policy for the dynamical system (1).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Policy Synthesis</head><p>This section describes the control synthesis strategy. At each time t, we solve the FTOCP <ref type="bibr">(14)</ref>, where the time Algorithm 1 Control Policy 1: Init Parameters:</p><p>)) 2: Input: x t 3: if t &gt; 0 then</p><p>. Update Candidate Trajectory 4:</p><p>Set u &#175;t+N-1|t = u t+N-2|t-1 8: end if 9: for k {t, ... , t + N -1 } do . Update Model 10:</p><p>Set A k = A( x &#175;k|t , u &#175;k|t ) from expectation of (13) 11:</p><p>Set B k = B( x &#175;k|t , u &#175;k|t ) from expectation of (13) 12:</p><p>Set </p><p>)J</p><p>At time t = 0, we initialize the candidate trajectory with an initial guess and afterwards we update the candidate solution using the optimal trajectory from (15), as shown in Algorithm 1. In particular, in Algorithm 1 we update the candidate trajectory by shifting the optimal solution computed as the previous time time (Lines 3-8). Afterwards, we update ATV matrices used to define the FTOCP problem <ref type="bibr">(14)</ref>. Finally, we solve problem ( <ref type="formula">14</ref>) and we store the optimal state-input trajectories. The strategy described in Algorithm 1 is repeated at each time t based on the new measurement x t .</p><p>It is clear that the prediction model defined by the ATV</p><p>k=t plays a crucial role in determining the success of the MPC. If the prediction model is inaccurate, then the closed-loop system will deviate from the planned trajectory. This deviation may result in poor closedloop performance and safety constraint violation. We validate this point in our experiments showing that controlling using an inaccurate model can be unsafe, thus highlighting the need to quickly learn accurate dynamics models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. SIMULATION RESULTS</head><p>In this section, we validate our approach in simulation on two unstable systems: the inverted pendulum and the segway. The goal of this evaluation is to provide both an intuition as well as a demonstration of the theoretical limits for our approach. Specifically, we aim to address:</p><p>&#8226; Can we learn an accurate dynamics model with few training examples? &#8226; Can we integrate our dynamics model with control for steering and motion planning?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Pendulum Simulation</head><p>We first test on a continuous time inverted pendulum. The state of the system is x = [&#952;, &#952; ] with torque input u. The system has a point mass of 0.25kg and a length of 0.5m.</p><p>1) Data Collection &amp; System Identification: First, we estimate the discrete flow map for the unactuated system using 37 samples collected uniformly at random so that &#952;( t i ) U( -&#960;, &#960; ) and &#952;( t i ) U( -5, 5). We capture the (a) (b) Fig. <ref type="figure">3</ref>: Figure <ref type="figure">3a</ref> shows the closed-loop trajectory of the pendulum. MPC with both the true model and the GP produce near identical trajectories.</p><p>Figure <ref type="figure">3b</ref> shows the one-step prediction error of the MPC policy for both models. Fig. <ref type="figure">4</ref>: Control results on simulated segway. Figure <ref type="figure">4a</ref>, we plot the position of the segway as it reaches a sequence of target goals. Figure <ref type="figure">4b</ref> shows : the angle of the segway with respect to the upright position. Notice that the segway must deviate from its equilibrium in order to accelerate forwards or backwards. Figure <ref type="figure">4c</ref> shows the one-step prediction error of the MPC policy using GP dynamics.</p><p>geometric structure of the pendulum's sate-space by using the following kernel:</p><p>), ( <ref type="formula">16</ref>)</p><p>Next, we use a local linear approximation to compute a point-wise estimate of the continuous dynamics as shown in <ref type="bibr">(13)</ref>. The entire dataset of state transitions is shown in the left plot of Figure <ref type="figure">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) Evaluating Model Accuracy:</head><p>We divide the state-space of the pendulum into a 100 x 100 grid. For each point in the grid we compute the true value of x &#729; using ( <ref type="formula">1</ref>) and an estimated value using the linear approximation in (4) where A, B and C are computed using the derivative of the GP as in <ref type="bibr">(13)</ref>. The phase plot in Figure <ref type="figure">2</ref> correspond to the direction of the estimated &#7819;. For each point in the grid we can compute the 2-norm of the error which is shown in the heat map on the right. Overall, we can see that our approach can recover an accurate state-dependent linearization of the true continuous time dynamics.</p><p>3) Evaluating Motion Planning: Next, we consider the task of steering the pendulum to the upright position starting from x(0) = (-&#960;, 0) with the input constraint set U = {u | -.6 &lt; u &lt; .6}. Notice that with these constraints the pendulum is unable to reach the goal in a single swing. For this task, we collect a new dataset of 34 uniformly distributed samples where states are sampled as before and u(t i ) U(-.6, .6). Finally, we run the control policy from Algorithm 1. Both our strategy and an MPC using true dynamics are able to swing up the pendulum reaching the unstable equilibrium state as shown in Figure <ref type="figure">3a</ref>.</p><p>To evaluate the performance of the GP model for motion planning, we compute the difference between the first state predicted by the MPC policy and the actual state observed by the system. Figure <ref type="figure">3b</ref> shows the errors for the true model and the learned model at each time step of the pendulum's swingup. We see that throughout the pendulum swing up the MPC model has a higher prediction error than the GP. Towards the end, as the pendulum stabilizes, the error of the MPC policy with the true model falls to 0 while the GP controller maintains a low but stable error. We expect the MPC with continuous time dynamics to have a slightly higher error than the GP as the continuous time dynamics are linearized and discretized to compute the predicted trajectory, while GP provides an estimate of the discrete-time map which is used to compute the next step.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. High-Fidelity Segway Simulation</head><p>We next evaluate our strategy on a high-fidelity simulated segway based on the 6-D state space and 2-D input space system shown in Figure <ref type="figure">5</ref>. The state of a segway is x = [X, Y, &#952;, v, &#952;, &#968;, &#968;], where (X, Y ) represents the position of the center of mass, (&#952;, &#952;) the heading angle and yaw rate, v the velocity and (&#968;, &#968;) the rod's angle and angular velocity. The control input u = [T l , T r ], where T l and T r are the torques to the left and right wheel motors, respectively. For all experiments, we limit |T l | &lt; 6 and |T r | &lt; 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Data Collection &amp; System Identification:</head><p>Recall that we are learning the mapping shown in <ref type="bibr">(5)</ref>. As a prior, we know that X and Y have no effect on the dynamics of the system. Therefore we only need to learn a mapping from the state excluding X and Y at the current time step to the change in state at the next time step. We encode the property of &#952; being an angle via the following kernel:</p><p>where s = [v, &#952;,</p><p>. Although &#968; is also an angle, since the system cannot rotate about that axis without catastrophic failure, we use a regular RBF kernel for it.</p><p>In simulation, we record the segway performing a task consisting of 1000 state-transitions at a frequency T = 0.05 which is approximately one minute of data. We then find 180 clusters using a hierarchical clustering algorithm and select the nearest neighbor for each cluster as the data-point for the training set. We test the ability of the our strategy to perform the same task that was used to collect the data but with the GP dynamics model.</p><p>2) Evaluating Motion Planning: Figure <ref type="figure">4a</ref> shows the path that the segway takes while reaching the targets. Once the segway is within 1m of a goal the next one is provided. Notice that peaks and troughs in Figure <ref type="figure">4b</ref> correspond to moments of forwards and backwards acceleration (since the segway must tilt to move forward). Those same moments of high acceleration also match with the peaks of high one-step prediction error observed in Figure <ref type="figure">4c</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. EXPERIMENTAL RESULTS ON SEGWAY</head><p>We finally evaluate our approach experimentally on the segway system (see Figure <ref type="figure">5</ref>). The state representation is the same as in Section IV-B.</p><p>We aim to demonstrate that: &#8226; Our method can control a physical open-loop unstable system to perform a simple move-forward task. &#8226; Our method is able to overcome perturbations with unmodelled dynamics in a physical open-loop unstable system where a state-of-the art MPC controller fails.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Data Collection &amp; System Identification:</head><p>We record the trajectory of the segway performing a task that takes 1000 measurements recorded every T = 0.05s to complete. The data is then preprocessed using the same procedure as for the simulated system. We evaluate on two tasks: the first is a standard moving forward task while staying upright, and the second is stabilizing task under a external force disturbance. We note that since this a real system an estimation is performed online with on-board sensors there is significant estimation error as well. Although our method is capable of running at 20Hz with up to 300 data-points, we required less data than this for the experiments.</p><p>2) Simple Move-Forward Task: We start by considering the move-forward task. For this we only require 130 data points. As can be observed in Figure <ref type="figure">7</ref>, the system is able to stabilize at a point, move forward and stabilize close to the new location with some minor oscillations around the target point, as highlighted in Figure <ref type="figure">6</ref>. The first and second peaks in Figure <ref type="figure">6b</ref> correspond to the acceleration and deceleration respectively. Notice that due to a combination of modeling and estimation estimation error, the segway balances slightly off the equilibrium point. Finally, from Figure <ref type="figure">6c</ref> we see that the model error spikes at the moments of high acceleration.</p><p>3) Robustness To Perturbations: We now evaluate the performance of the learned model under perturbations. To test the robustness of the learned model, we start by collecting 100 data points from a dataset of the MPC policy with nominal dynamics completing the task with an unmodelled weight of 2kg. Although this results in slightly different behavior, the amount of data collected is the same as in previous experiments. Next, we attach 4kg of unmodelled weight as shown in Figure <ref type="figure">1</ref>. Notice that the weight is not perfectly centered and that it is allowed to sway back and forth from its point of contact.</p><p>In Figure <ref type="figure">8</ref> and Figure <ref type="figure">9</ref>, we can see the result of applying force perpendicular to the axis of the wheels to the MPC policy with the nominal and GP dynamics, respectively. Both controllers have a spike in input following each disturbance, and in both cases the control action is saturated. Notice that because of the symmetry of the nominal model, the MPC policy applies the same force on each input as shown in Figure <ref type="figure">9a</ref>. Meanwhile the learned dynamics captures some of the asymmetry resulting from the weights which causes uneven outputs and a more robust system. In Figure <ref type="figure">8b</ref> and Figure <ref type="figure">9b</ref>, we can see that even though the initial disturbances are of similar magnitude, the controller with nominal dynamics exhibits much larger oscillations and falls after the third perturbation. Although both models have sharp increases in one-step prediction error after a disturbance, the MPC model reaches much higher one-step prediction errors than with the GP, as shown in Figure <ref type="figure">8c</ref> and Figure <ref type="figure">9c</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. DISCUSSION AND FUTURE WORK</head><p>We presented a methodology for full dynamics learning that has been validated on an open-loop unstable robotic system. Using one minute of highly correlated data we are able estimate an accurate enough model for motion planning that is resilient to perturbations. Furthermore, the results presented in this paper are for a worst-case scenario where no useful prior is provided. Finally, our approach is generic and can be applied to many Gaussian process modeling approaches as a drop-in subroutine.</p><p>There are many directions for future work. A natural one is to study even higher-dimensional systems where one would likely need to combine learning with a prior nominal model. Another direction is dealing with noise in the state estimation as well as delays, which are significant issues for unstable dynamical systems. Using techniques to correct noisy estimation data would significantly improve the performance of our method in real systems. This would be true when dealing with outlying measurements that strongly violate the Gaussian assumption implicit in the GP. One could also consider integration with perceptual systems <ref type="bibr">[33]</ref>.</p><p>Another direction for future work is how to intelligently and autonomously collect training data. A relevant line of work here is the area of safe exploration <ref type="bibr">[9]</ref>, <ref type="bibr">[34]</ref>- <ref type="bibr">[38]</ref>. It is also important to understand the fundamental limits of how much data we need to learn a reliable model, a concept known as sample complexity in the machine learning literature <ref type="bibr">[39]</ref>.</p><p>This work exploited the differentiability of Gaussian processes, but largely ignored the uncertainty quantification aspect. In cases where there are more complex constraints to be satisfied, such as reachability <ref type="bibr">[40]</ref> or chance constraints <ref type="bibr">[35]</ref>, it would be interesting to develop a more holistic framework that reasons about uncertainty quantication in differentiated GP models.</p><p>A final direction for future work is scalability. For more complex systems, it would be beneficial to collected and store more data points for estimating the GP model. However, it is known that the computational complexity of GP inference can scale poorly with the amount of training data. Leveraging various methods for scaling up GP training and inference could be beneficial <ref type="bibr">[41]</ref>, <ref type="bibr">[42]</ref>. In Figure <ref type="figure">6a</ref> we plot the physical position of the segway. In Figure <ref type="figure">6b</ref> we see : the angle between segway's pole and the upright position. The two peaks correspond to the segway accelerating and decelerating. Figure <ref type="figure">6c</ref> shows the MPC policy's one-step prediction error using the GP dynamics. Fig. <ref type="figure">8</ref>: The MPC policy with GP dynamics responding to 5 perturbations. The segway remains stable after each disturbance. Figure <ref type="figure">8a</ref> shows the inputs spiking after each disturbance. The difference between inputs suggests the learned dynamics model captures asymmetries induced by the placement and sway of the unmodelled weights. Figure <ref type="figure">9b</ref> plots the angle of the segway with respect to the ground. 
Figure <ref type="figure">9c</ref> shows the MPC's one step prediction error remains low through all disturbances.</p><p>(a) (b) (c) Fig. <ref type="figure">9</ref>: The MPC policy with nominal dynamics responding three perturbations. The segway remains stable after the first disturbance, oscillates before stabilizing for the second disturbance and falls down after the third disturbance. Figure <ref type="figure">8a</ref> shows the input spike on each input after the disturbances. Notice that for all three perturbations, the both motors act in unison to stabilize the system. Figure <ref type="figure">9b</ref> shows the angle of the segway with respect to the ground. Figure <ref type="figure">9c</ref> shows the MPC's one-step prediction error using the nominal dynamics. Notice that the oscillations cause the weight to swing which magnify the prediction error.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>This work was supported by NSF awards1637598, 1645832, 1932091,  1924526, and 1923239, and funding from AeroVironment, JPL and BMW.1 I. D. Jimenez Rodriguez, U. Rosolia, A. D. Ames and Yisong Yue are at the California Institute of Technology, Pasadena, USA. E-mails: {ivan.jimenez, urosolia, ames, yyue}@caltech.edu .</p></note>
		</body>
		</text>
</TEI>
