<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Lyapunov-Based Real-Time and Iterative Adjustment of Deep Neural Networks</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10355278</idno>
					<idno type="doi">10.1109/LCSYS.2021.3055454</idno>
					<title level='j'>IEEE Control Systems Letters</title>
<idno type="issn">2475-1456</idno>
<biblScope unit="volume">6</biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Runhan Sun</author><author>Max L. Greene</author><author>Duc M. Le</author><author>Zachary I. Bell</author><author>Girish Chowdhary</author><author>Warren E. Dixon</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[A real-time Deep Neural Network (DNN) adaptive control architecture is developed for general uncertain nonlinear dynamical systems to track a desired time-varying trajectory. A Lyapunov-based method is leveraged to develop adaptation laws for the output-layer weights of a DNN model in real-time while a data-driven supervised learning algorithm is used to update the inner-layer weights of the DNN. Specifically, the output-layer weights of the DNN are estimated using an unsupervised learning algorithm to provide responsiveness and guaranteed tracking performance with real-time feedback. The inner-layer weights of the DNN are trained with collected data sets to increase performance, and the adaptation laws are updated once a sufficient amount of data is collected. Building on the results in (Joshi and Chowdhary, 2019) and (Joshi et al., 2020), which focus on deep model reference adaptive control for linear systems with known drift dynamics and control effectiveness matrices, this letter considers general control-affine uncertain nonlinear systems. The real-time controller and adaptation laws enable the system to track a desired time-varying trajectory while compensating for the unknown drift dynamics and parameter uncertainties in the control effectiveness. A nonsmooth Lyapunov-based analysis is used to prove semi-global asymptotic tracking of the desired trajectory. Numerical simulation examples are included to validate the results, and the Levenberg-Marquardt algorithm is used to train the weights of the DNN.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Neural Networks (NNs) have been used in results such as <ref type="bibr">[1]</ref> to approximate the receding-horizon (RH) regulator in RH optimal control. More recently, results in <ref type="bibr">[2]</ref> and <ref type="bibr">[3]</ref> leverage Deep Neural Networks (DNNs) to approximate Model Predictive Control (MPC) laws. However, the NN approximations in <ref type="bibr">[1]</ref>-<ref type="bibr">[4]</ref> can only provide statistical guarantees of the approximation error. DNN methods can also be used to capture complex features of the dynamics by using back-propagation algorithms that indicate how to update the inner-layer weights <ref type="bibr">[5]</ref>. In results such as <ref type="bibr">[5]</ref> and <ref type="bibr">[6]</ref>, the emergence of DNN models with more complex structures improves function approximation performance. Although DNN function approximation methods show improved performance empirically, they typically lack performance guarantees because the accuracy of the outputs is probabilistic. As a result, DNN-based methods may have limited adoption in safety-critical applications.</p><p>Motivated to ensure performance guarantees, early works in <ref type="bibr">[7]</ref>-<ref type="bibr">[10]</ref> use Lyapunov-based methods for NN-based adaptive control of unknown nonlinear systems. In <ref type="bibr">[7]</ref>-<ref type="bibr">[9]</ref>, NNs are trained with a gradient descent-based adaptive update law and used as a feedforward control term. Since the update laws are derived from a stability analysis and the NN weights are embedded inside the activation functions, it is challenging to derive adaptation laws from a stability analysis beyond a single hidden layer.</p><p>In <ref type="bibr">[11]</ref> and <ref type="bibr">[12]</ref>, the authors developed a data-driven adaptive learning method called concurrent learning to increase the performance of parameter estimation. 
Concurrent learning leverages input and output data recorded concurrent to real-time execution to apply batch-like updates to adaptive update laws, and has been extended in works such as <ref type="bibr">[13]</ref> and <ref type="bibr">[14]</ref>. Results in <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref>, and <ref type="bibr">[17]</ref> leverage concurrent learning to develop a deep Model Reference Adaptive Control (D-MRAC) method. Specifically, a gradient descent-based adaptive update law is used to estimate the ideal output-layer weights of a DNN online in real time, and an offline data-driven method is used to apply batch updates to the inner-layer weights of the DNN for linear systems with known system matrices. The methods were tested on quadrotors and demonstrated that DNN-based adaptive control can significantly improve learning performance <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>. The authors demonstrated that DNN-enabled MRAC outperforms shallow NN MRAC, and also showed that the DNN weights cluster in different regions in different operating envelopes of the quadrotor, clearly establishing the learning performance of DNNs <ref type="bibr">[17]</ref>. However, the D-MRAC development is specific to linear systems with known system matrices A, B and matched system uncertainty Δ(x(t)), i.e., ẋ(t) = Ax(t) + B(u(t) + Δ(x(t))).</p><p>Building on the output-layer weight adjustment strategy in <ref type="bibr">[15]</ref> and <ref type="bibr">[16]</ref>, this letter develops new control design and stability analysis methods for general uncertain nonlinear systems. A Lyapunov-based adaptive control law is developed to estimate the unknown output-layer weights of the DNN using real-time state feedback. Concurrent to real-time execution, data is collected and an offline function approximation method is used to update the estimates of the inner-layer DNN weights. 
Moreover, this letter considers control-affine dynamics with an uncertain state-dependent control effectiveness matrix. To compensate for the uncertain control effectiveness, a novel adaptive update law with internal feedback is developed. Specifically, the adaptive update law depends on the control input, and hence is a function of both the input uncertainty estimates and the DNN weight estimates. To account for the switching introduced by iterative updates of the DNN weights, a nonsmooth Lyapunov-based analysis is performed to ensure asymptotic tracking of the desired trajectory. The proposed DNN architecture is shown in Figure <ref type="figure">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. SYSTEM DYNAMICS</head><p>Consider a control-affine nonlinear dynamic system modeled as</p><p>ẋ(t) = f(x(t)) + g(x(t))u(t), (1)</p><p>where x : [t 0 , ∞) → R n denotes the generalized state, t 0 ∈ R ≥0 denotes the initial time, f : R n → R n denotes the unknown drift dynamics, g : R n → R n×m denotes the uncertain control effectiveness matrix, and u : [t 0 , ∞) → R m denotes the control input. To facilitate the control development, the following assumption is made. Assumption 1: The product of the uncertain control effectiveness matrix and the control input can be linearly parameterized as</p><p>g(x(t))u(t) = Y(x(t), u(t), t)θ, (2)</p><p>where Y : R n × R m × [t 0 , ∞) → R n×q denotes a measurable regression matrix, and θ ∈ R q denotes a vector of constant unknown parameters.</p></div>
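Assumption 1's linear parameterization can be made concrete with a small, hypothetical example (not from the letter): for a two-state, single-input system with control effectiveness g(x) = [0, θ₁ + θ₂ sin(x₁)]ᵀ, the product g(x)u factors into a measurable regressor Y(x, u) and the constant unknown vector θ = [θ₁, θ₂]ᵀ. A minimal numpy sketch:

```python
import numpy as np

def g(x, theta):
    """Hypothetical uncertain control effectiveness (n = 2, m = 1)."""
    return np.array([[0.0], [theta[0] + theta[1] * np.sin(x[0])]])

def Y(x, u):
    """Measurable regressor such that g(x) @ u == Y(x, u) @ theta."""
    return np.array([[0.0, 0.0],
                     [u[0], u[0] * np.sin(x[0])]])

theta = np.array([2.0, 0.5])   # unknown constant parameters (q = 2)
x = np.array([0.3, -1.0])
u = np.array([1.7])

# The factorization isolates the unknown theta from measurable quantities.
assert np.allclose(g(x, theta) @ u, Y(x, u) @ theta)
```

The key point is that Y depends only on measurable signals (x, u, t), so an adaptation law can estimate θ without measuring g directly.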
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. DNN APPROXIMATION AND UPDATE POLICY</head><p>Let Ω ⊂ R n be a compact simply connected set with x(t) ∈ Ω, and define S n (Ω) as the space where f (x(t)) is continuous. There exist ideal weights, ideal basis functions, and an ideal pre-trained DNN such that the drift dynamics f (x(t)) ∈ S n (Ω) can be represented as <ref type="bibr">[18]</ref> f (x(t)) = W* T σ*(Φ*(x(t))) + ε(x(t)), <ref type="formula">(3)</ref> where W* ∈ R L×n is an unknown bounded ideal output-layer weight matrix, σ* : R p → R L is an unknown bounded vector of the ideal activation functions, Φ* : R n → R p is the ideal unknown DNN, and ε : R n → R n is the bounded unknown function reconstruction error associated with the ideal weights, activation functions, and DNN. The ideal unknown DNN Φ* can be expressed as</p><p>Φ*(x(t)) = (W* T k φ k ∘ W* T k-1 φ k-1 ∘ ⋯ ∘ W* T 1 φ 1 )(x(t)),</p><p>where k ∈ Z denotes the number of inner layers of the DNN, the symbol ∘ denotes function composition, and W* j and φ j (·) denote the corresponding inner-layer weights and activation functions of the DNN, respectively.</p><p>The DNN is updated using a multiple timescale approach. The DNN is trained a priori using data sets collected from previous experiments, simulation data, etc. Ideally, large data sets from the same dynamic system operating under the same environmental conditions will be available for training the DNN. However, the developed strategy of real-time (Lyapunov-based) adjustment of the output-layer weights provides the advantage of significant flexibility in the training data. 
For example, as observed in the subsequent simulation, the training data could be from a dynamic system with different parameters (i.e., transfer learning), or the weights could simply be initialized randomly.</p><p>Based on (3), the DNN approximation of the drift dynamics f̂ i : R n → R n can be represented as f̂ i (x(t)) = Ŵ T (t) σ̂ i (Φ̂ i (x(t))), where Ŵ : [t 0 , ∞) → R L×n is the estimate of the ideal output-layer weight matrix, σ̂ i : R p → R L and Φ̂ i : R n → R p are the i th activation functions and estimate of σ* and Φ*, respectively, and i ∈ N is the DNN estimate update index. <ref type="foot">1</ref> The mismatch between the ideal output-layer weights and the weight estimates, W̃ : [t 0 , ∞) → R L×n , is defined as</p><p>W̃(t) ≜ W* − Ŵ(t). (4)</p><p>Assumption 2: Using the universal function approximation property, there exist known constants W̄*, σ̄*, σ̄, ε̄ ∈ R &gt;0 such that the unknown ideal weights W*, the unknown ideal activation functions σ*(·), the user-selected activation functions σ̂ i (·), the unknown ideal DNN Φ*(·), and the function reconstruction error ε(·) can be upper bounded as ‖W*‖ F ≤ W̄*, sup ‖σ*(·)‖ ≤ σ̄*, <ref type="foot">2</ref> sup ‖σ̂ i (·)‖ ≤ σ̄, and sup ‖ε(·)‖ ≤ ε̄.</p><p>A priori training provides Φ̂ 1 (·) and Ŵ(t 0 ). The offline DNN training can be achieved using different techniques. For example, <ref type="bibr">[15]</ref> and <ref type="bibr">[16]</ref> use a Stochastic Gradient Descent (SGD) based generative network architecture to generate estimates of the matched system uncertainty, where the SGD update policy depends on a stochastic estimate of the expected value of the gradient of the loss function over a training set. When the offline DNN training phase is completed, an adaptive control law is implemented for the system described in <ref type="formula">(1)</ref> to generate the output-layer DNN weight estimates, i.e., Ŵ(t) for all t ≥ t 0 . 
Simultaneously with the online execution, data is collected and offline function approximation methods are used to update estimates of the inner-layer DNN weights.</p></div>
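The multiple-timescale split — inner layers Φ̂ᵢ frozen between retrainings, output layer Ŵ adapted in real time — can be sketched as follows. The layer sizes, tanh inner layers, and output feature map here are hypothetical stand-ins, not the letter's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, L = 2, 8, 6   # state dim, inner-DNN output dim, feature dim (hypothetical)

# Inner-layer weights: held fixed between retrainings (the i-th iterate).
V1 = rng.standard_normal((n, p))
V2 = rng.standard_normal((p, p))
S = rng.standard_normal((L, p))

def Phi_hat(x):
    """Inner DNN estimate Φ̂ᵢ : R^n → R^p (tanh layers as a stand-in)."""
    return np.tanh(V2.T @ np.tanh(V1.T @ x))

def sigma_hat(y):
    """User-selected output activations σ̂ᵢ : R^p → R^L."""
    return np.tanh(S @ y)

W_hat = np.zeros((L, n))  # output-layer estimate Ŵ, adapted in real time

def f_hat(x):
    """Drift estimate f̂ᵢ(x) = Ŵᵀ σ̂ᵢ(Φ̂ᵢ(x))."""
    return W_hat.T @ sigma_hat(Phi_hat(x))
```

Only `W_hat` changes on the fast (real-time) timescale; `V1`, `V2` are replaced wholesale when a retraining iteration completes, incrementing the index i.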
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. CONTROL DESIGN</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Control Objective</head><p>The control objective is to ensure the trajectory of the system in (1) tracks a desired sufficiently smooth time-varying trajectory x d : [t 0 , ∞) → R n . To quantify the tracking objective, a tracking error e : [t 0 , ∞) → R n is defined as</p><p>e(t) ≜ x(t) − x d (t). (5)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Control Development</head><p>To facilitate the subsequent control development, the product of the estimated control effectiveness matrix and the control input can be written as</p><p>ĝ(x(t))u(t) = Y(x(t), u(t), t)θ̂(t), (6)</p><p>where ĝ : R n → R n×m denotes the estimate of the control effectiveness matrix. The parameter estimation error θ̃ : [t 0 , ∞) → R q is defined as</p><p>θ̃(t) ≜ θ − θ̂(t), (7)</p><p>where θ̂ : [t 0 , ∞) → R q denotes the parameter estimate. Assumption 3: The estimate of the control effectiveness matrix ĝ is a full-row rank matrix for all t ≥ t 0 , and the right pseudo-inverse of ĝ(·) is denoted by ĝ + : R n → R m×n , where ĝ(x)ĝ + (x) = I n .</p><p>Based on the subsequent stability analysis, the control input is designed as</p><p>u(t) ≜ ĝ + (x(t))(−f̂ i (x(t)) + ẋ d (t) − ke(t) − k s sgn(e(t))), (8)</p><p>where k, k s ∈ R &gt;0 are constant control gains, and sgn(·) denotes the signum function. The weight estimate adaptation law is designed as</p><p>Ŵ̇(t) ≜ Γ W σ̂ i (Φ̂ i (x(t)))e T (t), (9)</p><p>where Γ W ∈ R L×L denotes a user-defined positive definite, diagonal control gain matrix. The adaptation law for the parameter estimate in ( <ref type="formula">6</ref> ) is designed as</p><p>θ̂̇(t) ≜ Γ θ Y T (x(t), u(t), t)e(t), (10)</p><p>where Γ θ ∈ R q×q denotes a user-defined positive definite, diagonal control gain matrix. Taking the time-derivative of ( <ref type="formula">5</ref> ) and substituting in ( <ref type="formula">1</ref> )-( <ref type="formula">3</ref> ) and ( <ref type="formula">6</ref> )-( <ref type="formula">8</ref> ) yields the closed-loop error system</p><p>ė(t) = W̃ T σ̂ i (Φ̂ i (x)) + W* T (σ*(Φ*(x)) − σ̂ i (Φ̂ i (x))) + ε(x) + Y θ̃ − ke − k s sgn(e). (11)</p><p>Recall that the initially trained DNN provides the initial estimates Φ̂ 1 (·) and Ŵ(t 0 ). During implementation of the real-time controller, the output-layer weights of the DNN are estimated online. Concurrently, the data generated in real-time is stored for additional DNN training. 
Once a sufficient (user-defined) amount of data is collected to improve the function approximation performance, the inner-layer weights of the DNN are updated to generate Φ̂ i+1 (·), while the real-time controller and update laws in ( <ref type="formula">8</ref> )-( <ref type="formula">10</ref> ) continue to execute.</p></div>
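One step of the real-time loop can be sketched as below. This is a hedged sketch, not the letter's implementation: the gains (k, ks, Gamma_W, Gamma_th) and the helper callables `feats` (the features σ̂ᵢ(Φ̂ᵢ(·))), `Y_fn` (the regressor), and `g_hat_pinv` (the right pseudo-inverse ĝ⁺) are all hypothetical placeholders, and the update forms follow the control and adaptation laws referenced as (8)-(10) in the text as reconstructed from the surrounding definitions:

```python
import numpy as np

def control_step(x, xd_dot, e, W_hat, theta_hat, feats, Y_fn, g_hat_pinv,
                 k=5.0, ks=0.1, Gamma_W=1.0, Gamma_th=1.0, dt=1e-3):
    """One Euler step of the control law and the two adaptation laws."""
    sigma = feats(x)                      # σ̂ᵢ(Φ̂ᵢ(x)) ∈ R^L
    f_hat = W_hat.T @ sigma               # drift estimate f̂ᵢ(x)
    # Control law: u = ĝ⁺(x)(−f̂ᵢ(x) + ẋ_d − k e − k_s sgn(e))
    u = g_hat_pinv(x) @ (-f_hat + xd_dot - k * e - ks * np.sign(e))
    # Output-layer weights driven by the tracking error.
    W_hat = W_hat + dt * Gamma_W * np.outer(sigma, e)
    # Parameter estimate for the control effectiveness; note the internal
    # feedback: the regressor Y depends on the control input u itself.
    theta_hat = theta_hat + dt * Gamma_th * (Y_fn(x, u).T @ e)
    return u, W_hat, theta_hat
```

The internal feedback mentioned in the text is visible here: `u` is computed from the current estimates, and then `u` re-enters the parameter update through `Y_fn(x, u)`.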
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. STABILITY ANALYSIS</head><p>The stability of the DNN-based adaptive tracking controller is established in the following theorem.</p><p>Theorem 1: Consider a general nonlinear system modeled by the dynamics in (1) with x(t 0 ) ∈ Ω and satisfying Assumptions 1-3. The control input in <ref type="formula">(8)</ref>, the output-layer weight adaptation law in <ref type="formula">(9)</ref>, and the parameter estimate adaptation law in <ref type="formula">(10)</ref> ensure semi-global asymptotic tracking in the sense that the tracking error defined in (5) satisfies ‖e(t)‖ → 0 as t → ∞, provided the following gain condition is satisfied</p><p>k s ≥ W̄*(σ̄* + σ̄) + ε̄. (12)</p><p>Proof: Consider the candidate Lyapunov-like function</p><p>V L (z) ≜ (1/2)e T e + (1/2)θ̃ T Γ θ −1 θ̃ + (1/2)tr(W̃ T Γ W −1 W̃), (13)</p><p>where z : [t 0 , ∞) → R n(L+1)+q is defined as z ≜ [e T , θ̃ T , vec(W̃) T ] T and vec(·) denotes the vectorization operator. Let ζ : [t 0 , ∞) → R n(L+1)+q be a Filippov solution to the differential inclusion ζ̇ ∈ K[h](ζ), where ζ(t) = z(t), the calculus of K[·] is used to compute Filippov's differential inclusion as defined in <ref type="bibr">[19]</ref>, and h : R n(L+1)+q → R n(L+1)+q is defined as h(ζ) ≜ [ė T , θ̃̇ T , vec(W̃̇) T ] T .</p><p>The time-derivative of V L exists almost everywhere (a.e.), i.e., for almost all t ∈ [0, ∞), V̇ L (ζ) ∈ a.e. ⋂ ξ∈∂V L (ζ) ξ T K[h](ζ), where ∂V L (ζ(t)) denotes the Clarke generalized gradient of V L (ζ(t)). Because V L (ζ) is continuously differentiable in ζ, ∂V L (ζ) = {∇V L (ζ)}, where ∇ denotes the gradient operator.</p><p>Taking the generalized time-derivative of ( <ref type="formula">13</ref> ), then substituting in the mismatch between the ideal output-layer weight and the weight estimate in (4), the output-layer adaptation law in <ref type="formula">(9)</ref>, the parameter estimate adaptation law in <ref type="formula">(10)</ref>, and the closed-loop error system in <ref type="formula">(11)</ref> yields</p><p>Using the trace operator property, <ref type="foot">3</ref> the estimated mismatch for 
the ideal output-layer weight in (4), and adding and subtracting e T W* T K[ σ̂ i (Φ̂ i (x))] in ( <ref type="formula">14</ref> ) yields the upper bound</p><p>V̇ L ≤ −k‖e‖ 2 + ‖e‖(W̄*(σ̄* + σ̄) + ε̄) − k s ‖e‖ 1 . (16)</p><p>By satisfying the gain condition described in ( <ref type="formula">12</ref> ), ( <ref type="formula">16</ref> ) can be further upper bounded as</p><p>V̇ L ≤ −k‖e‖ 2 . (17)</p><p>Using ( <ref type="formula">13</ref> ) and ( <ref type="formula">17</ref> ) implies z ∈ L ∞ , and hence e, θ̃, vec(W̃) ∈ L ∞ ; then ( <ref type="formula">4</ref> ), ( <ref type="formula">5</ref> ), and ( <ref type="formula">7</ref> ) imply x, Ŵ, θ̂ ∈ L ∞ .</p><p>Using Assumptions 2 and 3 implies σ̂ i (·), ĝ + (·) ∈ L ∞ , and hence u ∈ L ∞ .</p><p>Furthermore, by the extension of the LaSalle-Yoshizawa theorem for non-smooth systems in <ref type="bibr">[21]</ref> and <ref type="bibr">[22]</ref>, k‖e‖ 2 → 0, which implies e(t) → 0 as t → ∞ and x(t) ∈ Ω for all t ≥ t 0 .</p></div>
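In summary, the chain of bounds used to close the proof (a sketch under the stated assumptions, using ‖e‖ ≤ ‖e‖₁ and the bounds in Assumption 2) is:

```latex
\begin{aligned}
\dot{V}_L &\overset{a.e.}{\leq} -k\|e\|^{2}
  + e^{\top}\bigl(W^{*\top}\!\left(\sigma^{*}(\Phi^{*}(x))-\hat{\sigma}_i(\hat{\Phi}_i(x))\right)
  + \varepsilon(x)\bigr) - k_s\|e\|_{1} \\
&\leq -k\|e\|^{2}
  + \|e\|\bigl(\bar{W}^{*}(\bar{\sigma}^{*}+\bar{\sigma})+\bar{\varepsilon}\bigr)
  - k_s\|e\|_{1} \\
&\leq -k\|e\|^{2},
  \qquad \text{provided } k_s \geq \bar{W}^{*}(\bar{\sigma}^{*}+\bar{\sigma})+\bar{\varepsilon}.
\end{aligned}
```

The adaptation laws are constructed precisely so that the sign-indefinite terms involving W̃ and θ̃ cancel, leaving only the bounded residual that the robust k_s sgn(e) term dominates.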
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. SIMULATION</head><p>To demonstrate the effectiveness of the developed method, a simulation is performed on a control-affine realization of a two-state Van der Pol oscillator. The dynamics used in the simulation are given in (18), with drift f(x) = [x 2 , −x 1 + μ(1 − x 1 2 )x 2 ] T , where μ ∈ R &gt;0 denotes the damping parameter. The DNN used in this simulation was composed of 4 layers with 10, 5, 8, and 2 neurons, respectively. The DNN architecture is illustrated in Figure <ref type="figure">2</ref>. Each layer is linear, and the first, second, and third layers have tangent-sigmoid, logarithmic-sigmoid, and tangent-sigmoid activation functions, respectively. The learning rate (i.e., the learning gain parameter used to determine the step size in retraining the DNN weights at each iteration) was fixed as η = 0.001. The mean squared error (MSE) was used as the loss function for training. Each training iteration lasted until the MSE (i.e., the loss) was less than 10⁻³. The Levenberg-Marquardt algorithm was used to train the weights of the DNN. For each DNN training iteration, 70% of the data was used for training, 15% for validation, and 15% for testing. To pre-train the DNN, a 600-second simulation of a system with the dynamics in <ref type="formula">(18)</ref> and μ = 10 was performed. Training statistics for the offline training are shown in Figure <ref type="figure">3</ref>. The real-time controller and the update laws in ( <ref type="formula">8</ref> )-( <ref type="formula">10</ref> ) are used to update their respective parametric estimates. Concurrent to the real-time controller execution, input-output data is collected to retrain the DNN. As shown in Figures <ref type="figure">4</ref>-<ref type="figure">6</ref>, the training start time is denoted by the red dashed vertical line and the training completion time is denoted by the black dashed vertical line. 
The time (and corresponding amount of data) between retrainings of the inner-layer weights is a user-defined parameter of the simulation. After the prescribed time between retrainings elapses, the inner-layer DNN weights begin updating via retraining. In this simulation, the time between retrainings is 25 seconds. The first retraining starts at t = 25 seconds and ends at t = 37.4 seconds. The second retraining starts at t = 62.4 seconds and ends at t = 68.3 seconds. While the retraining is in process, the real-time controller and update laws continue uninterrupted as described in ( <ref type="formula">8</ref> )-( <ref type="formula">10</ref> ). Once the retraining is completed, the new inner-layer DNN weights are implemented, overwriting the previous values. As shown in Figures <ref type="figure">4</ref>-<ref type="figure">6</ref>, the first retraining took 12.4 seconds for the MSE to fall below 10⁻³. After the DNN completed retraining, the controller implements the updated inner-layer weights at t = 37.4 seconds. After implementing the updated DNN weights, new data is collected for another 25 seconds. To further improve the DNN estimate, a second retraining is performed. During the second training iteration, data from the first 25 seconds and the second 25 seconds are both used. The second retraining took 5.9 seconds. The inner-layer weights from the second retraining are implemented at t = 68.3 seconds.</p><p>The tracking error performance in Figure <ref type="figure">4</ref> indicates that discretely retrained DNNs with online adaptive output-layer weights are a viable method to perform trajectory tracking. The first iteration of the DNN (DNN1) is the offline generated DNN, DNN2 is the model after the first retraining, and DNN3 is the model after the second retraining. As shown by the root mean squared error (RMSE) in Table <ref type="table">I</ref>, the tracking error decreases after each retraining. 
The decrease in error after each retraining is expected, since a larger set of system data was used to train the DNN during each retraining. Figure <ref type="figure">7</ref> shows the phase plot of the system and compares the tracking performance during the application of each DNN. DNN1 has the worst estimate of the system dynamics. DNN2 and DNN3 show significantly better tracking behavior, which is also reflected in Figure <ref type="figure">4</ref>. Figure <ref type="figure">6</ref> presents the control input to the system for the duration of the simulation. DNN1 poorly approximates the dynamics near x = [−3.5, 3.5] T , and this error is further reflected in Figure <ref type="figure">6</ref> by the spikes in control input at approximately t = 2 seconds and t = 8 seconds.</p></div>
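A toy closed-loop run can illustrate the mechanism behind these results. This is a sketch, not the letter's simulation: it assumes an identity control effectiveness (so ĝ⁺ = I and no θ̂ update is needed), uses Gaussian RBF features as a stand-in for σ̂ᵢ(Φ̂ᵢ(·)), and all gains are hypothetical:

```python
import numpy as np

mu, k, ks, dt = 10.0, 20.0, 0.2, 1e-3
# RBF centers on a grid covering the operating region (hypothetical features).
centers = np.array(np.meshgrid(np.linspace(-3, 3, 4),
                               np.linspace(-3, 3, 4))).reshape(2, -1).T

def f(x):
    """True Van der Pol drift (unknown to the controller)."""
    return np.array([x[1], -x[0] + mu * (1.0 - x[0]**2) * x[1]])

def feats(x):
    """Stand-in for σ̂ᵢ(Φ̂ᵢ(x)): Gaussian RBFs."""
    return np.exp(-np.sum((centers - x)**2, axis=1))

W_hat = np.zeros((centers.shape[0], 2))   # output-layer estimate Ŵ
x = np.array([2.0, 0.0])
errs = []
for step in range(int(20.0 / dt)):        # 20-second run
    t = step * dt
    xd = np.array([np.sin(t), np.cos(t)])             # desired trajectory
    xd_dot = np.array([np.cos(t), -np.sin(t)])
    e = x - xd
    # Control law of the form (8), with ĝ⁺ = I for this toy g = I:
    u = -W_hat.T @ feats(x) + xd_dot - k * e - ks * np.sign(e)
    # Output-layer adaptation of the form (9):
    W_hat = W_hat + dt * 10.0 * np.outer(feats(x), e)
    x = x + dt * (f(x) + u)               # Euler step of ẋ = f(x) + u
    errs.append(float(np.linalg.norm(e)))
```

Adding a periodic "retraining" that refits the feature map to the buffered (x, u) data would reproduce the full multiple-timescale scheme; here only the fast output-layer timescale is shown.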
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Transfer Learning &amp; Random Weights</head><p>To further demonstrate the flexibility of the developed real-time Lyapunov-based adjustment of the output-layer weights, two additional simulations were performed. In this section, simulations with transfer-learning-based and randomly initialized DNN weights are investigated. Transfer learning in this context means applying the learned DNN model of one system to another system. In the simulation, transfer learning is demonstrated by training a DNN model on a dataset of a system described by the dynamics in <ref type="formula">(18)</ref> with parameter μ = 1, whereas the simulated system has parameter μ = 10.</p><p>In the transfer learning-based approach, the DNN is pre-trained with 600 seconds of simulated data from a system with the dynamics in <ref type="formula">(18)</ref>, but parameterized with μ = 1. Figures <ref type="figure">8(a</ref>), 9(a), and Table I (B) show the tracking error, phase plot, and RMSE of the transfer learning approach over three iterations of DNN training, respectively. For situations where data cannot be collected a priori, the initial inner-layer DNN weights can be selected by the user. In the third simulation, instead of pre-training the DNN, a simulation was performed with randomly selected initial DNN weights. The performance of the transfer learning-based and randomly initialized DNN weights simulations is depicted in Figures <ref type="figure">8</ref> and <ref type="figure">9</ref>. Iterations in the inner-layer weights are shown to improve performance. The first simulation, which was trained with 600 seconds of offline data using the dynamics in <ref type="formula">(18)</ref> with μ = 10, has the best performance, yielding the smallest RMSE within each 25-second interval compared to the transfer learning and randomly initialized DNN weights cases. 
Nevertheless, the proposed real-time Lyapunov-based adjustment of the output-layer weights accommodates different methods of initializing the DNN inner-layer weights.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VII. CONCLUSION</head><p>A multiple timescale DNN-based adaptive control architecture is developed for general nonlinear dynamical systems with unknown drift dynamics and an uncertain control effectiveness matrix. Specifically, a Lyapunov-based adaptive update law is developed to estimate the unknown output-layer weights of the DNN and the uncertain control effectiveness matrix in real-time. Simultaneously with real-time execution, data is collected and offline function approximation methods are used to update estimates of the inner-layer weights. A nonsmooth Lyapunov-based analysis is performed to ensure semi-global asymptotic tracking of the desired trajectory. Numerical simulation examples are provided to demonstrate the performance of the proposed architecture.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The subscript i on σ̂ and Φ̂ represents the i th training iteration activation functions and estimated DNN, respectively. The explicit expression for Φ̂ i (x(t)) is Φ̂ i (x(t)) = (Ŵ T i,k φ̂ i,k ∘ Ŵ T i,k-1 φ̂ i,k-1 ∘ ⋯ ∘ Ŵ T i,1 φ̂ i,1 )(x(t)), where Ŵ i,j and φ̂ i,j (·) denote the corresponding estimated inner-layer weights and activation functions of the corresponding training iteration, respectively.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>For common activation functions, e.g., the hyperbolic tangent function, sigmoid function, and radial basis function, σ̄* = σ̄ = √L.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3"><p>For real column matrices a, b &#8712; R n , the trace of the outer product is equivalent to the inner product, i.e., tr(ba T ) = a T b.</p></note>
		</body>
		</text>
</TEI>
