<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Model-Based Switched Approximate Dynamic Programming for Functional Electrical Stimulation Cycling</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>06/08/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10355301</idno>
					<idno type="doi">10.23919/ACC53348.2022.9867819</idno>
					<title level='j'>American Control Conference</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Wanjiku A. Makumi</author><author>Max L. Greene</author><author>Kimberly J. Stubbs</author><author>Warren E. Dixon</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[This paper applies a reinforcement learning-based approximately optimal controller to a motorized functional electrical stimulation-induced cycling system to track a desired cadence. Sufficient torque to achieve the cycling objective is achieved by switching between the quadriceps muscle and electric motor. Uniformly ultimately bounded (UUB) convergence of the actual cadence to a neighborhood of the desired cadence, and of the approximate control policy to a neighborhood of the optimal control policy, is proven for both motor control and muscle control via a Lyapunov-based stability analysis, provided that the developed dwell-time conditions, which determine when to switch between the motor and the muscle, are satisfied. Lyapunov-based techniques are also used to derive a minimum dwell-time condition to prove UUB stability of the overall switched system.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Rehabilitation through functional electrical stimulation (FES) is a treatment for people with neurological conditions (NCs), such as stroke and spinal cord injury <ref type="bibr">[1]</ref> and <ref type="bibr">[2]</ref>. FES induces involuntary muscle contractions to perform a functional movement by applying an electric potential across the motor neurons of a muscle. To improve motor function and overall quality of life, multiple efforts in the rehabilitation field use FES with rehabilitation robots to facilitate human-robot therapy <ref type="bibr">[3]</ref>. Stationary FES cycling is a common human-robot rehabilitative therapy for people with movement impairments resulting from NCs <ref type="bibr">[4]</ref>. FES cycling has the benefits of both FES and rehabilitation robotics; it is a preferred therapy because there is minimal risk of a fall, and the repetition of coordinated limb movements improves motor skills and nervous system reorganization <ref type="bibr">[5]</ref>.</p><p>Optimal controllers can be established by assigning a user-defined cost to the states and control inputs, which penalizes the state and the magnitude of the control input. Through the cost function, a balance can be obtained between the accuracy of the limb motion and the level of control effort, allowing potential tradeoffs between comfort, performance, duration of exercise, and muscle fatigue. The only results that apply optimal control methods to FES applications are <ref type="bibr">[6]</ref> and <ref type="bibr">[7]</ref>. These results use extremum seeking, a model-free online optimization tool, to adjust a closed-loop PID controller to minimize the cost function for upper limb electrical stimulation.</p><p>Wanjiku A. Makumi, Max L. Greene, Kimberly J. Stubbs, and Warren E. Dixon are with the Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, FL, USA. Email: {makumiw, maxgreene12, kimberlyjstubbs, wdixon}@ufl.edu.</p><p>This research is supported in part by NSF Award number 1762829, Office of Naval Research Grant N00014-13-1-0151, AFOSR award number FA9550-18-1-0109, and AFOSR award number FA9550-19-1-0169. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsoring agency.</p><p>Optimal control problems can be solved via the Hamilton-Jacobi-Bellman (HJB) equation <ref type="bibr">[8]</ref>. By solving the HJB equation to determine the optimal value function, an optimal control policy can be developed <ref type="bibr">[8]</ref>. Generally, the HJB equation does not have a closed-form analytic solution for nonlinear systems. Motivated by the challenges of solving the HJB, especially in real-time, approximate dynamic programming (ADP) has emerged as a method to yield an approximate solution. Specifically, ADP uses a reinforcement learning (RL)-based actor-critic framework to approximate the value function in real-time <ref type="bibr">[9]</ref>. Neural networks (NNs) are generally used within ADP to approximate the unknown optimal value function, but other function approximation methods could also be used <ref type="bibr">[10]</ref>.</p><p>In traditional adaptive control, the uncertain parameter estimates are updated using an error feedback as a performance metric; in ADP, the Bellman error (BE) is used as feedback on the level of suboptimality. Specifically, the BE is used to update the NN parameters to improve the value function approximation online. BE extrapolation yields faster policy learning over a domain by evaluating the BE over user-defined, off-trajectory regions of the state space <ref type="bibr">[11]</ref>. Sufficient off-trajectory data must be selected to achieve adequate exploration. 
The value function approximation is updated according to the on- and off-trajectory BE.</p><p>Due to the potential benefits of using an optimal controller, it is advantageous to apply ADP to the cycling system. However, the system switches between two actuation methods: the rider's muscles and the cycle's electric motor. Therefore, FES-cycling is a switched (also called hybrid) system, which requires switched (hybrid) system analysis and design methods <ref type="bibr">[12]</ref>. Until recently, switching had not been investigated in the context of ADP. The result in <ref type="bibr">[13]</ref> develops a framework to estimate the optimal feedback control policy online while switching between multiple dynamic system models. When analyzing switched systems, a common problem is the growth and discontinuity of Lyapunov functions at switching instances <ref type="bibr">[14]</ref>. This growth and discontinuity problem is overcome in <ref type="bibr">[13]</ref>, in which a dwell-time analysis is developed to determine the minimum time necessary before the system can switch to a different subsystem (i.e., a minimum dwell-time). This provides a framework to switch between the two different modes of the FES-cycling controller and to show stability of the overall switched system.</p><p>Motivated by our previous results in <ref type="bibr">[13]</ref> and <ref type="bibr">[14]</ref>, this paper implements a continuous-time ADP-based tracking controller that allows for switching between multiple cycle actuation methods to track a desired cadence. Uniformly ultimately bounded (UUB) stability of the overall switched system is proven. Moreover, the developed controller is also proven to converge to a neighborhood of the optimal controller.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Notation</head><p>For notational brevity, time-dependence is omitted while denoting trajectories of the dynamic systems. For example, the trajectory x (t), where x : R ≥0 → R n , is denoted as x ∈ R n and referred to as x instead of x (t). Similarly, the equation f + h (y, t) = g (x) should be interpreted as f (t) + h ((y, t) , t) = g (x (t)). The gradient operator is denoted by ∇. Let r = m mod p denote the modulo operator, where, generally, m is the dividend, p is the divisor, and r is the remainder. In this paper, the quantity or function belonging to the k th mode of the switched system is denoted with the subscript k.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. PROBLEM FORMULATION</head><p>Following the development in <ref type="bibr">[15]</ref>, the dynamics of the combined one-legged cycle and rider system are τ = M (q) q̈ + V c (q, q̇) q̇ + G (q) + P (q, q̇) + b c q̇, (1) where q, q̇, and q̈ ∈ R denote the angle, angular velocity, and angular acceleration of the crank arm, respectively. M : R → R >0 denotes the inertia matrix, V c : R × R → R denotes the centripetal-Coriolis matrix, G : R → R denotes the gravitational effects, P : R × R → R denotes the passive viscoelastic tissue forces, b c ∈ R >0 denotes the cycle's viscous damping effect, and τ denotes the torque applied by the quadriceps muscle and the cycle motor, which is subsequently defined.</p><p>The torque is applied by two different actuators, corresponding to either the torque due to the FES-induced muscle contractions or the torque due to the electric motor. Given the need to use the different actuators at different times, we define two sets: Q, where the crank angle is in the kinematically effective quadriceps region, and Q c , where the crank angle is in the region of poor kinematic efficiency <ref type="bibr">[16]</ref>. Let Q ⊂ [0°, 360°) denote where electrical stimulation is active and Q c denote the complement of Q, where the electric motor is active.</p><p>The torque τ : R × R → R in ( <ref type="formula">1</ref>) is defined as</p><p>where b 1 : R × R → R >0 is the assumed known muscle control effectiveness, u 1 ∈ R is the muscle control input, b 2 ∈ R is the known motor control constant, and u 2 ∈ R is the motor control input. From (2), the dynamics for each mode are <ref type="bibr">[15]</ref> b k u k = M (q) q̈ + V c (q, q̇) q̇ + G (q) + P (q, q̇) + b c q̇, (3)</p><p>where k represents the active switched subsystem. 
Let k &#8712; S, where S &#8796; {1, 2} is the switching index set.</p></div>
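Because the switching in (2) depends only on which region the crank angle occupies, the actuator-selection logic admits a very short sketch. The region bounds below are hypothetical placeholders; in practice, Q comes from a kinematic-efficiency analysis of the rider [16]:

```python
# Hypothetical FES-effective crank-angle region Q (degrees); the true bounds
# come from a kinematic-efficiency analysis of the rider [16].
Q_REGION = (30.0, 150.0)

def active_subsystem(q_deg):
    """Return k = 1 (quadriceps/FES) if the crank angle lies in Q, else k = 2 (motor)."""
    q = q_deg % 360.0
    return 1 if Q_REGION[0] <= q <= Q_REGION[1] else 2

def applied_torque(q_deg, u1, u2, b1, b2):
    """Torque in (2): b1*u1 while the crank is in Q, b2*u2 in the complement Qc."""
    return b1 * u1 if active_subsystem(q_deg) == 1 else b2 * u2
```

Note that the switching signal is state-based, a point that matters for the dwell-time analysis in Section V.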
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Background Information</head><p>Following the development in <ref type="bibr">[17]</ref>, the dynamics in (3) can be rewritten in the control-affine form<ref type="foot">foot_0</ref> </p><p>where x ≜ [q, q̇] T , and a subsequently defined control input u k ∈ R represents the control input for the k th system. The drift dynamics f : R 2 → R 2 are defined as f (x) ≜ [q̇, M (q) −1 (−V c (q, q̇) q̇ − G (q) − P (q, q̇) − b c q̇)] T , and the control effectiveness g k : R 2 → R 2 is defined as</p><p>The control objective is to track a time-varying continuously differentiable signal x d ∈ R 2 . To quantify the tracking objective, the tracking error e ∈ R 2 is defined as e ≜ x − x d . Using the technique in <ref type="bibr">[17]</ref>, the control-affine dynamics in (4) can be expressed as</p><p>where</p><p>The following properties and assumptions facilitate the development of the desired approximate optimal tracking controller.</p><p>Property 1. The drift dynamics f are continuously differentiable <ref type="bibr">[15]</ref>, which, using <ref type="bibr">[18,</ref><ref type="bibr">Lemma 3.2]</ref>, means that f is a locally Lipschitz function and f (0) = 0.</p><p>Property 2. The control effectiveness matrix g k is continuously differentiable <ref type="bibr">[15]</ref> and therefore a locally Lipschitz function <ref type="bibr">[18,</ref><ref type="bibr">Lemma 3.2]</ref>. The matrix g k is bounded such that 0 &lt; ‖g k (x)‖ ≤ ḡ k ∀x ∈ R n , where ḡ k ∈ R >0 is the supremum over all x of the maximum singular value of g k (x), for all k. It follows that ‖G</p><p>Assumption 1. The desired trajectory is upper-bounded by a known positive constant x̄ d ∈ R >0 , i.e., ‖x d ‖ ≤ x̄ d .</p><p>Based on the above assumptions, the trajectory tracking component of the controller</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Control Objective</head><p>The control objective is to solve the infinite-horizon optimal tracking problem, i.e., to find a control policy μ k that minimizes the cost function</p><p>where</p><p>and Q k is PD so that the cost in <ref type="bibr">(6)</ref> does not depend on the desired trajectory.</p><p>Property 3. The state cost matrix Q k satisfies q k I ≤ Q k ≤ q̄ k I, where q k and q̄ k ∈ R >0 are the minimum and maximum eigenvalues of Q k , respectively.</p><p>The infinite-horizon value function (i.e., the cost-to-go) for the k th mode V * k : R 4 → R ≥0 is defined as</p><p>where U ⊂ R is the action space for μ k .</p><p>Assumption 3. The optimal value function V * k is continuously differentiable for all k ∈ S <ref type="bibr">[17]</ref>.</p><p>The optimal transient control policy μ * k : R 4 → R is defined as</p><p>Each k th optimal value function and optimal control policy satisfy the HJB equation</p><p>which has the boundary condition V * k (0) = 0.</p></div>
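For reference, the quantities defined in this subsection follow the standard ADP tracking formulation; the following is a sketch of the usual forms, assuming the quadratic local cost r_k with the weights Q_k and R_k from (6), and with F_k, G_k denoting the concatenated tracking dynamics from (5) (cf. [8], [17]):

```latex
% Sketch of the standard infinite-horizon ADP tracking quantities;
% R_k > 0 is the input-weighting term assumed present in (6).
\begin{align*}
r_k(\zeta,\mu_k) &\triangleq e^{\top} Q_k\, e + \mu_k^{\top} R_k\, \mu_k,\\
V_k^{*}(\zeta) &\triangleq \min_{\mu_k(\cdot)\in U} \int_{t}^{\infty} r_k\big(\zeta(\tau),\mu_k(\tau)\big)\,\mathrm{d}\tau,\\
\mu_k^{*}(\zeta) &= -\tfrac{1}{2}\,R_k^{-1}\, G_k^{\top}(\zeta)\,\big(\nabla_{\zeta} V_k^{*}(\zeta)\big)^{\top},\\
0 &= \nabla_{\zeta} V_k^{*}(\zeta)\,\big(F_k(\zeta) + G_k(\zeta)\,\mu_k^{*}(\zeta)\big) + r_k\big(\zeta,\mu_k^{*}(\zeta)\big).
\end{align*}
```

The third line is the stationarity condition of the HJB with respect to μ_k, and the last line is satisfied together with the boundary condition V*_k(0) = 0 stated above.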
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Value Function Approximation</head><p>The optimal value function V * k is unknown for general nonlinear systems. Let Ω ⊂ R 4 be a compact set such that ζ ∈ Ω. The value function can be approximated with a NN in Ω by invoking the Stone-Weierstrass Theorem to obtain</p><p>where W k ∈ R L is a vector of unknown weights, φ : R 4 → R L is a user-defined vector of basis functions, and ϵ k : R 4 → R is the bounded function reconstruction error. <ref type="foot">2</ref> Substituting (10) into (8) yields a NN representation of the optimal control policy. The ideal weights W k are unknown a priori; hence, an approximation of W k is desired. The critic weight estimate vector Ŵc,k ∈ R L is substituted into <ref type="bibr">(10)</ref> to obtain the optimal value function estimate V̂ k :</p><p>The actor weight estimate vector Ŵa,k ∈ R L is substituted into <ref type="bibr">(11)</ref> to obtain the optimal transient control policy estimate μ̂ k : R 4 × R L → R, defined as</p><p>The overall controller u k ∈ R is defined as</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. BELLMAN ERROR</head><p>To calculate the BE δ k : R 4 × R L × R L → R, the optimal value function V * k (ζ) and the optimal control policy μ * k (ζ) in ( <ref type="formula">9</ref>) are replaced by the approximate optimal value function V̂ k (ζ, Ŵc,k) and the approximate optimal control policy μ̂ k (ζ, Ŵa,k), respectively, where</p><p>The value of the BE indicates how close the actor and critic weight estimates are to their respective ideal weight values. By subtracting ( <ref type="formula">9</ref>) from ( <ref type="formula">14</ref>), substituting (10)-<ref type="bibr">(13)</ref>, and denoting the difference between the actual and ideal weight values by W̃c,k ≜ W k − Ŵc,k and W̃a,k ≜ W k − Ŵa,k , the analytical form of the BE in ( <ref type="formula">14</ref>) is</p><p>where</p><p>Although they are equivalent, ( <ref type="formula">14</ref>) is used in implementation and ( <ref type="formula">15</ref>) is used in the stability analysis.</p></div>
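As a concrete illustration of the implementation form (14), the instantaneous BE can be computed as sketched below. All specifics here are hypothetical placeholders, not the paper's implementation: the quadratic basis φ, its dimension L = 3, and the problem data passed in; the scalar R plays the role of the input weight in the local cost:

```python
import numpy as np

def phi(zeta):
    # Hypothetical quadratic basis on ζ ∈ R^4 (L = 3); V̂(ζ, Ŵc) = W_c @ phi(zeta).
    e = zeta[:2]
    return np.array([e[0] ** 2, e[0] * e[1], e[1] ** 2])

def grad_phi(zeta):
    # Analytic gradient of phi with respect to ζ (an L x 4 Jacobian).
    e = zeta[:2]
    return np.array([[2 * e[0], 0.0, 0.0, 0.0],
                     [e[1], e[0], 0.0, 0.0],
                     [0.0, 2 * e[1], 0.0, 0.0]])

def bellman_error(zeta, W_c, W_a, F, G, Q, R):
    """Instantaneous BE, cf. (14): delta = r(ζ, μ̂) + ∇V̂(ζ) · F(ζ, μ̂)."""
    mu = -0.5 / R * (G @ grad_phi(zeta).T @ W_a)   # approximate policy, cf. (13)
    e = zeta[:2]
    r = e @ Q @ e + R * mu ** 2                    # local cost on the error state
    zeta_dot = F + G * mu                          # closed-loop concatenated dynamics
    delta = r + W_c @ (grad_phi(zeta) @ zeta_dot)  # critic-weighted value derivative
    return float(delta), float(mu)
```

With zero actor and critic weights, the BE reduces to the local cost term alone, which matches the intuition that δ_k measures how far the current weight estimates are from satisfying the HJB.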
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Bellman Error Extrapolation</head><p>Using the control policy given in ( <ref type="formula">13</ref>), the current system state, the critic weight estimate, and the actor weight estimate, the estimated BE in ( <ref type="formula">14</ref>) can be evaluated to calculate the instantaneous BE, denoted by δ k (ζ, Ŵc,k , Ŵa,k), at each time instance t ∈ R ≥0 . The exploration versus exploitation problem is well-known for learning-based control methods. In results such as <ref type="bibr">[21]</ref>, an exploration signal is required to successfully explore the operating domain. Results such as <ref type="bibr">[11]</ref> use BE extrapolation, which simultaneously evaluates the BE along the system trajectory and at user-defined points in the state space. The BE extrapolation technique eliminates the need for the exploration signal by providing simulation of experience, thus yielding a better value function approximation <ref type="bibr">[11]</ref>.</p><p>The BE is extrapolated from the user-defined off-trajectory points {ζ i ∈ Ω : i = 1, . . . , N k }, where N k ∈ N denotes a user-specified number of overall extrapolation trajectories in the compact set Ω. The tuple (Σ c,k , Σ a,k , Σ Γ,k ) represents the data stacks defined as Σ c,k ≜ 1</p><p>is a user-defined gain, and Γ k ∈ R L×L is a time-varying least-squares gain matrix. Each subsystem has its own distinct set of data, gain values, and update laws. Assumption 5. On the compact set Ω, a finite set of off-trajectory points {ζ i :</p><p>where c k is a constant scalar lower bound of the value of each input-output data pair's minimum eigenvalues for the k th subsystem <ref type="bibr">[11]</ref>.</p></div>
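In implementation, the simulation-of-experience idea reduces to averaging BE-weighted regressors over the N_k extrapolation points. The following is a minimal sketch: `delta_fn` and `omega_fn` are hypothetical stand-ins for the extrapolated BE from (14) and the normalized regressor from the update laws:

```python
import numpy as np

def extrapolation_stack(zeta_points, delta_fn, omega_fn):
    """Average the BE-weighted normalized regressors over off-trajectory
    points ζ_i ∈ Ω, i.e., (1/N_k) Σ_i (ω_i/ρ_i) δ_i — a Σ_{c,k}-style
    data stack used for simulation of experience, cf. [11]."""
    N = len(zeta_points)
    return sum(delta_fn(z) * omega_fn(z) for z in zeta_points) / N
```

Selecting the points ζ_i so that the regressor data is sufficiently rich (Assumption 5) is what replaces the persistence-of-excitation requirement of classical adaptive control.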
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. UPDATE LAWS FOR ACTOR AND CRITIC WEIGHTS</head><p>The critic and actor weights are updated according to the subsequent laws while each mode is active. In the weight update laws, η c1,k , η c2,k , η a1,k , η a2,k , λ k ∈ R are positive constant learning gains, and Γ̄ k , Γ k ∈ R >0 are the upper and lower projection operator bounds for Γ k . The critic update law for the k th mode Ẇc,k ∈ R L is defined as 3 </p><p>where</p><p>The actor update law for the k th mode Ẇa,k ∈ R L is defined as</p><p>where</p><p>Ŵc,k + η c2,k Σ a,k Ŵc,k . The operator proj {•} denotes the smooth projection operator defined in [22, Appendix E, Eq. E.4] and is designed such that Ŵc,k ∈</p><p>where 1 {•} denotes the indicator function. <ref type="foot">4</ref> While the k th mode is inactive, Ẇc,k = 0 L×1 , Γ̇ k = 0 L×L , and Ẇa,k = 0 L×1 . <ref type="foot">5</ref> Remark 2. Under Assumptions 1-3, the PD solution of the HJB equation is the optimal value function for each system. The approximation of the PD solution to the HJB equation is guaranteed by the appropriate selection of Lyapunov-based update laws and initial weight estimates <ref type="bibr">[23]</ref>.</p></div>
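A rough Euler-discretized sketch of a least-squares critic update of the kind described above may clarify the mechanics. The true laws (16)-(18) include the extrapolated data stacks and a smooth projection operator; here, purely for illustration, the projection is replaced by a crude eigenvalue clamp keeping the gain matrix between its bounds:

```python
import numpy as np

def critic_step(W_c, Gamma, omega_n, delta, eta_c1, lam, dt, g_lo, g_hi):
    """One Euler step of a least-squares critic update (a sketch, not (16)-(18)).

    omega_n is the normalized regressor, delta the instantaneous BE, lam a
    forgetting gain; g_lo and g_hi stand in for the projection bounds on Γ_k.
    """
    # Gradient-style critic step, weighted by the least-squares gain Γ.
    W_c = W_c - dt * eta_c1 * (Gamma @ omega_n) * delta
    # Least-squares gain dynamics with forgetting (a common form in ADP).
    Gamma = Gamma + dt * (lam * Gamma
                          - eta_c1 * Gamma @ np.outer(omega_n, omega_n) @ Gamma)
    # Crude stand-in for the smooth projection keeping Γ_lo ≤ ||Γ|| ≤ Γ_hi.
    vals, vecs = np.linalg.eigh(Gamma)
    vals = np.clip(vals, g_lo, g_hi)
    return W_c, vecs @ np.diag(vals) @ vecs.T
```

Saturating the gain matrix is what keeps the update laws well-conditioned for all time, mirroring the role of the projection bounds Γ̄_k and Γ_k in the analysis.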
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. STABILITY ANALYSIS</head><p>It is possible for a switched system to become unstable, even if the individual subsystems of a switched system are stable <ref type="bibr">[12,</ref><ref type="bibr">Ch. 3]</ref>. Hence, the stability of each subsystem must be investigated along with the switching between the systems. In the subsequent development, k subsystems, each with a class of dynamics in (4), are analyzed with the control policy in <ref type="bibr">(13)</ref> and update laws outlined in ( <ref type="formula">16</ref>)- <ref type="bibr">(18)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Subsystem Stability Analysis</head><p>Since the state penalty matrix Q k is PSD, the optimal value function V * k is PSD and is not a valid Lyapunov function. However, a nonautonomous form of the optimal value function, denoted as V * t,k : R 2 × R ≥0 → R ≥0 and defined as V * t,k (e, t) ≜ V * k (ζ), is PD and decrescent <ref type="bibr">[17]</ref>. To facilitate the stability analysis, let z k ∈ R 2+2L be a concatenated state vector defined as</p><p>According to <ref type="bibr">[17]</ref> and <ref type="bibr">[18,</ref><ref type="bibr">Lemma 4.3]</ref>, <ref type="bibr">(19)</ref> can generally be bounded as</p><p>To facilitate the subsequent dwell-time analysis, the following more restrictive assumption is required.</p><p>Assumption 6. The optimal value function V * t,k (e, t) can be bounded by the square of the norm of its argument times a positive constant, i.e., 6</p><p>for all k ∈ S, where</p><p>Using Assumption 6, ( <ref type="formula">19</ref>) can be bounded as</p><p>where</p><p>The normalized regressors ω k /ρ k are bounded, and G φ,k is bounded as</p><p>Remark 3. Using the projection operator from the critic update law in <ref type="bibr">(16)</ref> and <ref type="bibr">[22,</ref><ref type="bibr">Lemma E.1]</ref>, −W̃ T c,k Ẇc,k is bounded from above. Using the projection operator from the actor update in <ref type="bibr">(17)</ref> and <ref type="bibr">[22,</ref><ref type="bibr">Lemma E.1]</ref>, −W̃ T a,k Ẇa,k is bounded from above as</p><p>To facilitate the subsequent analysis, let R ∈ R >0 be the radius of a compact ball B R ⊂ R 2 × R L × R L centered at the origin.</p><p>Theorem 1. While each subsystem is active, if Assumptions 1-6 hold, the control policy in <ref type="bibr">(13)</ref> and the weight update laws in ( <ref type="formula">16</ref>)-( <ref type="formula">18</ref>) are implemented, and the sufficient gain conditions in (21) and (22) are satisfied, then the concatenated state z k is UUB, and the approximate control policy μ̂ k converges to a neighborhood of the optimal control policy μ * k . 
Proof: Using (5) and the fact that V * t,k (e, t) = V * k (ζ) , ∀e ∈ R 2 , t ∈ R ≥0 , and taking the time derivative of the candidate Lyapunov function in <ref type="bibr">(19)</ref> yields</p><p>where the fact that d dt Γ −1 k = −Γ −1 k Γ̇ k Γ −1 k is used. Under the sufficient gain conditions in <ref type="bibr">(21)</ref> and <ref type="bibr">(22)</ref>, and using ( <ref type="formula">9</ref>), <ref type="bibr">(15)</ref>, and the update laws in ( <ref type="formula">16</ref>)-( <ref type="formula">18</ref>), the expression in (24) can be bounded as</p><p>for all k ∈ S and t ∈ R ≥0 , where</p><p>, and l k is a positive constant that depends on the control gains and NN bounding constants in Assumption 4. Using the bounds in <ref type="bibr">(20)</ref>, the time derivative in (24), Λ k , and ( <ref type="formula">23</ref>), <ref type="bibr">[18,</ref><ref type="bibr">Thm. 4.18]</ref> can be invoked to prove that z k is UUB such that lim sup t→∞ ‖z k ‖ ≤ 2α 2,k l k / (α 1,k Λ k ), and the transient tracking control policy μ̂ k converges to a neighborhood of the optimal control policy μ * k . Since z k ∈ L ∞ , it follows that e, W̃c,k , W̃a,k ∈ L ∞ , and since μ̂ k ∈ L ∞ and ‖x d ‖ ≤ x̄ d , it follows that u k ∈ L ∞ . Furthermore, every trajectory z k that is initialized in the ball B R is bounded such that z k ∈ B R , ∀t ∈ R ≥0 , ∀k ∈ S. Since z k ∈ B R , it follows that the individual elements of z k lie in a compact set, i.e., e, W̃c,k , and W̃a,k lie in a compact set. Additionally, since ‖x d ‖ ≤ x̄ d , the concatenated state ζ ∈ Ω, ∀t ∈ R ≥0 , ∀k ∈ S, which facilitates value function approximation. Remark 4. 
See <ref type="bibr">[11]</ref> for insight into satisfying the gain conditions in <ref type="bibr">(21)</ref> and <ref type="bibr">(22)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Switched Subsystems</head><p>Let t ON k ∈ [0, t] denote a time instant when the k th subsystem of the switching sequence is activated. Let t OFF k ∈ [0, t] denote a time instant when the k th subsystem in the switching sequence is deactivated. The dwell-time in any active mode of a subsystem, denoted by τ k ∈ R >0 , is defined as τ k = t OFF k − t ON k and represents the amount of time a subsystem must be active before switching to the next. The minimum dwell-time for any active mode of a system is denoted by τ * ∈ R >0 . There are a finite number of switches, and N σ ∈ N &lt;∞ denotes the number of switching events. The sequence of time instants at which a switching event occurs is defined as t ON Nσ , such that 0</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Dwell-Time Analysis</head><p>The stability analysis proves that each subsystem is UUB while active. However, the Lyapunov function for the overall switching system may instantaneously increase due to the change in the optimal value function and set of new weight parameters. The value function corresponding to mode k + 1,</p><p>, may be larger than the value function corresponding to mode k, V * t,k (ζ). Similarly, the magnitude of the actor and critic weight errors could be larger in mode k + 1 than in mode k. Therefore, a dwell-time condition must be designed to account for switching between the subsystems, which ensures that the switched system is stable <ref type="bibr">[12,</ref><ref type="bibr">Ch. 3</ref>].</p><p>Theorem 2. The system consisting of a family of subsystems with the dynamics in (4), with a properly designed minimum dwell-time τ * ∈ R >0 , ensures that the tracking error, critic estimate errors, and actor estimate errors will converge to a neighborhood of the origin in the sense that ‖z</p><p>is the maximum ultimate bound for all subsystems, and T ∈ R ≥0 is the time required to reach the ultimate bound.</p><p>The proof follows that of <ref type="bibr">[13,</ref><ref type="bibr">Theorem 2]</ref> and is available upon request.</p><p>Remark 5. From Section II, the system switches modes based on whether the crank angle q belongs to Q or Q c , i.e., the switching is state-based; furthermore, the switches occur more frequently at higher desired cadence values. The user cannot directly control the time of the switching instances. For the system to be stable using the previous analysis, the dwell-time must be significantly smaller than the time required to travel through the regions Q and Q c . The dwell-time τ * is composed of many user-selected parameters. Notably, τ * can be decreased by increasing the decay rate γ 0 , i.e., 
stronger convergence parameters result in a shorter dwell-time. The dwell-time τ * is inversely proportional to the decay rate γ 0 , and γ 0 is proportional to Λ. Hence, maximizing Λ will decrease the dwell-time. This is achieved by maximizing the term min { λ min (Q k ) , (1/6) η c2,k c k , (1/8) (η a1,k + η a2,k ) }. While this maximization decreases the dwell-time so that it is significantly smaller than the time dictated by the desired trajectory, there are some practical drawbacks. A larger state cost matrix Q k will increase the penalty on the error; paired with larger actor and critic learning gains η a1,k , η a2,k , and η c2,k , this may lead to a more aggressive controller, which may cause rider discomfort. Furthermore, increasing c k relies on using more BE extrapolation data pairs, which may become computationally intensive. Motivated by these practical considerations, future work will pursue analysis methods that can potentially eliminate the need for a minimum dwell-time.</p></div>
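The inverse relationship between τ* and the decay rate noted in Remark 5 can be made concrete with the classical dwell-time bound for switched systems [12, Ch. 3]. This is a sketch of the standard bound, not the paper's exact τ* expression: if each subsystem's Lyapunov function decays at rate γ_0 while active and may grow by at most a factor μ ≥ 1 at a switch, then dwelling at least ln(μ)/γ_0 guarantees net decay across switches:

```python
import math

def min_dwell_time(gamma0, mu):
    """Classical minimum dwell-time bound tau* = ln(mu)/gamma0 [12, Ch. 3].

    V decays like exp(-gamma0 * t) within a mode and may jump by at most a
    factor mu >= 1 at each switch, so dwelling tau* ensures the Lyapunov
    function decreases over every switching cycle.
    """
    assert gamma0 > 0.0 and mu >= 1.0
    return math.log(mu) / gamma0
```

Doubling the decay rate γ_0 halves the required dwell-time, which matches the tradeoff discussed above: stronger convergence gains shrink τ*, at the cost of a more aggressive controller.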
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION</head><p>This paper develops an ADP-based controller for switched cycle dynamics while achieving a time-varying tracking objective. The stability of each individual subsystem is proven by a Lyapunov-based analysis, and the stability of the overall switched system is proven via a dwell-time analysis. The entire switched system is proven to be UUB such that the control policy is proven to converge to a neighborhood of the optimal policy and to track the cadence within a neighborhood of its desired value. Future work will investigate the application of the developed controller to an FES-cycling testbed and the development of analysis methods free of minimum dwell-time requirements.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>The cycle-rider dynamics do not differ between modes. The only difference between the switching modes is the actuation methods.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>To focus the scope of this manuscript, each switched system will use the same dimension vector of basis functions φ (ζ), i.e., L 1 = L 2 = . . . = L k .</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2"><p>Using (18) ensures that Γ k ≤ ‖Γ k ‖ ≤ Γ̄ k for all t ∈ R >0 .</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3"><p>The update laws will not update a subsystem k's weight estimates or least-squares matrix unless subsystem k is active.</p></note>
		</body>
		</text>
</TEI>
