Title: Reinforcement Learning for Spacecraft Attitude Control
Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are presented. First, the general ACS problem with full actuation is considered, with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with a reaction-wheel-based ACS is considered, which imposes more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL-based controller to a quaternion rate feedback (QRF) attitude controller, a well-established state feedback control strategy, and investigate both nominal performance and robustness with respect to uncertainty in system dynamics. Our RL-based attitude control agent adapts to any spacecraft mass without needing to be retrained. Over the range of 0.1 to 100,000 kg, our agent achieves 2% better performance than a QRF controller tuned for the same mass range, and performance similar to that of a QRF controller tuned specifically for a given mass. For the reaction-wheel-based ACS, the trained RL agent achieved a reward 10 higher than that of a tuned QRF controller.
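The baseline in this comparison is a QRF controller acting under torque saturation on a spacecraft whose inertia is randomized. The sketch below assumes a common proportional-derivative form of QRF, tau = -Kp * sign(q0) * q_v - Kd * omega, with each torque component clipped to model thruster limits. It evaluates one fixed-gain controller on several randomly drawn inertia matrices. The gains, inertia range, torque limit, and every function name are hypothetical and are not taken from the study. The RL agent, PPO training, and curriculum are not shown.

```python
import numpy as np


def qrf_torque(q, w, kp, kd, tau_max):
    """Saturated quaternion-rate-feedback torque driving q toward the identity.

    q: unit quaternion, scalar first [q0, q1, q2, q3]; w: body rates in rad/s.
    sign(q0) selects the shorter of the two equivalent rotations (avoids unwinding).
    Each torque component is clipped to +/- tau_max to mimic thruster saturation.
    """
    s = 1.0 if q[0] >= 0.0 else -1.0
    tau = -kp * s * q[1:] - kd * w
    return np.clip(tau, -tau_max, tau_max)


def derivatives(q, w, tau, J):
    """Rigid-body attitude kinematics and dynamics."""
    q0, qv = q[0], q[1:]
    # Kinematics: q_dot = 0.5 * q (x) [0, w]  (quaternion product)
    q_dot = np.concatenate(([-0.5 * (qv @ w)], 0.5 * (q0 * w + np.cross(qv, w))))
    # Euler's equation: J w_dot = tau - w x (J w)
    w_dot = np.linalg.solve(J, tau - np.cross(w, J @ w))
    return q_dot, w_dot


def simulate(J, q, w, kp, kd, tau_max, dt=0.02, steps=3000):
    """Fixed-step RK4, torque held constant over each step (60 s by default)."""
    for _ in range(steps):
        tau = qrf_torque(q, w, kp, kd, tau_max)
        k1q, k1w = derivatives(q, w, tau, J)
        k2q, k2w = derivatives(q + 0.5 * dt * k1q, w + 0.5 * dt * k1w, tau, J)
        k3q, k3w = derivatives(q + 0.5 * dt * k2q, w + 0.5 * dt * k2w, tau, J)
        k4q, k4w = derivatives(q + dt * k3q, w + dt * k3w, tau, J)
        q = q + dt / 6.0 * (k1q + 2.0 * k2q + 2.0 * k3q + k4q)
        w = w + dt / 6.0 * (k1w + 2.0 * k2w + 2.0 * k3w + k4w)
        q = q / np.linalg.norm(q)  # re-normalize to stay on the unit sphere
    return q, w


def error_angle(q):
    """Rotation angle (rad) between q and the identity attitude."""
    return 2.0 * np.arccos(min(1.0, abs(q[0])))


# One fixed-gain controller evaluated on several randomly drawn inertias,
# analogous in spirit to randomizing the spacecraft inertia per simulation.
rng = np.random.default_rng(0)
axis = np.ones(3) / np.sqrt(3.0)
q_init = np.concatenate(([np.cos(0.5)], np.sin(0.5) * axis))  # 1 rad about axis
results = []
for trial in range(5):
    J = np.diag(rng.uniform(0.5, 2.0, size=3))  # hypothetical principal inertias, kg m^2
    q_f, w_f = simulate(J, q_init, np.zeros(3), kp=1.0, kd=2.0, tau_max=0.2)
    results.append((error_angle(q_f), np.linalg.norm(w_f)))
    print(f"trial {trial}: error {results[-1][0]:.2e} rad, rate {results[-1][1]:.2e} rad/s")
```

Because kp and kd are fixed while J varies, the closed-loop response slows or speeds up from trial to trial. This is the tuning trade-off the abstract describes for a QRF controller tuned over a whole mass range. A learned policy could be evaluated by substituting it for the hypothetical `qrf_torque` in the same loop.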
Award ID(s):
1653118
NSF-PAR ID:
10156483
Author(s) / Creator(s):
Date Published:
Journal Name:
70th International Astronautical Congress
Page Range / eLocation ID:
IAC–19–C1.5.2
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This article explores the use of viscoelastic damping to reduce control system complexity for strain-actuated solar array (SASA) based spacecraft attitude control systems (ACSs). SASA utilizes intelligent structures for attitude control, and is a promising next-generation spacecraft ACS technology with the potential to achieve unprecedented levels of pointing accuracy and jitter reduction during key scientific observation periods. The current state-of-the-art SASA implementation utilizes piecewise modeling of distributed piezoelectric (PZT) actuators, resulting in a monolithic structure with the potential for enhanced ACS reliability. PZT actuators can operate at high frequencies, which enables active vibration damping to achieve ultra-quiet operation for sensitive instruments. Relying on active damping alone, however, requires significant control system complexity, which has so far limited adoption of intelligent structures in spacecraft control systems. Here we seek to understand how to modify passive system design in strategic ways to reduce control system complexity while maintaining high performance. An integrated physical and control system design (co-design) optimization strategy is employed to ensure system-optimal performance, and to help understand design coupling between passive physical aspects of design and active control system design. In this study, we investigate distributing viscoelastic material throughout the SASA substructure to provide tailored passive damping, with the aim of reducing control system complexity. At this early phase of the study, the effect of temperature variation on material behavior is not considered; the study focuses instead on the design coupling between distributed material and control systems. The spatially distributed design of both elastic and viscoelastic material in the SASA substructure is considered in an integrated manner. 
An approximate model is used that balances predictive accuracy and computational efficiency. This model approximates the distributed compliant SASA structure using a series of rigid links connected by generalized torsional springs and dampers. This multi-link pseudo-rigid-body dynamic model (PRBDM) with lumped viscoelastic damping models is derived, and is used in numerical co-design studies to quantify the tradeoffs and benefits of using distributed passive damping to reduce the complexity of SASA control systems. 
  2. The University of Illinois, in collaboration with NASA Jet Propulsion Laboratory (JPL) and NASA Ames Research Center, has developed a novel Attitude Control System (ACS) called the Strain Actuated Solar Arrays (SASA), with sub-milli-arcsecond pointing capability. SASA uses strain-producing actuators to deform flexible deployable structures, and the resulting reaction forces rotate the satellite. This momentum transfer strategy is used for jitter reduction and small-angle slew maneuvers. The system is currently at a Technology Readiness Level of 4-5 and has an upcoming demonstration flight on the CAPSat CubeSat mission. An extension to the SASA concept, known as Multifunctional Structures for Attitude Control (MSAC), enables arbitrarily large-angle slew maneuvers in addition to jitter cancellation. MSAC can potentially replace reaction wheels and control moment gyroscopes in attitude control systems, thereby eliminating a key source of jitter noise. Both SASA and MSAC are more reliable than conventional ACSs because of fewer failure modes and lower failure rates, while having a smaller overall mass, volume, and power budget. The paper discusses the advantages of using SASA and MSAC for a wide range of spacecraft and mission classes. 
  3. Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges for developing such a robust controller is to handle different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers are either patient-condition-specific or require tuning many control parameters, and they can behave unreliably and even fail to maintain balance. Methods: We present a novel, deep neural network, reinforcement learning-based robust controller for an LLRE, trained in decoupled offline human-exoskeleton simulation with three independent networks, which aims to provide reliable walking assistance against various and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE’s proprioceptive signals, including joint kinematic states, and subsequently predicts real-time position control targets for the actuated joints. To handle uncertain human interaction forces, the control policy is trained intentionally with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected with the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy to different human conditions, we employ domain randomization during training that includes not only randomization of exoskeleton dynamics properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient’s disability. 
Through this decoupled deep reinforcement learning framework, the trained LLRE controller is able to provide reliable walking assistance to patients with different degrees of neuromuscular disorders without any control parameter tuning. Results and conclusion: A universal, RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegic), muscle weakness, or hemiplegic conditions without any control parameter tuning. Analysis of the RMSE for joint tracking, CoP-based stability, and gait symmetry shows the effectiveness of the controller. An ablation study also demonstrates the strong robustness of the control policy under large exoskeleton dynamic property ranges and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer, since it uses only proprioceptive information of the LLRE (joint sensory state) as the input. Furthermore, the controller is shown to be able to handle different patient conditions without the need for patient-specific control parameter tuning. 
  4. A new attitude control system called Multifunctional Structures for Attitude Control (MSAC) is explored in this paper. This system utilizes deployable structures to provide fine pointing and large slewing capabilities for spacecraft. These deployable structures utilize distributed actuation, such as piezoelectric strain actuators, to control flexible structure vibration and motion. A related type of intelligent structure has been introduced recently for precision spacecraft attitude control, called Strain Actuated Solar Arrays (SASA). MSAC extends the capabilities of the SASA concept such that arbitrarily large angle slewing can be achieved at relatively fast rates, thereby providing a means to replace Reaction Wheel Assemblies and Control Moment Gyroscopes. MSAC utilizes actuators bonded to deployable panels, such as solar arrays or other structural appendages, and bends the panels to use inertial coupling for small-amplitude, high-precision attitude control and active damping. In addition to presenting the concept, we introduce the operational principles for MSAC and develop a lumped low-fidelity Hardware-in-the-Loop (HIL) prototype and testbed to explore them. Some preliminary experimental results obtained using this prototype provided valuable insight into the design and performance of this new class of attitude control systems. Based on these results and developed principles, we have developed useful lumped-parameter models to use in further system refinement. 
  5. In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-output linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model based CBF-CLF-QP, resulting in the Reinforcement Learning based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty. 
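Item 5 describes a CBF-CLF-QP, in which control barrier and control Lyapunov constraints are enforced by a quadratic program. The following sketch shows only the CBF part. It assumes a hypothetical one-dimensional single integrator (x_dot = u) with safe set h(x) = x_max - x >= 0. In that case the QP "minimize (u - u_ref)^2 subject to dh/dt + alpha*h >= 0" has a single affine constraint, so its solution is a closed-form projection onto a half-space. The system, gains, and function names are illustrative only. This is not the RL-CBF-CLF-QP method, which also includes a CLF term and learns the model uncertainty in the constraints.

```python
import numpy as np


def cbf_filter(u_ref, a, b):
    """Solve min ||u - u_ref||^2 subject to a @ u >= b (one affine constraint).

    If u_ref already satisfies the constraint it is returned unchanged; otherwise
    it is projected onto the boundary a @ u = b along the direction a.
    """
    violation = b - a @ u_ref
    if violation <= 0.0:
        return u_ref
    return u_ref + (violation / (a @ a)) * a


def simulate_single_integrator(x0=0.0, x_goal=2.0, x_max=1.0, k=1.0,
                               alpha=2.0, dt=0.01, steps=1000):
    """Nominal controller aims past x_max; the CBF filter keeps x <= x_max."""
    x, traj = x0, []
    for _ in range(steps):
        u_ref = np.array([k * (x_goal - x)])  # nominal controller ignores the limit
        h = x_max - x                          # barrier value; h >= 0 is safe
        # dh/dt + alpha*h >= 0 with dh/dt = -u  <=>  (-1) * u >= -alpha * h
        u = cbf_filter(u_ref, a=np.array([-1.0]), b=-alpha * h)
        x = x + dt * u[0]
        traj.append(x)
    return np.array(traj)


traj = simulate_single_integrator()
print(f"max x = {traj.max():.6f}, final x = {traj[-1]:.6f}")
```

With dt * alpha <= 1, the forward-Euler update also keeps h nonnegative in discrete time, because h_{k+1} >= (1 - dt * alpha) * h_k. The trajectory therefore approaches x_max from below instead of following the nominal controller toward x_goal.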