Title: Reinforcement Learning for Spacecraft Attitude Control
Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with a reaction wheel based ACS is considered, which has more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL based controller to a QRF (quaternion rate feedback) attitude controller, a well-established state feedback control strategy. We investigate the nominal performance and robustness with respect to uncertainty in system dynamics. Our RL based attitude control agent adapts to any spacecraft mass without needing to re-train. In the range of 0.1 to 100,000 kg, our agent achieves 2% better performance than a QRF controller tuned for the same mass range, and similar performance to the QRF controller tuned specifically for a given mass. For the reaction wheel based ACS, the trained RL agent achieved 10 higher reward than a tuned QRF controller.
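The QRF baseline and the torque-saturation constraint mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' controller or tuning: the gains `kp` and `kd`, the limit `tau_max`, the inertia range, the initial state, and all function names are assumptions chosen for illustration. It uses a common form of quaternion rate feedback (torque proportional to the vector part of the error quaternion plus a rate term), clips each torque axis, and draws a random diagonal inertia that the controller never sees.

```python
import math
import random

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def quat_conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def qrf_torque(q, omega, q_target, kp, kd, tau_max):
    """Quaternion rate feedback with per-axis saturation (illustrative form)."""
    qe = quat_mul(quat_conj(q_target), q)      # attitude error quaternion
    sign = 1.0 if qe[0] >= 0.0 else -1.0       # take the shorter rotation
    tau = [-kp * sign * qe[i + 1] - kd * omega[i] for i in range(3)]
    return tuple(max(-tau_max, min(tau_max, t)) for t in tau)

def step(q, omega, tau, inertia, dt):
    """One explicit Euler step of rigid-body dynamics, diagonal inertia."""
    jx, jy, jz = inertia
    wx, wy, wz = omega
    wdot = ((tau[0] - (jz - jy) * wy * wz) / jx,
            (tau[1] - (jx - jz) * wz * wx) / jy,
            (tau[2] - (jy - jx) * wx * wy) / jz)
    qdot = quat_mul(q, (0.0, wx, wy, wz))      # q_dot = 0.5 * q * (0, omega)
    q_new = normalize(tuple(c + 0.5 * dt * d for c, d in zip(q, qdot)))
    omega_new = tuple(w + dt * wd for w, wd in zip(omega, wdot))
    return q_new, omega_new

def pointing_error(q, q_target):
    """Rotation angle (rad) between current and target attitude."""
    qe = quat_mul(quat_conj(q_target), q)
    vec_norm = math.sqrt(qe[1]**2 + qe[2]**2 + qe[3]**2)
    return 2.0 * math.atan2(vec_norm, abs(qe[0]))

def simulate(seed=0, kp=0.2, kd=1.0, tau_max=0.2, dt=0.01, t_end=300.0):
    """Slew to the identity attitude; returns (final error, peak |torque|)."""
    rng = random.Random(seed)
    # Hypothetical inertia range in kg*m^2; unknown to the controller.
    inertia = tuple(rng.uniform(0.5, 2.0) for _ in range(3))
    half = math.radians(120.0) / 2.0
    axis = normalize((1.0, 2.0, -1.0))
    q = (math.cos(half),) + tuple(math.sin(half) * a for a in axis)
    omega = (0.5, -0.3, 0.4)                   # initial tumble, rad/s
    q_target = (1.0, 0.0, 0.0, 0.0)
    peak = 0.0
    for _ in range(int(t_end / dt)):
        tau = qrf_torque(q, omega, q_target, kp, kd, tau_max)
        peak = max(peak, max(abs(t) for t in tau))
        q, omega = step(q, omega, tau, inertia, dt)
    return pointing_error(q, q_target), peak

if __name__ == "__main__":
    err, peak = simulate(seed=1)
    print(f"final pointing error: {err:.2e} rad, peak torque: {peak:.3f}")
```

Calling `simulate` with different seeds changes only the hidden inertia, which gives a rough feel for how a single fixed-gain controller behaves across an inertia range. The sketch keeps `kp` no larger than `tau_max` so that only the rate term can drive the command into saturation.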
Award ID(s):
1653118
PAR ID:
10156483
Author(s) / Creator(s):
Date Published:
Journal Name:
70th International Astronautical Congress
Page Range / eLocation ID:
IAC–19–C1.5.2
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The University of Illinois, in collaboration with NASA Jet Propulsion Laboratory (JPL) and NASA Ames Research Center, has developed a novel Attitude Control System (ACS) called the Strain Actuated Solar Arrays (SASA), with sub-milli-arcsecond pointing capability. SASA uses strain-producing actuators to deform flexible deployable structures, and the resulting reaction forces rotate the satellite. This momentum transfer strategy is used for jitter reduction and small-angle slew maneuvers. The system is currently at a Technology Readiness Level of 4-5 and has an upcoming demonstration flight on the CAPSat CubeSat mission. An extension to the SASA concept, known as Multifunctional Structures for Attitude Control (MSAC), enables arbitrarily large-angle slew maneuvers in addition to jitter cancellation. MSAC can potentially replace reaction wheels and control moment gyroscopes for attitude control systems, thereby eliminating a key source of jitter noise. Both SASA and MSAC are more reliable because of fewer failure modes and lower failure rates as compared to conventional ACS, while having an overall smaller mass, volume, and power budget. The paper discusses the advantages of using SASA and MSAC for a wide range of spacecraft and variant mission classes. 
  2. This article introduces a model-based approach for training feedback controllers for an autonomous agent operating in a highly non-linear (albeit deterministic) environment. We desire the trained policy to ensure that the agent satisfies specific task objectives and safety constraints, both expressed in Discrete-Time Signal Temporal Logic (DT-STL). One advantage of reformulating a task via formal frameworks, like DT-STL, is that it permits quantitative satisfaction semantics. In other words, given a trajectory and a DT-STL formula, we can compute the robustness, which can be interpreted as an approximate signed distance between the trajectory and the set of trajectories satisfying the formula. We utilize feedback control, and we assume a feed forward neural network for learning the feedback controller. We show how this learning problem is similar to training recurrent neural networks (RNNs), where the number of recurrent units is proportional to the temporal horizon of the agent’s task objectives. This poses a challenge: RNNs are susceptible to vanishing and exploding gradients, and naïve gradient descent-based strategies to solve long-horizon task objectives thus suffer from the same problems. To address this challenge, we introduce a novel gradient approximation algorithm based on the idea of dropout or gradient sampling. One of the main contributions is the notion of controller network dropout, where we approximate the NN controller in several timesteps in the task horizon by the control input obtained using the controller in a previous training step. We show that our control synthesis methodology can be quite helpful for stochastic gradient descent to converge with fewer numerical issues, enabling scalable back-propagation over longer time horizons and trajectories over higher-dimensional state spaces. 
We demonstrate the efficacy of our approach on various motion planning applications requiring complex spatio-temporal and sequential tasks ranging over thousands of timesteps. 
  3. This article presents a utilization of viscoelastic damping to reduce control system complexity for strain-actuated solar array (SASA) based spacecraft attitude control systems (ACSs). SASA utilizes intelligent structures for attitude control, and is a promising next-generation spacecraft ACS technology with the potential to achieve unprecedented levels of pointing accuracy and jitter reduction during key scientific observation periods. The current state-of-the-art SASA implementation utilizes piecewise modeling of distributed piezoelectric (PZT) actuators, resulting in a monolithic structure with the potential for enhanced ACS reliability. PZT actuators can operate at high frequencies, which enables active vibration damping to achieve ultra-quiet operation for sensitive instruments. Relying on active damping alone, however, requires significant control system complexity, which has so far limited adoption of intelligent structures in spacecraft control systems. Here we seek to understand how to modify passive system design in strategic ways to reduce control system complexity while maintaining high performance. An integrated physical and control system design (codesign) optimization strategy is employed to ensure system-optimal performance, and to help understand design coupling between passive physical aspects of design and active control system design. In this study, we present the possibility of utilizing viscoelastic material distributed throughout the SASA substructure to provide tailored passive damping, intending to reduce control system complexity. At this early phase of study, the effect of temperature variation on material behavior is not considered; the study focuses instead on the design coupling between distributed material and control systems. The spatially-distributed design of both elastic and viscoelastic material in the SASA substructure is considered in an integrated manner. 
An approximate model is used that balances predictive accuracy and computational efficiency. This model approximates the distributed compliant SASA structure using a series of rigid links connected by generalized torsional springs and dampers. This multi-link pseudo-rigid-body dynamic model (PRBDM) with lumped viscoelastic damping models is derived, and is used in numerical co-design studies to quantify the tradeoffs and benefits of using distributed passive damping to reduce the complexity of SASA control systems. 
  4. Tamim Asfour (Ed.)
    A reinforcement learning (RL) control policy could fail in a new/perturbed environment that is different from the training environment, due to the presence of dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach to robustifying a pre-trained RL policy by augmenting it with an L1 adaptive controller (L1AC). Leveraging the capability of an L1AC for fast estimation and active compensation of dynamic variations, the proposed approach can improve the robustness of an RL policy which is trained either in a simulator or in the real world without consideration of a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and model-based methods. 
  5. A new attitude control system called Multifunctional Structures for Attitude Control (MSAC) is explored in this paper. This system utilizes deployable structures to provide fine pointing and large slewing capabilities for spacecraft. These deployable structures utilize distributed actuation, such as piezoelectric strain actuators, to control flexible structure vibration and motion. A related type of intelligent structure has been introduced recently for precision spacecraft attitude control, called Strain Actuated Solar Arrays (SASA). MSAC extends the capabilities of the SASA concept such that arbitrarily large angle slewing can be achieved at relatively fast rates, thereby providing a means to replace Reaction Wheel Assemblies and Control Moment Gyroscopes. MSAC utilizes actuators bonded to deployable panels, such as solar arrays or other structural appendages, and bends the panels to use inertial coupling for small-amplitude, high-precision attitude control and active damping. In addition to presenting the concept, we introduce the operational principles for MSAC and develop a lumped low-fidelity Hardware-in-the-Loop (HIL) prototype and testbed to explore them. Some preliminary experimental results obtained using this prototype provided valuable insight into the design and performance of this new class of attitude control systems. Based on these results and developed principles, we have developed useful lumped-parameter models to use in further system refinement. 