The University of Illinois, in collaboration with NASA Jet Propulsion Laboratory (JPL) and NASA Ames Research
Center, has developed a novel Attitude Control System (ACS) called the Strain Actuated Solar Arrays (SASA),
with sub-milli-arcsecond pointing capability. SASA uses strain-producing actuators to deform flexible deployable
structures, and the resulting reaction forces rotate the satellite. This momentum transfer strategy is used for jitter
reduction and small-angle slew maneuvers. The system is currently at a Technology Readiness Level of 4-5 and has
an upcoming demonstration flight on the CAPSat CubeSat mission. An extension to the SASA concept, known as
Multifunctional Structures for Attitude Control (MSAC), enables arbitrarily large-angle slew maneuvers in addition
to jitter cancellation. MSAC can potentially replace reaction wheels and control moment gyroscopes for attitude
control systems, thereby eliminating a key source of jitter noise. Both SASA and MSAC are more reliable than conventional ACSs, with fewer failure modes and lower failure rates, while requiring less overall mass, volume, and power. The paper discusses the advantages of using SASA and MSAC for a wide range of
spacecraft and mission classes.
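As a back-of-the-envelope illustration of the momentum-transfer strategy described in the abstract above, conservation of angular momentum relates a panel deflection to the resulting bus rotation. The inertia values below are assumed for illustration only, not mission values:

```python
# Hypothetical illustration of the momentum-transfer principle behind SASA:
# deflecting a solar panel produces a reaction torque on the bus, and by
# conservation of angular momentum the bus rotates opposite to the panel.
# All numbers are assumed for illustration.

I_bus = 10.0    # bus moment of inertia about the slew axis [kg m^2]
I_panel = 0.5   # effective panel inertia about the same axis [kg m^2]

def bus_rotation(panel_angle_rad: float) -> float:
    """Bus rotation that conserves total angular momentum (initially zero)."""
    # I_bus * theta_bus + I_panel * theta_panel = 0
    return -I_panel / I_bus * panel_angle_rad

# A 1 mrad panel deflection yields a small opposite bus rotation,
# about -5e-5 rad (roughly -10 arcseconds).
theta = bus_rotation(1e-3)
```

This also shows why the approach suits jitter cancellation and small-angle slews: the achievable bus rotation per stroke scales with the panel-to-bus inertia ratio.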
Reinforcement Learning for Spacecraft Attitude Control
Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered
non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal
control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are
presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the
applied torques, representing thruster-based ACSs. Second, an attitude control problem with reaction wheel based
ACS is considered, which has more constraints on control authority. The agent is trained using the Proximal Policy
Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite
is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is
trained using curriculum learning. We compare the RL based controller to a QRF (quaternion rate feedback) attitude
controller, a well-established state feedback control strategy. We investigate the nominal performance and robustness
with respect to uncertainty in system dynamics. Our RL based attitude control agent adapts to any spacecraft mass
without needing to re-train. Across the range of 0.1 to 100,000 kg, our agent achieves 2% better performance than a QRF
controller tuned for the same mass range, and performance similar to that of a QRF controller tuned specifically for a
given mass. For the reaction wheel based ACS, the trained RL agent achieved a reward 10 times higher
than that of a tuned QRF controller.
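For readers unfamiliar with the baseline, a minimal sketch of a quaternion rate feedback (QRF) law with torque saturation (the thruster constraint mentioned above) might look like the following. The gains and torque limit are illustrative assumptions, not the tuned values from the paper:

```python
import numpy as np

# Minimal sketch of a quaternion rate feedback (QRF) attitude controller,
# the state-feedback baseline the RL agent is compared against.
# Kp, Kd, and tau_max are assumed illustrative values.

def qrf_torque(q_err, omega, Kp=1.0, Kd=2.0, tau_max=0.1):
    """q_err = [qx, qy, qz, qw] error quaternion (scalar-last); omega in rad/s."""
    q_vec, q_w = np.asarray(q_err[:3]), q_err[3]
    # Using the sign of the scalar part avoids the long-way-around
    # ("unwinding") rotation.
    u = -Kp * np.sign(q_w) * q_vec - Kd * np.asarray(omega)
    # Saturation models the torque limits considered in the paper.
    return np.clip(u, -tau_max, tau_max)

# Small attitude error about x with some body rate: the commanded torque
# saturates at the -0.1 N m limit on that axis.
u = qrf_torque(np.array([0.01, 0.0, 0.0, 0.9999]), np.array([0.05, 0.0, 0.0]))
```

Because the gains multiply the (unknown) inertia in closed loop, a fixed-gain QRF must be re-tuned across large mass ranges, which is the gap the mass-agnostic RL agent targets.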
- Award ID(s):
- 1653118
- PAR ID:
- 10156483
- Date Published:
- Journal Name:
- 70th International Astronautical Congress
- Page Range / eLocation ID:
- IAC–19–C1.5.2
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Tamim Asfour (Ed.): A reinforcement learning (RL) control policy could fail in a new/perturbed environment that is different from the training environment, due to the presence of dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach to robustifying a pre-trained RL policy by augmenting it with an L1 adaptive controller (L1AC). Leveraging the capability of an L1AC for fast estimation and active compensation of dynamic variations, the proposed approach can improve the robustness of an RL policy which is trained either in a simulator or in the real world without consideration of a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and model-based methods.
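The add-on architecture can be sketched for a scalar system: the commanded input is the RL action plus an L1 correction built from a state predictor, an adaptation law, and a low-pass filter. This is a simplified gradient-adaptation variant with assumed gains, not the paper's exact L1AC formulation:

```python
# Sketch of the L1 add-on: u_total = u_RL + u_L1, where u_L1 estimates and
# cancels a lumped dynamic variation. Scalar plant, gains, and filter
# bandwidth are illustrative assumptions (a simplified gradient-adaptation
# variant, not the piecewise-constant law of a full L1AC).

class L1Augmentation:
    def __init__(self, a_s=10.0, gamma=50.0, bandwidth=5.0, dt=0.01):
        self.a_s = a_s        # predictor error feedback gain
        self.gamma = gamma    # adaptation gain
        self.w = bandwidth    # low-pass filter bandwidth of the L1 input
        self.dt = dt
        self.x_hat = 0.0      # state predictor
        self.sigma_hat = 0.0  # estimate of the lumped dynamic variation
        self.u_l1 = 0.0       # filtered cancellation input

    def step(self, x, u_rl, f_nominal):
        e = self.x_hat - x
        # Gradient adaptation: attribute the prediction error to the
        # lumped disturbance estimate.
        self.sigma_hat -= self.dt * self.gamma * e
        # Low-pass filter the cancellation signal (forward-Euler step);
        # the filter decouples fast adaptation from the control channel.
        self.u_l1 += self.dt * self.w * (-self.sigma_hat - self.u_l1)
        # Predictor: nominal dynamics + total input + estimate, with
        # error feedback that keeps the prediction error small.
        self.x_hat += self.dt * (f_nominal(x) + u_rl + self.u_l1
                                 + self.sigma_hat - self.a_s * e)
        return u_rl + self.u_l1

# usage per control step: u_total = aug.step(x, policy(x), f_nominal)
```

The key design point carried over from the paper is that the pre-trained policy is untouched; only its output is corrected, so the same augmentation applies to model-free and model-based policies alike.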
-
This article presents a utilization of viscoelastic damping to reduce control system complexity for strain-actuated solar array (SASA) based spacecraft attitude control systems (ACSs). SASA utilizes intelligent structures for attitude control, and is a promising next-generation spacecraft ACS technology with the potential to achieve unprecedented levels of pointing accuracy and jitter reduction during key scientific observation periods. The current state-of-the-art SASA implementation utilizes piecewise modeling of distributed piezoelectric (PZT) actuators, resulting in a monolithic structure with the potential for enhanced ACS reliability. PZT actuators can operate at high frequencies, which enables active vibration damping to achieve ultra-quiet operation for sensitive instruments. Relying on active damping alone, however, requires significant control system complexity, which has so far limited adoption of intelligent structures in spacecraft control systems. Here we seek to understand how to modify passive system design in strategic ways to reduce control system complexity while maintaining high performance. An integrated physical and control system design (codesign) optimization strategy is employed to ensure system-optimal performance, and to help understand design coupling between passive physical aspects of design and active control system design. In this study, we present the possibility of utilizing viscoelastic material distributed throughout the SASA substructure to provide tailored passive damping, intending to reduce control system complexity. At this early phase of study, the effect of temperature variation on material behavior is not considered; the study focuses instead on the design coupling between distributed material and control systems. The spatially-distributed design of both elastic and viscoelastic material in the SASA substructure is considered in an integrated manner. 
An approximate model is used that balances predictive accuracy and computational efficiency. This model approximates the distributed compliant SASA structure using a series of rigid links connected by generalized torsional springs and dampers. This multi-link pseudo-rigid-body dynamic model (PRBDM) with lumped viscoelastic damping models is derived, and is used in numerical co-design studies to quantify the tradeoffs and benefits of using distributed passive damping to reduce the complexity of SASA control systems.
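A one-link version of the PRBDM idea, with the viscoelastic material lumped into a torsional damper, can be sketched as follows; all parameter values are illustrative assumptions, not values from the study:

```python
# One-link pseudo-rigid-body dynamic model (PRBDM) sketch: the compliant
# array is approximated by a rigid link on a torsional spring (elastic
# substructure) with a lumped torsional damper (viscoelastic material).
# Parameter values below are assumed for illustration.

I = 0.05   # link inertia about its hinge [kg m^2]
k = 2.0    # torsional spring stiffness [N m / rad]
c = 0.1    # lumped viscoelastic damping [N m s / rad]

def simulate(theta0, dt=1e-3, steps=5000):
    """Free decay of one pseudo-rigid link: I*th'' + c*th' + k*th = 0."""
    th, w = theta0, 0.0
    for _ in range(steps):
        # semi-implicit Euler keeps the oscillation numerically stable
        w += dt * (-(k * th + c * w) / I)
        th += dt * w
    return th

# Passive damping attenuates the initial deflection with no control effort,
# which is the mechanism used to reduce active control system complexity.
final = simulate(0.1)
```

In the co-design setting, the spatial distribution of `k` and `c` along the links becomes a design variable optimized jointly with the controller; this sketch shows only the lumped dynamics of a single joint.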
-
In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-output linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model based CBF-CLF-QP, resulting in the Reinforcement Learning based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty.
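The safety-filtering idea behind the CBF constraint in the QP can be sketched in its simplest scalar form, where the QP has a closed-form solution. The single-integrator dynamics, barrier, and class-K gain below are illustrative assumptions, not the bipedal-robot formulation from the paper:

```python
# Minimal sketch of the CBF-QP filtering idea underlying a CBF-CLF-QP:
# the QP minimally modifies a nominal command so the barrier condition
# h_dot >= -alpha * h holds. For a single integrator x' = u with
# h(x) = x - x_min, the scalar QP reduces to a one-sided clip.

def cbf_filter(x, u_nom, x_min=0.0, alpha=1.0):
    """Solve: min (u - u_nom)^2  s.t.  u >= -alpha * (x - x_min)."""
    h = x - x_min
    u_safe_lb = -alpha * h        # lower bound imposed by the CBF
    # If the constraint is inactive, keep the nominal command;
    # otherwise the optimal u sits exactly on the bound.
    return max(u_nom, u_safe_lb)

# Far from the boundary the nominal command passes through unchanged:
u_far = cbf_filter(x=2.0, u_nom=-0.5)    # -0.5 (constraint inactive)
# Near the boundary the filter overrides to keep h(x) >= 0:
u_near = cbf_filter(x=0.1, u_nom=-0.5)   # -0.1 (clipped at -alpha*h)
```

In the paper's setting the constraint coefficients depend on the uncertain dynamics, which is exactly what the RL policy learns to correct before the QP is solved.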
-
Abstract
Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges for developing such a robust controller is to handle different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers either are patient-condition specific or involve tuning of many control parameters, which could behave unreliably and even fail to maintain balance.
Methods: We present a novel, deep neural network, reinforcement learning-based robust controller for a LLRE based on a decoupled offline human-exoskeleton simulation training with three independent networks, which aims to provide reliable walking assistance against various and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE's proprioceptive signals, including joint kinematic states, and subsequently predicts real-time position control targets for the actuated joints. To handle uncertain human interaction forces, the control policy is trained intentionally with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected with the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy to different human conditions, we employ domain randomization during training that includes not only randomization of exoskeleton dynamics properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient's disability. Through this decoupled deep reinforcement learning framework, the trained controller of LLREs is able to provide reliable walking assistance to patients with different degrees of neuromuscular disorders without any control parameter tuning.
Results and conclusion: A universal, RL-based walking controller is trained and virtually tested on a LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegic), muscle weakness, or hemiplegic conditions without any control parameter tuning. Analysis of the RMSE for joint tracking, CoP-based stability, and gait symmetry shows the effectiveness of the controller. An ablation study also demonstrates the strong robustness of the control policy under large exoskeleton dynamic property ranges and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer since it uses only proprioception information of the LLRE (joint sensory state) as the input. Furthermore, the controller is shown to be able to handle different patient conditions without the need for patient-specific control parameter tuning.
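The domain-randomization step described above amounts to sampling exoskeleton dynamics and patient-condition parameters anew for each training episode. A sketch with assumed parameter ranges (not the paper's values) is:

```python
import random

# Sketch of per-episode domain randomization: each training episode samples
# exoskeleton dynamics properties and human muscle strength scales so the
# policy sees the variability of patient conditions. All ranges and
# parameter names are illustrative assumptions.

def sample_episode_params(rng=random):
    return {
        # exoskeleton dynamics randomization
        "link_mass_scale": rng.uniform(0.8, 1.2),
        "joint_friction": rng.uniform(0.0, 0.2),
        # human-condition randomization: 0 ~ passive muscles (quadriplegic),
        # 1 ~ full strength; one-sided weakness models hemiplegia
        "muscle_strength_left": rng.uniform(0.0, 1.0),
        "muscle_strength_right": rng.uniform(0.0, 1.0),
    }

# hypothetical training loop (environment API assumed):
#     for episode in range(num_episodes):
#         env.reset(**sample_episode_params())
#         ...rollout and update the policy as usual...
```

Randomizing muscle strength, rather than only the robot's own dynamics, is what lets a single policy cover quadriplegic, weakened, and hemiplegic conditions without per-patient retuning.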