Title: Adaptive Neuroevolution With Genetic Operator Control and Two-Way Complexity Variation
Topology and weight evolving artificial neural network (TWEANN) algorithms optimize the structure and weights of artificial neural networks (ANNs) simultaneously. The resulting networks are typically used as policy models for solving control and reinforcement learning (RL) problems. This paper presents a neuroevolution algorithm that aims to address the stagnation and sluggish convergence issues common to other neuroevolution algorithms. These issues are often caused by inadequacies in population diversity preservation, exploration/exploitation balance, and search flexibility. The new algorithm, called Adaptive Genomic Evolution of Neural-Network Topologies (AGENT), builds on the neuroevolution of augmenting topologies (NEAT) concept. Novel mechanisms for adapting the selection and mutation operations are proposed to favorably control population diversity and the exploration/exploitation balance. The former is founded on a fundamentally new way of quantifying diversity that takes a graph-theoretic perspective of the population of genomes and inter-genomic differences. Further advancements to the NEAT paradigm come through the incorporation of variable neuronal properties and new mutation operations that uniquely allow both the growth and pruning of ANN topologies during evolution. Numerical experiments on benchmark control problems adopted from the OpenAI Gym illustrate the competitive performance of AGENT against standard RL methods and adaptive HyperNEAT, and its superiority over the original NEAT algorithm. Further parametric analysis provides key insights into the impact of the new features in AGENT. This is followed by evaluation on an unmanned aerial vehicle collision avoidance problem, where maneuver planning models learned by AGENT achieve a 33% reward improvement over 15 generations.
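The abstract leaves the diversity metric unspecified; below is a minimal sketch of one plausible graph-theoretic reading, assuming genomes are maps from innovation number to connection weight, a NEAT-style compatibility distance, and the total minimum-spanning-tree weight of the complete inter-genome distance graph as the diversity score. All names and coefficients here are illustrative, not AGENT's actual implementation.

```python
import numpy as np

def genomic_distance(g1, g2, c1=1.0, c2=0.4):
    # NEAT-style compatibility distance (illustrative coefficients):
    # count of mismatched genes by innovation number, plus the mean
    # weight difference over matching genes.
    keys1, keys2 = set(g1), set(g2)
    matching = keys1 & keys2
    mismatched = len(keys1 ^ keys2)
    n = max(len(keys1), len(keys2), 1)
    w_diff = np.mean([abs(g1[k] - g2[k]) for k in matching]) if matching else 0.0
    return c1 * mismatched / n + c2 * w_diff

def population_diversity(genomes):
    # Diversity score: total edge weight of the minimum spanning tree of
    # the complete graph whose nodes are genomes and whose edge weights
    # are inter-genomic distances (Prim's algorithm).
    n = len(genomes)
    dist = np.array([[genomic_distance(a, b) for b in genomes] for a in genomes])
    in_tree = [0]
    cost = dist[0].copy()
    total = 0.0
    for _ in range(n - 1):
        cost[in_tree] = np.inf        # never re-select nodes already in the tree
        j = int(np.argmin(cost))
        total += cost[j]
        in_tree.append(j)
        cost = np.minimum(cost, dist[j])
    return total

# Example: three genomes keyed by innovation number.
pop = [{1: 0.5, 2: -0.3}, {1: 0.4, 3: 0.9}, {2: -0.1, 3: 1.2}]
print(population_diversity(pop))
```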
Award ID(s):
2048020
PAR ID:
10427422
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Transactions on Artificial Intelligence
ISSN:
2691-4581
Page Range / eLocation ID:
1 to 14
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains. However, standard off-policy RL algorithms can suffer from several issues, such as instability in Q-learning and difficulty balancing exploration and exploitation. To mitigate these issues, we present SUNRISE, a simple unified ensemble method that is compatible with various off-policy RL algorithms. SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration. By enforcing diversity between agents using bootstrapping with random initialization, we show that these different ideas are largely orthogonal and can be fruitfully integrated to further improve the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks in both low-dimensional and high-dimensional environments.
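    A minimal numpy sketch of the two ingredients as described above; the weight form w = sigmoid(-std * T) + 0.5 follows the SUNRISE paper, while the hyperparameter values here are illustrative.

    ```python
    import numpy as np

    def weighted_bellman_targets(rewards, next_q_ensemble, gamma=0.99, temperature=10.0):
        # next_q_ensemble: (n_ensemble, batch) bootstrapped Q-values at next states.
        # Targets with high ensemble disagreement get weights near 0.5;
        # confident targets get weights near 1.0.
        std = next_q_ensemble.std(axis=0)
        weights = 1.0 / (1.0 + np.exp(std * temperature)) + 0.5  # sigmoid(-std*T) + 0.5
        return weights * (rewards + gamma * next_q_ensemble.mean(axis=0))

    def ucb_action(q_ensemble, lam=1.0):
        # q_ensemble: (n_ensemble, n_actions) Q-values for a single state.
        # Optimism in the face of uncertainty: pick argmax of mean + lam * std.
        mean, std = q_ensemble.mean(axis=0), q_ensemble.std(axis=0)
        return int(np.argmax(mean + lam * std))
    ```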
  2. The open radio access network (O-RAN) architecture introduces RAN intelligent controllers (RICs) to facilitate the management and optimization of the disaggregated RAN. Reinforcement learning (RL) and its advanced form, deep RL (DRL), are increasingly employed to design intelligent controllers, or xApps, deployed in the near-real-time (near-RT) RIC. These models often become trapped in local optima, which raises concerns about their reliability for RAN intelligent control. We therefore introduce Federated O-RAN enabled Neuroevolution (NE)-enhanced DRL (F-ONRL), which deploys an NE-based optimizer xApp in parallel with the RAN controller xApps. This NE-DRL xApp framework enables effective exploration and exploitation in the near-RT RIC without disrupting RAN operations. We implement the NE xApp along with a DRL xApp, deploy them on the Open AI Cellular (OAIC) platform, and present numerical results that demonstrate the improved robustness of the xApps while effectively balancing the additional computational load.
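    How the NE optimizer xApp steers the controller xApp is not detailed in this abstract; the following is a hypothetical sketch of one NE step (simple weight-perturbation evolution scored on a RAN KPI), with `evaluate_kpi` an assumed callback and not F-ONRL's actual update rule.

    ```python
    import numpy as np

    def ne_optimizer_step(policy_weights, evaluate_kpi, pop_size=16, sigma=0.05):
        # Perturb the controller xApp's flattened policy weights, score each
        # candidate on a RAN KPI (e.g., throughput), and keep the best one.
        rng = np.random.default_rng()
        candidates = [policy_weights + sigma * rng.standard_normal(policy_weights.shape)
                      for _ in range(pop_size)]
        scores = [evaluate_kpi(w) for w in candidates]
        return candidates[int(np.argmax(scores))]
    ```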
  3. We propose the Deterministic Sequencing of Exploration and Exploitation (DSEE) algorithm, with interleaved exploration and exploitation epochs, for model-based RL problems that aim to simultaneously learn the system model, i.e., a Markov decision process (MDP), and the associated optimal policy. During exploration, DSEE explores the environment and updates the estimates of the expected reward and transition probabilities. During exploitation, the latest estimates of the expected reward and transition probabilities are used to obtain a robust policy with high probability. We design the lengths of the exploration and exploitation epochs such that the cumulative regret grows as a sub-linear function of time.
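    One plausible DSEE-style schedule (illustrative, not the paper's exact epoch-length design): fixed-length exploration phases interleaved with geometrically growing exploitation phases, so the fraction of time spent exploring vanishes over time.

    ```python
    def dsee_schedule(num_epochs, explore_len=10, growth=2):
        # Interleave fixed-length exploration with geometrically growing
        # exploitation: after k epochs, total time is ~growth**k while total
        # exploration time is only ~k * explore_len, hence sub-linear
        # exploration cost.
        phases, exploit_len = [], explore_len
        for _ in range(num_epochs):
            phases.append(("explore", explore_len))
            phases.append(("exploit", exploit_len))
            exploit_len *= growth
        return phases

    # e.g. dsee_schedule(3) ->
    # [('explore', 10), ('exploit', 10), ('explore', 10), ('exploit', 20),
    #  ('explore', 10), ('exploit', 40)]
    ```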
  4. This paper studies multi-stage systems with end-to-end bandit feedback. In such systems, each job needs to go through multiple stages, each managed by a different agent, before generating an outcome. Each agent can only control its own action and learn the final outcome of the job; it has neither knowledge of nor control over the actions taken by agents in the next stage. The goal of this paper is to develop distributed online learning algorithms that achieve sublinear regret in adversarial environments. This setting significantly expands the traditional multi-armed bandit problem, which considers only one agent and one stage. In addition to the exploration-exploitation dilemma of the traditional multi-armed bandit problem, we show that the consideration of multiple stages introduces a third component, education, where an agent needs to choose its actions to facilitate the learning of agents in the next stage. To solve this newly introduced exploration-exploitation-education trilemma, we propose a simple distributed online learning algorithm, ϵ-EXP3. We theoretically prove that the ϵ-EXP3 algorithm is a no-regret policy that achieves sublinear regret. Simulation results show that the ϵ-EXP3 algorithm significantly outperforms existing no-regret online learning algorithms for the traditional multi-armed bandit problem.
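    A sketch of one natural reading of ϵ-EXP3: standard EXP3 importance-weighted updates with an ϵ uniform-exploration floor that keeps every arm played often enough for downstream agents to learn from (the "education" role). The paper's exact update may differ.

    ```python
    import numpy as np

    class EpsilonEXP3:
        def __init__(self, n_arms, epsilon=0.1, eta=0.1, seed=0):
            self.n, self.epsilon, self.eta = n_arms, epsilon, eta
            self.weights = np.ones(n_arms)
            self.probs = np.ones(n_arms) / n_arms
            self.rng = np.random.default_rng(seed)

        def select(self):
            # Mix the EXP3 distribution with an epsilon uniform floor so every
            # arm keeps being played, sustaining the downstream agents' learning.
            self.probs = ((1 - self.epsilon) * self.weights / self.weights.sum()
                          + self.epsilon / self.n)
            return int(self.rng.choice(self.n, p=self.probs))

        def update(self, arm, reward):
            # Importance-weighted estimate keeps the update unbiased under
            # partial (bandit) feedback; rewards assumed in [0, 1].
            x_hat = reward / self.probs[arm]
            self.weights[arm] *= np.exp(self.eta * x_hat)
    ```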
  5. Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logical deduction and generalization in the representation of a deep RL model makes it less effective at exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis for overcoming these exploration challenges. We develop a novel hierarchical synthesis algorithm with a decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address with conventional RL techniques. Experimental results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation. CCS Concepts: • Software and its engineering → Automatic programming.
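    The hierarchical details of ReGuS are beyond this abstract; as a generic illustration of reward-guided program search (not ReGuS's actual algorithm), the sketch below performs best-first expansion of partial programs ranked by achieved environment reward, with `expand` and `evaluate` as assumed user-supplied hooks.

    ```python
    import heapq
    import itertools

    def reward_guided_search(initial_sketch, expand, evaluate, budget=100):
        # Best-first search over partial programs, ranked by the environment
        # reward achieved by each candidate's current completion.
        counter = itertools.count()  # tie-breaker so heapq never compares programs
        best_prog, best_reward = initial_sketch, evaluate(initial_sketch)
        frontier = [(-best_reward, next(counter), initial_sketch)]
        for _ in range(budget):
            if not frontier:
                break
            _, _, prog = heapq.heappop(frontier)
            for child in expand(prog):  # e.g., fill in a loop body or condition
                r = evaluate(child)
                if r > best_reward:
                    best_prog, best_reward = child, r
                heapq.heappush(frontier, (-r, next(counter), child))
        return best_prog, best_reward
    ```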