The Open Radio Access Network (RAN) paradigm is transforming cellular networks into a system of disaggregated, virtualized, and software-based components. These self-optimize the network through programmable, closed-loop control, leveraging Artificial Intelligence (AI) and Machine Learning (ML) routines. In this context, Deep Reinforcement Learning (DRL) has shown great potential in addressing complex resource allocation problems. However, DRL-based solutions are inherently hard to explain, which hinders their deployment and use in practice. In this paper, we propose EXPLORA, a framework that provides explainability of DRL-based control solutions for the Open RAN ecosystem. EXPLORA synthesizes network-oriented explanations based on an attributed graph that produces a link between the actions taken by a DRL agent (i.e., the nodes of the graph) and the input state space (i.e., the attributes of each node). This novel approach allows EXPLORA to explain models by providing information on the wireless context in which the DRL agent operates. EXPLORA is also designed to be lightweight for real-time operation. We prototype EXPLORA and test it experimentally on an O-RAN-compliant near-real-time RIC deployed on the Colosseum wireless network emulator. We evaluate EXPLORA for agents trained for different purposes and showcase how it generates clear network-oriented explanations. We also show how explanations can be used to perform informative and targeted intent-based action steering and achieve median transmission bitrate improvements of 4% and tail improvements of 10%.
more »
« less
An offline multi‐scale unsaturated poromechanics model enabled by self‐designed/self‐improved neural networks
Abstract Supervised machine learning via artificial neural network (ANN) has gained significant popularity for many geomechanics applications that involves multi‐phase flow and poromechanics. For unsaturated poromechanics problems, the multi‐physics nature and the complexity of the hydraulic laws make it difficult to design the optimal setup, architecture, and hyper‐parameters of the deep neural networks. This paper presents a meta‐modeling approach that utilizes deep reinforcement learning (DRL) to automatically discover optimal neural network settings that maximize a pre‐defined performance metric for the machine learning constitutive laws. This meta‐modeling framework is cast as a Markov Decision Process (MDP) with well‐defined states (subsets of states representing the proposed neural network (NN) settings), actions, and rewards. Following the selection rules, the artificial intelligence (AI) agent, represented in DRL via NN, self‐learns from taking a sequence of actions and receiving feedback signals (rewards) within the selection environment. By utilizing the Monte Carlo Tree Search (MCTS) to update the policy/value networks, the AI agent replaces the human modeler to handle the otherwise time‐consuming trial‐and‐error process that leads to the optimized choices of setup from a high‐dimensional parametric space. This approach is applied to generate two key constitutive laws for the unsaturated poromechanics problems: (1) the path‐dependent retention curve with distinctive wetting and drying paths. (2) The flow in the micropores, governed by an anisotropic permeability tensor. Numerical experiments have shown that the resultant ML‐generated material models can be integrated into a finite element (FE) solver to solve initial‐boundary‐value problems as replacements of the hand‐craft constitutive laws.
more »
« less
- PAR ID:
- 10386151
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- International Journal for Numerical and Analytical Methods in Geomechanics
- Volume:
- 45
- Issue:
- 9
- ISSN:
- 0363-9061
- Format(s):
- Medium: X Size: p. 1212-1237
- Size(s):
- p. 1212-1237
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Recent research has highlighted the effectiveness of advanced building controls in reducing the energy consumption of heating, ventilation, and air-conditioning (HVAC) systems. Among advanced building control strategies, deep reinforcement learning control (DRL) shows the potential to achieve energy savings for HVAC systems and has emerged as a promising strategy. However, training DRL requires an interactive environment for the agent, which is challenging to achieve with real buildings due to time and response speed constraints. To address this challenge, a simulation environment serving as a training environment is needed, even though the DRL algorithm does not necessarily need a model. The error between the model and the real building is inevitable in this process, which may influence the efficiency of the DRL controller. To investigate the impact of model error, a virtual testbed was established. A high- fidelity Modelica-based model is developed serving as the virtual building. Three reduced-order models (ROMs) (i.e., 3R2C, Light Gradient Boosting Machine (LightGBM) and artificial neural network (ANN) models) were trained with the historical data generated from the virtual building and were embedded in the training environments of DRL. The sensitivity of ROMs and the Modelica model to random and periodical actions were tested and compared. Deploying the policy trained based on a ROM-based environment, which stands for a surrogate model in reality, into the Modelica-based virtual building testing environment, which stands for real-building, is a practical approach to implementing the DRL control. The performance of the practical DRL controller is compared with rule-based control (RBC) and an ideal DRL controller which was trained and deployed both in the virtual building environment. In the final episode with best rewards of the case study, the 3R2C, LightGBM, and ANN-based DRL outperform the RBC by 7.4%, 14.4%, and 11.4%, respectively in terms of the reward, comprising the weighted sum of energy cost, temperature violations, and the slew rate of the control signal, but falls short of the ideal Modelica-based DRL controller which outperforms RBC by 29.5%. The DRL controllers based on data-driven models are highly unstable with higher maximum rewards but much lower average rewards which might be caused by the significant prediction defect in certain action regions of the data-driven model.more » « less
-
null (Ed.)Conventionally, neural network constitutive laws for path-dependent elasto-plastic solids are trained via supervised learning performed on recurrent neural network, with the time history of strain as input and the stress as input. However, training neural network to replicate path-dependent constitutive responses require significant more amount of data due to the path dependence. This demand on diverse and abundance of accurate data, as well as the lack of interpretability to guide the data generation process, could become major roadblocks for engineering applications. In this work, we attempt to simplify these training processes and improve the interpretability of the trained models by breaking down the training of material models into multiple supervised machine learning programs for elasticity, initial yielding and hardening laws that can be conducted sequentially. To predict pressure-sensitivity and rate dependence of the plastic responses, we reformulate the Hamliton-Jacobi equation such that the yield function is parametrized in the product space spanned by the principle stress, the accumulated plastic strain and time. To test the versatility of the neural network meta-modeling framework, we conduct multiple numerical experiments where neural networks are trained and validated against (1) data generated from known benchmark models, (2) data obtained from physical experiments and (3) data inferred from homogenizing sub-scale direct numerical simulations of microstructures. The neural network model is also incorporated into an offline FFT-FEM model to improve the efficiency of the multiscale calculations.more » « less
-
Animal brains evolved to optimize behavior in dynamic environments, flexibly selecting actions that maximize future rewards in different contexts. A large body of experimental work indicates that such optimization changes the wiring of neural circuits, appropriately mapping environmental input onto behavioral outputs. A major unsolved scientific question is how optimal wiring adjustments, which must target the connections responsible for rewards, can be accomplished when the relation between sensory inputs, action taken, environmental context with rewards is ambiguous. The credit assignment problem can be categorized into context-independent structural credit assignment and context-dependent continual learning. In this perspective, we survey prior approaches to these two problems and advance the notion that the brain’s specialized neural architectures provide efficient solutions. Within this framework, the thalamus with its cortical and basal ganglia interactions serves as a systems-level solution to credit assignment. Specifically, we propose that thalamocortical interaction is the locus of meta-learning where the thalamus provides cortical control functions that parametrize the cortical activity association space. By selecting among these control functions, the basal ganglia hierarchically guide thalamocortical plasticity across two timescales to enable meta-learning. The faster timescale establishes contextual associations to enable behavioral flexibility while the slower one enables generalization to new contexts.more » « less
-
This paper proposes the Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. The Phy-DRL has three distinguished invariant-embedding designs: i) residual action policy (i.e., integrating data-driven-DRL action policy and physics-model-based action policy), ii) automatically constructed safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, the Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of critic and actor networks with physics knowledge about the action-value function and action policy. Finally, we evaluate the Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guarantee safety compared to purely data-driven DRL and solely model-based design, while offering remarkably fewer learning parameters and fast training towards safety guarantee.more » « less
An official website of the United States government
