Title: A Deep Reinforcement Learning Approach to Sensor Placement under Uncertainty
Optimal sensor placement is critical for enhancing the effectiveness of monitoring dynamical systems. Deterministic solutions do not reflect the effects of input and parameter uncertainty on the sensor placement. Using a Markov decision process (MDP) and a sensor placement agent, this study proposes a stochastic approach to maximize the gain from placing a fixed number of sensors within the system. Utilizing Deep Reinforcement Learning (DRL), the agent is trained by collecting interactive samples within the environment, whose information-theoretic reward function measures, via Shannon entropy, the identifiability of the model parameters. The goal of the agent is to maximize its expected future reward by selecting, at each step, the action (placing a sensor) that provides the most information. This framework is validated using a synthetic model of a base-isolated structure. To account for the existing uncertainty in the parameters, a prior probability distribution is chosen (e.g., based on expert judgement or a preliminary study) for each model parameter. Further, a probabilistic model of the input is used to reflect input variability. In a Deep Q-network, a type of DRL algorithm, the agent learns a mapping from states (i.e., sensor configurations) to the "quality" of each action at that state, called "Q-values". This network is trained on samples of states, actions, and rewards gathered by interacting with the environment. The modularity of the framework and the function approximation used in this study make it scalable to complex real-world sensor placement problems in the presence of uncertainties.
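As a rough, hypothetical illustration of this framework (not the authors' code), the sketch below trains a small Deep Q-network whose states are binary sensor-placement vectors and whose actions place one sensor per step. The per-location information values, network sizes, and hyperparameters are assumptions standing in for the paper's entropy-based reward and experimental setup.

```python
import random
import numpy as np
import torch
import torch.nn as nn

N_LOCATIONS, BUDGET = 8, 3          # candidate locations, sensors to place
rng = np.random.default_rng(0)
loc_info = rng.random(N_LOCATIONS)  # hypothetical per-location information value

def reward(state, action):
    # Stand-in for the Shannon-entropy-based identifiability gain in the paper:
    # placing on an empty location earns its (assumed) information value.
    return float(loc_info[action]) if state[action] == 0 else -1.0

qnet = nn.Sequential(nn.Linear(N_LOCATIONS, 32), nn.ReLU(),
                     nn.Linear(32, N_LOCATIONS))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

for episode in range(300):
    state = np.zeros(N_LOCATIONS, dtype=np.float32)
    for step in range(BUDGET):
        if random.random() < 0.1:   # epsilon-greedy exploration
            action = random.randrange(N_LOCATIONS)
        else:
            with torch.no_grad():
                action = int(qnet(torch.from_numpy(state)).argmax())
        r = reward(state, action)
        nxt = state.copy()
        nxt[action] = 1.0
        done = step == BUDGET - 1
        with torch.no_grad():       # one-step TD target
            target = r if done else r + 0.99 * qnet(torch.from_numpy(nxt)).max().item()
        loss = (qnet(torch.from_numpy(state))[action] - target) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
        state = nxt

# Greedy rollout of the learned placement policy (toy; may revisit a location)
state = np.zeros(N_LOCATIONS, dtype=np.float32)
for _ in range(BUDGET):
    with torch.no_grad():
        a = int(qnet(torch.from_numpy(state)).argmax())
    state[a] = 1.0
print("chosen locations:", np.flatnonzero(state))
```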
Award ID(s):
1663667
PAR ID:
10654015
Author(s) / Creator(s):
Publisher / Repository:
IFAC
Date Published:
Journal Name:
IFAC-PapersOnLine
Volume:
55
Issue:
27
ISSN:
2405-8963
Page Range / eLocation ID:
178 to 183
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reliance on human experience in existing operations of airborne Light Detection and Ranging (LIDAR) systems, together with off-line processing of the collected LIDAR data, makes the acquisition of airborne LIDAR point clouds less adaptable to environmental conditions. This work develops a deep reinforcement learning-enabled framework for adaptive airborne LIDAR point cloud acquisition. Namely, the optimization of the airborne LIDAR operation is modeled as a Markov decision process (MDP). A set of LIDAR point cloud processing methods is proposed to derive the state space, action space, and reward function of the MDP model. A DRL algorithm, Deep Q-Network (DQN), is used to solve the MDP. The DRL model is trained in a flexible virtual environment using the AirSim simulator. Extensive simulations demonstrate the efficiency of the proposed framework.
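As a purely illustrative companion to abstract 1 (not the authors' code), a gym-style MDP skeleton for adaptive acquisition might look like the following; the state, action, and reward definitions here are toy assumptions, whereas the paper derives them from its own LIDAR point cloud processing methods.

```python
import numpy as np

class LidarAcquisitionEnv:
    """Toy MDP: the agent tunes a scan setting to maximize coverage gain."""

    def __init__(self, n_settings=5):
        self.n_settings = n_settings
        self.rng = np.random.default_rng(0)
        self.reset()

    def reset(self):
        self.coverage = 0.0
        self.t = 0
        return self._state()

    def _state(self):
        # Assumed state: current coverage fraction and normalized time.
        return np.array([self.coverage, self.t / 10.0], dtype=np.float32)

    def step(self, action):
        # Assumed reward: marginal point-cloud coverage gained by the chosen
        # setting (a placeholder for the paper's processed-LIDAR reward).
        gain = self.rng.random() * (action + 1) / self.n_settings
        self.coverage = min(1.0, self.coverage + 0.1 * gain)
        self.t += 1
        done = self.t >= 10
        return self._state(), gain, done, {}

env = LidarAcquisitionEnv()
state, done = env.reset(), False
while not done:                      # random policy, just to exercise the MDP
    state, r, done, _ = env.step(np.random.randint(env.n_settings))
print("final coverage:", round(float(state[0]), 3))
```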
  2. In this work, we propose an energy-adaptive monitoring system for a solar sensor-based smart animal farm (e.g., cattle). The proposed smart farm system aims to maintain high-quality monitoring services by solar sensors with limited and fluctuating energy against a full set of cyberattack behaviors, including false data injection, message dropping, and protocol non-compliance. We leverage Subjective Logic (SL) as the belief model to consider different types of uncertainty in opinions about the sensed data. We develop two Deep Reinforcement Learning (DRL) schemes leveraging the design concept of uncertainty maximization in SL for DRL agents running on gateways to collect high-quality sensed data with low uncertainty and high freshness. We assess the performance of the proposed energy-adaptive smart farm system in terms of accumulated reward, monitoring error, system overload, and battery maintenance level. We compare the performance of the two DRL schemes developed (i.e., multi-agent deep Q-learning, MADQN, and multi-agent proximal policy optimization, MAPPO) with greedy and random baseline schemes for choosing the set of sensed data to be updated, with the aim of collecting high-quality sensed data and achieving resilience against attacks. Our experiments demonstrate that MAPPO with the uncertainty maximization technique outperforms its counterparts.
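For abstract 2, the mapping from evidence to a Subjective Logic binomial opinion is standard: positive and negative evidence counts map to belief, disbelief, and uncertainty that sum to one. The sketch below shows that mapping and a toy uncertainty-maximizing selection; the sensor names and evidence counts are illustrative assumptions.

```python
def sl_opinion(pos, neg, W=2.0):
    """Binomial SL opinion from evidence counts: returns (belief, disbelief,
    uncertainty), which sum to 1. W is the non-informative prior weight."""
    total = pos + neg + W
    return pos / total, neg / total, W / total

# Hypothetical (positive, negative) evidence per sensor.
readings = {"sensor_a": (10, 1), "sensor_b": (2, 2), "sensor_c": (0, 0)}

# Uncertainty maximization: poll the sensor whose opinion is most uncertain,
# mirroring the DRL agents' drive to reduce uncertainty in the sensed data.
most_uncertain = max(readings, key=lambda s: sl_opinion(*readings[s])[2])
print(most_uncertain)  # sensor_c: no evidence yet, so uncertainty = 1.0
```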
  3. Recent research has highlighted the effectiveness of advanced building controls in reducing the energy consumption of heating, ventilation, and air-conditioning (HVAC) systems. Among advanced building control strategies, deep reinforcement learning (DRL) control shows the potential to achieve energy savings for HVAC systems and has emerged as a promising strategy. However, training DRL requires an interactive environment for the agent, which is challenging to provide with real buildings due to time and response speed constraints. To address this challenge, a simulation environment serving as a training environment is needed, even though the DRL algorithm does not necessarily need a model. Error between the model and the real building is inevitable in this process and may influence the efficiency of the DRL controller. To investigate the impact of model error, a virtual testbed was established. A high-fidelity Modelica-based model was developed to serve as the virtual building. Three reduced-order models (ROMs) (i.e., 3R2C, Light Gradient Boosting Machine (LightGBM), and artificial neural network (ANN) models) were trained with historical data generated from the virtual building and were embedded in the DRL training environments. The sensitivity of the ROMs and the Modelica model to random and periodic actions was tested and compared. Deploying a policy trained in a ROM-based environment (standing in for a surrogate model) into the Modelica-based virtual building testing environment (standing in for the real building) is a practical approach to implementing DRL control. The performance of this practical DRL controller is compared with rule-based control (RBC) and with an ideal DRL controller that was both trained and deployed in the virtual building environment. In the final, best-reward episode of the case study, the 3R2C-, LightGBM-, and ANN-based DRL controllers outperform the RBC by 7.4%, 14.4%, and 11.4%, respectively, in terms of reward (a weighted sum of energy cost, temperature violations, and the slew rate of the control signal), but fall short of the ideal Modelica-based DRL controller, which outperforms RBC by 29.5%. The DRL controllers based on data-driven models are highly unstable, with higher maximum rewards but much lower average rewards, which might be caused by significant prediction defects in certain action regions of the data-driven models.
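To ground the 3R2C reduced-order model mentioned in abstract 3, the following is a toy two-state RC thermal network (indoor air and envelope capacitances linked by three resistances) stepped with Euler integration; all resistance, capacitance, and load values are illustrative assumptions, not the paper's calibrated parameters.

```python
def step_3r2c(T_in, T_env, T_out, q_hvac, dt=60.0,
              R1=0.005, R2=0.005, R3=0.02, C_in=5e6, C_env=2e7):
    """One Euler step of a toy 3R2C model.
    R* in K/W, C* in J/K, q_hvac in W, dt in s (all values assumed)."""
    dT_in = ((T_env - T_in) / R1 + (T_out - T_in) / R3 + q_hvac) / C_in
    dT_env = ((T_in - T_env) / R1 + (T_out - T_env) / R2) / C_env
    return T_in + dt * dT_in, T_env + dt * dT_env

# One simulated hour with 1 kW of heating and -5 C outdoors.
T_in, T_env = 21.0, 18.0
for _ in range(60):
    T_in, T_env = step_3r2c(T_in, T_env, T_out=-5.0, q_hvac=1000.0)
print(round(T_in, 2))
```

In a DRL training loop, a model like this plays the role of the environment: the agent's HVAC action sets q_hvac, and the resulting temperatures feed the reward.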
  4. This paper proposes Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. Phy-DRL has three distinctive invariant-embedding designs: i) a residual action policy (i.e., integrating a data-driven DRL action policy and a physics-model-based action policy), ii) an automatically constructed safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of the critic and actor networks with physics knowledge about the action-value function and action policy. Finally, we evaluate Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guaranteed safety, in contrast to purely data-driven DRL and solely model-based designs, while requiring remarkably fewer learning parameters and training faster toward the safety guarantee.
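The residual action policy in abstract 4 composes a physics-model-based action with a learned correction. Below is a minimal sketch of that composition for a cart-pole-like 4-state system; the gain vector K and the residual network are hypothetical placeholders, not the paper's design.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stabilizing state-feedback gain (e.g., from an LQR design).
K = np.array([1.0, 1.5, 20.0, 3.0])

# Learned residual policy (untrained here; in Phy-DRL it is trained by DRL).
residual_net = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))

def phy_drl_action(state):
    a_model = -float(K @ state)  # physics-model-based action
    with torch.no_grad():
        a_learned = residual_net(torch.tensor(state, dtype=torch.float32)).item()
    return a_model + a_learned   # residual composition: model term + learned term

print(phy_drl_action(np.array([0.1, 0.0, 0.05, 0.0])))
```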
  5. We consider the problem of spectrum sharing by multiple cellular operators. We propose a novel deep Reinforcement Learning (DRL)-based distributed power allocation scheme that utilizes the multi-agent Deep Deterministic Policy Gradient (MA-DDPG) algorithm. In particular, we model the base stations (BSs) that belong to the multiple operators sharing the same band as DRL agents that simultaneously determine the transmit powers to their scheduled user equipment (UE) in a synchronized manner. The power decision of each BS is based on its own observation of the radio frequency (RF) environment, which consists of interference measurements reported by the UEs it serves, and on a limited amount of information obtained from other BSs. One advantage of the proposed scheme is that it addresses the single-agent non-stationarity problem of RL in the multi-agent scenario by incorporating the actions and observations of other BSs into each BS's own critic, helping it gain a more accurate perception of the overall RF environment. A centralized-training-distributed-execution framework is used to train the policies: the critics are trained over the joint actions and observations of all BSs, while the actor of each BS takes only the local observation as input to produce the transmit power. Simulations in the 6 GHz Unlicensed National Information Infrastructure (U-NII)-5 band show that the proposed power allocation scheme achieves better throughput performance than several state-of-the-art approaches.
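The centralized-training-distributed-execution split in abstract 5 can be sketched schematically: each BS actor sees only its local observation, while the critic scores the joint observations and actions of all BSs. The network shapes and dimensions below are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

N_BS, OBS_DIM, ACT_DIM = 3, 8, 1  # assumed: 3 base stations, 8-dim local obs

# Actor: local observation -> transmit power in [0, 1] (distributed execution).
actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACT_DIM), nn.Sigmoid())

# Critic: joint observations and actions of all BSs -> Q-value (centralized training).
critic = nn.Sequential(nn.Linear(N_BS * (OBS_DIM + ACT_DIM), 64), nn.ReLU(),
                       nn.Linear(64, 1))

local_obs = torch.rand(N_BS, OBS_DIM)                            # one obs per BS
actions = torch.cat([actor(o.unsqueeze(0)) for o in local_obs])  # each BS acts locally
joint = torch.cat([local_obs.flatten(), actions.flatten()])      # critic sees everything
q_value = critic(joint.unsqueeze(0))
print(actions.squeeze(-1), q_value.item())
```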