<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>RIS-Assisted ABS for Mobile Multi-User MISO Wireless Communications: A Deep Reinforcement Learning Approach</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>06/09/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10566850</idno>
					<idno type="doi">10.1109/ICC51166.2024.10622221</idno>
					
					<author>Walaa AlQwider</author><author>Aly Sabri Abdalla</author><author>Vuk Marojevic</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[In response to the evolving landscape of wireless communication networks and the escalating demand for unprecedented wireless connectivity performance in the forthcoming 6G era, this paper proposes a new 6G architecture to enhance the wireless network's sum rate performance. To this end, we introduce an aerial base station (ABS) network with reconfigurable intelligent surfaces (RISs) while leveraging multi-user multiple-input single-output (MU-MISO) antenna technology. The motivation behind our proposal stems from the imperative to address critical challenges in contemporary wireless networks and harness emerging technologies for substantial performance gains. We employ deep reinforcement learning (DRL) to jointly optimize the ABS trajectories, the active beamforming weights, and the RIS phase shifts. Simulation results show that this joint optimization effectively improves the system's sum rate while meeting minimum quality of service (QoS) requirements for diverse mobile users.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>The rapid growth of wireless communications services has increased the need for advanced wireless technologies. Legacy communication systems often suffer from disruptions in connectivity and inadequate quality of service (QoS) resulting from wireless channel impairments. The reconfigurable intelligent surface (RIS) was introduced <ref type="bibr">[1]</ref> to steer the radio frequency (RF) propagation and, thus, control the channel. An RIS is an engineered, planar meta-surface constructed from numerous passive antenna elements, each of which can be electronically controlled to create controllable radio environments <ref type="bibr">[2]</ref>. The primary knobs of an RIS are its configurable phase shifters, which enable precise control over the RF propagation. By strategically manipulating the phase of incident electromagnetic waves, the RIS can shape the signal path, which can lead to improved coverage and communication quality <ref type="bibr">[3]</ref> compared to conventional passive reflectors or antennas.</p><p>The rapid progress of unmanned aerial vehicle (UAV) technologies has motivated research on integrating UAVs into wireless communication networks <ref type="bibr">[4]</ref>. UAVs, when employed as aerial base stations (ABSs), offer a promising solution to increase coverage or capacity on the fly. ABSs can often establish line-of-sight (LoS) links with ground users, thus enhancing communication reliability <ref type="bibr">[5]</ref>. The integration of ABSs with RISs to enhance the wireless system performance has attracted considerable attention in recent years <ref type="bibr">[6]</ref>. Theoretical research has shown significant improvements by leveraging multi-user multiple-input multiple-output (MU-MIMO) technology for RIS deployments.</p><p>There has been growing interest in using RISs and UAVs for mobile networks.
However, there is a research gap on the integration of RISs with ABSs in conjunction with MU-MISO systems, particularly for scenarios involving user mobility. The authors of <ref type="bibr">[7]</ref> focus on optimizing the passive beamforming of the RIS and designing the trajectory of the ABS in a wireless environment for a single ground user at a fixed location. Reference <ref type="bibr">[6]</ref> introduces an alternating optimization algorithm to address the complex sum rate maximization problem for RIS-assisted UAV networks. This involves optimizing the UAV trajectory, phase shifter design, and resource allocation for an orthogonal frequency division multiple access (OFDMA) system. It is worth noting that the RIS serves only one user at a time and that the user locations are assumed to be fixed. Another related work <ref type="bibr">[8]</ref> tackles the problem of system sum rate maximization in an ABS-assisted network with an RIS. The sum rate is improved by jointly optimizing the RIS's phase shifts and the ABS's altitude, employing a conjugate gradient particle swarm optimization (CG-PSO) scheme.</p><p>This paper stands at the intersection of several key advancements in wireless communication technologies: ABS, RIS, and MU-MIMO systems. The emphasis is on sum rate maximization while adhering to individual user QoS requirements in terms of minimum data rates by jointly optimizing the active beamforming of the MU-MISO system, the RIS phase shifts, and the ABS trajectory in a mobile multi-user scenario. Given the complexity of the problem, we propose applying deep reinforcement learning (DRL). The considered users encompass both vehicular and pedestrian users, exhibiting distinct mobility characteristics.
By considering the mobility dynamics of these users, we aim to tailor our approach to scenarios where traditional communication systems often fall short, thereby paving the way for a more adaptable and robust communication infrastructure.</p><p>The rest of the paper is organized as follows: Section II introduces the system model, and Section III formulates the optimization problem. Section IV introduces the user clustering and DRL schemes based on the deep deterministic policy gradient (DDPG) for jointly optimizing the active beamforming, ABS trajectories, and RIS phase shifts. The numerical analysis of Section V shows the effectiveness of the proposed approach. Section VI draws the conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. SYSTEM MODEL</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. System Model</head><p>We investigate an MU-MISO communication system, where the UAV is deployed as an aerial base station (ABS) equipped with M antennas and responsible for delivering downlink communications to multiple mobile ground terminals (GTs). An RIS with L reflecting elements is deployed on one of the surrounding buildings and is leveraged to steer the transmissions originating at the ABS to the GTs that suffer from high blockage or severe interference in their direct channels with the ABS. Figure <ref type="figure">1</ref> depicts this scenario.</p><p>The total flight time $T$ of the ABS is split into $N$ time slots of duration $\delta_t = T/N$. In time slot $n$, the ABS hovers at $\mathbf{q}_A[n] = [x[n], y[n], z_t]^T$, $\forall n \in \mathcal{N}$, at a fixed height $z_t$. For the sake of simplicity and without loss of generality, we do not consider optimizing the ABS height in this paper. The height $z_t$ is chosen to enable LoS communication links to ground users unobstructed by obstacles in the ABS's proximity <ref type="bibr">[9]</ref>. There are K single-antenna GTs served by the ABS, where $K \le M$.</p><p>Two types of GTs are assumed: vehicular and pedestrian GTs with different mobility models. The location of each GT in time slot n is $\mathbf{q}_k[n]$.</p><p>The RIS, which is at a fixed location $\mathbf{q}_R = [x_r, y_r, z_r]^T$, receives incoming signals and utilizes its configurable reflective elements to redirect these signals toward the K GTs. For the considered MU-MISO communication system, each GT receives its signal over one of two communication routes: the direct transmission from the ABS to the GT or the indirect transmission through the RIS.
The ABS employs its array of M antennas to transmit K distinct data streams to the RIS simultaneously, one for each GT.</p><p>The communication channel between the M antennas of ABS A and the L reflecting elements of RIS R in time slot $n \in \mathcal{N}$ is modeled as a multiple-input multiple-output (MIMO) channel and denoted as $\mathbf{H}_{AR}[n] \in \mathbb{C}^{L \times M}$. The MISO channels between ABS A and GT k and between RIS R and GT k in time slot $n \in \mathcal{N}$ are defined as $\mathbf{h}_{Ak}[n] \in \mathbb{C}^{M \times 1}$ and $\mathbf{h}_{Rk}[n] \in \mathbb{C}^{L \times 1}$, respectively, $\forall k$. We assume that the ABS has perfect knowledge of the channel state information (CSI) and conveys this information to the RIS controller through a dedicated control channel. The L reflecting elements of the RIS are interconnected to form a uniform linear array (ULA) as in <ref type="bibr">[10]</ref>. The phase shift array in time slot $n \in \mathcal{N}$ is denoted as $\boldsymbol{\phi}[n]$, where $\theta_l[n]$ is the phase of the $l$-th element.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Channel Model</head><p>We model the air-to-ground communications channel between the ABS and the RIS and between the ABS and the GTs using small-scale Rician fading, which includes line-of-sight (LoS) and non-LoS (NLoS) components <ref type="bibr">[11]</ref>. Without loss of generality, the entries of the NLoS component $\mathbf{H}^{N}_{AR}$ are considered to be independent and identically distributed (i.i.d.). These entries are modeled as zero-mean and unit-variance circularly symmetric complex Gaussian (CSCG) variables, $\mathcal{CN}(0, 1)$. The LoS channel gain results from the angle of departure (AoD) channel at the ABS and the angle of arrival (AoA) channel at the RIS:</p><p>The AoD channel contribution is</p><p>The AoA can be calculated as</p><p>The MISO channels between the ABS and GT k and between the RIS and GT k in time slot n are modeled as</p><p>The NLoS components $\mathbf{h}^{N}_{Ak}$ and $\mathbf{h}^{N}_{Rk}$ follow the same CSCG distribution defined earlier. Parameters $D_{Ak}[n]$ and $D_{Rk}[n]$ represent the 3D distances between the ABS and the k-th GT and between the RIS and the k-th GT in time slot n, respectively.</p><p>The LoS MISO channel components between the ABS and a GT and between the RIS and a GT in time slot n are modeled as</p><p>where</p><p>represent the AoD components of the transmissions originating from the ABS and RIS, respectively. These are determined by $\Phi[n]$ as the azimuth AoD and $\Omega[n]$ as the elevation AoD.</p></div>
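<div xmlns="http://www.tei-c.org/ns/1.0"><p>A generic Rician MISO channel draw that is consistent with the description above can be sketched as follows. This is an illustrative sketch, not the authors' simulator: the Rician factor, the path-loss exponent, the half-wavelength ULA spacing, the sign convention of the steering vector, and all function names are assumptions introduced here.</p>
<eg><![CDATA[
```python
import cmath
import math
import random


def ula_steering(num_elements, angle_rad, spacing_wavelengths=0.5):
    """Unit-modulus ULA steering vector (assumed half-wavelength spacing)."""
    return [
        cmath.exp(-2j * math.pi * spacing_wavelengths * m * math.sin(angle_rad))
        for m in range(num_elements)
    ]


def cscg(rng):
    """One zero-mean, unit-variance circularly symmetric complex Gaussian sample."""
    sigma = math.sqrt(0.5)  # variance 0.5 per real dimension gives unit total variance
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def rician_miso_channel(num_antennas, aod_rad, distance_m, rician_factor,
                        path_loss_exp=2.0, rng=None):
    """Rician MISO channel: distance-scaled mix of a LoS and an NLoS component.

    rician_factor and path_loss_exp are hypothetical parameters, not paper values.
    """
    rng = rng or random.Random()
    los = ula_steering(num_antennas, aod_rad)
    los_weight = math.sqrt(rician_factor / (1.0 + rician_factor))
    nlos_weight = math.sqrt(1.0 / (1.0 + rician_factor))
    gain = math.sqrt(distance_m ** (-path_loss_exp))
    return [gain * (los_weight * a + nlos_weight * cscg(rng)) for a in los]
```
]]></eg>
<p>For a very large Rician factor the draw collapses to the deterministic LoS steering vector scaled by the path gain, which is a convenient sanity check.</p></div>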
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Data Rate</head><p>The M-antenna ABS serves the single-antenna k-th user either directly or through the L-element RIS utilizing the same frequency, employing space-division multiple access (SDMA) and time-division multiple access (TDMA). In the following, we provide the user data rate calculations for the direct and indirect links.</p><p>1) Data Rate of the Direct Link: The ABS simultaneously generates M concurrent beams to K spatially separated users using SDMA. Each user is assigned a dedicated beam vector for transmit beamforming. However, power leakage between beams that serve receivers in close proximity introduces multi-user interference.</p><p>In time slot n,</p><p>represents the downlink transmit signals, where</p><p>is the ABS beamforming vector, and $s_k[n]$ is the transmitted information symbol for the k-th user in time slot n. The beamforming or precoding matrix of the ABS has</p><p>The allocated transmit power for the k-th user can be computed as the squared norm of the beamforming vector:</p><p>Therefore, the received signal at the k-th GT through the direct link can be expressed as</p><p>where $n_k$ represents the additive white Gaussian noise (AWGN). It is assumed that the noise at each user follows a complex normal distribution with zero mean and unit variance, $\mathcal{CN}(0, 1)$. The signal-to-interference-plus-noise ratio (SINR) at the k-th GT can be calculated as</p><p>where the first term in the denominator corresponds to the multi-user interference of the MISO communications system and $\sigma_k^2$ is the noise variance.
The resulting normalized data rate of the k-th GT served via the direct link in time slot n is then obtained as</p><p>2) Data Rate of the Indirect Link: The received signal at the k-th GT on the indirect link through the RIS can be expressed as follows:</p><p>The SINR at the k-th GT served through the RIS is</p><p>and the resulting normalized data rate</p></div>
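<div xmlns="http://www.tei-c.org/ns/1.0"><p>The direct-link SINR and normalized rate described above (desired-beam power over multi-user interference plus noise, followed by a log2(1 + SINR) rate) can be illustrated with the short sketch below. It assumes the conventional Hermitian inner product between the channel and the beamforming vector and a common noise variance for all users; the function names are hypothetical and the exact expressions of the paper may differ.</p>
<eg><![CDATA[
```python
import math


def inner(h, w):
    """Hermitian inner product h^H w of two equal-length complex vectors."""
    return sum(hc.conjugate() * wc for hc, wc in zip(h, w))


def direct_link_rates(channels, beams, noise_var=1.0):
    """Per-user normalized rate in bits/s/Hz for one time slot.

    channels[k] is the ABS-to-GT-k channel vector and beams[k] is the
    beamforming vector assigned to GT k (both lists of complex numbers).
    """
    rates = []
    for k, h_k in enumerate(channels):
        signal = abs(inner(h_k, beams[k])) ** 2
        # Multi-user interference: power leaking from every other user's beam.
        interference = sum(
            abs(inner(h_k, w_j)) ** 2 for j, w_j in enumerate(beams) if j != k
        )
        sinr = signal / (interference + noise_var)
        rates.append(math.log2(1.0 + sinr))
    return rates
```
]]></eg>
<p>With orthogonal channels and matched unit-power beams there is no leakage, so each user sees an SINR equal to its signal-to-noise ratio.</p></div>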
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Total Data Rate</head><p>Considering the channel models and mobility models discussed in the previous section, the average achievable downlink data rate $R_k[n]$ in bits/s/Hz of the k-th GT up to time slot n can be calculated as</p><p>The average sum data rate over all GTs until time slot n is</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PROBLEM FORMULATION</head><p>Incorporating ABSs and RISs together in a dynamic mobility environment enables maintaining reliable multi-user communications despite heterogeneous user mobility patterns. This can be achieved by strategically positioning the ABS to accomplish direct LoS communications with ground users or establish controlled reflected propagation paths through the RIS. By employing an MU-MIMO ABS and an RIS, the optimization parameters are the beamforming matrix $\mathbf{W}$, the phase shifters of the RIS $\boldsymbol{\phi}$, and the trajectory of the ABS $\mathbf{q}_A$, in addition to the decision of which users should be served by the direct link and which should be served by the RIS, captured by $\mathbf{U}$. The optimization problem is formulated as</p><p>C4: $|e^{j\theta_l[n]}| = 1, \forall l, n,$</p><p>Expression $R_{\min,k}$ in constraint C2 denotes the minimum average data rate required for the k-th GT. This criterion is required to meet the user-specific QoS. Constraint C3 sets a boundary on the maximum allowable transmit power $P_{\max}$ for the ABS. Constraint C5 ensures that the ABS does not travel beyond the specified maximum speed limit $V_{\max}$. Constraint C6 establishes the ABS's initial location $\mathbf{q}_A(\mathrm{initial})$ and C7 the final location $\mathbf{q}_A(\mathrm{final})$.</p></div>
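<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of how constraints C3, C4, and C5 could be checked for a candidate solution, consider the sketch below. It assumes that C3 bounds the total power summed over all beamforming vectors in a slot and that C5 bounds the displacement between consecutive slots by the maximum speed times the slot duration; both readings, and the function name, are assumptions made here for illustration.</p>
<eg><![CDATA[
```python
import math


def check_constraints(beams, phases, trajectory, p_max, v_max, slot_s, tol=1e-9):
    """Return a dict of booleans for constraints C3, C4, and C5.

    beams:      list of beamforming vectors (lists of complex weights), one slot
    phases:     list of complex RIS reflection coefficients, one slot
    trajectory: list of (x, y) ABS positions, one per time slot
    """
    # C3 (assumed form): total transmit power must not exceed p_max.
    total_power = sum(abs(w) ** 2 for beam in beams for w in beam)
    # C4: every RIS coefficient must have unit modulus.
    unit_modulus = all(abs(abs(p) - 1.0) <= tol for p in phases)
    # C5 (assumed form): displacement per slot limited by v_max * slot duration.
    speed_ok = all(
        math.dist(a, b) <= v_max * slot_s + tol
        for a, b in zip(trajectory, trajectory[1:])
    )
    return {"C3": total_power <= p_max + tol, "C4": unit_modulus, "C5": speed_ok}
```
]]></eg>
<p>Such a check is only a diagnostic aid; in a learning-based solver, violations are typically handled through the action parameterization or reward penalties rather than hard rejection.</p></div>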
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. PROPOSED SOLUTION</head><p>The optimization problem (<ref type="formula">16</ref>) is a non-convex mixed-integer optimization problem, which is known for its inherent complexity. This complexity primarily stems from the binary variable $u_k[n]$ and the non-convex nature of the achievable rate function embedded within both the objective function and constraint C2. Additionally, there exist intricate interdependencies among the optimization variables $\boldsymbol{\phi}$, $\mathbf{q}_A$, $\mathbf{W}$, and $\mathbf{U}$. Notably, the unit modulus constraint imposed on $\boldsymbol{\phi}$ has been demonstrated to be non-convex <ref type="bibr">[10]</ref>. In addition, the optimization complexity associated with the phase shifts of the RIS directly scales with the number of elements, which is typically large. Therefore, it is imperative to devise optimization solutions that can efficiently handle a large number of reflective elements.</p><p>We propose solving (16) by leveraging data-driven approaches, which have demonstrated their efficacy in solving similar optimization problems <ref type="bibr">[12]</ref>. Initially, we employ K-means clustering to assign GTs to either direct or indirect links. Subsequently, we employ DDPG to jointly optimize the ABS trajectory, beamforming matrix, and RIS phase shifts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. User Clustering</head><p>The objective of user clustering is to partition the K users into two groups, one served directly by the ABS and the other served indirectly through the RIS. We employ K-means clustering, an unsupervised learning technique that maximizes the similarity within groups and the dissimilarity across groups. K-means clustering is known for its computational efficiency compared to alternative techniques such as graph theory, fuzzy c-means clustering, and hierarchical clustering <ref type="bibr">[13]</ref>.</p><p>We use the normalized channel coefficients between GTs and the ABS as the data points for the K-means clustering algorithm. These data points capture the fluctuations in channel gains arising from diverse propagation factors, including small-scale fading and shadow fading. Hence, we can define</p><p>where $h^{\mathrm{no}}_{Ak}[n]$ is the normalized channel gain and $h_{Ak}[n]$ is the channel gain between the ABS and the k-th GT. Starting with random centroids for the two clusters, the K-means algorithm calculates the distance, in terms of the normalized channel coefficient, between each GT and the two cluster centers and assigns each GT to its nearest center. Then, the centroids are updated to minimize the sum of the squared Euclidean distances between each clustered data point and its centroid. The Euclidean distance is the metric chosen in this paper to measure the similarity between data points; other metrics, such as the Manhattan distance, can be applied instead.</p></div>
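<div xmlns="http://www.tei-c.org/ns/1.0"><p>The clustering step above can be sketched with a minimal two-cluster K-means on scalar channel gains. The sketch makes two assumptions that are not stated in the text: gains are normalized by the largest gain, and the cluster with the larger centroid is mapped to the direct link while the other is mapped to the RIS. The function names are hypothetical.</p>
<eg><![CDATA[
```python
import random


def two_means(values, iters=50, rng=None):
    """Two-cluster K-means on scalars; returns (labels, centroids)."""
    rng = rng or random.Random(0)
    c0, c1 = rng.sample(values, 2)  # random initial centroids
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: the mean minimizes the squared Euclidean distance.
        new_c0 = sum(g0) / len(g0) if g0 else c0
        new_c1 = sum(g1) / len(g1) if g1 else c1
        if (new_c0, new_c1) == (c0, c1):
            break
        c0, c1 = new_c0, new_c1
    return labels, (c0, c1)


def assign_links(gains):
    """Map each GT to 'direct' or 'ris' from its ABS channel gain (assumed rule)."""
    peak = max(gains)
    normalized = [g / peak for g in gains]  # assumed normalization
    labels, centroids = two_means(normalized)
    strong = 0 if centroids[0] >= centroids[1] else 1
    return ["direct" if lab == strong else "ris" for lab in labels]
```
]]></eg>
<p>For example, <code>assign_links([0.9, 1.0, 0.95, 0.1, 0.05, 0.12])</code> places the first three GTs on the direct link and the last three on the RIS link.</p></div>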
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. DRL-Based Optimizer</head><p>We employ DDPG to tackle the joint optimization of beamforming, ABS trajectory, and RIS phase shifts. DDPG frames the problem as a Markov decision process (MDP), where the environment transitions from one state to another based on the actions taken, governed by transition probabilities.</p><p>1) MDP Settings: The MDP is structured around three components: the state space $\mathcal{S}$, the action space $\mathcal{A}$, and the reward space $\mathcal{R}$. In time slot n, the agent observes the current state $s_n \in \mathcal{S}$ and, guided by its policy, selects an action $a_n \in \mathcal{A}$. Subsequently, the agent transitions into the new state $s_{n+1}$ and receives reward $r_n \in \mathcal{R}$.</p><p>State: The state $s_n$ encompasses a collection of 2K elements that pertain to the CSI and the average data rate of each GT.</p><p>Action: The action $a_n$ has K + L + 2 elements: K elements pertain to beamforming, L elements are associated with phase shifts, and the remaining two elements define the trajectory of the ABS. Parameter $\varphi$ represents the UAV's horizontal flight direction, while $\omega$ captures the distance of movement in this direction.</p><p>Reward: The reward function considers the average data rate and incorporates two penalties:</p><p>The first penalty considers the discrepancy between the average data rate and the minimum data rate requirement for each GT, addressing constraint C2. The second penalty considers the distance between the current location of the ABS and its final destination, while also assessing whether there is enough remaining time to reach that destination, addressing constraint C7.</p><p>2) Deep Deterministic Policy Gradient: DDPG excels at handling complex, high-dimensional action spaces and continuous action domains <ref type="bibr">[14]</ref>. It leverages two deep neural networks (DNNs), the actor and the critic network, to approximate the policy and the value function, respectively <ref type="bibr">[15]</ref>.
This is illustrated in Fig. <ref type="figure">2</ref>. In each time slot n, the DDPG agent gets the output of the clustering algorithm, the previous $\mathbf{W}$ and $\boldsymbol{\phi}$ values, and the CSI for each GT to construct state $s_n$. It then feeds $s_n$ to the actor network $\mho$ to determine action $a_n$ and sends $a_n$ to the ABS and the RIS controller, which execute the action. Finally, the agent calculates the reward and generates a record of experience consisting of $s_n$, $a_n$, $r_n$, and the next state $s_{n+1}$, that is, $e_n = (s_n, a_n, r_n, s_{n+1})$. This experience is sent to a replay buffer of capacity $\aleph$ so that the stored experiences $\mathcal{M} = \{e_1, \dots, e_n, \dots, e_{\aleph}\}$ can be used for training the actor and critic networks.</p><p>The actor network weight parameters are updated by taking a mini-batch from the replay buffer and applying</p><p>where $\eth_a$ denotes the weights of the actor network $\mho(\eth_a \mid s_n)$, $\eth_c^{\dagger}$ denotes the critic network weights, $\varrho_a$ is the learning rate, $\nabla_a Q(\cdot)$ is the gradient of the target critic network output with respect to the taken action, and $\nabla_{\eth_a} \mho(\cdot)$ is the gradient of the training actor network with respect to $\eth_a$. The updates of the training critic network are obtained as</p><p>Parameter $\ell(\eth_c)$ is the loss function of the training critic network and can be calculated as</p><p>where $\tilde{a}$ is the agent's action that follows the deterministic policy drafted by the target actor network.</p></div>
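<div xmlns="http://www.tei-c.org/ns/1.0"><p>The experience replay mechanism described above (a bounded buffer of tuples from which mini-batches are drawn for training) can be sketched as follows. The first-in, first-out eviction once the capacity is reached and uniform random sampling are common choices assumed here; the text does not specify them, and the class name is hypothetical.</p>
<eg><![CDATA[
```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) experiences."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest experience when full (assumed policy).
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        """Store one experience tuple e_n."""
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a uniform mini-batch without replacement (at most the buffer size)."""
        size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)
```
]]></eg>
<p>Sampling uniformly from stored experiences breaks the temporal correlation between consecutive time slots, which is the usual motivation for this component in DDPG training.</p></div>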
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. NUMERICAL ANALYSIS</head><p>We evaluate the performance of the proposed scheme through simulations. The K GTs are distributed within an urban environment that features an intersection with vehicular users, a sidewalk with pedestrians, and a park alongside the road. Moreover, the environment is characterized by a multitude of high-rise buildings that may obstruct the signals from the ABS to certain GTs. The ABS maintains a fixed altitude of 100 m. The RIS is mounted on a building at a height of 70 m and faces the park.</p><p>The DDPG agent is constructed with the critic and actor networks employing the same DNN architecture. This architecture consists of six layers: the input layer, four fully connected hidden layers, and the output layer. The input layer's dimension is set to 2K, matching the state dimension. The hidden layers have 600, 400, 200, and 100 neurons, respectively. The output layer of the actor network has a dimension of 2L + 2M + 2, where 2L represents the real and imaginary components of the complex phase shifts for the L-element RIS, 2M captures the complex beamforming weights of the M-antenna ABS, and the remaining two elements handle the ABS trajectory. All layers in both DNNs utilize the tanh activation function, and both networks are trained with the Adam optimizer.</p><p>We assess the performance of the clustering step by introducing Baseline 1, which mirrors the proposed scheme but excludes the clustering. Baseline 2 implements the DDPG agent to solely optimize the RIS phase shifts and the ABS trajectory, while the active beamforming is constant and identical for all users. Baseline 3 assumes fixed RIS phase shifts, while employing the DDPG agent to optimize the active beamforming matrix and ABS trajectory.
We analyze the convergence of the reward function for our proposed scheme and the baselines and evaluate the achievable average system sum data rate and the 5th percentile data rate, which represents the minimum data rate achieved by 95% of the users.</p><p>Figure <ref type="figure">3a</ref> illustrates the average episode rewards over training episodes for our proposed solution and the baseline schemes with 32 antennas, 32 GTs, and 40 RIS elements. The models were trained over 100 episodes, with each episode comprising N = 10,000 time steps. The results illustrate that the proposed DDPG agent achieves higher rewards than any of the baseline schemes. However, it takes longer to converge than the scheme without beamforming optimization and the one without phase shift optimization. This is because these two schemes optimize fewer elements, which results in quicker convergence. The convergence time of the scheme without clustering is similar to that of the proposed solution with clustering, but it achieves a lower reward.</p><p>Figure <ref type="figure">3b</ref> presents the average system sum rate achieved by the proposed solution and the baseline schemes as a function of the number of reflecting elements with 32 antennas and 32 GTs. The results illustrate that as the number of RIS elements increases, the average sum rate improves for all schemes. Furthermore, the scheme that does not optimize the RIS phase shifts performs worst, as also observed in the previous result. This emphasizes the importance of the RIS for improving the network sum rate. These results also show the importance of active beamforming over clustering. Figure <ref type="figure">3c</ref> displays the 5th percentile rate as a function of the number of RIS elements with 32 antennas and 32 GTs for the proposed DDPG-based scheme and the considered baselines.
The scenario demands a QoS of $R_{\min} = 2$ bps/Hz, which represents the minimum data rate target for each user. Both the proposed solution and the no-clustering scheme ensure that 95% of the users achieve data rates exceeding $R_{\min}$ even with a relatively low number of RIS elements. Without proper beamforming, approximately 40 RIS elements are needed to meet the 5th percentile rate of 2 bps/Hz, whereas without proper RIS phase shift optimization, 100 RIS elements are necessary to satisfy this threshold. This highlights the effectiveness of the proposed scheme and the tradeoff between phase shift optimization and the number of RIS resources.</p></div>
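<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two evaluation metrics used above, the system sum rate and the 5th percentile user rate compared against the QoS target, can be computed from a list of per-user rates as in the sketch below. Linear interpolation between order statistics is one common percentile convention and is an assumption here, since the text does not state which convention was used; the function names are hypothetical.</p>
<eg><![CDATA[
```python
import math


def percentile(values, pct):
    """Percentile with linear interpolation between sorted samples (assumed convention)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def qos_summary(rates, r_min=2.0):
    """Sum rate, 5th percentile rate, and whether the percentile meets r_min."""
    p5 = percentile(rates, 5.0)
    return {"sum_rate": sum(rates), "p5_rate": p5, "meets_qos": p5 >= r_min}
```
]]></eg>
<p>A scheme satisfies the QoS criterion in the sense used above when the returned 5th percentile rate is at least the target rate of 2 bps/Hz.</p></div>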
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSIONS</head><p>In this paper, we investigated the use of DRL to jointly optimize active beamforming, ABS trajectory, and RIS phase shifts in an MU-MISO communication system. Clustering users so that they are served either directly by the ABS or through the RIS is a simple yet effective scheme to improve sum rate performance if the CSI is known. Most important, however, is the RIS phase shift optimization, followed by active beamforming. While taking the longest to converge, the proposed DRL scheme outperforms the simpler solutions. Future work will further analyze the performance of the proposed solution in different scenarios, the complexity-performance tradeoff, and the scalability with ground and aerial base stations.</p></div>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2024" xml:id="foot_1"><p>IEEE International Conference on Communications (ICC): SAC Aerial Communications Track</p></note>
		</body>
		</text>
</TEI>
