<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>REMARKABLE: RIS-Enabled Mobile Beamforming through Kernalized Bandit Learning</title></titleStmt>
			<publicationStmt>
				<publisher>ACM</publisher>
				<date>10/23/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10651736</idno>
					<idno type="doi">10.1145/3704413.3764443</idno>
					
					<author>Kubra Alemdar</author><author>Arnob Ghosh</author><author>Vini Chaudhary</author><author>Ness Shroff</author><author>Kaushik Chowdhury</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Mobile Robots (MRs), typically equipped with single-antenna radios, face many challenges in maintaining reliable connectivity established by multiple wireless access points (APs). These challenges include the absence of direct line-of-sight (LoS), ineffective beam searching due to the time-varying channel, and interference constraints. This paper presents REMARKABLE, an online learning based adaptive beam selection strategy for robot connectivity that trains kernelized bandit model directly in real-world settings of a factory floor. REMARKABLE employs reconfigurable intelligent surfaces (RISs) with passive reflective elements to create beamforming toward target robots, eliminating the need for multiple APs. We develop a method to create a beamforming codebook, reducing the search space complexity. We also develop a reconfigurable rotational mechanism to expand RIS coverage by rotating its projection plane. To address non-stationary conditions, we adopt the bandit over bandit idea that employs adaptive restarts, allowing the system to forget outdated observations and safely relearn the optimal interference-constrained beam. We show that our approach achieves a dynamic regret and the violation bound of Õ(T^(3/4)B^(1/4)) where T is the total time, and B is the total variation budget which captures the total changes in the environment without even assuming the knowledge of B. Finally, experimental validation with custom-designed RIS hardware and mobile robots demonstrates 46.8% faster beam selection and 94.2% accuracy, outperforming classical methods across diverse mobility settings.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Low</head><p>High on wall on top RIS a) b) Tx SDR with Directional Ant. User Robot Rx SDR with Omni-direct Ant. Best Selected Beam Interference to 2 nd User Legend RIS Selected Beam Backhaul channel (observations) [&#952; 1 ,&#8230;, &#952; 4 ] Beam Steering c) user user AP AP-1 AP-2 Rotational mechanism Software-defined RIS Controller 1 Introduction Industry 4.0 is set to revolutionize manufacturing and industrial services through the digital transformation of the field, enabling real-time decision-making and automation <ref type="bibr">[1]</ref>. Technologies such as the Internet of Things (IoT), artificial intelligence (AI), cloud connectivity, large-scale machine-to-machine communication (M2M), and networked, mobile robotics are central to the future of manufacturing <ref type="bibr">[2]</ref>. These robots can collect and analyze data, making autonomous decisions with minimal human intervention. However, such a robot-enabled network architecture demands ultra-fast data transfer speeds, exceptional reliability, and minimal latency <ref type="bibr">[3]</ref>. While network densification is a possible path to achieve these goals, it involves a significant cost overhead <ref type="bibr">[4]</ref>, and thus, network designers need to trade-off permanent infrastructure installations with reconfigurable platforms that can adapt to robot mobility over time. Aside of mobility, the harsh propagation environment within the factory floors increases blockages, results in limited coverage, and significant path loss. To overcome these challenges, recent studies have shown that programmable wireless environments that enable reconfigurability by shaping signal reflections can improve signal-to-noise ratio (SNR) <ref type="bibr">[5]</ref> and expand coverage <ref type="bibr">[6]</ref>.</p><p>In REMARKABLE, we realize such a network architecture with low-mobility MRs and multiple wireless APs deployed in a robotic factory floor within a rich-scattering environment. Network reconfigurability for enhancing the APs' coverage is achieved by controlling the propagation environment using software-controlled reconfigurable intelligent surfaces (RISs) <ref type="bibr">[7]</ref><ref type="bibr">[8]</ref><ref type="bibr">[9]</ref>. This ensures reliable connectivity for low-mobility MRs by creating a radio environment that adapts with the MR location, as in Fig. <ref type="figure">1a-b</ref>. Yet, this requires a solution to the problem of adaptive beam selection in dynamic channel conditions between the AP and the MR, as depicted in Fig. <ref type="figure">1c</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Factory-Floor Networking Challenges</head><p>&#8226; Problem 1 (blockage and coverage): Consider Fig. <ref type="figure">1a</ref>, showing a factory floor where the LoS signal is blocked by obstacles. In absence of LoS conditions, the MR relies on the strongest nonline-of-sight (NLoS) reflection to establish a communication link with the AP. NLoS multipath components can cause destructive interference due to uncontrollable phase reflections, leading to significant communication disruption. This results in significant received signal strength (RSS) fluctuations with small robot movements. This issue is exacerbated in single-antenna equipped MRs. Deploying multiple APs as in Fig. <ref type="figure">1a</ref>, ensures LoS links and expands coverage but increases communication overhead and infrastructure complexity.</p><p>&#8226; Problem 2 (mobility and beam searching): Narrow beams formed via phased antenna arrays can mitigate propagation loss, as well as improve signal reception through increased SNR. Typically, these beams are formed by adjusting antenna element weights, with steering directivity and beamwidth defined through a codebook. The APs equipped with such capabilities exhaustively sweep over the beams in a codebook to discover the optimal beam with the highest signal strength. However, exhaustive beam searches create significant overhead, and MR mobility requires repeated searches to maintain connectivity.</p><p>&#8226; Problem 3 (interference): Even with APs forming highly directional beams towards MRs, in a heterogeneous environment with multiple APs, the close proximity of MRs can cause excessive interference , degrading network performance <ref type="bibr">[10]</ref>. Therefore, beam selection must be judiciously performed, as some beam candidates may not be suitable for data transmission.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2">Proposed Approach</head><p>Our approach aims to tackle problems 1, 2, and 3 for MR connectivity in factory floor settings by achieving the following steps:</p><p>1) We design a passive beamformer using an RIS, a planar array of passive reflective elements, each configured to adjust the amplitude and phase of incident signals. This allows us to create various beam patterns, which are then used to form a beam codebook for the beam steering (see Fig. <ref type="figure">1c</ref>). To address Problem 1 (see Fig. <ref type="figure">1b</ref>), several practical challenges must be considered: (i) Instead of relying on Channel State Information (CSI), our approach uses a predefined codebook where each codeword corresponds to the weights of RIS elements. This is necessary because RIS elements are passive and lack radio chains, making traditional channel estimation impractical. Estimating each channel component-would be proportional to the number of reflective elements, would create extreme overhead. Therefore, we develop a method for creating the desired reflection beam pattern by using a non-uniform phase sampling technique, optimizing each element's reflection gain while considering incident and reflection signals. ii) In a planar RIS, edge elements of RIS contribute less to beamforming, limiting overall gain-especially in dynamic settings like mobile robots. To address this challenge, we propose a novel reconfigurable rotation mechanism that adjusts the pitch and roll angles along the RIS's local coordinate axes, effectively enhancing beam coverage and improving performance. Given a fixed AP location, this method requires reflective beam pattern synthesis w.r.t the new angular domain of RIS. Consequently, we generate a multi-level codebook, each level corresponding to a specific pair of rotational angles.</p><p>2) We study beam selection using codebooks derived from reflective beam-pattern synthesis. Our goal is to learn online the optimal beam from the RIS to the MR by casting the task as a kernelized multi-armed bandit (MAB), with each codeword as an arm. To address Problem 2 and Problem 3, we impose an interference constraint at a neighboring MR. The objective is to select beams that maximize RSS at the target MR subject to this constraint over a time-varying channel. We model cross-beam correlations with a Gaussian Process (GP) bandit and propose a primal-dual GP-Upper Confidence Bound (UCB) algorithm to balance exploration and exploitation while enforcing the interference constraint. To handle non-stationarity, we add an adaptive restart mechanism inspired by the bandit-overbandit framework, which dynamically tunes the restart interval from feedback. REMARKABLE is theoretically grounded and validated on a real RIS-enabled robotic testbed-unlike prior theoretical works <ref type="bibr">[11,</ref><ref type="bibr">12]</ref>, which remain untested in practice, and existing RIS implementations <ref type="bibr">[5,</ref><ref type="bibr">13,</ref><ref type="bibr">14]</ref>, which predominantly target static scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3">Summary of Contributions</head><p>(1) We create an RIS codebook with beam patterns in multiple directions, enabling the online learning algorithm to find the best beam without channel estimation. Additionally, we introduce a reconfigurable rotational mechanism to expand RIS coverage. (2) We formulate beam selection for an MR as a primal-dual GP-UCB framework to maximize signal strength while avoiding interference. To address the time-varying or non-stationarity, we adopt "bandit over bandit" concept restart strategy, which adaptively forgets past data by tuning the restart interval via an adversarial bandit. Our method achieves sub-linear dynamic regret and constraint violation without prior knowledge of budget variations, safely learning beam selection even under a time-varying channel. (3) We show that we achieve &#213; (&#119861; 1/4 &#119879; 3/4 ) regret and &#213; (&#119861; 1/4 &#119879; 3/4 ) violation bound where &#119861; indicates the total change in the environment (i.e., the change in the reward and the constraint, defined in Sec. 6.3.1) over &#119879; time steps. We improve the existing bound of &#213; (&#119861;&#119879; 3/4 ) dynamic regret and the dynamic violation bound achieved in <ref type="bibr">[12]</ref>. (4) We demonstrate REMARKABLE in a real-world setting using USRP X310-B210 SDRs (Software Defined Radios), with MRs and a PCB-fabricated RIS, as shown in Fig. <ref type="figure">1c</ref>. Our results show that REMARKABLE achieves 46.8% improved performance over classical methods with 94.2% selection accuracy. <ref type="bibr">(5)</ref> We release the software pipeline for the online learning framework and the RIS configuration-orchestration software <ref type="bibr">[15]</ref>.</p><p>2 Related Work</p><p>&#8226; RIS &amp; Smart Surface: RIS technology and similar concepts like metasurfaces have recently been proposed to enhance applications</p><p>(a) 0 4 8 12 16 20 Locations -90 -75 -60 Average Power (dBm) AP1-no_obstacle AP2-no_obstacle AP1-with_obstacle AP2-with_obstacle (b) such as security <ref type="bibr">[16]</ref>, virtual reality <ref type="bibr">[17]</ref>, localization and sensing <ref type="bibr">[18]</ref>, beamforming <ref type="bibr">[5,</ref><ref type="bibr">19]</ref>, and over-the-air aggregation <ref type="bibr">[20]</ref>. Recent research has focused on optimizing transceivers and RIS phase-shifts to minimize signal distortion <ref type="bibr">[19,</ref><ref type="bibr">20]</ref>, especially with imperfect CSI estimation <ref type="bibr">[21]</ref>. However, these methods assume knowledge of wireless channels, which is challenging due to RIS's limited processing capabilities. Additionally, some practical works <ref type="bibr">[5,</ref><ref type="bibr">13,</ref><ref type="bibr">14,</ref><ref type="bibr">18,</ref><ref type="bibr">22]</ref> rely on real-time channel estimation, causing overhead proportional to system size and requiring fast feedback. In contrast, REMARKABLE uses a predefined RIS codebook, avoiding such overhead. Similar to our approach, other works consider configuring RIS elements with pre-defined coding patterns <ref type="bibr">[20,</ref><ref type="bibr">23]</ref> and leveraging an extra degree of freedom by optimizing rotation of RIS plane/elements to improve system performance <ref type="bibr">[24,</ref><ref type="bibr">25]</ref>. In fact, the work <ref type="bibr">[11]</ref> proposes a hierarchical codebook-generating method using pattern synthesis, followed by a beam training method using the two-mainlobe codewords from the designed codebook for beam sweeping. Unlike these stationary setups that use exhaustive beam searching, REMARKABLE offers a novel method for faster beam selection, even considering mobility.</p><p>&#8226; Beam Selection with Bandit: Online Learning (OL), particularly MAB frameworks, has become prominent for beam selection due to its inherent ability to balance exploration and exploitation. Standard MAB frameworks utilized in beam selection cannot capture the correlation among beam directions. The authors in <ref type="bibr">[26]</ref><ref type="bibr">[27]</ref><ref type="bibr">[28]</ref><ref type="bibr">[29]</ref> leverage contextual information to exploit such correlations. However, these papers do not consider the time-varying channel and interference constraints that we considered, assuming quasi-static channels; thereby, driven models cannot capture time-varying channels, most likely mapped to real-world settings. A recent kernelized MAB approach <ref type="bibr">[12]</ref> addresses time-varying, interference-constrained channels but neglects RIS settings and lacks real-world implementation. Our work explicitly incorporates RIS, demonstrates improved theoretical bounds compared to <ref type="bibr">[12]</ref>, and provides experimental validation in practical scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Motivation for Designing REMARKABLE</head><p>Before designing REMARKABLE, we conduct preliminary experiments in a factory floor use-case to investigate Problem 1.</p><p>&#8226; Experimental Setup: Consider a scenario where low-mobility MRs roam a factory floor to complete assigned tasks (see Fig. <ref type="figure">5</ref>). We use a Turtlebot2 robot, which navigates the floor and stops at target locations to collect data. The data collection part is obtained at the 5GHz band with 20MHz bandwidth signal by SDR X310 radios equipped with omni-directional VERT2450 tx-rx antennas, where one of them is mounted on the Thurtlebot, while the other two are placed in designated areas in the environment as APs to communicate with the MR.</p><p>&#8226; Observation: We conducted two factory-floor experiments-one with obstacles and one without-while the MR navigated and RSS was measured from each AP at target locations. As shown in Fig. <ref type="figure">2a</ref>, the AP1-MR channel exhibits frequency-selective fading that varies with MR position, whereas Fig. <ref type="figure">2b</ref> shows AP2 providing better coverage in regions where AP1 is weak, even without obstacles. However, with obstacles, neither AP ensures reliable coverage, indicating the need for additional APs. This, in turn, introduces interference management challenges and increases overhead in terms of coordination and communication resources (e.g., bandwidth).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">REMARKABLE Codebook Design</head><p>We aim to create a codebook of beam patterns by optimizing the phases of RIS's reflective elements to achieve the desired reflections. We start by looking at a scenario with a single AP and a single RIS. The RIS is a planar array with &#119873; &#215;&#119873; passive reflective elements that can be configured for complex-valued amplitude and phase changes. Moreover, each generated codebook should consist of beams with predefined beam resolution and cover desired angular space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Beam Steering Design</head><p>With &#119873; &#215; &#119873; layout RIS, we can derive the far-field reflection gain pattern of the surface w.r.t a specific target angle as:</p><p>where &#119906; &#119894; = &#119896;&#119889; &#119909; &#119888;&#119900;&#119904;&#120601; &#119894; &#119904;&#119894;&#119899;&#120579; &#119894; and &#119907; &#119894; = &#119896;&#119889; &#119910; &#119904;&#119894;&#119899;&#120601; &#119894; &#119904;&#119894;&#119899;&#120579; &#119894; are &#119906;-&#119907; space coordinates when the source signal contacts on different reflective elements from AP with azimuth and elevation angle of &#120601; &#119894; and &#120579; &#119894; . Similarly, the term &#119906; &#119903; = &#119896;&#119889; &#119909; &#119888;&#119900;&#119904;&#120601; &#119903; &#119904;&#119894;&#119899;&#120579; &#119903; and &#119907; &#119903; = &#119896;&#119889; &#119910; &#119904;&#119894;&#119899;&#120601; &#119903; &#119904;&#119894;&#119899;&#120579; &#119903; represent when the signal reflects from the surface towards the target with the azimuth and elevation angle, &#120601; &#119903; and &#120579; &#119903; , respectively. The reflective elements are placed in half wavelengths along the x and y directions, &#119889; &#119909; = &#119889; &#119910; = &#120582;/2, also &#119896; = 2&#120587;/&#120582;, and &#120582; is the wavelength of the operational frequency. Eventually, the signal path follows the incident angle and impacts the surface. Then, on the surface, it will be perturbed by the configurations of reflective elements. This is represented by the term of reflection coefficient, &#915; &#119898;&#119899; = |&#915; &#119898;&#119899; |&#119890; &#119895;&#934; &#119898;&#119899; , wherein the complex-valued amplitude and phase changes are applied to the incident signal. Assuming the reflection magnitude of all reflective elements is unity, i.e., |&#915; &#119898;&#119899; | = 1. Additionally, we denote &#119866; &#119898;&#119899; (&#120601; &#119894; , &#120579; &#119894; ) as radiated gain per reflective element defined as</p><p>Here, &#119865; &#119890; is obtained from estimated full-wave simulation in the Ansys HFSS 3D electromagnetic simulator. By re-forming Eq.1, we can transform reflection pattern to:</p><p>(2)</p><p>where &#934; &#119904; &#119898;&#119899; = &#934; &#119898;&#119899; -(&#119898;&#119906; &#119894; +&#119899;&#119907; &#119894; ) is constant due to fixed locations of AP and the RIS. From the sampling theory for 2D periodic functions, the reflection array complex weights, &#119860; &#119898;&#119899; can be obtained from the samples of its radiation pattern |&#119865; (&#119906; &#119903; , &#119907; &#119903; )| as follows: &#119860; &#119898;&#119899; (&#120601; &#119894; , &#120579; &#119894; ) = (&#119873; -1)/2 &#8721;&#65025; &#119901;,&#119902;=-(&#119873; -1)/2</p><p>In Eq.4, &#119906; &#119901; , &#119907; &#119902; are sampling points for the RIS, where &#119906; &#119901; = 2&#120587;&#119901;/&#119873; and &#119907; &#119902; = 2&#120587;&#119902;/&#119873; . Also, &#119898; &#8242; = &#119898;-(&#119873; -1)/2 and &#119899; &#8242; = &#119899;-(&#119873; -1)/2 for &#119898;, &#119899; &#8712; [0, &#119873; -1]. In this method, we assign nonuniform phases to the radiation patterns of RIS at the sampling points via &#119865; (&#119906; &#119901; , &#119907; &#119902; ) = |&#119865; (&#119906; &#119901; , &#119907; &#119902; )|&#119890; &#119895;&#934; &#119901;&#119902; <ref type="bibr">[31]</ref>, where &#934; &#119901;&#119902; is the phase assigned to sampling points (&#119906; &#119901; , &#119907; &#119902; ). Then, we can find optimized phase values needed to steer the beam in the intended direction by minimizing the Mean Square Error (MSE) function between desired, &#206;&#119898;&#119899; , and generated, &#119868; &#119898;&#119899; , power of reflection array weights such as</p><p>&#119899;=0 (&#119868; &#119898;&#119899; -&#206;&#119898;&#119899; ) 2 . We apply Gradient Descent (GD) <ref type="bibr">[32]</ref> optimization method to minimize &#119872;&#119878;&#119864;. First, we define the gradient of &#119872;&#119878;&#119864;, &#120597;&#119872;&#119878;&#119864;/&#120597;&#934; &#119901;&#119902; w.r.t non-uniform sampling points of the regenerated beam pattern, which is calculated by chain rule as in</p><p>Here, both derivatives in the chain are derived independently, then we can have:</p><p>&#119872; &#119898;&#119899; &#119890; &#119895; &#934; &#119904; &#119898;&#119899; +&#934; &#119891; <ref type="bibr">(5)</ref> where &#119872; &#119898;&#119899; = &#119886; &#119898;&#119899; (&#119886; 2 &#119898;&#119899; -&#206;&#119898;&#119899; ) and &#934; &#119891; = &#119898; &#8242; &#119906; &#119901; + &#119899; &#8242; &#119907; &#119901; . Since Eq.5 is a form of Fourier Transform, we utilize the Fast Fourier Transform (FFT) techniques to calculate gradients. Then at each iteration, new non-uniform phase samples are found as:</p><p>where &#120578; &#119892; is the learning rate of the GD optimizer, determining the step size to converge it to the optimal point. After obtaining optimized non-uniform phase samples, we can find the phase distribution of the surface, &#934; &#119898;&#119899; , by considering the amplitude and assigned phases, &#934; &#119901;&#119902; , of the radiation pattern &#119865; (&#119906; &#119901; , &#119907; &#119902; ) through Eq.4. Fig. <ref type="figure">3</ref> compares the desired and generated pattern at steered angles of (40&#176;,30&#176;), showing that desired pattern can be achieved by our method, albeit with some increased side slopes due to the effect of quantization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Beam Coverage Design</head><p>Here, we address RIS's limitation in terms of its angular coverage and propose a solution to mitigate this by integrating a reconfigurable rotational mechanism.  <ref type="figure">4b</ref>). We propose enhancing RIS coverage by adding a reconfigurable rotational mechanism to adjust its orientation through pitch and roll angles. The yaw angle is not used as the z-axis is the projection axis of RIS. Implementing this method necessitates rebuilding the codebook structure and reformulating the beam shaping and steering process. and</p><p>. Notably, we assume the placement of AP satisfies the far-field conditions <ref type="bibr">[33]</ref>. Consequently, the incident angle on each reflective element is given as &#120601; &#119894; and &#120579; &#119894; for a planar array surface. Each element position vector defined as &#119903; &#119898;&#119899; = [&#119898;&#119889; &#119909; , &#119899;&#119889; &#119910; , 0] &#8242; , &#119903; &#119898;&#119899; &#8712; &#119877; 3    &#120590;, &#120590; = &#120587; 2 &#119887; -1 (7) Fig. 4e shows the output of quantization corresponding to Fig.4d. 4.3.2 Codebook Design. For &#119877; rotational angles, the size of the codebook &#119878; &#119903; , &#119903; &#8712; &#119877; is given by</p><p>considering a 3D beam scanning. Here, &#119894; is a correlation constant that increases correlation between neighboring beams in &#119862; &#119903; , while &#120581; denotes RIS angular coverage (e.g., 2&#120587;/3 within [-&#120587;/3, &#120587;/3] for our design). Each beam code in &#119862; &#119903; consists of &#119873; &#215; &#119873; codewords representing the required phase configurations of reflective elements. Implementing mechanical rotation introduces a controlled overhead (2.4 ms/angle). In our experiments, rotations occur infrequently-primarily when robot mobility substantially alters the optimal RIS orientation and demands broader angular coverage-thus balancing the overhead against the achieved coverage gains.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">REMARKABLE Model 5.1 System Model</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.1">RIS-based Channel Model.</head><p>In REMARKABLE, we consider the scenario in Fig. <ref type="figure">1c</ref>, involving an AP with &#119870; antennas, a singleantenna MR, and an RIS with &#119860; = &#119873; &#215; &#119873; passive reflective elements. Each element adjusts the amplitude and phase of the incident signal. Before data transmission, optimal beam selection through AP-RIS and RIS-MR channels is needed for array gain and high throughput, assuming no direct AP-MR link due to the cluttered factory floor. Let &#119908; &#119886; &#8712; C represent the effect of RIS element &#119886; on the reflected signal; &#119882; &#8712; &#119862; be the beamforming weight vector from the predefined codebook &#119862;. The received signal at MR through the RIS for the transmitted pilot signal &#119909; &#119905; at the &#119905; &#119905;&#8462; time slot is:</p><p>Let &#119934; &#119957; = diag[&#119908; 1 , ..., &#119908; &#119886; , ..., &#119908; &#119860; ] &#8712; C &#119860;&#215;&#119860; , &#119945; &#8242; &#8712; C &#119870; &#215;&#119860; , and &#119945; &#8242;&#8242; &#8712; C &#119860;&#215;1 be complex-valued matrices and vectors, with elements representing channel coefficients between the AP and RIS element &#119886; and between &#119886; and MR, respectively. &#119899; &#119905; &#8712; C &#119870; &#215;1 is the additive white Gaussian noise (AWGN), with &#119899; &#119905; &#8764; CN (0, &#120590; 2 &#119920; &#119870; ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.2">Time-varying Channel Model.</head><p>In our setup, the AP and RIS are fixed, while the robots are mobile. Thus, &#8462; &#8242; is quasi-static channel with coherence time &#119879; &#119878; , and &#8462; &#8242;&#8242; is time-varying channel with &#119879; &#119872; , where &#119879; &#119878; &#8811; &#119879; &#119872; . Here, we adopt the time-varying geometric channel model <ref type="bibr">[35]</ref> with &#119871; multipath components between the RIS and MR. The multipath time-varying channel model at the &#119901; &#119905;&#8462; time instance in &#119879; &#119872; is:</p><p>where &#120572; &#119897; &#8764; CN (0, 1) is the complex channel gain of the &#119897; &#119905;&#8462; path, &#119907; &#119897; is the Doppler shift, and &#119938;(&#120601; &#119897; , &#120579; &#119897; ) is the reflection steering vector in the direction of &#120601; &#119897; and &#120579; &#119897; , respectively. We can derive the received signal at &#119901; &#119905;&#8462; time instance as &#119884; &#119901; = &#119945; &#8242; &#119889;&#119894;&#119886;&#119892;(&#119945; &#8242;&#8242; &#119901; )&#120654; &#119953; &#119935; + &#119925; &#119901; , &#119935; &#8712; C 1&#215;&#119879; &#119872; is transmitted signal sequence and &#120654; &#119953; &#8712; C &#119860;&#215;1 . We assume MRs cannot estimate the cascaded channel &#119919; &#119901; = &#119945; &#8242; &#119889;&#119894;&#119886;&#119892;(&#119945; &#8242;&#8242; &#119901; ) from AP to MR over RIS; thereby, they only observe received signal power when the AP selects a beam from the RIS codebook &#119862; and transmits a pilot signal. if the pilot signal &#119883; is set to be 1, then the received signal power (RSP) can be expressed as F &#119901; (&#120654; &#119901; ) = | &#8730; &#119904;&#119919; &#119901; &#120654; &#119901; + &#119925; &#119901; | 2 . Note that we convert the received signal power to the RSS by 10&#119897;&#119900;&#119892;(F &#119901; (&#120654; &#119901; )) for our experiments. The target user is MR M, but other MRs, denoted as &#119894; &#8712; I, should not experience interference from MR M. Received signal power F &#119901;,&#119894; (&#120654; &#119901; ) for &#119894; &#119905;&#8462; MR is also effected by time-varying channel &#119919; &#119901;,&#119894; .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Problem Formulation</head><p>We aim to control the beamforming weight vector &#120654; &#119901; to find the optimal beam that achieves the largest expected RSP, E[F &#119901;,&#119898; (&#120654; &#119901; )] = &#119904; |&#119919; &#119901;,&#119898; &#120654; &#119901; | 2 , for MR M while maintaining the expected maximum RSP, E[&#119898;&#119886;&#119909;</p><p>for MRs in I less than a threshold &#120588;. Given that &#119867; &#119901;,&#119898; and &#119867; &#119901;,&#119894; are unknown and environment-dependent, we formulate the beam selection as an online constrained stochastic optimization problem. Let &#119879; denote the time slots of equal duration for beam selection before transmitting data. In time slot &#119901; &#8712; &#119879; , the AP selects a beamforming weight vector, &#120654; &#119901; , from a set of candidate beams (arms in the bandit), and observes RSP, F &#119901; (&#120654; &#119901; ), from all the MRs. Here, we define &#119903; &#119901; (&#119960; &#119901; ) = F &#119901;,&#119898; (&#120654; &#119901; ) as observed reward and &#119892; &#119901; (&#119960; &#119901; ) = &#119898;&#119886;&#119909; &#119894; &#8712; I {F &#119901;,&#119894; (&#120654; &#119901; )} as observed utility function, and both are time-varying. Sequentially selected beams and corresponding sequential rewards with utilities are presented as {&#119960; 1 , &#119960; 2 , ..., &#119960; &#119901; } and {(&#119903; 1 (&#119960; 1 ), &#119892; 1 (&#119960; 1 )), ..., (&#119903; &#119901; (&#119960; &#119901; ), &#119892; &#119901; (&#119960; &#119901; ))}, respectively. Our objective is to find a policy, &#120587; &#8712; &#928;, that maximizes the expected cumulative reward, i.e., expected RSP, while satisfying a constraint on the expected utility:</p><p>Here, the AP selects the beam vector based on selection probability through policy &#120587; &#119901; . Note that such a policy can depend on the historical information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Adaptive Beam Selection with REMARKABLE</head><p>We adopt GP bandits due to their proven effectiveness in beam alignment tasks <ref type="bibr">[12]</ref>, particularly over traditional multi-armed bandits. GP bandits capture spatial correlations among beams and adapt to time-varying channels-key for mobile scenarios with dynamic channel states. With this motivation, first, we introduce Gaussian process (GP) kernel to represent the reward and the constraints. Then, we define a constrained GP-bandit problem to determine the beamforming vector (cf,(Eq.10)). We next describe the base algorithm to address the static robot case, followed by our novel modification addressing the non-stationarity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Designing Solution with Gaussian Processes</head><p>6.1.1 Gaussian Processes. Our RIS codebook contains beamforming vectors for various rotational angles; different beamforming vectors &#120654;, from different codebooks C &#119903; &#8712; &#119862; can produce similar beams as the coverage sector of each codebook can partially overlap. This results in a high correlation between beamforming vectors (&#120596;, &#120596; &#8242; ). Since our reward &#119903; &#119901; and utility functions &#119892; &#119901; are unknown and non-linear, we use Gaussian Processes and their Reproducing Kernel Hilbert Space (RKHS) to model this correlation, as inspired by <ref type="bibr">[12]</ref>. Note that MAB problem, where each beamforming vector is an orthogonal arm, cannot model such correlation.</p><p>We define a GP over &#119862; as GP &#119862; (&#120583; (&#8226;), &#119896; (&#8226;, &#8226;)) that is completely specified by a mean function &#120583; and covariance function (kernel) &#119896; : &#8704;&#120596; &#8712; &#119862;. We assume that the reward function without noise &#119891; &#119901; and constrained utility function without noise &#119911; &#119901; come from a GP, and perturbed with Gaussian noise: &#119903; &#119901; = &#119891; &#119901; (&#120596;) + &#119899; &#119901; , with &#119899; &#119901; &#8764; N (0, &#120590; 2 ) and &#119891; &#119901; (&#8226;) &#8764; GP (&#120583; &#119901; (&#8226;), &#119896; (&#8226;, &#8226;)). Hence, if a beam vector &#120596; is selected then &#119903; &#119901; &#8764; GP (&#120583; &#119901; (&#120596;), &#119896; (&#120596;, &#8226;) + &#120590; 2 ). Similar argument holds for &#119911; &#119901; and &#119892; &#119901; = &#119911; &#119901; (&#120596;) + &#119899; &#119901; , with kernel k. We use GP (0 &#119888; , &#119896; (&#8226;, &#8226;)) as a prior distribution over &#119891; &#119901; . Given a set of sampling points &#119860; &#119879; = [&#120596; 1 , ..., &#120596; &#119879; ] within &#119862;, observed rewards</p><p>, where the mean and variance are:</p><p>with &#119896; &#119901; (&#120596;) = [&#119896; (&#120596; 1 , &#120596;), ..., &#119896; (&#120596; &#119879; , &#120596;)] &#119879; , &#119870; &#119901; = [&#119896; (&#120596;, &#120596; &#8242; )] &#120596;,&#120596; &#8242; &#8712;&#119860; &#119901; , and I is the identity matrix. Similarly, for the constraint &#119911; &#119901; , we consider the posterior GP ( &#956;&#119901; , &#963;2 &#119901; ) where &#119903; 1:&#119901; is replaced by &#119892; 1:&#119901; , and the kernel &#119896; is replaced by k.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.2">Reproducing Kernel Hilbert Space (RKHS).</head><p>We assume that &#119891; &#119901; belongs to RKHS H &#119896; . In particular, H &#119896; is equipped with the kernel &#119896; such that &#119891; &#119901; (&#120596;) = &#10216;&#119891; &#119901; (&#8226;), &#119896; (&#120596;, &#8226;)&#10217; H &#119896; . Similarly, we assume that the constraint function &#119911; &#119901; also belongs to H k , i.e., &#119911; &#119901; (&#120596;) = &#10216;&#119911; &#119901; (&#8226;), k (&#120596;, &#8226;)&#10217; H k . Some examples of kernel functions are square exponential, Matern etc., <ref type="bibr">[36]</ref>.</p><p>Throughout the rest of this paper, we assume that the functions are bounded, i.e., ||&#119891; &#119901; (&#119909;)|| H &#119896; &#8804; &#119865; , and ||&#119892; &#119901; (&#119909;)|| H k &#8804; &#119866; for all &#119901;. Such assumptions are also common in practice in the wireless communication <ref type="bibr">[12,</ref><ref type="bibr">37]</ref>.</p><p>6.1.3 Kernel Selection. We employ the Matern kernel to specify the RKHS in GP <ref type="bibr">[36]</ref> as it shows the best performance:</p><p>Here, &#119907; &gt; 0 is the hyperparameter that controls the smoothness of the output, &#119904; = &#119889; (&#120596;, &#120596; &#8242; ) encodes the similarities between two arms with the Euclidean distance, &#119861;(&#119907;) and &#915;(&#119907;) are the modified Bessel function and the gamma function, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Base Algorithm</head><p>We now discuss the base algorithm (inspired from <ref type="bibr">[12]</ref>) which we use to find the beamforming vector in the static case. This algorithm also forms the basis in the non-stationary case as well where the robots are mobile.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.1">GP-Upper confidence bound (GP-UCB</head><p>). Since we do not know &#119891; &#119901; and &#119911; &#119901; , rather we are learning. We only get feedback (noisy) corresponding to selecting the beamforming vector &#120596;; we need to balance between the exploration and exploitation carefully. For the unconstrained GP-bandit, GP-UCB algorithm is proposed <ref type="bibr">[36]</ref> where the idea is to select the points that have a higher mean estimate reward (exploitation) or have a higher posterior variance (exploration as it does not have enough information). Similarly, we maintain the upper confidence of the &#119891; &#119901; (at time &#119901;) as the following term f &#119901; (&#120596;) = &#120583; &#119901; -1 (&#120596;) + &#120573; &#119901; -1 &#120590; &#119901; -1 (&#120596;). &#120573; &#119901; -1 is the weight factor:</p><p>where &#120574; is the information gain<ref type="foot">foot_0</ref> and the &#120575; is the confidence parameter. Please see <ref type="bibr">[12,</ref><ref type="bibr">36]</ref> for details.</p><p>Primal-Dual: Unlike the unconstrained version <ref type="bibr">[36]</ref>, we consider a constrained optimization problem. Similar to <ref type="bibr">[12]</ref>, we consider the Lagrangian of Eq.10 as E[&#119903; &#119901; (&#120596; &#119901; )] -&#120601; (E[&#119892; &#119901; (&#120596; &#119901; )] -&#120588;).</p><p>We, then seek to solve for the Lagrangian:</p><p>Hence, unlike the unconstrained version, we have to develop the UCB for the Lagrangian for a given dual variable &#120601;. Since noise is zero-mean,</p><p>We, thus, only need to find the lower confidence bound of &#119911; &#119901; as we already obtained UCB for &#119891; &#119901; , for which, we use &#7825;&#119901;</p><p>With probability 1 -&#120575;, &#119891; &#119901; (&#120596;) &#8804; f &#119901; (&#120596;), and &#119911; &#119901; (&#120596;) &#8805; &#7825;&#119901; (&#120596;) (from <ref type="bibr">[12]</ref>) ensuring that if we use f &#119901; and &#7825;&#119901; , it will be indeed UCB. For the static channel &#119891; &#119901; and &#119911; &#119901; are drawn from the GP with time invariant mean. We decide to choose the solution at time &#119901; as:</p><p>After selecting a beamforming vector based on Eq.15, all the posterior mean and variance, &#120583; &#119901; (&#119909;), &#120590; &#119901; (&#119909;) and &#956;&#119901; (&#119909;), &#963;&#119901; (&#119909;) are updated through Eq.12 based on the received value. Finally, we update dual variable &#120601; with the gradient descent in the dual domain &#120601; = max{&#120601; + &#120578; ( &#7825;&#119901; (&#120596; &#119901; ) -&#120588;), 0}, where &#120578; is the learning rate Initialization: Reward &#119877; (&#119896; ) = 0, constraint value &#119866; (&#119896; ) = 0.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5:</head><p>Set the probability for arms &#119895; = 1, . . . , &#119869; as</p><p>Choose arm &#119895; according to the probability &#119901; ( &#119895; ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>7:</head><p>Set the restart interval &#119882; = 2 &#119895; .</p><p>8:</p><p>Run the base algorithm with the restart interval &#119882; . 9:</p><p>Collect the total reward &#119877; (&#119896; ) and the constraint value &#119866; (&#119896; ) across the interval &#119868; .</p><p>10:</p><p>for &#119895; = 1, . . . , &#119869; do 11:</p><p>Update &#119884; = min(max(&#119884; + &#120578; &#8242; (&#119866; (&#119896; )/&#119868; -&#120588; ), 0), &#119884; &#119898;&#119886;&#119909; )</p><p>We also clip the dual variable at &#120601; &#119898;&#119886;&#119909; . Please see Algorithm 1 in <ref type="bibr">[12]</ref> for details of the Base algorithm.</p><p>6.2.2 Learning Metric. : For the static-case, we are interested in minimizing the regret and the violation:</p><p>The regret measures the sub-optimality gap between the reward following the optimal policy &#120587; * , and the reward following the policy &#120587; &#119901; at time &#119901;. The violation measures the constraint violation at time &#119901;. Here, we seek to have sub-linear growth of &#119877;(&#119879; ) and &#119881; (&#119879; ), i.e., &#119877;(&#119879; )/&#119879; &#8594; 0 as &#119879; &#8594; &#8734; as it will ensure that in most of the episodes, the policy is feasible and optimal. The following result signifies that the base algorithm indeed achieves the sub-linear regret and violation.</p><p>Proposition 1 <ref type="bibr">([12]</ref>). With probability 1 -&#120575;, the base algorithm achieves &#119877;(&#119879; ) &#8804; &#213; (&#119879; 1/2 ), &#119881; (&#119879; ) &#8804; &#213; (&#119879; 1/2 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Addressing Non-Stationary Conditions</head><p>We now discuss how we modify the base algorithm to address the non-stationarity. Our key contribution is adapting the 'bandit over bandit' approach inspired from <ref type="bibr">[38]</ref> to the constrained GPbandit. We first quantify non-stationarity, discuss existing metricdependent methods, and finally introduce our novel approach that removes this dependency.</p><p>6.3.1 Time-varying Budget. Since our scenario is mobile, the channel conditions are time-varying. These time-varying channels affect reward/utility; thereby, &#119891; &#119901; and &#119911; &#119901; also vary over time. We assume that these variations are bounded by &#119861; &#119891; and &#119861; &#119911; . In particular, &#119861; &#119891; :=</p><p>&#119879; -1 &#119901;=1 max &#119909; &#119891; &#119901; -&#119891; &#119901;+1 H &#119896; and &#119861; &#119911; := &#119879; -1 &#119901;=1 max &#119909; &#119911; &#119901; -&#119911; &#119901;+1 H &#119896; . The total combined variation budget would be &#119861; = max{&#119861; &#119891; , &#119861; &#119911; }. 6.3.2 Restart Strategy. Similar to [12], to combat non-stationary conditions, we adopt the restart strategy, which resets the kernels and forgets previous observations, no longer useful for deciding the new beamforming vector as perhaps the environment has changed. Restart enables efficient adaptation by discarding outdated observations in non-stationary environments. Note that instead of restart, one can employ sliding window, or weight-based algorithm. The key algorithmic contribution is selecting the restart interval &#119882; which we explain in the following. 6.3.3 Unknown Variation Budget. Estimating the variation budgets &#119861; &#119891; and &#119861; &#119911; in real time is challenging and channel-dependent; thus, especially in mobile settings, the restart window &#119882; should be learned adaptively. We propose Algorithm 1, a bandit-over-bandit scheme that treats a candidate set &#119869; = {2 &#119895; } &#8970;0.5 log 2 &#119879; &#8971; &#119895;=1</p><p>of restart intervals as outer-loop arms. We partition the horizon &#119879; into epochs of length &#119868; = &#8730; &#119879; . At epoch &#119896;, EXP3 <ref type="bibr">[39]</ref> selects an arm &#119895; from weights &#119908; &#119895; (&#119896;) (updated via Eq. ( <ref type="formula">18</ref>)), yielding &#119882; = 2 &#119895; ; the base algorithm (Sec. 6.2) then runs for &#119868; steps with restarts every &#119882; . We aggregate reward &#119877;(&#119896;) and interference &#119866; (&#119896;) over the epoch, update arm probabilities using Eq. ( <ref type="formula">17</ref>), and update the dual variable via &#119884; &#8592; &#928; [0,&#119884; max ] (&#119884; + &#120578; &#8242; (&#119866; (&#119896;)/&#119868; -&#120588;)). Arms that induce low reward and/or higher constraint violation (see, (Eq.18)) receive lower weight and are selected less often in subsequent epochs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4">Performance Metrics</head><p>&#8226;Regret and Violation: Since the optimal policy can change over time in a non-stationary environment, we evaluate our algorithm using dynamic regret Dy -R(&#119879; ) and constraint violation &#119881; (&#119879; ). We define the dynamic regret as:</p><p>The constraint violation metric &#119881; (&#119879; ) is still given by Eq.16. Compared to regret, bounding dynamic-regret is fundamentally more challenging. Nevertheless, we obtain sub-linear dynamic-regret and violation bound. Theorem 1. Algorithm 1 achieves with probability 1 -&#120575;,</p><p>The proof is in our technical report <ref type="bibr">[40]</ref> owing to space constraint. <ref type="bibr">[12]</ref> achieves dynamic regret and violation bound of &#213; (&#119861;&#119879; 3/4 ) (Theorem 1 there) where one can choose the worst estimated budget if the budget is unknown. In contrast, our bound achieves &#213; (&#119861; 1/4 &#119879; 3/4 ). Hence, we improve the dependency of the budget.</p><p>Note that <ref type="bibr">[12]</ref> indicates that even if &#119861; = &#119900; (&#119879; 1/4 ), one can have linear regret and violation. In contrast, our result still achieves sub-linear regret and sub-liner violation bound as long as the time-varying budget grows sub-linearly. <ref type="bibr">[12]</ref> has also achieved &#213; (&#119861; 1/4 &#119879; 3/4 ) dynamic regret and violation, however, it requires the knowledge of the budget &#119861; (Corollary 2 there). Our result achieves the same bound without the information of &#119861;.</p><p>For an online setting, it is norm to assume that &#119879; is known. If &#119879; is unknown, one can use the doubling 'trick' <ref type="bibr">[41]</ref>, which scales the regret and violation bound by log(&#119879; ). In particular, one can choose &#119879; = 2 0 , 2 1 , 2 2 , . . . , and run the algorithm until reaching the &#119879; . (a) (b) (c) (d) Figure 7: Validation of codebook design; a) measured beam pattern at (-35&#176;,-19&#176;), b) after changing the orientation of RIS ( MR was outside of the coverage at (65&#176;,-13&#176;)); c) Comparison of REMARKABLE's SNR performance against multi-AP scneraio; and d) Comparing the performance of REMARKABLE and exhaustive search (ES) w.r.t selection accuracy and beam selection time.</p><p>As this is a real-time experiment, obtaining the best policy while running the algorithm is not feasible. Consequently, we conduct two consecutive epochs: one for beam selection using the proposed approach and another utilizing an exhaustive search technique to find the best policy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.3">Non-stationary Scenario</head><p>In this study, the MRs travel along a pre-determined path and send their observations (RSS) to the AP. We first conduct the experiment with our base algorithm (GP-UCB-C) (without restart strategy). The results (Fig. <ref type="figure">6c-6d</ref>) show that GP-UCB-C has a significant increase in TA-regret (DynR(&#119879; )/&#119879; ) due to changes in the channel, specifically starting around 300, even if with non-violated constraint. We then apply the restart strategy GP-UCB-RE proposed in <ref type="bibr">[12]</ref> with an upper estimate on the budget. While this restart strategy fails but still beats the baseline as it ultimately restarts and the TA-regret starts decreasing. Finally, our proposed approach Algorithm 1 (GP-UCB-A) has the best performance (Fig. <ref type="figure">6c-6d</ref>) compared to other two baselines as it can quickly adapt to the change in the environment.</p><p>In particular, our proposed approach has the lowest TA-regret even when the environment changes while also finding feasible solution.</p><p>The last experiment corresponds to a time-varying channel with different channel coherence times as we adjust the speed of the MR from min to the allowed max speed as in [0.15&#119898;/&#119904;, 0.3&#119898;/&#119904;, 0.45&#119898;/&#119904;]. Fig. <ref type="figure">6e</ref> presents that our approach Alg.1 adaptively selects the best optimum beams over time-varying channel quickly which is evident from its TA-regret performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.4">Comparison with Classical Methods</head><p>Here, we first compare REMARKABLE's SNR performance with the classical multi-AP deployment, and then evaluate REMARKABLE against classical methods in terms of the beam selection time, and selection accuracy from Sec. 6.4. For benchmarking, REMARKABLE is compared with exhaustive search (ES)-the widely adopted baseline in mmWave protocols (e.g., IEEE 802.11ad/ay)-to highlight the efficiency of our approach without resorting to full codebook scans, rather than against complex learning methods such as reinforcement learning, which demands extensive training. As shown in Fig. <ref type="figure">7c</ref>, the MR suffers an SNR drop when encountering obstacles or interference, whereas REMARKABLE maintains reliable links with consistent SNR. Despite RIS-induced cascaded path loss, RE-MARKABLE consistently achieves high SNR levels, underscoring the robustness of the design. Furthermore, we define a term of time slot as the end-to-end latency of each selected beam, comprising: (i) the beam searching latency of the employed algorithm, including both selection and real-time execution; (ii) feedback latency between the MR and AP, where the MR reports the observed RSS to the AP; and (iii) control latency between the AP and the RIS, where the AP transmits the selected codewords to the RIS. The control latency for transmitting a single codeword to the RIS is 208 &#181;s, the average feedback latency over the wireless backhaul is 8.1 ms, and the beam searching latency is approximately 23.8 ms for codebook CB1 (higher for ES). When using ES, 32 time slots are needed to examine all beams from &#119862;&#119861;1 and select the one with the highest RSS, resulting in a guaranteed overhead delay. In contrast, REMARK-ABLE averages 18 time slots (based on multiple experiments) to find the best beam with our adaptive algorithm GP-UCB-A 1 . After first initiated restart, GP-UCB-A 2 achieves adaptation in only 10 slots, whereas ES 2 must re-scan. Finally, Fig. <ref type="figure">7d</ref> shows that REMARK-ABLE achieves a 46.8% improvement in beam selection time while maintaining 94.2% accuracy. Note that this study primarily focuses on low-mobility scenarios, while high-mobility cases are left for future work with further latency optimization. Time Complexity Analysis: The codebook design (see Sec 4) is performed offline, which significantly supports scalability of the hardware. As we use gradient-descent, its time complexity is O (&#119878;&#119868; &#119873; 2 log &#119873; ), where &#119878; is the codebook size, &#119873; 2 the number of RIS elements, and &#119868; the number of gradient descent iterations. At runtime, the codebook size affects the online complexity of the bandit algorithm. Specifically, computing the mean and variance requires matrix inversion with worst-case complexity &#119874; (&#119882; 2 ), where &#119882; is the restart interval. Evaluating GP-UCB over &#119878; codebook cardinality results in overall complexity &#119874; (&#119878;&#119882; 2 ). Since &#119882; &#8804; &#8730; &#119879; , the runtime remains linear in both &#119878; and &#119879; . Through empirical RIS measurements, the runtime is 25% fraction of the total end-to-end latency reported in Sec 8.4. This implies that the practical runtime overhead is suitable for real-world environments. Designing faster algorithms and potentially hardware-accelerated solutions remains an important future direction, especially in high-mobility settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">Conclusions and Future Work</head><p>We demonstrate optimal beam selection for robot connectivity under interference-constrained, time-varying channels. Our RIS codebook enables selection without channel estimation, and a reconfigurable mechanism extends coverage. Using an adaptive banditover-bandit restart strategy, REMARKABLE safely learns optimal beams in dynamic conditions, achieving 46.8% faster selection and 94.2% accuracy-outperforming classical methods. Characterizing the lower bound on the dynamic regret and violation is an important future work. Extending this framework to multiple robots and multiple RIS also constitutes an important future research direction.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>For matern kernel, &#120574; &#119879; = &#119879; &#119889; (&#119889;+1) /(2&#119907;+&#119889; (&#119889;+1) ) log(&#119879; )</p></note>
		</body>
		</text>
</TEI>
