<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>CURE: Simulation-Augmented Autotuning in Robotics</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>01/01/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10616068</idno>
					<idno type="doi">10.1109/TRO.2025.3548546</idno>
					<title level='j'>IEEE Transactions on Robotics</title>
<idno>1552-3098</idno>
<biblScope unit="volume">41</biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Md Abir Hossen</author><author>Sonam Kharade</author><author>Jason M O'Kane</author><author>Bradley Schmerl</author><author>David Garlan</author><author>Pooyan Jamshidi</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Robotic systems are typically composed of various subsystems, such as localization and navigation, each encompassing numerous configurable components (e.g., selecting different planning algorithms). Once an algorithm has been selected for a component, its associated configuration options must be set to the appropriate values. Configuration options across the system stack interact nontrivially. Finding optimal configurations for highly configurable robots to achieve desired performance poses a significant challenge due to the interactions between configuration options across software and hardware that result in an exponentially large and complex configuration space. These challenges are further compounded by the need for transferability between different environments and robotic platforms. Data efficient optimization algorithms (e.g., Bayesian optimization) have been increasingly employed to automate the tuning of configurable parameters in cyber-physical systems. However, such optimization algorithms converge at later stages, often after exhausting the allocated budget (e.g., optimization steps, allotted time) and lacking transferability. This article proposes causal understanding and remediation for enhancing robot performance (CURE)-a method that identifies causally relevant configuration options, enabling the optimization process to operate in a reduced search space, thereby enabling faster optimization of robot performance. CURE abstracts the causal relationships between various configuration options and the robot performance objectives by learning a causal model in the source (a low-cost environment such as the Gazebo simulator) and applying the learned knowledge to perform optimization in the target (e.g., Turtlebot 3 physical robot). We demonstrate the effectiveness and transferability of CURE by conducting experiments that involve varying degrees of deployment changes in both physical robots and simulation.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>These components interact to achieve specific goals in a physical environment. Unfortunately, robots are prone to a wide variety of faults <ref type="bibr">[1]</ref>. Incorrect configurations (called misconfigurations) in robotic algorithms are one of the most prevalent causes of such faults <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>, <ref type="bibr">[4]</ref>. Misconfigurations can cause various bugs <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref> leading to crashes, robots becoming unstable, deviations from planned trajectory, controller faults, and nonresponsiveness. Several studies have reported misconfigurations as one of the key reasons for cyber-physical system failures. Such misconfigurations caused 19.6% of autonomous aerial vehicles (AAV) bugs <ref type="bibr">[7]</ref>, 27.25% of autonomous vehicle bugs <ref type="bibr">[8]</ref> (a faulty configuration in actuation layer even caused the vehicle to collide with a static object on the curb <ref type="bibr">[9]</ref>) and 55% of traffic dispatch algorithm bugs <ref type="bibr">[10]</ref>. All of these issues were fixed by configuration changes.</p><p>Most robotic algorithms require customization through configuration parameters to suit certain tasks and situations. For example, most AAV controllers include a wide range of configurable parameters that can be customized to different vehicles, flight conditions, or even particular tasks (e.g., when speed is more important than energy use). Finding configurations that optimize performance on a given task is a challenging problem for designers and end users <ref type="bibr">[11]</ref>. A developer might request a feature such as "Create a tool to automatically tune naviga-tion2 node parameters using state-of-the-art machine learning techniques." <ref type="bibr">[12]</ref>. In another instance, a developer encounters a planner performance issue <ref type="bibr">[13]</ref> and asks "I have tuned this for almost 5-6 hours. Sometimes it is going toward the goal but still failing in the middle of the trajectory." After several backand-forth communications, the algorithm designer concludes, "I cannot provide personalized tuning assistance to every user." In addition, developers aim to maintain the performance of the tuned parameters when deployment changes (e.g., from the robot operating system (ROS) 1 to ROS 2) to avoid retuning. Specifically, the optimal configuration determined in one environment often becomes suboptimal in another, as demonstrated in Fig. <ref type="figure">1</ref>.</p><p>Our solution: In this work, we propose causal understanding and remediation for enhancing robot performance (CURE), a multiobjective optimization method that finds optimal configurations for robotic platforms, converges faster than the state-ofthe-art, and transfers well from simulation to real robot and even to new untrained platforms. CURE has two main phases. In Phase 1, CURE reduces the search space by eliminating configuration options that do not affect the performance objective causally. For this, we collect observational data in a low-cost source environment, such as simulation. Then, a causal model is learned on the basis of the data, representing the underlying causal mechanisms that influence robot performance. We then estimate the causal effects of options on performance objectives. Finally, we reduce the search space to a subset of options that have non-negligible causal effects. In Phase 2, CURE performs traditional Bayesian optimization in the target environment, but only over the reduced search space, to find the optimal configuration. We show that CURE not only finds the optimal configuration faster than the state-of-the-art, but the learned causal model in the simulation speeds up optimization in the real robot. The results demonstrate that the learned causal model is transferable across similar but different settings, that is, environments, mission/tasks, and for new robotic platforms. In other words, the existence of a common abstract structure (the causal relations between options, system-level variables, and performance objectives) is invariant across domains, and the behavior of specific features of the environment remains constant across domains.</p><p>Evaluations: We evaluated CURE in terms of its effectiveness and transferability across two tasks: navigation and manipulation. The navigation task forms the core of our experiments, using two highly configurable robotic systems (Husky and Turtlebot 3) under varying degrees of deployment changes. The manipulation task involves simulating a robot arm (Franka Emika Panda) in Gazebo to demonstrate CURE's adaptability by complementing the effectiveness evaluation. We compared CURE with traditional multiobjective Bayesian optimization (MOBO) using the AX framework <ref type="bibr">[14]</ref>, and RidgeCV <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref> integrated with MOBO to reduce the search space. Our results indicate that compared to MOBO, CURE finds a configuration that improves performance by 2&#215; and achieves this improvement with gains in efficiency of 4.6&#215; when we transfer the knowledge learned from Husky in simulation to Turtlebot 3 physical robot.</p><p>Contributions: The contributions of our work are as follows. 1) We propose CURE, a multiobjective optimization method that operates in the reduced search space involving causally relevant configuration options and allows faster convergence. 2) We conducted a comprehensive empirical study by comparing CURE with state-of-the-art optimization methods in both simulation and real robots under different severities of deployment changes, and studied effectiveness and transferability.</p><p>3) The code and data are available at: <ref type="url">https://github.com/  softsys4ai/cure</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>In this work, we focus on performance optimization through the lens of causality. Specifically, we learn a causal model from a low-cost environment and utilize causal knowledge to optimize performance in the target system. This section groups related work into four categories: optimizing robotic parameters, machine learning (ML) for performance modeling, transfer learning strategies, and causal analysis in configurable systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Optimization Techniques in Robotic Configurations</head><p>Researchers have considered robotic algorithms as a closedbox, as the objective functions in most robotic problems can only be accessible through empirical experiments. Evolutionary algorithms <ref type="bibr">[17]</ref>, <ref type="bibr">[18]</ref> have been used to find optimal configurations in dynamic-window approach (DWA) <ref type="bibr">[19]</ref> algorithm. However, the application of evolutionary algorithms in robotic systems is hindered by the limited availability of observations and the difficulty in extracting meaningful information from these observations due to the presence of noise. Approaches such as variational heteroscedastic Gaussian process (GP) regression <ref type="bibr">[20]</ref> and Bayesian optimization with safety constraints <ref type="bibr">[21]</ref> attempt to address these challenges, but struggle with highdimensional search spaces, yield only local improvements, and lack transferability across different environments and platforms. Furthermore, the complexity of environmental dynamics models, coupled with the biases introduced by optimization formulation, poses significant challenges. Moreover, formalizing safety constraints that allow for computationally efficient solutions, specifically solutions in polynomial time with closed-form expressions, is complex if at all feasible.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Learning-Based Methods for Performance Modeling</head><p>Expanding on traditional optimization techniques, machine learning methods offer diverse approaches to improve robotic performance. Approaches such as learning from demonstration <ref type="bibr">[22]</ref>, learning human-aware path planning <ref type="bibr">[23]</ref>, and mapping sensory inputs to robot actions <ref type="bibr">[24]</ref>, <ref type="bibr">[25]</ref> have been widely applied to robot navigation beyond fine-tuning configuration parameters, as opposed to heavily relying on human expertise. These methods aim to replace classical methods, casting doubt on the robustness, generality, and safety of the systems. To provide a deeper understanding of performance behavior in robotic algorithms, performance influence models <ref type="bibr">[26]</ref>, <ref type="bibr">[27]</ref>, <ref type="bibr">[28]</ref> can be used. These models predict system performance by capturing important options and interactions that influence performance behavior using machine learning and sampling heuristics. However, performance influence models face limitations in adapting to unexpected environments due to not being able to capture changes in the performance distribution and often produce incorrect explanations <ref type="bibr">[4]</ref>. In addition, the collection of training data for these models is costly and requires extensive human supervision.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Transfer Learning for Performance Modeling</head><p>Addressing the challenges of adapting to unexpected environments and costly data collection in learning-based methods, transfer learning accelerates optimization by selectively reusing knowledge from previous tasks. Techniques such as simulationto-real learning <ref type="bibr">[29]</ref>, <ref type="bibr">[30]</ref> and transferring Pareto frontiers across different platforms <ref type="bibr">[31]</ref> improve sampling efficiency and improve training datasets. Each of these techniques uses the predicted transfer learning frameworks based on correlational analysis. However, changes in the environment and robotic platform can cause a distribution shift. The ML models used in these transfer learning approaches are vulnerable to spurious correlations <ref type="bibr">[32]</ref>, <ref type="bibr">[33]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Causal Analysis in Configurable Systems</head><p>While machine learning techniques excel in uncovering correlations between variables, their ability to identify causal links is limited <ref type="bibr">[34]</ref>. Using the information encoded in causal models, we can benefit from analyses that are only possible when we explicitly employ causal models, such as interventional and counterfactual analyses <ref type="bibr">[34]</ref>, <ref type="bibr">[35]</ref>. Causal analysis has been used for various debugging and optimization tasks in configurable systems, including finding the root cause of intermittent failures in database applications <ref type="bibr">[36]</ref>, detecting and understanding the root causes of the defect <ref type="bibr">[37]</ref>, <ref type="bibr">[38]</ref>, and improving fault localization <ref type="bibr">[39]</ref>. The causality analysis in these studies is confined to a single environment and platform, while our approach transfers causal knowledge across different environments and platforms. In robotic systems, the causal models learned in simulation are used to find explanations for failures in real robots <ref type="bibr">[4]</ref>, <ref type="bibr">[40]</ref>. However, such methods are limited to identifying root causes of failures, whereas our approach extends beyond diagnosis to also prescribe remedies, new configuration option values that rectify the failure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PROBLEM FORMULATION AND CHALLENGES</head><p>In this section, we first motivate our work by illustrating how an optimal configuration found in one environment often becomes suboptimal in another. We then formally define the problem and describe the challenges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Motivating Scenario</head><p>We motivate our work by demonstrating the nontransferability of traditional Bayesian optimization through a simple experiment for robot navigation. In particular, we explore two deployment scenarios: 1) Sim2Real: Transferring the optimal configurations for energy consumption identified from simulations to the Turtlebot 3 physical robot [see Fig. <ref type="figure">1(a)</ref>] Real2Real: Transferring the optimal configurations for position error 1 identified 1 defined as the Euclidean distance between goal position and robot's actual position from Husky to Turtlebot 3 [see Fig. <ref type="figure">1(b)</ref>]. In both scenarios, we observe that the optimal configurations identified by Bayesian optimization in the source environments fail to retain their optimality in the target environment. We observe that energy consumption increases by 2.57&#215;, and a significant increase in position error is observed by 8.64 &#215; 10 5 times.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Problem Formulation</head><p>Consider a highly configurable robot with d distinct configurations. Let X i indicate the configuration parameter i, which can be assigned a value from a finite domain Dom(X i ). In general, X i may be set to the following:</p><p>1) a real number (e.g., the number of iterative refinements in a localization algorithm, the frequency of the controller) within specified bounds, denoted as</p><p>where X i and Xi are the lower and upper bounds, respectively; 2) binary (e.g., whether to enable recovery behaviors); 3) categorical (e.g., planner algorithm names). The configuration space is mathematically a Cartesian product of all the domains of the parameters of interest X = Dom(X 1 ) &#215; &#8226; &#8226; &#8226; &#215; Dom(X d ). Then, a configuration x, which is in the configuration space x &#8712; X, can be instantiated by setting a specific value for each option within its domain,</p><p>Finding a configuration that uniformly optimizes all objectives is typically not possible; instead, there is a tradeoff between them. Pareto optimal solutions signify the prime balance among all objectives. In the context of minimization, a configuration x is said to dominate another configuration x if f (x) &#8804; f (x ). A configuration x &#8712; X is called Pareto-optimal if it is not dominated by any other configuration x &#8712; X, where x = x . The goal is to find x * , a configuration that gives rise to Pareto-optimal performance in the multiobjective space (e.g., f 1 : failure rate, f 2 : mission time, f 3 : energy consumption), given some constraints (h : safety). Here, we assume that the performance measure can be evaluated in experiments for any configuration x, and we do not know the underlying functional representation of the performance. The problem can be generalized by defining an arbitrary number of performance objectives (if they can be computed over a finite time horizon). Mathematically, we represent performance objectives as black-box functions that map from a configuration space to a real-valued one: f (x) : X &#8594; R. In practice, we learn f by sampling the configuration space and collecting the observations data, i.e., y i = f (x i ) + i with &#8764; N(0, &#963; 2 ). In other words, we only partially know the response function through observations</p><p>We define the problem formally as follows:</p><p>where x * &#8712; X is a Pareto-optimal configuration and adhere to the safety constraints.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Challenges</head><p>In this article, our objective is to propose a solution to address the following key challenges. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Software-Hardware Interactions and Exponentially</head><p>Growing Configuration Space: A robotic system consists of software components (e.g., localization, navigation, and planning), hardware components (e.g., computer and sensors onboard), and middleware components (e.g., ROS), with most components being configurable. The configuration space of only 100 parameters with only 10 possible values for each comprises of 10 100 possible configurations. (For comparison, the number of atoms in the universe is estimated to be only 10 82 .) Therefore, the task of finding Pareto-optimal configurations for highly configurable robots and other cyberphysical systems is orders of magnitude more difficult because of software-hardware interactions, compared with software systems.</p><p>2) Reality Gap and Negative Transfer From STR: Robot simulators have been extensively used in testing new behaviors before the new component is used in real robots. However, the measurements from simulators typically contain noise, and the observable effect for some configuration options may not be the same in a real robot operating in a real environment, and in some cases, such effect may even have the opposite effect. Therefore, any reasoning based on the model predictions learned based on simulation data may become misleading. Such a reality gap between the sim and real exists due to unobservable confounders as a result of simplifications in the sim. Still, there exist stable relationships between configuration options and performance objectives in the two environments that can facilitate performance optimization of real robots.</p><p>3) Multiple Objectives: It is common to find multiple performance objectives in mission specifications (e.g., mission time, energy, and safety). Typically, the objectives involved in the specification are independent of each other <ref type="bibr">[41]</ref>, but in some cases they can be correlated and conflicting; for example, faster task completion could lead to higher energy consumption. Therefore, finding the optimal configuration (for a given robotic platform in a specific environment and for a specific task) should be treated as a multiobjective optimization problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4) Costly Acquisition of Training Data and the Safety Critical</head><p>Nature of Robotic Systems: Algorithm parameters can be manually adjusted by experiments on real robots or by using massive amounts of training data when the robotic system contains elements that are difficult to hard-code (e.g., computer vision components) <ref type="bibr">[42]</ref>. However, collecting training data from real robots is time-consuming and often requires constant human supervision <ref type="bibr">[43]</ref>. To guarantee the safe behavior of the robot, the practitioner must either meticulously select configurations that are safe or acquire an ample amount of representative data that lead to safe behavior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. CAUSAL UNDERSTANDING AND REMEDIATION FOR ENHANCING ROBOT PERFORMANCE</head><p>To solve the optimization problem described in Section III, we propose a novel approach, called CURE. The high-level overview of CURE is shown in Fig. <ref type="figure">2</ref>. CURE works in two phases. In Phase I, CURE reduces the search space for the optimization problem using data from the source environment, while in Phase II, CURE performs a black-box optimization in the reduced search space on the target platform. To elaborate on the details, in Phase I, CURE learns a structural causal model that enforces structural relationships and constraints between variables using performance evaluations from the source platform (e.g., Husky in simulation). Specifically, we learn a causal model for a set of random samples<ref type="foot">foot_1</ref> taken in the source environment. <ref type="foot">3</ref> The configuration options are then ranked by measuring their average causal effect (ACE) on the performance objectives through causal interventions. Options with the largest causal effect are selected to reduce the search space. Next, in Phase II, CURE performs a black-box optimization in the reduced search space given a fixed sampling budget in the target platform (e.g., the physical Turtlebot 3). Specifically, CURE searches for Paretooptimal configurations in the target, iteratively fits a surrogate model to the samples, and selects the next sample based on an acquisition function until the budget is exhausted. CURE's high-level procedure is described in Algorithm 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Phase I: Reducing the Search Space Via Causal Inference</head><p>Phase I begins by recording performance metrics for s initial configurations {(x 1 , y 1 ), . . . , (x s , y s )} in the source environment (Algorithm 1: lines 1-2). We define the following three types of variables to learn the causal structure:</p><p>1) software-level configuration options (e.g., hyperparameters in different algorithms <ref type="bibr">[44]</ref>) and hardware-level options (e.g., sensor frequency), 2) intermediate performance metrics (e.g., different system events in ROS) that map the influence of configuration options on performance objectives, and 3) end-to-end performance objectives (e.g., task completion rate, mission time). We also define structural constraints (e.g., X i X j ) over the causal structure to incorporate domain knowledge that facilitates learning with low sample sizes. <ref type="foot">4</ref>To discover the causal structure, we use an existing structure learning algorithm fast causal inference (FCI). We select FCI because 1) it can identify unobserved confounders <ref type="bibr">[35]</ref>, <ref type="bibr">[45]</ref>, and 2) it can handle variables of various typologies, such as nominal, ordinal, and categorical given a valid conditional independence test. Algorithm 2 describes the details of our causal learning procedure. It starts by constructing an undirected fully connected graph G, where the nodes represent the variables (options, intermediate variables, performance metrics). Next, we evaluate the independence of all pairs of variables conditioned on all remaining variables using Fisher's z test <ref type="bibr">[46]</ref> to remove the edges between independent variables. Finally, a partial ancestral graph (PAG) is generated (Algorithm 2: line 2), orienting the undirected edges using the edge orientation rules <ref type="bibr">[35]</ref>, <ref type="bibr">[45]</ref>, <ref type="bibr">[47]</ref>.</p><p>A PAG is composed of directed, undirected, and partially directed edges. The partially directed edges must be fully resolved to discover the true causal relationships. We employ the information-theoretic LatentSearch algorithm proposed by Kocaoglu <ref type="bibr">[48]</ref> to orient partially directed edges in PAG through entropic causal discovery (line 3). For each partially directed edge, we follow two steps: (i) establish if we can generate a latent variable (with low entropy) to serve as a common cause between two vertices; (ii) if such a latent variable does not exist, then pick the direction which has the lowest entropy. For the first step, we assess whether there could be an unmeasured confounder (say Z) that lies between two partially oriented nodes (say X and Y ). LatentSearch outputs a joint distribution q(X, Y, Z) that can be used to compute the entropy H(Z) of the unmeasured confounder Z. Following the Kocaoglu guidelines, we set an entropy Algorithm 1: CURE.</p><p>of the unmeasured confounder falls below this threshold, then we declare that there is a simple unmeasured confounder Z (with a low enough entropy) to serve as a common cause between X and Y and accordingly replace the partial edge with a bidirected (&#8596;) edge. When there is no latent variable with sufficiently low entropy, there are two possibilities: 1) the variable X causes Y ; then there is an arbitrary function f (&#8226;) such that Y = f (X, E), where E is an exogenous variable (independent of X) that accounts for system noise; or 2) the variable Y causes X; then there is an arbitrary function g(&#8226;) such that X = g(Y, &#7868;), where Algorithm 2: Causal Model Learning.</p><p>&#7868; is an exogenous variable (independent of Y ) that accounts for noise in the system. The distribution of E and &#7868; can be inferred from the data. With these distributions, we measure the entropies H(E) and H( &#7868;). If H(E) &lt; H( &#7868;), then it is simpler to explain X causes Y (that is, the entropy is lower when Y = f (X, E)) and we choose X &#8594; Y . Otherwise, we choose Y &#8594; X.</p><p>The final causal model is an acyclic-directed mixed graph (ADMG). When interpreting a causal model, we view the nodes as variables and the arrows as the assumed direction of causality, whereas the absence of an arrow shows the absence of direct causal influence between variables. To quantify the influence of a configuration option on a performance objective, we need to locate the causal paths. A causal path P X Y is a directed path that originates from a configuration option X to a subsequent nonfunctional property S (e.g., planner failed) and ends at a performance objective Y . For example, X &#8594; S &#8594; Y denotes X causes Y through a subsequent node S on the path. We discover P X Y by backtracking the nodes corresponding to each of the performance objectives until we reach a node without a parent. We then measure the ACE, by measuring the causal effects of the configuration options on the performance metrics and taking the average over the causal paths. We then rank the configuration options according to their ACE:</p><p>, where CE X i &#8805; CE X i+1 for all i3d. Finally, we select a subset of configuration options with the highest ACE:</p><p>and reduce the search space to X &#8834; X (Algorithm 1: lines 4-5).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Phase II: Performance Optimization Through Black-Box Optimization With Limited Budget</head><p>In the configuration optimization phase (lines 6-18), we search for Pareto optimal configurations using an active learning approach that operates in the reduced search space in the target environment. Here, the target environment is typically the target robotic platform that we want to optimize. The assumption is that any intervention in the target environment is costly and that we typically assume a small sampling budget. In some situations, we could assume that the cost of measuring configurations varies. For example, if the likelihood of violating safety confidence is high for a specific configuration, we could assign a higher cost to that configuration because it may damage the robot. We leave this assumption for future work. Specifically, we start by bootstrapping optimization by randomly sampling the reduced configuration space to produce an initial design D = {x 1 , . . . , x n }, where x i &#8712; X. After obtaining the measurements regarding the initial design, CURE then fits a GP model to the design points D to form our belief about the underlying response function. The while loop in Algorithm 1 iteratively updates the belief until the budget runs out: As we accumulate the data S 1:t = {(x i , y i )} t i=1 , where</p><p>and the likelihood function Pr(S 1:t |f T ) form the posterior distribution: Pr(f T |S 1:t ) &#8733; Pr(S 1:t |f T ) Pr(f T ). We describe the steps of Phase II as follows.</p><p>1) Bayesian Optimization With GP: Bayesian optimization is a sequential design strategy that allows us to perform global optimization of black-box functions <ref type="bibr">[49]</ref>. The main idea of this method is to treat the black-box objective function f (x) as a random variable with a given prior distribution and then optimize the posterior distribution of f (x), given experimental data. In this work, we use GPs to model this black-box objective function at each point x &#8712; X. That is, let S 1:t be the experimental data collected in the first t iterations, and let x t+1 be a candidate configuration that we can select to run the next experiment. Then, the probability that this new experiment could find an optimal configuration using the posterior distribution will be assessed</p><p>)) where &#181; t (x t+1 ) and &#963; 2 t (x t+1 ) are suitable estimators of the mean and standard deviation of a normal distribution used to model this posterior. The main motivation behind the choice of GPs as prior here is that it offers a framework in which reasoning can be based not only on mean estimates, but also on variance, providing more informative decision making. The other reason is that all the computations in this framework are based on a solid foundation of linear algebra. Fig. <ref type="figure">3</ref> illustrates Bayesian optimization based on GP using a one-dimensional response surface. The blue curve represents the unknown true posterior distribution, while the mean is shown in green, and the confidence interval 95% is shaded. Stars indicate measurements carried out in the past and recorded in S 1:t (i.e., observations). The configuration corresponding to x 1 has a large confidence interval due to the lack of observations in its neighborhood.</p><p>On the contrary, x 4 has a narrow confidence since neighboring configurations have been experimented with. The confidence interval in the neighborhood of x 2 and x 3 is not large, and correctly our approach does not decide to explore these zones. The next configuration x t+1 , indicated by a small circle on the right side of x 4 , is selected based on a criterion that will be defined later. A GP is a distribution over functions, specified by its mean and covariance</p><p>where k(x, x ) defines the distance between x and x . Assume S 1:t = {(x 1:t , y 1:t )|y i := f (x i )} to be the collection of observations t. The function values are drawn from a multivariate Gaussian distribution N(&#181;, K), where &#181; := &#181;(x 1:t )</p><p>In the while loop in CURE, given the observations we accumulated so far, we intend to fit a new GP model</p><p>where</p><p>] and I is identity matrix. Given (4), the new GP model can be drawn from this new Gaussian distribution</p><p>where</p><p>These posterior functions are used to select the next point x t+1 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) Configuration Selection Criteria:</head><p>The selection criteria is defined as u : X &#8594; R that selects x t+1 &#8712; X, should f (&#8226;) be evaluated next (step 7)</p><p>Although there are several different criteria in the literature for multiobjective optimization <ref type="bibr">[50]</ref>, <ref type="bibr">[51]</ref>, <ref type="bibr">[52]</ref>, CURE utilizes expected hypervolume improvement (EHVI). EHVI has demonstrated its strength in balancing exploration and exploitation, and in producing Pareto fronts (PFs) with excellent coverage and faster optimization <ref type="bibr">[53]</ref>. EHVI operates by assessing the expected improvement of a given point in the solution space in terms of the hypervolume (HV) measure-a widely accepted metric for comparing the quality of solutions in multiobjective optimization. EHVI is particularly useful in robotic applications, where the solution landscape can be highly complex and multidimensional. The steps of Algorithm 1 are illustrated in that maximizes the selection criteria is used to run the next experiment and provide data to reconstruct a more accurate model [see Fig. 4(d)].</p><p>3) Model Fitting: Here, we provide some practical considerations to make GPs applicable for configuration optimization. In CURE, as shown in Algorithm 1, the covariance function k : X &#215; X &#8594; R dictates the structure of the response function that we fit to the observed data. For integer variables, we implemented the Mat&#233;rn kernel <ref type="bibr">[54]</ref>. The main reason behind this choice is that along each dimension of the configuration response functions, a different level of smoothness can be observed. Mat&#233;rn kernels incorporate a smoothness parameter &#957; &gt; 0 that allows greater flexibility in modeling such functions. The following is a variation of the Mat&#233;rn kernel for &#957; = 1/2</p><p>where r 2 (x i , x j ) = (x ix j ) &#923;(x ix j ) for some positive semidefinite matrix &#923;. For categorical variables, we implement the following <ref type="bibr">[55]</ref>:</p><p>where d is the number of dimensions (i.e., the number of configuration parameters), &#952; &#8467; adjust the scales along the function dimensions, and &#948; is a function gives the distance between two categorical variables using Kronecker delta <ref type="bibr">[55]</ref>, <ref type="bibr">[56]</ref>. TL4CO uses different scales {&#952; &#8467; , &#8467; = 1 . . . d} on different dimensions as suggested in <ref type="bibr">[54]</ref> and <ref type="bibr">[56]</ref>, this technique is called automatic relevance determination. After learning the hyperparameters (step 6), if the &#8467;th dimension turns out to be irrelevant, then &#952; &#8467; will be a small value and, therefore, will be discarded. This is particularly helpful in high-dimensional spaces where it is difficult to find the optimal configuration. Although the kernel controls the structure of the estimated function, the prior mean &#181;(x) : X &#8594; R provides a possible offset for our estimate. By default, this function is set to a constant &#181;(x) := &#181;, which is inferred from observations <ref type="bibr">[56]</ref>. However, the prior mean function is a way of incorporating expert knowledge, and if it is available, then we can use this knowledge. Fortunately, we have collected extensive experimental measurements and based on our datasets, we observed that, for robotic systems, there is typically a significant distance between the minimum and the maximum of each function (see Figs. <ref type="figure">17</ref> and <ref type="figure">18</ref>). Therefore, a linear mean function &#181;(x) := ax + b allows for more flexible structures and provides a better fit for the data than a constant mean. We only need to learn the slope for each dimension and the offset (denoted &#181; &#8467; = (a, b)). Due to the heavy learning computation (step 12 in Algorithm 1), this process is computed only for every N l th iteration. To learn the hyperparameters of the kernel and also the prior mean functions, we maximize the marginal likelihood <ref type="bibr">[56]</ref> of the observations S 1:t . To do that, we train the GP model ( <ref type="formula">6</ref>) with S 1:t . We optimize the marginal likelihood using multistarted quasi-Newton hill climbers <ref type="bibr">[54]</ref>. For this purpose, we used the Ax + BoTorch library. Using the kernel defined in <ref type="bibr">(10)</ref>, we learn &#952; := (&#952; 0:d , &#181; 0:d , &#963; 2 ), which comprises the hyperparameters of the kernel and the mean functions. The learning is performed iteratively, resulting in a sequence of &#952; i for i = 1 . . . N max N &#8467; .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. EXPERIMENTS AND RESULTS</head><p>To evaluate this work, we answer the following research questions (RQs).</p><p>1) RQ1 (Effectiveness): How effective is CURE in the following: 1) ensuring optimal performance; 2) utilizing the budget; 3) respecting the safety constraints compared to the baselines? 2) RQ2 (Transferability): How does the effectiveness of CURE change when the severity of deployment changes varies [e.g., environment and platform change (PC)]? We answered these questions in a robot navigation task, using Husky and Turtlebot 3 platforms. In addition, to illustrate adaptability of CURE to different tasks, we also demonstrate RQ1 on a robot manipulation task, using the Franka Emika Panda platform in Gazebo.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Experimental Setup 1) Robot Navigation:</head><p>We simulate Husky and Turtlebot 3 in Gazebo to collect the observational data by measuring the performance metrics (e.g., planner failed) and performance objectives (e.g., energy consumption) under different configuration settings to train the causal model. Note that we use simulator data to evaluate the transferability of the causal model to physical robots, but CURE also works with data from physical robots. We deploy the robot in a controlled indoor environment and direct the robot to navigate autonomously to the target locations [see Fig. <ref type="figure">5(a)</ref>]. The robot was expected to encounter obstacles and narrow passageways, where the locations of the obstacles were unknown prior to deployment. The mission was considered successful if the robot reached each of the target locations. We fixed the goal tolerance parameters (xy_goal_tolerance=0.2, and yaw_goal_tolerance=0.1) to determine whether a target was reached. We defined the following properties for the ROS navigation stack <ref type="bibr">[44]</ref>.</p><p>1) Task completion rate: T cr = ( Tasks completed )/( Tasks). ) 2 , where N is the total number of time steps, a j (t) is the acceleration of joint j at time t, and &#8710;t is the time interval between consecutive time steps; and 2) Task execution time: The total execution time from picking up an object to placing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Evaluation</head><p>To learn a causal model from the source (a low-cost environment), we generated the values for the configurable parameters using random sampling and recorded the performance metrics (the intermediate layer of the causal model that maps the influence of the configuration options to the performance objective) for different values of the configurable parameters. We use a budget of 200 iterations for each method. When running each method for the same budget, we compare the Pareto front (PF) and Pareto HV. The PF is the set of objective vectors corresponding to all Pareto-optimal configurations in the configuration space X. The Pareto HV is commonly used to measure the quality of an estimated PF <ref type="bibr">[58]</ref>, <ref type="bibr">[59]</ref>. We define the PF and HV as follows:</p><p>where HV(x * , f ref ) resolves the size of the dominated space covered by a nondominated set x * , f ref refers to a user-defined reference point in the objective space, and &#923;(.) refers to the Lebesgue measure. In our experiments, we fixed the f ref points to the maximum observed values of each objective among all the methods.</p><p>To compare the efficiency of each method, we define an efficiency metric &#951; = ( n k=1 T k )/( n k=1 k), where T k is a binary variable taking values 0 or 1, denoting the success of a task during the kth iteration. We also compare the number of unsuccessful execution (e.g., when the robot failed to complete a task) and the number of constraint violations (e.g., when the robot completed the task but violated a constraint). We compared CURE with the following baselines.</p><p>1) MOBO: We implement MOBO using AX <ref type="bibr">[14]</ref>-an optimization framework that can optimize discrete and continuous configurations. 2) RidgeCV <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref>: A feature extraction method that selects the important features based on the highest absolute coefficient. We use RidgeCV to determine the important configuration options and generate a reduced search space, which consists of only the important configuration options. We then perform an optimization using MOBO on the reduced search space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. RQ1: Effectiveness</head><p>We evaluated the effectiveness of CURE in finding an optimal configuration compared to the baselines. We collect observational data by running a mission 1000 times from Husky in simulation under different configuration settings and recorded the performance objectives. In Fig. <ref type="figure">6</ref>, the histograms of performance objectives are depicted along the diagonal line, while scatter plots illustrating pairs of performance objectives are displayed outside the diagonal. The histograms of performance objectives, namely planner failed, recovery executed, obstacle distance, and energy, have shapes similar to one half of a Gaussian distribution. Scatter plots depicting different pairs of performance objectives, such as mission time, distance traveled, and energy, exhibit positive linear relationships. We selected energy and position error as the two performance objectives given the imperative to incorporate uncorrelated objectives in the multiobjective optimization framework, underscored by their lowest correlation coefficient, ensuring the diversity of the optimization criteria. We then learn a causal model using observational data. The search space was reduced according to the estimated causal effects on performance objectives and constraints by selecting top K Fig. 7. function.</p><p>configuration options (e.g., {Energy topK } &#8746; {PoseError topK } &#8746; {Safety topK }) and performed optimization using Algorithm 1.</p><p>1) Setting: For the Husky robot, we set the objective thresholds Energy Th = 40 Wh and PoseError Th = 0.18 m. We compute the HV using ( <ref type="formula">12</ref>) by setting the f ref points at 400 for energy and 35 for position error within the coordinate system. We incorporate the safety constraint h(x) by defining a test case, where the robot must maintain a minimum distance from obstacles to avoid collisions. We incorporate a user defined penalty function (see Fig. <ref type="figure">7</ref>) for each instance 0 &#8804; &#945;h(x) &#8804; 1 that penalizes T cr if h(x) is violated. In Fig. <ref type="figure">7</ref>, Th 1 is a soft constraint threshold and Th 2 is a hard constraint threshold. That is, we penalize T cr gradually if Th 1 &gt; h(x) &gt; Th 2 and give the maximum penalty if h(x) &lt; Th 2 to ensure safety. In our experiments, we set Th 1 = 0.25 and Th 2 = 0.18. We defined the safety constraint:</p><p>where &#952; is a user-defined threshold. In our experiments, we set &#952; = 0.8. For the manipulation task, we set the f ref points at 16 for task execution time and 113 for average trajectory jerk.</p><p>2) Results: CURE performed better than MOBO and RidgeCV-MOBO in finding a PF with a higher HV, as shown in Fig. <ref type="figure">8</ref>. In our experiments, we observed a comparable PF between CURE and MOBO [see Fig. <ref type="figure">8(a)</ref>], which can be attributed to MOBO's exploration of an extensive search space that includes all possible configuration options. On the contrary, CURE confines its exploration to a reduced search space, composing only configuration options with a greater causal effect on performance objectives. Although CURE and MOBO have a similar PF, CURE achieved a higher HV with a less amount of budget [see Fig. <ref type="figure">8(b)</ref>]. Fig. <ref type="figure">9</ref> illustrates the budget utilization of CURE and baseline methods. CURE demonstrated better budget utilization, as reflected in the increased density of purple-colored data points surrounding the PF and the achievement of a higher T cr in fewer iterations compared to the baseline methods. When comparing the penalty response given, we observed CURE selected configuration options that achieved the lower penalty, as shown in Fig. <ref type="figure">8(d)</ref>. Furthermore, CURE outperformed the baselines in terms of efficiency, achieving a 1.3&#215; improvement over MOBO and achieved this improvement 2&#215; faster compared to MOBO [as shown in Fig. <ref type="figure">8(c)</ref>]. RidgeCV-MOBO, however, underperformed, mainly because it was unable to identify the core configuration options influencing the performance objectives [see Figs. <ref type="figure">8(b)</ref>, <ref type="figure">(c</ref>) and 9(b)]. Moreover, CURE continuously outperformed the baselines in the manipulation task (see Fig. <ref type="figure">10</ref>). Therefore, CURE is more effective in finding optimal configurations compared to the baselines.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. RQ2: Transferability</head><p>Understanding CURE's sensitivity to different degrees of deployment changes, such as transfer of the causal model learned from a source platform (e.g., Gazebo simulation) to a target platform (e.g., real robot), is critical. Sensitivity analysis is especially crucial for such scenarios, considering that distribution shifts can occur during deployment changes. We answer RQ2 through an empirical study. We examine different levels of severity in deployment changes, where severity is determined by the number of changes involved. For example, a deployment change is considered more severe when both the robotic platform and the operating environment change, as opposed to changes limited solely to the environment.</p><p>1) Setting: We consider Husky and Turltebot 3 in simulation as the source and Turtlebot 3 physical robot as the target. We evaluate two deployment scenarios (see Fig. <ref type="figure">11</ref>): (i) STR: We trained the causal model using Algorithm 2 on observational data obtained by conducting a mission 1000 times using Turtlebot 3 in Gazebo environment [see Fig. <ref type="figure">5(b)</ref>]. The robot was expected to encounter dynamic obstacles [the trajectories of the obstacles are shown in Fig. <ref type="figure">5(b)</ref>]. The mission was considered successful if Turtlebot 3 reached each of the target locations. Subsequently, we used the causal model learned from simulation (environment A) to the Turtlebot 3 physical robot for performance optimization in two distinct environments (environment B and C). (ii) STR and PC: We consider the change of two categories, the STR and robotic PC. In particular, we applied the causal model used in RQ1 (learned using Husky in simulation) to the Turtlebot 3 physical robot in a real environment, as shown in Fig. <ref type="figure">11</ref>. We use the identical experimental setting for the Husky as described in Section V-C. For Turtlebot 3, we set the objective thresholds, Energy Th = 2 Wh and PoseError Th = 0.1 m. We compute the HV using <ref type="bibr">(12)</ref> by setting the f ref points at 19.98 for energy and 3 for position error within the coordinate system. We also set Th 1 = 0.25 and Th 2 = 0.15 in the penalty function (see Fig. <ref type="figure">11</ref>).</p><p>2) Results: As shown in Fig. <ref type="figure">12</ref>, CURE continuously outperforms the baselines in terms of HV [see Fig. <ref type="figure">12(a)</ref>], PF [see Fig. <ref type="figure">12(b)]</ref>, efficiency [see Fig. <ref type="figure">12(c)</ref>], penalty response [see Fig. <ref type="figure">12(d)]</ref>, and violations and failures [see Fig. <ref type="figure">12(e)</ref>] for each severity changes. Specifically, compared to MOBO, CURE finds a configuration with 1.5&#215; higher HV in STR setting (low severity), and 2&#215; higher HV when we change the platform in addition to sim-to-real (high severity). Moreover, CURE achieved efficiency gains of 2.2&#215;, and 4.6&#215; over MOBO with low and high severity of deployment changes, respectively. To provide insights into the factors contributing to CURE's enhanced performance, we compared constraint violation &#952; V and task failure T F , revealing reductions of 48% in &#952; V , while also demonstrating 28% lower T F under high severity changes compared to RidgeCV-MOBO. Therefore, we conclude that CURE performs better compared to the baseline methods as the deployment changes become more severe.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. PERFORMANCE AND SENSITIVITY ANALYSIS OF CURE</head><p>To explain CURE's advantages over other methods, we conducted a case study employing the same experimental setup described in Section V-C. We also demonstrate CURE's sensitivity by varying the top K values. Our key findings are discussed in the following.   </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. CURE's Efficient Budget Utilization Is Attributed to a Comprehensive Evaluation of the Core Configuration Options</head><p>For a more comprehensive understanding of the optimization process, we visually illustrate the response surfaces of three pairs of options, each with varying degrees of ACE in energy. Fig. <ref type="figure">13(b</ref>) contains options with high ACE values, while Fig. <ref type="figure">13(d</ref>) contains only options with lower ACE values. Options with ACE values close to the median are presented in Fig. <ref type="figure">13(c</ref>). We observe that response surfaces with higher ACE values are more complex compared to those with lower ACE values. Fig. <ref type="figure">13(b)-(d</ref>) also shows that CURE explored a range of configurations within the range by systematically varying configurations associated with higher ACE values than those associated with lower ones. In particular, because they have the lowest ACE, the pair of options involving trans_stopped_vel and max_scaling_factor was not considered by CURE in the optimization process, avoiding allocating the budget to less effective options. In contrast, both MOBO and RidgeCV-MOBO wasted the budget exploring less effective options [see Fig. <ref type="figure">13(d)]</ref>. Note that the option pair involving Min_vel_x and scalling_speed in Fig. <ref type="figure">13(b</ref>), which exhibits the highest ACE, was not identified by RidgeCV-MOBO. We also observe that due to having a larger search space (entire configuration space), MOBO struggled to explore regions effectively (exhibits a more denser data distribution) compared to CURE. In our previous study <ref type="bibr">[4]</ref>, we evaluated the accuracy of the key configuration options identified using causal inference through a comprehensive empirical study. Therefore, CURE strategically prioritize core configuration options with high ACE values, ensuring efficient budget utilization and demonstrating a better understanding of such complex behavior, while bypassing less effective options.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. CURE Leverages the Knowledge Derived From the Causal Model Learned on the Source Platform</head><p>In Fig. <ref type="figure">13</ref>(a), we compare the adjacency matrix between causal graphs learned from the source and target platforms, respectively. We compute the adjacency matrix A from a causal graph G = (V, E), where V is the set of vertices and E is the set of edges, as follows:</p><p>where (i, j) represents the edge from vertex i to vertex j.</p><p>In particular, both causal graphs share a significant overlap, providing a rationale for CURE's enhance performance when transferring the causal model learned from a source (e.g., Husky in simulation) to a target (e.g., Turtlebot 3 physical platform). Therefore, a causal model developed on one platform or environment can be leveraged as prior knowledge on another,  demonstrating the cross-platform applicability and usefulness of the acquired causal understanding.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. How Sensitive Is CURE When the Value of Top K Varies?</head><p>We investigate CURE's performance with different K values and how it affects the optimization process. We conduct a single-objective optimization on the Turtlebot 3 platform to demonstrate the sensitivity of CURE. As shown in Fig. <ref type="figure">14</ref>, there is a tradeoff between the top K values and the iterations required to achieve high-quality solutions. Smaller K values allow the optimization process to quickly find low energy values but may limit exploration, leading to early plateauing. Conversely, larger K values enable more extensive exploration, leading to more gradual improvements and potentially better solutions, but requiring more iterations. This is because, when the search space is smaller, the optimization process can exploit known good areas more effectively. In contrast, a larger search space requires more exploration, which extends the optimization process. One approach for selecting K is to define a threshold on the ACE values and select options that exceed this threshold. This can be done by using a threshold defined as {X | X ACE &gt; &#181; ACE + &#963; ACE }, where &#181; ACE is the mean and &#963; ACE is the standard deviation of the ACE values. Alternatively, a threshold based on the percentile of ACE values can be employed, such as selecting options with ACE values greater than the 75th percentile. We leave this selection up to the practitioner as user preferences may vary depending on the task, environment, and robotic system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VII. DISCUSSION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Usability of CURE</head><p>The design we have proposed is general and extendable to other robotic systems but would require some engineering effort. In particular, to apply CURE to a novel problem, the practitioner must identify the following: 1) configuration options; 2) performance metrics;</p><p>3) key performance indicators (KPIs).</p><p>Note that the abstraction level of the variables in the causal model depends on the practitioner and can go all the way down to the hardware level. In defining the metrics and KPIs, guidelines provided by the National Institute of Standards and Technology can be used <ref type="bibr">[60]</ref>, <ref type="bibr">[61]</ref>. These guidelines help classify variables as nonmanipulable in the three-layer causal model design <ref type="bibr">[4]</ref>, which simplifies the performance modeling process by allowing a clear distinction between configurable and performance variables. Moreover, we provide various performance metrics and performance objectives for mobile robot navigation and robot manipulation tasks in Section V.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Limitations 1) Causal Model Error:</head><p>The NP-hard complexity of causal discovery introduces a challenge <ref type="bibr">[62]</ref>, implying that the identified causal model may not always represent the ground-truth causal relationships among variables. It is crucial to recognize the potential for discrepancies between the causal structure discovered and the actual structures. However, such causal models can still be employed to achieve better performance compared to ML-based approaches in systems optimization <ref type="bibr">[63]</ref> and debugging tasks <ref type="bibr">[4]</ref>, because causal models avoid capturing spurious correlations <ref type="bibr">[45]</ref>.</p><p>2) Potential Biases When Transferring the Causal Model: Caution must be exercised when reusing the entire causal graph learned from the source platform, as differences between causal graphs in the two platforms [as indicated by the green and yellow squares in Fig. <ref type="figure">13</ref>(a), representing edges unique to the source and target, respectively] can induce bias. It is crucial to discover new causal connections [indicated by the yellow squares in Fig. <ref type="figure">13(a)</ref>] on the target platform based on observations. Given the small number of edges to be discovered, this task can easily be accomplished with a limited number of observational samples from the target platform.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Future Directions 1) Incorporating Causal Gaussian Process (CGP):</head><p>Using CGP in the optimization process has the potential to capture the behavior of the performance objective better compared to traditional GP <ref type="bibr">[64]</ref>. Unlike GP, CGP represents the mean using interventional estimates via do-calculus. This characteristic renders CGP particularly useful in scenarios with a limited amount of observational data or in areas where observational data is not available.</p><p>2) Updating the Causal Model at Run-Time: There is potential in employing an active learning mechanism that combines the source causal model G s with a new causal model G t learned from a small number of samples from the target platform. This approach is particularly promising considering the limitations discussed In Section VII-B.</p><p>3) Dynamically selecting top K at Run-Time: In our framework, K is a hyperparameter and its value is defined by the practitioner. Motivated by Fig. <ref type="figure">14</ref>, there is potential for implementing a dynamic selection approach. This approach would start with a lower K and progressively increase the K if the objective reaches a plateau.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VIII. CONCLUSION</head><p>In this article, we presented CURE, a multiobjective optimization method that identified optimal configurations for robotic systems. CURE converged faster than the baseline methods and demonstrated effective transferability from simulation to real robots, and even to new untrained platforms. CURE constructs a causal model based on observational data collected from a source environment, typically a low-cost setting such as the Gazebo simulator. We then estimate the causal effects of configuration options on performance objectives, reducing the search space by eliminating configuration options that have negligible causal effects. Finally, CURE employs traditional Bayesian optimization in the target environment, but confines it to the reduced search space, thus efficiently identifying the optimal configuration. Empirically, we have demonstrated that CURE not only finds the optimal configuration faster than the baseline methods, but the causal models learned in simulation accelerate optimization in real robots. Moreover, our evaluation shows the learned causal model is transferable across similar but different settings, encompassing different environments, mission/tasks, and new robotic systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX A ADDITIONAL DETAILS</head><p>A. Background and Definitions 1) Configuration Space X: Consider X i as the ith configuration option of a robot, which can be assigned a range of values (e.g., categorical, Boolean, and numerical). The configuration space X is a Cartesian product of all options and a configuration x &#8712; X in which all options are assigned specific values within the permitted range for each option. Formally, we define the following.</p><p>1) Configuration option: X 1 , X 2 , . . . , X d .</p><p>TABLE I CONFIGURATION OPTIONS IN MOVE BASE TABLE II CONFIGURATION OPTIONS IN COSTMAP COMMON</p><p>TABLE III CONFIGURATION OPTIONS IN COSTMAP COMMON INFLATION 2) Option value: x 1 , . . . , x d . 3) Configuration:</p><p>2) Partial Ancestral Graph: Each edge in the PAG denotes the ancestral connections between the vertices. A PAG is composed of the following types of edges.</p><p>1) A B: The vertex A causes B.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) A</head><p>B: There are unmeasured confounders between the vertices A and B.</p><p>3) A B: A causes B, or there are unmeasured confounders that cause both A and B. 4) A B: A causes B, or B causes A, or there are unmeasured confounders that cause both A and B. For a comprehensive theoretical foundation on these ideas, we refer the reader to <ref type="bibr">[47]</ref>, <ref type="bibr">[65]</ref>, <ref type="bibr">[66]</ref> 3) Causal Model G: A causal model is an ADMG <ref type="bibr">[67]</ref>, <ref type="bibr">[68]</ref>, which encodes performance variables, functional nodes (which defines functional dependencies between performance variables such as how variations in one or multiple variables determine variations in other variables), causal links that interconnect performance nodes with each other via functional nodes. An ADMG is defined as a finite collection of vertices, denoted by V , and directed edges E d (ordered pairs E d &#8834; V &#215; V , such that (v, v) &#8712; E d for any vertex v), together with bidirected edges,</p><p>TABLE IV CONFIGURATION OPTIONS IN DWAPLANNERROS</p><p>denoted by E b (unordered pairs of elements of V ). If (v, w) &#8712; E b then v &#8596; w, and if in addition (v, w) &#8712; E d then v &#8596; &#8594; w.</p><p>4) Causal Paths P X Y : We define P = v 0 , v 1 , . . . , v n so that the following conditions hold.</p><p>1) v o is the root cause of the functional fault and v n = y F . 2) &#8704; 0 &#8804; i &#8804; n, v i &#8712; V and &#8704; 0 &#8804; i &#8804; n, (v i , v i+1 ) &#8712; (E d &#8744; E b ). 3) &#8704; 0 &#8804; i &#8804; j &#8804; n, v i is a counterfactual cause of v j . 4) |P | is maximized. 5) Why Do Robotic Systems Fail?: A robotic system may fail to perform a specific task or deteriorate from the desired performance due to the following.</p><p>1) Hardware faults: Physical faults of the robot's equipment (e.g., faulty controller).</p><p>TABLE V CONFIGURATION OPTIONS IN MOVEIT CHMOP PLANNING TABLE VI ACE VALUES OF THE CONFIGURATION OPTIONS 2) Software faults: Faulty algorithms and/or faulty implementations of correct algorithms (e.g., incorrect cognitive behavior of the robot). 3) Interaction faults: Failures that result from uncertainties in their environments. The software stack is typically composed of multiple components (e.g., localization and navigation), each with a plethora of configuration options (different planner algorithms and/or parameters in the same planner algorithm). Similarly to software components, hardware components also have numerous configuration options. Incorrect configurations can cause a functional fault (the robot cannot perform a task successfully) or a nonfunctional fault (the robot may be able to finish tasks, but with undesired performance).</p><p>6) Nonfunctional Fault: The nonfunctional faults (interchangeably used as performance faults) refer to cases where the robot can perform the specified task but cannot meet the specified performance requirements of the task specification. For example, the robot reached the target location(s); however, it consumed more energy. We define the nonfunctional property N = {p 1 , . . . , p n }, where p 1 , . . . , p n represents different nonfunctional properties of the robotic system (e.g., energy, mission time) and p j is the value of jth N. The specified performance goal is denoted as p js . Performance failure occurs when p j |= p js . Extending the previous scenario, let E i be the energy consumption during task i and let T i be the mission completion time. The specified performance goals for the task are indicated as E s-&gt;t &lt;= en, T s-&gt;t &lt;= tt, respectively. A nonfunctional fault can be defined as N F = (E i &gt; en) &#8744; (T i &gt; tt).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Additional Details About Experimental Setup 1) Configuration Options in ROS Nav</head><p>Core and Moveit: Table <ref type="table">I</ref>-IV show the configuration space for each component in the ROS navigation stack and Table <ref type="table">V</ref> shows the configuration space in Moveit chomp planning used in our experiments. We fixed the goal tolerance parameters (xy_goal_tolerance and yaw_goal_tolerance) to determine if a target was reached. Complex interactions between options (intra or inter components) give rise to a combinatorially large configuration space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Additional Details for Evaluation 1) RQ1 Additional Details:</head><p>We also compared &#952; V and T F , revealing reductions of 8.5% in &#952; V , while also demonstrating lower 13.5% T F compared to MOBO as shown in Fig. <ref type="figure">16</ref>.   <ref type="table">VI</ref> shows the corresponding ACE values of the configuration options on the performance objectives and constraints. We set the top K = 5, represented by blue. Note that CURE reduces the search space from 34 configuration options to 10 by eliminating configuration options that do not affect the performance objective causally.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) ACE Values of Configuration Options: Table</head><p>3) Observational Data Additional Details: In Figs. 17 and 18, we visualize the interactions between core configuration options (pairwise) and their influence on the energy, position error, task success rate, and the safety constraint from the observational data. We observe that the surface response of configuration options with higher ACE values is complex than those with lower ACE values.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Authorized licensed use limited to: University of South Carolina. Downloaded on July 17,2025 at 15:32:43 UTC from IEEE Xplore. Restrictions apply.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>Instead of random samples, other partial designs (e.g., Latin Hypercube) could have been used, however, we experimentally found that random samples give rise to more reliable conditional independence tests in the structure learning algorithm.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>Here, the source environment could be a simulator like Gazebo or another robotic platform. The assumption is that the source is an environment in which we can intervene at a lower cost.Authorized licensed use limited to: University of South Carolina. Downloaded on July 17,2025 at 15:32:43 UTC from IEEE Xplore. Restrictions apply.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3"><p>e.g., there should not be any causal connections between configuration options and their values are determined independently.</p></note>
		</body>
		</text>
</TEI>
