<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>A Cost-Aware Multi-Agent System for Black-Box Design Space Exploration</title></titleStmt>
			<publicationStmt>
				<publisher>American Society of Mechanical Engineers</publisher>
				<date>08/21/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10538707</idno>
					<idno type="doi">10.1115/1.4065914</idno>
					<title level='j'>Journal of Mechanical Design</title>
<idno>1050-0472</idno>
<biblScope unit="volume">147</biblScope>
<biblScope unit="issue">1</biblScope>					

					<author>Siyu Chen</author><author>Alparslan Emrah Bayrak</author><author>Zhenghui Sha</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Effective coordination of design teams must account for the influence of costs incurred while searching for the best design solutions. This article introduces a cost-aware multi-agent system (MAS), a theoretical model to (1) explain how individuals in a team should search, assuming that they are all rational utility-maximizing decision-makers and (2) study the impact of cost on the search performance of both individual agents and the system. First, we develop a new multi-agent Bayesian optimization framework accounting for information exchange among agents to support their decisions on where to sample in search. Second, we employ a reinforcement learning approach based on the multi-agent deep deterministic policy gradient for training MAS to identify where agents cannot sample due to design constraints. Third, we propose a new cost-aware stopping criterion for each agent to determine when costs outweigh potential gains in search as a criterion to stop. Our results indicate that cost has a more significant impact on MAS communication in complex design problems than in simple ones. For example, when searching in complex design spaces, some agents could initially have low-performance gains, thus stopping prematurely due to negative payoffs, even if those agents could perform better in the later stage of the search. Therefore, global-local communication becomes more critical in such situations for the entire system to converge. The proposed model can serve as a benchmark for empirical studies to quantitatively gauge how humans would rationally make design decisions in a team.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>effective teamwork, in which members share information and resources, is crucial to optimize resource use and ensure that DSE meets budget constraints.</p><p>In this article, we quantitatively investigate the impact of cost on design decisions in teams for problems with an unknown objective and unknown constraints. We present a novel and unified framework that incorporates the three characteristics mathematically. Assuming idealized decision-making behaviors, we use this framework to define how design teams should work under different cost scenarios. The framework consists of a cost-aware multi-agent system (MAS) based on Bayesian optimization (BO) and reinforcement learning that models the sequential decision-making process of a rational design team exploring complex design spaces with unknown constraints. Figure <ref type="figure">1</ref> shows an overview of this MAS. In particular, our study answers the following research question: What impact would the cost-aware stopping criteria have on collaboration between multiple agents in design space exploration?</p><p>The core contributions of this article revolve around three decision-making principles in a cost-aware MAS strategy to find the global optimum.</p><p>(1) Where to sample: Each agent in the MAS decides where to sample based on multi-agent Bayesian optimization (MABO) <ref type="bibr">[12]</ref>, a strategy that identifies optimal solutions within complex design spaces by enabling information exchange between agents, which enhances the collective intelligence and effectiveness of the MAS. (2) Where not to sample: We address the issue of unknown constraints using multi-agent reinforcement learning (MARL) formulated on the multi-agent deep deterministic policy gradient (MADDPG) <ref type="bibr">[13]</ref>. This approach enables agents to recognize and adapt to constraints autonomously. 
If the points suggested by the multi-agent Bayesian optimization (MABO) fall within the recognized infeasible regions, agents sample near the boundary of those regions instead. (3) When to stop: We develop a cost-aware stopping criterion for each agent based on two elements: information gain (IG) and performance gain (PG). IG is the potential improvement that the agent could obtain in the future, while PG is the performance that the agent has already achieved. By adjusting the weights of IG and PG, the agent can balance what it has already gained (PG) against potential new improvements (IG). Each agent applies its own stopping criterion, which compares the gains it has achieved with the cost it has incurred.</p><p>The remainder of the article is structured as follows. Section 2 reviews existing computational approaches in DSE and positions our contribution. Section 3 presents the technical background on BO and multi-agent reinforcement learning as preliminaries for the proposed model. Next, Sec. 4 delves into the technical details of the main proposed framework, presenting the problem formulation and how the MAS navigates the design space. Section 5 presents the experimental settings and results under different design scenarios with varying complexity. Section 6 discusses the research findings and draws insights into the impact of cost on collaboration in design teams. Finally, Sec. 7 concludes the article with a summary of the findings and the limitations that motivate future work.</p><p>2 Bayesian Models of Design Decision-Making</p><p>DSE often presents itself as a black-box optimization challenge, particularly when the configuration space or the form of the function is unknown or not well defined. In such scenarios, the process of sampling subsequent design candidates for evaluation becomes a crucial decision-making task that is highly dependent on experience. 
BO employs knowledge-based reasoning to navigate these unknown design spaces, using insights obtained from historical data and experience <ref type="bibr">[14]</ref>, which offers a systematic approach to providing informed design recommendations that guide sampling decisions. In what follows, we discuss the three challenges that this study aims to address within the BO literature.</p><p>2.1 Team-Based Decisions. BO is a commonly employed methodology to model design search in black-box problems. Conventionally, BO treats each design experiment sequentially, with a new one proposed only after completing the previous one <ref type="bibr">[6]</ref>. However, this step-by-step approach to finding the optimum can be time-consuming in complex design spaces. Advancements in computational and communication technologies, as detailed by Kontar et al. <ref type="bibr">[15]</ref>, have made it possible for an MAS, acting as a design team, to handle complex DSE based on BO. In another study, Peralta et al. <ref type="bibr">[16]</ref> develop an MABO for multiobjective optimization, with the aim of enhancing the availability and affordability of water quality monitoring. Since solving such complex problems requires teams in practice, we introduced the MABO framework as a model of teamwork in our recent study <ref type="bibr">[12]</ref>. This MABO framework can significantly improve convergence through the global-local communication strategy, enabling faster identification of optimal solutions in complex design spaces.</p><fw type="footer">011703-2 / Vol. 147, JANUARY 2025, Transactions of the ASME</fw><p>2.2 Design Constraints. In many practical design problems, certain regions of the space are infeasible due to design constraints <ref type="bibr">[7]</ref>. If the constraints are predefined, they can be integrated into the acquisition function (AF) to be maximized in the BO process. However, cases where the constraints are not known in advance pose a greater challenge. 
In response to this challenge, many constraint-handling techniques have been developed. The augmented Lagrangian relaxation method is one such technique that integrates constraints into a Lagrangian function to be optimized, making them amenable to BO. Although originally developed for gradient-based optimization, it has been used in black-box optimization without requiring an explicit formulation of the constraints <ref type="bibr">[17]</ref>. However, this approach involves nonstationary surrogate models that lead to modeling complexities, as highlighted in Ref. <ref type="bibr">[18]</ref>. The second technique integrates AFs with a probability of feasibility, such as constrained expected improvement (cEI) or constraint-weighted expected improvement (EI) <ref type="bibr">[7,</ref><ref type="bibr">19]</ref>. However, this method requires the best current observation, which poses difficulties for noisy experiments. Letham et al. <ref type="bibr">[20]</ref> addressed this by extending cEI to noisy observations, although it remains sensitive to highly constrained problems. Bernardo et al. <ref type="bibr">[21]</ref> proposed an integrated expected conditional improvement AF, which defines an expected reduction in EI with limited satisfaction probability to allow infeasible regions to provide information. Although these methodologies reduce the likelihood of sampling within infeasible regions, a significant degree of uncertainty persists. In this study, we present an approach that integrates MARL into the proposed MABO framework to prevent the agents from sampling in infeasible regions. In this approach, direct sampling in infeasible areas is prohibited, but agents still obtain information by sampling the feasible points closest to those areas as their best attempt.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Cost of Design Search</head><p>Most BO processes run for a predefined number of iterations. This approach, while beneficial for comparative studies, does not realistically reflect team decision-making scenarios, where the number of design evaluations is often bounded by budget constraints <ref type="bibr">[22,</ref><ref type="bibr">23]</ref>. In traditional BO, the next sample is chosen solely based on the maximization of an AF, such as EI <ref type="bibr">[24,</ref><ref type="bibr">25]</ref>, without considering the budget. However, a more practical alternative, which aligns with the purpose of the present article, is to incorporate the cost of sampling into the decision-making process <ref type="bibr">[26]</ref>. By selecting the option with the highest net value, calculated as the maximum EI minus the cost of further evaluation, we address the practical challenge of limited resources. This strategy underscores the necessity of including sampling costs in the stopping criteria for each agent in BO. In this article, we develop a cost-aware stopping criterion aligned with the design team's decision-making process.</p><p>Here, the cost-aware stopping criterion is represented by a net value, or utility score, calculated as the cumulative gain minus the total cost incurred so far. Evaluating the cumulative gain is therefore essential for the decision-making process. In most BO studies, the decision on where to sample next is based on an AF, as mentioned previously. The rationale behind this approach is that the AF determines the potential net value or information gain that can be achieved in the subsequent step. However, focusing solely on the AF overlooks a crucial aspect of the decision-making process: the actual performance achieved by the agent in previous iterations. 
This performance is often quantified as "regret," a metric used to determine when to terminate the BO process. Lorenz et al. <ref type="bibr">[27]</ref> use the Euclidean distance (ED) as their regret measure. They recommend terminating the BO algorithm when the ED between the most recent observation and the forthcoming observation falls below a certain threshold. A notable method proposed by McLeod et al. <ref type="bibr">[28]</ref> splits regret into local and global components as the stopping criterion for BO. However, even when regret falls below a certain threshold, indicating that performance gains have been realized, the value of the AF, as a measure of potential gain, might still be large. Reference <ref type="bibr">[29]</ref> found that when human designers decide to stop evaluating, they tend to look a few steps further after achieving the best design to reduce uncertainty in the process. In this study, we propose a similar cost-aware stopping criterion based on a combination of both actual performance gains and information gains (value of the AF).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Preliminaries</head><p>3.1 Bayesian Optimization. Commonly used as a global optimization method, BO searches for the optimum of a black-box objective function f(x), where x &#8712; A and A &#8838; &#8477;^d is a d-dimensional design space. BO relies on two main components: (1) a statistical inference method, typically a Gaussian process (GP) regression, to model the unknown objective function value based on the collected data, and (2) an acquisition function (AF) to determine where to sample within the design space <ref type="bibr">[30]</ref>.</p></div>
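<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1">Gaussian Process.</head><p>A Gaussian process is a commonly used statistical inference model that defines a distribution over possible unknown functions <ref type="bibr">[31]</ref>. BO reasons about f(x) by choosing an appropriate Gaussian process prior:</p><p><formula notation="TeX" n="1">f(x_{1:k}) \sim \mathcal{N}\left(\mu_0(x_{1:k}),\ \Sigma_0(x_{1:k}, x_{1:k})\right)</formula></p><p>where the set of observations is D = {(x_1, f(x_1)), ..., (x_k, f(x_k))}, &#956;_0(x_{1:k}) = [&#956;_0(x_1), ..., &#956;_0(x_k)] is the mean vector obtained by evaluating a mean function &#956;_0 at each x_1, ..., x_k, and &#931;_0(x_{1:k}, x_{1:k}) is the covariance matrix obtained by evaluating a covariance function &#931;_0(x_i, x_j) between each pair of observations. Given the observation data D, the posterior probability distribution is defined as follows <ref type="bibr">[30]</ref>:</p><p><formula notation="TeX" n="2">f(x) \mid D \sim \mathcal{N}\left(\mu(x),\ \sigma^{2}(x)\right)</formula></p><p>where &#956;(x) denotes the posterior mean and &#963;^2(x) denotes the posterior variance.</p></div>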
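<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the posterior mean and variance concrete, the following minimal sketch computes them for a zero-mean GP prior. The squared-exponential kernel, its unit length scale, the small noise term, and all function names (rbf, solve, gp_posterior) are illustrative assumptions rather than settings taken from this article.</p>
<eg><![CDATA[
```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two points given as tuples (assumed kernel)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq / length_scale**2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_posterior(x_obs, y_obs, x_query, noise=1e-8):
    """Posterior mean and variance at x_query for a zero-mean GP prior."""
    k_oo = [[rbf(a, b) + (noise if i == j else 0.0)
             for j, b in enumerate(x_obs)] for i, a in enumerate(x_obs)]
    k_qo = [rbf(x_query, b) for b in x_obs]
    alpha = solve(k_oo, y_obs)   # K^{-1} y
    beta = solve(k_oo, k_qo)     # K^{-1} k(x_obs, x_query)
    mean = sum(k * a for k, a in zip(k_qo, alpha))
    var = rbf(x_query, x_query) - sum(k * b for k, b in zip(k_qo, beta))
    return mean, max(var, 0.0)


# Three observations of a toy objective (sin is only a stand-in black box).
x_obs = [(0.0,), (1.0,), (2.0,)]
y_obs = [math.sin(x[0]) for x in x_obs]
mean_near, var_near = gp_posterior(x_obs, y_obs, (1.0,))
mean_far, var_far = gp_posterior(x_obs, y_obs, (5.0,))
```
]]></eg>
<p>At an observed point the posterior variance collapses toward zero, whereas far from the data it returns to the prior variance. This is the uncertainty signal that the acquisition function uses for exploration.</p></div>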
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2">Acquisition Function.</head><p>An AF is used to identify the next point to sample within the design space. It relies on the probabilistic surrogate model described in Sec. 3.1.1, which approximates the objective function. The next observation is chosen by optimizing the AF <ref type="bibr">[30]</ref>. AFs are designed to balance the exploration of new areas in the design space and the exploitation of areas already known to provide high-value results.</p><p>Chaudhari et al. <ref type="bibr">[32]</ref> categorize AFs primarily into two groups: expected utility (EU) based and heuristic based. A widely used EU-based model is EI, while the lower confidence bound (LCB) <ref type="bibr">[33]</ref> is representative of a heuristic model. Reference <ref type="bibr">[32]</ref> shows that the heuristic model provides the best descriptive model of the sequential information acquisition process. Furthermore, adjusting the LCB parameter, which controls the balance between exploitation and exploration, makes it possible to create distinct sampling strategies for different agents. Thus, we adopt the LCB as a typical heuristic acquisition function in this study. Its formulation is given as follows:</p><p><formula notation="TeX" n="3">a_{\mathrm{LCB}}(x; \lambda) = \mu(x) - \lambda \sigma(x)</formula></p><p>where &#956; represents the mean function of the posterior probability distribution for f and &#963; is its standard deviation. In addition, &#955; &gt; 0 is a parameter that determines the trade-off between exploitation and exploration.</p><p>With the AF a_LCB presented in Eq. (<ref type="formula">3</ref>), the next sampling point x is chosen as the one that minimizes a_LCB. The main idea behind LCB is to balance evaluating points where the function value (mean) is expected to be low (exploitation) against evaluating points where the uncertainty about the function value is high (exploration). 
By subtracting a scaled version of the uncertainty from the mean, LCB encourages the algorithm to explore regions of the search space where uncertainty is high, while still exploiting regions with low expected function values.</p><p>3.2 Multi-Agent Reinforcement Learning. In this article, we apply MARL based on MADDPG <ref type="bibr">[13]</ref> to implement a constraint-handling mechanism that lets the MAS sample autonomously in a constrained design space. With this model, agents can find the shortest path to the targets suggested by the global evaluator and avoid sampling in infeasible regions.</p><p>The MADDPG framework for training the MAS is shown in Fig. <ref type="figure">2</ref>. The core idea of this framework is centralized training and decentralized execution. Within this framework, each agent in the MAS has its own actor-critic network, which is trained through the deep deterministic policy gradient algorithm <ref type="bibr">[34]</ref>. During training, a centralized critic network Q_i for each agent is updated using the shared observations o_i and actions a_i of all agents in the MAS. This network evaluates the efficacy of the actions a_i proposed by the actor network in order to optimize the policy &#960;_i using a policy gradient methodology. During execution, each agent relies solely on its actor network, which provides a deterministic policy &#960;_i. This policy guides the updates of the actions a_i based on the environmental observations o_i.</p><p>Assume that N agents explore the design space, with deterministic policies &#956; = [&#956;_&#952;1, &#956;_&#952;2, ..., &#956;_&#952;N] parameterized by &#952; = [&#952;_1, &#952;_2, ..., &#952;_N]. The policy gradient for agent i is then given as follows:</p><p>where x = [o_1, ..., o_N] denotes the state of the MAS, Q_i^&#956; is the value function, and a_i and o_i are the action and observation of agent i, respectively. 
D_r represents the experience replay buffer containing a series of tuples (x, x&#8242;, a_1, ..., a_N, r_1, ..., r_N), where x&#8242; is the new state. The critic network Q_i^&#956; is updated by the loss function as follows:</p><p>where y = R_i + &#947; Q_i^&#956;&#8242;(x&#8242;, a&#8242;_1, ..., a&#8242;_N) and R_i is the reward for agent i, designed according to the tasks of the MAS in a specific scenario. The actor network is updated using the sampled policy gradient as follows:</p><p>where S is the size of a random minibatch and n is its index.</p></div>
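<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the standard MADDPG expressions from Ref. <ref type="bibr">[13]</ref> can be written with the symbols defined in this section. The block below assumes that the policy gradient, the critic loss, and the minibatch actor update used here follow that reference without modification. The superscript n marks the n-th sample of the minibatch, and the target actions are taken from the target policies, both of which are notational assumptions.</p>
<eg><![CDATA[
```latex
\begin{align}
\nabla_{\theta_i} J(\mu_i) &=
  \mathbb{E}_{x, a \sim D_r}\!\left[
    \nabla_{\theta_i} \mu_i(a_i \mid o_i)\,
    \nabla_{a_i} Q_i^{\mu}(x, a_1, \ldots, a_N)\big|_{a_i = \mu_i(o_i)}
  \right] \\
L(\theta_i) &=
  \mathbb{E}_{x, a, r, x'}\!\left[
    \left( Q_i^{\mu}(x, a_1, \ldots, a_N) - y \right)^{2}
  \right],
  \qquad
  y = R_i + \gamma\, Q_i^{\mu'}(x', a'_1, \ldots, a'_N)\big|_{a'_j = \mu'_j(o_j)} \\
\nabla_{\theta_i} J &\approx
  \frac{1}{S} \sum_{n=1}^{S}
    \nabla_{\theta_i} \mu_i(o_i^{n})\,
    \nabla_{a_i} Q_i^{\mu}(x^{n}, a_1^{n}, \ldots, a_i, \ldots, a_N^{n})\big|_{a_i = \mu_i(o_i^{n})}
\end{align}
```
]]></eg>
<p>In this form the critic of agent i sees the joint state and all actions during training, while the actor term depends only on the local observation o_i. This is what permits decentralized execution.</p></div>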
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Methods</head><p>4.1 Problem Setup. In this section, we formulate the problem of finding the minimum of a black-box function in a d-dimensional design space A &#8838; &#8477;^d with N agents in a team (i.e., an MAS). The goal of agent i, where i &#8712; {1, 2, ..., N}, is to find the location of the global minimum x*:</p><p><formula notation="TeX" n="7">x^{*} = \arg\min_{x \in A} f(x)</formula></p><p>where f(&#183;) represents a black-box objective function and x = (x_1, x_2, ..., x_d) &#8712; &#8477;^d is an element of the set A.</p><p>We assume that the design space is partitioned into N local regions a priori, with each agent assigned to a unique region A_i within the design space A for the division of labor between agents in the MAS. In this search process, agents are not allowed to extend their search beyond their local regions. Instead, they communicate by sharing sampled local points with a global evaluator. There exist infeasible regions (constraints) in the design space, denoted as D_i &#8838; A_i. Figure <ref type="figure">3</ref> shows an example of this design space exploration with unknown constraints based on an MAS with three agents. In this example, the objective function is shown in Fig. <ref type="figure">3(a)</ref>. The design space has been arbitrarily divided into three regions, as shown in Fig. <ref type="figure">3(b)</ref>. In each region, only one agent is responsible for searching the local space to find the global minimum (star in Area 3). Agents are not allowed to sample points within the infeasible regions D_i (circles in Areas 1, 2, and 3) in each local space A_i. It is worth noting that the impact of design space partitioning on MAS performance is not within the scope of this study. We adopt a particular strategy that divides the design space into N regions in every scenario to illustrate our approach and study the impact of cost on coordination between design agents.</p><p>4.2 Three Decisions for Design Space Exploration. 
The decision-making process of each agent in the proposed MAS involves three key decisions: where to sample, where not to sample, and when to stop. The MAS is first trained via MARL so that it can identify infeasible regions, which addresses the decision of where not to sample. After training, during implementation, MABO determines where to sample next, and the proposed cost-aware stopping criterion guides each agent in deciding when to stop.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1">Decision 1: Where to Sample.</head><p>In this study, we apply MABO to model each agent's decision-making process about where to sample within a team, aiming to optimize an unknown objective function by assigning local regions to each agent <ref type="bibr">[12]</ref>. By enabling collaboration between agents, the system can search the complex design space more efficiently.</p><p>We use the MABO framework shown in Fig. <ref type="figure">4</ref> from our previous study <ref type="bibr">[12]</ref>. First, the search space is divided into local regions. The number of these regions corresponds to the number of agents participating in exploration. Each region has an agent performing a local search on a specific segment of the objective function. To foster collaboration and information exchange among agents, a global-local communication strategy is enabled. This mechanism allows each local agent to share its sampling points with a global evaluator. The global evaluator consolidates data across all local searches, ensuring that the search process benefits from the information of every agent. When determining the next sample point, each agent works within its local region. Rather than having access to the acquisition function across the entire design space, each agent is restricted to its evaluation within the agent's own local region. Consequently, each agent selects the design with the best local acquisition value, which for the LCB in Eq. (<ref type="formula">3</ref>) is the local minimum.</p></div>
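<div xmlns="http://www.tei-c.org/ns/1.0"><p>The local selection step can be sketched as follows, assuming a shared posterior supplied by the global evaluator and a finite candidate set. The names lcb and select_local_samples, the candidate grid, the toy posterior, and the value of lam are hypothetical and serve only to illustrate the idea. Each region is assumed to contain at least one candidate.</p>
<eg><![CDATA[
```python
def lcb(mean, std, lam):
    """Lower confidence bound: posterior mean minus lam times the standard deviation."""
    return mean - lam * std


def select_local_samples(candidates, posterior, regions, lam=2.0):
    """For each agent, pick the candidate in its own region with the lowest LCB.

    candidates: list of design points (tuples)
    posterior:  function mapping a point to (mean, std) from the shared global model
    regions:    one membership predicate per agent
    """
    picks = []
    for in_region in regions:
        local = [x for x in candidates if in_region(x)]
        picks.append(min(local, key=lambda x: lcb(*posterior(x), lam)))
    return picks


# Toy 1D example: two agents split the interval [0, 1] at 0.5.
candidates = [(i / 10,) for i in range(11)]
toy_posterior = lambda x: ((x[0] - 0.7) ** 2, 0.1)  # assumed mean and constant std
regions = [lambda x: x[0] < 0.5, lambda x: x[0] >= 0.5]
picks = select_local_samples(candidates, toy_posterior, regions)
```
]]></eg>
<p>In this toy case the second agent proposes the point at the minimum of the surrogate, while the first agent proposes the best point available in its own region, even though that point is not globally best.</p></div>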
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2">Decision 2: Where Not to Sample.</head><p>Before the MAS is deployed for DSE, it is trained using the MARL framework introduced in Sec. 3.2. After training, the MARL policy performs path planning: each agent moves from one assigned sampling point to the next, as directed by the global evaluator, while avoiding infeasible areas during the sampling process.</p><p>MAS agents in the DSE need to learn how to take action a_i in a state x_i based on the system reward R they receive <ref type="bibr">[35]</ref>. In this context, the state x_i is defined as the location of the agent in the design space, that is, the value of the design variables. The action a_i indicates how much the design variable x_i needs to be increased or decreased.</p><p>As introduced in Sec. 3.2, MADDPG combines centralized training with decentralized execution. The centralized training process generates an optimal policy &#960;_i for each agent. Every agent can perform actions a_i to reach a specific target x*_i in its local region. In the training process, we define x*_i as a random location in the design space that excludes unknown infeasible regions. In the implementation of the MAS after training, the target x*_i is set to the next sampling location suggested by the global evaluator. After taking action a_i &#8712; &#8477;^d, the agent updates its current state x_i by this increment.</p><p>During the decentralized execution in each episode, agents do not necessarily need to communicate or synchronize their actions with each other. Each agent independently decides its action based on its current state and the policy learned through the MADDPG algorithm.</p><p>This training process is described in Algorithm 1. 
During this process, the MAS undertakes the following two tasks:</p><p>(1) Target tracking, enabling agents to execute actions within the design space; (2) Infeasible region detection, determining whether the targets are located in infeasible regions. To complete the two tasks, a well-structured system reward R is required, which measures the quality of the action a_i for each agent. Therefore, we develop a reward mechanism specifically for target tracking and infeasible region detection as follows:</p><p>(1) The target tracking reward, R_i^o, is calculated according to the distance between the agent and the target x*_i, where x_i is the location of agent i;</p><p>(2) The reward for infeasible region detection, R_i^c, is calculated from N_c, the number of collisions. The reward for detecting unknown infeasible regions is not structured as the distance between agents and these areas because the agents need to sample points close to infeasible regions, which could be the potential global optimum.</p><p>Note that this model is configured in a cooperative setting. This means that each agent within the system is not only concerned with maximizing its own reward R_i but also contributes towards maximizing the total system reward R. The total system reward R is defined as the aggregate of the individual rewards R_i obtained by each agent in the system.</p><p>Algorithm 1 MARL training process via MADDPG<lb/>Initialize the locations of constraints D_i &#8704; i &#8712; {1, 2, ..., N}, MAXepisode, and MAXstep<lb/>for k = 1 to MAXepisode do<lb/>Initialize R_i^o &#8592; 0, R_i^c &#8592; 0<lb/>for agent i = 1 to N do<lb/>Receive initial state x_i<lb/>Randomly generate sampling point x*_i<lb/>for t = 1 to MAXstep do<lb/>Select action a_i based on policy &#960;_i<lb/>Update new state</p><p>Return system reward R<lb/>Store experience in replay buffer<lb/>Sample a random minibatch of S in replay buffer<lb/>Update the critic network by minimizing the loss in Eq. (<ref type="formula">5</ref>)<lb/>Update the actor network by policy gradient in Eq. (<ref type="formula">6</ref>)<lb/>end for<lb/>end for</p></div>
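<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two replay-buffer steps of Algorithm 1, storing an experience and drawing a random minibatch of size S, can be illustrated with the minimal sketch below. The class name ReplayBuffer, the tuple layout, and the fixed capacity are assumptions made for illustration and do not describe the implementation used in the experiments.</p>
<eg><![CDATA[
```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of joint transitions (state, actions, rewards, next_state)."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)  # oldest transitions are dropped first
        self._rng = random.Random(seed)

    def store(self, state, actions, rewards, next_state):
        """Append one joint transition covering all agents."""
        self._items.append((state, actions, rewards, next_state))

    def sample(self, batch_size):
        """Uniformly sample a minibatch without replacement."""
        size = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), size)

    def __len__(self):
        return len(self._items)


# Toy usage: five transitions pushed into a buffer that keeps only the last three.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.store(state=step, actions=[0.1, -0.1], rewards=[-1.0, -1.0], next_state=step + 1)
batch = buffer.sample(2)
```
]]></eg>
<p>Sampling uniformly from stored transitions decorrelates consecutive steps of an episode, which is the usual motivation for experience replay in off-policy methods such as MADDPG.</p></div>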
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.3">Decision 3: When to Stop.</head><p>To include cost considerations in the design process, we establish a stopping criterion for each agent based on a cost-aware utility function that we propose, presented as Eq. (<ref type="formula">8</ref>). Each agent stops the search if the utility score U is less than or equal to 0.</p><p><formula notation="TeX" n="8">U = G - cK</formula></p><p>In this equation, K is the total number of iterations taken, while c is the cost associated with each sample (to be defined). G refers to the cumulative gain achieved by the agent, defined as G = &#969;_PG PG + &#969;_IG IG. Here, IG is the information gain, calculated as <formula notation="TeX">IG = \sum_{k=1}^{K} a_{\mathrm{LCB},k}</formula>, where a_LCB,k represents the normalized value of the acquisition function at the kth step. The performance gain, PG, is calculated from f*_k, the normalized best value at step k. Intuitively, PG represents the cumulative performance each agent has already achieved, and IG indicates the potential gain it can achieve in the next sampling iteration. It is important to note that we use normalized values for both PG and IG to ensure a fair trade-off between performance gain and information gain. The tunable parameters &#969;_PG and &#969;_IG help to strike a balance between these two elements.</p><p>A key parameter in this article is the value of c, which has a significant impact on the agents' decision about when to stop. Depending on c, agents within the MAS exhibit different sampling behaviors, which in turn influence overall system performance. We therefore experiment with both shared and agent-specific cost configurations. Furthermore, variations in the weights of PG and IG, represented by &#969;_PG and &#969;_IG, affect the cumulative gains and the stopping criterion. 
Exploring cost settings under different &#969;_PG and &#969;_IG configurations therefore plays an essential role in observing when agents in the MAS decide to stop.</p><p>Three cost-setting strategies are designed in this cost-aware approach:</p><p>&#8226; Strategy A: Different costs for each value of &#969;_PG and &#969;_IG, different costs for each agent; &#8226; Strategy B: Same cost for each agent, different costs for each value of &#969;_PG and &#969;_IG; &#8226; Strategy C: Different costs for each agent, same costs for each value of &#969;_PG and &#969;_IG.</p><p>By adjusting the costs, we observe variations in the sampling behavior of individual agents and the interactions between agents within the MAS.</p></div>
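<div xmlns="http://www.tei-c.org/ns/1.0"><p>The stopping rule can be illustrated with a short sketch. It assumes that the utility is the weighted cumulative gain minus the per-sample cost times the number of samples, as described in Sec. 2.3, and it takes already normalized PG and IG values as inputs. The function names, the default weights, and the numbers in the example are hypothetical.</p>
<eg><![CDATA[
```python
def utility(pg, ig, cost, n_samples, w_pg=0.5, w_ig=0.5):
    """Utility U = G - c * K, with cumulative gain G = w_pg * PG + w_ig * IG (assumed form)."""
    gain = w_pg * pg + w_ig * ig
    return gain - cost * n_samples


def should_stop(pg, ig, cost, n_samples, w_pg=0.5, w_ig=0.5):
    """An agent stops once its utility is less than or equal to zero."""
    return utility(pg, ig, cost, n_samples, w_pg, w_ig) <= 0.0


# Same gains after five samples, evaluated under a low and a high per-sample cost.
u_low_cost = utility(pg=1.2, ig=0.8, cost=0.10, n_samples=5)
u_high_cost = utility(pg=1.2, ig=0.8, cost=0.25, n_samples=5)
stop_low = should_stop(pg=1.2, ig=0.8, cost=0.10, n_samples=5)
stop_high = should_stop(pg=1.2, ig=0.8, cost=0.25, n_samples=5)
```
]]></eg>
<p>With identical gains, the agent facing the higher per-sample cost stops after five samples while the other continues. This is the mechanism through which cost settings shape the sampling behavior of individual agents.</p></div>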
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Cost-Aware Multi-Agent System for Design Space Exploration</head><p>In this section, we build the cost-aware MAS for design space exploration and integrate the three decisions into a single procedure. The structure of the proposed design space exploration process with an MAS is illustrated in Fig. <ref type="figure">5</ref>. Algorithm 2 presents our cost-aware search strategy.</p><p>We begin by training the MAS to identify unknown constraints using the MADDPG framework, as outlined in Algorithm 1.</p><p>After training, the agents can sample points within the design space. Before the MAS starts sampling, the design space A is divided into local areas A_i, each assigned to a specific agent i. Within its local region, each agent receives a prior Gaussian process given the initial observations D.</p><p>Our algorithm lets the MAS agents proceed through a series of K sampling iterations. In each iteration, a global evaluator calculates the posterior mean and variance using all points sampled by the local agents. The acquisition function is then calculated from the posterior mean and variance to guide the next sampling decisions of the local agents. Each local agent only has access to the acquisition function evaluated in its local area and selects the design with the best acquisition value there (the minimum of the LCB in Eq. (<ref type="formula">3</ref>)).</p><p>Fig. 5 Process flow for the cost-aware search</p><p>Next, we employ a cost-aware stopping criterion, which balances PG and IG, to quantify the utility scores of the points the agents sampled and decide when to stop. If the value of the utility function in Eq. (<ref type="formula">8</ref>) exceeds zero, the relevant x*_i suggested by the global evaluator is communicated to the trained agent. 
This allows agents to take action within the design space, meaning that they can transition from the current points to the next suggested points and sample. In cases where the suggested points fall within infeasible regions, agents are restricted from sampling within those areas. However, they can still observe the function f at the destination x i around the infeasible regions and send this information back to the global evaluator. Following the agents' actions, all associated data, including points and their associated function values, are collected into the global evaluator. If the value of the cost-aware utility function falls below or is equal to zero, which means that further sampling will not yield beneficial outcomes, the sampling process is terminated.</p><p>For a practical demonstration of Algorithm 2, refer to Fig. <ref type="figure">3</ref>(a) that showcases the objective function, and the contour plot shown in Fig. <ref type="figure">3(b)</ref>. Figure <ref type="figure">6</ref> shows the corresponding sampling process under these constraints, involving three agents that are equipped with global information. These agents perform tasks in unique local regions, and dashed lines represent the paths of their operation. We present the sampling point in each iteration as a filled point, each of which is distinctly labeled by a numerical index. The interaction of the agents with their respective environments is critical to address the unknown constraints. For instance, when the agents get the points suggested by the global evaluator (represented as hollow points), like the point with index 5 in area 2, the points with index 0 and 11 in area 3, these points may sometimes fall within certain constraints, as shown by the gray circle. 
Accordingly, the agents have the capacity to sample points close to these constraints within feasible regions, like the filled points with index 5 in area 2, with index 0 and 11 in area 3, although the suggested points are located in the constraints. As a result, MAS has the ability to interact with the environment, allowing agents to address constraints without wasteful sampling steps and to send information about locations close to constraints to global evaluators. </p></div>
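<div xmlns="http://www.tei-c.org/ns/1.0"><p>The constraint handling shown in Fig. 6, where an agent samples the feasible point closest to a suggested point that lies inside an infeasible circle, can be illustrated with a purely geometric sketch. This is not the MADDPG policy trained in Algorithm 1: a hand-written radial projection stands in for the learned behavior. The function name nearest_feasible, the margin parameter, and the description of an infeasible region by a center and a radius are assumptions made for illustration only.</p>
<p><![CDATA[
```python
import math


def nearest_feasible(point, center, radius, margin=1e-6):
    """Return `point` if it lies outside the circular infeasible region,
    otherwise the closest point just outside the circle boundary.

    Hypothetical helper: it replaces the learned MADDPG behavior with a
    simple radial projection.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist > radius:
        # Already feasible: sample exactly where the evaluator suggested.
        return point
    if dist == 0.0:
        # Suggested point sits on the circle center, so every direction is
        # equally close; pick the +x direction by convention.
        dx, dy, dist = 1.0, 0.0, 1.0
    # Push the point radially outward to slightly beyond the boundary.
    scale = (radius + margin) / dist
    return (center[0] + dx * scale, center[1] + dy * scale)
```
]]></p>
<p>For example, with a unit circle at the origin, a suggested point (0.5, 0) is moved along the x axis to just outside the boundary, while a suggested point (2, 0) is returned unchanged.</p></div>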
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Experimental Setups</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.1">MARL Settings.</head><p>We use multi-agent particle environments to perform the experiments based on the environment presented by Ref. <ref type="bibr">[13]</ref>. This environment is set in a two-dimensional space and is equipped with three agents and three infeasible regions, serving as a representation of a 2D unknown design space. During the training process, as detailed in Algorithm 1, we set specific parameters, MAXepisode = 100,000 and MAXstep = 20, in each episode.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.2">MABO Settings.</head><p>Three benchmark functions of increasing complexity were evaluated: (1) the Cosines function, (2) the Michalewicz function, and (3) the Eggholder function, as illustrated in Table <ref type="table">1</ref> and Fig. <ref type="figure">7</ref>, to understand the impact of the cost-aware stopping criterion on the collaborative search behaviors between agents. Specifically, the Eggholder function introduces greater complexity than the Cosines and Michalewicz functions because of its numerous local minima and maxima within the search space. These functions were selected on the basis of their distinct mathematical properties, which present various challenges to optimization algorithms. They serve as benchmarks in the BO literature and are widely recognized in the global optimization context <ref type="bibr">[36]</ref><ref type="bibr">[37]</ref><ref type="bibr">[38]</ref>.</p><p>Fig. 6 An example of a sampling process involving three agents that possess global information, with each sampling point (filled points) labeled by its index. The dashed lines are the sampling trajectories of the three agents. When the suggested points (hollow points, e.g., index 5 in area 2, index 0 and 11 in area 3) are located in the constraints (the gray circle areas), the agents can sample the corresponding points closest to the constraints in the feasible regions (filled points, index 5 in area 2, index 0 and 11 in area 3).</p><p>Table 1 Three black-box functions (columns: Name, Formula, Global minimum, Global domain A)</p><p>For each scenario, we applied LCB as the acquisition function, represented by Eq. (<ref type="formula">3</ref>), with &#955; fixed at 2.5. The initial number of samples for the Gaussian prior distribution,</p></div>
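<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the acquisition step with &#955; = 2.5, the sketch below assumes the standard minimization form of the lower confidence bound, LCB(x) = mean(x) - &#955; * std(x), in which the most promising candidate has the lowest value. Eq. (3) is not reproduced here, so if it is defined with the opposite sign, the selection becomes an argmax instead. The function names and the candidate list are hypothetical; in the proposed method the posterior mean and standard deviation would come from the global evaluator's Gaussian process.</p>
<p><![CDATA[
```python
def lcb(mean, std, lam=2.5):
    """Lower confidence bound for a minimization problem (assumed form)."""
    return mean - lam * std


def pick_local_candidate(candidates, means, stds, lam=2.5):
    """Return the candidate in one agent's local region with the lowest LCB.

    `candidates`, `means`, and `stds` are parallel lists: the design points
    in the local region and the posterior mean / standard deviation at each.
    """
    scores = [lcb(m, s, lam) for m, s in zip(means, stds)]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```
]]></p>
<p>With three candidates whose posterior means are 1.0, 0.5, and 0.8 and whose standard deviations are 0.1, 0.1, and 0.4, the third candidate is selected: its mean is not the lowest, but its larger uncertainty gives it the lowest bound, which is how &#955; trades exploitation against exploration.</p></div>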
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Experimental Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1">Results Without a Stopping Criterion.</head><p>By executing the proposed method for a predefined number of iterations without specifying any cost information for the agents, we aim to understand how many steps are needed to achieve convergence and, from that, to identify an appropriate cost c for objective functions of varying complexity. Note that in a real-world design problem, the cost of design iterations for a designer is set externally by the problem context. However, in our theoretical (context-free) study, we need to choose a meaningful value for the cost to study its impact on team collaboration. Instead of choosing arbitrary values, we set the cost for sampling a new design based on a baseline study in which agents work together without any consideration of cost. On the basis of the convergence results of this baseline study, we choose different values for c following the three strategies described earlier. In this context, the convergence of the MAS is characterized by the ability of the agents to achieve the global optimum. When the global optimum falls within the infeasible region, convergence is instead defined by the MAS's inability to achieve further improvement. The experimental results with fixed iterations for these objective functions are shown in Figs. <ref type="figure">8</ref>-<ref type="figure">10</ref>.</p><p>(1) Cosines function. The local region assigned to each agent and the infeasible region defined in each local region are shown in Table <ref type="table">2</ref>. The sampling process and the search trajectory (indicated by the number index) of each agent in its own region are displayed in Fig. <ref type="figure">8</ref>(a). The best f(x) (i.e., f^* in Eq. (<ref type="formula">3</ref>)) observed so far in each step is shown in Fig. 
<ref type="figure">8</ref>.</p><p>The trends of the cumulative gains G defined in Eq. (<ref type="formula">8</ref>) during the search are shown in Figs. <ref type="figure">8</ref>(c), <ref type="figure">9</ref>(c), and <ref type="figure">10</ref>(c). Specifically, the cosines function shows a linear increase in cumulative gain across iterations (see Fig. <ref type="figure">8</ref>(c)). The Michalewicz function follows a concave trend (see Fig. <ref type="figure">9</ref>(c)), meaning that the cumulative gain keeps increasing but at a decreasing rate in each iteration. The Eggholder function exhibits a stepwise behavior (see Fig. <ref type="figure">10</ref>(c)), with significant gains at certain iterations and minimal gains at others. These varying cumulative gain patterns can affect both the decision of individual agents on when to stop and the convergence of the MAS. The distinct trend of the cumulative gain in each of the three tested objective functions is another reason we chose them for this study, besides their varying complexity.</p></div>
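<div xmlns="http://www.tei-c.org/ns/1.0"><p>To see why the shape of the cumulative gain matters for stopping, the sketch below assumes that the utility in Eq. (8) has the form U_k = G_k - k * c (consistent with the reformulation discussed in Sec. 6) and that an agent stops at the first iteration whose utility is zero or negative. The three gain curves are synthetic stand-ins for the linear, concave, and stepwise trends described above, not the data behind Figs. 8(c), 9(c), and 10(c), and the function name first_stop is hypothetical.</p>
<p><![CDATA[
```python
import math


def first_stop(cumulative_gain, cost):
    """Return the 1-based iteration at which U_k = G_k - k * cost first
    becomes <= 0, or None if the utility stays positive throughout."""
    for k, gain in enumerate(cumulative_gain, start=1):
        if gain - k * cost <= 0:
            return k
    return None


# Synthetic cumulative-gain curves over 20 iterations (illustrative only).
linear = [0.6 * k for k in range(1, 21)]               # steady gains
concave = [2.0 * math.sqrt(k) for k in range(1, 21)]   # diminishing gains
stepwise = [0.0] * 3 + [4.0] * 7 + [12.0] * 10         # late, sudden gains

cost = 0.5
stops = {
    "linear": first_stop(linear, cost),
    "concave": first_stop(concave, cost),
    "stepwise": first_stop(stepwise, cost),
}
```
]]></p>
<p>Under these assumptions the linear agent never stops within the horizon, the concave agent stops at iteration 16 once diminishing gains no longer cover the accumulated cost, and the stepwise agent stops at iteration 1 even though its cumulative gain at iteration 20 (12.0) would exceed its accumulated cost (10.0).</p></div>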
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.2">Results With Stopping Criterion.</head><p>We evaluated the three cost-aware strategies introduced in Sec. 4.2.3 on three objective functions of varying complexity. Table <ref type="table">3</ref> lists the strategies along with the corresponding result tables for the three benchmark functions. Each table details the parameter settings and the performance of the MAS tested on the cosines (Tables <ref type="table">4</ref>-<ref type="table">6</ref>), Michalewicz (Tables <ref type="table">7</ref>-<ref type="table">9</ref>), and Eggholder functions (Tables <ref type="table">10</ref>-<ref type="table">12</ref>). Columns &#969;_PG and &#969;_IG give the values of the complementary weights of PG and IG in Eq. (<ref type="formula">8</ref>): as the weight of PG decreases from 1 to 0 in steps of 0.1, the weight of IG increases by the same increment. Columns Cost 1, Cost 2, and Cost 3 give the cost c in Eq. (<ref type="formula">8</ref>) for each of the three agents in these settings. The number of iterations taken by each agent during the search process is shown in the Agent 1 Iter, Agent 2 Iter, and Agent 3 Iter columns. The column Global optimum? indicates whether the global optimum is reached and, if so, reports the iteration at which it was achieved. In Figs. 11-13, we show a few examples of the convergence speed for the cosines, Michalewicz, and Eggholder functions with a particular stopping criterion configuration, &#969;_PG = 0.5 and &#969;_IG = 0.5.</p><p>(1) Cosines function. Strategy A. Agents have different costs, and the costs are different in each setting of the parameters, &#969;_PG and &#969;_IG. We determine appropriate cost values (i.e., neither too high nor too low) using the average gain observed when an agent converges in the scenario without stopping criteria, as detailed in Sec. 5.2.1. 
For example, in the results of the cosines function (Fig. <ref type="figure">8</ref>), agents identify their local optima at the 3rd, 13th, and 9th steps, respectively. At these steps, the corresponding cumulative gains G are 0.016, 7.35, and 0.461 (Fig. <ref type="figure">8</ref>(c)) under the conditions of &#969;_PG = 0 and &#969;_IG = 1. The average gains derived from these are 0.004, 0.525, and 0.461. It should be noted that varying the values of &#969;_PG and &#969;_IG can lead to different cumulative gains and, consequently, different cost settings, as presented in Table <ref type="table">4</ref>.</p><p>We summarize three key observations from Table <ref type="table">4</ref>. First, we identified an effective approach to setting cost values. By adopting the first cost-aware strategy, we achieved consistent convergence, and the global optimum was reached at around 11 steps in all parameter settings of &#969;_PG and &#969;_IG. Second, we observed that the agents did not stop sampling at the steps expected from the scenario without a stopping criterion, although we set the cost as the average gain when the agents converged. For example, agents did not stop at the 3rd, 13th, and 9th iterations. This behavior is linked to agent 1 stopping early in the sampling process, which left agent 2 and agent 3 without essential information from agent 1's local region for the search of the global minimum. As a result, these agents extended their sampling, as the IG from their own local regions increased and the costs were relatively low. Third, because the cost is set according to the average gain at the agents' convergence and this cost increases with the growing weight of IG, it is clear that IG takes precedence over PG.</p><p>Strategy B. Agents have the same cost, but the costs are different in each parameter setting, &#969;_PG and &#969;_IG. In Table <ref type="table">4</ref>, we assign varying costs to each agent. 
Taking the average of these costs, we set the same cost for all three agents, as shown in Table <ref type="table">5</ref>. In the first three settings, agent 2 and agent 3 registered small PG; given the higher cost, their utility scores became negative and their sampling was limited. Consequently, they ended the search in their local regions, preventing the MAS from converging to the global optimum. As the weight of IG increases, from the fourth setting onward, agent 2 and agent 3 achieve a higher cumulative gain and therefore a positive utility score, so the search continues until the global optimum is found.</p><p>Strategy C. Agents have different costs, but the costs are the same in each setting of the parameters, &#969;_PG and &#969;_IG. In Table <ref type="table">6</ref>, we applied the same cost for each setting of &#969;_PG and &#969;_IG but kept different costs for each agent. These cost values were derived from the averages in Table <ref type="table">4</ref>. For instance, we used the average of cost 1 across all settings of &#969;_PG and &#969;_IG from Table <ref type="table">4</ref> as the value of cost 1 in Table <ref type="table">6</ref>. This gives costs of 0.159, 0.288, and 0.288 for the three agents, respectively. With this cost-setting strategy, the system did not achieve convergence in the first five settings. However, it succeeded in the next five. The reason is the same as that observed in Table <ref type="table">5</ref>: as the weight of IG increased, agent 2 and agent 3 obtained higher gains in their local regions. This led to positive utility scores, so the agents did not stop sampling before reaching the global optimum.</p><p>(2) Michalewicz function. Strategy A. Agents have different costs, and the costs are different in each setting of the parameters, &#969;_PG and &#969;_IG. 
We followed an approach similar to that used for the cosines function to determine the cost, using the average gain when each agent reaches its local optimum, which occurs at the 6th, 18th, and 25th steps, as shown in Fig. <ref type="figure">9</ref>. With this cost-setting strategy, the MAS consistently converged at the 13th step for all values of &#969;_PG and &#969;_IG, as shown in Table <ref type="table">7</ref>. In this design space, we noticed sampling patterns and collaboration between agents similar to those in the cosines function. A significant observation was that some agents stopped sampling earlier than anticipated. For example, agent 1 stopped sampling around the 8th step. However, this early stopping did not hinder the system's performance. Interestingly, the MAS converged faster, reaching its goal at the 13th step compared to the 18th step in the scenario without the stopping criterion. This indicates not only potential savings in sampling costs relative to strategies without a stopping criterion but also a more efficient route to convergence with fewer sampling iterations.</p><p>Strategy B. Agents have the same cost, but the costs are different in each setting of the parameters, &#969;_PG and &#969;_IG. To determine the costs, we used an approach in which each agent was assigned the same cost. This method appears beneficial, as the MAS consistently achieves convergence for every combination of &#969;_PG and &#969;_IG, as shown in Table <ref type="table">8</ref>. Given that the global optimum resides in area 2, agent 2 tends to accumulate substantial gains until it locates this optimum. Using the average cost in each setting as described in Table <ref type="table">7</ref>, the cost of agent 2 becomes relatively lower. Consequently, agent 2 consistently registers a positive utility, enabling it to perform more iterations and discover the global optimum.</p><p>Strategy C. Agents have different costs, but the costs are the same in each setting of the parameters, &#969;_PG and &#969;_IG. With this cost-setting strategy, the search performance of the MAS is similar to that for the cosines function, as shown in Table <ref type="table">9</ref>. The system could not achieve convergence in the first five settings of &#969;_PG and &#969;_IG, but it succeeded in the following five. As we increased the weight of IG, the agents achieved better gains locally, leading to positive utility and more sampling iterations.</p><p>Table 2 Cosines function. Note: Local design space domain and infeasible areas for three agents.</p><p>Table note: [...] Eq. (<ref type="formula">8</ref>) for each of the three agents. The number of iterations taken by each agent during the search process is shown in the Agent 1 Iter, Agent 2 Iter, and Agent 3 Iter columns. The column Global optimum? indicates whether the global optimum is reached and, if so, reports the iteration at which it was achieved. The same notation applies to Tables <ref type="table">5</ref>-<ref type="table">12</ref>.</p><table><head>Table 9 Michalewicz function. Note: Strategy C: Agents have different costs, but the costs are the same in each setting of the parameters, &#969;_PG and &#969;_IG.</head>
<row role="label"><cell>&#969;_PG</cell><cell>&#969;_IG</cell><cell>Cost 1</cell><cell>Cost 2</cell><cell>Cost 3</cell><cell>Agent 1 Iter</cell><cell>Agent 2 Iter</cell><cell>Agent 3 Iter</cell><cell>Global optimum?</cell></row>
<row><cell>1</cell><cell>0</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>6</cell><cell>6</cell><cell>14</cell><cell>No</cell></row>
<row><cell>0.9</cell><cell>0.1</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>6</cell><cell>8</cell><cell>16</cell><cell>No</cell></row>
<row><cell>0.8</cell><cell>0.2</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>6</cell><cell>9</cell><cell>21</cell><cell>No</cell></row>
<row><cell>0.7</cell><cell>0.3</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>7</cell><cell>10</cell><cell>30</cell><cell>No</cell></row>
<row><cell>0.6</cell><cell>0.4</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>7</cell><cell>12</cell><cell>35</cell><cell>No</cell></row>
<row><cell>0.5</cell><cell>0.5</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>8</cell><cell>20</cell><cell>26</cell><cell>13</cell></row>
<row><cell>0.4</cell><cell>0.6</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>9</cell><cell>22</cell><cell>32</cell><cell>13</cell></row>
<row><cell>0.3</cell><cell>0.7</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>12</cell><cell>25</cell><cell>31</cell><cell>13</cell></row>
<row><cell>0.2</cell><cell>0.8</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>14</cell><cell>27</cell><cell>34</cell><cell>13</cell></row>
<row><cell>0.1</cell><cell>0.9</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>17</cell><cell>30</cell><cell>37</cell><cell>13</cell></row>
<row><cell>0</cell><cell>1</cell><cell>0.32</cell><cell>0.402</cell><cell>0.183</cell><cell>19</cell><cell>32</cell><cell>40</cell><cell>13</cell></row>
</table><p>(3) Eggholder function. Strategy A. 
Agents have different costs, and the costs are different in each setting of the parameters, &#969;_PG and &#969;_IG. In the experiment based on the Eggholder function, we set different costs for each agent based on the steps at which they reach their local optima: the 30th, 40th, and 27th steps, respectively. The results based on the first cost setting are shown in Table <ref type="table">10</ref>. Compared to the simple objective functions (cosines and Michalewicz), an agent stopping early has a much greater impact on convergence for a complex objective function. In this complex design space, agents need to sample more to find the optimal design than in the simple cases, and the MAS needs more information from local regions to better understand the entire design space. For example, in the first three settings with different &#969;_PG and &#969;_IG, the MAS is unable to achieve convergence. This is because agent 2 stops searching within the first ten steps. We can see from the cumulative gain without a stopping criterion (Fig. <ref type="figure">10</ref>(c)) that agent 2, although initially showing low gains, had the potential for greater gains as the process continued. This means that if we set the average gain that the agent achieved upon reaching the local optimum as the cost for each sample, the cost would be too high for agent 2 at the early stage of the sampling process. However, a turning point was observed in the fourth setting. Here, the system can identify the global optimum, mainly because all agents, especially agent 3, in whose region the optimum is located, continue to sample until the stopping criterion is satisfied, and all agents consistently obtain information from other regions during this process. This illustrates that information from all agents is crucial for identifying the global optimum in the MAS, which shows the importance of collaboration between agents for convergence in a complex objective function with many local optima.</p><p>Strategy B. Agents have the same cost, but the costs are different in each setting of the parameters, &#969;_PG and &#969;_IG. According to Table <ref type="table">11</ref>, the second cost-setting strategy seems ineffective for the convergence of the system in the Eggholder function, as agents often fail to identify the global optimum. This outcome is mainly due to the inconsistent gains realized by different agents. When the same cost was applied to all agents, those with low gains faced relatively high sampling costs, which resulted in negative utility and made them prone to stop searching. This early stop adversely affected the MAS's ability to converge to the global optimum. Such findings emphasize the crucial role of collaboration between agents in complex design space exploration.</p><table><head>Table 12 Eggholder function. Note: Strategy C: Agents have different costs, but the costs are the same in each setting of the parameters, &#969;_PG and &#969;_IG.</head>
<row role="label"><cell>&#969;_PG</cell><cell>&#969;_IG</cell><cell>Cost 1</cell><cell>Cost 2</cell><cell>Cost 3</cell><cell>Agent 1 Iter</cell><cell>Agent 2 Iter</cell><cell>Agent 3 Iter</cell><cell>Global optimum?</cell></row>
<row><cell>1</cell><cell>0</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>2</cell><cell>2</cell><cell>2</cell><cell>No</cell></row>
<row><cell>0.9</cell><cell>0.1</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>23</cell><cell>10</cell><cell>12</cell><cell>No</cell></row>
<row><cell>0.8</cell><cell>0.2</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>29</cell><cell>10</cell><cell>14</cell><cell>No</cell></row>
<row><cell>0.7</cell><cell>0.3</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>35</cell><cell>10</cell><cell>16</cell><cell>No</cell></row>
<row><cell>0.6</cell><cell>0.4</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>38</cell><cell>11</cell><cell>27</cell><cell>No</cell></row>
<row><cell>0.5</cell><cell>0.5</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>38</cell><cell>32</cell><cell>28</cell><cell>27</cell></row>
<row><cell>0.4</cell><cell>0.6</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>41</cell><cell>33</cell><cell>31</cell><cell>27</cell></row>
<row><cell>0.3</cell><cell>0.7</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>43</cell><cell>34</cell><cell>34</cell><cell>27</cell></row>
<row><cell>0.2</cell><cell>0.8</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>46</cell><cell>35</cell><cell>37</cell><cell>27</cell></row>
<row><cell>0.1</cell><cell>0.9</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>49</cell><cell>36</cell><cell>40</cell><cell>27</cell></row>
<row><cell>0</cell><cell>1</cell><cell>0.055</cell><cell>0.041</cell><cell>0.123</cell><cell>51</cell><cell>51</cell><cell>42</cell><cell>27</cell></row>
</table><p>Strategy C. Agents have different costs, but the costs are the same in each setting of the parameters, &#969;_PG and &#969;_IG. Referring to Table <ref type="table">12</ref>, with this cost-setting strategy, the system does not converge in the first five settings. However, it achieves convergence in the subsequent five settings. 
A side-by-side examination of Tables <ref type="table">11</ref> and <ref type="table">12</ref> indicates the important role of agent 2's information in system convergence. In Table <ref type="table">11</ref>, agent 2 stops its operation early, sampling no more than ten steps. This early termination of agent 2 impedes the MAS's convergence. On the contrary, in the last five settings of &#969; PG and &#969; IG in Table <ref type="table">12</ref>, the system achieves convergence due to agent 2 sampling until the later stages.</p></div>
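<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relation between the three cost-setting strategies can be summarized in a short sketch. It assumes, following the descriptions above, that Strategy A provides one cost per agent for each (&#969;_PG, &#969;_IG) setting, that Strategy B replaces each setting's costs with their average across agents, and that Strategy C replaces each agent's costs with that agent's average across settings. The function names and the small cost matrix are hypothetical and are not values from Tables 4-12.</p>
<p><![CDATA[
```python
from statistics import fmean


def strategy_b(costs_a):
    """Same cost for every agent within a setting: the row average."""
    return [[fmean(row)] * len(row) for row in costs_a]


def strategy_c(costs_a):
    """Same cost for an agent in every setting: the column average."""
    per_agent = [fmean(column) for column in zip(*costs_a)]
    return [list(per_agent) for _ in costs_a]


# Hypothetical Strategy A costs: rows are weight settings, columns are agents.
costs_a = [
    [0.1, 0.2, 0.3],
    [0.3, 0.4, 0.5],
]
costs_b = strategy_b(costs_a)  # each row becomes its own mean, repeated
costs_c = strategy_c(costs_a)  # every row becomes the per-agent means
```
]]></p>
<p>Under Strategy B an agent with low gains inherits a cost driven by the higher-gain agents in the same setting, whereas under Strategy C an agent's cost no longer tracks how its gain changes with the weights; both effects are consistent with the early stops reported for the settings where the PG weight is large.</p></div>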
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Discussion</head><p>In this article, our primary research question is: What impact would the cost-aware stopping criteria have on the collaboration between multiple agents in design space exploration? On the basis of our experimental results, we have identified several key observations that answer this RQ. In the following section, we will dive into these observations, providing insight and drawing takeaways from their practical scenarios.</p><p>(1) Our results confirm the findings of our previous study in Ref. <ref type="bibr">[12]</ref> that communication or information-sharing in MABO can greatly influence the individual agent's behavior and the team performance. We obtain these results with the assumption that the design task is divided into subtasks assigned to each member, and each design iteration incurs some cost. For instance, in our first test without stopping criteria, the cumulative gain looks like a step function (Fig. <ref type="figure">10(c)</ref>). If the cost was high, as the value was set based on the average gain, some agents could stop early because they could not achieve much improvement in the initial search, even if they could make large improvements later. However, stopping early affects other agents' search performance, as they could not continuously receive information from those who stopped. Although certain regions of the design space do not yield promising performance results at the beginning of the search, spending additional resources to further explore those regions could provide valuable insight that could help other members of the design team achieve the global optimum. As a solution, we can consider creating an additional parameter that serves as an initial bet (I) for all agents. For instance, the utility score, as defined in Eq. ( <ref type="formula">8</ref>), can be reformulated as U = (I + G) -Kc. 
This can prevent certain "promising" or "high-potential" agents, who may have modest gains in the beginning but are likely to see significant improvement with continued exploration, from stopping their search too early. Although this may spend additional cost on exploring agents' local regions, the approach could effectively guide the system toward convergence. Consequently, there is a trade-off between the achievability of convergence (or of the optimum) and the search cost incurred.</p><p>(2) Our results show that the impact of communication is more profound in problems with higher complexity. Specifically, the complexity of the objective function plays a crucial role in determining when agents stop searching. For relatively simple functions, early stops in a few local regions do not significantly influence other agents' performance. In particular, if the agent in the area where the global optimum is located does not stop early, it can continue the search of its local region even with less information acquired from areas where agents stop early, thus successfully converging to the global optimum. However, when dealing with more complex objective functions, the contribution of each agent becomes indispensable to the effectiveness of the search and the possibility of achieving the global optimum.</p><p>As an illustration, in both the cosines and Michalewicz functions, agent 1's information appears to be negligible. When this agent stops sampling, convergence is not only unaffected but, in certain cases, becomes faster. This can be observed with the cosines function, in which convergence improves from the 13th step to the 11th step, as presented in Table <ref type="table">4</ref>. Similarly, convergence in the Michalewicz function improves from the 18th step to the 13th, as shown in Table <ref type="table">7</ref>. 
In contrast, the case of the Eggholder function with many local optima demonstrates how vital every agent's information is for convergence. Specifically, if agent 2 stops within the first ten steps, the system fails to converge, as highlighted in Tables <ref type="table">10</ref> and <ref type="table">12</ref>. However, when agent 2 continues its search, sharing its information with other agents, the system can achieve convergence (see Table <ref type="table">12</ref>). These observations shed light on the formation of a design team. For a design team, communication mechanisms and incentive structures for solution search shall be designed and tailored according to the complexity of the problem to be solved.</p><p>(3) Our results also indicate the delicate balance between the value of design space exploration and cost of design iterations. In practical applications of design optimization, the concepts of "gain" and "cost" are pivotal. "Gain" typically refers to the benefits or improvements achieved through the design process. This could be in terms of performance, efficiency, utility, or any other metric of value in a specific context. For instance, in architectural design optimization, the "gain" might be maximized living space, energy efficiency, or aesthetic appeal. On the other hand, the "cost" in design optimization encapsulates the resources expended to achieve these gains. Beyond just monetary expenditures, the cost could represent the time taken, the manpower used, the environmental impact, or any other resource consumed in the process. In our earlier architectural example, the "cost" could be construction time, materials used, or even the environmental toll of sourcing those materials. Understanding and measuring gain and cost within the same unit or system is crucial. This ensures that we are making evaluations and decisions based on a consistent frame of reference. 
In real-world scenarios, this consistency assists stakeholders in making informed choices, prioritizing where to allocate resources, and understanding the trade-offs involved in pursuing specific design objectives.</p></div>
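<div xmlns="http://www.tei-c.org/ns/1.0"><p>The initial-bet reformulation U = (I + G) - Kc in observation (1) suggests a simple way to reason about how large I must be. Assuming an agent keeps sampling only while its utility is strictly positive, it survives up to a given horizon exactly when I exceeds the largest value of k * c - G_k over the iterations k in that horizon. The sketch below computes this bound; the function names and the stepwise gain sequence are hypothetical and are not taken from the experiments.</p>
<p><![CDATA[
```python
def initial_bet_bound(cumulative_gain, cost):
    """Largest shortfall k * cost - G_k over the horizon, floored at zero.

    Any initial bet strictly greater than this bound keeps
    U_k = (I + G_k) - k * cost positive at every iteration k.
    """
    shortfalls = [
        k * cost - gain for k, gain in enumerate(cumulative_gain, start=1)
    ]
    return max(0.0, max(shortfalls))


def survives(cumulative_gain, cost, initial_bet):
    """True if the utility stays strictly positive over the whole horizon."""
    return all(
        initial_bet + gain - k * cost > 0
        for k, gain in enumerate(cumulative_gain, start=1)
    )


# Hypothetical stepwise agent: no gain for three iterations, then a jump.
gains = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
bound = initial_bet_bound(gains, cost=0.5)
```
]]></p>
<p>For this sequence the bound is 1.5, reached at the third iteration just before the jump in gain. A bet slightly above it (for example 1.6) carries the agent through the unproductive early phase, while a bet equal to the bound still lets the utility touch zero and trigger a stop. The bound also makes the trade-off explicit: the bet needed grows with both the per-sample cost and the length of the initial low-gain phase.</p></div>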
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions</head><p>This article develops a model centered around a cost-aware MAS to study the impact of search cost on collaboration between multiple rational agents in design space exploration. This process involves three key decisions: where to sample, where not to sample, and when to stop. We leveraged MABO to determine optimal solutions while enabling a global-local communication mechanism for information exchange among agents, thus accelerating the search for the best solutions. Additionally, multi-agent reinforcement learning (MARL) allows agents to recognize and adapt to infeasible regions autonomously during the sampling process. We also propose a cost-aware stopping criterion, constructing each agent's utility (also known as payoff) from PG and IG. The experimental findings reveal that cost affects the communication and performance of the MAS more strongly in complex design problems than in simpler ones. In complex scenarios, agents may exhibit low performance initially, leading to minimal gains and potentially to terminating their search due to negative payoffs. These early terminations, driven by initial negative outcomes, can substantially affect the overall system performance. Consequently, for a design team, it becomes essential to design and customize incentive structures for solution search according to the specifics of the design problem to be solved. This insight, derived from a theoretical model of how rational design agents should work with a rigorous mathematical foundation, serves as a novel benchmark. It can be used in empirical studies to quantitatively evaluate how humans actually make design decisions within a team setting.</p><p>Our findings are subject to the following limitations. First, we used a fixed number of agents in the MAS, which means that we did not test how scaling the MAS affects the cost-aware strategy. 
In future work, we plan to study how the size of an MAS affects cost strategies, with the objective of identifying how different team sizes can change the efficiency and financial planning of design projects. Second, in this article, we adopted the global evaluator in BO for information exchange, which means that the agents in the MAS are fully connected through the global evaluator. In future work, we will explore the topology of communication and team structures within distributed MAS to understand how various structures might influence system efficiency and reliability. A key goal will be to create a model for MAS that reflects the varied preferences of individual agents, such as different attitudes toward risk, to model heterogeneity within human design teams. Third, this article begins by partitioning the design space, ensuring that agents operate in distinct regions and remain independent. In future work, an alternative strategy could set overlapping boundaries between agents, either using a probability to choose which agent samples the overlapping region or comparing the improvement achieved by the two agents in that region. Fourth, we conclude that incentive structures for solution search should be customized according to the specifics of the design problem to be solved. However, we did not examine how incentive structures influence the performance of the MAS. In future work, we intend to examine how different initial funds or budgets affect the performance of our MAS, especially in systems that represent varied agent preferences and decisions. Finally, to ensure that the insights about effective team coordination obtained from this study are grounded in reality beyond theoretical simulations, more validation is needed. This involves extensive testing in real-world DSE scenarios to demonstrate the practical applicability and effectiveness of our approach to cost management in design teams. 
To examine the prescriptive feature of the proposed model, we plan to conduct human-subject experiments to collect human behavior data in collaborative DSE and compare the participants' actual behaviors (i.e., when to stop and which point to sample next) against those predicted by the MABO model. One potential value of such a comparison is a measure of design irrationality, quantified by the distance between the empirical data and the simulation results.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Journal of Mechanical Design JANUARY 2025, Vol. 147 / 011703-9</p></note>
		</body>
		</text>
</TEI>
