

Title: Developing Heuristics for Resource Allocation and Utilization in Systems Design: A Hierarchical Reinforcement Learning Approach
Abstract Systems design involves decomposing a system into interconnected subsystems and allocating resources to the teams responsible for designing each subsystem. The outcomes of the process depend on how well the limited resources are allocated across teams and on the strategy each team uses to design its subsystem. This article presents an approach based on hierarchical reinforcement learning (RL) to generate heuristics for solving complex design problems under resource constraints. The approach formulates systems design problems as hierarchical multiarmed bandit (MAB) problems, where decisions are made at both the system level (allocating budget across subsystems) and the subsystem level (selecting heuristics for sequential information acquisition). The approach is demonstrated on an illustrative example of race car optimization in The Open Racing Car Simulator (TORCS) environment. The results indicate that the RL agent can learn to allocate resources strategically, prioritize the subsystems with the greatest influence on overall performance, and identify effective information-acquisition heuristics for each subsystem. For example, the RL agent learned to allocate a larger portion of the budget to the gearbox subsystem, which has a higher-dimensional design space than the other subsystems. The results also indicate that the extracted heuristics converge to high-performing car configurations more efficiently than Bayesian optimization.
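The two-level formulation described in the abstract can be illustrated with a minimal sketch. This is a hypothetical toy model, not the paper's implementation: an outer epsilon-greedy bandit allocates each unit of budget to a subsystem, and an inner bandit per subsystem selects an information-acquisition heuristic. The reward table `subsystem_rewards` and all parameter values are assumptions for illustration.

```python
import random

class Bandit:
    """Simple epsilon-greedy multi-armed bandit."""
    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms

    def select(self):
        # Explore with probability epsilon, otherwise pick the best-so-far arm
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean update of the arm's estimated reward
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def hierarchical_design(subsystem_rewards, n_heuristics=3, budget=500, seed=0):
    """Two-level bandit: the outer bandit decides which subsystem receives
    the next unit of budget; the inner bandit for that subsystem selects an
    information-acquisition heuristic. subsystem_rewards[s][h] is the
    (hypothetical) expected reward of heuristic h on subsystem s."""
    random.seed(seed)
    outer = Bandit(len(subsystem_rewards))
    inner = [Bandit(n_heuristics) for _ in subsystem_rewards]
    for _ in range(budget):
        s = outer.select()
        h = inner[s].select()
        reward = subsystem_rewards[s][h] + random.gauss(0, 0.05)
        inner[s].update(h, reward)
        outer.update(s, reward)
    return outer, inner
```

With a reward table in which one subsystem dominates, the outer bandit concentrates most of the budget on that subsystem, mirroring the paper's observation that the agent prioritizes the subsystem with the greatest influence on performance.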
Award ID(s):
2129574 2129539
PAR ID:
10667618
Author(s) / Creator(s):
; ;
Publisher / Repository:
ASME
Date Published:
Journal Name:
Journal of Mechanical Design
Volume:
147
Issue:
6
ISSN:
1050-0472
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Heuristics are essential for addressing the complexities of engineering design processes. The goodness of heuristics is context-dependent: appropriately tailored heuristics can enable designers to find good solutions efficiently, while inappropriate heuristics can result in cognitive biases and inferior design outcomes. While there have been several efforts to understand which heuristics designers use, there is a lack of normative understanding about when different heuristics are suitable. Toward addressing this gap, this paper presents a reinforcement learning-based approach to evaluate the goodness of heuristics for three sub-problems commonly faced by designers carrying out design under resource constraints: (i) learning the mapping between the design space and the performance space, (ii) sequential information acquisition in design, and (iii) deciding when to stop information acquisition. Using a multi-armed bandit formulation and simulation studies, we learn the heuristics that are suitable for these sub-problems under different resource constraints and problem complexities. The results of our simulation study indicate that the proposed reinforcement learning-based approach can be effective for determining the quality of heuristics for different sub-problems, and how the effectiveness of the heuristics changes as a function of the designer's preference (e.g., performance versus cost), the complexity of the problem, and the resources available.
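The third sub-problem above, deciding when to stop information acquisition, can be sketched as a simple value-of-information stopping rule. This is a hypothetical illustration, not a heuristic from the paper: the sampler keeps evaluating new designs while the expected improvement of one more sample (modeling observed performances as roughly normal) exceeds an assumed per-sample cost. The `evaluate` function and the `cost` threshold are illustration-only assumptions.

```python
import math
import random

def acquire_until_stop(evaluate, cost=0.02, max_budget=100, seed=0):
    """Sequentially evaluate random designs; stop when the one-step
    expected improvement over the incumbent best drops below the cost."""
    rng = random.Random(seed)
    ys = [evaluate(rng.random()) for _ in range(3)]  # small initial batch
    while len(ys) < max_budget:
        n = len(ys)
        mean = sum(ys) / n
        std = (sum((y - mean) ** 2 for y in ys) / n) ** 0.5 or 1e-9
        best = max(ys)
        # EI of a fresh draw Y ~ N(mean, std) over incumbent `best`:
        # E[max(Y - best, 0)] = (mean - best)*Phi(z) + std*phi(z), z = (mean - best)/std
        z = (mean - best) / std
        phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
        Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
        if (mean - best) * Phi + std * phi < cost:
            break  # marginal value of information below marginal cost: stop
        ys.append(evaluate(rng.random()))
    return max(ys), len(ys)
```

The rule captures the cost-versus-performance trade-off the abstracts describe: a higher per-sample cost makes the agent stop earlier, while a harder (higher-variance) problem justifies continued acquisition.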
  2. Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet it is not server-"less" and comes at the cost of more complex infrastructure management (e.g., resource provisioning and scheduling) for cloud providers. To maintain function service-level objectives (SLOs) and improve resource utilization efficiency, recent research has focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to rule-based solutions with heuristics, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. Despite the initial success of applying RL, we first show in this paper that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.8x higher p99 function latency degradation on multi-tenant serverless FaaS platforms compared to isolated environments and is unable to converge during training. We then design and implement a scalable and incremental multi-agent RL framework based on Proximal Policy Optimization (SIMPPO). Our experiments on widely used serverless benchmarks demonstrate that in multi-tenant environments, SIMPPO enables each RL agent to converge efficiently during training and provides online function latency comparable to that of S-RL trained in isolation (which we refer to as the baseline for assessing RL performance), with minor degradation (<9.2%). In addition, SIMPPO reduces the p99 function latency by 4.5x compared to S-RL in multi-tenant cases.
  3. Abstract Heuristics are essential for addressing the complexities of engineering design processes. The goodness of heuristics is context-dependent: appropriately tailored heuristics can enable designers to find good solutions efficiently, while inappropriate heuristics can result in cognitive biases and inferior design outcomes. While there have been several efforts to understand which heuristics designers use, there is a lack of normative understanding about when different heuristics are suitable. Toward addressing this gap, this paper presents a reinforcement learning-based approach to evaluate the goodness of heuristics for three sub-problems commonly faced by designers: (1) learning the mapping between the design space and the performance space, (2) acquiring information sequentially, and (3) deciding when to stop the information acquisition process. Using a multi-armed bandit formulation and simulation studies, we learn the suitable heuristics for these individual sub-problems under different resource constraints and problem complexities. Additionally, we learn the optimal heuristics for the combined problem (i.e., the one composing all three sub-problems) and compare them to those learned at the sub-problem level. The results of our simulation study indicate that the proposed reinforcement learning-based approach can be effective for determining the quality of heuristics for different problems, and how the effectiveness of the heuristics changes as a function of the designer's preference (e.g., performance versus cost), the complexity of the problem, and the resources available.
  4. Serverless Function-as-a-Service (FaaS) is an emerging cloud computing paradigm that frees application developers from infrastructure management tasks such as resource provisioning and scaling. To reduce the tail latency of functions and improve resource utilization, recent research has focused on applying online learning algorithms such as reinforcement learning (RL) to manage resources. Compared to existing heuristics-based resource management approaches, RL-based approaches eliminate humans in the loop and avoid the painstaking generation of heuristics. In this paper, we show that the state-of-the-art single-agent RL algorithm (S-RL) suffers up to 4.6x higher function tail latency degradation on multi-tenant serverless FaaS platforms and is unable to converge during training. We then propose and implement a customized multi-agent RL algorithm based on Proximal Policy Optimization, i.e., multi-agent PPO (MA-PPO). We show that in multi-tenant environments, MA-PPO enables each agent to be trained until convergence and provides online performance comparable to S-RL in single-tenant cases, with less than 10% degradation. In addition, MA-PPO improves on S-RL's function tail latency by 4.4x in multi-tenant cases.
  5. Abstract Engineering design relies heavily on heuristics, yet there is a lack of systematic methods for identifying and validating design heuristics. This paper introduces a computational approach to representing engineering design problems that involve decomposition and assignment decisions, facilitating systematic extraction of generalizable heuristics. We model design processes using a Markov Decision Process (MDP) framework, characterizing problems through attributes of the problem space, solver capabilities, and trade-offs embedded within preference functions. Reinforcement learning methods are employed to learn optimal policies, from which we extract inclusionary and exclusionary heuristics using Gaussian Mixture Models. The effectiveness of the approach is demonstrated through two case studies: solver-aware system architecting (SASA) for a robotic arm design and sequential information acquisition in parametric design optimization. The results highlight the context-dependent nature of learned heuristics, demonstrating how problem complexity, designer preferences, and solver characteristics influence their selection. 
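The Gaussian Mixture Model extraction step described in item 5 can be illustrated with a small sketch. This is an assumption-laden stand-in, not the paper's method: a pure-Python EM fit of a two-component 1D mixture to the design-parameter values chosen by a (hypothetical) learned policy, from which an inclusionary heuristic is read off as the region around the dominant component. The synthetic data and the two-sigma inclusion rule are assumptions.

```python
import math
import random

def em_gmm_1d(data, iters=200):
    """Fit a two-component 1D Gaussian mixture with plain EM."""
    mu = [min(data), max(data)]  # spread the initial means apart
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = (p[0] + p[1]) or 1e-300
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk + 1e-6
            pi[k] = nk / len(data)
    return mu, var, pi

def inclusion_rule(mu, var, pi):
    """Inclusionary heuristic (illustrative): favor the two-sigma region
    around the mixture component that carries the most weight."""
    k = 0 if pi[0] >= pi[1] else 1
    return (mu[k] - 2 * var[k] ** 0.5, mu[k] + 2 * var[k] ** 0.5)
```

In practice a library such as scikit-learn's GaussianMixture would replace the hand-rolled EM loop; the sketch only shows how a cluster of policy-preferred parameter values turns into a "prefer this region" rule.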