This paper introduces a new graph neural network architecture for learning solutions of Capacitated Vehicle Routing Problems (CVRP) as policies over graphs. CVRP serves as an important benchmark for a wide range of combinatorial planning problems, which can be adapted to manufacturing, robotics and fleet planning applications. Here, the specific aim is to demonstrate the significant real-time executability and (beyond training) scalability advantages of the new graph learning approach over existing solution methods. While partly drawing motivation from recent graph learning methods that learn to solve CO problems such as multi-Traveling Salesman Problem (mTSP) and VRP, the proposed neural architecture presents a novel encoder-decoder architecture. Here the encoder is based on Capsule networks, which enables better representation of local and global information with permutation invariant node embeddings; and the decoder is based on the Multi-head attention (MHA) mechanism allowing sequential decisions. This architecture is trained using a policy gradient Reinforcement Learning process. The performance of our approach is favorably compared with state-of-the-art learning and non-learning methods for a benchmark suite of Capacitated-VRP (CVRP) problems. A further study on the CVRP with demand uncertainties is conducted to explore how this Capsule-Attention Mechanism architecture can be extended to handle real-world uncertainties by embedding them through the encoder.
This content will become publicly available on December 10, 2024
- Award ID(s):
- 2214177
- PAR ID:
- 10534446
- Publisher / Repository:
- Advances in neural information processing systems (NeurIPS) 2023
- Date Published:
- ISSN:
- 1049-5258
- Format(s):
- Medium: X
- Location:
- New Orleans, LA
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract -
Abstract Objective. A major challenge in designing closed-loop brain-computer interfaces is finding optimal stimulation patterns as a function of ongoing neural activity for different subjects and different objectives. Traditional approaches, such as those currently used for deep brain stimulation, have largely followed a manual trial-and-error strategy to search for effective open-loop stimulation parameters, a strategy that is inefficient and does not generalize to closed-loop activity-dependent stimulation.Approach. To achieve goal-directed closed-loop neurostimulation, we propose the use of brain co-processors, devices which exploit artificial intelligence to shape neural activity and bridge injured neural circuits for targeted repair and restoration of function. Here we investigate a specific type of co-processor called a ‘neural co-processor’ which uses artificial neural networks and deep learning to learn optimal closed-loop stimulation policies. The co-processor adapts the stimulation policy as the biological circuit itself adapts to the stimulation, achieving a form of brain-device co-adaptation. Here we use simulations to lay the groundwork for futurein vivo tests of neural co-processors. We leverage a previously published cortical model of grasping, to which we applied various forms of simulated lesions. We used our simulations to develop the critical learning algorithms and study adaptations to non-stationarity in preparation for futurein vivo tests.Main results. Our simulations show the ability of a neural co-processor to learn a stimulation policy using a supervised learning approach, and to adapt that policy as the underlying brain and sensors change. Our co-processor successfully co-adapted with the simulated brain to accomplish the reach-and-grasp task after a variety of lesions were applied, achieving recovery towards healthy function in the range 75%–90%.Significance. Our results provide the first proof-of-concept demonstration, using computer simulations, of a neural co-processor for adaptive activity-dependent closed-loop neurostimulation for optimizing a rehabilitation goal after injury. While a significant gap remains between simulations andin vivo applications, our results provide insights on how such co-processors may eventually be developed for learning complex adaptive stimulation policies for a variety of neural rehabilitation and neuroprosthetic applications. -
As more and more people are expected to work with complex AI-systems, it becomes more important than ever that such systems provide intuitive explanations for their decisions. A prerequisite for holding such explanatory dialogue is the ability of the systems to present their proposed decisions to the user in an easy-to-understand form. Unfortunately, such dialogues could become hard to facilitate in real-world problems where the system may be planning for multiple eventualities in stochastic environments. This means for the system to be effective, it needs to be able to present the policy at a high-level of abstraction and delve into details as required. Towards this end, we investigate the utility of temporal abstractions derived through analytically computed landmarks and their relative ordering to build a summarization of policies for Stochastic Shortest Path Problems. We formalize the concept of policy landmarks and show how it can be used to provide a high level overview of a given policy. Additionally, we establish the connections between the type of hierarchy we generate and previous works in temporal abstractions, specifically MaxQ hierarchies. Our approach is evaluated through user studies as well as empirical metrics that establish that people tend to choose landmarks facts as subgoals to summarize policies and demonstrates the performance of our approach on standard benchmarks.more » « less
-
J. Christopher Beck, Olivier Buffet (Ed.)As more and more people are expected to work with complex AI-systems, it becomes more important than ever that such systems provide intuitive explanations for their decisions. A prerequisite for holding such explanatory dialogue is the ability of the systems to present their proposed decisions to the user in an easy-to-understand form. Unfortunately, such dialogues could become hard to facilitate in real-world problems where the system may be planning for multiple eventualities in stochastic environments. This means for the system to be effective, it needs to be able to present the policy at a high-level of abstraction and delve into details as required. Towards this end, we investigate the utility of temporal abstractions derived through analytically computed landmarks and their relative ordering to build a summarization of policies for Stochastic Shortest Path Problems. We formalize the concept of policy landmarks and show how it can be used to provide a high level overview of a given policy. Additionally, we establish the connections between the type of hierarchy we generate and previous works in temporal abstractions, specifically MaxQ hierarchies. Our approach is evaluated through user studies as well as empirical metrics that establish that people tend to choose landmarks facts as subgoals to summarize policies and demonstrates the performance of our approach on standard benchmarks.more » « less
-
Abstract The United States is experiencing growing impacts of climate change but currently receives a limited policy response from its national leadership. Within this policy void, many state governments are stepping up and taking action on adaptation planning. Yet we know little about why some states adopt State Adaptation Plans (SAPs), while others do not. This article investigates factors that predict the emergence of SAPs, both in terms of policy adoption and policy intensity (goal ambitiousness). Applying the diffusion of innovation theory, I consider the relative influence of internal state characteristics, regional pressures, and test for conditional effects between government ideologies and severity of the problem. The results show interesting differences between predictors that influence policy adoption and ambitiousness. States are more motivated to adopt a policy when faced with greater climate vulnerability, have more liberal citizenry, and where governments have crossed policy hurdles by previously passing mitigation plans. The intensity of policies and goal setting, moreover, is more likely to be driven by interest group politics and diffuse through policy learning or sharing information among neighboring states in Environmental Protection Agency regions. These findings support an emerging scholarship that uses more complex dependent variables in policy analysis. These variables have the potential to differentiate symbolic from substantive policies and capture finer information about predictors of importance.