Title: Budget-Constrained Traveling Salesman Problem: a Cooperative Multi-Agent Reinforcement Learning Approach
We study a new variation of the Traveling Salesman Problem (TSP) called the Budget-Constrained Traveling Salesman Problem (BC-TSP). BC-TSP is inspired by a few emerging network applications, such as robotic sensor networks. We design a prize-driven multi-agent reinforcement learning (MARL) framework to solve the BC-TSP. The main novelty of the framework, named P-MARL, is that it connects prize maximization in BC-TSP with cumulative reward maximization in reinforcement learning (RL) to design a more efficient MARL algorithm. In particular, P-MARL integrates the prizes available at nodes into the reward model of the MARL to guide the cooperative effort of multiple learning agents. Via extensive simulations using synthetic data of state capital cities of the U.S., we show that a) P-MARL outperforms the existing prize-oblivious MARL work by collecting 28.8% more prizes under the same budget constraints, b) it requires two orders of magnitude less training time than the state-of-the-art deep reinforcement learning-based approach while collecting 45.3% more prizes under the same budgets, and c) P-MARL collects at least 91.9% of the optimal prizes obtained by Integer Linear Programming (ILP) under different network parameters.
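To make the objective in the abstract concrete, the following is a minimal sketch of a budget-constrained, prize-maximizing tour search on a toy instance. It is neither P-MARL nor the ILP from the paper. It is an exhaustive search, and it assumes (the abstract does not say) that a route is a closed tour starting and ending at a depot, that travel cost is Euclidean distance, and that each node's prize is collected once. The function name `best_budgeted_tour`, the coordinates, and the prize values are all hypothetical.

```python
from itertools import permutations
from math import dist


def best_budgeted_tour(coords, prizes, budget, depot=0):
    """Return (total_prize, route) for the best closed tour within the budget.

    Assumed model (not taken from the paper): Euclidean travel cost,
    the tour starts and ends at `depot`, each visited node pays its prize once.
    Exhaustive search, so only usable on very small instances.
    """
    others = [i for i in range(len(coords)) if i != depot]
    # Staying at the depot costs nothing and is always feasible.
    best_prize, best_route = prizes[depot], [depot, depot]
    for k in range(1, len(others) + 1):
        for order in permutations(others, k):
            route = [depot, *order, depot]
            cost = sum(dist(coords[a], coords[b])
                       for a, b in zip(route, route[1:]))
            if cost <= budget:
                # route[:-1] drops the repeated depot at the end.
                prize = sum(prizes[i] for i in route[:-1])
                if prize > best_prize:
                    best_prize, best_route = prize, route
    return best_prize, best_route


# Toy instance: four nodes on a unit square plus one distant high-prize node.
coords = [(0, 0), (1, 0), (1, 1), (0, 1), (5, 5)]
prizes = [0, 3, 4, 2, 10]
print(best_budgeted_tour(coords, prizes, budget=4.0))
```

With a budget of 4.0 the distant node is unreachable despite carrying the largest prize, so the search settles for the three nearby nodes. This trade-off between travel cost and prize is what distinguishes the budget-constrained variant from the classic TSP, in which every node must be visited.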
Award ID(s):
2240517
PAR ID:
10646723
Publisher / Repository:
IEEE
Date Published:
Page Range / eLocation ID:
1 to 9
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We focus on robotic sensor networks (RSNs), wherein mobile data collectors or robots are dispatched into the sensor field to collect data from the sensor nodes, and study a new algorithmic problem called battery-constrained data collection in RSNs (BC-DCR). Given an RSN of sensor nodes with varying numbers of sensory data packets to be collected and a robot with limited battery power, the goal of the BC-DCR is to dispatch the robot into the sensor field to collect the maximum number of data packets before it runs out of battery power and returns to the depot for recharging. Although extensive research has been conducted to achieve various performance objectives of data collection in RSNs, not much work has focused on the robot’s limited battery power. It is critical to consider the robot’s limited battery power to optimize the data-collecting performance of a large-scale RSN. We show that at the core of the BC-DCR is a new variation of the classic traveling salesman problem called the Budget-Constrained Traveling Salesman Problem (BC-TSP), which has not been adequately solved. We design an Integer Linear Programming (ILP)-based optimal algorithm and a time-efficient iterative greedy algorithm to solve the BC-TSP. Via extensive simulations using real measurements of battery power and mobility models of robots, we show that a) our algorithms outperform the existing work by collecting 29.1% more packets with the same battery power of the robots and b) our BC-TSP-based approach achieves a 32.02% longer network lifetime of the RSN compared to the existing approach.
  2. Facet-defining inequalities of the symmetric traveling salesman problem (TSP) polytope play a prominent role in both polyhedral TSP research and state-of-the-art TSP solvers. In this paper, we introduce a new class of facet-defining inequalities, the circlet inequalities. These inequalities were first conjectured in Gutekunst and Williamson [Gutekunst SC, Williamson DP (2019) Characterizing the integrality gap of the subtour LP for the circulant traveling salesman problem. SIAM J. Discrete Math. 33(4):2452–2478] when studying the circulant TSP, and they provide a bridge between polyhedral TSP research and number-theoretic investigations of Hamiltonian cycles stemming from a conjecture from Marco Buratti in 2007. The circlet inequalities exhibit circulant symmetry by placing the same weight on all edges of a given length; our main proof exploits this symmetry to prove the validity of the circlet inequalities. We then show that the circlet inequalities are facet-defining and compute their strength following Goemans [Goemans MX (1995) Worst-case comparison of valid inequalities for the TSP. Math. Programming 69:335–349]; they achieve the same worst case strength as the similarly circulant crown inequalities of Naddef and Rinaldi [Naddef D, Rinaldi G (1992) The crown inequalities for the symmetric traveling salesman polytope. Math. Oper. Res. 17(2):308–326] but are generally stronger. Funding: This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program [Grant DGE-1650441] and by the National Science Foundation Division of Computing and Communications Foundations [Grant CCF-1908517]. 
  3. Prize-Collecting TSP is a variant of the traveling salesperson problem where one may drop vertices from the tour at the cost of vertex-dependent penalties. The quality of a solution is then measured by adding the length of the tour and the sum of all penalties of vertices that are not visited. We present a polynomial-time approximation algorithm with an approximation guarantee slightly below 1.6, where the guarantee is with respect to the natural linear programming relaxation of the problem. This improves upon the previous best-known approximation ratio of 1.774. Our approach is based on a known decomposition for solutions of this linear relaxation into rooted trees. Our algorithm takes a tree from this decomposition and then performs a pruning step before doing parity correction on the remainder. Using a simple analysis, we bound the approximation guarantee of the proposed algorithm by $$(1+\sqrt{5})/2 \approx 1.618$$, the golden ratio. With some additional technical care we further improve the approximation guarantee to 1.599. Furthermore, we show that for the path version of Prize-Collecting TSP (known as Prize-Collecting Stroll) our approach yields an approximation guarantee of 1.6662, improving upon the previous best-known guarantee of 1.926.
  4. This paper proposes a collaborative neurodynamic optimization (CNO) method to solve the traveling salesman problem (TSP). First, we construct a Hopfield neural network (HNN) with $$n \times n$$ neurons for the n cities. Second, to ensure the convergence of the continuous HNN (CHNN), we reformulate the TSP to satisfy the convergence condition of the CHNN and solve the TSP with the CHNN. Finally, a population of CHNNs is used to search for locally optimal solutions of the TSP, and the globally optimal solution is obtained using particle swarm optimization. Experimental results show the effectiveness of the CNO approach for solving the TSP.
  5. Due to information asymmetry, finding optimal policies for Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) is hard with the complexity growing doubly exponentially in the horizon length. The challenge increases greatly in the multi-agent reinforcement learning (MARL) setting where the transition probabilities, observation kernel, and reward function are unknown. Here, we develop a general compression framework with approximate common and private state representations, based on which decentralized policies can be constructed. We derive the optimality gap of executing dynamic programming (DP) with the approximate states in terms of the approximation error parameters and the remaining time steps. When the compression is exact (no error), the resulting DP is equivalent to the one in existing work. Our general framework generalizes a number of methods proposed in the literature. The results shed light on designing practically useful deep-MARL network structures under the "centralized learning distributed execution" scheme. 