-
Large language models (LLMs) demonstrate impressive reasoning abilities, but translating reasoning into actions in the real world remains challenging. In particular, it is unclear how to complete a given task provably within a minimum number of interactions with the external environment, e.g., through an internal mechanism of reasoning. To this end, we propose the first framework with provable regret guarantees to orchestrate reasoning and acting, which we call “reason for future, act for now” (RAFA). Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon (“reason for future”). At each step, the LLM agent takes the initial action of the planned trajectory (“act for now”), stores the collected feedback in the memory buffer, and reinvokes the reasoning routine to replan the future trajectory from the new state. The key idea is to cast reasoning in LLMs as learning and planning in Bayesian adaptive Markov decision processes (MDPs). Correspondingly, we prompt LLMs with the memory buffer to estimate the unknown environment (learning) and to generate an optimal trajectory for multiple future steps that maximizes a value function (planning). The learning and planning subroutines are performed in an “in-context” manner to emulate the actor-critic update for MDPs. Our theoretical analysis establishes a √T regret, where T denotes the number of online interactions, while our experimental validation demonstrates superior empirical performance.
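A minimal control-loop sketch of this recipe is below. The `planner` callable and the `env` interface are hypothetical stand-ins, not the authors' implementation: in the paper the planner role is played by an LLM prompted with the memory buffer to estimate the environment (learning) and to return a multi-step trajectory that maximizes the estimated value (planning).

```python
# Sketch of the RAFA ("reason for future, act for now") loop under the
# assumptions stated above; `planner` and `env` are placeholders.

def rafa(env, planner, horizon=5, max_steps=100):
    memory = []                                        # buffer of (state, action, feedback)
    state = env.reset()
    for _ in range(max_steps):
        trajectory = planner(memory, state, horizon)   # reason for future: plan a long-horizon trajectory
        action = trajectory[0]                         # act for now: execute only the first planned action
        next_state, feedback, done = env.step(action)
        memory.append((state, action, feedback))       # store the collected feedback
        state = next_state                             # replan from the new state on the next pass
        if done:
            break
    return memory
```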
-
As one of the most fundamental concepts in transportation science, Wardrop equilibrium (WE) has always had a relatively weak behavioral underpinning. To strengthen this foundation, one must reckon with bounded rationality in human decision-making processes, such as the lack of accurate information, limited computing power, and suboptimal choices. This retreat from behavioral perfectionism in the literature, however, was typically accompanied by a conceptual modification of WE. Here, we show that giving up perfect rationality need not force a departure from WE. On the contrary, WE can be reached with global stability in a routing game played by boundedly rational travelers. We achieve this result by developing a day-to-day (DTD) dynamical model that mimics how travelers gradually adjust their route valuations, and hence choice probabilities, based on past experiences. Our model, called cumulative logit (CumLog), resembles the classical DTD models but makes a crucial change: whereas the classical models assume that routes are valued based on the cost averaged over historical data, our model values the routes based on the cost accumulated. To describe route choice behaviors, the CumLog model uses only two parameters, one accounting for the rate at which the future route cost is discounted in the valuation relative to past costs and the other describing the sensitivity of route choice probabilities to valuation differences. We prove that the CumLog model always converges to WE, regardless of the initial point, as long as the behavioral parameters satisfy certain mild conditions. Our theory thus upholds WE’s role as a benchmark in transportation systems analysis. It also explains why equally good routes at equilibrium may be selected with different probabilities, which solves the instability problem posed by Harsanyi. Funding: This research is funded by the National Science Foundation [Grants CMMI #2225087 and ECCS #2048075]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2023.0132.
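Read literally, the description above suggests a daily update in which each route's valuation accumulates experienced costs rather than averaging them, with one parameter weighting newly experienced cost against the accumulated past and a second setting logit sensitivity. The sketch below encodes that reading; the names `discount` and `sensitivity` and the exact functional form are assumptions for illustration, not the paper's precise specification.

```python
import numpy as np

def cumlog_day(valuations, observed_costs, discount, sensitivity):
    """One day-to-day update in the spirit of CumLog (illustrative reading only).

    valuations     : route valuations accumulated from previous days
    observed_costs : route costs experienced on the current day
    discount       : weight on the newly experienced cost relative to the accumulated past
    sensitivity    : logit sensitivity of choice probabilities to valuation differences
    """
    valuations = valuations + discount * observed_costs     # accumulate costs, do not average
    utilities = -sensitivity * valuations                   # lower valuation = more attractive route
    utilities -= utilities.max()                            # stabilize the softmax numerically
    choice_probs = np.exp(utilities) / np.exp(utilities).sum()
    return valuations, choice_probs
```

Iterating an update of this kind over days, with the choice probabilities mapped to route flows and then back to costs through network loading, gives the sort of day-to-day loop whose fixed point the paper relates to WE.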
-
The lack of a unique user equilibrium (UE) route flow in traffic assignment has posed a significant challenge to many transportation applications. The maximum-entropy principle, which advocates the consistent selection of the most likely solution, is often used to address this challenge. Built on a recently proposed day-to-day discrete-time dynamical model called cumulative logit (CumLog), this study provides a new behavioral underpinning for the maximum-entropy user equilibrium (MEUE) route flow. It has been proven that CumLog can reach a UE state without presuming that travelers are perfectly rational. Here, we further establish that CumLog always converges to the MEUE route flow if (i) travelers have no prior information about routes and thus are forced to give all routes an equal initial choice probability, or (ii) all travelers gather information from the same source such that the general proportionality condition is satisfied. Thus, CumLog may be used as a practical solution algorithm for the MEUE problem. To put this idea into practice, we propose to eliminate the route enumeration requirement of the original CumLog model through an iterative route discovery scheme. We also examine the discrete-time versions of four popular continuous-time dynamical models and compare them with CumLog. The analysis shows that the replicator dynamic is the only one with the potential to reach the MEUE solution with some regularity. The analytical results are confirmed through numerical experiments. History: This paper has been accepted for the Transportation Science Special Issue on the ISTTT25 Conference. Funding: This research was funded by the United States National Science Foundation’s Division of Civil, Mechanical and Manufacturing Innovation [Grant 2225087]. The work of J. Xie was funded by the National Natural Science Foundation of China [Grant 72371205]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2024.0525.
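For context, the MEUE route flow referenced above is conventionally defined as the entropy-maximizing route flow among all nonnegative route flows consistent with the OD demands and the (unique) UE link flow; a standard formulation, with notation assumed here rather than taken from the paper, is

```latex
\max_{f \ge 0} \;\; -\sum_{k} f_k \ln f_k
\qquad \text{s.t.} \qquad \Delta f = x^{*}, \quad \Lambda f = d,
```

where f_k is the flow on route k, Δ is the link–route incidence matrix, x* is the UE link flow, Λ is the OD–route incidence matrix, and d is the vector of OD demands.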
-
S. Koyejo; S. Mohamed; A. Agarwal; D. Belgrave; K. Cho; A. Oh (Ed.)