On-demand warehousing platforms match companies that have underutilized warehouse and distribution capabilities with customers who need extra space or distribution services. These new business models offer unique advantages, in terms of reduced capacity and commitment granularity, but they also have different cost structures compared with traditional ways of obtaining distribution capabilities. This research is the first quantitative analysis of distribution network strategies given the advent of on-demand warehousing. Our multi-period facility location model, a mixed-integer linear program, simultaneously determines location-allocation decisions across three distribution center types (self-distribution, 3PL/lease, on-demand). A simulation model then evaluates, at the operational level, the impact of the planned distribution strategy under various uncertainties. Computational experiments for a company receiving internationally produced products to fulfill a set of regional customer demands illustrate that the power of on-demand warehousing lies in creating hybrid network designs that use self-distribution facilities more efficiently through improved capacity utilization. However, the business case for on-demand warehousing is influenced by several factors, namely on-demand capacity availability, responsiveness requirements, and demand patterns. This work supports a firm's use of on-demand warehousing when it faces tight response requirements, for example same-day delivery; when response requirements are relaxed, on-demand warehousing is recommended only if the capacity availability of planned on-demand services is high. We also analyze capacity flexibility options leased by third-party logistics companies at a premium price and draw attention to the importance of their offering more granular solutions to stay competitive in the market.
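To make the planning model concrete, the following is a minimal sketch of a multi-period facility-location MILP with the three distribution-center types, written in Python with the PuLP library. It is not the paper's actual formulation: the sites, regions, costs, capacities, and demands are all hypothetical placeholders.

```python
# Hypothetical sketch of a multi-period facility location MILP with
# three DC types (self-distribution, 3PL/lease, on-demand).
# Not the paper's formulation; all data below are made up.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

periods = range(4)                     # planning periods
sites = ["A", "B"]                     # candidate DC locations
types = ["self", "3pl", "on_demand"]   # DC types
regions = ["east", "west"]             # customer regions

fixed_cost = {"self": 100.0, "3pl": 40.0, "on_demand": 5.0}   # per period
var_cost = {"self": 1.0, "3pl": 1.5, "on_demand": 2.5}        # per unit
capacity = {"self": 500, "3pl": 200, "on_demand": 80}
demand = {(r, t): 150 for r in regions for t in periods}

m = LpProblem("hybrid_network_design", LpMinimize)

# open_[s][k][t] = 1 if a DC of type k operates at site s in period t
open_ = LpVariable.dicts("open", (sites, types, periods), cat=LpBinary)
# flow[s][k][r][t] = units shipped from (s, k) to region r in period t
flow = LpVariable.dicts("flow", (sites, types, regions, periods), lowBound=0)

# Minimize fixed operating costs plus variable shipping costs
m += lpSum(fixed_cost[k] * open_[s][k][t]
           + lpSum(var_cost[k] * flow[s][k][r][t] for r in regions)
           for s in sites for k in types for t in periods)

# Every region's demand must be met in every period
for r in regions:
    for t in periods:
        m += lpSum(flow[s][k][r][t] for s in sites for k in types) == demand[r, t]

# Flow only through open DCs, up to each type's capacity
for s in sites:
    for k in types:
        for t in periods:
            m += (lpSum(flow[s][k][r][t] for r in regions)
                  <= capacity[k] * open_[s][k][t])

m.solve()
```

The three type-specific fixed-cost, variable-cost, and capacity profiles are what let the solver trade off dedicated space against smaller, pay-per-use increments.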
Maximum Profit Facility Location and Dynamic Resource Allocation for Instant Delivery Logistics
Increasing e-commerce activity, competition for shorter delivery times, and innovations in transportation technologies have pushed the industry toward instant delivery logistics. This paper studies a facility location and online demand allocation problem applicable to a logistics company expanding to offer instant delivery service using unmanned aerial vehicles (drones). The problem is decomposed into two stages. During the planning stage, facilities are located, and product and battery capacity are allocated. During the operational stage, customers place orders dynamically and real-time demand allocation decisions are made. The paper explores a multi-armed bandit framework for maximizing the cumulative reward realized by the logistics company subject to various capacity constraints and compares it with other strategies. The multi-armed bandit framework provides about 7% more reward than the second-best strategy when tested on standard test instances. A case study based in the Portland metro area shows that multi-armed bandits can outperform the second-best strategy by more than 20%.
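The abstract does not spell out the bandit formulation, so the following is only a generic stand-in: a UCB1-style allocator that assigns each incoming order to the facility (arm) with the highest upper-confidence index among those with remaining capacity. Facility names, capacities, and the uniform reward draw are all invented for illustration.

```python
# Generic UCB1-style allocator: each facility is an arm; an incoming
# order is assigned to the feasible arm with the highest UCB index.
# Illustrative only -- not the paper's exact model or reward structure.
import math
import random

facilities = ["depot_1", "depot_2", "depot_3"]
capacity = {"depot_1": 40, "depot_2": 25, "depot_3": 25}   # remaining units
counts = {f: 0 for f in facilities}
mean_reward = {f: 0.0 for f in facilities}
t = 0

def ucb_allocate():
    """Pick the facility with the best UCB index among feasible ones."""
    feasible = [f for f in facilities if capacity[f] > 0]
    if not feasible:
        return None  # order must be rejected or outsourced
    untried = [f for f in feasible if counts[f] == 0]
    if untried:
        return untried[0]  # try every facility at least once
    return max(feasible, key=lambda f: mean_reward[f]
               + math.sqrt(2 * math.log(t) / counts[f]))

for order in range(60):
    t += 1
    f = ucb_allocate()
    if f is None:
        continue
    reward = random.uniform(0.0, 1.0)   # stand-in for realized profit
    counts[f] += 1
    capacity[f] -= 1
    # incremental update of the running mean reward for this arm
    mean_reward[f] += (reward - mean_reward[f]) / counts[f]
```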
- PAR ID: 10379476
- Publisher / Repository: SAGE Publications
- Journal Name: Transportation Research Record: Journal of the Transportation Research Board
- Volume: 2676
- Issue: 7
- ISSN: 0361-1981
- Pages: 697-710
- Sponsoring Org: National Science Foundation
More Like this
- This paper studies multi-stage systems with end-to-end bandit feedback. In such systems, each job needs to go through multiple stages, each managed by a different agent, before generating an outcome. Each agent can only control its own action and learns only the final outcome of the job; it has neither knowledge of nor control over the actions taken by agents in the next stage. The goal of this paper is to develop distributed online learning algorithms that achieve sublinear regret in adversarial environments. This setting significantly expands the traditional multi-armed bandit problem, which considers only one agent and one stage. In addition to the exploration-exploitation dilemma of the traditional multi-armed bandit problem, we show that the consideration of multiple stages introduces a third component, education, where an agent needs to choose its actions to facilitate the learning of agents in the next stage. To solve this newly introduced exploration-exploitation-education trilemma, we propose a simple distributed online learning algorithm, ϵ-EXP3. We theoretically prove that ϵ-EXP3 is a no-regret policy that achieves sublinear regret. Simulation results show that ϵ-EXP3 significantly outperforms existing no-regret online learning algorithms for the traditional multi-armed bandit problem.
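The abstract names the algorithm but not its update rules, so the sketch below shows one plausible reading of ϵ-EXP3: a standard EXP3 learner whose sampling distribution is mixed with an ϵ share of uniform exploration. The learning rate, exploration rate, and toy reward function are assumptions.

```python
# Sketch of an EXP3 learner with an explicit epsilon uniform-exploration
# mix -- one plausible reading of "epsilon-EXP3"; the paper's exact
# update rules are not given in the abstract.
import math
import random

def eps_exp3(n_arms, horizon, reward_fn, eps=0.1, eta=0.05):
    weights = [1.0] * n_arms
    for _ in range(horizon):
        total = sum(weights)
        # mix the exponential-weights distribution with uniform exploration
        probs = [(1 - eps) * w / total + eps / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm)                 # bandit feedback on the outcome
        estimate = reward / probs[arm]          # importance-weighted reward
        weights[arm] *= math.exp(eta * estimate)
    return weights

# Toy run: arm 2 has the highest mean reward and should dominate.
random.seed(0)
w = eps_exp3(3, 2000, lambda a: random.random() * (0.9 if a == 2 else 0.4))
print(w)
```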
- We propose and evaluate a learning-based framework to address multi-agent resource allocation in coupled wireless systems. In particular, we consider multiple agents (e.g., base stations, access points) that choose among a set of resource allocation options to achieve their own performance objectives/requirements, where the performance observed at each agent is further coupled with the actions chosen by the other agents, e.g., through interference or channel leakage. The challenge is to find the best collective action. To that end, we propose a multi-armed bandit (MAB) framework wherein the best actions (aka arms) are adaptively learned through online reward feedback. Our focus is on systems that are "weakly coupled," wherein each agent's best arm is invariant to the others' arm selections the majority of the time; this majority structure enables the development of lightweight, efficient algorithms and is commonly found in many wireless settings such as channel selection and power control. We develop a bandit algorithm based on the Track-and-Stop strategy, which achieves logarithmic regret with respect to a genie. Finally, through simulation, we exhibit the potential use of our model and algorithm in several wireless application scenarios.
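Track-and-Stop itself involves tracking optimal sampling proportions and a generalized-likelihood-ratio stopping rule, which is too much machinery for a short sketch; the successive-elimination routine below is a simpler pure-exploration stand-in that, like Track-and-Stop, identifies an agent's best arm (e.g., best channel) with a target confidence. The confidence radius and toy reward means are illustrative.

```python
# Successive elimination as a stand-in for the paper's Track-and-Stop
# strategy: both are pure-exploration routines that identify an agent's
# best arm. Illustrative parameters only; rewards assumed in [0, 1].
import math
import random

def best_arm(sample, n_arms, delta=0.05, max_rounds=5000):
    """Identify the best arm with confidence about 1 - delta."""
    active = list(range(n_arms))
    counts = [0] * n_arms
    means = [0.0] * n_arms
    best = 0
    for r in range(1, max_rounds + 1):
        for a in active:                      # sample every surviving arm
            x = sample(a)
            counts[a] += 1
            means[a] += (x - means[a]) / counts[a]
        # confidence radius shrinks as each active arm accumulates r pulls
        radius = math.sqrt(math.log(4 * n_arms * r * r / delta) / (2 * r))
        best = max(active, key=lambda a: means[a])
        active = [a for a in active
                  if means[a] + radius >= means[best] - radius]
        if len(active) == 1:
            return active[0]
    return best

# Toy run: arm 1 has the highest mean and should be identified.
random.seed(1)
print(best_arm(lambda a: random.random() * [0.3, 0.8, 0.5][a], 3))
```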
- This study investigates decentralized dynamic resource allocation optimization for ad-hoc network communication with the support of reconfigurable intelligent surfaces (RIS), leveraging a reinforcement learning framework. In today's cellular networks, device-to-device (D2D) communication stands out as a promising technique to enhance spectrum efficiency. Simultaneously, RIS have gained considerable attention for their ability to enhance the quality of dynamic wireless networks by maximizing spectrum efficiency without increasing power consumption. However, prevalent centralized D2D transmission schemes require global information, leading to significant signaling overhead, while existing distributed schemes, though avoiding the need for global information, often demand frequent information exchange among D2D users and fall short of achieving global optimization. This paper introduces a framework comprising an outer loop and an inner loop. In the outer loop, decentralized dynamic resource allocation is optimized for self-organizing network communication aided by RIS through a multi-player multi-armed bandit approach that produces strategies for RIS and resource block selection; notably, these strategies operate without requiring signal interaction during execution. Meanwhile, in the inner loop, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is adopted for cooperative learning with neural networks (NNs) to obtain optimal transmit power control and RIS phase shift control for multiple users, given the RIS and resource block selection policy from the outer loop. Through the use of optimization theory, distributed optimal resource allocation is attained as the outer and inner reinforcement learning algorithms converge over time. Finally, a series of numerical simulations validates and illustrates the effectiveness of the proposed scheme.
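As a toy picture of the outer loop only (the TD3 inner loop is omitted entirely), the sketch below has each D2D pair run an independent ϵ-greedy bandit over resource blocks, with colliding pairs receiving zero reward. This is not the paper's multi-player MAB algorithm; all dimensions and channel qualities are made up.

```python
# Toy outer-loop sketch: each D2D pair runs an independent epsilon-greedy
# bandit over resource blocks; pairs that pick the same block collide and
# get zero reward. Stands in for the paper's multi-player MAB outer loop
# and omits the TD3 inner loop entirely. All numbers are made up.
import random

random.seed(2)
N_PAIRS, N_BLOCKS, EPS, ROUNDS = 3, 4, 0.1, 3000
quality = [[random.random() for _ in range(N_BLOCKS)] for _ in range(N_PAIRS)]
counts = [[0] * N_BLOCKS for _ in range(N_PAIRS)]
means = [[0.0] * N_BLOCKS for _ in range(N_PAIRS)]

for _ in range(ROUNDS):
    picks = []
    for p in range(N_PAIRS):
        if random.random() < EPS:
            picks.append(random.randrange(N_BLOCKS))   # explore
        else:
            picks.append(max(range(N_BLOCKS), key=lambda b: means[p][b]))
    for p, b in enumerate(picks):
        r = 0.0 if picks.count(b) > 1 else quality[p][b]  # collision -> 0
        counts[p][b] += 1
        means[p][b] += (r - means[p][b]) / counts[p][b]

# Each pair's learned block preference after the run
print([max(range(N_BLOCKS), key=lambda b: means[p][b]) for p in range(N_PAIRS)])
```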
- Many real-world problems, like Social Influence Maximization, face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit that chooses $K$ out of $N$ arms at each time, with the aim of achieving an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of a multi-armed bandit requires choosing among $N$-choose-$K$ options, making the state space large. In this paper, we present a novel algorithm that is computationally efficient and whose storage is linear in $N$. The proposed algorithm is a divide-and-conquer based strategy that we call CMAB-SM. Further, the proposed algorithm achieves a regret bound of $\tilde{O}(K^{1/2} N^{1/3} T^{2/3})$ for a time horizon $T$, which is sub-linear in all parameters $T$, $N$, and $K$.
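CMAB-SM's divide-and-conquer structure is not described in enough detail here to reproduce, so the sketch below shows a baseline combinatorial UCB instead: play the $K$ arms with the highest UCB indices under semi-bandit (per-arm) feedback. This is a strictly simpler setting than the paper's non-linear full-bandit feedback and is not the CMAB-SM algorithm; it is included only to make K-of-N arm selection concrete.

```python
# Baseline combinatorial UCB: play the K arms with the highest UCB
# indices under semi-bandit feedback (each played arm's reward observed).
# Not the CMAB-SM algorithm; a simpler setting shown for illustration.
import math
import random

random.seed(3)
N, K, T = 8, 3, 5000
true_means = [random.random() for _ in range(N)]   # hidden Bernoulli means
counts, means = [0] * N, [0.0] * N

for t in range(1, T + 1):
    untried = [a for a in range(N) if counts[a] == 0]
    if untried:
        chosen = untried[:K]   # warm-up: play every arm at least once
    else:
        ucb = [means[a] + math.sqrt(2 * math.log(t) / counts[a])
               for a in range(N)]
        chosen = sorted(range(N), key=lambda a: ucb[a], reverse=True)[:K]
    for a in chosen:           # semi-bandit: observe each played arm's reward
        x = random.random() < true_means[a]
        counts[a] += 1
        means[a] += (x - means[a]) / counts[a]

# Compare the final selection with the true top-K arms
print(sorted(chosen),
      sorted(range(N), key=lambda a: true_means[a], reverse=True)[:K])
```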