skip to main content


This content will become publicly available on March 11, 2025

Title: Matching Tasks and Workers under Known Arrival Distributions: Online Task Assignment with Two-sided Arrivals
In this paper, we consider a setting inspired by spatial crowdsourcing platforms, where both workers and tasks arrive at different times, and each worker-task assignment yields a given reward. The key challenge is to address the uncertainty in the stochastic arrivals from both workers and the tasks. In this work, we consider a ubiquitous scenario where the arrival patterns of worker “types” and task “types” are not erratic but can be predicted from historical data. Specifically, we consider a finite time horizon T and assume that in each time-step the arrival of a worker and a task can be seen as an independent sample from two (different) distributions. Our model, called "Online Task Assignment with Two-Sided Arrival" (OTA-TSA), is a significant generalization of the classical online task-assignment problem when all the tasks are statically available. For the general case of OTA-TSA, we present an optimal non-adaptive algorithm (NADAP), which achieves a competitive ratio (CR) of at least 0.295. For a special case of OTA-TSA when the reward depends only on the worker type, we present two adaptive algorithms, which achieve CRs of at least 0.343 and 0.355, respectively. On the hardness side, we show that (1) no non-adaptive can achieve a CR larger than that of NADAP, establishing the optimality of NADAP among all non-adaptive algorithms; and (2) no (adaptive) algorithm can achieve a CR better than 0.581 (unconditionally) or 0.423 (conditionally on the benchmark linear program), respectively. All aforementioned negative results apply to even unweighted OTA-TSA when every assignment yields a uniform reward. At the heart of our analysis is a new technical tool, called "two-stage birth-death process", which is a refined notion of the classical birth-death process. We believe it may be of independent interest. Finally, we perform extensive numerical experiments on a real-world ride-share dataset collected in Chicago and a synthetic dataset, and results demonstrate the effectiveness of our proposed algorithms in practice.  more » « less
Award ID(s):
1918749
NSF-PAR ID:
10497768
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Association for Computing Machinery (ACM)
Date Published:
Journal Name:
ACM Transactions on Economics and Computation
ISSN:
2167-8375
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Efficient allocation of tasks to workers is a central problem in crowdsourcing. In this paper, we consider a special setting inspired by spatial crowdsourcing platforms where both workers and tasks arrive dynamically. Additionally, we assume all tasks are heterogeneous and each worker-task assignment brings a known reward. The natural challenge lies in how to incorporate the uncertainty in the arrivals from both workers and tasks into our online allocation policy such that the total expected reward is maximized. To formulate this, we assume the arrival patterns of worker “types” and task “types” can be predicted from historical data. Specifically, we consider a finite time horizon T and assume that in each time-step, a single worker and task are sampled (i.e., “arrive”) from two respective distributions independently, and that this sampling process repeats identically and independently for the entire T online time-steps. Our model, called Online Task Assignment with Two-Sided Arrival (OTA-TSA), is a significant generalization of the classical online task assignment where the set of tasks is assumed to be available offline. For the general version of OTA-TSA, we present an optimal non-adaptive algorithm which achieves an online competitive ratio of 0.295. For the special case of OTA-TSA where the reward is a function of just the worker type, we present an improved algorithm (which is adaptive) and achieves a competitive ratio of at least 0.343. On the hardness side, along with showing that the ratio obtained by our non-adaptive algorithm is the best possible among all non-adaptive algorithms, we further show that no (adaptive) algorithm can achieve a ratio better than 0.581 (unconditionally), even for the special case of OTA-TSA with homogenous tasks (i.e., all rewards are the same). At the heart of our analysis lies a new technical tool (which is a refined notion of the birth-death process), called the two-stage birth-death process, which may be of independent interest. Finally, we perform numerical experiments on two real-world datasets obtained from crowdsourcing platforms to complement our theoretical results. 
    more » « less
  2. null (Ed.)
    Modern manufacturing processes are in a state of flux, as they adapt to increasing demand for flexible and self-configuring production. This poses challenges for training workers to rapidly master new machine operations and processes, i.e. machine tasks. Conventional in-person training is effective but requires time and effort of experts for each worker trained and not scalable. Recorded tutorials, such as video-based or augmented reality (AR), permit more efficient scaling. However, unlike in-person tutoring, existing recorded tutorials lack the ability to adapt to workers’ diverse experiences and learning behaviors. We present AdapTutAR, an adaptive task tutoring system that enables experts to record machine task tutorials via embodied demonstration and train learners with different AR tutoring contents adapting to each user’s characteristics. The adaptation is achieved by continually monitoring learners’ tutorial-following status and adjusting the tutoring content on-the-fly and in-situ. The results of our user study evaluation have demonstrated that our adaptive system is more effective and preferable than the non-adaptive one. 
    more » « less
  3. null (Ed.)

    The performance of clustering depends on an appropriately defined similarity between two items. When the similarity is measured based on human perception, human workers are often employed to estimate a similarity score between items in order to support clustering, leading to a procedure called crowdsourced clustering. Assuming a monetary reward is paid to a worker for each similarity score and assuming the similarities between pairs and workers' reliability have a large diversity, when the budget is limited, it is critical to wisely assign pairs of items to different workers to optimize the clustering result. We model this budget allocation problem as a Markov decision process where item pairs are dynamically assigned to workers based on the historical similarity scores they provided. We propose an optimistic knowledge gradient policy where the assignment of items in each stage is based on the minimum-weight K-cut defined on a similarity graph. We provide simulation studies and real data analysis to demonstrate the performance of the proposed method.

     
    more » « less
  4. null (Ed.)
    We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels. We propose two model-free policy optimization algorithms, POWER and POWER++, and establish guarantees for their dynamic regret. Compared with the classical notion of static regret, dynamic regret is a stronger notion as it explicitly accounts for the non-stationarity of environments. The dynamic regret attained by the proposed algorithms interpolates between different regimes of non-stationarity, and moreover satisfies a notion of adaptive (near-)optimality, in the sense that it matches the (near-)optimal static regret under slow-changing environments. The dynamic regret bound features two components, one arising from exploration, which deals with the uncertainty of transition kernels, and the other arising from adaptation, which deals with non-stationary environments. Specifically, we show that POWER++ improves over POWER on the second component of the dynamic regret by actively adapting to non-stationarity through prediction. To the best of our knowledge, our work is the first dynamic regret analysis of model-free RL algorithms in non-stationary environments. 
    more » « less
  5. Mobile edge computing (MEC) is an emerging paradigm that integrates computing resources in wireless access networks to process computational tasks in close proximity to mobile users with low latency. In this paper, we propose an online double deep Q networks (DDQN) based learning scheme for task assignment in dynamic MEC networks, which enables multiple distributed edge nodes and a cloud data center to jointly process user tasks to achieve optimal long-term quality of service (QoS). The proposed scheme captures a wide range of dynamic network parameters including non-stationary node computing capabilities, network delay statistics, and task arrivals. It learns the optimal task assignment policy with no assumption on the knowledge of the underlying dynamics. In addition, the proposed algorithm accounts for both performance and complexity, and addresses the state and action space explosion problem in conventional Q learning. The evaluation results show that the proposed DDQN-based task assignment scheme significantly improves the QoS performance, compared to the existing schemes that do not consider the effects of network dynamics on the expected long-term rewards, while scaling reasonably well as the network size increases. 
    more » « less