skip to main content


Title: Assigning Tasks to Workers based on Historical Data: Online Task Assignment with Two-sided Arrivals
Efficient allocation of tasks to workers is a central problem in crowdsourcing. In this paper, we consider a special setting inspired by spatial crowdsourcing platforms where both workers and tasks arrive dynamically. Additionally, we assume all tasks are heterogeneous and each worker-task assignment brings a known reward. The natural challenge lies in how to incorporate the uncertainty in the arrivals from both workers and tasks into our online allocation policy such that the total expected reward is maximized. To formulate this, we assume the arrival patterns of worker “types” and task “types” can be predicted from historical data. Specifically, we consider a finite time horizon T and assume that in each time-step, a single worker and task are sampled (i.e., “arrive”) from two respective distributions independently, and that this sampling process repeats identically and independently for the entire T online time-steps. Our model, called Online Task Assignment with Two-Sided Arrival (OTA-TSA), is a significant generalization of the classical online task assignment where the set of tasks is assumed to be available offline. For the general version of OTA-TSA, we present an optimal non-adaptive algorithm which achieves an online competitive ratio of 0.295. For the special case of OTA-TSA where the reward is a function of just the worker type, we present an improved algorithm (which is adaptive) and achieves a competitive ratio of at least 0.343. On the hardness side, along with showing that the ratio obtained by our non-adaptive algorithm is the best possible among all non-adaptive algorithms, we further show that no (adaptive) algorithm can achieve a ratio better than 0.581 (unconditionally), even for the special case of OTA-TSA with homogenous tasks (i.e., all rewards are the same). At the heart of our analysis lies a new technical tool (which is a refined notion of the birth-death process), called the two-stage birth-death process, which may be of independent interest. Finally, we perform numerical experiments on two real-world datasets obtained from crowdsourcing platforms to complement our theoretical results.  more » « less
Award ID(s):
1749864
NSF-PAR ID:
10065495
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
Page Range / eLocation ID:
318-326
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we consider a setting inspired by spatial crowdsourcing platforms, where both workers and tasks arrive at different times, and each worker-task assignment yields a given reward. The key challenge is to address the uncertainty in the stochastic arrivals from both workers and the tasks. In this work, we consider a ubiquitous scenario where the arrival patterns of worker “types” and task “types” are not erratic but can be predicted from historical data. Specifically, we consider a finite time horizon T and assume that in each time-step the arrival of a worker and a task can be seen as an independent sample from two (different) distributions. Our model, called "Online Task Assignment with Two-Sided Arrival" (OTA-TSA), is a significant generalization of the classical online task-assignment problem when all the tasks are statically available. For the general case of OTA-TSA, we present an optimal non-adaptive algorithm (NADAP), which achieves a competitive ratio (CR) of at least 0.295. For a special case of OTA-TSA when the reward depends only on the worker type, we present two adaptive algorithms, which achieve CRs of at least 0.343 and 0.355, respectively. On the hardness side, we show that (1) no non-adaptive can achieve a CR larger than that of NADAP, establishing the optimality of NADAP among all non-adaptive algorithms; and (2) no (adaptive) algorithm can achieve a CR better than 0.581 (unconditionally) or 0.423 (conditionally on the benchmark linear program), respectively. All aforementioned negative results apply to even unweighted OTA-TSA when every assignment yields a uniform reward. At the heart of our analysis is a new technical tool, called "two-stage birth-death process", which is a refined notion of the classical birth-death process. We believe it may be of independent interest. Finally, we perform extensive numerical experiments on a real-world ride-share dataset collected in Chicago and a synthetic dataset, and results demonstrate the effectiveness of our proposed algorithms in practice. 
    more » « less
  2. We study the problem of distributed task allocation by workers in an ant colony in a setting of limited capabilities and noisy environment feedback. We assume that each task has a demand that should be satisfied but not exceeded, i.e., there is an optimal number of ants that should be working on this task at a given time. The goal is to assign a near-optimal number of workers to each task in a distributed manner without explicit access to the value of the demand nor to the number of ants working on the task. We seek to answer the question of how the quality of task allocation depends on the accuracy of assessing by the ants whether too many (overload) or not enough (lack) ants are currently working on a given task. In our model, each ant receives a binary feedback that depends on the deficit, defined as the difference between the demand and the current number of workers in the task. The feedback is modeled as a random variable that takes values lack or overload with probability given by a sigmoid function of the deficit. The higher the overload or lack of workers for a task, the more likely it is that an ant receives the correct feedback from this task; the closer the deficit is to zero, the less reliable the feedback becomes. Each ant receives the feedback independently about one chosen task. We measure the performance of task allocation algorithms using the notion of inaccuracy, defined as the number of steps in which the deficit of some task is beyond certain threshold. We propose a simple, constant-memory, self-stabilizing, distributed algorithm that converges from any initial assignment to a near-optimal assignment under noisy feedback and keeps the deficit small for all tasks in almost every step. We also prove a lower bound for any constant-memory algorithm, which matches, up to a constant factor, the accuracy achieved by our algorithm. 
    more » « less
  3. In this work, we study the online multidimensional knapsack problem (called OMdKP) in which there is a knapsack whose capacity is represented in m dimensions, each dimension could have a different capacity. Then, n items with different scalar profit values and m-dimensional weights arrive in an online manner and the goal is to admit or decline items upon their arrival such that the total profit obtained by admitted items is maximized and the capacity of knapsack across all dimensions is respected. This is a natural generalization of the classic single-dimension knapsack problem with several relevant applications such as in virtual machine allocation, job scheduling, and all-or-nothing flow maximization over a graph. We develop an online algorithm for OMdKP that uses an exponential reservation function to make online admission decisions. Our competitive analysis shows that the proposed online algorithm achieves the competitive ratio of O(log (Θ α)), where α is the ratio between the aggregate knapsack capacity and the minimum capacity over a single dimension and θ is the ratio between the maximum and minimum item unit values. We also show that the competitive ratio of our algorithm with exponential reservation function matches the lower bound up to a constant factor. 
    more » « less
  4. In this paper, we study the online multidimensional knapsack problem (called OMdKP) in which there is a knapsack whose capacity is represented in m dimensions, each dimension could have a different capacity. Then, n items with different scalar profit values and m-dimensional weights arrive in an online manner and the goal is to admit or decline items upon their arrival such that the total profit obtained by admitted items is maximized and the capacity of knapsack across all dimensions is respected. This is a natural generalization of the classic single-dimension knapsack problem and finds several relevant applications such as in virtual machine allocation, job scheduling, and all-or-nothing flow maximization over a graph. We develop two algorithms for OMdKP that use linear and exponential reservation functions to make online admission decisions. Our competitive analysis shows that the linear and exponential algorithms achieve the competitive ratios of O(θα ) and O(łogł(θα)), respectively, where α is the ratio between the aggregate knapsack capacity and the minimum capacity over a single dimension and θ is the ratio between the maximum and minimum item unit values. We also characterize a lower bound for the competitive ratio of any online algorithm solving OMdKP and show that the competitive ratio of our algorithm with exponential reservation function matches the lower bound up to a constant factor. 
    more » « less
  5. Abstract

    Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.

     
    more » « less