skip to main content


Search for: All records

Creators/Authors contains: "Xu, Yifan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. In this paper, we consider a setting inspired by spatial crowdsourcing platforms, where both workers and tasks arrive at different times, and each worker-task assignment yields a given reward. The key challenge is to address the uncertainty in the stochastic arrivals from both workers and the tasks. In this work, we consider a ubiquitous scenario where the arrival patterns of worker “types” and task “types” are not erratic but can be predicted from historical data. Specifically, we consider a finite time horizon T and assume that in each time-step the arrival of a worker and a task can be seen as an independent sample from two (different) distributions. Our model, called "Online Task Assignment with Two-Sided Arrival" (OTA-TSA), is a significant generalization of the classical online task-assignment problem when all the tasks are statically available. For the general case of OTA-TSA, we present an optimal non-adaptive algorithm (NADAP), which achieves a competitive ratio (CR) of at least 0.295. For a special case of OTA-TSA when the reward depends only on the worker type, we present two adaptive algorithms, which achieve CRs of at least 0.343 and 0.355, respectively. On the hardness side, we show that (1) no non-adaptive can achieve a CR larger than that of NADAP, establishing the optimality of NADAP among all non-adaptive algorithms; and (2) no (adaptive) algorithm can achieve a CR better than 0.581 (unconditionally) or 0.423 (conditionally on the benchmark linear program), respectively. All aforementioned negative results apply to even unweighted OTA-TSA when every assignment yields a uniform reward. At the heart of our analysis is a new technical tool, called "two-stage birth-death process", which is a refined notion of the classical birth-death process. We believe it may be of independent interest. Finally, we perform extensive numerical experiments on a real-world ride-share dataset collected in Chicago and a synthetic dataset, and results demonstrate the effectiveness of our proposed algorithms in practice. 
    more » « less
    Free, publicly-accessible full text available March 11, 2025
  2. Product catalogs, conceptually in the form of text-rich tables, are self-reported by individual retailers and thus inevitably contain noisy facts. Verifying such textual attributes in product catalogs is essential to improve their reliability. However, popular methods for processing free-text content, such as pre-trained language models, are not particularly effective on structured tabular data since they are typically trained on free-form natural language texts. In this paper, we present Tab-Cleaner, a model designed to handle error detection over text-rich tabular data following a pre-training / fine-tuning paradigm. We train Tab-Cleaner on a real-world Amazon Product Catalog table w.r.t millions of products and show improvements over state-of-the-art methods by 16% on PR AUC over attribute applicability classification task and by 11% on PR AUC over attribute value validation task. 
    more » « less
    Free, publicly-accessible full text available July 1, 2024
  3. Online matching markets (OMMs) are commonly used in today’s world to pair agents from two parties (whom we will call offline and online agents) for mutual benefit. However, studies have shown that the algorithms making decisions in these OMMs often leave disparities in matching rates, especially for offline agents. In this article, we propose online matching algorithms that optimize for either individual or group-level fairness among offline agents in OMMs. We present two linear-programming (LP) based sampling algorithms, which achieve competitive ratios at least 0.725 for individual fairness maximization and 0.719 for group fairness maximization. We derive further bounds based on fairness parameters, demonstrating conditions under which the competitive ratio can increase to 100%. There are two key ideas helping us break the barrier of 1-1/𝖾~ 63.2% for competitive ratio in online matching. One is boosting , which is to adaptively re-distribute all sampling probabilities among only the available neighbors for every arriving online agent. The other is attenuation , which aims to balance the matching probabilities among offline agents with different mass allocated by the benchmark LP. We conduct extensive numerical experiments and results show that our boosted version of sampling algorithms are not only conceptually easy to implement but also highly effective in practical instances of OMMs where fairness is a concern. 
    more » « less
  4. We consider online resource allocation under a typical non-profit setting, where limited or even scarce resources are administered by a not-for-profit organization like a government. We focus on the internal-equity by assuming that arriving requesters are homogeneous in terms of their external factors like demands but heterogeneous for their internal attributes like demographics. Specifically, we associate each arriving requester with one or several groups based on their demographics (i.e., race, gender, and age), and we aim to design an equitable distributing strategy such that every group of requesters can receive a fair share of resources proportional to a preset target ratio. We present two LP-based sampling algorithms and investigate them both theoretically (in terms of competitive-ratio analysis) and experimentally based on real COVID-19 vaccination data maintained by the Minnesota Department of Health. Both theoretical and numerical results show that our LP-based sampling strategies can effectively promote equity, especially when the arrival population is disproportionately represented, as observed in the early stage of the COVID-19 vaccine rollout. 
    more » « less