This content will become publicly available on June 17, 2026

Title: A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces
Wildlife trafficking remains a critical global issue, significantly impacting biodiversity, ecological stability, and public health. Despite efforts to combat this illicit trade, the rise of e-commerce platforms has made it easier to sell wildlife products, putting new pressure on wild populations of endangered and threatened species. The use of these platforms also opens a new opportunity: as criminals sell wildlife products online, they leave digital traces of their activity that can provide insights into trafficking activities as well as how they can be disrupted. The challenge lies in finding these traces. Online marketplaces publish ads for a plethora of products, and identifying ads for wildlife-related products is like finding a needle in a haystack. Machine learning classifiers can automate ad identification, but creating them requires costly, time-consuming data labeling that hinders support for diverse ad types and research questions. This paper addresses a critical challenge in the data science pipeline for wildlife trafficking analytics: generating quality labeled data for classifiers that select relevant data. While large language models (LLMs) can directly label advertisements, doing so at scale is prohibitively expensive. We propose a cost-effective strategy that leverages LLMs to generate pseudo labels for a small sample of the data and uses these labels to create specialized classification models. Our novel method automatically gathers diverse and representative samples to be labeled while minimizing labeling costs. Our experimental evaluation shows that our classifiers achieve up to a 95% F1 score, outperforming LLMs at a lower cost. We present real use cases that demonstrate the effectiveness of our approach in enabling analyses of different aspects of wildlife trafficking.
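A minimal sketch of the strategy the abstract describes, under stated assumptions: llm_label is a hypothetical stand-in for a commercial LLM API call, and the cluster-based sampling below is only one plausible way to gather a diverse sample to label, not necessarily the paper's selection method.

# Sketch: LLM pseudo-labels for a small diverse sample, then a cheap
# specialized classifier labels the full ad corpus.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def llm_label(ad_text: str) -> int:
    """Hypothetical LLM call returning 1 (wildlife product) or 0 (other)."""
    raise NotImplementedError

def train_pseudo_labeled_classifier(ads, sample_size=500, n_clusters=50):
    vec = TfidfVectorizer(max_features=20000)
    X = vec.fit_transform(ads)

    # Diversity-driven sampling (an assumption of this sketch): cluster the
    # corpus and draw the to-be-labeled sample across clusters, so the
    # expensive LLM calls cover many kinds of ads instead of near-duplicates.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    per_cluster = max(1, sample_size // n_clusters)
    sample_idx = []
    for c in range(n_clusters):
        members = [i for i, lbl in enumerate(km.labels_) if lbl == c]
        sample_idx.extend(members[:per_cluster])

    # Label only the small sample with the LLM, then fit a specialized model
    # that labels the remaining ads at negligible cost.
    y_pseudo = [llm_label(ads[i]) for i in sample_idx]
    clf = LogisticRegression(max_iter=1000).fit(X[sample_idx], y_pseudo)
    return vec, clf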
Award ID(s): 2146306, 2106888
PAR ID: 10653924
Author(s) / Creator(s): ; ; ; ; ; ;
Publisher / Repository: ACM
Date Published:
Journal Name: Proceedings of the ACM on Management of Data
Volume: 3
Issue: 3
ISSN: 2836-6573
Page Range / eLocation ID: 1 to 23
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: We have more data about wildlife trafficking than ever before, but it remains underutilized for decision-making. Central to effective wildlife trafficking interventions is the collection, aggregation, and analysis of data across a range of source, transit, and destination geographies. Many of these data are geospatial, but they cannot be effectively accessed or aggregated without appropriate geospatial data standards. Our goal was to create geospatial data standards to help advance efforts to combat wildlife trafficking. We achieved our goal through voluntary, participatory, engagement-based workshops with diverse, multisectoral stakeholders, online portals, and electronic communication with more than 100 participants on three continents. The standards support data-to-decision efforts in the field, for example, indictments of key figures within wildlife trafficking and disruption of their networks. Geospatial data standards help enable broader utilization of wildlife trafficking data across disciplines and sectors, accelerate aggregation and analysis of data across space and time, advance evidence-based decision making, and reduce wildlife trafficking.
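Purely to illustrate what a shared geospatial standard can buy, here is a hypothetical standardized seizure record expressed as GeoJSON from Python; the field names are invented for this sketch and are not taken from the standards the abstract describes.

# Hypothetical record under a shared geospatial data standard for
# wildlife trafficking events. GeoJSON uses [longitude, latitude].
import json

seizure_record = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [36.8219, -1.2921]},
    "properties": {
        "event_type": "seizure",
        "species": "Loxodonta africana",  # standardized taxon name
        "date": "2023-04-12",             # ISO 8601 date
        "role": "transit",                # source / transit / destination
    },
}
print(json.dumps(seizure_record, indent=2))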
  2. Classification of clinical alarms is at the heart of prioritization, suppression, integration, postponement, and other methods of mitigating alarm fatigue. Since these methods directly affect clinical care, alarm classifiers, such as intelligent suppression systems, need to be evaluated in terms of their sensitivity and specificity, which is typically calculated on a labeled dataset of alarms. Unfortunately, the collection and particularly labeling of such datasets requires substantial effort and time, thus deterring hospitals from investigating mitigations of alarm fatigue. This article develops a lightweight method for evaluating alarm classifiers without perfect alarm labels. The method relies on probabilistic labels obtained from data programming—a labeling paradigm based on combining noisy and cheap-to-obtain labeling heuristics. Based on these labels, the method produces confidence bounds for the sensitivity/specificity values from a hypothetical evaluation with manual labeling. Our experiments on five alarm datasets collected at Children’s Hospital of Philadelphia show that the proposed method provides accurate bounds on the classifier’s sensitivity/specificity, appropriately reflecting the uncertainty from noisy labeling and limited sample sizes. 
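As a rough illustration of evaluating a classifier against probabilistic labels (a simplified stand-in, not the article's bounding method), the sketch below computes expected sensitivity/specificity and bootstrap interval estimates; p holds the data-programming label probabilities and yhat the classifier's 0/1 decisions.

# p[i]: estimated probability that alarm i is a true (actionable) alarm;
# yhat[i]: the classifier's prediction for alarm i.
import numpy as np

def expected_sens_spec(p, yhat):
    p = np.asarray(p, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    sens = np.sum(p * yhat) / np.sum(p)                   # E[TP] / E[positives]
    spec = np.sum((1 - p) * (1 - yhat)) / np.sum(1 - p)   # E[TN] / E[negatives]
    return sens, spec

def bootstrap_bounds(p, yhat, n_boot=2000, alpha=0.05, seed=0):
    p = np.asarray(p, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(p)
    # Resample alarms with replacement to reflect sampling uncertainty.
    sens, spec = zip(*(expected_sens_spec(p[idx], yhat[idx])
                       for idx in (rng.integers(0, n, n) for _ in range(n_boot))))
    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    return np.percentile(sens, [lo, hi]), np.percentile(spec, [lo, hi])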
  3. Targeted advertising remains an important part of the free web browsing experience, where advertisers' targeting and personalization algorithms together find the most relevant audience for millions of ads every day. However, given the wide use of advertising, this also enables using ads as a vehicle for problematic content, such as scams or clickbait. Recent work that explores people's sentiments toward online ads, and the impacts of these ads on people's online experiences, has found evidence that online ads can indeed be problematic. Further, there is the potential for personalization to aid the delivery of such ads, even when the advertiser targets with low specificity. In this paper, we study Facebook--one of the internet's largest ad platforms--and investigate key gaps in our understanding of problematic online advertising: (a) What categories of ads do people find problematic? (b) Are there disparities in the distribution of problematic ads to viewers? And if so, (c) who is responsible--advertisers or advertising platforms? To answer these questions, we empirically measure a diverse sample of user experiences with Facebook ads via a 3-month longitudinal panel. We categorize over 32,000 ads collected from this panel (n = 132), and survey participants' sentiments toward their own ads to identify four categories of problematic ads. Statistically modeling the distribution of problematic ads across demographics, we find that older people and minority groups are especially likely to be shown such ads. Further, given that 22% of problematic ads had no specific targeting from advertisers, we infer that ad delivery algorithms (advertising platforms themselves) played a significant role in the biased distribution of these ads.
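For readers unfamiliar with this kind of analysis, the toy sketch below shows the general shape of such a demographic model: a logistic regression of problematic-ad exposure on demographic covariates. The data are invented and the specification is illustrative; it is not the authors' model or their panel data.

# One row per (participant, ad) impression; covariates are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

ads = pd.DataFrame({
    "problematic": [1, 0, 1, 0, 1, 0, 1, 0, 0],
    "age":         [67, 67, 55, 55, 70, 24, 31, 24, 40],
    "minority":    [1, 1, 1, 1, 1, 0, 0, 0, 0],
})
# Positive coefficients would indicate higher odds of seeing problematic ads.
model = smf.logit("problematic ~ age + minority", data=ads).fit(disp=0)
print(model.params)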
  4. Online sex advertisements (sex ads) have been linked to many U.S. sex trafficking cases. However, since the closure of the dominant website, Backpage.com (Backpage), many competing sites have emerged that are hosted in countries where U.S. law enforcement organizations have no jurisdiction. Although the online ecosystem has changed significantly, very little research uses data from sites other than Backpage, and even less uses data from multiple sites. This paper presents an anonymized dataset derived from the text and image artifacts of more than 10 million sex ads. By making this dataset publicly available, we aim to reduce barriers to entry for researchers interested in conducting data-driven counter-trafficking research. The dataset can be used to test hypotheses related to sex ads and intersite connectivity, understand the posting processes employed by prominent sites in the current online sex ad ecosystem, and develop multidisciplinary approaches for estimating ad legitimacy. Progress in any of these areas can result in potentially lifesaving interventions for sex trafficking victims.
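One example of an analysis such a dataset could support: measuring intersite connectivity by linking ads that share an artifact, such as an image hash or contact identifier. The record layout below is hypothetical, not the dataset's actual schema.

# Count site pairs connected by at least one shared artifact.
from collections import defaultdict
from itertools import combinations

def intersite_edges(ads):
    """ads: iterable of dicts like {"site": str, "artifact_ids": [str, ...]}.
    Returns {(site_a, site_b): number of shared artifacts}."""
    sites_by_artifact = defaultdict(set)
    for ad in ads:
        for art in ad["artifact_ids"]:
            sites_by_artifact[art].add(ad["site"])
    edges = defaultdict(int)
    for sites in sites_by_artifact.values():
        for a, b in combinations(sorted(sites), 2):
            edges[(a, b)] += 1
    return dict(edges)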
  5. With the prevalence of machine learning in many high-stakes decision-making processes, e.g., hiring and admission, it is important to take fairness into account when practitioners design and deploy machine learning models, especially in scenarios with imperfectly labeled data. Multiple-Instance Learning (MIL) is a weakly supervised approach where instances are grouped in labeled bags, each containing several instances sharing the same label. However, current fairness-centric methods in machine learning often fall short when applied to MIL due to their reliance on instance-level labels. In this work, we introduce a Fair Multiple-Instance Learning (FMIL) framework to ensure fairness in weakly supervised learning. In particular, our method bridges the gap between bag-level and instance-level labeling by leveraging the bag labels, inferring high-confidence instance labels to improve both accuracy and fairness in MIL classifiers. Comprehensive experiments underscore that our FMIL framework substantially reduces biases in MIL without compromising accuracy. 
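A simplified sketch of the bag-to-instance label inference idea (with the fairness component omitted): instances in negative bags are taken as confident negatives, and in each positive bag the model's top-scoring instance is kept as a confident positive. This illustrates the general MIL heuristic, not the FMIL method itself.

# bags: list of (n_i, d) feature arrays; bag_labels: 0/1 per bag.
# Assumes both positive and negative bags are present.
import numpy as np
from sklearn.linear_model import LogisticRegression

def infer_instance_labels(bags, bag_labels, n_rounds=3):
    X = np.vstack(bags)
    # Seed round: copy each bag's label to all of its instances.
    y = np.concatenate([np.full(len(b), lbl) for b, lbl in zip(bags, bag_labels)])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    for _ in range(n_rounds):
        parts = []
        for b, lbl in zip(bags, bag_labels):
            inst = np.zeros(len(b), dtype=int)  # negative bag -> all negative
            if lbl == 1:
                # Keep the highest-scoring instance as a confident positive.
                inst[np.argmax(clf.predict_proba(b)[:, 1])] = 1
            parts.append(inst)
        y = np.concatenate(parts)
        clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf, y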