NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

AutoDDG: Automated Dataset Description Generation using Large Language Models

Zhang, Haoxiang; Liu, Yurong; Santos, Aecio; Hung, Wei-Lun; Freire, Juliana (December 2025, https://arxiv.org/pdf/2502.01050)

Free, publicly-accessible full text available December 18, 2026
HILTS: Human-LLM collaboration for effective data labeling

https://doi.org/10.1016/j.is.2025.102660

Barbosa, Juliana; Alencar, Eduarda; Fan, Grace; Santos, Aécio; Freire, Juliana (December 2025, Information Systems)

Free, publicly-accessible full text available December 1, 2026
Hierarchical Table Semantics for Exploratory Table Discovery

https://doi.org/10.1145/3736733.3736746

Fan, Grace; Freire, Juliana (July 2025, ACM)

Free, publicly-accessible full text available July 8, 2026
Descriptive Analysis of Online Wildlife Products Using Vision Language Models

https://doi.org/10.1145/3715335.3735484

Sharma, Kinshuk; Barbosa, Juliana Silva; Roberts, Spencer; Gondhali, Ulhas; Petrossian, Gohar; Jacquet, Jennifer; Freire, Juliana; Chakraborty, Sunandan (July 2025, Proceedings of the ACM SIGCAS/SIGCHI Conference on Computing and Sustainable Societies)

Free, publicly-accessible full text available July 21, 2026
A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces

Barbosa, Juliana; Gondhali, Ulhas; Petrossian, Gohar; Sharma, Kinshuk; Chakraborty, Sunandan; Jacquet, Jennifer; Freire, Juliana (June 2025, Proceedings of the ACM on Management of Data, Volume 3, Issue 3)

Free, publicly-accessible full text available June 18, 2026
Matrix Product Sketching via Coordinated Sampling

Daliri, Majid; Freire, Juliana; Li, Danrong; Musco, Christopher (April 2025, International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available April 24, 2026
Prevalence of endangered shark trophies in automated detection of the online wildlife trade

https://doi.org/10.1016/j.biocon.2025.110992

Chakraborty, Sunandan; Roberts, Spencer N; Petrossian, Gohar A; Sosnowski, Monique; Freire, Juliana; Jacquet, Jennifer (April 2025, Biological Conservation)

Direct exploitation, which includes the trade of wild animals for their parts, is a major driver of extinction. Digital communication tools, particularly the internet, have facilitated the trade in endangered species. Here, we automatically collected data to analyze online sales of threatened animals across 148 English-text online mar- ketplaces. We created a tool that searched for online sales of 13,267 animal species at risk of global extinction, as classified by the International Union for Conservation of Nature (IUCN), as well as 706 animal species on Ap- pendix I of the Convention for International Trade in Endangered Species (CITES), for which international commercial trade is prohibited. Examining a period of 15 weeks in 2018, we identified 10,699 unique listings selling body parts or eggs of threatened species, of which 4131 contained a full species name (common or sci- entific). These 4131 results were then filtered by keywords and, finally, manually vetted, which yielded 546 sale listings for 83 species. Of these 546 listings, 61 % advertised shark trophies (mainly jaws), 73 % of which were taken from species listed as endangered or critically endangered. Just four websites hosted >95 % of listings. We identified 18 species for sale that are included on CITES Appendix I. We also identified 13 species for which the IUCN had not identified intentional use as a threat. This work expands current understanding about the dealing of endangered and potentially illegal species online, specifies taxa threatened by online trade, and highlights emerging opportunities and persistent challenges to preventing the trafficking of threatened species.
more » « less
Free, publicly-accessible full text available April 1, 2026
A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces

https://doi.org/10.1145/3725256

Barbosa, Juliana Silva; Gondhali, Ulhas; Petrossian, Gohar; Sharma, Kinshuk; Chakraborty, Sunandan; Jacquet, Jennifer; Freire, Juliana (June 2025, Proceedings of the ACM on Management of Data)

Wildlife trafficking remains a critical global issue, significantly impacting biodiversity, ecological stability, and public health. Despite efforts to combat this illicit trade, the rise of e-commerce platforms has made it easier to sell wildlife products, putting new pressure on wild populations of endangered and threatened species. The use of these platforms also opens a new opportunity: as criminals sell wildlife products online, they leave digital traces of their activity that can provide insights into trafficking activities as well as how they can be disrupted. The challenge lies in finding these traces. Online marketplaces publish ads for a plethora of products, and identifying ads for wildlife-related products is like finding a needle in a haystack. Learning classifiers can automate ad identification, but creating them requires costly, time-consuming data labeling that hinders support for diverse ads and research questions. This paper addresses a critical challenge in the data science pipeline for wildlife trafficking analytics: generating quality labeled data for classifiers that select relevant data. While large language models (LLMs) can directly label advertisements, doing so at scale is prohibitively expensive. We propose a cost-effective strategy that leverages LLMs to generate pseudo labels for a small sample of the data and uses these labels to create specialized classification models. Our novel method automatically gathers diverse and representative samples to be labeled while minimizing the labeling costs. Our experimental evaluation shows that our classifiers achieve up to 95% F1 score, outperforming LLMs at a lower cost. We present real use cases that demonstrate the effectiveness of our approach in enabling analyses of different aspects of wildlife trafficking.
more » « less
Free, publicly-accessible full text available June 17, 2026
Large Language Models for Data Discovery and Integration: Challenges and Opportunities

Freire, Juliana; Fan, Grace; Feuer, Benjamin; Koutras, Christos; Liu, Yurong; Pena, Eduardo; Santos, Aécio; Silva, Cláudio T; Wu, Eden (April 2025, IEEE Data Engineering Bulletin)

Free, publicly-accessible full text available April 3, 2026
Efficiently Estimating Mutual Information Between Attributes Across Tables

https://doi.org/10.1109/ICDE60146.2024.00022

Santos, Aécio; Korn, Flip; Freire, Juliana (May 2024, IEEE)

Full Text Available

« Prev Next »

Search for: All records