Chameleon: Foundation Models for Fairness-Aware Multi-Modal Data Augmentation to Enhance Coverage of Minorities

Erfanian, Mahdi; Jagadish, H V; Asudeh, Abolfazl

doi:10.14778/3681954.3682014

The potential harm from the under-representation of minorities in data, particularly in multi-modal settings, is a well-recognized concern. While there has been extensive effort to detect such under-representation, resolving it has remained a challenge. With recent generative AI advancements, large language and foundation models have emerged as versatile tools across various domains. In this paper, we propose Chameleon, a system that efficiently utilizes these tools to augment a dataset with a minimal number of synthetically generated tuples, enhancing the coverage of under-represented groups. Our system applies quality and outlier-detection tests to ensure the quality and semantic integrity of the generated tuples. To minimize the chance that generated tuples are rejected, we propose multiple strategies for guiding the foundation model. Our experimental results, in addition to confirming the efficiency of our proposed algorithms, illustrate the effectiveness of our approach: the model's unfairness in a downstream task dropped significantly after data repair using Chameleon.
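
As a rough illustration of the coverage idea in the abstract, the sketch below counts how many tuples each group would need before it reaches a minimum count. It is not the paper's algorithm: the function name `coverage_deficits`, the `threshold` parameter, and the single-attribute notion of a "group" are all assumptions made for this example.

```python
from collections import Counter


def coverage_deficits(groups, threshold, all_groups=None):
    """Return, per group, how many tuples are missing to reach `threshold`.

    Hypothetical helper for illustration only; the abstract does not
    specify how coverage is measured or how additions are chosen.
    """
    counts = Counter(groups)
    # Include groups that are expected but entirely absent from the data.
    names = set(counts) | set(all_groups or [])
    return {
        name: threshold - counts[name]
        for name in sorted(names)
        if counts[name] < threshold
    }


# Toy group labels for each tuple in a dataset (made-up data).
labels = ["a"] * 5 + ["b"] * 2 + ["c"]
deficits = coverage_deficits(labels, threshold=3, all_groups=["a", "b", "c", "d"])
print(deficits)
```

In this simplified reading, the sum of the deficits is the smallest number of synthetic tuples that would bring every group up to the threshold. Each generated tuple would still have to pass the quality and outlier-detection tests the abstract mentions, which this sketch does not model.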

Award ID(s): 2107290, 2348919, 2312931, 2106176

PAR ID: 10554109

Author(s) / Creator(s): Erfanian, Mahdi; Jagadish, H V; Asudeh, Abolfazl

Publisher / Repository: ACM

Date Published: 2024-07-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 17

Issue: 11

ISSN: 2150-8097

Page Range / eLocation ID: 3470 to 3483

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.14778/3681954.3682014