Finding Support for Tabular LLM Outputs

Fan, Grace; Shraga, Roee; Miller, Renée J

Citation Details

With the emerging advancements of AI, validating data generated by AI models becomes a key challenge. In this work, we tackle the problem of validating tabular data generated by large language models (LLMs). By leveraging a recently proposed technique called Gen-T, we present a technique to verify if the data in the LLM table can be reclaimed (reproduced) using tables available in a given data lake (for example, tables used to train the LLM). Specically, we measure the number of data lake tables that support tuples (or partial tuples) in a generated table. We further provide suggestions for value replacements if a generated value cannot be reclaimed. Using this approach, users can evaluate their LLM-generated tables, consider potential modications for table values, and gauge how much support the modied table has from the data lake. We discuss two case studies showing that table values annotated with reclama- tion support scores, along with possible value replacements, can help users assess the trustworthiness of LLM-generated tables. more »

Award ID(s):: 2325632 2107248 1956096

PAR ID:: 10539018

Author(s) / Creator(s):: Fan, Grace; Shraga, Roee; Miller, Renée J

Publisher / Repository:: PVLDB Workshop on Tabular Data Analysis (TaDA)

Date Published:: 2024-08-28

Subject(s) / Keyword(s):: Data Management

Format(s):: Medium: X

Location:: China

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this