Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language Models

Trummer, Immanuel

doi:10.14778/3681954.3682017

Citation Details

Using large language models (LLMs) for tasks like text-to-SQL translation often requires describing the database schema as part of the model input. LLM providers typically charge as a function of the number of tokens read. Hence, reducing the length of the schema description saves money at each model invocation. This paper introduces Schemonic, a system that automatically finds concise text descriptions of relational database schemata. By introducing abbreviations or grouping schema elements with similar properties, Schemonic typically finds descriptions that use significantly fewer tokens than naive schema representations. Internally, Schemonic models schema compression as a combinatorial optimization problem and uses integer linear programming solvers to find guaranteed optimal or near-optimal solutions. It speeds up optimization by starting from heuristic solutions and by reducing the search space size via pre-processing. Experiments on TPC-H, SPIDER, and Public-BI demonstrate that Schemonic significantly reduces schema description length, along with the fees for reading these descriptions, without reducing accuracy in tasks such as text-to-SQL translation.
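The grouping idea mentioned in the abstract can be illustrated with a minimal sketch. This is not Schemonic's algorithm: the paper formulates compression as an integer linear program, whereas the code below only applies one fixed grouping rule (state each table name and each column type once). The toy schema, the output syntax, the function names, and the use of character count as a stand-in for token count are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy schema: table -> list of (column, type) pairs.
# Column names follow TPC-H naming; the types are simplified.
SCHEMA = {
    "orders": [("o_orderkey", "int"), ("o_custkey", "int"),
               ("o_comment", "text"), ("o_clerk", "text")],
    "customer": [("c_custkey", "int"), ("c_name", "text"),
                 ("c_comment", "text")],
}


def naive_description(schema):
    """Emit one 'table.column type' entry per column."""
    parts = []
    for table, cols in schema.items():
        for name, typ in cols:
            parts.append(f"{table}.{name} {typ}")
    return "; ".join(parts)


def grouped_description(schema):
    """State each table name once and each type once per group of columns."""
    parts = []
    for table, cols in schema.items():
        by_type = defaultdict(list)
        for name, typ in cols:
            by_type[typ].append(name)
        groups = [f"{typ}({','.join(names)})" for typ, names in by_type.items()]
        parts.append(f"{table}: {' '.join(groups)}")
    return "; ".join(parts)


def rough_length(text):
    """Crude proxy for a token count (assumption): number of characters."""
    return len(text)


if __name__ == "__main__":
    for describe in (naive_description, grouped_description):
        text = describe(SCHEMA)
        print(rough_length(text), text)
```

On this toy input the grouped form is shorter because the table name and the type are no longer repeated for every column. A real system would measure length with the target model's tokenizer and, as the abstract states, search over many such rewrites rather than applying a single fixed rule.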

Award ID(s): 2239326

PAR ID: 10577784

Author(s) / Creator(s): Trummer, Immanuel

Publisher / Repository: VLDB Endowment

Date Published: 2024-07-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 17

Issue: 11

ISSN: 2150-8097

Page Range / eLocation ID: 3511 to 3523

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.14778/3681954.3682017