This content will become publicly available on June 17, 2026

Title: SpareLLM: Automatically Selecting Task-Specific Minimum-Cost Large Language Models under Equivalence Constraint
We introduce SpareLLM (Selecting Passable And Resource-Efficient LLMs), a novel LLM framework designed to minimize the inference costs (i.e., resource-efficient) of large-scale NLP tasks while ensuring sufficient result quality (i.e., passable). It enables users to specify an equivalence constraint in terms of the equivalence of outputs to those of the most powerful LLM; SpareLLM then generates results that deviate from the outputs of this LLM only with a probability below a user-defined threshold. SpareLLM employs a profiling phase that evaluates the performance of multiple LLMs to identify those that meet the user-defined equivalence level, and it optimizes the tradeoff between profiling overheads and the anticipated cost savings that profiling yields. Moreover, SpareLLM further reduces inference costs by strategically leveraging a mix of LLMs. Our experiments on five real-world datasets show that SpareLLM achieves significant cost savings, up to 8.6x, while generating outputs equivalent to GPT-4-Turbo's in 90% of cases. Compared to recent LLM cascading baselines, SpareLLM demonstrates a superior tradeoff between cost and accuracy, accounting for 91.1% and 83.8% of the points on the Pareto curve for OpenAI and Llama models, respectively.
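The selection rule at the heart of this framework can be illustrated compactly. The Python sketch below, with hypothetical model names, costs, and a stubbed inference call, profiles each candidate on a sample and returns the cheapest model whose deviation from the reference model stays within the threshold; it is a simplification for illustration, not the paper's actual profiling and optimization algorithm.

```python
# Minimal sketch of the core selection rule: pick the cheapest LLM whose
# profiled agreement with the strongest (reference) model satisfies the
# equivalence constraint. Model names, costs, and the run() stub are
# hypothetical placeholders; the paper's procedure is richer.

def select_cheapest_equivalent(models, cost, run, sample, ref_outputs,
                               max_deviation=0.10):
    """cost: dict model -> $/query; run: (model, input) -> output.
    Equivalence constraint: deviation from the reference model's outputs
    on the profiling sample must stay at or below max_deviation."""
    feasible = []
    for m in models:
        agree = sum(run(m, x) == y for x, y in zip(sample, ref_outputs))
        if 1 - agree / len(sample) <= max_deviation:
            feasible.append(m)
    # None signals that no cheaper model passed profiling, so the caller
    # should fall back to the reference model itself.
    return min(feasible, key=cost.get) if feasible else None

# Toy profiling run: "small" agrees on 4/5 outputs (20% deviation, fails
# the 10% threshold); "medium" agrees on 5/5 and is selected.
sample, ref = ["q1", "q2", "q3", "q4", "q5"], ["a", "b", "c", "d", "e"]
answers = {"small": ["a", "b", "c", "d", "x"],
           "medium": ["a", "b", "c", "d", "e"]}
run = lambda m, x: answers[m][sample.index(x)]
print(select_cheapest_equivalent(["small", "medium"],
                                 {"small": 0.1, "medium": 1.0},
                                 run, sample, ref))  # -> medium
```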
Award ID(s): 2239326
PAR ID: 10621145
Author(s) / Creator(s): ;
Publisher / Repository: ACM
Date Published:
Journal Name: Proceedings of the ACM on Management of Data
Volume: 3
Issue: 3
ISSN: 2836-6573
Page Range / eLocation ID: 1 to 26
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Reducing buildings’ carbon emissions is an important sustainability challenge. While scheduling flexible building loads has previously been used for a variety of grid and energy optimizations, carbon footprint reduction using such flexible loads poses new challenges, since such methods must balance both energy and carbon costs while also limiting the user inconvenience of delaying loads. This article highlights the potential conflict between electricity prices and carbon emissions and the resulting tradeoffs in carbon-aware and cost-aware load scheduling. To address this tradeoff, we propose GreenThrift, a home automation system that leverages the scheduling capabilities of smart appliances and knowledge of future carbon intensity and cost to reduce both the carbon emissions and costs of flexible energy loads. At the heart of GreenThrift is an optimization technique that automatically computes schedules based on user configurations and preferences. We evaluate the effectiveness of GreenThrift using real-world carbon intensity data, electricity prices, and load traces from multiple locations and across different scenarios and objectives. Our results show that GreenThrift closely replicates the offline optimal, retaining 97% of the savings when optimizing carbon emissions. Moreover, we show how GreenThrift can balance the conflict between carbon and cost, retaining 95.3% and 85.5% of the potential carbon and cost savings, respectively.
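The scheduling tradeoff can be illustrated with a toy optimizer. The sketch below, with invented forecasts and load parameters rather than GreenThrift's actual optimization, picks the start hour for a deferrable load that minimizes a weighted carbon-plus-cost objective before a deadline; note how the carbon-optimal and cost-optimal slots differ, which is exactly the conflict the system must balance.

```python
# Toy carbon/cost-aware scheduler: choose the start hour for a deferrable
# load that minimizes a weighted carbon+cost objective before a deadline.
# Forecasts, weights, and load shape are hypothetical placeholders.

def schedule_load(carbon, price, duration, deadline, alpha=0.5):
    """carbon, price: per-hour forecasts (same length).
    duration: load run length in hours; deadline: latest finish hour.
    alpha weights carbon vs. cost (1.0 = carbon only, 0.0 = cost only)."""
    best_start, best_score = None, float("inf")
    for start in range(deadline - duration + 1):
        window = range(start, start + duration)
        score = sum(alpha * carbon[h] + (1 - alpha) * price[h] for h in window)
        if score < best_score:
            best_start, best_score = start, score
    return best_start

# Example: a 2-hour dishwasher cycle that must finish within 6 hours.
carbon = [400, 380, 300, 250, 320, 410]        # gCO2/kWh forecast
price = [0.30, 0.28, 0.31, 0.22, 0.18, 0.35]   # $/kWh forecast
print(schedule_load(carbon, price, duration=2, deadline=6, alpha=1.0))  # -> 2
print(schedule_load(carbon, price, duration=2, deadline=6, alpha=0.0))  # -> 3
```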
  2. Retrieval-augmented generation (RAG) systems can effectively address user queries by leveraging indexed document corpora to retrieve relevant contexts. Ranking techniques have been adopted in RAG systems to sort the retrieved contexts by their relevance to the query so that users can select the most useful contexts for their downstream tasks. While many existing ranking methods rely on the similarity between the embedding vectors of the context and query to measure relevance, similarity does not always equate to relevance. Some ranking methods use large language models (LLMs) to rank the contexts by putting the query and the candidate contexts in the prompt and asking the LLM about their relevance. The scalability of those methods is limited by the number of candidate contexts and the context window of those LLMs. Also, those methods require fine-tuning the LLMs, which can be computationally expensive and requires domain-related data. In this work, we propose a scalable ranking framework that does not involve LLM training. Our framework uses an off-the-shelf LLM to hypothesize the user's query based on each retrieved context and ranks the contexts by the similarity between the hypothesized queries and the user query. Our framework is efficient at inference time and is compatible with many other context retrieval and ranking techniques. Experimental results show that our method improves the ranking performance of retrieval systems on multiple benchmarks.
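A minimal version of this hypothesize-then-rank idea might look like the sketch below, where embed() and hypothesize_query() are assumed stand-ins for an off-the-shelf embedding model and LLM rather than the paper's exact components.

```python
# Sketch of ranking retrieved contexts by how similar an LLM's hypothesized
# query for each context is to the user's actual query. embed() and
# hypothesize_query() are assumed helpers passed in by the caller.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_contexts(query, contexts, embed, hypothesize_query):
    """embed: text -> vector; hypothesize_query: context -> the LLM's guess
    at the query the context would answer. No LLM fine-tuning involved."""
    q_vec = embed(query)
    scored = []
    for ctx in contexts:
        hypo = hypothesize_query(ctx)           # one LLM call per context
        scored.append((cosine(q_vec, embed(hypo)), ctx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ctx for _, ctx in scored]           # most relevant first
```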
  3. Large Language Models (LLMs) have a natural role in answering complex queries about data streams, but the high computational cost of LLM inference makes them infeasible for many such tasks. We propose online cascade learning as an approach to address this challenge. The objective is to learn a “cascade” of models, starting with lower-capacity models (such as logistic regression) and ending with a powerful LLM, along with a deferral policy that determines the model to be used on a given input. We formulate the task of learning cascades online as an imitation-learning problem, where smaller models are updated over time by imitating LLM expert demonstrations, and we give a no-regret algorithm for the problem. Experimental results across four benchmarks show that our method matches the accuracy of LLMs while cutting inference costs by as much as 90%, with strong robustness against input distribution shifts, underscoring its efficacy and adaptability in stream processing.
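A stripped-down cascade conveys the control flow; the sketch below uses a fixed confidence threshold and assumed model interfaces rather than the paper's learned, no-regret deferral policy.

```python
# Stripped-down online cascade: a cheap model answers when confident,
# otherwise the input is deferred to the LLM and the cheap model imitates
# the LLM's answer. The predict()/update() interfaces and the fixed
# threshold are simplifying assumptions for illustration.

def cascade_predict(x, cheap_model, llm, threshold=0.9):
    label, confidence = cheap_model.predict(x)   # e.g., logistic regression
    if confidence >= threshold:
        return label                     # cheap path: no LLM call needed
    expert_label = llm.predict(x)        # defer to the LLM expert
    cheap_model.update(x, expert_label)  # online imitation-learning update
    return expert_label
```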
  4. Introduction: Recent AI advances, particularly the introduction of large language models (LLMs), have expanded the capacity to automate various tasks, including the analysis of text. This capability may be especially helpful in education research, where a lack of resources often hampers the ability to perform various kinds of analyses, particularly those requiring a high level of domain expertise and/or a large set of textual data. For instance, we recently coded approximately 10,000 state K-12 computer science standards, requiring over 200 hours of work by subject matter experts. If LLMs are capable of completing a task such as this, the savings in human resources would be immense. Research Questions: This study explores two research questions: (1) How do LLMs compare to humans in the performance of an education research task? and (2) What do errors in LLM performance on this task suggest about current LLM capabilities and limitations? Methodology: We used a random sample of state K-12 computer science standards. We compared the output of three LLMs (ChatGPT, Llama, and Claude) to the work of human subject matter experts in coding the relationship between each state standard and a set of national K-12 standards. Specifically, the LLMs and the humans determined whether each state standard was identical to, similar to, based on, or different from the national standards and (if it was not different) which national standard it resembled. Results: Each of the LLMs identified a different national standard than the subject matter expert in about half of the instances. When the LLM identified the same standard, it usually categorized the type of relationship (i.e., identical to, similar to, based on) in the same way as the human expert. However, the LLMs sometimes misidentified standards as ‘identical’. Discussion: Our results suggest that LLMs are not currently capable of matching human performance on the task of classifying learning standards. The misidentification of some state standards as identical to national standards, when they clearly were not, is an interesting error, given that traditional computing technologies can easily identify identical text. Similarly, some of the mismatches between the LLM and human performance indicate clear errors on the part of the LLMs. However, some of the mismatches are difficult to assess, given the ambiguity inherent in this task and the potential for human error. We conclude the paper with recommendations for the use of LLMs in education research based on these findings.
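The comparison itself reduces to simple agreement counting. A hypothetical sketch, with invented field names and data layout rather than the study's actual coding scheme:

```python
# Hypothetical sketch of scoring LLM codings against expert codings.
# Each coding pairs a matched national standard id with a relationship
# label ("identical to", "similar to", "based on", "different from").

def agreement_rates(llm_codes, expert_codes):
    """Both inputs: lists of (national_standard_id, relationship) tuples,
    aligned by state standard. Returns (same-standard rate,
    same-relationship rate among same-standard matches)."""
    same_std = [(l, e) for l, e in zip(llm_codes, expert_codes)
                if l[0] == e[0]]
    std_rate = len(same_std) / len(llm_codes)
    rel_rate = (sum(l[1] == e[1] for l, e in same_std) / len(same_std)
                if same_std else 0.0)
    return std_rate, rel_rate

llm = [("NS-1", "identical to"), ("NS-2", "similar to"), ("NS-9", "based on")]
expert = [("NS-1", "similar to"), ("NS-2", "similar to"), ("NS-3", "based on")]
print(agreement_rates(llm, expert))  # -> (0.666..., 0.5)
```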
  5. Abstract The recent proliferation of large language models (LLMs) has led to divergent narratives about their environmental impacts. Some studies highlight the substantial carbon footprint of training and using LLMs, while others argue that LLMs can lead to more sustainable alternatives to current practices. We reconcile these narratives by presenting a comparative assessment of the environmental impact of LLMs vs. human labor, examining their relative efficiency across energy consumption, carbon emissions, water usage, and cost. Our findings reveal that, while LLMs have substantial environmental impacts, their relative impacts can be dramatically lower than human labor in the U.S. for the same output, with human-to-LLM ratios ranging from 40 to 150 for a typical LLM (Llama-3-70B) and from 1200 to 4400 for a lightweight LLM (Gemma-2B-it). While the human-to-LLM ratios are smaller with regard to human labor in India, these ratios are still between 3.4 and 16 for a typical LLM and between 130 and 1100 for a lightweight LLM. Despite the potential benefit of switching from humans to LLMs, economic factors may cause widespread adoption to lead to a new combination of human and LLM-driven work, rather than a simple substitution. Moreover, the growing size of LLMs may substantially increase their energy consumption and lower the human-to-LLM ratios, highlighting the need for further research to ensure the sustainability and efficiency of LLMs. 
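The reported ratios are per-output footprint ratios: the human footprint for a unit of output divided by the LLM footprint for the same output. A worked toy example with placeholder numbers (not the paper's measured values):

```python
# The human-to-LLM ratio compares footprints for the same unit of output,
# e.g., one page of text. The inputs below are hypothetical placeholders,
# not the paper's measurements.

def human_to_llm_ratio(human_footprint_per_page, llm_footprint_per_page):
    return human_footprint_per_page / llm_footprint_per_page

# E.g., if producing one page carries 1000 g CO2e of human labor footprint
# and 10 g CO2e of LLM inference footprint, the ratio is 100, which falls
# within the paper's reported 40-150 range for a typical LLM.
print(human_to_llm_ratio(1000.0, 10.0))  # -> 100.0
```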