NSF PAR Search | NSF Public Access Repository

Fairness in Serving Large Language Models

Sheng, Ying; Cao, Shiyi; Li, Dacheng; Zhu, Banghua; Li, Zhuohan; Zhuo, Danyang; Gonzalez, Joseph E; Stoica, Ion (July 2024, 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24))

High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests, from short chat conversations to long document reading. To process all client requests fairly, most major LLM inference services impose request rate limits so that no client can dominate the request queue. However, this rudimentary notion of fairness also results in under-utilization of resources and a poor client experience when there is spare capacity. While there is a rich literature on fair scheduling, serving LLMs presents new challenges due to their unpredictable request lengths and their unique batching characteristics on parallel accelerators. This paper introduces a definition of LLM serving fairness based on a cost function that accounts for the number of input and output tokens processed. To achieve fairness in serving, we propose a novel scheduling algorithm, the Virtual Token Counter (VTC), a fair scheduler based on the continuous batching mechanism. We prove a 2× tight upper bound on the service difference between two backlogged clients while remaining work-conserving. Through extensive experiments, we demonstrate the superior performance of VTC in ensuring fairness, especially in contrast to other baseline methods, which exhibit shortcomings under various conditions. The reproducible code is available at https://github.com/Ying1123/VTC-artifact.
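The idea of a token-based cost function with per-client counters can be illustrated with a toy scheduler. This is a simplified sketch and not the authors' implementation: the class name, the weights `w_in` and `w_out`, and the rule for lifting an idle client's counter are assumptions made for illustration. It also charges a request's full cost when it is dispatched, which presumes the output length is known in advance, whereas the abstract stresses that request lengths are unpredictable.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    client: str
    input_tokens: int
    output_tokens: int  # assumed known here only to keep the toy deterministic


@dataclass
class ToyTokenCounterScheduler:
    """Hypothetical per-client token counter scheduler (illustration only)."""
    w_in: float = 1.0    # assumed cost weight per input token
    w_out: float = 2.0   # assumed cost weight per output token
    counters: dict = field(default_factory=dict)
    queues: dict = field(default_factory=dict)

    def cost(self, req: Request) -> float:
        # Cost function over the input and output tokens processed.
        return self.w_in * req.input_tokens + self.w_out * req.output_tokens

    def submit(self, req: Request) -> None:
        queue = self.queues.setdefault(req.client, deque())
        if not queue:
            # A client that was idle should not bank credit: lift its counter
            # to the smallest counter among currently backlogged clients.
            active = [self.counters[c] for c, q in self.queues.items() if q]
            floor = min(active) if active else 0.0
            self.counters[req.client] = max(self.counters.get(req.client, 0.0), floor)
        queue.append(req)

    def next_request(self):
        # Work-conserving: always dispatch if any client has queued work.
        backlogged = [c for c, q in self.queues.items() if q]
        if not backlogged:
            return None
        client = min(backlogged, key=lambda c: self.counters[c])
        req = self.queues[client].popleft()
        self.counters[client] += self.cost(req)
        return req
```

In this sketch a client sending long requests accumulates cost quickly, so a client sending short requests is served ahead of it until their counters are comparable, and no capacity is left idle while any queue is non-empty.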
Full Text Available
Simple Token-Level Confidence Improves Caption Correctness

https://doi.org/10.1109/WACV57701.2024.00564

Petryk, Suzanne; Whitehead, Spencer; Gonzalez, Joseph E; Darrell, Trevor; Rohrbach, Anna; Rohrbach, Marcus (January 2024, IEEE)

Full Text Available
CLAIR: Evaluating Image Captions with Large Language Models.

Chan, David; Petryk, Suzanne; Gonzalez, Joseph E; Darrell, Trevor; Canny, John F (October 2023, arXiv)

Full Text Available
Leveraging Cloud Computing to Make Autonomous Vehicles Safer

https://doi.org/10.1109/IROS55552.2023.10341821

Schafhalter, Peter; Kalra, Sukrit; Xu, Le; Gonzalez, Joseph E; Stoica, Ion (October 2023, IEEE)

Full Text Available
LLM-Assisted Code Cleaning For Training Accurate Code Generators.

Jain, Naman; Zhang, Tianjun; Chiang, Wei-Lin; Gonzalez, Joseph E; Sen, Koushik; Stoica, Ion (October 2023, arXiv)

Full Text Available
Investigating the Behavior of Diffusion Models for Accelerating Electronic Structure Calculations.

Rothchild, Daniel; Rosen, Andrew S; Taw, Eric; Robinson, Connie; Gonzalez, Joseph E; Krishnapriyan, Aditi S (November 2023, arXiv)

Full Text Available
The Wisdom of Hindsight Makes Language Models Better Instruction Followers

Zhang, Tianjun; Liu, Fangchen; Wong, Justin; Abbeel, Pieter; Gonzalez, Joseph E (July 2023, PMLR)

Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. This approach, known as Reinforcement Learning with Human Feedback (RLHF), demonstrates impressive performance on the GPT series models. However, the underlying reinforcement learning algorithm is complex and requires additional training for reward and value networks. In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner. Such an algorithm doesn’t require any additional parameters except for the original language model and maximally reuses the pretraining pipeline. To achieve this, we formulate the instruction alignment problem for language models as a goal-reaching problem in decision making. We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions. The resulting two-stage algorithm sheds light on a family of reward-free approaches that use hindsight-relabeled instructions based on feedback. We evaluate the performance of HIR extensively on 12 challenging BigBench reasoning tasks and show that HIR outperforms the baseline algorithms and is comparable to or even surpasses supervised fine-tuning. The implementation of HIR is available at https://github.com/tianjunz/HIR.
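The relabeling step can be pictured with a small sketch. It is a toy illustration under stated assumptions, not the HIR implementation: the `Episode` type, the two instruction templates, and the boolean `correct` feedback signal are all hypothetical. The point is only that a sampled output, whatever its quality, becomes a valid supervised target once the instruction is rewritten to describe what the model actually produced.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    instruction: str
    query: str
    output: str
    correct: bool  # hypothetical feedback signal, e.g. from a scripted checker


def relabel(episode: Episode) -> Episode:
    """Rewrite the instruction in hindsight so it matches the sampled output."""
    hindsight = (
        "Give a correct answer to this question."
        if episode.correct
        else "Give a wrong answer to this question."
    )
    return Episode(hindsight, episode.query, episode.output, episode.correct)


def to_supervised_pairs(episodes):
    """Build (prompt, target) pairs for ordinary supervised fine-tuning."""
    return [
        (f"{e.instruction}\n{e.query}", e.output)
        for e in map(relabel, episodes)
    ]
```

Because the pairs are plain prompt and target strings, no reward or value network is needed to train on them, which is the property the abstract highlights.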
Full Text Available
Gorilla: Large Language Model Connected with Massive APIs.

Patil, Shishir G; Zhang, Tianjun; Wang, Xin; Gonzalez, Joseph E (June 2023, arXiv)

Full Text Available
Efficiently Programming Large Language Models using SGLang.

Zheng, Lianmin; Yin, Liangsheng; Xie, Zhiqiang; Huang, Jeff; Sun, Chuyue; Yu, Cody Hao; Cao, Shiyi; Kozyrakis, Christos; Stoica, Ion; Gonzalez, Joseph E; et al (December 2023, arXiv)

Full Text Available
TEMPERA: Test-Time Prompt Editing via Reinforcement Learning

Zhang, Tianjun; Wang, Xuezhi; Zhou, Denny; Schuurmans, Dale; Gonzalez, Joseph E (May 2023, International Conference on Learning Representations (ICLR))

Full Text Available
