Scaling Textual Gradients via Sampling-Based Momentum

Ding, Z; Hong, J; Wang, JT; Lin, Z; Wang, Z; Chen, Y

This content will become publicly available on July 19, 2026

As prompts become central to Large Language Models (LLMs), optimizing them is vital. Textual Stochastic Gradient Descent (TSGD) offers a data-driven approach that iteratively refines prompts using LLM-suggested updates over minibatches. We empirically show that increasing the amount of training data initially improves TSGD's performance across NLP tasks but can later degrade it, while also raising computational costs. To address this, we propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), a scalable method that reweights prompt sampling based on past batches. Evaluated on nine NLP tasks across three domains, TSGD-M outperforms TSGD baselines on most tasks and reduces performance variance.
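The abstract does not spell out how the reweighting works, so the following is only a rough illustration of the general idea and not the authors' algorithm. It assumes each candidate prompt gets a numeric score on every minibatch, keeps an exponential moving average of those scores as a "momentum" term, and samples the next prompt with softmax weights over the averages. The names `update_momentum`, `sampling_weights`, `sample_prompt`, and the parameters `beta` and `temperature` are all hypothetical.

```python
import math
import random


def update_momentum(running, batch_scores, beta=0.9):
    """Blend each prompt's running score with its score on the newest minibatch.

    Assumption: scores are floats where higher is better. Prompts that were
    not evaluated on this minibatch keep their previous running score.
    """
    updated = dict(running)
    for prompt, score in batch_scores.items():
        previous = updated.get(prompt, score)  # first sighting: start at the batch score
        updated[prompt] = beta * previous + (1 - beta) * score
    return updated


def sampling_weights(running, temperature=1.0):
    """Turn running scores into a probability distribution via softmax."""
    top = max(running.values())  # subtract the max for numerical stability
    exps = {p: math.exp((s - top) / temperature) for p, s in running.items()}
    total = sum(exps.values())
    return {p: e / total for p, e in exps.items()}


def sample_prompt(running, rng, temperature=1.0):
    """Draw one prompt, favoring those that scored well on past batches."""
    weights = sampling_weights(running, temperature)
    prompts = list(weights)
    return rng.choices(prompts, weights=[weights[p] for p in prompts], k=1)[0]
```

In this sketch, a `beta` close to 1 makes the sampler lean on history from past batches, while a `beta` close to 0 makes it follow only the most recent minibatch.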

Award ID(s): 2313131

PAR ID: 10618156

Author(s) / Creator(s): Ding, Z; Hong, J; Wang, JT; Lin, Z; Wang, Z; Chen, Y

Publisher / Repository: 2nd Workshop on Test-Time Adaptation: Putting Updates to the Test (PUT)

Date Published: 2025-07-19

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Workshop Report: The DOI is not currently available.
