Title: Flextron: Many-in-One Flexible Large Language Model
Training modern large language models (LLMs) is extremely resource-intensive, and repeatedly customizing them for deployment scenarios with limited compute and memory is impractical. This paper introduces Flextron, a network architecture and post-training model optimization framework that supports flexible model deployment. Flextron uses a nested elastic structure that adapts rapidly to user-defined latency and accuracy targets during inference without requiring additional fine-tuning. It is also input-adaptive, automatically routing tokens through sub-networks for improved efficiency and performance. The authors propose a sample-efficient training method and routing algorithms to systematically transform an already-trained LLM into a Flextron model. Evaluation on the GPT-3 and LLaMA-2 families demonstrates Flextron’s superior performance over end-to-end trained variants and other state-of-the-art elastic networks, all with a single pretraining run that consumes only 7.63% of the tokens used in the original pretraining.
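The abstract gives no implementation details, but the core idea of a nested elastic layer with input-adaptive routing can be illustrated with a short PyTorch-style sketch. Everything below (the class and parameter names, the candidate widths, the pooled router) is an assumption made for illustration rather than Flextron's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticFFN(nn.Module):
    """Hypothetical nested elastic feed-forward layer: every smaller width is a
    prefix of the full weight matrices, so one parameter set serves all sub-networks."""

    def __init__(self, d_model=1024, d_ff=4096, widths=(1024, 2048, 4096)):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        self.widths = widths                            # candidate hidden widths
        self.router = nn.Linear(d_model, len(widths))   # tiny learned router

    def forward(self, x, width=None):
        # x: (batch, seq, d_model)
        if width is None:
            # Input-adaptive: pick one candidate width from pooled routing logits
            # (a real router would act per token and per layer).
            logits = self.router(x.mean(dim=(0, 1)))
            width = self.widths[int(logits.argmax())]
        # Nested slicing: the first `width` rows/columns form the sub-network.
        h = F.gelu(F.linear(x, self.up.weight[:width], self.up.bias[:width]))
        return F.linear(h, self.down.weight[:, :width], self.down.bias)
```

Under this sketch, a single set of weights serves every deployment target: passing an explicit `width` selects a fixed sub-network for a given latency budget, while leaving it unset lets the router choose from the input.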
Award ID(s):
2505865
PAR ID:
10631936
Publisher / Repository:
https://doi.org/10.48550/arXiv.2406.10260
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Training modern LLMs is extremely resource-intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. The Flextron architecture utilizes a nested elastic structure to rapidly adapt to specific user-defined latency and accuracy targets during inference with no additional fine-tuning required. It is also input-adaptive, and can automatically route tokens through its sub-networks for improved performance and efficiency. We present a sample-efficient training method and associated routing algorithms for systematically transforming an existing trained LLM into a Flextron model. We evaluate Flextron on the GPT-3 and Llama-2 families of LLMs, and demonstrate superior performance over multiple end-to-end trained variants and other state-of-the-art elastic networks, all with a single pretraining run that consumes a mere 7.63% of the tokens used in the original pretraining.
  2. Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning a distribution over tokens conditioned on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.
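As a rough illustration of the conditional-training objective described in this abstract, the sketch below tags each training document with a control token derived from its reward-model score before the ordinary next-token loss is computed. The tag names, the threshold, and the `reward_model`/`tokenizer` callables (assumed to follow a Hugging Face-style interface) are illustrative assumptions, not the paper's exact recipe.

```python
GOOD, BAD = "<|good|>", "<|bad|>"   # hypothetical preference control tokens

def make_conditional_batch(texts, tokenizer, reward_model, threshold=0.0):
    """Prepend a control token chosen from each document's reward-model score,
    so the LM learns p(tokens | preference tag). The tag vocabulary and the
    threshold here are illustrative choices."""
    tagged = []
    for text in texts:
        score = reward_model(text)                       # scalar preference score
        tagged.append((GOOD if score >= threshold else BAD) + text)
    # The standard LM loss is then computed on the tagged sequences.
    return tokenizer(tagged, return_tensors="pt", padding=True)
```

At generation time, conditioning on the `<|good|>` tag is what steers sampling toward preferred text, consistent with the reported reduction in undesirable generations both with and without adversarial prompts.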
  3. Scalable methods for optical transmission performance prediction using machine learning (ML) are studied in metro reconfigurable optical add-drop multiplexer (ROADM) networks. A cascaded learning framework is introduced to encompass the use of cascaded component models for end-to-end (E2E) optical path prediction augmented with different combinations of E2E performance data and models. Additional E2E optical path data and models are used to reduce the prediction error accumulation in the cascade. Off-line training (pre-trained prior to deployment) and transfer learning are used for component-level erbium-doped fiber amplifier (EDFA) gain models to ensure scalability. Considering channel power prediction, we show that the data collection process of the pre-trained EDFA model can be reduced to only 5% of the original training set using transfer learning. We evaluate the proposed method under three different topologies with field deployed fibers and achieve a mean absolute error of 0.16 dB with a single (one-shot) E2E measurement on the deployed 6-span system with 12 EDFAs. 
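A minimal sketch of the cascaded-prediction idea, under assumed interfaces: each component along the optical path (EDFA, fiber span) exposes a model mapping an input power spectrum to an output spectrum, the cascade composes these models in path order, and an optional end-to-end correction model fit on a few E2E measurements absorbs the accumulated error. All names and signatures below are illustrative.

```python
from typing import Callable, Optional, Sequence
import numpy as np

Spectrum = np.ndarray  # per-channel powers in dBm (assumed representation)

def cascade_predict(launch: Spectrum,
                    component_models: Sequence[Callable[[Spectrum], Spectrum]],
                    e2e_correction: Optional[Callable[[Spectrum], Spectrum]] = None) -> Spectrum:
    """Compose per-component models (EDFA gain, span loss, ...) in path order,
    then optionally apply an E2E correction fit on one-shot measurements."""
    powers = launch
    for model in component_models:
        powers = model(powers)           # each stage maps input to output spectrum
    if e2e_correction is not None:
        powers = e2e_correction(powers)  # reduces error accumulated along the cascade
    return powers
```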
  4. In this paper, we investigate the effect of pretraining CNNs on ImageNet on their performance when refined for steganalysis of digital images. In many cases, it seems that just 'seeing' a large number of images helps with the convergence of the network during the refinement, no matter what the pretraining task is. To achieve the best performance, the pretraining task should be related to steganalysis, even if it is done on completely mismatched cover and stego datasets. Furthermore, the pretraining does not need to be carried out for very long and can be done with limited computational resources. An additional advantage of the pretraining is that it is done on color images and can later be applied for steganalysis of color and grayscale images while still having on-par or better performance than detectors trained specifically for a given source. The refining process is also much faster than training the network from scratch. The most surprising part of the paper is that networks pretrained on JPEG images are a good starting point for spatial-domain steganalysis as well.
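The refinement setup investigated in this abstract can be sketched as follows: start from an ImageNet-pretrained backbone, replace the classification head with a two-class cover-versus-stego output, and fine-tune the whole network. The backbone, head, and optimizer below are illustrative choices, not the paper's exact configuration.

```python
import torch
import torchvision

def build_refined_detector(num_classes=2, lr=1e-4):
    """Start from an ImageNet-pretrained CNN and refine it end-to-end as a
    cover-vs-stego classifier (backbone and hyperparameters are assumptions)."""
    net = torchvision.models.resnet18(weights="IMAGENET1K_V1")  # pretrained backbone
    net.fc = torch.nn.Linear(net.fc.in_features, num_classes)   # new cover/stego head
    optimizer = torch.optim.AdamW(net.parameters(), lr=lr)      # refine all layers
    return net, optimizer
```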