NSF PAR Search | NSF Public Access Repository

LLM-Assisted Code Cleaning For Training Accurate Code Generators.

Jain, Naman; Zhang, Tianjun; Chiang, Wei-Lin; Gonzalez, Joseph E; Sen, Koushik; Stoica, Ion (October 2023, arXiv)

Full Text Available
The Wisdom of Hindsight Makes Language Models Better Instruction Followers

Zhang, Tianjun; Liu, Fangchen; Wong, Justin; Abbeel, Pieter; Gonzalez, Joseph E (July 2023, PMLR)

Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback. This algorithm, known as Reinforcement Learning with Human Feedback (RLHF), demonstrates impressive performance on the GPT series of models. However, the underlying reinforcement learning algorithm is complex and requires additional training for reward and value networks. In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner. Such an algorithm doesn’t require any additional parameters beyond the original language model and maximally reuses the pretraining pipeline. To achieve this, we formulate the instruction alignment problem for language models as a goal-reaching problem in decision making. We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions. The resulting two-stage algorithm sheds light on a family of reward-free approaches that use instructions relabeled in hindsight based on feedback. We evaluate HIR extensively on 12 challenging BigBench reasoning tasks and show that HIR outperforms the baseline algorithms and is comparable to or even surpasses supervised fine-tuning. The implementation of HIR is available at https://github.com/tianjunz/HIR.
Full Text Available
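The relabeling idea described in the HIR abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the `Sample` type, the two instruction strings, and the exact-match feedback function are all hypothetical, and the supervised training step that would consume the relabeled data is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical instruction templates; the wording is an assumption.
CORRECT = "Generate a correct answer to this problem."
WRONG = "Generate a wrong answer to this problem."


@dataclass(frozen=True)
class Sample:
    instruction: str  # what the model was asked to do
    query: str        # the task input
    output: str       # what the model actually produced


def relabel_in_hindsight(
    samples: List[Sample], is_correct: Callable[[Sample], bool]
) -> List[Sample]:
    """Replace each instruction with one that describes the achieved outcome.

    After relabeling, every (instruction, query, output) triple is consistent,
    so it can be used as ordinary supervised training data without a reward
    or value network.
    """
    relabeled = []
    for s in samples:
        achieved = CORRECT if is_correct(s) else WRONG
        relabeled.append(Sample(achieved, s.query, s.output))
    return relabeled


def exact_match_feedback(answers: Dict[str, str]) -> Callable[[Sample], bool]:
    """Hypothetical feedback source: compare the output to a reference answer."""
    def check(s: Sample) -> bool:
        return answers.get(s.query) == s.output
    return check


# Outputs sampled from a model that was asked for correct answers.
sampled = [
    Sample(CORRECT, "2 + 2 = ?", "4"),
    Sample(CORRECT, "3 * 3 = ?", "6"),
]
feedback = exact_match_feedback({"2 + 2 = ?": "4", "3 * 3 = ?": "9"})
training_set = relabel_in_hindsight(sampled, feedback)
```

In this sketch the second sample keeps its output but is now paired with the "wrong answer" instruction, which is what lets a failed attempt serve as a valid supervised example.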
Gorilla: Large Language Model Connected with Massive APIs

Patil, Shishir G; Zhang, Tianjun; Wang, Xin; Gonzalez, Joseph E (May 2023, arXiv)

Full Text Available
Gorilla: Large Language Model Connected with Massive APIs.

Patil, Shishir G; Zhang, Tianjun; Wang, Xin; Gonzalez, Joseph E (June 2023, arXiv)

Full Text Available
TEMPERA: Test-Time Prompt Editing via Reinforcement Learning

Zhang, Tianjun; Wang, Xuezhi; Zhou, Denny; Schuurmans, Dale; Gonzalez, Joseph E (May 2023, International Conference on Learning Representations (ICLR))

Full Text Available
Spectral Decomposition Representation for Reinforcement Learning

Ren, Tongzheng; Zhang, Tianjun; Lee, Lisa; Gonzalez, Joseph E; Schuurmans, Dale; Dai, Bo (May 2023, International Conference on Learning Representations (ICLR))

Full Text Available
ANODEv2: A Coupled Neural ODE Framework

Zhang, Tianjun and (April 2022, Advances in neural information processing systems)

It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE). This observation motivated the introduction of so-called Neural ODEs, which allow more general discretization schemes with adaptive time stepping. Here, we propose ANODEV2, an extension of this approach that allows the neural network parameters to evolve in a coupled ODE-based formulation. The Neural ODE method introduced earlier is in fact a special case of this new framework. We present the formulation of ANODEV2, derive optimality conditions, and implement the coupled framework in PyTorch. We present empirical results using several different configurations of ANODEV2, testing them on multiple models on CIFAR-10. We report results showing that this coupled ODE-based framework is indeed trainable and that it achieves higher accuracy than both the baseline models and the recently proposed Neural ODE approach.
Full Text Available
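The observation that opens the ANODEv2 abstract above, that a residual update x + h·f(x) is one explicit Euler step of dx/dt = f(x, t), can be checked numerically. The sketch below is a toy illustration only: the function `f` is a hypothetical stand-in for a learned block, and the coupled evolution of network parameters that ANODEv2 adds is not modeled.

```python
import math


def f(x: float, t: float) -> float:
    # Hypothetical "layer" function. In a real network this would be a
    # learned block; dx/dt = -x is used here because its exact solution
    # x(t) = x0 * exp(-t) is known.
    return -x


def residual_forward(x0: float, num_blocks: int, t_end: float = 1.0) -> float:
    """Apply num_blocks residual updates x <- x + h * f(x, t).

    Each update is one explicit Euler step of size h = t_end / num_blocks,
    so the stack of blocks integrates the ODE from t = 0 to t = t_end.
    """
    h = t_end / num_blocks
    x = x0
    for k in range(num_blocks):
        x = x + h * f(x, k * h)
    return x


exact = math.exp(-1.0)  # true value of x(1) for x(0) = 1
error_shallow = abs(residual_forward(1.0, 10) - exact)
error_deep = abs(residual_forward(1.0, 1000) - exact)
```

With more blocks (a smaller step size), the output of the residual stack moves closer to the exact ODE solution, which is the sense in which a deep residual network approximates a continuous-time flow.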
Multi-objective Optimization by Learning Space Partitions

Zhao, Yiyang; Wang, Linnan; Yang, Kevin; Zhang, Tianjun; Guo, Tian; Tian, Yuandong (January 2022, International Conference on Learning Representations (ICLR'22))

Full Text Available
MADE: Exploration via Maximizing Deviation from Explored Regions

Zhang, Tianjun; Rashidinejad, Paria; Jiao, Jiantao; Tian, Yuandong; Gonzalez, Joseph E; Russell, Stuart (January 2021, Advances in Neural Information Processing Systems 34 (NeurIPS 2021))

Full Text Available
GenAx: A Genome Sequencing Accelerator

https://doi.org/10.1109/ISCA.2018.00017

Fujiki, Daichi; Subramaniyan, Arun; Zhang, Tianjun; Zeng, Yu; Das, Reetuparna; Blaauw, David; Narayanasamy, Satish (June 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA))

Full Text Available
