Search for: All records

Award ID contains: 2107048

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without a charge during the embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. With the rapid improvement of large language models' capabilities, there has been increasing interest in challenging constrained text generation problems. However, existing benchmarks for constrained generation usually focus on fixed constraint types (e.g., generate a sentence containing certain words) that have proved to be easy for state-of-the-art models like GPT-4. We present COLLIE, a grammar-based framework that allows the specification of rich, compositional constraints with diverse generation levels (word, sentence, paragraph, passage) and modeling challenges (e.g., language understanding, logical reasoning, counting, semantic planning). We also develop tools for automatic extraction of task instances given a constraint structure and a raw text corpus. Using COLLIE, we compile the COLLIE-v1 dataset with 2,080 instances comprising 13 constraint structures. We perform systematic experiments across five state-of-the-art instruction-tuned language models and analyze their performance to reveal shortcomings. COLLIE is designed to be extensible and lightweight, and we hope the community finds it useful to develop more complex constraints and evaluations in the future. (An illustrative sketch of a compositional constraint checker appears after this list.)
    Free, publicly-accessible full text available September 15, 2025
  2. This paper introduces the concept of Language-Guided World Models (LWMs), probabilistic models that can simulate environments by reading texts. Agents equipped with these models provide humans with more extensive and efficient control, allowing them to simultaneously alter agent behaviors in multiple tasks via natural verbal communication. In this work, we take initial steps in developing robust LWMs that can generalize to compositionally novel language descriptions. We design a challenging world modeling benchmark based on the game of MESSENGER (Hanjie et al., 2021), featuring evaluation settings that require varying degrees of compositional generalization. Our experiments reveal the lack of generalizability of the state-of-the-art Transformer model, as it offers marginal improvements in simulation quality over a no-text baseline. We devise a more robust model by fusing the Transformer with the EMMA attention mechanism (Hanjie et al., 2021). Our model substantially outperforms the Transformer and approaches the performance of a model with an oracle semantic parsing and grounding capability. To demonstrate the practicality of this model in improving AI safety and transparency, we simulate a scenario in which the model enables an agent to present plans to a human before execution, and to revise plans based on their language feedback. (A hedged sketch of text-to-entity attention fusion appears after this list.)
    Free, publicly-accessible full text available August 16, 2025
  3. Does language help make sense of the visual world? How important is it to actually see the world rather than having it described with words? These basic questions about the nature of intelligence have been difficult to answer because we only had one example of an intelligent system – humans – and limited access to cases that isolated language or vision. However, the development of sophisticated Vision-Language Models (VLMs) by artificial intelligence researchers offers us new opportunities to explore the contributions that language and vision make to learning about the world. We ablate components from the cognitive architecture of these models to identify their contributions to learning new tasks from limited data. We find that a language model leveraging all components recovers a majority of a VLM's performance, despite its lack of visual input, and that language seems to allow this by providing access to prior knowledge and reasoning.
    Free, publicly-accessible full text available July 24, 2025
  4. Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, “Tree of Thoughts” (ToT), which generalizes over the popular “Chain of Thought” approach to prompting language models, and enables exploration over coherent units of text (“thoughts”) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models’ problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm. (A minimal sketch of the ToT breadth-first search loop appears after this list.)
  5. For human children as well as machine learning systems, a key challenge in learning a word is linking the word to the visual phenomena it describes. We explore this aspect of word learning by using the performance of computer vision systems as a proxy for the difficulty of learning a word from visual cues. We show that the age at which children acquire different categories of words is correlated with the performance of visual classification and captioning systems, over and above the expected effects of word frequency. The performance of the computer vision systems is correlated with human judgments of the concreteness of words, which are in turn a predictor of children's word learning, suggesting that these models are capturing the relationship between words and visual phenomena. (An illustrative regression sketch of this analysis pattern appears after this list.)
  6. While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples. (A hedged sketch of a ReAct-style loop appears after this list.)
  7. Most existing benchmarks for grounding language in interactive environments either lack realistic linguistic elements, or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals. We develop WebShop – a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. In this environment, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase a product given an instruction. WebShop provides several challenges including understanding compositional instructions, query (re-)formulation, dealing with noisy text in webpages, and performing strategic exploration. We collect over 1,600 human trajectories to first validate the benchmark, then train and evaluate a diverse range of agents using reinforcement learning, imitation learning, and pre-trained image and language models. Our best model achieves a task success rate of 29%, which significantly outperforms rule heuristics but is far lower than expert human performance (59%). We also analyze agent and human trajectories and ablate various model components to provide insights for developing future agents with stronger language understanding and decision making abilities. Finally, we show our agent trained on WebShop exhibits non-trivial sim-to-real transfer when evaluated on amazon.com and ebay.com, indicating the potential value of our benchmark for developing practical web agents that can operate in the wild. (An illustrative sketch of the agent-environment interaction pattern appears after this list.)
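The snippet below is a minimal, hypothetical sketch of the kind of compositional constraint COLLIE (item 1) specifies at the word, sentence, paragraph, and passage levels. The Constraint class and satisfies method are illustrative names only, not the framework's actual API.

```python
# Hypothetical sketch (not the actual COLLIE API) of a compositional constraint
# and its checker. COLLIE specifies constraints over words, sentences,
# paragraphs, and passages; the names below are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Constraint:
    level: str                             # "word", "sentence", "paragraph", or "passage"
    checks: List[Callable[[str], bool]]    # each check maps generated text -> pass/fail

    def satisfies(self, text: str) -> bool:
        return all(check(text) for check in self.checks)

# Example: a paragraph with exactly 3 sentences, each of which ends in "ing".
constraint = Constraint(
    level="paragraph",
    checks=[
        lambda t: len([s for s in t.split(".") if s.strip()]) == 3,
        lambda t: all(s.strip().endswith("ing") for s in t.split(".") if s.strip()),
    ],
)

candidate = "The rain kept falling. The children kept playing. The town kept singing."
print(constraint.satisfies(candidate))  # True for this toy example
```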
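For item 2, the following is a hedged sketch of the general idea behind fusing an EMMA-style attention mechanism into a world model: each entity queries the tokens of the game manual, and its representation becomes an attention-weighted sum over the description tokens. The module name, dimensions, and shapes are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch of EMMA-style text grounding: entities attend over manual tokens
# and the resulting text-grounded features would feed the dynamics model.
import torch
import torch.nn as nn

class EntityTextAttention(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.query = nn.Linear(dim, dim)   # projects entity embeddings to queries
        self.key = nn.Linear(dim, dim)     # projects manual tokens to keys
        self.value = nn.Linear(dim, dim)   # projects manual tokens to values

    def forward(self, entities: torch.Tensor, manual: torch.Tensor) -> torch.Tensor:
        # entities: (num_entities, dim); manual: (num_tokens, dim)
        q, k, v = self.query(entities), self.key(manual), self.value(manual)
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)  # (entities, tokens)
        return attn @ v  # text-grounded entity features

grounded = EntityTextAttention()(torch.randn(3, 64), torch.randn(20, 64))
print(grounded.shape)  # torch.Size([3, 64])
```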
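Item 4's Tree of Thoughts can be read as a search over partial solutions. The sketch below shows a minimal breadth-first variant of that loop; propose and score stand in for LM calls (thought generation and self-evaluation) and are hypothetical placeholders rather than the released code at https://github.com/princeton-nlp/tree-of-thought-llm.

```python
# Minimal sketch of a Tree-of-Thoughts breadth-first search loop.
from typing import Callable, List

def tree_of_thoughts_bfs(
    root: str,
    propose: Callable[[str], List[str]],   # LM proposes candidate next thoughts
    score: Callable[[str], float],         # LM self-evaluates a partial solution
    depth: int = 3,
    beam: int = 5,
) -> str:
    frontier = [root]
    for _ in range(depth):
        # Expand every partial solution, then keep only the best `beam` candidates.
        candidates = [c for state in frontier for c in propose(state)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

# Toy usage: "thoughts" are digits appended to a string; the score prefers larger numbers.
best = tree_of_thoughts_bfs(
    "", propose=lambda s: [s + d for d in "0123456789"], score=lambda s: int(s or 0)
)
print(best)  # "999"
```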
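Item 5's claim that vision-system performance predicts age of acquisition "over and above" word frequency amounts to a regression with frequency as a control. The snippet below only illustrates that analysis pattern on toy, randomly generated data; none of the numbers come from the study.

```python
# Illustrative sketch: does vision-model performance predict age of acquisition
# after controlling for word frequency? Toy data, not the study's.
import numpy as np

rng = np.random.default_rng(0)
n = 200
log_frequency = rng.normal(size=n)       # control variable
vision_accuracy = rng.normal(size=n)     # e.g., classifier accuracy on a word's category
age_of_acquisition = 4.0 - 0.5 * log_frequency - 0.3 * vision_accuracy + rng.normal(scale=0.2, size=n)

# OLS with both predictors: the coefficient on vision_accuracy estimates its
# contribution over and above frequency.
X = np.column_stack([np.ones(n), log_frequency, vision_accuracy])
coefs, *_ = np.linalg.lstsq(X, age_of_acquisition, rcond=None)
print(dict(zip(["intercept", "log_frequency", "vision_accuracy"], coefs.round(2))))
```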
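Item 6's ReAct interleaves reasoning and acting. A minimal sketch of that loop follows: the model emits Thought and Action lines, each action is executed by a tool (such as a Wikipedia search wrapper), and the observation is appended back into the prompt. The call_llm callable, the Action: finish[...] convention, and the tools dict are assumptions for illustration, not the paper's prompts or API.

```python
# Hedged sketch of a ReAct-style reason-and-act loop.
from typing import Callable, Dict

def react_loop(question: str,
               call_llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(prompt)            # expected to emit "Thought: ...\nAction: tool[arg]"
        prompt += step + "\n"
        action_line = next(l for l in step.splitlines() if l.startswith("Action:"))
        name, arg = action_line.removeprefix("Action:").strip().rstrip("]").split("[", 1)
        if name == "finish":
            return arg                     # final answer
        observation = tools[name](arg)     # e.g., a Wikipedia search wrapper
        prompt += f"Observation: {observation}\n"
    return "no answer within step budget"

# Toy usage with a scripted "LLM" and a lookup-table "search" tool.
script = iter([
    "Thought: I should look up Princeton.\nAction: search[Princeton University]",
    "Thought: I have the answer.\nAction: finish[New Jersey]",
])
answer = react_loop(
    "Where is Princeton University?",
    call_llm=lambda p: next(script),
    tools={"search": lambda q: "Princeton University is in New Jersey."},
)
print(answer)  # New Jersey
```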
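Finally, item 7's WebShop evaluates agents in a loop of text observations and actions such as search[...] and click[...]. The stub environment and scripted agent below only illustrate that interaction pattern; they are not the real WebShop API or a trained agent.

```python
# Illustrative stub of a WebShop-like text observation/action loop.
class FakeShopEnv:
    def __init__(self, instruction: str):
        self.instruction = instruction
        self.page = "search page"

    def step(self, action: str):
        # Returns (observation, reward, done); reward is given only on purchase.
        if action.startswith("search["):
            self.page = "results: [item A] [item B]"
            return self.page, 0.0, False
        if action == "click[buy now]":
            return "order placed", 1.0, True
        self.page = f"product page for {action}"
        return self.page, 0.0, False

def scripted_agent(observation: str) -> str:
    # A real agent would condition a language model on the instruction and observation.
    if "search page" in observation:
        return "search[red running shoes under $50]"
    if "results" in observation:
        return "click[item A]"
    return "click[buy now]"

env = FakeShopEnv("buy red running shoes under $50")
obs, done = env.page, False
while not done:
    obs, reward, done = env.step(scripted_agent(obs))
print(reward)  # 1.0 once the purchase succeeds
```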