NSF PAR Search | NSF Public Access Repository

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

Wang, Yizhong; Ivison, Hamish; Dasigi, Pradeep; Hessel, Jack; Khot, Tushar; Chandu, Khyathi; Wadden, David; MacMillan, Kelsey; Smith, Noah; Beltagy, Iz; et al. (May 2024, NeurIPS)

Full Text Available
SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design

Edwards, Carl; Naik, Aakanksha; Khot, Tushar; Burke, Martin; Ji, Heng; Hope, Tom (October 2023, First Conference on Language Modeling (COLM))

Full Text Available
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

https://doi.org/10.18653/v1/2024.acl-long.850

Trivedi, Harsh; Khot, Tushar; Hartmann, Mareike; Manku, Ruskin; Dong, Vinty; Li, Edward; Gupta, Shashank; Sabharwal, Ashish; Balasubramanian, Niranjan (January 2024, Association for Computational Linguistics)

Full Text Available
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

https://doi.org/10.18653/v1/2023.acl-long.557

Trivedi, Harsh; Balasubramanian, Niranjan; Khot, Tushar; Sabharwal, Ashish (January 2023, 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Full Text Available
Teaching Broad Reasoning Skills via Decomposition-Guided Contexts

Trivedi, Harsh; Balasubramanian, Niranjan; Khot, Tushar; Sabharwal, Ashish (December 2022, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing)

Question-answering datasets require a broad set of reasoning skills. We show how to use question decompositions to teach language models these broad reasoning skills in a robust fashion. Specifically, we use widely available QDMR representations to programmatically create hard-to-cheat synthetic contexts for real questions in six multi-step reasoning datasets. These contexts are carefully designed to avoid the reasoning shortcuts that are prevalent in real contexts and that prevent models from learning the right skills. This results in a pretraining dataset, named TeaBReaC, containing 525K multi-step questions (with associated formal programs) covering about 900 reasoning patterns. We show that pretraining standard language models (LMs) on TeaBReaC before fine-tuning them on target datasets improves their performance by up to 13 F1 points across 4 multi-step QA datasets, with gains of up to 21 points on more complex questions. The resulting models also demonstrate higher robustness, with a 5-8 F1 point improvement on two contrast sets. Furthermore, TeaBReaC pretraining substantially improves model performance and robustness even when starting with numerate LMs pretrained using recent methods (e.g., PReasM, POET). Our work thus shows how to effectively use decomposition-guided contexts to robustly teach multi-step reasoning.
Full Text Available
GooAQ: Open Question Answering with Diverse Answer Types

https://doi.org/10.18653/v1/2021.findings-emnlp.38

Khashabi, Daniel; Ng, Amos; Khot, Tushar; Sabharwal, Ashish; Hajishirzi, Hannaneh; Callison-Burch, Chris (January 2021, Findings of the Association for Computational Linguistics: EMNLP 2021)

While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has failed to adequately address the answer diversity of questions. To this end, we present GooAQ, a large-scale dataset with a variety of answer types. This dataset contains over 5 million questions and 3 million answers collected from Google. GooAQ questions are collected semi-automatically from the Google search engine using its autocomplete feature. This results in naturalistic questions of practical interest that are nonetheless short and expressed using simple language. GooAQ answers are mined from Google’s responses to our collected questions, specifically from the answer boxes in the search results. This yields a rich space of answer types, containing both textual answers (short and long) and more structured ones such as collections. We benchmark T5 models on GooAQ and observe that: (a) in line with recent work, LMs’ strong performance on GooAQ’s short-answer questions benefits heavily from annotated data; however, (b) their quality in generating coherent and accurate responses for questions requiring long responses (such as ‘how’ and ‘why’ questions) is less reliant on observing annotated data and is mainly supported by their pre-training. We release GooAQ to facilitate further research on improving QA with diverse response types.
Full Text Available
Repurposing Entailment for Multi-Hop Question Answering Tasks

Trivedi, Harsh; Kwon, Heeyoung; Khot, Tushar; Sabharwal, Ashish; Balasubramanian, Niranjan (June 2019, North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Question Answering (QA) naturally reduces to an entailment problem, namely, verifying whether some text entails the answer to a question. However, for multi-hop QA tasks, which require reasoning with multiple sentences, it remains unclear how best to utilize entailment models pre-trained on large-scale datasets such as SNLI, which are based on sentence pairs. We introduce Multee, a general architecture that can effectively use entailment models for multi-hop QA tasks. Multee uses (i) a local module that helps locate important sentences, thereby avoiding distracting information, and (ii) a global module that aggregates information by effectively incorporating importance weights. Importantly, we show that both modules can use entailment functions pre-trained on large-scale NLI datasets. We evaluate performance on MultiRC and OpenBookQA, two multi-hop QA datasets. When using an entailment function pre-trained on NLI datasets, Multee outperforms QA models trained only on the target QA datasets and the OpenAI transformer models.
Full Text Available
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Shoeb, Abu Awal; Abid, Abubakar; Fisch, Adam; Brown, Adam R.; Santoro, Adam; Gupta, Aditya; Garriga-Alonso, Adri; et al. (January 2023, Transactions on Machine Learning Research)

Full Text Available
