Title: Down and Across: Introducing Crossword-Solving as a New NLP Benchmark
Solving crossword puzzles requires diverse reasoning capabilities, access to a vast amount of knowledge about language and the world, and the ability to satisfy the constraints imposed by the structure of the puzzle. In this work, we introduce solving crossword puzzles as a new natural language understanding task. We release a corpus of crossword puzzles collected from the New York Times daily crossword, spanning 25 years and comprising roughly nine thousand puzzles. These puzzles include a diverse set of clues: historic, factual, word meaning, synonyms/antonyms, fill-in-the-blank, abbreviations, prefixes/suffixes, wordplay, and cross-lingual, as well as clues that depend on the answers to other clues. We separately release the clue-answer pairs from these puzzles as an open-domain question answering dataset containing over half a million unique clue-answer pairs. For the question answering task, our baselines include several sequence-to-sequence and retrieval-based generative models. We also introduce a non-parametric constraint satisfaction baseline for solving the entire crossword puzzle. Finally, we propose an evaluation framework consisting of several complementary performance metrics.
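The abstract mentions a non-parametric constraint satisfaction baseline for filling the whole grid but gives no implementation details here. The sketch below is a minimal, hypothetical illustration of the general idea only (backtracking over per-clue candidate lists so that crossing letters agree); the slot and candidate structures are made up for this example and are not the paper's actual data format.

```python
# Minimal sketch of a backtracking constraint-satisfaction fill (illustrative only).
# Assumed inputs (not from the paper): each slot lists the grid cells it covers,
# and a QA model has already produced a ranked candidate list per slot.

from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, col)

slots: Dict[str, List[Cell]] = {
    "1-Across": [(0, 0), (0, 1), (0, 2)],
    "1-Down":   [(0, 0), (1, 0), (2, 0)],
}
candidates: Dict[str, List[str]] = {
    "1-Across": ["CAT", "CAR", "COT"],
    "1-Down":   ["COB", "CAB"],
}

def fill(order: List[str], i: int, grid: Dict[Cell, str]) -> Optional[Dict[Cell, str]]:
    """Assign candidate answers slot by slot, backtracking on crossing-letter conflicts."""
    if i == len(order):
        return grid
    slot, cells = order[i], slots[order[i]]
    for word in candidates[slot]:
        if len(word) != len(cells):
            continue
        # Reject the word if it disagrees with any letter already placed at a crossing.
        if any(grid.get(c, ch) != ch for c, ch in zip(cells, word)):
            continue
        new_grid = dict(grid)
        new_grid.update(zip(cells, word))
        result = fill(order, i + 1, new_grid)
        if result is not None:
            return result
    return None  # no candidate fits; the caller backtracks

print(fill(list(slots), 0, {}))
# -> {(0, 0): 'C', (0, 1): 'A', (0, 2): 'T', (1, 0): 'O', (2, 0): 'B'}
```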
Award ID(s): 1652742
PAR ID: 10400763
Author(s) / Creator(s):
Date Published:
Journal Name: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Page Range / eLocation ID: 2648 to 2659
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Parsons puzzles are jigsaw puzzles wherein students are given a program in scrambled order and tasked with reassembling the program in its correct order. Do students use program semantics when solving Parsons puzzles? The answer to this question has implications for the use of Parsons puzzles as a pedagogic tool. To answer this question, we considered semantics at the level of statements and control flow. We analyzed the data collected by a Parsons puzzle tutor over 5 semesters and measured the extent to which students' puzzle-solving behavior conformed with the use of statement-level and control-flow semantics. We found that students used statement-level semantics to assemble up to 73% of the lines in a puzzle and control-flow semantics to assemble up to 47% of the lines. They used statement-level semantics more than control-flow semantics, and more on some puzzles than others. Whenever we found a significant difference between C++ and Java students, the C++ students used semantics more than the Java students. Finally, we did not find an increase in the use of semantics with increased practice.
  2. The ARQMath Lab at CLEF 2020 considers the problem of finding answers to new mathematical questions among posted answers on a community question answering site (Math Stack Exchange). Queries are question postings held out from the test collection, each containing both text and at least one formula. We expect this to be a challenging task, as both math and text may be needed to find relevant answer posts. While several models have been proposed for text question answering, math question answering is in an earlier stage of development. To advance math-aware search and mathematical question answering systems, we will create a standard test collection for researchers to use for benchmarking. ARQMath will also include a formula retrieval sub-task: individual formulas from question posts are used to locate formulas in earlier answer posts, with relevance determined by narrative fields created based on the original question. We will use these narrative fields to explore diverse information needs for formula search (e.g., alternative notation, applications in specific fields, or definitions).
  3. We formally introduce, define, and construct memory-hard puzzles. Intuitively, for a difficulty parameter t, a cryptographic puzzle is memory-hard if no parallel random access machine (PRAM) algorithm with "small" cumulative memory complexity (≪ t^2) can solve the puzzle; moreover, such puzzles should be both "easy" to generate and solvable by a sequential RAM algorithm running in time t. Our definitions and constructions of memory-hard puzzles are in the standard model, assuming the existence of indistinguishability obfuscation (iO) and one-way functions (OWFs), and additionally assuming the existence of a memory-hard language. Intuitively, a language is memory-hard if it is undecidable by any PRAM algorithm with "small" cumulative memory complexity, while a sequential RAM algorithm running in time t can decide the language. Our definitions and constructions of memory-hard objects are the first such definitions and constructions in the standard model that do not rely on idealized assumptions (such as random oracles). We give two applications which highlight the utility of memory-hard puzzles. For our first application, we give a construction of a (one-time) memory-hard function (MHF) in the standard model, using memory-hard puzzles and additionally assuming iO and OWFs. For our second application, we show any cryptographic puzzle (e.g., memory-hard, time-lock) can be used to construct resource-bounded locally decodable codes (LDCs) in the standard model, answering an open question of Blocki, Kulkarni, and Zhou (ITC 2020). Resource-bounded LDCs achieve better rate and locality than their classical counterparts under the assumption that the adversarial channel is resource bounded (e.g., a low-depth circuit). Prior constructions of MHFs and resource-bounded LDCs required idealized primitives like random oracles.
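The two conditions sketched informally in this abstract (easy to solve sequentially in time t, hard for any low-cumulative-memory parallel solver) can be restated as below. This is a paraphrase of the abstract's intuition for readability, not the paper's formal definition; the symbols (Z for the puzzle instance, cmc for cumulative memory complexity, negl for a negligible function) are chosen here for illustration.

```latex
% Informal restatement of the memory-hardness intuition above (not the paper's formal definition).
% Z is a puzzle instance at difficulty t; cmc(A) is A's cumulative memory complexity.
\[
  \exists\, \mathcal{S} \in \mathsf{RAM}:\ \mathrm{time}_{\mathcal{S}}(Z) = O(t),
  \qquad
  \forall\, \mathcal{A} \in \mathsf{PRAM}:\ \mathrm{cmc}(\mathcal{A}) \ll t^{2}
  \;\Rightarrow\; \Pr[\mathcal{A}(Z)\ \text{solves}\ Z] \le \mathrm{negl}(\lambda).
\]
```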
  4. Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce "Successive Prompting," where we iteratively break down a complex task into a simpler task, solve it, and then repeat the process until we arrive at the final solution. Successive prompting decouples the supervision for decomposing complex questions from the supervision for answering simple questions, allowing us to (1) have multiple opportunities to query in-context examples at each reasoning step, (2) learn question decomposition separately from question answering, including using synthetic data, and (3) use bespoke (fine-tuned) components for reasoning steps where a large LM does not perform well. The intermediate supervision is typically manually written, which can be expensive to collect. We introduce a way to generate a synthetic dataset that can be used to bootstrap a model's ability to decompose and answer intermediate questions. Our best model (with successive prompting) achieves an improvement in F1 of ~5% over a state-of-the-art model with synthetic augmentations on a few-shot version of the DROP dataset.
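As a rough illustration of the decompose-then-answer loop this abstract describes, here is a minimal sketch. The `ask_lm` stub and the prompt formats are hypothetical placeholders, not the authors' actual prompts, models, or decomposition components.

```python
# Minimal sketch of a successive-prompting loop (illustrative only).
# `ask_lm` stands in for any text-completion call (an API client, a local model, ...).

def ask_lm(prompt: str) -> str:
    raise NotImplementedError("plug in an actual language model call here")

def successive_prompting(complex_question: str, max_steps: int = 8) -> str:
    """Alternate between asking for the next simple sub-question and answering it."""
    context = f"Complex question: {complex_question}\n"
    for _ in range(max_steps):
        # Step 1: ask the model (or a dedicated decomposition component)
        # for the next simple question, given what has been answered so far.
        sub_q = ask_lm(context + "Next simple question (or 'DONE'):").strip()
        if sub_q == "DONE":
            break
        # Step 2: answer the simple question with a (possibly different) QA component.
        sub_a = ask_lm(f"Question: {sub_q}\nAnswer:").strip()
        # Step 3: append the resolved sub-question so later steps can build on it.
        context += f"Q: {sub_q}\nA: {sub_a}\n"
    # Final pass: produce the answer to the original complex question.
    return ask_lm(context + f"Final answer to: {complex_question}\nAnswer:").strip()
```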
  5. Parsons puzzles are popular for programming education. Identifying the strategies used by students solving Parsons puzzles is of interest because they can be used to determine to what extent students use the strategies typically associated with programming expertise, and to provide feedback and monitor the progress of students in a tutor. We propose the solution sequence as an approximation of the student's strategy for solving Parsons puzzles. It is scalable in terms of both the size of the puzzle and the number of students solving the puzzle. We propose using a BNF grammar to represent desirable puzzle-solving strategies associated with expert programmers. This representation is extensible and agnostic to the puzzle-solving strategies themselves. Finally, we propose a best-match parser that matches a student's solution sequence against the BNF grammar of a desirable strategy and quantifies the degree to which the student's solution conforms to the desirable strategy. As a proof of concept, we used the parser to analyze the data collected by a Parsons puzzle tutor on if-else statements over five semesters and found a significant difference between C++ and Java puzzle-solvers in terms of their conformance to one desirable puzzle-solving strategy. Being able to tease out the effects of individual components of a strategy is one benefit of our approach: we found that relaxing the shell-first constraint in the strategy resulted in a significant improvement in the conformance of both C++ and Java students.
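The abstract does not spell out how the best-match parser scores conformance. As a loose, simplified stand-in, the sketch below scores a student's move sequence against a single reference ordering expanded from a strategy, using longest-common-subsequence overlap; the category labels and the "shell-first" reference sequence are invented for illustration and are not the paper's grammar or scoring rule.

```python
# Simplified, illustrative conformance score (not the paper's best-match parser):
# compare a student's line-placement sequence against one reference ordering
# expanded from a strategy, using longest-common-subsequence length.

def lcs_length(a: list[str], b: list[str]) -> int:
    """Classic dynamic-programming longest common subsequence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def conformance(student_moves: list[str], reference: list[str]) -> float:
    """Fraction of the student's moves that fit the reference ordering."""
    return lcs_length(student_moves, reference) / max(len(student_moves), 1)

# Hypothetical "shell-first" reference: place the if/else shell before the bodies.
reference = ["if-shell", "else-shell", "then-body", "else-body"]
student   = ["if-shell", "then-body", "else-shell", "else-body"]
print(round(conformance(student, reference), 2))  # 0.75 under this toy reference
```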