While developing a story, novices and published writers alike have had to look outside themselves for inspiration. Language models have recently been able to generate text fluently, producing new stochastic narratives upon request. However, effectively integrating such capabilities with human cognitive faculties and creative processes remains challenging. We propose to investigate this integration with a multimodal writing support interface that offers writing suggestions textually, visually, and aurally. We conduct an extensive study that combines elicitation of prior expectations before writing, observation and semi-structured interviews during writing, and outcome evaluations after writing. Our results illustrate individual and situational variation in machine-in-the-loop writing approaches, suggestion acceptance, and ways the system is helpful. Centrally, we report how participants perform integrative leaps , by which they do cognitive work to integrate suggestions of varying semantic relevance into their developing stories. We interpret these findings, offering modeling and design recommendations for future creative writing support technologies.
more »
« less
Adaptively profiling models with task elicitation
Language model evaluations often fail to characterize consequential failure modes, forcing experts to inspect outputs and build new benchmarks. We introduce task elicitation, a method that automatically builds new evaluations to profile model behavior. Task elicitation finds hundreds of natural-language tasks—an order of magnitude more than prior work—where frontier models exhibit systematic failures, in domains ranging from forecasting to online harassment. For example, we find that Sonnet 3.5 over-associates quantum computing and AGI and that o3-mini is prone to hallucination when fabrications are repeated in-context.
more »
« less
- Award ID(s):
- 2442421
- PAR ID:
- 10675370
- Publisher / Repository:
- Association for Computational Linguistics
- Date Published:
- Page Range / eLocation ID:
- 24996 to 25031
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Temporal Logic (TL) bridges the gap between natural language and formal reasoning in the field of complex systems verification. However, in order to leverage the expressivity entailed by TL, the syntax and semantics must first be understood—a large task in itself. This significant knowledge gap leads to several issues: (1) the likelihood of adopting a TL-based verification method is decreased, and (2) the chance of poorly written and inaccurate requirements is increased. In this ongoing work, we present the Pythonic Formal Requirements Language (PyFoReL) tool: a Domain-Specific Language inspired by the programming language Python to simplify the elicitation of TL-based requirements for engineers and non-experts.more » « less
-
Arapaho is an endangered Native American language with fewer than 100 fluent speakers, all elderly. The age and health of the speakers limit our ability to do traditional on-the-ground documentation of language in relation to geography and space. In this project, a virtual reality (VR) elicitation process was developed in collaboration with elders of the Northern Arapaho Language and Culture Commission. This new place-sensitive method of linguistic documentation uses aerial drone video and 8K resolution, three-dimensional (3D), 360-degree panoramas to visually and auditorily immerse elder consultants in physical locations oriented around the traditional Arapaho worldview. This method virtually transports elder speakers of Indigenous languages into places and contexts that may be physically impossible for them to visit in person, allowing them to recall place-based cultural and ecological knowledge. Analysis of interview data resulting from VR elicitation shows it to be comparable to other language documentation techniques in terms of the quality of the data while also possessing unique attributes and utility.more » « less
-
Transfer learning using ImageNet pre-trained models has been the de facto approach in a wide range of computer vision tasks. However, fine-tuning still requires task-specific training data. In this paper, we propose N3 (Neural Networks from Natural Language) - a new paradigm of synthesizing task-specific neural networks from language descriptions and a generic pre-trained model. N3 leverages language descriptions to generate parameter adaptations as well as a new task-specific classification layer for a pre-trained neural network, effectively “fine-tuning” the network for a new task using only language descriptions as input. To the best of our knowledge, N3 is the first method to synthesize entire neural networks from natural language. Experimental results show that N3 can out-perform previous natural-language based zero-shot learning methods across 4 different zero-shot image classification benchmarks. We also demonstrate a simple method to help identify keywords in language descriptions leveraged by N3 when synthesizing model parameters.more » « less
-
We address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves better performance on selection, generation, and retrieval than baselines.more » « less
An official website of the United States government

