skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, February 13 until 2:00 AM ET on Friday, February 14 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Gao, Ge"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. We study interactive learning of LLM-based language agents based on user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent response to personalize it based on their latent preference, in addition to improving the correctness. The edit feedback is naturally generated, making it a suitable candidate for improving the agent's alignment with the user's preference, and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE that infers a description of the user's latent preference based on historic edit data. The inferred user preference descriptions are used to define prompts for generating responses in the future. This avoids fine-tuning the agent, which is costly, challenging to scale with the number of users, and may even degrade its performance on other tasks. Furthermore, learning descriptive preference improves interpretability, allowing the user to view and modify the learned preference. However, user preference can be complex, subtle, and vary based on context, making it challenging to learn. To address this, we propose a simple yet effective algorithm named CIPHER that leverages the LLM to infer the user preference for a given context based on user edits. In the future, CIPHER retrieves inferred preferences from the k-closest contexts in the history, and forms an aggregate preference for response generation. We introduce two interactive environments -- summarization and email writing, and use a GPT-4 simulated user for evaluation. On both tasks, CIPHER outperforms several baselines by achieving the lowest edit distance cost while only having a small overhead in LLM query cost. Our analysis reports that user preferences learned by CIPHER show significant similarity to the ground truth latent preferences. 
    more » « less
    Free, publicly-accessible full text available December 11, 2025
  2. Free, publicly-accessible full text available December 10, 2025
  3. When seeking information from unfamiliar documents, users frequently pose questions that cannot be answered by the documents. While existing large language models (LLMs) identify these unanswerable questions, they do not assist users in reformulating their questions, thereby reducing their overall utility. We curate CouldAsk, an evaluation benchmark composed of existing and new datasets for document-grounded question answering, specifically designed to study reformulating unanswerable questions. We evaluate state-of-the-art open-source and proprietary LLMs on CouldAsk. The results demonstrate the limited capabilities of these models in reformulating questions. Specifically, GPT-4 and Llama2-7B successfully reformulate questions only 26% and 12% of the time, respectively. Error analysis shows that 62% of the unsuccessful reformulations stem from the models merely rephrasing the questions or even generating identical questions. We publicly release the benchmark and the code to reproduce the experiments. 
    more » « less
    Free, publicly-accessible full text available November 11, 2025
  4. Free, publicly-accessible full text available April 1, 2025
  5. Abstract

    As climate change intensifies, resulting in more severe rainfall events, coastal cities globally are witnessing significant life and property losses. A growingly crucial component for flood prevention and relief are urban storm flood simulations, which aid in informed decision-making for emergency management. The vastness of data and the intricacies of 3D computations can make visualizing the urban flood effects on infrastructure daunting. This study offers a 3D visualization of the repercussions of hurricane storm surge flooding on Galveston, TX residences, illustrating the impact on each structure and road across varied storm conditions. We employ target detection to pinpoint house door locations, using door inundation as a metric to gauge potential flood damage. Within a GIS-based framework, we model the damage scope for residences exposed to varying storm intensities. Our research achieves three core goals: 1) Estimating the storm inundation levels on homes across different storm conditions; 2) Assessing first-floor elevations to categorize housing damages into three distinct groups; and 3) Through visualization, showcasing the efficacy of a proposed dike designed to shield Galveston Island from future storm surge and flood events.

     
    more » « less
  6. This study explores the affective responses and newsworthiness perceptions of generative AI for visual journalism. While generative AI offers advantages for newsrooms in terms of producing unique images and cutting costs, the potential misuse of AI-generated news images is a cause for concern. For our study, we designed a 3-part news image codebook for affect-labeling news images based on journalism ethics and photography guidelines. We collected 200 news headlines and images retrieved from a variety of U.S. news sources on the topics of gun violence and climate change, generated corresponding news images from DALL-E 2 and asked annotators their emotional responses to the human-selected and AI-generated news images following the codebook. We also examined the impact of modality on emotions by measuring the effects of visual and textual modalities on emotional responses. The findings of this study provide insights into the quality and emotional impact of generative news images produced by humans and AI. Further, results of this work can be useful in developing technical guidelines as well as policy measures for the ethical use of generative AI systems in journalistic production. The codebook, images and annotations are made publicly available to facilitate future research in affective computing, specifically tailored to civic and public-interest journalism. 
    more » « less