To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work, we investigate a novel perspective of Code annotation for Code retrieval (hence called “CoaCor”), where a code annotation model is trained to generate a natural language annotation that can represent the semantic meaning of a given code snippet and can be leveraged by a code retrieval model to better distinguish relevant code snippets from others. To this end, we propose an effective framework based on reinforcement learning, which explicitly encourages the code annotation model to generate annotations that can be used for the retrieval task. Through extensive experiments, we show that code annotations generated by our framework are much more detailed and more useful for code retrieval, and they can further improve the performance of existing code retrieval models significantly.
This content will become publicly available on December 5, 2025
Advancing Large Language Models for Spatiotemporal and Semantic Association Mining of Similar Environmental Events
ABSTRACT Retrieval and recommendation are two essential tasks in modern search tools. This paper introduces a novel retrieval‐reranking framework that leverages large language models to enhance spatiotemporal and semantic association mining and the recommendation of relevant, unusual climate and environmental events described in news articles and web posts. The framework uses advanced natural language processing techniques to address the limitations of traditional manual curation, namely high labor costs and lack of scalability. Specifically, we explore an optimized solution that employs cutting‐edge embedding models to semantically analyze spatiotemporal events (news), and we propose a Geo‐Time Re‐ranking strategy that integrates multi‐faceted criteria, including spatial proximity, temporal association, semantic similarity, and category‐instructed similarity, to rank and identify similar spatiotemporal events. We apply the proposed framework to a dataset of four thousand local environmental observer network events, achieving top performance on recommending similar events among multiple cutting‐edge dense retrieval models. The search and recommendation pipeline can be applied to a wide range of similar data search tasks dealing with geospatial and temporal data. We hope that linking relevant events will help the general public gain a better understanding of climate change and its impact on different communities.
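As an illustration only, the multi-criteria re-ranking described in the abstract could be sketched as a weighted sum of four scores. This is not the authors' implementation: the `Event` fields, the weights, the exponential decay scales, and the use of an exact category match as a stand-in for category-instructed similarity are all assumptions made for this sketch.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Event:
    """Hypothetical event record; field names are assumptions."""
    title: str
    lat: float
    lon: float
    day: date
    category: str
    embedding: List[float]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rerank_score(query, cand, weights=(0.4, 0.2, 0.2, 0.2),
                 dist_scale_km=500.0, time_scale_days=30.0):
    """Weighted sum of semantic, spatial, temporal, and category scores.

    Weights and decay scales are illustrative assumptions, not values
    from the paper.
    """
    w_sem, w_geo, w_time, w_cat = weights
    semantic = cosine(query.embedding, cand.embedding)
    # Spatial proximity decays exponentially with distance.
    dist = haversine_km(query.lat, query.lon, cand.lat, cand.lon)
    spatial = math.exp(-dist / dist_scale_km)
    # Temporal association decays exponentially with the gap in days.
    gap_days = abs((query.day - cand.day).days)
    temporal = math.exp(-gap_days / time_scale_days)
    # Assumption: exact category match stands in for category-instructed similarity.
    category = 1.0 if query.category == cand.category else 0.0
    return w_sem * semantic + w_geo * spatial + w_time * temporal + w_cat * category


def rerank(query, candidates, **kwargs):
    """Return candidates sorted from most to least similar to the query."""
    return sorted(candidates, key=lambda c: rerank_score(query, c, **kwargs), reverse=True)


# Toy data: the embeddings are made-up 2-D vectors, not real model output.
query = Event("query", 64.8, -147.7, date(2024, 6, 1), "wildfire", [1.0, 0.0])
near = Event("near", 64.9, -147.6, date(2024, 6, 3), "wildfire", [0.9, 0.1])
far = Event("far", 30.0, -90.0, date(2023, 1, 1), "flood", [0.1, 0.9])
ranked = rerank(query, [far, near])
```

In a retrieval-reranking setup of this kind, a dense retriever would typically supply the candidate list first, and a scoring function like `rerank_score` would then reorder only those candidates.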
- Award ID(s):
- 1853864
- PAR ID:
- 10558769
- Publisher / Repository:
- Wiley-Blackwell
- Date Published:
- Journal Name:
- Transactions in GIS
- Volume:
- 29
- Issue:
- 1
- ISSN:
- 1361-1682
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Developing a universal model that can efficiently and effectively respond to a wide range of information access requests, from retrieval to recommendation to question answering, has been a long-standing goal in the information retrieval community. This paper argues that the flexibility, efficiency, and effectiveness brought by recent developments in dense retrieval and approximate nearest neighbor search have smoothed the path towards achieving this goal. We develop a generic and extensible dense retrieval framework, called framework, that can handle a wide range of (personalized) information access requests, such as keyword search, query by example, and complementary item recommendation. Our proposed approach extends the capabilities of dense retrieval models for ad-hoc retrieval tasks by incorporating user-specific preferences through the development of a personalized attentive network. This allows for a more tailored and accurate personalized information access experience. Our experiments on real-world e-commerce data suggest the feasibility of developing universal information access models by demonstrating significant improvements even compared to competitive baselines specifically developed for each of these individual information access tasks. This work opens up a number of fundamental research directions for future exploration.
-
With the increase in volume of daily online news items, it is more and more difficult for readers to identify news articles relevant to their interests. Thus, effective recommendation systems are critical for a good news consumption experience. Existing news recommendation methods usually rely on the news click history to model user interest. However, there are other signals about user behaviors, such as user commenting activity, which have not been used before. We propose a recommendation algorithm that predicts articles a user may be interested in, given her historical sequential commenting behavior on news articles. We show that, when following this sequential user behavior, the news recommendation problem falls into the class of session-based recommendation. The techniques in this class seek to model users' sequential and temporal behaviors. While we seek to follow the general directions in this space, we face unique challenges specific to news in modeling temporal dynamics, e.g., users' interests shift over time, users comment irregularly on articles, and articles are perishable items with limited lifespans. We propose a recency-regularized neural attentive framework for session-based news recommendation. The proposed method is able to capture the temporal dynamics of both users and news articles, while maintaining interpretability. We design a lag-aware attention and a recency regularization to model the time effect of news articles and comments. We conduct extensive empirical studies on 3 real-world news datasets to demonstrate the effectiveness of our method.
-
Abstract Satellite precipitation retrieval is inherently an underdetermined inverse problem where additional physical constraints could substantially enhance accuracy. While previous studies have explored static (pixel‐based/spatial‐context‐based) environmental variables at discrete satellite observation times, their temporal dynamic information remains underutilized. Building on our earlier finding that retrieval errors depend on storm progression (event stage), we propose a new, physically interpretable mechanism for improving retrievals, namely, leveraging environmental variables' temporal dynamics as proxies for event stages. Using IMERG satellite product and GV‐MRMS as ground‐truth over CONUS (2018–2020), we first demonstrate robust coevolution patterns of environmental variables and satellite errors throughout events, and show that these variables' temporal gradients reliably infer event stages. We then demonstrate that incorporating these variables and their gradients into a machine‐learning post‐processing framework improves retrieval accuracy. This work inspires and guides more thorough utilization of spatiotemporal atmospheric fields encoding rich physical information within advanced machine‐learning frameworks for further algorithm improvement.
-
Episodic memories are records of personally experienced events, coded neurally via the hippocampus and surrounding medial temporal lobe cortex. Information about the neural signal corresponding to a memory representation can be measured in fMRI data when the pattern across voxels is examined. Prior studies have found that similarity in the voxel patterns across repetition of a to-be-remembered stimulus predicts later memory retrieval, but the results are inconsistent across studies. The current study investigates the possibility that cognitive goals (defined here via the task instructions given to participants) during encoding affect the voxel pattern that will later support memory retrieval, and therefore that neural representations cannot be interpreted based on the stimulus alone. The behavioral results showed that exposure to variable cognitive tasks across repetition of events benefited subsequent memory retrieval. Voxel patterns in the hippocampus indicated a significant interaction between cognitive tasks (variable vs. consistent) and memory (remembered vs. forgotten) such that reduced voxel pattern similarity for repeated events with variable cognitive tasks, but not consistent cognitive tasks, supported later memory success. There was no significant interaction in neural pattern similarity between cognitive tasks and memory success in medial temporal cortices or lateral occipital cortex. Instead, higher similarity in voxel patterns in right medial temporal cortices was associated with later memory retrieval, regardless of cognitive task. In conclusion, we found that the relationship between pattern similarity across repeated encoding and memory success in the hippocampus (but not medial temporal lobe cortex) changes when the cognitive task during encoding does or does not vary across repetitions of the event.
