Large Language Models (LLMs) have become pivotal in reshaping the world by enabling advanced natural language processing tasks such as document analysis, content generation, and conversational assistance. Their ability to process and generate human-like text has unlocked unprecedented opportunities across domains such as healthcare, education, and finance. However, commercial LLM platforms face several limitations, including data privacy concerns, context size restrictions, lack of parameter configurability, and limited evaluation capabilities. These shortcomings hinder their effectiveness, particularly in scenarios involving sensitive information, large-scale document analysis, or the need for customized output. This underscores the need for a tool that combines the power of LLMs with enhanced privacy, flexibility, and usability. To address these challenges, we present EvidenceBot, a local, Retrieval-Augmented Generation (RAG)-based solution designed to overcome the limitations of commercial LLM platforms. EvidenceBot enables secure and efficient processing of large document sets through its privacy-preserving RAG pipeline, which extracts and appends only the most relevant text chunks as context for queries. The tool allows users to experiment with hyperparameter configurations, optimizing model responses for specific tasks, and includes an evaluation module to assess LLM performance against ground truths using semantic and similarity-based metrics. By offering enhanced privacy, customization, and evaluation capabilities, EvidenceBot bridges critical gaps in the LLM ecosystem, providing a versatile resource for individuals and organizations seeking to leverage LLMs effectively.
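The retrieval step described above, selecting only the most relevant chunks and appending them to the query, can be sketched roughly as follows. This is an illustrative sketch, not EvidenceBot's implementation: the abstract does not say how relevance is scored, so a simple bag-of-words cosine similarity is assumed here as a stand-in, and the function names (`tokenize`, `cosine`, `top_k_chunks`, `build_prompt`) and the prompt layout are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first.

    Assumption: relevance is bag-of-words cosine similarity; a real RAG
    pipeline would more likely compare dense embeddings.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Append only the retrieved chunks to the query as context."""
    context = "\n".join(top_k_chunks(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "The patient was prescribed aspirin daily.",
        "Quarterly revenue grew by ten percent.",
        "Aspirin dosage for the patient was 81 mg.",
    ]
    print(build_prompt("What aspirin dosage was the patient given?", docs))
```

Because only the top-ranked chunks are placed in the prompt, unrelated text (the revenue sentence in the example) is never sent to the model, which is what keeps the context small.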
                            Navigating the Dual Facets: A Comprehensive Evaluation of Sequential Memory Editing in Large Language Models
                        
                    
    
            Memory Editing (ME) has emerged as an efficient method to modify erroneous facts or inject new facts into Large Language Models (LLMs). Two mainstream ME methods exist: parameter-modifying ME and parameter-preserving ME (integrating extra modules while preserving original parameters). Regrettably, previous studies on ME evaluation have two critical limitations: (i) they evaluate LLMs with a single edit only, neglecting the need for continuous editing, and (ii) they focus solely on basic factual triples, overlooking broader LLM capabilities such as logical reasoning and reading comprehension. This study addresses these limitations with three contributions: (i) We explore how ME affects a wide range of fundamental capabilities of LLMs under sequential editing. Experimental results reveal an intriguing phenomenon: most parameter-modifying ME methods consistently degrade performance across all tasks after a few sequential edits. In contrast, parameter-preserving ME effectively maintains LLMs’ fundamental capabilities but struggles to accurately recall edited knowledge presented in a different format. (ii) We extend our evaluation to different editing settings, such as the layers to edit, model size, and instruction tuning. Experimental findings indicate several strategies that can potentially mitigate the adverse effects of ME. (iii) We further explain why parameter-modifying ME damages LLMs along three dimensions: parameter changes after editing, language modeling capability, and in-context learning capability. Our in-depth study advocates more careful use of ME in real-world scenarios.
- Award ID(s):
- 2238940
- PAR ID:
- 10629365
- Publisher / Repository:
- Association for Computational Linguistics
- Date Published:
- Page Range / eLocation ID:
- 13755 to 13772
- Format(s):
- Medium: X
- Location:
- Bangkok, Thailand
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            The ability to edit 3D assets with natural language presents a compelling paradigm to aid in the democratization of 3D content creation. However, while natural language is often effective at communicating general intent, it is poorly suited for specifying exact manipulation. To address this gap, we introduce ParSEL, a system that enables controllable editing of high-quality 3D assets with natural language. Given a segmented 3D mesh and an editing request, ParSEL produces a parameterized editing program. Adjusting these parameters allows users to explore shape variations with exact control over the magnitude of the edits. To infer editing programs that align with an input edit request, we leverage the abilities of large language models (LLMs). However, we find that although LLMs excel at identifying the initial edit operations, they often fail to infer complete editing programs, resulting in outputs that violate shape semantics. To overcome this issue, we introduce Analytical Edit Propagation (AEP), an algorithm which extends a seed edit with additional operations until a complete editing program has been formed. Unlike prior methods, AEP searches for analytical editing operations compatible with a range of possible user edits through the integration of computer algebra systems for geometric analysis. Experimentally, we demonstrate ParSEL's effectiveness in enabling controllable editing of 3D objects through natural language requests over alternative system designs.
- 
            A significant amount of research is focused on developing and evaluating large language models for a variety of code synthesis tasks. These include synthesizing code from natural language, synthesizing tests from code, and synthesizing explanations of code. In contrast, the behavior of instructional code editing with LLMs is understudied. These are tasks in which the model is provided a block of code and an instruction to modify the code. The editing instruction may ask for a feature to be added or removed, describe a bug and ask for a fix, or ask for a different kind of solution. We introduce a carefully crafted benchmark of code editing tasks and use it to evaluate several cutting-edge LLMs. Our evaluation exposes a significant gap between the capabilities of state-of-the-art open and closed models. For example, even GPT-3.5-Turbo is better than the best open model at code editing tasks. We also introduce a new, carefully curated, permissively licensed training dataset of code editing tasks coupled with natural language instructions. Using this training dataset, we show that we can fine-tune open Code LLMs to significantly improve their code editing capabilities, closing the gap between open and closed models. All code, data, and models are available at https://github.com/nuprl/CanItEdit.
- 
            Large language models (LLMs) have achieved remarkable performance on various natural language tasks. However, they are trained on static corpora, and their knowledge can quickly become outdated in a fast-changing world. This motivates the development of knowledge editing (KE) to update specific knowledge in LLMs without changing unrelated knowledge or compromising their pre-trained capabilities. Previous efforts sought to update a small number of parameters of an LLM and proved effective for making selective updates. Nonetheless, the edited LLM often exhibits degraded ability to reason about the new knowledge. In this work, we identify a key issue: heterogeneous token overfitting (HTO), where the LLM overfits different tokens in the provided knowledge at varying rates. To tackle this, we propose {\NAME}, a token-level smoothing method that mitigates HTO by adaptively refining the target distribution. Theoretically, {\NAME} offers better parameter updates with negligible computation overhead. It also induces an implicit DPO but does not require preference data pairs. Extensive experiments across four editing methods, two LLMs, and diverse scenarios demonstrate the effectiveness and versatility of our method.
 An official website of the United States government