<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue</title></titleStmt>
			<publicationStmt>
				<publisher>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing</publisher>
				<date>11/11/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10557036</idno>
					<idno type="doi"></idno>
					
					<author>Jia-Chen Gu</author><author>Hao-Xiang Xu</author><author>Jun-Yu Ma</author><author>Pan Lu</author><author>Zhen-Hua Ling</author><author>Kai-Wei Chang</author><author>Nanyun Peng</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Model editing is a technique that edits large language models (LLMs) with updated knowledge to alleviate hallucinations without resource-intensive retraining. While current model editing methods can effectively modify a model's behavior within a specific area of interest, they often overlook the potential unintended side effects on the general abilities of LLMs such as reasoning, natural language inference, and question answering. In this paper, we raise concerns that model editing's improvements on factuality may come at the cost of a significant degradation of the model's general abilities. We systematically analyze the side effects by evaluating four popular editing methods on three LLMs across eight representative tasks. Our extensive empirical experiments show that it is challenging for current editing methods to simultaneously improve factuality of LLMs and maintain their general abilities. Our analysis reveals that the side effects are caused by model editing altering the original model weights excessively, leading to overfitting to the edited facts. To mitigate this, a method named RECT is proposed to regularize the edit update weights by imposing constraints on their complexity based on the RElative Change in weighT. Evaluation results show that RECT can significantly mitigate the side effects of editing while still maintaining over 94% editing performance. * Equal contribution.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>As real-world knowledge is dynamically increasing and updating, existing large language models (LLMs) need to constantly incorporate both inherited knowledge and up-to-date information for lifelong learning. Despite continual training, LLMs inevitably manifest hallucinations caused by missing, false or outdated knowledge embedded in their parameters <ref type="bibr">(Zhang et al., 2023;</ref><ref type="bibr">Peng et al., 2023;</ref><ref type="bibr">Ji et al., 2023)</ref>. Due to the intensive computational cost of retraining LLMs, researchers have increasingly focused on model editing (a.k.a., knowledge editing) <ref type="bibr">(Sinitsin et al., 2020;</ref><ref type="bibr">Cao et al., 2021;</ref><ref type="bibr">Dai et al., 2022;</ref><ref type="bibr">Mitchell et al., 2022b;</ref><ref type="bibr">Meng et al., 2022,</ref><ref type="bibr"> 2023;</ref><ref type="bibr">Yao et al., 2023;</ref><ref type="bibr">Zhong et al., 2023;</ref><ref type="bibr">Ma et al., 2023;</ref><ref type="bibr">Zhang et al., 2024)</ref>. This task is to efficiently modify a model's behavior within a specific area of interest through targeted interventions without resource-intensive model retraining.</p><p>Figure <ref type="figure">1</ref>: Demonstration of model editing and its impact on the general abilities of LLMs. Although the factuality of the model has been improved, the general abilities of LLMs, such as question answering, dialogue, named entity recognition, and sentiment analysis, have been substantially impaired after editing. <formula notation="TeX">f_{\theta}</formula> / <formula notation="TeX">f_{\theta_e}</formula> denote the models before / after editing.</p><p>At present, the assessment of editing methods typically involves evaluation along three critical dimensions <ref type="bibr">(Yao et al., 2023)</ref>. First, reliability ensures the edited model can accurately recall the specific edited fact. 
Second, generalization validates the adaptability of the edited model by assessing the model's ability to recall the fact under diverse paraphrase prompts. Finally, locality checks if the edited model's output for unrelated inputs remains consistent after editing. These multifaceted criteria collectively contribute to a nuanced understanding of the effectiveness and robustness of editing methods.</p><p>In this paper, we put forward a critical concern regarding the overall robustness and adaptability of edited models. As shown in Figure <ref type="figure">1</ref>, while model editing methods have demonstrated improved factuality, this may come at a significant cost to the general abilities of LLMs such as summarization, question answering (QA), and natural language inference. We argue that improving model factuality must be balanced with the need to maintain effectiveness across a range of abilities.</p><p>In light of the above issues, we systematically study whether model editing hurts the general abilities of LLMs. This work studies model editing in the single- versus sequential-editing and instance- versus batch-editing settings. The edited models are evaluated on a variety of downstream tasks to see if there are any side effects on performance before and after editing. Extensive empirical experiments are conducted on four popular editing methods: KN <ref type="bibr">(Dai et al., 2022)</ref>, MEND <ref type="bibr">(Mitchell et al., 2022a)</ref>, ROME <ref type="bibr">(Meng et al., 2022)</ref>, and MEMIT <ref type="bibr">(Meng et al., 2023)</ref> applied to three representative LLMs: GPT-2 XL (1.5B) <ref type="bibr">(Radford et al., 2019)</ref>, LLaMA-1 (7B) <ref type="bibr">(Touvron et al., 2023a)</ref>, and LLaMA-2 (7B) <ref type="bibr">(Touvron et al., 2023b)</ref>. 
Eight representative tasks, including reasoning <ref type="bibr">(Cobbe et al., 2021)</ref>, natural language inference <ref type="bibr">(Dagan et al., 2005)</ref>, open-domain QA <ref type="bibr">(Kwiatkowski et al., 2019)</ref>, closed-domain QA <ref type="bibr">(Clark et al., 2019)</ref>, dialogue <ref type="bibr">(Cui et al., 2020)</ref>, summarization <ref type="bibr">(Gliwa et al., 2019)</ref>, named entity recognition <ref type="bibr">(Sang and Meulder, 2003)</ref>, and sentiment analysis <ref type="bibr">(Socher et al., 2013)</ref>, are employed to understand the impact of model editing on the general abilities of LLMs.</p><p>Experimental results show that existing LLMs are not robust to weight perturbations, and editing even a few parameters can significantly affect their general abilities. Strikingly, with a single pass of editing involving fewer than 1% of the parameters, LLaMA-1 (7B) exhibited a drastic performance degradation to nearly 0 on all the tasks we tried. These results demonstrate that current editing algorithms struggle to work effectively in tandem with LLMs to simultaneously improve model factuality and maintain general abilities.</p><p>Furthermore, our analysis of the causes of side effects reveals that current model editing methods change the original model weights too much, resulting in overfitting to new editing facts. The accumulation of overfitting across multiple edits can amplify the negative impact on the general abilities of LLMs. As a result, the edited model can recall new editing facts well but fails to generalize to various downstream tasks. To this end, we design a regularization method named RECT (RElative Change in weighT) to prevent overfitting. In essence, this regularization discourages overly complex editing updates that are more likely to overfit. 
Specifically, the top-k% elements in an edit update weight that change the most according to the relative change in weights are considered the principal editing information and keep their original values, while the remaining elements in this edit update weight are treated as minor contributions to editing and set to zero for regularization. Evaluation results show that the edited models regularized by RECT can effectively mitigate the side effects of editing while still maintaining over 94% editing performance.</p><p>In summary, we demonstrate that although model editing is effective in updating parametric knowledge in a resource-efficient and target-specific way, current methods still have significant flaws in preserving the general abilities of LLMs. Existing research on model editing has excessively pursued altering a model's behavior on specific knowledge while overlooking the premise of not compromising general abilities. This paper points out the urgent shortcomings in model editing and proposes a regularization method to prevent overfitting across multiple edits to rescue the general abilities, calling for follow-up research efforts on trustworthy and robust model editing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Many studies have investigated model editing, including memory-based, meta-learning, and locate-then-edit methods <ref type="bibr">(Wang et al., 2024a;</ref><ref type="bibr">Yao et al., 2023)</ref>. Memory-based methods do not modify model weights but store the editing facts with an external memory <ref type="bibr">(Mitchell et al., 2022b;</ref><ref type="bibr">Zhong et al., 2023)</ref>. <ref type="bibr">Mitchell et al. (2022b)</ref> stored edits in a base model and learned to reason over them to adjust its predictions as needed. The latter two classes of methods are developed to directly modify the internal parameters of models, which is the focus of this paper. On the one hand, meta-learning methods train a hypernetwork to get gradient changes to update model parameters <ref type="bibr">(Cao et al., 2021;</ref><ref type="bibr">Mitchell et al., 2022a)</ref>. <ref type="bibr">Cao et al. (2021)</ref> utilized a hypernetwork to predict the parameter shift at test time. <ref type="bibr">Mitchell et al. (2022a)</ref> learned to transform the fine-tuning gradient into a low-rank decomposition of the gradient. On the other hand, locate-then-edit methods first locate knowledge neurons in LLMs that exhibit a positive correlation with a knowledge expression, and then modify them accordingly <ref type="bibr">(Dai et al., 2022;</ref><ref type="bibr">Meng et al., 2022,</ref><ref type="bibr"> 2023)</ref>. In particular, <ref type="bibr">Dai et al. (2022)</ref> computed the contribution of each neuron to a certain piece of knowledge, then updated or erased that knowledge by modifying these neurons with the embedding vectors of facts. <ref type="bibr">Meng et al. (2022)</ref> located the multi-layer perceptron (MLP) modules storing factual knowledge, and then edited such knowledge by injecting a new key-value pair into the MLP module. 
Besides, some works investigate the evaluation paradigm for model editing <ref type="bibr">(Zhong et al., 2023;</ref><ref type="bibr">Cohen et al., 2024;</ref><ref type="bibr">Ma et al., 2023;</ref><ref type="bibr">Li et al., 2024;</ref><ref type="bibr">Hase et al., 2023;</ref><ref type="bibr">Wu et al., 2023a;</ref><ref type="bibr">Gandikota et al., 2023;</ref><ref type="bibr">Wang et al., 2024b)</ref>. For example, <ref type="bibr">Cohen et al. (2024)</ref> introduced the ripple effects of editing, suggesting that editing a particular fact implies that many other facts need to be updated. Additionally, recent works have also applied editing in various domains, such as changing model personality <ref type="bibr">(Mao et al., 2023)</ref>, editing multimodal models <ref type="bibr">(Cheng et al., 2023)</ref>, and protecting user privacy <ref type="bibr">(Wu et al., 2023b)</ref>.</p><p>A main difference between this work and previous related studies should be highlighted. Those approaches aim to design editing algorithms that improve editing performance, or evaluation paradigms that assess it. In contrast, this study rethinks model editing and explores whether current editing methods inadvertently cause side effects on the underlying general abilities of LLMs. The contemporaneous work <ref type="bibr">(Gupta et al., 2024)</ref> presents a similar finding that model editing at scale leads to catastrophic forgetting, but no mitigation method is proposed. To the best of our knowledge, this paper makes the first call for attention to side effects on a variety of tasks beyond editing performance by presenting a systematic evaluation of four editing methods on three LLMs covering eight tasks. Besides, we also analyze the causes of side effects, and propose a regularization method to prevent editing overfitting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Preliminary</head><p>Model editing involves modifying the memorized facts contained in LMs without retraining to better suit specific tasks or requirements. Various kinds of complex learned beliefs such as logical, spatial, or numerical knowledge are expected to be edited.</p><p>In this paper, we study editing factual knowledge in the form of (subject s, relation r, object o), e.g., (s = United States, r = President of, o = Donald Trump). An LM is expected to recall a memory and predict the next token(s) representing o given a natural language prompt p(s, r) such as "The President of the United States is". Editing a fact is to insert a new knowledge triple (s, r, o*) in place of the current one (s, r, o), where these two triples share the same subject and relation. An editing operation is represented as e = (s, r, o, o*) for brevity. Given a set of editing facts <formula notation="TeX">E = \{e_1, e_2, \ldots\}</formula> and a model f, model editing involves learning a function K that yields an edited LM <formula notation="TeX">f^{*}</formula>: <formula notation="TeX">K(f, E) = f^{*}</formula>.</p><p>To evaluate the effectiveness of editing methods, previous works focus on evaluation along three dimensions <ref type="bibr">(Cao et al., 2021;</ref><ref type="bibr">Mitchell et al., 2022a;</ref><ref type="bibr">Meng et al., 2022,</ref><ref type="bibr"> 2023)</ref>. First and foremost is reliability, which ascertains the ability of the edited model to accurately recall the specific editing facts. The second dimension, generalization, validates the adaptability of the edited model by assessing its ability to recall the editing facts under diverse paraphrase prompts. The last dimension, locality (a.k.a. specificity), verifies the stability of the edited model by examining whether its output for unrelated inputs remains consistent after editing. Due to limited space, readers can refer to Appendix A for more detailed explanations and examples of the evaluation metrics.</p></div>
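<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch of the three dimensions above, the following Python snippet scores one edit with exact-match comparisons. The function name editing_metrics, the dictionary keys, the placeholder answers "o" and "o_star", and the lookup-table stand-ins for a model are all hypothetical choices made for illustration; the metric definitions actually used in the paper are given in its Appendix A and may differ.</p>
<p>
```python
def editing_metrics(f_before, f_after, edit):
    """Score one edit; the dictionary keys used here are hypothetical."""
    target = edit["target_new"]
    # Reliability: the edited model recalls o* on the edit prompt itself.
    reliability = float(f_after(edit["prompt"]) == target)
    # Generalization: fraction of paraphrased prompts answered with o*.
    paraphrases = edit["paraphrases"]
    generalization = sum(f_after(p) == target for p in paraphrases) / len(paraphrases)
    # Locality: fraction of unrelated prompts whose output did not change.
    unrelated = edit["unrelated"]
    locality = sum(f_after(u) == f_before(u) for u in unrelated) / len(unrelated)
    return {
        "reliability": reliability,
        "generalization": generalization,
        "locality": locality,
    }


# Toy stand-ins for a model: lookup tables from prompt to answer (an assumption).
before = {
    "The President of the United States is": "o",
    "Who is the President of the United States?": "o",
    "The US head of state is": "o",
    "The capital of France is": "Paris",
}
after = dict(before)
after["The President of the United States is"] = "o_star"
after["Who is the President of the United States?"] = "o_star"

edit = {
    "prompt": "The President of the United States is",
    "target_new": "o_star",
    "paraphrases": [
        "Who is the President of the United States?",
        "The US head of state is",
    ],
    "unrelated": ["The capital of France is"],
}
scores = editing_metrics(before.get, after.get, edit)
print(scores)
```
</p>
<p>In this toy case the edit is recalled on the original prompt and on one of the two paraphrases, while the unrelated prompt is unaffected, so the three scores separate cleanly.</p></div>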
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Analysis of Side Effects of Editing</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Evaluation Paradigm</head><p>This paper systematically studies the side effects of model editing in the single- versus sequential-editing and instance- versus batch-editing settings. Figure <ref type="figure">2</ref> illustrates these experimental settings. The edited models are evaluated under the zero-shot setting on a variety of downstream tasks unrelated to editing facts to understand the performance before and after editing.</p><p>Single- vs. Sequential-editing Single-editing involves examining the reliability and impact of making a single editing operation to a model. Specifically, it focuses on understanding how a model adapts to such a single alteration, and the implicit effect of such specific modifications on the overall performance. It is worth noting that a single editing operation can contain either only one editing instance or multiple ones in a batch, which is further discussed later. In practice, there are often situations where only a particular change is needed, so it is crucial to understand how effectively the model integrates and preserves that individual edit. Therefore, evaluating a model's robustness to a single edit is crucial in determining its ability to retain the intended changes and overall performance.</p><p>In contrast to single-editing, multiple editing operations are conducted successively in sequential-editing <ref type="bibr">(Huang et al., 2023)</ref>. Similarly, each editing operation in sequential-editing can also contain either only one editing instance or multiple ones in a batch. Ideally, models should retain the changes from previous edits when carrying out a new one <ref type="bibr">(Yao et al., 2023)</ref>, which is decisive for the continual learning of future LLMs. Therefore, whether edited models can still maintain their general abilities after sequential editing is one of the important characteristics that should be considered. 
This analysis explores how the performance of edited models on a variety of tasks changes as the number of edits increases.</p><p>Instance- vs. Batch-editing Instance-editing refers to using only one instance per editing operation to make specific and targeted adjustments to individual pieces of knowledge within LLMs, regardless of the single- or sequential-editing settings. This setting is particularly valuable in situations where certain instances present unique challenges or outliers that require specialized treatment. These fine-grained alterations to model behaviors over individual instances are expected to contribute to more adaptable and accurate LLMs.</p><p>The real world is ever-changing, so there is a huge amount of knowledge that needs to be dynamically added and updated into LLMs. Despite the effectiveness of many instance-editing methods <ref type="bibr">(Dai et al., 2022;</ref><ref type="bibr">Meng et al., 2022;</ref><ref type="bibr">Ma et al., 2023)</ref>, ultimately at most a few dozen pieces of knowledge can be updated <ref type="bibr">(Mitchell et al., 2022b)</ref>, due to their relatively low but still non-negligible editing cost for a single instance. Since naive sequential applications of current state-of-the-art model editing methods fail to scale up <ref type="bibr">(Meng et al., 2023)</ref>, one may wish to update hundreds or thousands of facts simultaneously in batch-editing. Notably, batch-editing can also be coupled with both the single- and sequential-editing settings.</p><p>Zero-shot Learning Zero-shot learning aims to solve tasks without labeled training examples, and recent studies have demonstrated the superiority of LLMs for zero-shot learning <ref type="bibr">(Brown et al., 2020;</ref><ref type="bibr">Wei et al., 2022;</ref><ref type="bibr">Chowdhery et al., 2023)</ref>. Following these studies, we explore the zero-shot performance of unedited and edited models on a variety of tasks. 
Given a task instruction and a test problem that are concatenated as the input, the model is expected to generate a target text to address this problem. The instructions and input formats of different tasks are shown in Appendix B, which are taken from or inspired by <ref type="bibr">Qin et al. (2023)</ref>.</p></div>
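<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four combinations of settings described above can be sketched as a way of grouping a list of editing facts into editing operations. This is only an illustration under stated assumptions: the helper name plan_operations and its parameters are hypothetical, and the actual experiments were run with the EasyEdit tool rather than with code like this.</p>
<p>
```python
def plan_operations(edits, batch_size=1, sequential=True):
    """Group editing facts into editing operations (hypothetical helper).

    batch_size == 1 corresponds to instance-editing, larger values to
    batch-editing. sequential=False keeps only the first operation,
    which corresponds to single-editing.
    """
    if not batch_size >= 1:
        raise ValueError("batch_size must be at least 1")
    operations = [
        edits[i:i + batch_size] for i in range(0, len(edits), batch_size)
    ]
    return operations if sequential else operations[:1]


edits = ["e1", "e2", "e3", "e4", "e5"]

# Instance- and single-editing: one operation containing one fact.
instance_single = plan_operations(edits, batch_size=1, sequential=False)
# Instance- and sequential-editing: one fact per operation, applied in order.
instance_sequential = plan_operations(edits, batch_size=1, sequential=True)
# Batch- and single-editing: one operation containing the whole batch.
batch_single = plan_operations(edits, batch_size=5, sequential=False)
# Batch- and sequential-editing: successive operations of up to two facts.
batch_sequential = plan_operations(edits, batch_size=2, sequential=True)
print(batch_sequential)
```
</p>
<p>Each inner list is one editing operation; the edited model would be evaluated on the downstream tasks after some or all of these operations have been applied.</p></div>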
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Evaluation Setup</head><p>We briefly introduce the experimental setup regarding editing methods, editing datasets, selected LLMs, and representative tasks here. Readers can refer to their corresponding papers for more details.</p><p>Editing Methods Four popular editing methods were selected: (1) KN <ref type="bibr">(Dai et al., 2022)</ref> identified neurons linked to knowledge expression using gradient-based attributions and then enhanced the MLP layer by adding scaled embedding vectors to those specific neurons. (2) MEND <ref type="bibr">(Mitchell et al., 2022a)</ref> learned to transform the fine-tuning gradient into a low-rank decomposition to update model parameters. (3) ROME <ref type="bibr">(Meng et al., 2022)</ref> located the MLP modules storing factual knowledge and edited that knowledge by injecting a new key-value pair into the MLP module. (4) MEMIT <ref type="bibr">(Meng et al., 2023)</ref> edited large amounts of factual data through the updating of a sequence of MLP layers. It is notable that only MEND and MEMIT support batch-editing. All experiments were conducted using the EasyEdit tool <ref type="bibr">(Wang et al., 2024a)</ref>, ensuring standardized and reproducible evaluation. All editing instances were randomly sampled from the editing dataset.</p><p>Readers can refer to Appendix C for details of these editing methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Editing Dataset</head><p>The popular model editing dataset Zero-Shot Relation Extraction (ZsRE) <ref type="bibr">(Levy et al., 2017)</ref> used in previous work <ref type="bibr">(Cao et al., 2021;</ref><ref type="bibr">Meng et al., 2022;</ref><ref type="bibr">Yao et al., 2023</ref>) was adopted in our experiments. ZsRE is a QA dataset using question rephrasings generated by back-translation as the equivalence neighborhood. Each input is a question about an entity, and plausible alternative edit labels are sampled from the top-ranked predictions of a BART-base model trained on ZsRE.</p><p>Selected LLMs Experiments were conducted on three LLMs: GPT-2 XL (1.5B) <ref type="bibr">(Radford et al., 2019)</ref>, LLaMA-1 (7B) <ref type="bibr">(Touvron et al., 2023a)</ref>, and LLaMA-2 (7B) <ref type="bibr">(Touvron et al., 2023b)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Downstream Tasks and Metrics</head><p>To extensively explore whether model editing has side effects on the general abilities of LLMs, eight representative tasks were adopted: (1) Reasoning on GSM8K <ref type="bibr">(Cobbe et al., 2021)</ref>, with results measured by solve rate. (2) Natural language inference (NLI) on RTE <ref type="bibr">(Dagan et al., 2005)</ref>, with results measured by accuracy of two-way classification. (3) Open-domain QA on Natural Questions <ref type="bibr">(Kwiatkowski et al., 2019)</ref>, with results measured by exact match (EM) with the reference answer after minor normalization as in <ref type="bibr">Chen et al. (2017)</ref> and <ref type="bibr">Lee et al. (2019)</ref>. (4) Closed-domain QA on BoolQ <ref type="bibr">(Clark et al., 2019)</ref>, with results also measured by EM. (5) Dialogue on MuTual <ref type="bibr">(Cui et al., 2020)</ref>, with results measured by selecting the best-matched response from four available candidates, denoted as Recall_4@1 as in <ref type="bibr">Lowe et al. (2015)</ref>. (6) Summarization on SAMSum <ref type="bibr">(Gliwa et al., 2019)</ref>, with results measured by the average of ROUGE-1, ROUGE-2 and ROUGE-L as in <ref type="bibr">Lin (2004)</ref>. (7) Named entity recognition (NER) on CoNLL03 <ref type="bibr">(Sang and Meulder, 2003)</ref>, with results measured by entity-level F1-score. (8) Sentiment analysis on SST2 <ref type="bibr">(Socher et al., 2013)</ref>, with results measured by accuracy of two-way classification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Results</head><p>Impact of Sequential-editing Since single-editing can be regarded as a special case of sequential-editing when the number of edits is 1, this subsection mainly discusses instance- and sequential-editing. KN and ROME, which support instance-editing but not batch-editing, were adopted to facilitate this exploration. MEND and MEMIT, which support batch- and sequential-editing, will be explored later in this subsection. Figure <ref type="figure">3</ref> presents the performance on general tasks of edited models using KN or ROME to edit GPT-2 XL and LLaMA-1 (7B) as the number of edits increases. Due to limited space, readers can refer to Appendix D for the results of editing LLaMA-2 (7B), which show similar trends. It can be seen that although there is only one instance per editing operation, the performance of edited models on various tasks fluctuates significantly and shows a downward trend as the number of edits increases. Strikingly, the use of KN resulted in a drastic performance degradation to nearly zero on all selected tasks with just a single edit. These findings underscore two key insights. First, the selected LLMs are not robust to weight perturbations even if less than 1% of the parameters are edited, whereby slight perturbations may significantly affect their general abilities. Second, these outcomes also shed light on the challenging nature of effectively coupling current editing algorithms with LLMs. The difficulty lies in the dual objective of improving model factuality while simultaneously maintaining their general abilities. 
The observed trends indicate that existing editing algorithms face grand challenges in achieving this delicate balance, emphasizing the need for further research and development in the refinement of editing methodologies for LLMs.</p><p>Impact of Batch-editing This subsection examines batch- and single-editing to explore the impact of batch size for scaling up the editing scope. Only MEND and MEMIT, which support batch-editing, were adopted to facilitate this exploration. Figure <ref type="figure">4</ref> presents the performance on general tasks of edited models using MEND or MEMIT to edit GPT-2 XL and LLaMA-1 (7B) with different batch sizes.</p><p>Readers can refer to Appendix D for the results of editing LLaMA-2 (7B). Remarkably, even with only one single editing operation, edited models exhibited a trend of performance degradation as the batch size increases in most cases. This consistent decrease in performance underlines the sensitivity of the models to increases in batch size, emphasizing the significance of carefully scaling knowledge editing for optimal updates. Therefore, we call for more research work on scalable editing to facilitate efficient editing of multiple instances.</p><p>Impact of Batch- and Sequential-editing In order to holistically take into account the interplay between batch size and sequential-editing, a joint setting of batch- and sequential-editing was explored to understand how these two factors collaboratively influence the overall performance of edited models. Figure <ref type="figure">11</ref> to Figure <ref type="figure">16</ref> in Appendix D present the performance of using MEND or MEMIT to edit GPT-2 XL, LLaMA-1 (7B) and LLaMA-2 (7B) respectively as the number of edits increases. These results echo our observations on both sequential-editing and batch-editing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Analysis of Causes of Side Effects</head><p>We show that the side effects of editing come from changing the original model weights too much, resulting in overfitting to the editing facts. This phenomenon can be illustrated through statistics and visualization using ROME to edit GPT-2 XL.</p><p>Statistics We first show how the weights change in instance- and single-editing. Typically, one editing operation is to add an edit update weight &#8710;W to the original weight W, where &#8710;W is calculated aiming to insert new editing facts. Here, we define the absolute value of the relative change in weight, <formula notation="TeX">\delta = \left| \Delta W / W \right|</formula> (computed element-wise), to characterize the degree of change of each element in the update weight &#8710;W. The statistics show that only 20% of the elements in the update weight &#8710;W have &#948; greater than 0.077, while only 10% of the elements have &#948; greater than 0.171. These results are averaged over 100 random single edits from the ZsRE dataset. It can be seen that the update weight &#8710;W is effectively quite sparse, in that most of its elements are minor. The Manhattan distance between the updated and original weights is also calculated as a measure of how much they differ.</p><p>Furthermore, how the weights change in instance- and sequential-editing is shown in Table <ref type="table">1</ref>. As the number of edits increases, the proportion of elements whose &#948; is greater than a certain threshold increases significantly, and the edited weight also diverges further from the original weight. Therefore, the accumulation of overfitting across multiple edits can amplify changes to the original weights.</p><p>Visualization The distinction between the final edited weight and the original unedited weight is illustrated by visualizing the weight change |&#8710;W| as shown in Figure 5. 
It reveals consistent findings that the update weight &#8710;W might be quite sparse, while the accumulation across multiple edits can amplify changes to the original weights.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">RECT: RElative Change in weighT</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Approach</head><p>We have analyzed the causes of side effects in Section 4.4: model editing changes the original model weights too much, resulting in overfitting to the editing facts. This type of editing overfitting occurs when a model learns to fit the new editing data too closely, capturing noise and outliers in the data rather than the underlying patterns. Furthermore, the gradual buildup of editing-induced overfitting across sequential edits can severely impair the general abilities of LLMs. Consequently, while such models may exhibit proficiency in the new editing facts, they often struggle to generalize effectively across a spectrum of downstream tasks. This phenomenon underscores the importance of mitigating overfitting during the editing process to ensure both the improvement of model factuality and the maintenance of their general abilities.</p><p>To this end, this paper designs a regularization method named RElative Change in weighT (RECT) to prevent editing overfitting. Figure <ref type="figure">6</ref> illustrates the overview of this regularization method. 
Typically, one editing operation is to add an edit update weight &#8710;W to the original weight W to derive the updated weight <formula notation="TeX">\overline{W}</formula>, where &#8710;W is calculated aiming to insert a batch of N new editing facts <formula notation="TeX">\{(s, r, o^{*})_i\}_{i=1}^{N}</formula> (N = 1 for a single editing fact). Formally, we have:</p><p><formula notation="TeX">\overline{W} = W + \Delta W, \qquad \Delta W = f\left(\{(s, r, o^{*})_i\}_{i=1}^{N}\right)</formula></p><p>where function f denotes the calculation method of update weight &#8710;W for different editing methods, e.g., ROME <ref type="bibr">(Meng et al., 2022)</ref>.</p><p>Figure <ref type="figure">6</ref>: Overview of RECT. Panels: (a) Non-regularization; (b) RECT Regularization. Each panel shows the original weight W, the (unregularized or regularized) update weight &#8710;W, and the resulting updated weight.</p><p>Here, we define the absolute value of the relative change in weight, <formula notation="TeX">\delta = \left| \Delta W / W \right|</formula> (computed element-wise), to characterize the degree of change of each element in the update weight &#8710;W. To some extent, &#948; can be used to indicate the importance of each element in &#8710;W when inserting the new editing facts. On the one hand, a portion of the elements in the update weight &#8710;W are assumed to constitute the core components of the new editing facts. On the other hand, the remaining elements are assumed to make only minor contributions to editing. Specifically, the top-k% elements in &#8710;W that change the most according to &#948; are considered the principal editing information and keep their original values, while the remaining elements in &#8710;W are treated as minor contributions to editing and set to zero for regularization. 
Mathematically, writing the regularized edit update weight as <formula notation="TeX">\Delta W_{\mathrm{RECT}}</formula>, we have for each element (i, j):</p><p><formula notation="TeX">(\Delta W_{\mathrm{RECT}})_{ij} = \begin{cases} \Delta W_{ij} &amp; \text{if } \delta_{ij} \text{ is among the top-}k\% \text{ of } \delta, \\ 0 &amp; \text{otherwise.} \end{cases}</formula></p><p>Figure 7: Comparison of introducing various regularization methods and how the editing performance changes with respect to different top-k% for RECT. Panels: (a) ROME on GPT-2 XL; (b) ROME on LLaMA-1 (7B); (c) MEMIT on GPT-2 XL; (d) MEMIT on LLaMA-1 (7B). Each panel reports editing performance (0.0 to 1.0) on Reliability, Generalization and Locality for Unregularized, RECT top80%, RECT top60%, RECT top40%, RECT top20%, Random 40% and PCA 40%.</p><p>[Figure: downstream task performance versus the number of edits (0 to 20) for Unregularized, RECT top80%, RECT top60%, RECT top40%, RECT top20%, Random 40% and PCA 40%. Panels: (a) Summarization; (b) Open-domain QA; (c) Closed-domain QA; (d) Sentiment Analysis.]</p><p>Finally, the regularized edit update weight is added to the original weight to derive the regularized updated weight. Essentially, RECT functions to deter excessively intricate editing updates that have a higher propensity to result in overfitting. 
By imposing constraints on the complexity of editing updates, it serves as a safeguard against the model's inclination to adapt too closely to the editing data, thus promoting more generalizable and reliable model performance.</p></div>
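<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection rule described in this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name rect_regularize, the parameter top_k_percent, the rounding used to turn k% into an element count, and the small constant eps that guards against zero entries in W are all hypothetical choices. The example matrices are the 4 by 4 values shown in the illustrative figure of this section.</p>

```python
import numpy as np


def rect_regularize(W, dW, top_k_percent, eps=1e-12):
    """Keep the top-k% elements of dW ranked by |dW / W| and zero the rest.

    eps is an assumed guard against division by zero; it is not from the paper.
    """
    W = np.asarray(W, dtype=float)
    dW = np.asarray(dW, dtype=float)
    # Relative change in weight, delta = |dW / W|, computed element-wise.
    delta = np.abs(dW) / (np.abs(W) + eps)
    # Number of elements to keep (the rounding rule is an assumption).
    n_keep = int(round(delta.size * top_k_percent / 100.0))
    if n_keep == 0:
        return np.zeros_like(dW)
    flat = delta.ravel()
    # Indices of the n_keep largest relative changes, in no particular order.
    keep_idx = np.argpartition(flat, -n_keep)[-n_keep:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[keep_idx] = True
    # Kept elements retain their original values; all others become zero.
    return np.where(mask.reshape(dW.shape), dW, 0.0)


# Example values taken from the illustrative figure.
W = np.array([[2, 6, 5, 8],
              [3, 4, 1, 9],
              [8, 5, 6, 1],
              [6, 4, 3, 2]], dtype=float)
dW = np.array([[0.02, 0.60, 0.05, 0.08],
               [0.03, 0.04, 0.10, 0.09],
               [0.80, 0.05, 0.06, 0.01],
               [0.06, 0.04, 0.03, 0.20]])

dW_reg = rect_regularize(W, dW, top_k_percent=25)
W_updated = W + dW_reg
print(dW_reg)
print(W_updated)
```

<p>With top_k_percent=25, the call keeps the four elements whose relative change is about 0.1 and zeros the other twelve, whose relative change is about 0.01, which matches panel (b) of the figure. Selecting by count rather than by a threshold on &#948; keeps the number of retained elements fixed regardless of the scale of the update.</p></div>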
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Regularization Baselines</head><p>To demonstrate the effectiveness and efficiency of the proposed method RECT, we compare it with the following baselines. Unregularized keeps all elements of the edit update weight &#8710;W. Random k% selects a random k% of the elements of &#8710;W. PCA k% compresses the most important editing information in &#8710;W into k% of the elements via principal component analysis (PCA), and sets the remaining elements to zero.</p></div>
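<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a point of comparison, the Random k% baseline can be sketched as follows. This is an illustrative sketch only: the function name random_k_regularize, the seed parameter, and the rounding rule are hypothetical, and the sketch assumes that the elements not selected are set to zero, which the text states explicitly only for PCA k%. The PCA k% baseline is not sketched here because its exact construction is not specified in this section.</p>

```python
import numpy as np


def random_k_regularize(dW, k_percent, seed=0):
    """Keep a uniformly random k% of the elements of dW.

    Assumption: elements that are not selected are set to zero.
    """
    dW = np.asarray(dW, dtype=float)
    # Number of elements to keep (the rounding rule is an assumption).
    n_keep = int(round(dW.size * k_percent / 100.0))
    rng = np.random.default_rng(seed)
    # Sample distinct flat indices without replacement.
    keep_idx = rng.choice(dW.size, size=n_keep, replace=False)
    mask = np.zeros(dW.size, dtype=bool)
    mask[keep_idx] = True
    return np.where(mask.reshape(dW.shape), dW, 0.0)


# Hypothetical update weight with 20 nonzero entries.
dW_demo = np.arange(1, 21, dtype=float).reshape(4, 5) * 0.01
dW_random = random_k_regularize(dW_demo, k_percent=40, seed=0)
print(np.count_nonzero(dW_random))
```

<p>Unlike RECT, this selection ignores the relative change &#948;, so it serves as a control for whether the choice of elements matters or only their number.</p></div>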
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Results of RECT</head><p>The effectiveness of a regularization method should be assessed from two perspectives. First, regularizing the edit update weight should not harm its editing performance, i.e., edited models should still remember the new editing facts and generalize to related facts. Second, the regularized edited models should preserve the general abilities better than unregularized ones.</p><p>Editing Performance. Figure <ref type="figure">7</ref> presents the results of regularizing ROME or MEMIT on GPT-2 XL or LLaMA-1 (7B). Readers can refer to Appendix D.2 for the results on LLaMA-2 (7B). These results yield the following findings. First, compared with the unregularized &#8710;W, RECT top-40%, which keeps the original values of the top-40% elements in &#8710;W and sets the remaining elements to zero, maintains over 94% of reliability and generalization, and even improves locality. It is natural that reliability and generalization drop slightly when some elements are set to zero, since part of the editing information is removed. Locality probably improves because the elements with low &#948;, which correspond to noise and outliers in the editing data, are removed to prevent editing overfitting, so the edited models are more robust. However, setting too many elements in &#8710;W to zero, e.g., RECT top-20%, might hurt the editing performance, as some important editing information is accidentally removed. Furthermore, compared with Random 40% and PCA 40%, RECT top-40% achieves the best performance, indicating its effectiveness in selecting the principal editing information. Notably, RECT is also more efficient, since it avoids the complex calculations required by PCA.</p><p>General Downstream Task Performance. Figure 8 presents how the downstream task performance changes when various regularization methods are introduced to edit GPT-2 XL. Readers can refer to Appendix D.2 for the results on more downstream tasks. These results yield the following findings. As the proportion of elements in &#8710;W set to zero increases, editing overfitting is regularized more strongly and the change to the original weight becomes smaller, so the general abilities are better preserved. Results show that regularized edited models preserve the general abilities better than unregularized ones on most tasks, such as summarization and open- and closed-domain QA. However, some tasks such as sentiment analysis remain challenging, and it is unclear whether RECT works for a larger number of edits; both are left to future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>Model updating technology has been catalyzing the continuous iteration of advanced and trustworthy LLMs. This paper studies model editing and, for the first time, raises concerns about whether model editing has side effects on the general abilities of LLMs. The systematic evaluation reveals that current methods unintentionally hurt the general abilities of LLMs in both instance- and batch-editing, and in both single- and sequential-editing. Our analysis of the causes reveals that model editing results in overfitting to the editing facts, and that the accumulation of overfitting across multiple edits can amplify the negative impact. The proposed RECT regularization method is shown to effectively prevent overfitting to new editing facts, thus preserving both the editing performance and the general downstream task performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Impact Statements</head><p>As LLMs play an increasingly crucial role in various applications, mitigating the hallucinations caused by missing, false, or outdated knowledge encapsulated within the parameters is imperative for ensuring the reliability of their outputs. However, the potential trade-off between improving factuality and degrading general abilities underscores the need for a balanced approach. Striking the right balance in model editing is crucial to prevent unintended consequences and to preserve the broader abilities of LLMs, contributing to the sustainable advancement of AI technology. This paper highlights the importance of considering not only the immediate improvement in factuality but also the long-term impacts on the general performance and applicability of LLMs, encouraging a thoughtful and comprehensive exploration of model editing techniques for responsible AI development. More importantly, this paper calls for more effort on strengthening the robustness of LLMs to weight perturbations, developing innovative paradigms for model editing, and designing comprehensive evaluations of model editing. By doing so, we can collectively advance the continual development of LLMs, paving the way for more reliable applications in real-world scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Limitations</head><p>This paper studies the side effects of editing based on the ZsRE editing dataset, while more complex and diverse side effects are hypothesized to exist and thus need to be explored on more editing datasets in future work. In addition, although a side-effect mitigation method may be effective for a certain number of edits, it remains to be seen whether it is still effective for a larger number of edits. One editing method is expected to outperform another in terms of the number of edits it supports under the same requirements of maintaining editing performance and general abilities. This paper does not explore whether the proposed method remains effective for more edits, which is worth further study. While we primarily propose to mitigate the side effects of model editing from a statistical perspective, the bottleneck of the general abilities of edited models should also be analyzed theoretically.</p><p>Figure 10: Performance on general tasks of edited models using MEND or MEMIT to edit LLaMA-2 (7B) with different batch sizes in batch- and single-editing. Each panel reports performance on closed-domain QA, sentiment analysis, NLI, open-domain QA, summarization, NER, dialogue, and reasoning for Not Edited, Batch 10, Batch 30, Batch 50, and Batch 70, using the ZsRE dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D Extensive Evaluation Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D.1 Results of Side Effects of Model Editing</head></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>This research is based upon work supported by an Amazon AGI foundation research award, a Google Research Scholar grant, a CISCO sponsored research award, and NSF #2331966. We thank Tanmay Parekh, Po-Nien Kung, Sidi Lu, Fabrice Harel-Canada, UCLA NLP group members, and anonymous reviewers for their valuable feedback.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>https://github.com/EleutherAI/knowledge-neurons</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>https://github.com/eric-mitchell/mend</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3"><p>https://github.com/kmeng01/rome</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4"><p>https://github.com/kmeng01/memit</p></note>
		</body>
		</text>
</TEI>
