<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Mitigating Insecure Outputs in Large Language Models (LLMs): A Practical Educational Module</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>07/02/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10508466</idno>
					<idno type="doi"></idno>
					<title level='j'>Proc. of The 48th IEEE International Conference on Computers, Software, and Applications (COMPSAC 2024)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Md Barek</author><author>Md Rahman</author><author>Mst Akter</author><author>ABM Riad</author><author>Md Rahman</author><author>Hossain Shahriar</author><author>Akond Rahman</author><author>Fan Wu</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Large Language Models (LLMs) have an extensive ability to produce promising output. People increasingly rely on them because they are easy to access and deliver rapid, high-quality results. However, using these results without appropriate scrutiny poses serious security risks, particularly when they are integrated with other software, APIs, or plugins, because LLM outputs are highly dependent on the prompts they receive. It is therefore essential to carefully clean these outputs before using them in other software environments. This paper is designed to teach students about the potential dangers of contaminated LLM output in the context of web development through pre-lab, hands-on, and post-lab experiences. The hands-on lab provides practical guidance, with real-world examples in Python, on how to handle LLM vulnerabilities and keep applications safe. This approach aims to give students a deeper understanding of the precautions necessary to secure software against the vulnerabilities introduced by LLM output.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Large Language Models (LLMs) have rapidly gained popularity worldwide and have been widely adopted by a diverse range of users for generating human-like text. LLMs have also recently been widely accepted in academia. The output of an LLM is largely determined by its input prompt and can be modified iteratively by changing the prompt. This means that although an LLM relies on a large amount of training data, its final output can still be controlled by carefully crafting the prompts <ref type="bibr">[1]</ref>. Although LLM output varies across contexts, this paper focuses on specific areas in which students can learn about possible vulnerabilities in LLM output and how to handle them properly in the real world. It also aims to equip students with the knowledge and skills necessary to critically evaluate and secure LLM output, ensuring safer integration into web development and other software applications.</p><p>Our approach uses authentic learning <ref type="bibr">[2]</ref> concepts to equip students with the skills needed to identify and address security vulnerabilities in LLM output. We focus on hands-on learning because it deepens understanding and proficiency by directly involving students with real-world problems and solutions. Engaging in practical tasks not only enhances active learning but also builds a sense of competence and mastery, significantly improving knowledge acquisition. 
Furthermore, by integrating authentic learning principles, we highlight the relevance and applicability of the skills taught, making it easier to transfer and apply this knowledge in real-world situations <ref type="bibr">[3,</ref><ref type="bibr">4]</ref>.</p><p>To this end, we propose authentic learning modules designed to guide students in handling insecure outputs from LLMs. The modules include pre-lab orientations, hands-on activities, and post-lab reflections, creating a comprehensive educational experience that prepares students for their future endeavors. Figure <ref type="figure">1</ref> gives a visual overview: it illustrates the complete process of cleaning the output of an LLM to ensure that it is safe to use as input for other software services. The process starts with an arbitrary actor who intentionally offers a biased prompt to the LLM in order to generate vulnerable results, which are then passed to a tool that filters out the vulnerable parts. After proper cleansing, the final output is safe and can be fed into other software, APIs, or plugins.</p><p>We present students with real-world scenarios illustrating the risks of directly rendering HTML and JavaScript in browsers without proper security measures. We discuss how image and iframe HTML tags, when their src attributes are improperly generated by the LLM, can initiate HTTP requests to external sites, potentially compromising browser security and leading to data privacy violations. Additionally, we cover the dangers of arbitrary executable JavaScript code generated by LLMs, which can be harmful when rendered directly in browsers without further scrutiny. 
To address these issues, we show in Python how to clean HTML and JavaScript using the escape method, and we apply regular expressions to remove JavaScript code from the HTML.</p><p>We also present a case study on URL query strings that are produced by LLMs. If these query strings contain malicious data, they pose a significant risk to web servers. To mitigate this risk, we use Python's quote method to safely encode these query strings into URLs.</p><p>Moreover, we cover SQL injection, a well-known concern in database management systems. We explain how SQL queries, if partially generated by LLMs and used without scrutiny, can expose databases to attacks. To safeguard databases, we encourage the use of parameterized query execution, which helps ensure the security and integrity of database operations.</p><p>The rest of the paper is structured as follows. Section II outlines the labware setup, which is divided into subsections covering pre-lab, hands-on, and post-lab activities. Section III discusses related work, Section IV details the results of the student surveys, Section V explores potential directions for future research, and Section VI concludes the paper.</p></div>
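<div xmlns="http://www.tei-c.org/ns/1.0"><p>The HTML and JavaScript cleaning step described in the introduction can be sketched roughly as follows. This is a minimal illustration, not the labware's actual code: it assumes the escape method refers to html.escape from the Python standard library, and the function names and the script-matching pattern are hypothetical choices made for this example.</p>
<p><![CDATA[
```python
import html
import re

# Hypothetical pattern (an assumption for this sketch): matches
# <script>...</script> blocks case-insensitively, including multi-line ones.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(llm_output: str) -> str:
    """Remove <script> elements from LLM-generated HTML."""
    return SCRIPT_BLOCK.sub("", llm_output)


def escape_for_html(llm_output: str) -> str:
    """Escape &, <, > and quotes so a browser displays the text instead of parsing it."""
    return html.escape(llm_output, quote=True)


def clean_llm_output(llm_output: str) -> str:
    """Strip script blocks first, then escape whatever markup remains."""
    return escape_for_html(strip_scripts(llm_output))


if __name__ == "__main__":
    # Untrusted text standing in for an LLM response.
    untrusted = 'Hi<script>alert(1)</script><img src="http://evil.example/x.png">'
    print(clean_llm_output(untrusted))
```
]]></p>
<p>In this sketch the img tag survives only as escaped text, so a browser would not issue the external request. A regular expression on its own does not catch every script vector (event-handler attributes, for example), which is why the remaining markup is escaped as well.</p></div>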
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. LABWARE SETUP</head><p>The labware is organized into three main sections: pre-lab, hands-on, and post-lab. They are structured to offer a comprehensive learning experience that lets students progressively build their knowledge from the basic to the advanced level. The pre-lab section equips learners with essential foundational knowledge of the topic. This is followed by the hands-on lab, which offers detailed insights and practice with Python examples, deepening understanding and skill. Lastly, the post-lab section encourages students to delve deeper into the subject and supports further research. We have also provided URLs to our Google site so that the labware can be accessed easily from anywhere. Figure <ref type="figure">2</ref> depicts the labware design as a learning pathway: the base of the pyramid represents the foundational aspects, the middle section encompasses the hands-on experience, and the upper part, the post-lab, advances learners to a higher level of expertise.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Pre-lab</head><p>The pre-lab section covers the basics of common web vulnerabilities such as Cross-Site Request Forgery (CSRF), Cross-Site Scripting (XSS), Server-Side Request Forgery (SSRF), and SQL injection. It shows how these problems can appear in the outputs of Large Language Models (LLMs) when they are used in web development. It also explains why these issues must be handled carefully, especially when the outputs are used directly as inputs to other software services without being checked. This is essential to prevent security risks that could harm software and its associated data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Hands-On</head><p>In the hands-on lab section, students dive deeper into the practical aspects of handling outputs from Large Language Models (LLMs) in order to improve security and reduce vulnerabilities in software development workflows. This section provides real-world examples, complete with Python code snippets, detailed explanations, and screenshots, showing how to address vulnerabilities associated with LLM output. We use an open-source LLM from Hugging Face's Transformers library and develop a Python method that accepts a prompt and generates an output. This method is flexible: it can handle any given instruction and returns the model's response to the prompt provided. By calling this method repeatedly, students can experiment freely and observe how the LLM might generate biased output.</p><p>Web browsers are the primary means through which users interact with the internet daily, but many users do not regularly update their browsers. As a result, their devices are at risk of being compromised when they visit vulnerable or insecure websites <ref type="bibr">[5,</ref><ref type="bibr">6]</ref>. Rendering unfiltered content directly in a browser can pose serious security threats <ref type="bibr">[7]</ref>. Likewise, accepting arbitrary content in APIs and plugins is risky because it might contain malicious data or executable code.</p><p>In the context of web development, we discuss how directly rendering this output in a browser can lead to serious security risks such as XSS <ref type="bibr">[8]</ref> and CSRF <ref type="bibr">[9]</ref> if it is not properly cleaned. We demonstrate how these outputs can be cleaned using Python's 'escape' and 'quote' methods and regular expressions. 
We explain how a query string for a URL can expose an application to Server-Side Request Forgery (SSRF) <ref type="bibr">[10,</ref><ref type="bibr">11]</ref> if it is generated by an LLM, and we show techniques such as URL encoding to mitigate this issue. Furthermore, we explain the risks of SQL injection attacks <ref type="bibr">[12]</ref>, a well-known issue in databases, particularly when parts of SQL queries are derived from LLM output, and show how to prevent such attacks.</p><p>Fig. <ref type="figure">2</ref>: Labware Setup</p></div>
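<div xmlns="http://www.tei-c.org/ns/1.0"><p>The URL-encoding and parameterized-query mitigations discussed in this section might look like the sketch below. It is an illustration under stated assumptions rather than the labware's code: the quote method is taken to be urllib.parse.quote, SQLite (via the standard sqlite3 module) stands in for the database, which the paper does not name, and the function names, table, and URLs are hypothetical.</p>
<p><![CDATA[
```python
import sqlite3
from urllib.parse import quote


def build_search_url(base_url: str, llm_value: str) -> str:
    """Percent-encode an LLM-generated value before placing it in a query string."""
    # safe="" also encodes "/", which quote() leaves untouched by default.
    return f"{base_url}?q={quote(llm_value, safe='')}"


def find_user(conn: sqlite3.Connection, llm_name: str) -> list:
    """Look up rows with a parameterized query; llm_name is bound as data, not SQL."""
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (llm_name,))
    return cursor.fetchall()


if __name__ == "__main__":
    # Hypothetical base URL and LLM-produced value.
    print(build_search_url("https://example.com/search", "a&b=1 http://internal/"))

    # Hypothetical in-memory table used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    print(find_user(conn, "alice"))
    # A classic injection string is treated as a literal name and matches nothing.
    print(find_user(conn, "alice' OR '1'='1"))
```
]]></p>
<p>Because the value is percent-encoded, characters such as the ampersand, the equals sign, and the slash cannot add extra parameters or smuggle a second URL into the query string. Similarly, the placeholder keeps the LLM-derived text out of the SQL statement itself, so it cannot change the structure of the query.</p></div>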
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Post-lab</head><p>The post-lab section is designed to inspire students to dive deeper into the subject matter and engage in more thorough research. It encourages them to reflect on their hands-on experiences and apply their newly acquired knowledge in broader contexts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. RELATED WORKS</head><p>Cal&#242; et al. <ref type="bibr">[13]</ref> introduce a way to use large language models (LLMs) to help people generate websites by describing what they want in plain language through prompts. Their main idea is to guide the LLM to produce a website draft using a setup that makes the LLM adhere to a given template. They show how users describe website features and how the LLM generates the corresponding HTML and CSS in response. Le et al. <ref type="bibr">[14]</ref> discuss the use of LLMs such as ChatGPT and Bard for the automatic repair of vulnerabilities in JavaScript code. The authors also address how LLM-generated outputs can be refined to reduce risks, especially in programming and web development scenarios where security is critical. Hong et al. <ref type="bibr">[15]</ref> introduce the Knowledge-to-SQL framework, designed to improve SQL generation from text by integrating a specialized Data Expert Large Language Model (DELLM). Their work addresses the limitations of existing text-to-SQL models, which often produce inaccurate SQL because of an incomplete understanding of the database schema and the context of the queries. The paper details the development and fine-tuning of DELLM, including a novel training strategy through natural language processing. Oh et al. <ref type="bibr">[16]</ref> explore the real-world implications of poisoning attacks on AI-powered coding assistant tools such as ChatGPT. They investigate how such attacks can introduce vulnerabilities into the code suggestions provided by LLM tools, potentially leading to security breaches in software development. Siddiq and Santos <ref type="bibr">[17]</ref> examine security vulnerabilities in code produced by LLMs. 
They highlight that while the functional correctness of code is frequently assessed, security aspects are often neglected.</p><p>To address this, the authors introduce the SALLM framework, which includes a new dataset of security-focused Python prompts for testing generated code, along with metrics specifically designed to evaluate the security of code generated by LLMs. This framework aims to reduce potential security risks in automatically generated code. He and Vechev <ref type="bibr">[18]</ref> focus on enhancing the security of LLM-generated code through a framework named SVEN, which introduces controlled code generation: code security is steered via binary properties that guide the LLM to produce secure code. Luo et al. <ref type="bibr">[19]</ref> propose a method called Guide-Align, which focuses on improving the safety and quality of LLM outputs. Their approach involves creating a comprehensive library of detailed guidelines through a two-stage process in which a safety-trained model first identifies potential risks and formulates appropriate guidelines for different inputs. A retrieval model then uses these guidelines to guide LLMs during response generation, so that outputs are safe, reliable, and aligned with human values. Jesse et al. <ref type="bibr">[20]</ref> investigate the susceptibility of Codex, the model behind GitHub's AI-enabled Copilot, to generating coding errors, focusing on a type of bug known as single statement bugs (SStuBs). Codex is trained on public GitHub code that may contain bugs and vulnerabilities. This research finds that while Codex can help avoid some types of these simple bugs, it is twice as likely to produce known, exact copies of these bugs as to produce correct code. The authors propose strategies to minimize the occurrence of these bugs while increasing the generation of accurate, bug-free code. 
V&#246;r&#246;s et al. <ref type="bibr">[21]</ref> introduce a method for URL categorization using LLMs, aimed at enhancing web content filtering to protect organizations from legal and ethical risks. It restricts access to high-risk or dubious sites and promotes a secure, professional workplace.</p><p>While these studies provide useful information, none specifically addresses how to manage insecure LLM output in the academic domain. Although learning methods such as project-based, case study-based, and authentic learning have been used extensively in numerous fields, there is still a gap in the literature on insecure LLM output handling in the web development context. Our research therefore seeks to bridge this gap by developing authentic learning modules on managing insecure LLM output for the educational sector.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. STUDENTS SURVEY</head><p>We conducted pre-lab and post-lab surveys on large language model security. The surveys included both quantitative and qualitative questions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Pre-lab Survey</head><p>We asked students about their prior work experience related to computers and artificial intelligence and found a range of roles, such as Android Developer, Programmer, Software Developer, and Software Engineer. The students were then asked to rate their level of experience with Large Language Models (LLMs). According to the pre-lab survey, most students lacked substantial experience: 33% had no experience and 66% reported at most a moderate level of knowledge. We also asked about their experience with vulnerabilities in LLMs. Almost 45% of the students had almost no experience in this area, and the rest had very little knowledge of it. Furthermore, we asked students about their learning preferences, with five response options: Strongly Agree, Agree, Neutral, Disagree, and Strongly Disagree. A significant share of students, 78%, strongly agreed that they learn better through hands-on labs. This suggests that hands-on learning is more effective for these students.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Post-lab Survey</head><p>In the post-lab survey, we collected responses from a total of 9 students. The survey was conducted after the students had participated in the hands-on module. We asked the students about the benefits of the hands-on lab and whether the pre-lab session adequately prepared them to grasp the topic. The results indicate that authentic learning practices for cybersecurity issues in LLM output are viewed positively. The survey included questions with five options: Strongly Agree, Agree, Neutral, Disagree, and Strongly Disagree. Between 78% and 89% of the responses on the benefits of the post-lab session concerning LLM security were "Strongly Agree". Nobody disagreed or strongly disagreed with our authentic learning process (Table <ref type="table">II</ref>).</p><p>Students not only answered the questions we asked but also provided insightful feedback in response to an open-ended question: "Please add any additional comments about this LLM security hands on project, either what you liked or disliked and make any suggestions for further improvement". Here are some positive comments we received: "The hands on module is great to build basic foundation on Software security on Large Language Model Output Concerns", "I liked the detail examples and code for demo on LLM vulnerabilities", "liked the process", "The labs are well designed and conveyed the concepts clearly with code and example", "Really helpful explanation of each line of code".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. FUTURE RESEARCH DIRECTIONS</head><p>In our research, we concentrated exclusively on handling insecure LLM outputs within the context of web development. However, it is also essential to evaluate LLM responses across other use cases, and further research is needed to examine the results produced by LLMs in those settings. LLM output needs thorough scrutiny in areas such as healthcare advice, legal and compliance, business and finance, public safety and emergency response, journalism and media, customer service and support, personal data and privacy, and interactions with children. As we increasingly rely on AI-enabled services in our daily lives, it is crucial to carefully examine LLM output before making decisions in these critical areas. Furthermore, we aim to share our Google site with a larger group of students, possibly more than 1500, so that they can learn from it and give us feedback that we can use to improve the lab.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION</head><p>Our labware is designed to tackle the challenges of learning how to handle insecure outputs from Large Language Models (LLMs), with a focus on software security through authentic learning. We have structured the labware progressively to guide students from basic to advanced levels. To bridge the gap between typical learning environments and engaging practical experiences, we have proposed an approach that provides both theoretical knowledge and practical, code-level training in keeping with authentic learning. The preliminary feedback from students has been very positive. They have not only understood the concepts but have also actively applied these skills in the hands-on labs. This reinforces our belief that practical experience combined with theoretical understanding is crucial in educating students about insecure output handling for LLMs. Through this labware, we aim to equip students with the skills they need to navigate and mitigate the vulnerabilities of LLM outputs in their future work in software development.</p></div></body>
		</text>
</TEI>
