<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Impact of Adversarial Patches on Object Detection with YOLOv7</title></titleStmt>
			<publicationStmt>
				<publisher>Hampton</publisher>
				<date>12/31/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10579937</idno>
					<idno type="doi"></idno>
					
					<author>Darrien Hunt</author><author>C Boonthum-Denecke</author><author>I Mkpong-Ruffin</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[With the increased use of machine learning models, there is a need to understand how machine learning models can be maliciously targeted. Understanding how these attacks are ‘enacted’ helps in being able to ‘harden’ models so that it is harder for attackers to evade detection. We want to better understand object detection, the underlying algorithms, different perturbation approaches that can be utilized to fool these models. To this end, we document our findings as a review of existing literature and open-source repositories related to Computer Vision and Object Detection. We also look at how Adversarial Patches impact object detection algorithms. Our objective was to replicate existing processes in order to reproduce results to further our research on adversarial patches.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Computer Vision is a subfield of Artificial Intelligence (AI) that gives computers the ability to "see" and process visual input, extracting useful information and performing a task in response. This technology can be integrated into many devices in use today to improve their capabilities. One prominent capability under the umbrella of computer vision is Object Detection, which allows computers to use cameras to locate and track objects in images and analyze them. Image analysis is generally performed by training deep learning models on a large set of images so that they can classify objects accurately. Images are classified by extracting visual features from them, usually by scanning a sliding window across the image, and using those features to make distinctions <ref type="bibr">[1]</ref>. As object detection technology has progressed, various algorithms such as YOLO (You Only Look Once) <ref type="bibr">[3]</ref> have been developed. These have enabled real-time object detection research with models capable of high detection speed and accuracy.</p><p>Compared to previous versions of YOLO, YOLOv7 benefits from an improved network architecture and a reduced need for expensive computation <ref type="bibr">[4]</ref>. YOLOv7 is built on Convolutional Neural Networks (CNNs), a class of learning algorithms used for image processing. A CNN takes an image or video frame as an array of pixels, slides a window over it, and extracts features from each window position. The useful features found in each frame are then pooled together. Classification can then be performed on the output of the network to determine what the computer saw.
Like other neural networks, CNNs are trained with backpropagation, which passes error values back through the network so that its weights can be adjusted and the model trained more efficiently.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Convolutional Neural Networks (CNNs)</head><p>Convolutional Neural Networks (CNNs) are a type of deep neural network widely used in computer vision tasks, including object detection <ref type="bibr">[5]</ref>. CNNs are designed to learn features automatically from raw input data, such as images, by applying a series of convolutional filters that scan the image at different scales and orientations, looking for specific patterns or features.</p><p>In two-stage object detectors, CNNs are used for region proposal and then for object classification. In the first stage, the CNN is used to generate a set of candidate object regions in the image, which are then passed to the second stage for classification. The candidate regions are typically generated using a technique called selective search, which identifies regions that are likely to contain objects based on their color, texture, and other visual cues. In the second stage, the CNN is used to classify each candidate region as either containing an object or not. This is typically done with a sliding window approach, where a small window is moved across each candidate region and the CNN classifies the contents of the window. If the CNN determines that the window contains an object, the region is considered a positive detection and the object is localized within the region.</p><p>CNNs have been shown to be highly effective in object detection, achieving state-of-the-art performance on many benchmark datasets. However, they require large amounts of training data and can be computationally expensive, especially for real-time applications. To address these challenges, researchers have developed architectures such as the YOLO <ref type="bibr">[3]</ref> and Faster R-CNN models <ref type="bibr">[6]</ref>, which aim to balance accuracy and speed.</p></div>
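<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the idea of a filter scanning an image concrete, the following is a minimal sketch of a single convolutional filter in plain Python. It is an illustration only and not the YOLOv7 implementation: the function name convolve2d, the toy 4x4 image, and the hand-written edge filter are all assumptions introduced here, whereas a real CNN learns its filter values during training. As in most deep learning libraries, the operation shown is technically a cross-correlation.</p>

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (valid positions only) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the window under the kernel element-wise and sum the result.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image with a vertical edge between the second and third columns.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical hand-written filter that responds to left-to-right intensity increases.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, edge_kernel)
```

<p>In this sketch the feature map is largest where the window straddles the edge and zero elsewhere, which is the sense in which a filter "looks for" a specific pattern.</p></div>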
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Adversarial Attacks and Adversarial Patches</head><p>Adversarial attacks aim to fool machine learning models. One way to mount such an attack is to alter a physical object so that the model mistakes it for some other object or, in some cases, fails to detect it at all. A change made to an object for the purpose of fooling a system is known as an adversarial perturbation. In this work, we investigated the use of patches to cause an image classifier to misclassify, misidentify, or fail to identify given objects. Adversarial patches are images that, once printed, added, or presented to the image classifier, can cause the classifier to ignore or misidentify the other items in the scene <ref type="bibr">[8,</ref><ref type="bibr">9]</ref>.</p><p>Figure <ref type="figure">2:</ref> A real-world attack on VGG16, using a physical patch <ref type="bibr">[7]</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experimental Approach</head><p>We reviewed various papers detailing generative adversarial patches and their effects on object detection algorithms. These patches seek to prevent the algorithms from accurately detecting objects within the images. Many papers provided access to GitHub repositories, allowing us to use pre-trained models on our test datasets and compare the results of the algorithm across varying images and patches.</p><p>We experimented with the PyTorch version of the algorithm to observe its behavior on a random set of images from ImageNet, consisting of images of people and images of objects and animals. The objects and animals were a bicycle, birds, cars, cell phones, and dogs.</p><p>In order to run the algorithm, we had to configure our computing environment by cloning the YOLOv7 repository (<ref type="url">https://github.com/wongkinyiu/yolov7</ref>) and then installing all the Python libraries listed in the provided text file.
We were able to run the detection algorithm on all the images to generate a confidence value for each detection, along with annotated copies of the images in which a bounding box surrounds each classified object. The confidence value represents how sure the model is that it has correctly identified an object. The detection script allowed us to customize its parameters in order to set the pre-trained model weights, set a confidence threshold, specify the image size, and set a target directory so that all of our images could be classified in succession.</p><p>The first step in measuring the effectiveness of the model was to run the detection algorithm on the set of unaltered images and record their confidence values. Following this, an adversarial patch was applied to each image in an attempt to alter the confidence value output. Two patches, shown in Figure 3, were applied to the images to test the algorithm. They were automatically generated with qualities that may have the ability to fool detection algorithms. Each patch was applied to both categories of images; for images of people, it was applied over the face and over the body in separate instances to observe the varying effects.</p><p>Figure 3(b). Patch #2</p><p>Table: Confidence value by object. Columns: Object, Non-people, Non-people Patch #1, Non-people Patch #2.</p></div>
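<div xmlns="http://www.tei-c.org/ns/1.0"><p>The patch application step described above can be sketched as a simple paste of one pixel grid onto another. This is a minimal illustration under stated assumptions, not the tooling used in this study: the function name apply_patch, the nested-list image representation, and the placeholder pixel values are all hypothetical, and a real experiment would more likely operate on image arrays loaded with an imaging library.</p>

```python
def apply_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted so its top-left corner is at (top, left)."""
    patched = [row[:] for row in image]  # copy so the unaltered image is kept for the baseline run
    height, width = len(image), len(image[0])
    for i, patch_row in enumerate(patch):
        for j, pixel in enumerate(patch_row):
            r, c = top + i, left + j
            # Skip patch pixels that would fall outside the image.
            if r in range(height) and c in range(width):
                patched[r][c] = pixel
    return patched


image = [[0] * 6 for _ in range(4)]  # 4x6 placeholder image (all zeros)
patch = [[9, 9], [9, 9]]             # 2x2 placeholder patch
patched = apply_patch(image, patch, top=1, left=4)
```

<p>Keeping the original image untouched mirrors the procedure above, where confidence values are first recorded on unaltered images and then on the patched copies. Choosing different top and left offsets corresponds to placing the patch over the face or over the body.</p></div>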
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results and Discussion</head><p>All of the confidence values retrieved from the output of the algorithm were recorded in Table <ref type="table">1</ref> and Table <ref type="table">2</ref>. To make the results easier to store and review, we used the script parameter that saves results to text files and modified the script to append the confidence value to each result. Based on observation, the algorithm was less prone to detecting objects within Patch #2 than within Patch #1, where such detections proved to be a distraction. For almost all images that used Patch #1, the algorithm detected not only the objects in the original image but also the hidden objects within the patch. This is likely because Patch #1 contains objects that are more easily recognizable given the images used to train YOLOv7 <ref type="bibr">[4,</ref><ref type="bibr">7]</ref>.</p><p>For the set of images representing non-people, the patches had varying effects. While the original images' average confidence value was 0.83496, Patch #1 had an overall negative effect on the classifier, lowering the average to 0.71171. This decrease was most noticeable in the images of cars, which had large patches applied to them and numerous detectable objects in the background. As shown in Figures <ref type="figure">4</ref> &amp; <ref type="figure">5</ref>, background objects such as people and other cars, in addition to objects within the patch itself, served as potential impediments to the algorithm's detection accuracy for the primary car. The size of the patch may also be a significant factor, since the algorithm attempts to detect objects in the patch as well. Patch #2 had the opposite effect on the non-people subset, producing slightly higher confidence values than even the unaltered images.
The patch itself was generally not recognized by the algorithm as a collection of additional objects.</p><p>The people image subset allowed us to view the effects that a patch might have when applied to certain parts of a person, namely the face and the body. We initially suspected that applying the patch over the face might pose more of a threat than applying it anywhere else, because the face carries many of a person's identifiable characteristics. Overall, the unaltered images had an average confidence value of 0.928238. When Patch #1 was applied over the body and the face, the average confidence values were 0.9224 and 0.8647684 respectively. While the average for Patch #1 applied over the body was on par with that of the unaltered images, the same patch applied over the face produced a lower average. Patch #2 resulted in values of 0.93005 and 0.921353 when applied over the body and the face respectively, indicating a slight increase in detection confidence over the body and a minuscule decrease over the face. Most images reflected these averages, but in one instance, applying the patch over the face had a noticeable effect (Figures <ref type="figure">6</ref> &amp; <ref type="figure">7</ref>). While more testing and further analysis are needed to determine whether applying a patch over the face or the body has drastic effects, detection with Patch #2 consistently produced higher confidence values than detection with Patch #1, no matter where on the object the patch was applied.</p></div>
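<div xmlns="http://www.tei-c.org/ns/1.0"><p>The averaging used to compare conditions can be sketched as follows. This is a minimal illustration, not the modified script from this study: the function name mean_confidence is hypothetical, the line layout (class, box coordinates, then confidence as the last field) is an assumption about the saved text files, and the detection lines are invented for the example rather than taken from Table 1 or Table 2.</p>

```python
from statistics import mean


def mean_confidence(label_lines):
    """Average the confidence values, assumed to be the last field of each line."""
    values = [float(line.split()[-1]) for line in label_lines if line.strip()]
    return mean(values)


# Invented detection lines for illustration only (not values from this study).
# Assumed layout: class x_center y_center width height confidence
unaltered = ["0 0.50 0.50 0.20 0.30 0.75", "16 0.40 0.60 0.10 0.20 0.50"]
patched = ["0 0.50 0.50 0.20 0.30 0.50", "16 0.40 0.60 0.10 0.20 0.25"]

baseline = mean_confidence(unaltered)
with_patch = mean_confidence(patched)
change = with_patch - baseline  # negative when the patch lowers average confidence
```

<p>Applied to the averages reported above for non-people images, the same subtraction gives 0.71171 - 0.83496 = -0.12325 for Patch #1.</p></div>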
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Future Work</head><p>In the future, there are a variety of methods that can be used to further analyze the effectiveness of YOLOv7. The experiment showed that different kinds of patches can produce varying effects, so a more in-depth analysis of why a patch containing objects was more easily detected than another is warranted. Patch size is another potential factor to consider, especially for patches with identifiable objects within them, so scaling patches over images is another avenue of interest. Patch location seemed to add another dimension to the detection rate in some instances, so in order to determine its effects, the same patch should be applied multiple times to the same image in separate locations. There is also room to use image datasets that are more expansive and diverse. While this experiment was generally limited to specific objects that the YOLOv7 model was trained on, images containing multiple recognizable objects could be used to determine whether this has a positive or negative effect, especially if those objects overlap. The test dataset used for this experiment was also small, so testing on a larger dataset would be of great benefit for evaluating more results and average confidence values. A comparison with previous versions of YOLO on the same image set would also help provide greater context.</p></div>
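<div xmlns="http://www.tei-c.org/ns/1.0"><p>The location and size studies proposed above could start from an enumeration like the one below. This is a hypothetical sketch, not part of the experiment reported here: the function names patch_placements and scaled_sizes, the stride parameter, and the example dimensions are all assumptions chosen for illustration.</p>

```python
def patch_placements(image_h, image_w, patch_h, patch_w, stride):
    """List every (top, left) offset at which the patch fits fully inside the image."""
    tops = range(0, image_h - patch_h + 1, stride)
    lefts = range(0, image_w - patch_w + 1, stride)
    return [(top, left) for top in tops for left in lefts]


def scaled_sizes(patch_h, patch_w, factors):
    """Return the patch (height, width) after applying each scale factor, at least 1 pixel."""
    return [(max(1, round(patch_h * f)), max(1, round(patch_w * f))) for f in factors]


# Example: a 2x2 patch moved in steps of 2 pixels over a 4x6 image.
placements = patch_placements(image_h=4, image_w=6, patch_h=2, patch_w=2, stride=2)
# Example: the same patch at half, original, and double size.
sizes = scaled_sizes(patch_h=2, patch_w=2, factors=[0.5, 1.0, 2.0])
```

<p>Running the detector once per placement and per size on the same image would isolate the effect of location and scale from the effect of the patch content.</p></div>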
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>Overall, this project helped develop our understanding of computer vision and object detection. Exposure to rapidly improving object detection models such as YOLOv7 has heightened our interest in the field and strengthened our knowledge. Reproducing existing work was the first step toward understanding how object detection works, and it opens the door to working on improving the model itself.</p><p>We were able to observe the effects that adversarial patches had on detection accuracy and analyze the results to form further hypotheses.</p></div></body>
		</text>
</TEI>
