<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>SINCO: A Novel Structural Regularizer for Image Compression Using Implicit Neural Representations</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>06/04/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10504929</idno>
					<idno type="doi">10.1109/ICASSP49357.2023.10095531</idno>
					<title level='j'>Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing</title>
<idno type="issn">1520-6149</idno>

					<author>Harry Gao</author><author>Weijie Gan</author><author>Zhixin Sun</author><author>Ulugbek S. Kamilov</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Implicit neural representations (INR) have been recently proposed as deep learning (DL) based solutions for image compression. An image can be compressed by training an INR model with fewer weights than the number of image pixels to map the coordinates of the image to corresponding pixel values. While traditional training approaches for INRs are based on enforcing pixel-wise image consistency, we propose to further improve image quality by using a new structural regularizer. We present structural regularization for INR compression (SINCO) as a novel INR method for image compression. SINCO imposes structural consistency of the compressed images to the groundtruth by using a segmentation network to penalize the discrepancy of segmentation masks predicted from compressed images. We validate SINCO on brain MRI images by showing that it can achieve better performance than some recent INR methods.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">INTRODUCTION</head><p>Image compression is an important step for enabling efficient transmission and storage of images in many applications. It is widely used in biomedical imaging due to the high-dimensional nature of the data. While traditional image compression methods are based on fixed image transforms <ref type="bibr">[1,</ref><ref type="bibr">2]</ref>, deep learning (DL) has recently emerged as a powerful data-driven alternative. The majority of DL-based compression methods are based on training autoencoders to be invertible mappings from image pixels to quantized latent representations <ref type="bibr">[3]</ref><ref type="bibr">[4]</ref><ref type="bibr">[5]</ref>.</p><p>In this work, we seek an alternative to autoencoder-based compression methods by focusing on a recent paradigm using implicit neural representations (INRs). INR refers to a class of DL techniques that seek to learn a mapping from input coordinates (e.g., (x, y)) to the corresponding physical quantities (e.g., density at (x, y)) by using a multi-layer perceptron (MLP) <ref type="bibr">[6]</ref><ref type="bibr">[7]</ref><ref type="bibr">[8]</ref><ref type="bibr">[9]</ref>. Recent studies have shown the potential of INRs in image compression <ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref><ref type="bibr">[13]</ref><ref type="bibr">[14]</ref>.<note place="foot">These authors contributed equally to this work. This material is based upon work supported by the NSF CAREER award under CCF-2043134.</note></p><p>The key idea behind INR-based compression is to train an MLP to represent an image and treat the weights of the trained model as the compressed data. One can then reconstruct the image by evaluating the pre-trained MLP at the desired pixel locations. The traditional training strategy for image compression using INRs seeks to enforce image consistency between predicted and groundtruth image pixels. 
On the other hand, it is well known that image quality can be improved by infusing prior knowledge about the desired images <ref type="bibr">[15,</ref><ref type="bibr">16]</ref>. Based on this observation, we propose structural regularization for INR compression (SINCO), a new method that improves INR-based image compression using a structural regularizer. Our structural regularizer seeks to improve the Dice score between the groundtruth segmentation maps and those obtained from the INR-compressed image using a pre-trained segmentation network. We validate SINCO on brain MR images by showing that it can lead to significant improvements over traditional INR-based image compression methods. We show that the combination of the traditional image-consistency loss and our structural regularizer enables SINCO to learn an INR that better preserves desired image features.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">BACKGROUND</head><p>INR (also referred to as neural fields) denotes a class of algorithms for continuously representing physical quantities using coordinate-based MLPs (see a recent review <ref type="bibr">[6]</ref>). Recent work has shown the potential of INRs in many imaging and vision tasks, including novel view synthesis in 3D rendering <ref type="bibr">[7]</ref>, video frame interpolation <ref type="bibr">[8]</ref>, and dynamic imaging <ref type="bibr">[9]</ref>. The key idea behind INR is to train an MLP to map spatial coordinates to the corresponding observed physical quantities. After training, one can evaluate the pre-trained MLP on desired coordinates to predict the corresponding physical quantities, including at locations that were not part of training. Let c denote a vector of input coordinates, v the corresponding physical quantity, and M_θ an MLP with trainable parameters θ ∈ ℝ^n. The INR training can be formulated as</p><p>θ* = argmin_θ ∑_{i=1}^{N} ℓ_inr(M_θ(c_i), v_i), (1)</p><p>where N ≥ 1 denotes the number of training pairs (c_i, v_i). The common choices for ℓ_inr include the ℓ2 and ℓ1 norms.</p><p>INRs have been recently used for image compression <ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref><ref type="bibr">[13]</ref> (see also a recent evaluation in medical imaging <ref type="bibr">[14]</ref>). COmpressed Implicit Neural representations (COIN) <ref type="bibr">[10]</ref> is a pioneering work based on training an MLP to map the pixel locations (i.e., c = (x, y)) of an image to its pixel values. The pre-trained MLP in COIN is quantized and then used as the compressed data. To reconstruct the image, one evaluates the model on the same pixel locations used for training. 
Several papers have investigated meta-learning approaches to accelerate COIN by first training an MLP over a large collection of datapoints and then fine-tuning it on an instance-dependent one <ref type="bibr">[11,</ref><ref type="bibr">12,</ref><ref type="bibr">17]</ref>. Two recent papers proposed to regularize INR-based image compression by using ℓ0- and ℓ1-norm penalties on the weights of the MLP to improve the compression rate <ref type="bibr">[11,</ref><ref type="bibr">13]</ref>.</p><p>The structural regularization in SINCO is based on image segmentation using a pre-trained convolutional neural network (CNN) (see a comprehensive review of the topic <ref type="bibr">[18]</ref>). There exists a rich body of literature on DL-based image segmentation that can be integrated into SINCO <ref type="bibr">[19]</ref><ref type="bibr">[20]</ref><ref type="bibr">[21]</ref>.</p><p>To the best of our knowledge, no prior work has considered higher-level structural regularization in the context of INR-based image compression. It is worth mentioning that our structural regularizer is fully compatible with existing INR compression methods; for example, one can easily combine it with an additional ℓ0 regularizer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">PROPOSED METHOD</head><p>SINCO consists of two components: (a) an MLP M_θ that represents an image by mapping its coordinates to the corresponding pixel values and (b) a CNN g_φ that predicts a segmentation mask given the compressed image produced by the MLP. Specifically, let x ∈ ℝ^{H×W} denote an image of height H and width W that we seek to compress. Let c ∈ ℝ^{HW×2} represent all the pixel locations within x. Then, M_θ is trained to take c as input and predict all the corresponding HW pixel values. We format the output of M_θ to be the compressed image x̂ ∈ ℝ^{H×W}. The function g_φ denotes the segmentation CNN that takes the compressed image and predicts a segmentation mask ŝ = g_φ(x̂). The SINCO pipeline is illustrated in Fig. <ref type="figure">1</ref>.</p></div>
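The data layout described above (c ∈ ℝ^{HW×2} in, HW values out, reshaped to H × W) can be sketched as follows; `pixel_coords` and the placeholder `mlp_stub` are illustrative names, not from the paper.

```python
import numpy as np

def pixel_coords(H, W):
    """All pixel locations of an H x W image as an (H*W, 2) array,
    normalized to [0, 1], matching c in Sec. 3."""
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return np.stack([ys.ravel() / (H - 1), xs.ravel() / (W - 1)], axis=1)

def mlp_stub(c):
    """Placeholder for M_theta: one predicted intensity per coordinate."""
    return c[:, 0] * c[:, 1]

H, W = 240, 240
c = pixel_coords(H, W)              # (57600, 2)
x_hat = mlp_stub(c).reshape(H, W)   # the "compressed image" x_hat in R^{H x W}
print(c.shape, x_hat.shape)
```

A segmentation network would then consume `x_hat` directly to produce the mask ŝ.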
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Network Architecture</head><p>We implemented two different architectures for M_θ: (a) SIREN<ref type="foot">foot_1</ref> <ref type="bibr">[22]</ref>, which consists of linear layers followed by sine activation functions, and (b) NeRF<ref type="foot">foot_2</ref> <ref type="bibr">[7]</ref>, which applies a positional encoding to expand c before passing it to M_θ:</p><p>γ(c) = [sin(2^0 πc), cos(2^0 πc), ..., sin(2^{L_f−1} πc), cos(2^{L_f−1} πc)], (2)</p><p>where L_f &gt; 0 denotes the number of frequencies. The M_θ of NeRF consists of linear layers followed by ReLU activation functions. For the NeRF architecture, we also add residual connections from the input to the intermediate layers. We subsequently denote SINCO based on the two MLPs as SINCO (SIREN) and SINCO (INR). We adopt the widely-used U-Net <ref type="bibr">[21]</ref> architecture as the CNN for the segmentation network g_φ.</p></div>
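The encoding γ(c) in (2) can be sketched as follows, assuming the common convention of stacking sine and cosine per frequency along the feature axis (the exact ordering is a convention not fixed by the text above).

```python
import numpy as np

def positional_encoding(c, L_f=12):
    """NeRF-style encoding of Eq. (2): concatenates sin(2^k * pi * c) and
    cos(2^k * pi * c) for k = 0, ..., L_f - 1 along the last axis."""
    c = np.asarray(c, dtype=float)
    feats = []
    for k in range(L_f):
        feats.append(np.sin(2.0 ** k * np.pi * c))
        feats.append(np.cos(2.0 ** k * np.pi * c))
    return np.concatenate(feats, axis=-1)

c = np.array([[0.25, 0.75]])            # one (y, x) coordinate pair
gamma = positional_encoding(c, L_f=12)
print(gamma.shape)                      # 2 coords x 2 functions x L_f freqs
```

With L_f = 12, as used in the paper's experiments, each 2-D coordinate expands to a 48-dimensional feature vector.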
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Training Strategy</head><p>SINCO is trained by minimizing the following loss</p><p>ℓ(θ) = ℓ_compress(x̂, x) + λ ℓ_regularize(ŝ, s), (3)</p><p>where s denotes the reference segmentation mask of x, and λ ≥ 0 is a parameter that balances image consistency with structural regularization. We implement ℓ_compress as the ℓ2 norm between the compressed image and the groundtruth image. We implement ℓ_regularize as 1 − Dice(ŝ, s), where Dice(ŝ, s) is the Dice score coefficient between the segmentation mask predicted from the compressed image and the groundtruth one. In eq. (<ref type="formula">3</ref>), the segmentation network g_φ is assumed to be pre-trained, which implies that we only optimize the parameters of M_θ during training. Note that when λ = 0, eq. (<ref type="formula">3</ref>) is equivalent to traditional INR-based image compression. After training, we follow the approach in <ref type="bibr">[10]</ref> by quantizing the weights of M_θ from 32 bits to 16 bits.</p></div>
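A minimal sketch of this objective, assuming a standard soft Dice coefficient (the exact Dice implementation is not spelled out above, so treat `dice` as an assumption):

```python
import numpy as np

def dice(s_pred, s_ref, eps=1e-8):
    """Soft Dice coefficient between two masks in [0, 1]."""
    inter = np.sum(s_pred * s_ref)
    return (2.0 * inter + eps) / (np.sum(s_pred) + np.sum(s_ref) + eps)

def sinco_loss(x_hat, x, s_pred, s_ref, lam=1.0):
    """Eq. (3): l2 image consistency plus lambda * (1 - Dice)."""
    l_compress = np.mean((x_hat - x) ** 2)     # image-consistency term
    l_regularize = 1.0 - dice(s_pred, s_ref)   # structural term
    return l_compress + lam * l_regularize

x = np.zeros((4, 4)); x_hat = np.zeros((4, 4))
s = np.ones((4, 4));  s_hat = np.ones((4, 4))
print(sinco_loss(x_hat, x, s_hat, s))   # perfect match: loss near 0
```

In SINCO only the parameters of M_θ receive gradients from this loss; g_φ stays frozen, and setting `lam=0` recovers plain INR compression.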
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">NUMERICAL EXPERIMENTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Setup and comparison</head><p>For our experiments, we used MR images of brain tumors with the corresponding segmentation masks obtained from the Decathlon dataset <ref type="bibr">[23]</ref> (Task01). We selected ten test images with resolution 240 × 240. Fig. <ref type="figure">2</ref> illustrates several images used in our experiments with the corresponding segmentation masks for tumors.</p><p>We compared SINCO against two reference INR methods: (a) COIN<ref type="foot">foot_3</ref>, a recent INR method discussed in Sec. 2, and (b) Vanilla INR, a variant of SINCO (INR) that sets λ in (3) to 0. Note that since COIN also uses SIREN as its MLP architecture, it can be viewed as SINCO (SIREN) trained with a different loss. For all methods we used the same compression rate, quantified in bits per pixel (bpp) as</p><p>bpp = (bits per parameter × #parameters) / #pixels.</p><p>We set bpp to 1.2 in our experiments, corresponding to a compression rate of 2% relative to the raw file size. We evaluated all the methods on both image compression and image segmentation using compressed images. For image compression, we used two widely-used quantitative metrics: peak signal-to-noise ratio (PSNR), measured in dB, and structural similarity index (SSIM). We used the Dice score coefficient to evaluate image segmentation.</p></div>
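The bpp bookkeeping can be checked numerically. The parameter budget below is derived from the stated target of 1.2 bpp with 16-bit quantized weights on a 240 × 240 image; the resulting parameter count is an inference, not a number reported in the paper.

```python
# Bits-per-pixel accounting from Sec. 4.1.
def bpp(bits_per_param, n_params, n_pixels):
    return bits_per_param * n_params / n_pixels

n_pixels = 240 * 240          # one 240 x 240 test image
bits = 16                     # weights quantized from 32 to 16 bits
n_params = int(1.2 * n_pixels / bits)   # parameter budget at bpp = 1.2
print(n_params, bpp(bits, n_params, n_pixels))
```

This inverts the bpp formula to show how the quantization bit-width directly trades off against the size of the MLP that represents the image.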
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Implementation Details</head><p>We followed <ref type="bibr">[21]</ref> to train the segmentation network g_φ. We used about 500 images from the same dataset for training. The corresponding training loss can be written as</p><p>φ* = argmin_φ ∑_{j=1}^{M} ℓ_seg(g_φ(x_j), s_j),</p><p>where M ≥ 1 denotes the number of training samples and ℓ_seg corresponds to binary cross entropy <ref type="bibr">[21]</ref>. We used Adam <ref type="bibr">[24]</ref> as the optimizer with learning rate 0.0001. We set the number of training epochs to 75 and the batch size to 8. For the training of SINCO, we set L_f in (2) to 12. We experimented with different values of λ in eq. (<ref type="formula">3</ref>); the best results were obtained with λ = 1. We used Adam <ref type="bibr">[24]</ref> as the optimizer with learning rate 0.001 and set the number of training epochs to 50,000. We performed our experiments on a machine equipped with an Intel Xeon Gold 6130 processor and an NVIDIA GeForce GTX 1080 Ti GPU.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Results</head><p>Figure <ref type="table">2</ref> shows that the performance of SINCO (INR) can be improved by using the optimized value of λ.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">CONCLUSION</head><p>We present SINCO as a new structurally regularized image compression method using implicit neural representations.</p><p>The key idea behind SINCO is to use a pre-trained segmentation network to ensure that the INR-compressed images produce accurate segmentation masks. Our experiments on brain MR images show that SINCO can quantitatively and qualitatively outperform traditional INR approaches. In future work, we will further investigate SINCO for higher-dimensional data compression (e.g., 3D or 4D MRI <ref type="bibr">[25]</ref>) and leverage recent developments in meta-learning strategies to accelerate INR training <ref type="bibr">[17]</ref>.</p></div>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_1"><p>github.com/lucidrains/siren-pytorch.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2"><p>github.com/wustl-cig/Cooridnate-based-Internal-Learning</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3"><p>github.com/EmilienDupont/coin</p></note>
		</body>
		</text>
</TEI>
