<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>An Indirect Rate-Distortion Characterization for Semantic Sources: General Model and the Case of Gaussian Observation</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>09/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10380111</idno>
					<idno type="doi">10.1109/TCOMM.2022.3194978</idno>
					<title level='j'>IEEE Transactions on Communications</title>
<idno>0090-6778</idno>
<biblScope unit="volume">70</biblScope>
<biblScope unit="issue">9</biblScope>					

					<author>Jiakun Liu</author><author>Shuo Shao</author><author>Wenyi Zhang</author><author>H. Vincent Poor</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[A new source model, which consists of an intrinsic state part and an extrinsic observation part, is proposed and its information-theoretic characterization, namely its rate-distortion function, is defined and analyzed. Such a source model is motivated by the recent surge of interest in the semantic aspect of information: the intrinsic state corresponds to the semantic feature of the source, which in general is not observable but can only be inferred from the extrinsic observation. There are two distortion measures, one between the intrinsic state and its reproduction, and the other between the extrinsic observation and its reproduction. Under a given code rate, the tradeoff between these two distortion measures is characterized by the rate-distortion function, which is solved via the indirect rate-distortion theory and is termed the semantic rate-distortion function of the source. As an application of the general model and its analysis, the case of Gaussian extrinsic observation is studied, assuming a linear relationship between the intrinsic state and the extrinsic observation, under a quadratic distortion structure. The semantic rate-distortion function is shown to be the solution of a convex programming problem with respect to an error covariance matrix, and a reverse water-filling type of solution is provided when the model further satisfies a diagonalizability condition.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>A standard approach to describing an information source is to model a source as a stochastic process {X i }, and when the stochastic process is memoryless, it suffices to model a source as a random variable<ref type="foot">foot_0</ref> X with a given probability distribution p(x) <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>. In this paper, we study a new source model, which consists of an intrinsic state process and an extrinsic observation process. In the memoryless case, we can describe such a source model as a pair of random variables (S, X), with a given joint probability distribution p(s, x), defined over an appropriate product alphabet S &#215; X .</p><p>In order to characterize the information-theoretic aspect of such a source, consider the problem of compressing the source (S, X) so as to produce, in a lossy sense, a reproduction ( &#348;, X) over a reproduction product alphabet &#348; &#215; X . Of course, a pair of distortion measures, d s : S &#215; &#348; &#8594; R and d o : X &#215; X &#8594; R, are introduced correspondingly. Here, the subscript s stands for "state" and the subscript o stands for "observation".</p><p>A key point of the problem is that the compressor only has access to X, the extrinsic observation, while S, the intrinsic state, remains unrevealed. The situation is illustrated in Figure <ref type="figure">1</ref>.</p><p>Our source model, termed a semantic source in the sequel, is motivated by the recent surge of interest in the semantic aspect of information. In a number of applications that may benefit from taking into account the "semantic" feature of information, it is adequate to adopt a goal-oriented perspective; that is, the destination's interest in obtaining a piece of information is to accomplish a certain goal. 
Furthermore, it is customary to adopt an inference-theoretic problem formulation, corresponding semantic rate-distortion problem formulation in Section II, for which we establish the semantic rate-distortion function in general form in Section III. As an application of the general results, in Section IV we turn to a case study of Gaussian extrinsic observation, assuming a linear relationship between the intrinsic state and the extrinsic observation, under a quadratic distortion structure. Therein, we formulate a convex programming problem to solve for the semantic rate-distortion function. When the Gaussian observation model further satisfies a diagonalizability condition, we develop a reverse water-filling type of solution in Section V. Finally we conclude this paper in Section VI.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Related Works</head><p>The first formulation in Shannon's information theory is lossless source coding, wherein a sequence of symbols obeying a certain probabilistic law is represented as a bit string (i.e., a codeword) by an encoder, and the decoder reproduces, based upon the codeword, the original sequence of symbols, with success probability exactly one or asymptotically approaching one. Hence, the coding is solely determined by the probabilistic model of the source, and there is certainly no role of the semantic aspect of the source. This is also consistent with Shannon's remark in his landmark paper <ref type="bibr">[2]</ref>, saying "these semantic aspects of communication are irrelevant to the engineering problem."</p><p>In a broad sense, however, the lossy source coding formulation in Shannon's information theory, namely, the rate-distortion theory <ref type="bibr">[12]</ref>, has provided a means of studying the semantic aspects of a source. This is because the coding is not solely determined by the probabilistic model of a source, but is also affected by a distortion measure, which may be defined in a rather versatile way so as to capture the "utility" when the source is reproduced at the decoder.</p><p>Our present work goes one step further, by endowing a source with a state-observation structure and studying the rate distortion function of such a source model. This model captures the fact that the semantic aspects of a source are generally embedded as intrinsic features, and hence should be characterized by studying the reproduction of the intrinsic state, in addition to the reproduction of the extrinsic observation. 
Our treatment of semantic aspects of sources is also in line with the recent heightened interest in the development of 5G and beyond wireless systems <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref>  <ref type="bibr">[15]</ref>, where for many applications the semantic aspects correspond to the accomplishment of certain inference goals. Hence, if we consider an information theoretic characterization of such a "semantic" source, the task of coding is to efficiently encode the extrinsic observation so that the decoder can infer both the intrinsic state and the extrinsic observation, subject to fidelity criteria on both, simultaneously. Our problem formulation and approach are closely related to two variants of the standard rate distortion theory, namely, the indirect rate distortion function and the rate distortion function under multiple distortion measures; see our discussion following Theorem 1 in Section III.</p><p>The inference-theoretic goal-oriented approach adopted in our problem formulation does not seek a task-independent universal definition of semantic information, which is outside the scope of the present paper; for some attempts in that regard, see, e.g., <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>  <ref type="bibr">[18]</ref>, <ref type="bibr">[19]</ref> for a few representative works that undertake drastically different approaches.</p><p>As related topics, the information bottleneck <ref type="bibr">[20]</ref>, <ref type="bibr">[21]</ref> and the privacy funnel <ref type="bibr">[22]</ref>, <ref type="bibr">[23]</ref> are, in a certain sense, dual concepts, and both place constraints in terms of mutual information. The underlying idea of the information bottleneck is, in a broad sense, similar to ours. 
Specifically, there one generates a reproduction based upon the extrinsic observation, minimizing the mutual information between the extrinsic observation and the reproduction, while maintaining a level of mutual information between the intrinsic state and the reproduction. But for the information bottleneck problem formulation, there is neither an explicit distortion measure, nor an operational definition of lossy compression.</p><p>Task-based compression has been approached mainly from the perspective of quantizer design <ref type="bibr">[24]</ref>. It has been demonstrated that steering the design goal according to the task leads to performance benefits compared with a conventional task-agnostic approach, a conclusion in line with what we advocate in our work. The perception-distortion tradeoff <ref type="bibr">[25]</ref> imposes an additional constraint on the probability distribution of the reproduction. None of these related works proposes to decompose the information source into intrinsic and extrinsic parts as in our work, let alone investigates the joint behavior of them. In <ref type="bibr">[26]</ref>, a similar intrinsic state-extrinsic observation model is studied, but the encoder is designed based on the marginal distribution of the extrinsic observation only.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. SYSTEM MODEL AND PROBLEM FORMULATION</head><p>As already outlined in the introduction, we model a memoryless semantic source as a pair of random variables (S, X) that are correlated with joint probability distribution p(s, x).</p><p>The semantic aspect is embodied in the intrinsic state S, which is not observable but can only be inferred from the extrinsic observation X. In order to characterize the rate-distortion behavior of the semantic source, we consider a sequence of independent and identically distributed (i.i.d.) samples of (S, X), denoted as (S i , X i ) i&#8712;N , and denote its length-n block as (S n , X n ).</p><p>The i.i.d. source model is an idealistic scenario for our information-theoretic study. Real-world data generally exhibit sophisticated memory structures. A particularly interesting scenario is when the intrinsic state is a Markov chain, and the extrinsic observation obeys a hidden Markov model (HMM) <ref type="bibr">[27]</ref>. Extensions of our approach for semantic source models with memory are left for future research.</p><p>The lossy compression of a semantic source has been illustrated in Figure <ref type="figure">1</ref>  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We will also consider a variant of the distortion constraint; that is, the state distortion and the observation distortion are linearly combined to yield a single overall distortion. Hence, instead of (3) and (4), the decoding functions are required to satisfy the following weighted distortion constraint:</p><p>where w s and w o are non-negative weighting coefficients.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>It is also natural to generalize the system model to include several intrinsic state variables, each associated with a specified reproduction and a distortion. Such a semantic source is described by a tuple of random variables, (S 0 , S 1 , . . . , S k-1 , X), with joint probability distribution p(s 0 , s 1 , . . . , s k-1 , x) over </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The detailed derivation, which is based on a unified treatment in <ref type="bibr">[33]</ref>, is given in Appendix I.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We note that the semantic rate distortion function can be non-trivial even for the special case where S is a deterministic function of X, because from a lossy reproduction of X it is generally impossible to reproduce S in a lossless fashion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Specifically, suppose that S = g(X). Then ds (x, &#349;) can be</p><p>Similar to standard rate distortion functions, a corollary of the semantic rate distortion function as given by Theorem 1 is the following regarding monotonicity and convexity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Corollary 1:</head><p>The semantic rate distortion function R(D s , D o ) in Theorem 1 has the following properties:</p><p>Proof: The proof of the first two properties is exactly the same as that for standard rate distortion functions; see, e.g., <ref type="bibr">[3]</ref>. The third property is then an immediate corollary of the second property.</p><p>Corollary 1 implies a trade-off between the two distortions: for a given code rate, the smaller the state distortion, the larger the observation distortion, and vice versa. Concrete numerical examples can be found in Section IV, where Figures <ref type="figure">2</ref> and<ref type="figure">4</ref> plot the achievable regions of (R, D s , D o ) and their projections under different values of R, for two experimental setups, respectively. These plots demonstrate that for fixed R, the achievable (D s , D o ) pairs form a convex region, whose boundary exhibits a trade-off between D s and D o . Hence a sensible coding scheme of a semantic source should exhibit such behavior. Now consider the weighted distortion constraint <ref type="bibr">(6)</ref>. We have the following corollary. We end this section with the semantic rate distortion function (8) for semantic sources with several intrinsic states, as given by the following corollary. Its proof is essentially identical to that of Theorem 1.</p><p>Authorized licensed use limited to: Princeton University. Downloaded on November 14,2022 at 17:00:42 UTC from IEEE Xplore. Restrictions apply.</p><p>where</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>and </p><p>we can represent S according to</p><p>We consider quadratic distortion measures, defined as</p><p>Consequently, we have</p><note place="foot" n="3"><p>We use K V to denote the covariance matrix of a random column vector V .</p></note><p>For the considered model <ref type="bibr">(19)</ref>, we can derive its semantic rate distortion function, given by the following theorem.</p><p>Theorem 2: The semantic rate distortion function for the semantic source with Gaussian extrinsic observation and linear state-observation relationship <ref type="bibr">(19)</ref>, under quadratic distortion measures <ref type="bibr">(22)</ref> and <ref type="bibr">(23)</ref>, is given by:</p><p>where S m denotes the set of all m &#215; m positive definite matrices. Note that here we use a subscript G to emphasize that the extrinsic observation is Gaussian.</p><p>Proof: See Appendix II.</p><p>From <ref type="bibr">(28)</ref>, when Z is sufficiently strong so that tr(K Z ) &gt; D s , the optimization ( <ref type="formula">26</ref>) is no longer feasible and hence R G (D s , D o ) = &#8734;. Otherwise, there is no further restriction on K Z . For example, even if Z = 0, i.e., the relationship between S and X is deterministic as S = HX, the optimization problem in Theorem 2 is still non-trivial.</p><p>A simplified case arises when H is an orthogonal matrix satisfying H T H = I. In this case, <ref type="bibr">(28)</ref> becomes <ref type="bibr">(30)</ref> which can then be combined with <ref type="bibr">(29)</ref> leading to a single distortion constraint</p><p>In Theorem 2, the matrix &#916; which we optimize corresponds to the mean squared error (MSE) of estimating X based upon its reproduction at the decoder. The key to the proof of Theorem 2 is to show that the semantic rate distortion function is achieved by a Gaussian reproduction. 
This is similar to situations in several Gaussian lossy compression problems, including the standard Gaussian rate distortion problem <ref type="bibr">[12]</ref> and the Gaussian quadratic CEO problem <ref type="bibr">[37]</ref>. Existing techniques based on the entropy power inequality (EPI), extremal inequalities, and Fisher information inequalities may also be interpreted as the optimality of Gaussian reproduction for the minimum mean squared error (MMSE) estimation under a given MSE constraint. In our analysis, we further need to accommodate two MSE constraints, corresponding to the intrinsic state and the extrinsic observation, respectively.</p><p>Compared with the general form of semantic rate distortion function in Theorem 1, Theorem 2 involves only one matrix-valued optimization variable &#916;, which, as remarked in the previous paragraph, is the MSE of estimating X based upon its reproduction alone. In fact, the solution exhibits a Markov structure, i.e., S &#8596; X &#8596; X &#8596; &#348;. To help understand the optimality of the Markov chain solution, suppose that an alternative solution ( X , &#348; ) is given which does not satisfy the Markov structure. One can then form an improved reproduction as X = E(X| X , &#348; ), satisfying the Markov structure and achieving the same code rate I(X; X, &#348; ) = I(X; X , &#348; ). The Markov chain solution further suggests a "two-stage" coding interpretation which is in fact extensively adopted in practice: the decoder first generates a reproduction for X as X, and then uses that reproduction to further generate a reproduction for S as &#348;. 
Similar to the standard Gaussian rate distortion problem, the optimal X can be constructed with the aid of a "test channel", for which X as the channel input is Corollary 4: For a semantic source (S, X) with general probability density function, whose covariance matrix is given by <ref type="bibr">(20)</ref>, its semantic rate distortion function subject to quadratic distortion constraints <ref type="bibr">(22)</ref> and <ref type="bibr">(23)</ref> satisfies</p><p>Proof: See Appendix III.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Computation of the Semantic Rate Distortion Function</head><p>We remark that the optimization problem in Theorem 2 is convex, and hence can be numerically solved by software like CVX in an efficient and stable fashion. In this subsection we present some illustrative numerical examples.</p><p>Our first example is a small-scale toy model, given by </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The resulting semantic rate distortion function is computed as displayed in Figure <ref type="figure">2</ref>. The dotted region in Figure <ref type="figure">2</ref>(b) indicates that both constraints <ref type="bibr">(28)</ref> and ( <ref type="formula">29</ref>) are active. The trade-off between the two distortions is clear: the smaller the state distortion, the larger the observation distortion, and vice versa.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Our second example captures a sparse state-observation relationship, as follows. The extrinsic observation is a length-    does not seem to be sensitive to the choice of D s . This fact has an important consequence for designing lossy compression schemes for semantic sources: although several different codes may have similar performance in terms of reproducing the extrinsic observation, they can differ considerably in terms of reproducing the intrinsic state. A heuristic explanation is as follows: since X is a high-dimensional vector, describing it along several different directions may lead to similar quadratic distortion performance; but since S corresponds to a low-dimensional feature of X, its reproduction only favors the direction of describing X that retains the feature of S the best.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Generalizations of Theorem 2</head><p>We can derive from Theorem 2 several corollaries corresponding to the variants of the problem formulation in Section II.</p><p>First, let us consider replacing the quadratic distortion measures by the positive semi-definite distortion constraints. Following the same arguments as in the proof of Theorem 2, we again arrive at the optimality of Gaussian descriptions under positive semi-definite distortion constraints, and hence the following corollary characterizes the semantic rate distortion function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Corollary 5: Consider the positive semi-definite distortion measures as</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The semantic rate distortion function is given by</p><p>This is a semi-definite programming problem and can be readily solved by software.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Now consider the weighted distortion constraint, where the distortion measure is defined as a weighted sum of two individual distortion measures, i.e.</p><p>Applying Corollary 2, we obtain the semantic rate distortion function in the following corollary.</p><p>Corollary 6: For the weighted distortion measure d, the semantic rate distortion function R( D) is given by</p><p>Finally, consider the case of k intrinsic states. The extrinsic observation X is still N (0, K X ). For each j &#8712; {0, 1, &#8226; &#8226; &#8226; , k -1}, the j-th intrinsic state is generated according to</p><p>where H j is an l j &#215; m matrix, and Z j is a random vector independent of X, with zero mean and covariance matrix K Zj .</p><p>We consider quadratic distortion measures, as</p><p>The semantic rate distortion function is given by the following corollary. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. WEIGHTED REVERSE WATER-FILLING</head><p>Analogous to the standard Gaussian rate distortion problem wherein (after appropriate linear transformation) the solution can be interpreted as a reverse water-filling type of rate allocation, for the semantic rate distortion function in Theorem 2, under a diagonalizability condition, the solution can also be interpreted as reverse water-filling, but with appropriately weighted water levels.</p><p>For the model of Gaussian observation with linear state-observation relationship in Section IV, we further assume that the following diagonalizability condition is satisfied: there exists a unitary matrix Q such that</p><p>simultaneously hold. Here it loses no generality to order</p><p>Lemma 1: Under the diagonalizability condition, the resulting optimal &#916; takes the form</p><p>and the semantic rate distortion function in Theorem 2 can be further written in terms of the following optimization problem:</p><p>Proof: See Appendix IV.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In order to describe the weighted reverse water-filling solution, we first introduce the following curves.</p><p>which starts from (tr(HK X H T + K Z ), tr(K X )) and ends at (tr(K Z ), 0).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>&#8226; Curve C o :</p><p>(48)</p><p>which starts from (tr(HK X H T + K Z ), tr(K X )) and ends at (tr(K Z ), $\sum_{j=q+1}^{m} \sigma_j$). Here, $\sum_{j=q+1}^{m} \sigma_j$ is interpreted as 0 if H T H is full-rank and thus q = m.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We then introduce the following partitioning of the (D s , D o ) plane, based upon the curves C s and C o : </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>An example of the partitioning above is plotted in Figure <ref type="figure">6</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The following theorem describes the weighted reverse water-filling solution.</p><p>Theorem 3: For the model of Gaussian observation with linear state-observation relationship in Section IV, under the diagonalizability condition, the optimal </p><p>where &#955; is chosen to satisfy</p><p>where &#956; is chosen to satisfy $\sum_{j=1}^{q} \alpha_j \delta_j^{*} = D_s - \mathrm{tr}(K_Z)$.</p><p>where &#955;, &#956; are chosen to satisfy</p><p>Proof: See Appendix IV.</p><p>The partitioning {A 0 , A 1 , A 2 , A 3 } is closely related to the activity of the constraints (45) and (46), as summarized in Table <ref type="table">I</ref>. In A 0 , both constraints are inactive, and hence the optimization is unconstrained, yielding the trivial solution (49).</p><p>In A 1 , only the observation distortion constraint is active, and the solution (50) is a standard reverse water-filling with water level 1/&#955;. In A 2 , only the state distortion constraint is active, and the solution (51) essentially makes the weighted eigenvalues</p><p>with water level 1/&#956;. Alternatively, we may view the term 1/(&#956;&#945; j ) in ( <ref type="formula">51</ref>) as a water level with weight 1/&#945; j . In A 3 , both constraints are active, and the solution (52) also fulfills a reverse water-filling structure with unequal water levels.</p></div>
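<div xmlns="http://www.tei-c.org/ns/1.0"><p>The single-constraint cases discussed above (regions A 1 and A 2 ) can be sketched numerically as follows. This is only an illustrative sketch, not a reproduction of the solutions (49)-(52), which are not restated here. It assumes the diagonalized problem has eigenvalues sigma_j of K X and weights alpha_j, and that with one active constraint the per-component error takes the form min(sigma_j, level / alpha_j), which is what the Karush-Kuhn-Tucker conditions give for a single linear constraint. The function names, the bisection search, and the numerical values are all hypothetical choices made for illustration.</p>
<eg>
```python
import math

def weighted_reverse_water_fill(sigmas, budget, alphas=None, iters=200):
    """Return deltas with delta_j = min(sigma_j, level / alpha_j) such that
    sum_j alpha_j * delta_j = budget (or delta = sigma if the budget is slack).
    Assumes budget is strictly positive."""
    if alphas is None:
        alphas = [1.0] * len(sigmas)  # unweighted case: standard reverse water-filling

    def deltas_at(level):
        # Components with zero weight cost nothing, so they stay at sigma_j.
        return [s if a == 0 else min(s, level / a) for s, a in zip(sigmas, alphas)]

    def spent(level):
        return sum(a * d for a, d in zip(alphas, deltas_at(level)))

    hi = max(a * s for s, a in zip(sigmas, alphas))
    if budget >= spent(hi):
        return list(sigmas)  # constraint inactive: zero rate is enough
    lo = 0.0
    for _ in range(iters):  # bisection on the common water level
        mid = 0.5 * (lo + hi)
        if spent(mid) > budget:
            hi = mid
        else:
            lo = mid
    return deltas_at(lo)

def rate_nats(sigmas, deltas):
    """Rate 0.5 * sum_j log(sigma_j / delta_j), in nats."""
    return 0.5 * sum(math.log(s / d) for s, d in zip(sigmas, deltas))

# Unweighted example: only a sum constraint on the deltas.
plain = weighted_reverse_water_fill([4.0, 2.0, 1.0], budget=3.0)
# Weighted example: only a constraint on sum_j alpha_j * delta_j.
weighted = weighted_reverse_water_fill([4.0, 4.0], budget=4.0, alphas=[2.0, 1.0])
```
</eg>
<p>In the unweighted example every component is clipped to the same level, whereas in the weighted example the component with the larger weight receives the smaller error, which matches the reading of 1/(&#956;&#945; j ) as a water level with weight 1/&#945; j .</p></div>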
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Case Study: Circulant K X and H and Weighted Reverse Water-Filling in Frequency Domain</head><p>A case of special interest is where K X and H are both circulant matrices <ref type="bibr">[38]</ref>. As the dimension of X grows large, this models the scenario where X is a circularly stationary Gaussian process,<note place="foot" n="4">If we remove the circulant restriction and consider a stationary Gaussian process, then we encounter a Toeplitz K X , for which our solution still approximately applies; see, e.g., <ref type="bibr">[38]</ref>.</note> and S is obtained via passing X through a time-invariant linear filter whose response is given by the first row of H. For a circulant matrix, the corresponding unitary matrix Q is the well-known discrete Fourier transform (DFT) matrix, and its eigenvalues are the DFT of the first row of the matrix. Hence the weighted reverse water-filling may be interpreted as exercised in the frequency domain, similar to its counterpart for the standard rate distortion function of stationary Gaussian processes.</p><p>Fig. <ref type="figure">6</ref>. The (Ds, Do) plane is divided into four regions A 0 , A 1 , A 2 , A 3 , which determine the form of the optimal &#916;. Five points on the contour R G (Ds, Do) = 50 are marked with colors varying from purple to yellow.</p></div>
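<div xmlns="http://www.tei-c.org/ns/1.0"><p>The statement that the DFT diagonalizes a circulant matrix can be checked numerically with a small sketch such as the one below. It is an illustration only: the 4 &#215; 4 first row is a made-up example rather than the 128 &#215; 128 setup of the case study, and the helper names are hypothetical. Under the row convention used here (row i is the first row cyclically shifted right by i), the eigenvalue attached to the k-th DFT vector is the sum of c_l times exp(2&#960;i kl/n), so the set of eigenvalues equals the set of DFT values of the first row, possibly in a different order.</p>
<eg>
```python
import cmath

def circulant(first_row):
    """Build a circulant matrix whose i-th row is the first row shifted right by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def circulant_eigenpairs(first_row):
    """Return (eigenvalue, eigenvector) pairs built from the DFT basis vectors."""
    n = len(first_row)
    pairs = []
    for k in range(n):
        omega = cmath.exp(2j * cmath.pi * k / n)  # k-th power of the n-th root of unity
        eigenvalue = sum(c * omega ** l for l, c in enumerate(first_row))
        eigenvector = [omega ** j / n ** 0.5 for j in range(n)]  # unit-norm DFT column
        pairs.append((eigenvalue, eigenvector))
    return pairs

def residual(matrix, eigenvalue, eigenvector):
    """Largest entry of |C v - lambda v|, which should be numerically zero."""
    n = len(matrix)
    return max(
        abs(sum(matrix[i][j] * eigenvector[j] for j in range(n)) - eigenvalue * eigenvector[i])
        for i in range(n)
    )

first_row = [2.0, 1.0, 0.0, 1.0]  # symmetric example, so the eigenvalues are real
C = circulant(first_row)
pairs = circulant_eigenpairs(first_row)
eigenvalues = [lam.real for lam, _ in pairs]
worst_residual = max(residual(C, lam, vec) for lam, vec in pairs)
```
</eg>
<p>Because the example row is symmetric, the matrix is a valid covariance-like matrix with real eigenvalues; one of them is zero, which shows that a circulant matrix need not be full-rank.</p></div>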
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In the illustrative example below, consider K X as a 128 &#215; 128 circulant matrix with the first row</p><p>H as a 128 &#215; 128 circulant matrix with the first row</p><p>and K Z as a 128 &#215; 128 zero matrix (i.e., no noise in the state-observation relationship). Therefore, Q is the 128 &#215; 128 DFT matrix whose (i, j)-th element is</p><p>shown in Figure <ref type="figure">7</ref>, and the diagonal elements  ) of Q&#916;Q T for the marked points in Figure <ref type="figure">6</ref>, plotted with the colors in Figure <ref type="figure">6</ref>.</p><p>) for these points are depicted in Figure <ref type="figure">8</ref>. For</p><p>), the optimal solution degenerates into a standard reverse water-filling form, as indicated by the purple line.</p><p>When we go from</p><p>), the water level begins to "ripple". Note that this weighted reverse water-filling can be viewed as exercised in the frequency domain, and the angular frequencies are marked on the top of Figure <ref type="figure">8</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION</head><p>We have provided a general source model to describe information sources that have semantic aspects, and proposed a corresponding rate distortion problem formulation for characterizing the amount of information content of such semantic sources. We have studied the case of Gaussian extrinsic observation subject to a linear state-observation relationship and a quadratic distortion structure. There are a variety of issues that we have not touched upon in the present work.</p><p>First, calculating and bounding the semantic rate distortion functions for other interesting cases would make further use of our proposed framework, for example, when the intrinsic state is a discrete categorical random variable, corresponding to the important problem of classification; see <ref type="bibr">[1]</ref> for some preliminary results. Second, a more challenging problem is to estimate the semantic rate distortion function, and more importantly, to develop effective lossy compression methods when the joint probability distribution of the intrinsic state and the extrinsic observation is not perfectly known, say, when only finite training data of the state-observation pair are available.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX I PROOF OF THEOREM 1</head><p>The key to proving Theorem 1 is converting the semantic rate distortion problem into an equivalent standard rate distortion problem, with an indirect (state) distortion constraint and a direct (observation) distortion constraint. More precisely, we need to show that the constraint with respect to the state distortion measure d s (s, &#349;) is equivalent to a constraint on a converted distortion measure ds (x, &#349;); that is, as long as a reproduction &#348; satisfies the constraint on ds (x, &#349;), it will satisfy the constraint on d s (s, &#349;), and vice versa.</p></div>
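<div xmlns="http://www.tei-c.org/ns/1.0"><p>For a finite-alphabet source, the conversion described above can be illustrated with a short sketch. It assumes the usual indirect rate-distortion construction in which the converted distortion is the conditional expectation of d s (S, &#349;) given X = x; the exact definition used in the paper is not restated in this excerpt, so this form is an assumption. The joint distribution, the Hamming distortion, the decoder, and all function names are hypothetical choices made for illustration.</p>
<eg>
```python
def converted_distortion(p_sx, d_s, s_hats):
    """Return (d_tilde, p_x), where d_tilde[(x, s_hat)] = E[d_s(S, s_hat) | X = x]
    and p_x is the marginal distribution of X."""
    p_x = {}
    for (s, x), p in p_sx.items():
        p_x[x] = p_x.get(x, 0.0) + p
    d_tilde = {}
    for x in p_x:
        for s_hat in s_hats:
            # Average the state distortion over the posterior p(s | x).
            d_tilde[(x, s_hat)] = sum(
                p * d_s(s, s_hat) for (s, x2), p in p_sx.items() if x2 == x
            ) / p_x[x]
    return d_tilde, p_x

def hamming(s, s_hat):
    return 0.0 if s == s_hat else 1.0

# Hypothetical joint distribution p(s, x): X equals S with probability 0.8.
p_sx = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}
d_tilde, p_x = converted_distortion(p_sx, hamming, s_hats=[0, 1])

# A decoder that sees only X; here it simply outputs s_hat = x.
def decoder(x):
    return x

# Direct evaluation uses the unobserved S, indirect evaluation uses only X.
direct = sum(p * hamming(s, decoder(x)) for (s, x), p in p_sx.items())
indirect = sum(p_x[x] * d_tilde[(x, decoder(x))] for x in p_x)
```
</eg>
<p>The two expected distortions agree for this decoder, and the same computation goes through for any other decoder that depends on the observation only, which is the one-shot equivalence that the proof relies on.</p></div>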
<div xmlns="http://www.tei-c.org/ns/1.0"><p>A general and unified approach to the indirect rate-distortion function put forward in <ref type="bibr">[33]</ref> is to first show that the one-shot expected distortion E d s (S, &#348;) is equivalent to E ds (X, &#348;) , and then invoke a tensorization argument to extend the one-shot equivalence to block codes. Here we directly illustrate how this can be accomplished for S n &#8596; X n &#8596; ( &#348;n , Xn ) generated by an arbitrary encoder-decoder pair, as follows: </p><p>where (a) is due to independence between Z and X, (b) is according to the problem setup that E(Z) = 0, and (c) is due to the fact that tr(HK X &#348; ) = tr(K &#348;X H T ). From this chain of identities, we see that for any two reproductions of the intrinsic state, &#348; and &#348; , we have E d s (S; &#348;) = E d s (S; &#348; ) as long as</p><p>Therefore, by Theorem 1, the semantic rate distortion function R G (D s , D o ) can be further written as</p><p>Notice that, by denoting T ( &#348;, X) for convenience, h(X| &#348;, X) can be upper bounded as</p><p>where (a) is by the fact that conditioning reduces entropy, and equality holds when X -K XT K -1 T T is independent of T ; (b) is due to the fact that Gaussian distribution maximizes differential entropy with given second central moment. Overall, we can see that this upper bound of h(X| &#348;, X) is achieved when X and T are jointly Gaussian.</p><p>Based on the argument above, for an arbitrary T = ( &#348;, X), we can generate T = ( &#348; , X ) according to a linear relationship</p><p>where N is a multivariate Gaussian random variable following</p><p>and is independent of X. Clearly it holds that K T = K T and K XT = K XT . According to (58), we can see that h(X| &#348;, X) &#8804; h( &#348; , X ). 
That is to say, for any ( &#348;, X) that satisfies the distortion constraints, there always exists a Gaussian ( &#348; , X ) which also satisfies the distortion constraints, but achieving a lower code rate. We thus establish that jointly Gaussian reproduction ( &#348;, X) achieves the semantic rate distortion function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Reduction to One Optimization Variable</head><p>In fact, it is unnecessary to optimize with two random variables ( &#348;, X) simultaneously, and in the following we reduce the number of optimization variables to only one.</p><p>We choose the new optimization variable as cov(X| X, &#348;), defined as</p><p>i.e., the error covariance matrix of MMSE estimating X by ( X, &#348;). By denoting cov(X| X, &#348;) as &#916; for short, we can write I(X; X, &#348;) as <ref type="bibr">(26)</ref>. Therefore, now the key point is to show that the feasible region defined by ( <ref type="formula">56</ref>)-(57) (denoted as R 1 ) is the same as the feasible region defined by ( <ref type="formula">27</ref>)-( <ref type="formula">29</ref>) That is to say, for any K ( &#348;, X) &#8712; R 1 , we can find a corresponding &#916; &#8712; R 2 , and hence R 1 &#8838; R 2 .</p><p>Then we show that R 2 &#8838; R 1 . For any &#916; &#8712; R 2 , we consider a test channel with X = X + N and let &#348; = H X, where N obeys Gaussian distribution N (0, &#916;). Hence we have</p><p>That is to say, for any &#916; &#8712; R 2 , we can also find a corresponding tuple of K ( &#348;, X) &#8712; R 1 , and hence R 2 &#8838; R 1 . Now, we can conclude that, under the setting of Theorem 2, Theorems 1 and 2 define two optimization problems with the same objective function and the same feasible region. This therefore completes the proof.</p></div>
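<div xmlns="http://www.tei-c.org/ns/1.0"><p>The distortions produced by the test channel used in the second half of the argument can be worked out in a few lines. The derivation below is a sketch under stated assumptions rather than a restatement of the omitted displays: it takes the quadratic distortions to be squared Euclidean norms, takes N to be independent of the reproduction of X (written $\hat{X}$ here), and takes Z to be zero-mean and independent of N. The hat notation is introduced only for this sketch.</p>
<eg>
```latex
\begin{gather*}
S = HX + Z, \qquad X = \hat{X} + N, \qquad \hat{S} = H\hat{X}, \qquad N \sim \mathcal{N}(0, \Delta), \\
X - \hat{X} = N \;\Longrightarrow\; \mathbb{E}\|X - \hat{X}\|^2 = \operatorname{tr}(\Delta), \\
S - \hat{S} = H(X - \hat{X}) + Z = HN + Z, \\
\mathbb{E}\|S - \hat{S}\|^2 = \operatorname{tr}(H \Delta H^T) + \operatorname{tr}(K_Z).
\end{gather*}
```
</eg>
<p>The cross term between HN and Z vanishes because Z is zero-mean and independent of N. The last line is consistent with the earlier remark that the problem becomes infeasible once tr(K Z ) exceeds D s , since tr(H&#916;H T ) is non-negative.</p></div>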
<div xmlns="http://www.tei-c.org/ns/1.0"><head>APPENDIX III PROOF OF COROLLARY 4</head><p>By Theorem 2 and the identities (<ref type="formula">21</ref>), the semantic rate distortion function of a jointly Gaussian semantic source with covariance matrix ( <ref type="formula">20</ref>) is given by</p><p>We will prove</p><p>for an arbitrary symmetric matrix &#916; that satisfies (64), ( <ref type="formula">65</ref>) and (66), by constructing a test channel. This implies that R(D s , D o ) is no greater than (63).</p><p>In order to construct the test channel, let U be a Gaussian vector with zero mean and covariance matrix &#916; -&#916;K -1 X &#916;, independent of (S, X). That &#916; -&#916;K -1 X &#916; is positive semi-definite will be proved in Lemma 2 at the end of this subsection. Define X = (I m -&#916;K -1 X )X + U and &#348; = K SX K -1 X X. Thus S &#8596; X &#8596; X &#8596; &#348; is a Markov chain. We will verify in the next paragraphs that E[ ds (X, &#348;)] &#8804; D s , where ds (  </p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>In this paper, random variables can be drawn from general alphabets, so random vectors are vector-valued random variables.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2"><p>This is the operational definition of a rate distortion function, which has been widely used (see, for example, <ref type="bibr">[3]</ref>, <ref type="bibr">[28]</ref>, [29]).</p></note>
		</body>
		</text>
</TEI>
