<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Estimation of Low Rank High-Dimensional Multivariate Linear Models for Multi-Response Data</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>04/03/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10411726</idno>
					<idno type="doi">10.1080/01621459.2020.1799813</idno>
					<title level='j'>Journal of the American Statistical Association</title>
<idno type="issn">0162-1459</idno>
<biblScope unit="volume">117</biblScope>
<biblScope unit="issue">538</biblScope>					

					<author>Changliang Zou</author><author>Yuan Ke</author><author>Wenyang Zhang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[In this paper, we study low rank high dimensional multivariate linear models (LRMLM) for high dimensional multi-response data. We propose an intuitively appealing estimation approach, and develop an algorithm for implementation purposes. Asymptotic properties are established in order to justify the estimation procedure theoretically. Intensive simulation studies are also conducted to demonstrate performance when the sample size is finite, and a comparison is made with some popular methods from the literature. The results show the proposed estimator outperforms all of the alternative methods under various circumstances. Finally, using our suggested estimation procedure we apply the LRMLM to analyse an environmental data set and predict concentrations of PM2.5 at the locations concerned.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>It is common to find multi-response data in many real life problems. Componentwise analysis is clearly not a good choice for multi-response data analysis, because it does not fully make use of the information available. For example, the observations of other components may contain the information for the component of interest, and such information would be completely overlooked by component-wise analysis, therefore, the resulting estimators would not be as efficient as we can expect. It is necessary to take multivariate analysis approach for multi-response data analysis. The most commonly used multivariate regression models are the multivariate linear models. The research in the multivariate linear models can be at least traced back to <ref type="bibr">Anderson (1951)</ref>. There is much literature after <ref type="bibr">Anderson (1951)</ref> about the classic multivariate linear models, see the references in <ref type="bibr">Reinsel and Velu (1998)</ref> and <ref type="bibr">Anderson (2004)</ref>.</p><p>With the surge in high dimensional data analysis in the past more than a decade, the multivariate linear models in high dimensional setting are attracting more and more attention than ever before. Many interesting developments in low rank high dimensional multivariate linear models have appeared in literature, see <ref type="bibr">Yuan et al. (2007)</ref>, <ref type="bibr">Negahban and</ref><ref type="bibr">Wainwright (2011), Obozinski et al. (2011)</ref>, <ref type="bibr">Kong et al. (2017)</ref>, <ref type="bibr">Bing and Wegkamp (2019)</ref>, <ref type="bibr">Raskutti et al. (2019)</ref>, <ref type="bibr">Zheng et al. (2019)</ref> and the references therein.</p><p>A commonly used approach to deal with the low rank coefficient matrix in a multivariate linear model is based on the idea of decomposing the coefficient matrix, say A with rank r and size p &#215; q, to CDQ, where C and Q are two matrices of size p &#215; r and r &#215; q, respectively, and D is a diagonal matrix of size r, where r is an estimator of r. The estimation of r plays a key role for the success of the approach used. Different approaches may end up with different ways to estimate r, see <ref type="bibr">Yuan et al. (2007)</ref> and <ref type="bibr">Bing and Wegkamp (2019)</ref>. Although the existing approaches for estimating r enjoy nice asymptotic properties, when implementing them, we often come up against a dilemma: we create a new unknown parameter in order to estimate an unknown parameter, this is because we have to select a tuning parameter in the estimation of r. In addition to that, as far as the estimation of A is concerned, which is the ultimate goal for multivariate linear models, even if we knew the rank r, in order to get the estimator of A based on the decomposition, we would have to estimate C, Q and D. Even with the constraints coming with the decomposition, we may have to estimate at least (p + q)r unknown parameters, which is more than the unknown parameters we need to estimate without using the decomposition when r &gt; pq/(p + q). 
That implies we may end up with a better estimator of A if we simply apply the standard least squares estimation for multivariate linear models when r &gt; pq/(p + q), which clearly shows the limitation of the decomposition based approach.</p><p>In this paper, we are going to propose an estimation procedure for the low rank multivariate linear models, in which we only need to estimate r(p + q) -r 2 unknown parameters in order to get the estimator of A. We can easily show r(p + q) -r 2 &#8804; pq, because r &#8804; min(p, q). Intuitively speaking, the proposed estimation procedure would be more efficient than either of the standard least squares estimation and the decomposition based approach. This conclusion is confirmed to be true by both the asymptotic theory established in Section 4 and simulation studies in Section 5. As part of the proposed estimation procedure, the rank of A is estimated by the BIC, which is free of tuning parameter.</p><p>We will show the resulting estimator enjoys good theoretical properties and performs well in simulation studies.</p><p>Another advantage of the proposed estimation procedure is it clearly appreciates the high dimensionality by directly imposing a penalty on those entries of A in question, which makes the proposed estimation procedure easily accom-modate the high dimensional cases and enjoy the function of feature selection.</p><p>In the context of multi-response data analysis, the proposed estimation procedure also comes with a very nice practical implication, which is the impacts of explanatory variables on some responses are linear combinations of the impacts on certain responses when the matrix coefficient is of low rank, this would be very helpful when it comes to interpreting the results for a given real dataset, and may lead to some interesting findings in the discipline which the dataset comes from.</p><p>To implement the proposed estimation procedure, we have also developed an algorithm for the estimation. Our simulation studies show the proposed algorithm is fast and accurate.</p><p>The rest of this paper is organised as follows: we begin with a detailed description of the models we are going to address in Section 2. The proposed estimation procedure and associated computational algorithm are described in Section 3. The asymptotic properties of the estimators obtained by the proposed estimation procedure are presented in Section 4. Section 5 is devoted to simulation studies, in which we will examine how well the proposed estimation works. Finally, in Section 6, we apply the low rank multivariate linear models together with the proposed estimation procedure to analyse an environmental data set and predict the concentrations of PM2.5 at the locations concerned.</p><p>The results show the proposed method provides more accurate prediction than other methods. We leave the theoretical proofs of all asymptotic properties in the Appendix.</p></div>
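<p>The parameter-count inequality quoted above has a one-line verification; the following display, added here for clarity, spells out the algebra:</p>
<p><![CDATA[
% Why r(p+q) - r^2 <= pq whenever r <= min(p, q):
\[
  pq - r(p+q) + r^{2} \;=\; (p-r)(q-r) \;\ge\; 0
  \quad\Longrightarrow\quad
  r(p+q) - r^{2} \;\le\; pq,
\]
% with equality only when r = p or r = q.
]]></p>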
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The low rank high dimensional multivariate linear models</head><p>To give a generic description of the models we are going to address, we use Y to denote the vector of all response variables, X the vector of all covariates.</p><p>Without any confusion, from now on, we call Y the response variable, X the covariate. We assume Y is of q dimension, X is of p dimension. p and q may tend to &#8734; when sample size tends to &#8734;. The low rank high dimensional multivariate linear models which we are going to address in this paper are</p><p>where A is a p &#215; q unknown matrix of unknown rank r, r &lt; q &lt; p, =</p><p>( 1 , . . . , q ) is a q dimensional random error, and</p><p>Like <ref type="bibr">Yuan et al. (2007)</ref>, <ref type="bibr">Bing and Wegkamp (2019)</ref> and <ref type="bibr">Raskutti et al. (2019)</ref>,</p><p>we assume &#931; = &#963; 2 I q , and &#963; 2 is unknown.</p><p>Suppose we have a sample</p><p>model for the sample can be written as</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Estimation procedure</head><p>Throughout this paper, for any matrix &#8486; = (w ij ) of size p &#215; q and any vector</p><p>For any integers 1</p><p>and &#955; min (B) and &#955; max (B) be the smallest and largest eigenvalues of a square matrix B.</p><p>Suppose we have an independent and identically distributed sample (Y i , X i ), i = 1, &#8226; &#8226; &#8226; , n, from (Y , X ). A standard penalised least squares estimation would provide us with an estimator &#195; of A, which is the minimiser of</p><p>where P &#955; (&#8226;) is a penalty function. However, this estimation does not take into account the information that A is of low rank, which would result in an estimator not as efficient as we could expect. In fact, it is easy to see this estimation is equivalent to componentwise penalised least squares estimation for (2.1).</p><p>The proposed estimation procedure will fully make use of the low rank information of A, and the resulting estimator will be more efficient.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Estimation method</head><p>The idea, based on which the proposed estimation is constructed, is that each column of A is a linear combination of r linearly independent columns of A.</p><p>Based on this idea, we propose the following estimation procedure for A. We start with the case when the rank r of A is known, then propose an estimation for r.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1">When r is known</head><p>Apply the idea of penalised least squares estimation and minimise</p><p>with respect to a j l s, b k s, and</p><p>is a penalty function, &#955; involved is a tuning parameter which can be selected by some criterion, such as BIC.</p><p>When r is large, b k s are also likely to have sparsity, in which case, we can add another penalty term into (3.2) to penalise b k s. However, when r is small, which is the case of main interest, there is no need to penalise b k s, this is because for each component of Y i , say the kth component, there are only r b k s, which is not many.</p><p>Notice that the minimiser of (3.2) is not unique. We denote a minimiser of (3.2) by</p><p>For any j, 1 &#8804; j &#8804; q, the jth column of A is estimated by</p><p>We use &#194;(r) to denote the estimator of A.</p><p>The non-uniqueness of the minimiser of (3.2) is because that there can be more than one ways to choose r independent columns D r so that A Dr is fullrank. Theoretically speaking, as long as A Dr and r can be well estimated, we can recovery the low-rank structure of A regardless of the choice of D r . Our theory shows that the consistency of our proposed estimator holds uniformly in D r ; please refer to Lemmas 1-3 in the Appendix. Hence, this non-unique issue does not affect the performance of the proposed estimation procedure, which is further corroborated via extensive simulations in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2">Estimation of r</head><p>The estimation of A in section 3.1.1 is built on the assumption that the rank r of A is known, and this assumption is not realistic in reality. In fact, rank r plays a very important role in the estimation of A. If r is underestimated, a substantial bias would creep into the estimation procedure and make the final estimator of A very biased. On the other hand, if r is overestimated, we would have to estimate unnecessarily many unknown parameters, which would make the final estimator of A have big variance. In this paper, we use BIC, which is defined as follows, to estimate r</p><p>where h n is a positive diverging sequence, which can be set to be log n. The estimator of r is given by</p><p>where r is a pre-specified bound for r. The proposed estimator &#194;(r) of A is the &#194;(r), obtained in section 3.1.1, with r being replaced by r.</p><p>We will show in Section 4 the proposed BIC estimator r enjoys an excellent asymptotic property, say it tends to identify the true model consistently. If the prediction accuracy is our primary concern, we can consider the multifold crossvalidation (CV), which tends to select the model with the optimal prediction performance <ref type="bibr">(Zhang, 1993)</ref>. The data are splitted randomly into M groups of equal sizes (assuming that n/M is an integer for simplicity), G m , m = 1, . . . , M .</p><p>For each k, 1 &#8804; k &#8804; q, let &#194;-m (k) be the estimator of A, obtained by the method in Section 3.1.1 when the rank of A is k, without using the observations of the mth group. The cross-validation sum is defined as</p><p>The CV estimator of r is taken to be the minimiser of CV(k).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Computational algorithm</head><p>The minimisation of (3.2) can be difficult. We propose an iterative algorithm to solve this problem. The route of our algorithm is: we first minimise (3.2), for given {j 1 , &#8226; &#8226; &#8226; , j r }, with respect to a j l s, b k s, and denote the resulting</p><p>The details of our algorithm are described as follows.</p><p>For any given {j 1 , &#8226; &#8226; &#8226; , j r }, we minimise (3.2) with respect to a j l s and b k s by the following iterative approach</p><p>(1) We minimise</p><p>with respect to a j l s, and denote the minimiser by a (0) j l s. There are many existing methods to do the minimisation in this step, because this is the minimisation for standard penalised least squares estimation.</p><p>(2) Minimise</p><p>with respect to b k s, and denote the minimiser by b</p><p>k enjoys a closed form, therefore, the minimisation in this step is very easy.</p><p>(3) Let a k s are the minimser of (3.2), and the minimum of (3.2) is denoted by</p><p>However, this approach would have to compute q r F (j 1 , &#8226; &#8226; &#8226; , j r )s, which is computationally too expensive. We shall borrow the idea of forward selection to minimise F (j 1 , &#8226; &#8226; &#8226; , j r ), which is depicted as follows (I) Let F (j 1 ) be F (j 1 , &#8226; &#8226; &#8226; , j r ) when r = 1, and compute F (j 1 ) for each possible j 1 , 1 &#8804; j 1 &#8804; q. Let &#309;1 be the one which minimises F (j 1 ).</p><p>(II) For any k &lt; r, when we have { &#309;1 , &#8226; &#8226; &#8226; , &#309;k }, the way to select a j k+1 from</p><p>as follows: for each possible j k+1 , we arrange &#309;1 , &#8226; &#8226; &#8226; , &#309;k and j k+1 in ascent order, and denote them by j1 &lt; &#8226; &#8226; &#8226; &lt; jk+1 . We compute F ( j1 , &#8226; &#8226; &#8226; , jk+1 ). The selected j k+1 is the one which minimises</p><p>). We add the selected j k+1 into the set { &#309;1 , &#8226; &#8226; &#8226; , &#309;k }, and sort the elements in the new set in ascent order. With a little bit abuse of notation, we denote the new set by</p><p>2), and minimise (3.2) with respect to a &#309;l s and b k s. Denote the resulting minimiser by &#226;&#309; l s and bk s. We take</p><p>as a minimiser of (3.2) with respect to {j 1 , &#8226; &#8226; &#8226; , j r }, a &#309;l s and b k s.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Asymptotic properties</head><p>In this section we are going to investigate the asymptotic behavior of the proposed estimator of A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Throughout this paper,</head><p>with probability tending to 1. " " and " " are similarly defined.</p><p>To make the theoretical derivation more neat, we write the minimisation of (3.2) in matrix form. Specifically, for any given integer k &#8712; [1, q), when r = k, the minimisation of (3.2) can be written to the minimisation of the following objective function</p><p>Without loss of generality, we use the (adaptive) lasso type penalty</p><p>see <ref type="bibr">Zou (2006)</ref>, where &#955; j &gt; 0 and u j are the (j, )th element of U.</p><p>Throughout this section, we assume that each column of X has been normalized to have L 2 -norm of n. Furthermore, we denote</p><p>where a j is the (j, )th entry of A.</p><p>In order to establish the asymptotic properties of the proposed methods, we impose the following technical conditions:</p><p>Condition 1 There exist positive constants &#954; and &#954; such that with probability</p><p>Condition 2 For some positive constants C and K,</p><p>Condition 3 The elements of A and XA are bounded. The matrix A is s nsparse in the sense that max 1&#8804; &#8804;q</p><p>where</p><p>Remark 1 Condition 1 implies that the predictor matrix has a reasonably good behavior; this is a type of restricted eigenvalue assumption and is commonly used in the literature, e.g., <ref type="bibr">Fan and Peng (2004)</ref>. Condition 2 requires that each entry of E is sub-Gaussian, which ensures its tail probability decays exponentially. Condition 3 facilitates our derivation but can be much relaxed so that the true signal strength depends on n as well. Condition 4 imposes requirement on the diverging rate in order to obtain consistent estimation when p, r diverges with n. Condition 5 is an identifiability assumption, ensuring that a true low-rank structure can be recognized.</p><p>We start with the establishment of the asymptotic property of the minimiser &#194;(r) = ( &#219;, V) of (4.1) when k = r. As far as the asymptotic properties are concerned, p, q and r are allowed to depend on n and diverge as n &#8594; &#8734;. The reason for us to suppress the subscript n of p, q and r is to make notations neat.</p><p>We define an index set</p><p>and its complement set is denoted by M c (D). We have the following theorem:</p><p>Theorem 1 Under Conditions 1-5, if &#947; 0n / rp log p/n &#8594; &#8734; and rs n &#947; 2 1n &#8594; 0 as n &#8594; &#8734;, there exists, with probability tending to one, a local minimiser { Dr , &#194;(r)} of L(U, V, D r ; r) satisfying: A Dr is full-rank, and</p><p>Theorem 1 implies that we can identify a "correct" Dr in the sense that A Dr is full rank with an overwhelming probability when n is large. The penalised estimators of the zero coefficients are exactly zero under some conditions on &#947; 0n .</p><p>The condition rs n &#947; 2 1n &#8594; 0 together with Condition 4 ensures that the proposed estimator is consistent. From the proof of this theorem in the Appendix, we can see, as a special case, that &#194;(r) -A = O p ( r(p + q) log p/n) when &#947; 0n = &#947; 1n = 0. This is in line with the relevant existing results, see, for example, <ref type="bibr">Negahban and Wainwright (2011)</ref>.</p><p>When the coefficient matrix is sparse, under properly selected tuning parameters, the proposed penalised estimator would enjoy the "oracle property". 
Specifically, if the adaptive lasso penalty with tuning parameters λ_{jℓ} = λ_n ã_{jℓ}^{−1} is used, where Ã = (ã_{jℓ})_{p×q} is the standard least squares estimator of A, we can verify that ‖Â(r) − A‖ = O_p({r(s_n + q) log p/n}^{1/2}), provided that rqp² log p/n → 0 as n → ∞, by setting λ_n ∼ √(log p/n) and using the facts that ã_{jℓ} = O_p(1) for a_{jℓ} ≠ 0 and ã_{jℓ} = O_p(√(pq log p/n)) for a_{jℓ} = 0.</p><p>As mentioned before, the estimation of the rank r of A plays a very important role in the estimation procedure for A. Theorem 2 shows that the proposed BIC estimator r̂, defined by (3.3), is consistent.</p><p>Theorems 1 and 2 together imply that the proposed estimator of A is consistent.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Simulation studies</head><p>In this section, we use two simulated examples, low and high dimensional ones, to assess the coefficient matrix estimation, the prediction, and the rank recovery performance of the proposed method. We assess the proposed method and consider the rank being estimated by either 5 fold cross-validation (OUR-CV;</p><p>Eq.(3.4)) or BIC (OUR-BIC; Eq.(3.3)). Throughout this section, OUR-CV and OUR-BIC are implemented by the algorithm introduced in Section 3.2 with P &#955; (&#8226;) being the L 1 penalty function. The tuning parameter &#955; is selected by 5 fold cross-validation.</p><p>In the low dimensional example, we compare OUR-CV and OUR-BIC with the Factor Estimation and Selection method (FES) proposed in <ref type="bibr">Yuan et al. (2007)</ref>, the Rank Selection Criterion (RSC) proposed in <ref type="bibr">Bunea et al. (2011)</ref>, and the Self-Tuning Rank Selection (STRS) proposed in <ref type="bibr">Bing and Wegkamp (2019)</ref>. Besides, we also consider the Ordinary Least Squares estimator (OLS) and a Low-rank Matrix Decomposition estimator (LMD) as two benchmarks.</p><p>Denote &#194;OLS the OLS estimator of A, the LMD estimator of A is obtained from a rank r truncated singular value decomposition of &#194;OLS as</p><p>where D = diag{&#963; 1 , . . . , &#963; r} is a diagonal matrix of r largest positive singular values of &#194;OLS , and U = (u 1 , . . . , u r ) and V = (v 1 , . . . , v r) are corresponding left and right singular vectors of &#194;OLS , respectively. Further, r is estimated by the eigen-ratio method, see <ref type="bibr">Ahn and Horenstein (2013)</ref> and references therein.</p><p>In the high dimensional example, we only compare OUR-CV and OUR-BIC with RSC and STRS as FES, OLS and LMD are not applicable.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Simulation settings</head><p>Consider the multivariate linear regression model (2.2). Similar to Bunea et al.</p><p>(2011) and <ref type="bibr">Bing and Wegkamp (2019)</ref>, we consider a data generating process as follows.</p><p>(1) Coefficient matrix A:</p><p>and r &#8804; min(p, q). The entries of &#915; 0 and &#915; 1 are independently drawn from N (0, 1). The parameter r controls the rank of A. The parameters b and r together control the signal to noise ratio in (2.2).</p><p>(2) Error matrix E: The entries of E are independently drawn from N (0, 1).</p><p>The design matrix X is generated with the following two settings to cover the low and high dimensional cases, respectively.</p><p>(3a) Design matrix X when n &gt; p (low dimensional case):</p><p>are independently drawn from multivariate Normal distribution N p (0, &#931;).</p><p>For i, j = 1, &#8226; &#8226; &#8226; , p, the (i, j)th entry of &#931; is defined as &#931; ij = &#951; |i-j| for some &#951; &#8712; (0, 1).</p><p>(3b) Design matrix X when p &gt; n &gt; q (high dimensional case): Let X = &#923; 0 &#923; 1 &#931; 1/2 , with &#923; 0 &#8712; IR n&#215;q , and &#923; 1 &#8712; IR q&#215;p . The entries of &#923; 0 and &#923; 1 are independently drawn from N (0, 1). The covariance matrix &#931; is defined as in (3a).</p><p>For each case, we generate a testing sample {Y * , X * } of size n * independent of {Y, X} to assess the prediction performance of each method. To sum up, the parameters that control the data generating process are listed in Table <ref type="table">1</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Low dimensional example</head><p>In the first example, we examine the performance of OUR-CV, OUR-BIC, RSC, STRS, FES, OLS and LMD in the low dimensional case. We set n = 100, n * = 50, p = 25 and q = 20. Then, we vary the rank r = 5, 10, 15, the signal strength parameter b = 0.2, 0.4, and the correlation level &#951; = 0.5, 0.9.</p><p>We simulate 200 replications for each scenario.</p><p>For each replication, we calculate two re-scaled Frobenius norms as The simulation results of the low dimensional example with b = 0.2 and 0.4 are presented in Tables <ref type="table">2</ref> and<ref type="table">3</ref>, respectively. In most scenarios, OUR-CV performs slightly better than OUR-BIC but pays a price on the computational cost. When the rank is small (e.g. r = 5), OUR-CV, OUR-BIC and STRS can recovery the correct rank with a rate close to 1 and have small estimation and prediction errors. RSC struggles when &#951; is large and resultes in low correct recovery rates. LMD suffers when the signal to noise ratio is small. When the rank is moderate or large (e.g. r = 10 or 15), the correct rank recovery rates of RSC, STRS and LMD drop. As a result, the estimation and prediction errors of RSC, STRS and LMD are also inflated. Compared with the other methods, OUR-CV and OUR-BIC are less sensitive to rank, correlation level and the signal to noise ratio. In general, FES performs similar as RSC in terms of estimation and prediction, and OLS performs unsatisfactory as it ignores the low rank structure in A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">High dimensional example</head><p>In the high dimensional example, we examine the performance of OUR-CV and OUR-BIC and compare them with RSC and STRS. We set n = 40, n * = 40, p = 100 and q = 25. Then, we vary the rank r = 10, 20, the signal strength parameter b = 0.2, 0.4, and the correlation level &#951; = 0.5, 0.9. We simulate 200 replications for each scenario. The estimation and prediction errors are still measured by the sample mean and sample standard deviation of the rescaled Frobenius norms defined in (5.1). The estimation accuracy of rank r is measured by the correct rank recovery rate defined in (5.2).</p><p>The simulation results of the high dimensional example with b = 0.2 and 0.4 are presented in Tables <ref type="table">4</ref> and<ref type="table">5</ref>, respectively. Similar to the low dimensional case, OUR-CV outperforms OUR-BIC in most scenarios. When both &#951; and r are large, OUR-CV and OUR-BIC maintains reasonable correct rank recovery rates while RSC and STRS fail to recovery the correct rank in most replications. The columns &#8710; and &#915; report the sample mean and sample standard deviation (in parentheses) of &#8710; k and &#915; k , which are defined in (5.1), over 200 replications. The column R reports the rank recovery rate, which is defined in (5.2), over 200 replications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Real data analysis</head><p>Particulate matter up to 2.5&#181;m (PM2.5) is a complex mixture of solid particles, chemicals (e.g. sulfates, nitrates) and liquid droplets in the air, which include inhalable particles that are small enough to penetrate the thoracic region of the respiratory system. Short term (days) exposure to inhalable PM2.5 can cause an increase in hospital admissions related to respiratory and cardiovascular morbidity, such as aggravation of asthma, respiratory symptoms, and cardiovascular disorders. Long term (years) exposure to inhalable PM2.5 may The columns &#8710; and &#915; report the sample mean and sample standard deviation (in parentheses) of &#8710; k and &#915; k , which are defined in (5.1), over 200 replications. The column R reports the rank recovery rate, which is defined in (5.2), over 200 replications.</p><p>lead to an increase in mortality from cardiovascular and respiratory diseases, like lung cancer. The hazardous effects of inhalable PM2.5 on human health have been well-documented, see <ref type="bibr">Riediker et al. (2004)</ref>, <ref type="bibr">Polichetti et al. (2009)</ref>, <ref type="bibr">Franck et al. (2011</ref><ref type="bibr">), Xing et al. (2016)</ref>, <ref type="bibr">Pun et al. (2017)</ref> and references therein.</p><p>In this section, we investigate the relationship between concentration of PM2.5 and four air pollutants: ozone, sulfur dioxide (SO2), carbon monoxide (CO), and nitrogen dioxide (NO2). The dataset for us to study is available at <ref type="url">https://www.epa.gov/outdoor-air-quality-data</ref>,  The columns &#8710; and &#915; report the sample mean and sample standard deviation (in parentheses) of &#8710; k and &#915; k , which are defined in (5.1), over 200 replications. The column R reports the rank recovery rate, which is defined in (5.2), over 200 replications.</p><p>it was collected from 37 outdoor monitoring sites across the United States.</p><p>Specifically, the concentration of each of the 4 pollutants was measured and collected daily from the 37 sites between January 2017 to April 2019, and it has 729 observations in total. The concentration of PM2.5 was collected in the same manner.</p><p>What we are interested in is the association between the concentrations of PM2.5 and the concentrations of the four air pollutants at the 37 monitor sites.</p><p>As the concentrations of the four air pollutants at one site may also contribute the concentrations of PM2.5 at other sites, we include the concentrations of the four air pollutants at all 37 sites in the explanatory variables for the concentration of PM2.5 at each site of the 37 sites, this gives us 148 explanatory variables for the concentration of PM2.5 at each site. In Figure <ref type="figure">1</ref>, we plot the sample means of the concentrations of PM2.5 and of the four air pollutants against the geological locations where they were collected.</p><p>We take the first-order difference for each column of the dataset to remove the non-stationarity, and standardize it to make it have mean 0 and variance</p><p>) &#8712; IR 728&#215;37 be the matrix of the 728 observations of the response variable which contains the concentrations of PM2.5 collected by the 37 sites, and X = (X 1 , &#8226; &#8226; &#8226; , X 148 ) &#8712; IR 728&#215;148 be matrix of the 728 observations of predictor which contains the 148 explanatory variables. 
<p>We apply the multivariate linear regression model (2.2) to fit the dataset, where E ∈ IR^{728×37} is the matrix of random errors and A ∈ IR^{148×37} is the coefficient matrix of interest.</p><p>We compare the prediction performance of our method, with the rank estimated by 5-fold cross-validation (OUR-CV), against the Self-Tuning Rank Selection (STRS) proposed in <ref type="bibr">Bing and Wegkamp (2019)</ref> and the other competing methods reported in Table <ref type="table">6</ref>. The prediction error is evaluated on a test set Y_test, where q = 37 and n_t = 128 are the number of columns and the number of rows of Y_test, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>PM2.5</head><p>Figure <ref type="figure">1</ref>: Sample means of the concentrations of PM2.5 and of the four air pollutants</p><p>We report in Table <ref type="table">6</ref> the prediction error as well as the estimated rank of the coefficient matrix for each method concerned. According to Table <ref type="table">6</ref>, OUR-CV achieves the smallest prediction error among the three competitors.</p><p>Besides, the rank estimated by OUR-CV is 3 which is more parsimonious than the one estimated by STRS. To justify the rank estimation results, we apply eigen-decomposition to the coefficient matrix estimated by the OLS method, and draw the scree plot with the top 20 eigenvalues in Figure <ref type="figure">2</ref>. The scree plot shows a clear elbow shape at the third eigenvalue. always dominate the second term with arbitrarily large probability. Consider the third term on the right side of (A.4). By Condition 1 again, we get</p><p>and accordingly H -H = O(&#945; n C). By (A.3), we have that</p><p>holds uniformly in D, which is of smaller order of &#8710; 1 .</p><p>For &#8710; 3 , observe that which implies that with probability tending to one the case of k &gt; r would not happen as long as k/h n &#8594; 0.</p></div></body>
		</text>
</TEI>
