<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>High-Dimensional Mediation Analysis for Selecting DNA Methylation Loci Mediating Childhood Trauma and Cortisol Stress Reactivity</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>07/03/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10436267</idno>
					<idno type="doi">10.1080/01621459.2022.2053136</idno>
					<title level='j'>Journal of the American Statistical Association</title>
<idno>0162-1459</idno>
<biblScope unit="volume">117</biblScope>
<biblScope unit="issue">539</biblScope>					

					<author>Xu Guo</author><author>Runze Li</author><author>Jingyuan Liu</author><author>Mudong Zeng</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Childhood trauma tends to influence cortisol stress reactivity through the mediating effects of DNA methylation. Houtepen et al. conducted a study to investigate the role of DNA methylation in cortisol stress reactivity and its relationship with childhood trauma. The study collected a dataset consisting of 385,882 DNA methylation loci, cortisol stress reactivity, one-dimensional score on a childhood trauma questionnaire and several covariates for 85 healthy individuals. Of great scientific interest is to identify the active mediating loci out of the 385,882 ones. Houtepen et al. conducted 385,882 linear mediation analyses, in each of which one locus was considered, and identified three active mediating loci. More recently, van Kesteren and Oberski proposed a coordinate-wise mediation filter (CMF) and applied it to the same dataset. They identified five active mediating loci. Unfortunately, the three loci identified by Houtepen et al. are completely different from the five loci identified by van Kesteren and Oberski, probably because both Houtepen et al. and van Kesteren and Oberski did not consider all loci jointly in their analyses. The high dimensional DNA methylation loci indeed necessitate new techniques for identifying active mediating loci and testing the direct and indirect effects of the early life traumatic stress on later cortisol alteration. Motivated by the contradictory results in the aforementioned two scientific works, we develop a new estimating and testing procedure, and apply it to the same dataset as that analyzed by the two works. We identify three new loci: cg19230917, cg06422529 and cg03199124, and their effect sizes and p-values are 321.196 (p-value = 0.035965), 418.173 (p-value = 0.000234) and 471.865 (p-value = 0.001691), respectively. These three loci possess both reasonably neurobiological interpretations and statistically significant effects via our proposed tests. Based on our new procedure, we further confirm that the childhood trauma does not have significant direct effects on cortisol change-it only indirectly affects cortisol through DNA methylation, and the indirect effect is negative. Supplementary materials for this article are available online.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Childhood trauma plays a pivotal role in the development of psychiatric disorders across life span <ref type="bibr">(Burke et al. 2005;</ref><ref type="bibr">Petrowski et al. 2013)</ref>. Its persistently detrimental influence is typically realized via altering neuroendocrine substances like cortisol <ref type="bibr">(Heim et al. 2000;</ref><ref type="bibr">Carpenter et al. 2007)</ref>. Ever since the pilot study conducted by <ref type="bibr">Luecken (1998)</ref>, researchers thereof have sought for the mechanism relating cortisol change to various circumstances of childhood trauma, such as maltreatment <ref type="bibr">(Carpenter et al. 2007</ref>), physical abuse <ref type="bibr">(Heim et al. 2000;</ref><ref type="bibr">Bremner et al. 2003;</ref><ref type="bibr">Elzinga et al. 2010;</ref><ref type="bibr">Carpenter et al. 2011)</ref>, early parental loss <ref type="bibr">(Luecken 1998;</ref><ref type="bibr">Kraft and Luecken 2009)</ref>, separation experience <ref type="bibr">(Pesonen et al. 2010)</ref>, among others.</p><p>On finding such relations, the aforementioned works nevertheless have not reached a concordant solution. This pushes through deeper exploration toward epigenetic alteration involved in the traumatic stress. Convincingly demonstrated by preclinical studies, childhood trauma tends to influence neuroendocrine system in adulthood via altering DNA methylation patterns. <ref type="bibr">McGowan et al. (2009)</ref> studied the epigenetic regulation of glucocorticoid receptor (NR3C1) in human brain associated with childhood abuse. <ref type="bibr">Perround et al. (2011)</ref> showed that early life adverse events may permanently impact on the Hypothalamus-Pituitary-Adrenal axis (HPAA) though epigenetic modifications of NR3C1. <ref type="bibr">Edelman et al. (2012)</ref> demonstrated epigenetic changes at the GR exon 1F correlate with HPAA reactivity measured by total cortisol (area under curve). See <ref type="bibr">Vinkers et al. (2015)</ref> for a comparative review of literature regarding trauma-induced changes in DNA methylation in humans.</p><p>These works mainly concentrated on single-layer linear models, where effects of early life trauma on DNA methylation and effects of DNA methylation on HPAA or cortisol alteration are separately evaluated. However, DNA methylation ought to play a bridging role in the relation between childhood trauma and cortisol stress reactivity. In addition, all of their scientific findings are based on epigenetic modifications of a single gene. In theory, this is unlikely to be the case and would result in estimation bias. In sight of such issues, <ref type="bibr">Houtepen et al. (2016)</ref> conducted a genome-wide mediation analysis and identified a locus on the KITLG gene that mediates the relationship between childhood trauma and cortisol stress reactivity. Although starting at 385,882 DNA methylation loci, only the top three loci were selected for further investigation by the QQ plot of the p-values obtained from individual significance tests, with a total discard of the dependence structure and joint effects of DNA methylation. To account for the between-loci dependency, van <ref type="bibr">Kesteren and Oberski (2019)</ref> proposed an embedding algorithm called coordinate-wise mediation filter (CMF), which consists of an inner loop and outer loop. A key strategy of CMF to address dependency is the use of residuals and projection when detecting loci in the inner loop. This CMF algorithm targets dichotomous decisions-whether each of the DNA methylation locus should be recognized as a mediator, while offers no guarantee of either statistical significance or model fits. Interestingly, van Kesteren and Oberski (2019) identified completely different DNA methylation loci from <ref type="bibr">Houtepen et al. (2016)</ref>, based on the same dataset but using the CMF algorithm. In response to this contradiction, we in this article carry out an in-depth mediation analysis for a thorough understanding of how early life trauma affects cortisol stress reactivity in adulthood via DNA methylation.</p><p>From a statistical point of view, this is a high dimensional mediation problem, with DNA methylation loci being potential mediators, the vast majority of which though are supposed to be inactive. Nothwithstanding no shortage of strategies dealing with high dimensional mediators, including those in <ref type="bibr">Houtepen et al. (2016)</ref> and van <ref type="bibr">Kesteren and Oberski (2019)</ref>, most existing literature rely on the marginal screening or penalized regression for sparse estimation. See for instance <ref type="bibr">Preacher and Hayes (2008)</ref>, <ref type="bibr">Zhang et al. (2016)</ref>, <ref type="bibr">Serang et al. (2017)</ref>, and so forth. A pitfall of using these dimension reduction techniques in each or either layer of mediation models lies in the pertinent difference between penalizing paths and finding actual mediators. That is, they choose paths instead of mediators. As a potential insight to break through this obstacle, <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref> proposed a debiased Lasso method that can integrate the two layers of high dimensional mediation models, and they also developed significance tests for both direct and indirect effects. However, the method proposed in <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref> involves high dimensional matrix estimation and operation, which might bring about a huge computational burden. In addition, the procedure penalizes all parameters, and the debiased step relies on the entire covariance matrix. This leads to inevitable efficiency loss of the estimators. More recently, <ref type="bibr">Guo et al. (2021)</ref> observed that despite of high dimensional mediators, the direct and indirect effects are both low dimensional, with sum being the total effect. They thereby proposed a partial penalized approach for estimating the direct effects, which avoids high dimensional matrix estimation and the debiased step, and thus, enhances efficiency of proposed estimators. In spite of the plausible theory and efficient algorithms, <ref type="bibr">Guo et al. (2021)</ref> have not yet explicitly elucidated the method with potential confounders, which typically should be considered when studying traumatic effects on cortisol alteration via DNA methylation, as in the literature <ref type="bibr">(Houtepen et al. 2016;</ref><ref type="bibr">van Kesteren and Oberski 2019)</ref>. Therefore, we in this article extend the work of <ref type="bibr">Guo et al. (2021)</ref> to the models with confounders. Then we use our new procedure to study the mediating role of DNA methylation relating childhood trauma and cortisol stress reactivity, with several clinical variables involved as confounders. We further develop relevant tests for the direct and indirect effects of the early life trauma on cortisol stress reactivity.</p><p>Aside from the eight DNA methylation loci detected by <ref type="bibr">Houtepen et al. (2016)</ref> and van Kesteren and Oberski (2019), we identify three additional loci on the RAB5IF gene (cg19230917), the CPQ gene (cg06422529) and the AGPAT1 gene (cg03199124) as mediators. We look through existing literature, and find reasonably neurobiological interpretations toward these three genes, with details referred to Section 3. Thus, our findings point out a potential direction for deeper neurobiological and epigenetic investigation of the connection between traumatic stress and cortisol alteration. From statistical point of view, we perform several statistical tests, and the results are also in support of the selected genes. According to the tests for the direct and indirect effects proposed in this article, the childhood trauma influences cortisol reactivity only through DNA methylation, since the indirect effect is negatively significant, yet the direct effect is not significant. In the full model with all detected loci, those from the newly identified genes are all significant, while the KITLG gene (cg27512205) selected by <ref type="bibr">Houtepen et al. (2016)</ref>, the HNRNPF gene (cg12500973) and the ZSCAN30 gene (cg16657538) selected by <ref type="bibr">van Kesteren and Oberski (2019)</ref> are no longer significant. However, models with only the genes in <ref type="bibr">Houtepen et al. (2016)</ref> yield a contradictory conclusion that KITLG is significant.</p><p>In Section 2, we introduce the statistical formulation of the high dimensional mediation problem, including the mediation models with confounders involved, the estimation for direct and indirect effects, and the tests of significance of indirect and direct effects. The detailed analysis is presented and discussed in Section 3. We also conduct a thorough simulation study to validate the finite sample performance of the proposed procedure in Section 4. A brief summary and conclusion are provided in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Statistical Formulation: High Dimensional Linear Mediation Models with Confounders</head><p>In this section, we introduce the high dimensional mediation models with confounders involved, as the statistical formulation associating childhood trauma with cortisol stress reactivity via altering DNA methylation. Then we extend the partial penalization technique in <ref type="bibr">Guo et al. (2021)</ref> to these models, for estimating and testing the direct and indirect traumatic effects.</p><p>Let y be the response variable, m consist of p-dimensional mediators, x consist of q-dimensional exposure variables, and z consist of d-dimensional confounding variables. In our study, y is designated as the cortisol stress reactivity, x is childhood trauma, and elements in m correspond to DNA methylation loci that potentially mediate relations between trauma and cortisol. Moreover, we take several clinical variables as confounders in z, with detailed descriptions in Section 3. Consider linear mediation models</p><p>where &#949; 1 is a random error with E&#949; 1 = 0 and var(&#949; 1 ) = &#963; 2 1 and &#949; 2 is a random error vector with E(&#949; 2 ) = 0 and cov(&#949; 2 ) = * . Assume that &#949; 1 is independent of m, x and z, and &#949; 2 is independent of x and z. Furthermore, assume that &#949; 1 and &#949; 2 are independent.</p><p>Motivated by the real data analysis in Section 3, it is assumed throughout this article that q and d have fixed and finite dimensions, while p is high dimensional. Plugging (2.2) into (2.1), we obtain</p><p>where &#945; 1 and &#946; = 1 &#945; 0 are called the direct and indirect effect of exposure x in mediation literature, respectively, and &#947; x = &#945; 1 + &#946; is called the total effect of x. Often of primary interest from mediation point of view is to estimate and test &#945; 1 and &#946;. And these two parameters possess their own interpretations as natural indirect effect and natural direct effect in causal inference.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Natural Direct and Natural Indirect Effects</head><p>We link the parameters &#945; 1 and &#946; with natural direct and natural indirect effects on a causal diagram. Let y(x, m) denote the potential outcome that would have been observed had x and m been set to x and m, respectively, and m(x) denote the potential mediator that would have been observed had x been set to x. Following <ref type="bibr">Imai, Keele, and Tingley (2010)</ref>, <ref type="bibr">Vanderweele and Vansteelandt (2014)</ref>, and others, for x = x 1 versus x 0 , the natural direct effect is defined as</p><p>while the indirect effect is defined as</p><p>The total effect is then naturally defined as the sum of natural direct and indirect effects</p><p>Furthermore, the independence assumptions of random errors in the mediation models (2.1) and (2.2) ensure the following sequential ignorability conditions <ref type="bibr">(Imai, Keele, and Tingley 2010;</ref><ref type="bibr">Vanderweele and Vansteelandt 2014;</ref><ref type="bibr">Huang 2019;</ref><ref type="bibr">Zhou, Wang, and Zhao 2020)</ref>.</p><p>(A1) x&#8869; &#8869;y(x, m)|z: that is, no unmeasured confounders between the exposure and outcome.</p><p>(A2) m&#8869; &#8869;y(x, m)|(x, z): no unmeasured confounders between the mediators and outcome. (A3) x&#8869; &#8869;m(x)|z: no unmeasured confounders between the exposure and mediator. (A4) m(x)&#8869; &#8869;y(x, m)|z: no exposure-dependent confounders between the mediators and outcome, where x is the realization of exposure at a different value from x.</p><p>Under these sequential ignorability conditions, <ref type="bibr">Vanderweele and Vansteelandt (2014)</ref> showed that</p><p>Thus, &#945; 1 can be interpreted as the average natural direct effect, and &#946; = 1 &#945; 0 can be interpreted as the average natural indirect effect.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Identifying Active Mediators</head><p>In this section, we introduce the procedure of identifying active mediators in the mediation models (2.1) and (2.2), and estimating the direct effect &#945; 1 and indirect effect &#946; that can get around high dimensional matrix estimation. Suppose that {m i ,</p><p>Despite high dimensionality of m, model (2.3) is indeed a fixed-dimensional model. Therefore, the coefficient of x, or say the total effect &#947; x = &#945; 1 + &#946;, could be naturally estimated via the ordinary least squared estimator in model (2.3), that is,</p><p>(2.4)</p><p>where I q is q&#215;q dimensional identity matrix, and O q&#215;d is a q&#215;d zero matrix.</p><p>Another key observation is that x and z in model (2.1) are both fixed dimensional, thus, we opt not to impose sparsity on &#945; 1 and &#945; 2 . On the other hand, sparsity on &#945; 0 , the coefficient associated with the high dimensional mediator m, could be naturally and reasonably assumed, as so in most existing highdimensional literature. Therefore, following <ref type="bibr">Guo et al. (2021)</ref>, we apply the partial penalized least squared method to fit model (2.1) by only penalizing &#945; 0 . That is, the objective function subjected to minimization is</p><p>where &#945; 0j is the jth element in &#945; 0 , and p &#955; (&#8226;) is a penalty function with tuning parameter &#955;. Throughout this article, we will use the SCAD penalty, whose first-order derivative is</p><p>and set a = 3.7 as suggested by <ref type="bibr">Fan and Li (2001)</ref>. The tuning parameter &#955; is selected by HBIC <ref type="bibr">(Wang, Kim, and Li 2013)</ref>.</p><p>A numerical algorithm to solve this penalized least squares problem is given in Section S.4 in the supplementary material of this article. Denote the corresponding estimates to be &#945; 0 , &#945; 1 and &#945; 2 . Further note that &#946; = &#947; x -&#945; 1 . Then we can estimate the indirect effect &#946; by &#946; = &#947; x -&#945; 1 . Let A = {j : &#945; 0j = 0} and A = {j : &#945; 0j = 0}. Under suitable conditions, we could obtain oracle property for our proposed estimates. That is, with probability tending to 1, A = A. Note that all truly active mediators are included in A. This then implies that we can identify all truly potential mediators. We will empirically evaluate the performance of the partial penalized least squared method in (2.5) in the simulation section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Test of Direct Effect and Indirect Effect</head><p>In terms of further statistical inference, penalizing &#945; 0 gains efficiency when estimating the coefficients, and hence, enhances power toward the subsequential tests. Meanwhile, not penalizing &#945; 1 and &#945; 2 renders their respective estimators unbiasedness, thus, there is no need for conducting any of the debiased, desparsified or decorrelated procedures <ref type="bibr">(Zhang and Zhang 2014;</ref><ref type="bibr">Van de Geer et al. 2014;</ref><ref type="bibr">Ning and Liu 2017;</ref><ref type="bibr">Zhou, Wang, and Zhao 2020)</ref>, which admittedly correct estimation biases brought by ordinary regularization methods yet sacrifice efficiency.</p><p>To develop tests for the direct effect &#945; 1 and indirect effect &#946;, we first derive the asymptotic distributions of their estimators &#945; 1 and</p><p>2 ) T and &#946; w = w &#945; 0 . Thus, models (2.1) and (2.2) can be rewritten as</p><p>which coincide with the causal mediation models considered in <ref type="bibr">Guo et al. (2021)</ref>. Thus, incorporating the results in Corollary 1 of <ref type="bibr">Guo et al. (2021)</ref>, the asymptotic distribution of &#945; 1 and &#946; can be obtained in a similar fashion. Specifically, define m A to be the subvector of m corresponding to A = {j : &#945; 0j = 0}. And</p><p>where</p><p>The asymptotic covariance matrices in (2.7) and (2.8) could be estimated in the same routine as <ref type="bibr">Guo et al. (2021)</ref>, by replacing quantities related to x in their work with those related to w in this article. These estimates lay the foundation of subsequential tests.</p><p>For testing the direct effect &#945; 1 with hypotheses</p><p>we modify the F-type lack-of-fit test proposed by <ref type="bibr">Guo et al. (2021)</ref> by incorporating confounding effects. In model (2.1), denote RSS f to be the residual sum of squares (RSS) in the full model fitted by the partial penalized least squares method (2.5), and RSS r to be the RSS in the reduced model with x deleted from</p><p>(2.1), obtained by the same partial penalized regression yet with objective function</p><p>(2.9)</p><p>The test statistic thereby is defined as</p><p>which follows &#967; 2 (q), the chi-squared distribution with degrees of freedom q, under the null hypothesis. And it possesses local power for local alternatives which converge to the null at the rate of n -1/2 . For testing the indirect effect &#946; with hypotheses</p><p>we construct the Wald test statistic with the estimated covariance matrices, namely,</p><p>where</p><p>WW . The hat versions are the sample counterparts of the covariance matrices. The limiting null distribution of S is &#967; 2 (q), and the statistic can also detect the local effects with root-n convergence rate, as discussed in <ref type="bibr">Guo et al. (2021)</ref>. In addition, the Wald test for H 0&#946; is based on the asymptotical normality of &#946;. One may construct a Wald test for individual mediation effect &#946; j or a subvector of &#946; based on their marginal asymptotical normality.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">A Case Study: Exploration of Mediating Effects of DNA Methylation between Childhood Trauma and Cortisol Stress Reactivity</head><p>This section is devoted to an empirical analysis of the same dataset as that in <ref type="bibr">Houtepen et al. (2016)</ref>   <ref type="formula">2016</ref>) first ran 385,882 linear regression models-response being cortisol stress reactivity, predictors being each out of the 385,882 DNA methylation loci, respectively, and confounders being several clinical variables. They reported 22,425 loci with p-values less than 0.05, while no statistically significant loci at level 0.05 after adjustment for multiple testing. The authors then selected three loci that stood out in the p-value distribution of the genomewide cortisol stress reactivity analysis. The three loci are cg27512205 (denoted by m 1 ), cg05608730 (m 2 ) and cg26179948 (m 3 ), based on which the authors further conducted a mediation analysis, and identified a locus on the KITLG gene (cg27512205) that is not only associated to cortisol stress reactivity, but also partly mediates the relationship between childhood trauma and cortisol stress reactivity. More importantly, they replicated the analysis using two independent samples from the whole blood and buccal (cross-tissue) DNA, respectively, and concluded that the KITLG locus is indeed a mediator.</p><p>More recently, van Kesteren and Oberski (2019) proposed a coordinate-wise mediation filter (CMF), which aims to improve the marginal screening method for linear mediation models with high-dimensional mediators. They further applied CMF for an empirical analysis of the same dataset as <ref type="bibr">Houtepen et al. (2016)</ref>, and identified five loci as the mediators. The five loci are cg16657538 (m 4 ), cg25626453 (m 5 ), cg02309301 (m 6 ), cg13136721(m 7 ), and cg12500973(m 8 ), which are completely different from the three loci identified by <ref type="bibr">Houtepen et al. (2016)</ref>. This contradiction motivates us to conduct a further analysis using the new procedure for studying the mediating role of DNA methylation that relates childhood trauma and cortisol alteration.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Mediation Analysis via the Proposed Procedures</head><p>In our analysis, the exposure variable (x) is a one-dimensional score on a childhood trauma questionnaire, and the outcome y is the increased area under the curve (iAUC) in cortisol after a stress test. We consider 385,882 DNA methylation loci in the blood as potential mediators in m. Following van Kesteren and Oberski (2019), we first carry out a screening step to retain the top 1000 potential mediators by ranking the absolute value of the product of two correlations-the correlation between x and each element of m, and between y and each element of m. This indeed is a marginal screening procedure based on Pearson correlation proposed by <ref type="bibr">Fan and Lv (2008)</ref>. They showed that for linear models, under some regularity conditions, the screening procedure possesses a sure screening property. We also include the eight loci identified by <ref type="bibr">Houtepen et al. (2016)</ref> and van Kesteren and Oberski (2019) as domain knowledge and for comparison purpose. Furthermore, eight clinical variables are involved, including age (Z 1 ), sex (Z 2 ), B cell proportion (Z 3 ), CD4 T cell proportion (Z 4 ), CD8 T cell proportion (Z 5 ), Monocytes cell proportion (Z 6 ), Granulocytes cell proportion (Z 7 ) and Natural Killer cell proportion (Z 8 ), as confounding variables. This leads to the linear mediation models (2.1) and (2.2), where x (with dimension q = 1) and y are defined above; the confounder vector z is z = (Z 0 , Z 1 , . . . , Z 8 ) T , with Z 0 &#8801; 1 to include an intercept in the model.</p><p>We apply the proposed procedure to analyze the data. In the partial penalized least squares approach, we first select the tuning parameter &#955; by HBIC, and &#955; = 60.8163. The eight loci m 1 , . . . , m 8 are treated as domain knowledge and are not penalized. Aside from them, our proposed method selects three additional loci cg19230917(m 9 ), cg06422529(m 10 ), and cg03199124(m 11 ). The estimated coefficients &#945; 0 , &#945; 1 , and &#945; 2 ,  <ref type="table"/>and<ref type="table">2</ref>, the newly identified loci m 9 , m 10 , and m 11 should be considered as mediators since their coefficients are significant, and the coefficients of exposure variable on these loci are also significant at level 0.05. Table <ref type="table">3</ref> presents the results for testing the direct and indirect effects. The indirect effect is significant with p-value 0.0016, while the direct effect is insignificant since the pvalue is 0.7643. Further note that the estimate of the indirect effect equals -17.3726 &lt; 0. This implies that childhood trauma influences the cortisol stress reactivity only through the mediation mechanism of the DNA methylation, and the indirect effect is significantly negative.</p><p>Table <ref type="table">4</ref> lists the 11 DNA methylation loci together with the genes to which they belong. It also provides some field knowledge of these genes dug out from existing research, according to which, the newly identified genes m 9 , m 10 , and m 11 have particularly interesting biological and genetical interpretations. The locus m 9 corresponds to the RAB5IF gene (cg19230917). This gene modulates cell endocytosis process by which cells engulf substances, such as hormones, from outside into the cell <ref type="bibr">(Ravikumar et al. 2008)</ref>. Cortisol is a steroid hormone produced by the adrenal glands, and it may signal the cells through receptor for endocytosis. Thus, the RAB5IF gene likely plays a mediator rule that transmits the epigenetic alterations evoked by the traumatic stress. m 10 belongs to the CPQ gene (cg06422529), which is shown by <ref type="bibr">Hauptmann et al. (2017)</ref> to function in thyroid and tumor development. <ref type="bibr">Peter (2011)</ref> testified this gene as a pathway from stress to cortisol level change  <ref type="bibr">(Aguado and Campbell 1998)</ref>. Some existing literature <ref type="bibr">(Gonzalez-Bono et al. 2002;</ref><ref type="bibr">Kuo et al. 2007;</ref><ref type="bibr">Aschbacher et al. 2013</ref>) investigated the associations between physical stress like trauma and fat tissue biosynthesis. <ref type="bibr">Vicennati et al. (2009)</ref> conducted a retrospective study and showed that women weight gain caused by trauma stress is accompanied by abnormal hormonal level such as cortisol. Our study which finds gene AGPAT1 as a mediator relating trauma stress and cortisol level therefore, may provide clues for such stress pathophysiological mechanism research. In summary, there is a reasonable conjecture that the identified loci, or their located genes, regulate neurobiological pathways and mediate the cortisol stress reactivity in response to childhood trauma, as also indicated by Table <ref type="table">3</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Some Comparisons</head><p>It is worth to compare our results with those in Houtepen et al.  with (2.1) and (2.2) where m is taken to be m (1) , and models in van Kesteren and Oberski (2019) correspond to those with m (2) . We further consider the mediation models with m (3) , which merges m (1) and m (2) . The estimated regression coefficients &#945; j 's in model (2.1) are listed in Table <ref type="table">5</ref>. The estimated 1 and 2 and their values coincide with those in Table <ref type="table">2</ref> because regressing the multiple responses m over the exposure variable and confounding variables in linear model (2.2) coincides with regressing individual mediator m j over the exposure variable and confounding variables. Tables <ref type="table">1</ref> and<ref type="table">5</ref> both suggest that the direct effect of childhood trauma, or say the coefficient of the exposure variable x, is not significant in model (2.1). All confounding variables except for Z 2 (i.e., sex) are not significant. Mediators m 5 and m 6 are not significant based on all belonging models under investigation. Furthermore, comparing Tables <ref type="table">1</ref> and<ref type="table">5</ref>, we observe that the effect of mediator m 1 may change from significance to insignificance at level 0.05, after inclusion of other mediators into the model. The reversal of test results for mediator m 1 , as well as the insignificance of m 5 and m 6 , motivates us to explore the relationship among all the identified mediators. Their pairwise correlations, partial correlations given x and z, and several multiple regression models all reflect certain degree of association among mediators, which further explain their insignificance given other mediators included in the model. We put the detailed discussion in the supplementary material to save space.</p><p>The estimated direct and indirect effects for these three models are presented in Table <ref type="table">6</ref>, together with corresponding significance tests. This table indicates that the direct effect is not significant and indirect effect is significant for models with mediators m (1) and m (3) , while both direct and indirect effects are not significant for model with mediator m (2) .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Simulation Studies</head><p>We in this section conduct Monte Carlo simulation studies to investigate the finite sample performances of the statistical procedure described in Section 2, and compare it with the oracle tests that know the true mediator set A, with statistics S O and T O , and those in <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>, with statistics denoted by S Z and T Z . The results are based on 500 replications and significance level 0.05.</p><p>We set up the experiments to mimic the real data analyzed in Section 3 to the utmost. The sample size is taken to be the same, the dimension of potential mediators is 1000, corresponding to the 1000 candidate DNA methylation loci, and the exposure variable x and confounder z are directly extracted from the dataset. Meanwhile, m is generated via model (2.2), since it needs to be considered as random according to the mechanism of mediation models. Then y is accordingly generated from model (2.1). To accomplish this, we first draw Gaussian noise &#949; 1 &#8764; N(0, &#963; 2 1 ) in model (2.1), where &#963; 2 1 is the estimated &#963; 2 1 in Section 3. The multivariate noise in model (2.2) is generated from &#949; 2 &#8764; N(0, * ), where * is taken to be autoregressive covariance matrix. That is, the (i, j)-element of * equals &#961; |i-j| , and &#961; = 0.5. The true mediators in A are designed in accordance with the 11 detected loci, m 1 , . . . , m 11 , from the real data. Their associated coefficients &#945; 0 in model (2.1) is taken to be (1.0, 0.9, 0.8, -0.9, -0.8, -0.7, 0.6, 0.5, 0.4, 0.3, 0.2), with signs of elements consistent with those in &#945; 0 estimated in Section 3. Moreover, the direct effect &#945; 1 = c 2 , where c 2 = 0, 0.1, . . . , 1.0, to capture the size and power curve of the test for direct effect. For generating the indirect effect, the true value of</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Simulation Studies Without Confounding Variables</head><p>We first consider models without confounding variable z. That is, &#945; 2 and 2 are both taken zero in model (2.1) and (2.2). We evaluate the indirect effect tests, by fixing the direct effect &#945; 1 = c 2 = 0.5. The left panel of Figure <ref type="figure">1</ref> depicts the size (c 1 = 0) and power (c 1 = &#177;0.1, . . . , &#177;1.0) for the three tests with statistics S, Figure <ref type="figure">1</ref>. Left panel is the empirical sizes and powers of tests S, S O , and S Z at significance level 0.05 over 500 replications for testing indirect effect of mediation model without confounding variables. Solid line, dash line and two-dash line represent the sizes and powers of S, S O , and S Z , respectively. Right panel is empirical sizes and powers of tests T, T O , and T Z at significance level 0.05 over 500 replications for testing direct effect of mediation model without confounding variables. The solid, dash line and two-dash line represent the sizes and powers of T, T O , and T Z , respectively. Table <ref type="table">7</ref>. Estimated biases and standard deviations (in parentheses) of different methods with different c 1 and c 2 when there is absent of confounding variables.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>New method</head><p>Oracle Zhou et al. 's method S O and S Z . From this figure, powers of all three tests increase as |c 1 | increases, and sizes are well controlled. Our proposed test S performs as well as the oracle test S O , and is more powerful than S Z . For instance, when c 1 = 0.4, the empirical powers of S and S O are 0.63, while that of S Z is 0.23. We also consider testing direct effect &#945; 1 by holding c 1 = 0.5, corresponding to true value of indirect effect &#946; = -0.7989. Similarly, the right panel of Figure <ref type="figure">1</ref> shows the empirical size (c 2 = 0) and power (c 2 = &#177;0.1, . . . , &#177;1.0) for the proposed test T, the oracle one T O , and T Z proposed by <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>. The powers of all three tests increase as the value of |c 2 | increases. T performs closely with T O , and is more powerful than T Z when c 2 is positive. For instance, when c 2 = 0.2, the empirical power of T and T O can reach 0.78, while the empirical power of T Z test is 0.06.</p><p>Moreover, we investigate the performances of the estimators of direct effect &#945; 1 and indirect effect &#946; in terms of bias and standard deviation. The results are reported in Table <ref type="table">7</ref>. From this table, the biases of our proposed estimators &#945; 1 , &#946; and oracle ones &#945; O 1 , &#946; O are very small, while the biases of &#945; Z 1 are very large. This in turn results in low power of S Z and T Z .</p><p>Table <ref type="table">8</ref> depicts the sample standard deviations of the estimates &#945; 1 and &#946; over 500 replications in the column with label "std, " which can be regarded the true value of standard error of the estimates. These sample standard deviations are also shown in parentheses of Table <ref type="table">7</ref>. In the column with label "se(std)" in Table <ref type="table">8</ref>, we report the sample average and sample standard deviation of the 500 estimates of standard errors of &#945; 1 and &#946; based on the asymptotic covariance matrix formulas in (2.7) and (2.8). Note that the R package "freebird" in <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref> does not provide the estimated standard error of &#945; 1 . The difference between the column "std" and "se(std)" can be used to gauge the performance of the standard error formula based on the asymptotical covariance matrices. From Table <ref type="table">8</ref>, both the new method and the oracle have smaller difference than the method proposed by <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Simulation Studies with Confounding Variables</head><p>We next examine the performances of the proposed methods for the models with confounding variables. In our simulation, we set the associated coefficients &#945; 2 and 2 to be those estimated from the real data.</p><p>Figure <ref type="figure">2</ref> shows the empirical sizes and powers of the tests S, S O , and S Z for indirect effect, and the tests T, T O , and T Z for direct effect. The left panel assesses the performance of tests for  the indirect effect, holding c 2 = 0.5 as constant. From Figure <ref type="figure">2</ref>, S performs as well as S O , while S Z exhibits a shifting power curve to the right and the minimum of the curve is not attained at the null hypothesis (c 1 = 0). For testing the direct effect &#945; 1 , we hold c 1 = 0.5 and hence, &#946; = -0.7989. The values of c 2 vary from -1 to 1. The results are shown in the right panel of Figure <ref type="figure">2</ref>. Not surprisingly, T performs as well as T O , while T Z suffers from an even more severe shifting power curve than S Z for the indirect effects. For instance, when c 2 = 0.4, the empirical power of Zhou, Wang, and Zhao ( <ref type="formula">2020</ref>) is only 0.024, while the empirical powers of our test and the oracle are 0.986 and 0.992, respectively.</p><p>To understand in depth the abnormal behavior of the power curves of <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>'s tests, we investigate the performance of estimated direct effect &#945; 1 and indirect effect &#946; in terms of bias and standard deviation, as reported in Table <ref type="table">9</ref>. The biases of <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>'s estimates are fairly large compared to the proposed estimators &#945; 1 , &#946; and oracle ones &#945; O 1 , &#946; O . When holding c 2 = 0.5, the bias of &#945; Z 1 increases dramatically as c 1 increases. Similar phenomenon occurs when holding c 1 = 0.5, where the bias of &#946; Z changes substantially. The bias of estimated &#945; Z 1 and &#946; Z results in the shifted curves shown in Figure <ref type="figure">2</ref>. The large bias of <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>'s estimates and the low power of their tests are possibly due to the penalization on direct effect in the scaled lasso, as also pointed out in <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref> (see the discussion in their sec. 7). The penalization on direct effects would make sense when the direct effects are expected to small. This is another main merit of applying partial penalized method toward this problem setting. We explore Zhou, Wang, and Zhao (2020)'s method more to understand its inferior performance. Note that the proposed method does not penalize the direct effect &#945; 1 , while <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>'s method does penalize the direct effect in fitting scaled lasso <ref type="bibr">(Sun and Zhang 2012)</ref>. This leads to a larger estimated error variance &#963; 2 1 than the proposed method and the oracle estimator. Figure <ref type="figure">3</ref> compares the estimated &#963; 1 of the proposed estimate, oracle estimate and <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref> when confounding variables are involved in the mediation model. From Figure <ref type="figure">3</ref>, we can observe that when c 1 or c 2 changes from negative to positive, Zhou, Wang, and Zhao (2020)'s estimated &#963; 1 dramatically increases, while the proposed method and oracle estimate do not. The increasing trend of variance estimation results in large biases of initial estimates used in Zhou, Wang, and Zhao (2020), making the debiased step   more challenging. In addition, as c 1 or c 2 increases, estimating through D -UU &#8804; &#964; , where UU = uu T , u = (m T , w T ) T , and D and are defined following <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>, requires larger tuning parameter &#964; , corresponding to more penalization on parameters and hence, further biases as well. Moreover, the power loss in the debiased step is attributed in part to multicollinearity, which also increases with c 1 and c 2 , and when more confounders are involved.</p><p>As in Table <ref type="table">8</ref>, we report the empirical standard deviation of the 500 estimates and the average of 500 estimated standard errors over the 500 replications in Table <ref type="table">10</ref> to examine the accuracy of variance estimation. For the new method and oracle, the standard errors estimated by Monte Carlo simulations are close to those calculated from formulas; while the empirical standard deviation and the average standard error of <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref> have a large gap.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>Early life trauma plays a critical role in developing psychiatric disorders, typically via altering certain neuroendocrine substances like cortisol. Various researches thus, have been conducted to understand the mechanism relating cortisol change to different circumstances of early life trauma. Along with such prolific research output, scientists gradually realized the bridging role of DNA methylation toward the relation between childhood trauma and cortisol stress reactivity. On genome-wide level, <ref type="bibr">Houtepen et al. (2016)</ref> conducted a study to investigate how DNA methylation affects cortisol stress reactivity and its relationship with childhood trauma. They identified three top-rated DNA methylation loci by ranking the p-values in an increasing order, one of which, on the KITLG gene (cg27512205), was shown not only to associate with cortisol change, but also partly mediate the relationship between childhood trauma and cortisol stress reactivity. Another study by <ref type="bibr">van Kesteren and Oberski (2019)</ref>, however, yielded a completely different set of loci, based on the same dataset while using their proposed CMF algorithm.</p><p>Motivated by such contradictory results in <ref type="bibr">Houtepen et al. (2016)</ref> and van <ref type="bibr">Kesteren and Oberski (2019)</ref>, in which the authors did not consider the potentially active mediating loci jointly, we propose a partial penalized least squared method for linear mediation models with high-dimensional mediators in the presence of confounders. We further develop relevant tests for the direct and indirect effects in such high-dimensional linear mediation models. Simulation studies validate the capability of the proposed approach for efficiently estimating and testing the direct and indirect effects, and the numerical comparisons also imply that the proposed procedure outperforms the debiased method advocated by <ref type="bibr">Zhou, Wang, and Zhao (2020)</ref>.</p><p>We use this partial penalized least squares method and testing procedures to investigate the high dimensional mediating effects of DNA methylation loci to relate childhood trauma and cortisol stress reactivity, with confounding variables involved.</p><p>For comparison purpose, we analyze the same dataset as <ref type="bibr">Houtepen et al. (2016)</ref> and van <ref type="bibr">Kesteren and Oberski (2019)</ref>. We choose to include the eight DNA methylation loci discovered by these two papers in the model as domain knowledge. Using the proposed approach, we identified three new loci, on the RAB5IF gene (cg19230917), the CPQ gene (cg06422529) and the AGPAT1 gene (cg03199124), respectively, that actively play the mediator role. We compare our findings with <ref type="bibr">Houtepen et al. (2016)</ref> and van Kesteren and Oberski (2019) from statistical perspectives, where tests and related analyses are all in favor of the three newly identified loci. Furthermore, we estimate and test the direct and indirect effects for childhood trauma on cortisol change, and conclude that the early life trauma affects cortisol only indirectly through DNA methylation and the indirect effect is negative, while the direct effect is insignificant.</p><p>From domain knowledge in existing literature, we also provide biological and genetical interpretations for the three selected loci and their belonging genes. The RAB5IF gene takes charge of cell endocytosis process, by which cells engulf substances like cortisol, thus, reasonably serves as a mediator which transmits the hormone change brought by the traumatic stress. As to the CPQ gene, previous research has verified it as a pathway from stress to cortisol change in fish. Thus, incorporating our findings, an neurobiological exploration toward its role in human is worthwhile. The AGPAT1 gene, on the other hand, was shown to control fat tissue biosyn-thesis; while some retrospective studies demonstrated that fat biosynthesis and storage caused by trauma stress is accompanied with abnormal hormonal level such as cortisol. Therefore, our findings may offer potential clues for such pathophysiological mechanism research. In short, statistical tests and scientific interpretations both show convincing evidence that the newly identified three DNA methylation loci, or their located genes, should be considered as active mediators that relate childhood trauma and cortisol stress reactivity.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>is set to be 1 = c 1 1 , where c 1 = 0, &#177;0.1, . . . , &#177;1.0 and 1 is the respective estimate from Section 3, thus, the indirect effect &#946; = 1 &#945; 0 = c 1 1 &#945; 0 = -1.5977c 1 . As to the coefficients of confounding variables, we design the following two scenarios of models, without and with confounders, respectively.</p></note>
		</body>
		</text>
</TEI>
