<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Aggregating Dependent Signals with Heavy-Tailed Combination Tests</title></titleStmt>
			<publicationStmt>
				<publisher>Biometrika</publisher>
				<date>05/30/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10597135</idno>
					<idno type="doi">10.1093/biomet/asaf038</idno>
					<title level='j'>Biometrika</title>
<idno>0006-3444</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Lin Gui</author><author>Yuchao Jiang</author><author>Jingshu Wang</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Combining dependent p-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, p-value combination tests based on regularly varying-tailed distributions, such as the Cauchy combination test and harmonic mean p-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of p-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the p-values. First, when p-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided p-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate t-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over Bonferroni, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where p-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Combining dependent p-values to assess the global null hypothesis has long been a fundamental challenge in statistical inference. A common scenario arises when integrating the results of various methods on the same dataset to enhance signal detection power <ref type="bibr">(Wu et al., 2016;</ref><ref type="bibr">Rosenbaum, 2012)</ref>. When individual p-values have arbitrary dependence, the Bonferroni test is the most common approach with a theoretical guarantee. However, it is often criticized for being overly conservative in practical applications.</p><p>Specifically, consider n individual p-values P 1 , . . . , P n . To test the global null hypothesis, i.e., that all n null hypotheses are true, the Bonferroni test calculates the combined p-value as n &#215; min (P 1 , . . . , P n ). Due to the scaling factor n, the Bonferroni combined p-value may exceed any of the individual p-values, leading to a loss of power during the combination process.</p><p>A more recent approach that has gained traction combines p-values through transformations based on heavy-tailed distributions <ref type="bibr">(Liu et al., 2019;</ref><ref type="bibr">Wilson, 2019a)</ref>. Let X i be defined as Q F (1 &#8722; P i ), where F(&#8226;) represents the cumulative distribution function of a heavy-tailed distribution and Q F is its quantile function. The core idea is to compute the combined p-value based on the tail distribution of $S_n = \sum_{i=1}^{n} X_i$, which under the global null is robust to dependence among the heavy-tailed variables X 1 , . . . , X n . The Cauchy combination test, which sets F as the standard Cauchy distribution, was first introduced in Liu et al. 
(<ref type="formula">2019</ref>) for genome-wide association studies (GWAS) and has since been applied in genetic and genomic research, including spatial transcriptomics <ref type="bibr">(Sun et al., 2020)</ref>, ChIP-seq data <ref type="bibr">(Qin et al., 2020)</ref>, and single-cell genomics <ref type="bibr">(Cai et al., 2022)</ref>. Another popular method, the harmonic mean p-value <ref type="bibr">(Wilson, 2019a)</ref>, employs the Pareto distribution with shape parameter &#947; = 1 as F.</p><p>Despite the growing popularity of these heavy-tailed combination tests in practical applications, there has been limited theoretical investigation and empirical evaluation of these methods. Existing studies <ref type="bibr">(Liu &amp; Xie, 2020;</ref><ref type="bibr">Fang et al., 2023)</ref> have established the asymptotic validity of these tests as the significance level &#945; &#8594; 0 for pairwise bivariate normal test statistics. These results are closely related to earlier findings on sums of variables with regularly varying tails, showing that pr (S n &gt; x) and n {1 &#8722; F(x)} are asymptotically equivalent as x &#8594; +&#8734;, provided that the variables X 1 , . . . , X n are pairwise quasi-asymptotically independent <ref type="bibr">(Chen &amp; Yuen, 2009)</ref>. Intuitively, for heavy-tailed X 1 , . . . , X n , their maximum typically dominates the sum, making the latter less sensitive to dependence among X 1 , . . . , X n . Yet this same intuition raises doubts about the true benefits of these tests compared to the Bonferroni test. Additionally, the assumption of quasi-asymptotic independence, while covering any bivariate normal variables that are not perfectly correlated, remains more stringent than allowing arbitrary dependence. For example, bivariate t-distributed variables, which are frequently used as test statistics, are not quasi-asymptotically independent. 
This raises questions about the robustness of these tests when faced with unknown dependence structures.</p><p>This paper addresses these concerns through theoretical and empirical analyses. Many applications employ heavy-tailed combination tests to aggregate results from different methods or studies, often in settings where the number of base hypotheses, n, is moderate rather than excessively large. Accordingly, we focus on scenarios where n is fixed and analyze the asymptotic regime as the significance level &#945; &#8594; 0. Our theoretical investigation shows that when test statistics are quasi-asymptotically independent, particularly when they follow a bivariate normal distribution with imperfect correlation, the rejection regions of heavy-tailed combination tests are asymptotically equivalent to those of the Bonferroni test as &#945; approaches zero. This suggests that in the same asymptotic regime where combination tests have been proven valid, they offer no real power advantage over Bonferroni's approach. However, when the assumption of asymptotic independence is violated, such as when test statistics follow a multivariate t distribution, our empirical results indicate that combination tests still appear to be asymptotically valid when the tail index &#947; &#8804; 1, despite the lack of a theoretical guarantee. More strikingly, they exhibit significantly greater power than the Bonferroni test, highlighting their potential advantages in settings where p-values are strongly dependent, a scenario that often arises when aggregating results from different methods applied to the same dataset. Furthermore, through simulations and real-world case studies, we observe that the empirical validity and power of these tests are affected by both the tail heaviness and the support of the heavy-tailed distribution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Model setup and theoretical results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Model setup</head><p>Consider n test statistics T 1 , . . . , T n , where each T i is for a base null hypothesis H 0,i . For each base hypothesis, we construct a one-sided or two-sided base p-value P i based on the distribution of T i under H 0,i . We are interested in testing the global null hypothesis that all n base null hypotheses H 0,i are true.</p><p>The test statistics T 1 , . . . , T n may exhibit unknown dependence structures among each other.</p><p>For the heavy-tailed combination tests, we apply a transformation of the p-values into quantiles of heavy-tailed distributions. Specifically, let F denote the cumulative distribution function (CDF) of the heavy-tailed distribution and Q F represent its quantile function, defined as $Q_F(t) = \inf\{x \in \mathbb{R} : F(x) \ge t\}$.</p><p>We define the individual transformed test statistics as $X_i = Q_F(1 - P_i)$, $i = 1, \dots, n$.</p><p>A combination test can then be constructed based on the sum $S_n = \sum_{i=1}^{n} X_i$,</p><p>or more generally, any weighted sum $S_{n,\vec{\omega}} = \sum_{i=1}^{n} \omega_i X_i$ with non-random positive weights $\omega_i$.</p></div>
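<div xmlns="http://www.tei-c.org/ns/1.0"><p>The transformation of p-values into heavy-tailed quantiles can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard Cauchy distribution and the Pareto distribution with shape parameter 1 as choices of F (both named in Section 1); the function names are hypothetical and not taken from the paper.</p>
<p><code lang="python"><![CDATA[
```python
import math

def cauchy_transform(p):
    """Q_F(1 - p) for the standard Cauchy distribution."""
    return math.tan(math.pi * (0.5 - p))

def pareto1_transform(p):
    """Q_F(1 - p) for the Pareto distribution with shape 1 on [1, inf)."""
    return 1.0 / p

def transformed_sum(p_values, transform, weights=None):
    """Weighted sum of transformed p-values; unit weights give S_n."""
    if weights is None:
        weights = [1.0] * len(p_values)
    return sum(w * transform(p) for w, p in zip(weights, p_values))

# Illustrative p-values (made up for this example).
p_values = [0.01, 0.20, 0.50, 0.80]
s_cauchy = transformed_sum(p_values, cauchy_transform)
s_pareto = transformed_sum(p_values, pareto1_transform)
```
]]></code></p>
<p>In this example the smallest p-value contributes by far the largest term to either sum, while p-values above 0.5 contribute negative terms only under the Cauchy transformation, whose support is the whole real line.</p></div>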
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Tail properties of the sum S n</head><p>We begin by reviewing existing theoretical results on the tail properties of S n . If X 1 , . . . , X n belong to the sub-exponential family, a major class of heavy-tailed distributions, it is well known that the tail probability of S n = X 1 + &#8226; &#8226; &#8226; + X n is asymptotically equivalent to the sum of individual tail probabilities under the assumption that the X i are mutually independent. That is, $\mathrm{pr}(S_n &gt; x) \sim n\bar{F}(x)$ as $x \to +\infty$, (1)</p><p>where $\bar{F} = 1 - F$ denotes the tail probability <ref type="bibr">(Embrechts et al., 2013)</ref>. When the independence assumption fails, previous works <ref type="bibr">(Chen &amp; Yuen, 2009;</ref><ref type="bibr">Asmussen et al., 2011;</ref><ref type="bibr">Albrecher et al., 2006;</ref><ref type="bibr">Kortschak &amp; Albrecher, 2009;</ref><ref type="bibr">Geluk &amp; Ng, 2006;</ref><ref type="bibr">Tang, 2008)</ref> have shown that (1) still holds for different subclasses of sub-exponential distributions under certain assumptions on the dependence structure.</p><p>Here, we restate several key results that form the foundation of the theoretical properties of the heavy-tailed combination tests, which will be detailed in Section 2.3. For any variable X, we denote $X^{+} = \max(X, 0)$ and $X^{-} = \max(-X, 0)$. To begin, we introduce the concepts of quasi-asymptotic independence and the consistently-varying subclass C of sub-exponential distributions, following <ref type="bibr">Chen &amp; Yuen (2009)</ref>.</p></div>
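<div xmlns="http://www.tei-c.org/ns/1.0"><p>The asymptotic equivalence between the tail of the sum and n times the individual tail can be checked numerically. The following Monte Carlo sketch is illustrative only: it assumes independent Pareto variables with shape 1, a fixed threshold x = 200, and a fixed seed, none of which come from the paper. It reports the tail probabilities of the sum and of the maximum, each divided by n times the Pareto survival function at x.</p>
<p><code lang="python"><![CDATA[
```python
import random

def tail_ratios(n=5, x=200.0, reps=100_000, seed=1):
    """Estimate pr(S_n > x) and pr(max > x), each divided by n * survival(x)."""
    rng = random.Random(seed)
    hits_sum = 0
    hits_max = 0
    for _ in range(reps):
        # Inverse-CDF draws from Pareto(shape 1) on [1, inf).
        xs = [1.0 / (1.0 - rng.random()) for _ in range(n)]
        hits_sum += sum(xs) > x
        hits_max += max(xs) > x
    approx = n / x  # n times the Pareto(1) survival function at x
    return hits_sum / reps / approx, hits_max / reps / approx

ratio_sum, ratio_max = tail_ratios()
```
]]></code></p>
<p>Both ratios should approach 1 as x grows. At a moderate threshold the ratio for the sum typically sits somewhat above 1, which reflects the finite-sample gap that the asymptotic statement ignores.</p></div>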
<div xmlns="http://www.tei-c.org/ns/1.0"><head>L. Gui et al.</head><p>Definition 1 (Quasi-asymptotic independence). Two non-negative random variables X 1 and X 2 with cumulative distribution functions F 1 and F 2 are quasi-asymptotically independent if</p><p>More generally, two real-valued random variables, X 1 and X 2 , are quasi-asymptotically independent if</p><p>When X 1 and X 2 have the same marginal distribution, (2) can be rewritten as pr(X</p><p>Theorem 3.1 in <ref type="bibr">Chen &amp; Yuen (2009)</ref> established the asymptotic tail probability of S n for distributions within C , provided that quasi-asymptotic independence holds.</p><p>Theorem 1 (Theorem 3.1 of <ref type="bibr">Chen &amp; Yuen (2009)</ref>). Let X 1 , . . . , X n be n pairwise quasi-asymptotically independent real-valued random variables with distributions F 1 , . . . ,</p><p>The asymptotic equivalence (3) can hold for broader subclasses of heavy-tailed distributions beyond C under stronger dependence assumptions. For instance, Geluk &amp; <ref type="bibr">Tang (2009)</ref> provided the dependence structure requirements needed for this equivalence to hold for dominated-varying tailed and long-tailed random variables. Additionally, <ref type="bibr">Asmussen et al. (2011)</ref> verified this for log-normal distributions when coupled with a Gaussian copula. However, <ref type="bibr">Botev &amp; L'Ecuyer (2017)</ref> showed that convergence in (3) can be extremely slow for log-normal distributions, requiring the tail probability to be as small as $10^{-233}$ to achieve reasonable approximations.</p><p>Moreover, researchers have observed asymptotic equivalence between the tail probability of S n and that of max(X 1 , . . . , X n ).</p><p>Corollary 1. With the same setting as in Theorem 1, the tail probabilities of the sum and the maximum satisfy the following relationship</p><p>Remark 1. 
We provide a proof of Corollary 1 in Supplementary Section S3.2, which essentially restates earlier results <ref type="bibr">(Geluk &amp; Ng, 2006;</ref><ref type="bibr">Tang, 2008;</ref><ref type="bibr">Ko &amp; Tang, 2008)</ref>, to facilitate understanding for interested readers.</p><p>Table <ref type="table">1</ref> presents a list of common distributions in C . All of these distributions also belong to a smaller subclass, the regularly varying tailed distributions R, defined as follows:</p><p>Table <ref type="table">1</ref>: Regularly varying tailed distributions and their tail indices. &#934; is the cumulative distribution function of a standard normal distribution. &#915; is the gamma function. $J(s, x) = \int_x^{\infty} t^{s-1} e^{-t}\,dt$ is the incomplete gamma function and</p><p>dt is the regularized incomplete eta function, Ft (c) is the survival function at c of the corresponding t distribution with the same degrees of freedom &#947;. Distributions: Survival Function Tail index Support Cauchy:</p><p>Following <ref type="bibr">Cline (1983)</ref>, the parameter &#947; is referred to as the tail index, characterizing the tail heaviness <ref type="bibr">(Teugels et al., 1987)</ref> of a distribution. Distributions with a smaller &#947; exhibit heavier tails. For example, for the Student's t distribution, &#947; equals the degrees of freedom, with the Cauchy distribution being the special case &#947; = 1.</p><p>In Table <ref type="table">1</ref>, all distributions except the Student's t distributions, which include the Cauchy distribution, have a lower bound on their support. In contrast, the Student's t distributions have densities symmetric around the origin, and their supports cover the entire real line. As a consequence, when p i approaches 1, the transformed test statistics X i can become substantially negative, which may affect both the power and type-I error control of the associated combination tests. 
To address this issue, we introduce the left-truncated Student's t distribution in Table <ref type="table">1</ref>, defined as a conditional Student's t distribution with a left-bounded support interval of [c, +&#8734;). Specifically, we define F t,&#947; (x) = pr(X &#8804; x) with X following a Student's t distribution with &#947; degrees of freedom. The cumulative distribution function of the left-truncated t distribution is $F(x) = \{F_{t,\gamma}(x) - F_{t,\gamma}(c)\} / \{1 - F_{t,\gamma}(c)\}$ for $x \ge c$, and 0 otherwise.</p><p>With this definition, the left-truncated t distribution remains a regularly varying tailed distribution with the same tail index &#947;, as proved in Proposition S1. In our experiments, we vary the truncation level c by setting c as the (1 &#8722; p 0 ) quantile of the t distribution with the same tail index &#947;, and we refer to p 0 as the truncation threshold. This approach allows us to explore the effects of different levels of truncation on the performance of combination tests in practice.</p></div>
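<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the left truncation concrete, the sketch below implements the conditional distribution for the special case of tail index 1, where the Student's t distribution is the standard Cauchy distribution and closed forms are available. Restricting to this case, and the function names used, are assumptions made for illustration. With c set to the (1 - p0) quantile, conditioning on X being at least c implies that the transformed statistic at p-value p equals the untruncated quantile at level 1 - p0 * p.</p>
<p><code lang="python"><![CDATA[
```python
import math

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

def cauchy_quantile(u):
    return math.tan(math.pi * (u - 0.5))

def truncated_cauchy_cdf(x, p0):
    """CDF of a standard Cauchy variable conditioned on X >= c,
    where c is the (1 - p0) quantile of the untruncated law."""
    c = cauchy_quantile(1.0 - p0)
    if x < c:
        return 0.0
    return (cauchy_cdf(x) - cauchy_cdf(c)) / (1.0 - cauchy_cdf(c))

def truncated_cauchy_transform(p, p0):
    """Quantile of the truncated law at level 1 - p,
    which equals the untruncated quantile at level 1 - p0 * p."""
    return cauchy_quantile(1.0 - p0 * p)

# With truncation threshold p0 = 0.9, a p-value of 1 maps to the lower bound c.
lower_bound = truncated_cauchy_transform(1.0, 0.9)
```
]]></code></p>
<p>A smaller truncation threshold p0 raises the lower bound c, so large p-values can pull the sum down by less than under the untruncated Cauchy transformation.</p></div>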
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Asymptotic validity of the heavy-tailed combination tests</head><p>The asymptotic validity of heavy-tailed transformation-based combination tests can be established based on Theorem 1. In particular, <ref type="bibr">Liu &amp; Xie (2020)</ref> demonstrated the asymptotic validity of the Cauchy combination test. Extending this work, <ref type="bibr">Fang et al. (2023)</ref> expanded these results to cover regularly varying distributions under additional constraints. However, both results are limited to two-sided p-values, which are always positively dependent. In this section, we present a unified theory for the asymptotic validity of the heavy-tailed combination tests that accommodates both one-sided and two-sided p-values.</p><p>We first define combination tests applying the sum S n , directly inspired by Theorem 1.</p><p>Definition 4 (Combination test). Let F be the cumulative distribution function of a distribution in $\mathcal{R}_{-\gamma}$. The combination test approximates the tail probability pr(S n &gt; x) by $n\bar{F}(x)$. Specifically, the combined p-value is defined as $n\bar{F}(S_n)$, and the corresponding decision function at the significance level &#945; is</p><p>In addition to the sum S n , the widely accepted Cauchy and harmonic combination tests, as introduced by <ref type="bibr">Liu et al. (2019)</ref> and <ref type="bibr">Wilson (2019a)</ref>, utilize the average M n and directly approximate the tail probability pr(M n &gt; x) using $\bar{F}(x)$. Indeed, any regularly varying tailed distribution with tail index &#947; = 1 can be used to define a similar average-based combination test:</p><p>Definition 5 (Average-based combination test). Let F be the cumulative distribution function of a distribution in $\mathcal{R}_{-1}$. The average-based combination test approximates the tail probability pr(M n &gt; x) by $\bar{F}(x)$. 
Specifically, the combined p-value is defined as $\bar{F}(M_n)$ and the corresponding decision function at the significance level &#945; is</p><p>More generally, one can define a weighted combination test, which includes both the tests defined in Definitions 4 and 5 as special cases. As noted in <ref type="bibr">Liu &amp; Xie (2020)</ref> and <ref type="bibr">Fang et al. (2023)</ref>, the weighted test can incorporate prior information on the importance of each base hypothesis to enhance power.</p><p>Definition 6 (Weighted combination test). Let F be the cumulative distribution function of a distribution in $\mathcal{R}_{-\gamma}$ and let $\vec{\omega} = (\omega_1, \dots, \omega_n) \in \mathbb{R}^n_{+}$ be a non-random weight vector associated with each hypothesis. Define the weighted sum as $S_{n,\vec{\omega}} = \sum_{i=1}^{n} \omega_i X_i$ and let $\kappa = \sum_{i=1}^{n} \omega_i^{\gamma}$, where $\omega_i^{\gamma}$ is the $\gamma$th power of $\omega_i$. Then the weighted combination test approximates the tail probability $\mathrm{pr}(S_{n,\vec{\omega}} &gt; x)$ by $\kappa\bar{F}(x)$. Specifically, the combined p-value is defined as $\kappa\bar{F}(S_{n,\vec{\omega}})$ and the corresponding decision function at the significance level &#945; is</p><p>Remark 2. The sum-based and average-based combination tests in Definitions 4 and 5 are special cases of the weighted combination test with uniform weights &#969; i = 1 or &#969; i = 1/n. Although the weighted combination test is not scale-free regarding the weights, empirical simulations suggest that the weight scaling has minimal practical impact.</p><p>The asymptotic validity of the combination tests in <ref type="bibr">Liu &amp; Xie (2020)</ref> and <ref type="bibr">Fang et al. (2023)</ref> relies on pairwise bivariate normality of the test statistics $\{T_i\}_{i=1}^{n}$, ensuring pairwise quasi-asymptotic independence as required by Theorem 1. Under the same assumption, we can establish the asymptotic validity for the combination tests defined in Definitions 4 to 6. 
Additionally, the asymptotic result is uniform in the nuisance parameters, particularly the pairwise correlations &#961; ij , if we impose mild constraints on them.</p><p>Theorem 2. Assume that the test statistics $\{T_i\}_{i=1}^{n}$ are pairwise normal with correlations &#961; ij &#8712; [-&#961; 0 , &#961; 0 ] (&#961; 0 &gt; 0) and marginally follow standard normal distributions under the global null. Then, the type-I error of the tests defined in Definitions 4 to 6 using two-sided p-values</p><p>where &#966; F comb is the test's decision function defined in (4) to (6). For the combination tests with one-sided p-values $\{P_i\}_{i=1}^{n}$, the relationship (7) still holds with an additional assumption that the cumulative distribution function F(&#8226;) satisfies $\bar{F}(x) \ge F(-x)$ for sufficiently large x.</p><p>Remark 3. Our analysis considers fixed n. Prior work <ref type="bibr">(Liu &amp; Xie, 2020;</ref><ref type="bibr">Long et al., 2023)</ref> established the asymptotic validity of the Cauchy combination test as n &#8594; &#8734;, assuming n grows at a slower rate than the decay of &#945; &#8594; 0. In addition, Vovk &amp; Wang (2020) introduced an adjusted rejection threshold for the harmonic mean p-value to ensure validity as n &#8594; &#8734;, even under arbitrary dependence among the p-values.</p><p>The asymptotic validity of combination tests hinges on proving the pairwise asymptotic independence of the transformed statistics $\{X_i\}_{i=1}^{n}$. Theorem 2 provides a stronger form of asymptotic validity than previous studies <ref type="bibr">(Liu &amp; Xie, 2020;</ref><ref type="bibr">Fang et al., 2023)</ref>, as uniform convergence is guaranteed over the set of correlation matrices. It also imposes minimal distributional requirements on F and further addresses one-sided p-values. Unlike two-sided p-values, which are always non-negatively correlated under bivariate normality as stated in Proposition S2, one-sided p-values can exhibit negative correlations. 
To establish the test's asymptotic validity, an additional constraint is required, namely that $\bar{F}(x) \ge F(-x)$ for sufficiently large x. This condition, met by all distributions in Table <ref type="table">1</ref>, ensures that the left tail is either absent or lighter than the right tail.</p><p>Theorem 2 requires that no pair (i, j) is perfectly correlated. When &#961; ij = &#177;1, though the transformed statistics X i and X j are no longer quasi-asymptotically independent, a weaker form of asymptotic validity still holds when the tail index &#947; &#8804; 1, as stated below.</p><p>Corollary 2. Under the assumptions of Theorem 2 while allowing &#961; ij = &#177;1 for any (i, j) pairs, if additionally the tail index &#947; &#8804; 1, then the combination tests defined in Definitions 4 to 6 using two-sided p-values are still asymptotically valid, satisfying</p><p>where $\Sigma = (\rho_{ij})_{n \times n}$ is the correlation matrix of the test statistics T i , and</p><p>A special case of Corollary 2 arises when &#961; ij &#8801; 1 for all pairs of test statistics.</p><p>Corollary 3. Under the conditions of Corollary 2, if &#961; ij &#8801; 1 for all (i, j) pairs, then the combination tests defined in Definitions 4 to 6 using either one-sided or two-sided p-values satisfy</p><p>In particular, when all weights are 1, the limit is $n^{\gamma-1}$.</p></div>
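<div xmlns="http://www.tei-c.org/ns/1.0"><p>The weighted combination test of Definition 6 can be sketched as follows. This is an illustrative implementation under stated assumptions: F is taken to be a Pareto distribution on [1, +&#8734;) with survival function x to the power -&#947;, the combined p-value is capped at 1, and the function names are hypothetical. None of these choices is prescribed by the definition itself.</p>
<p><code lang="python"><![CDATA[
```python
def pareto_survival(x, gamma):
    """Survival function of a Pareto law on [1, inf) with tail index gamma."""
    return 1.0 if x < 1.0 else x ** (-gamma)

def weighted_combined_p(p_values, weights, gamma=1.0):
    """Combined p-value kappa * survival(S_{n,w}); the cap at 1 is an assumption."""
    xs = [p ** (-1.0 / gamma) for p in p_values]   # X_i = Q_F(1 - P_i)
    s = sum(w * x for w, x in zip(weights, xs))    # weighted sum S_{n,w}
    kappa = sum(w ** gamma for w in weights)       # kappa = sum of w_i^gamma
    return min(1.0, kappa * pareto_survival(s, gamma))

# Illustrative p-values (made up for this example).
p_values = [0.01, 0.20, 0.50]
n = len(p_values)
p_sum = weighted_combined_p(p_values, [1.0] * n)       # sum-based, weights 1
p_avg = weighted_combined_p(p_values, [1.0 / n] * n)   # average-based, weights 1/n
```
]]></code></p>
<p>For the Pareto distribution with &#947; = 1 the two uniform weightings return the same value, the harmonic mean of the p-values. For other distributions the sum-based and average-based versions generally differ, which is the lack of scale-freeness noted in Remark 2.</p></div>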
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Asymptotic equivalence to the Bonferroni test</head><p>In this subsection, we explore the relationship between the heavy-tailed combination tests and the Bonferroni test. We begin by defining the weighted Bonferroni test, a generalization of the standard Bonferroni test that incorporates pre-chosen weights.</p><p>Definition 7 (Weighted Bonferroni test). Let P 1 , . . . , P n be the p-values and $\vec{\omega} = (\omega_1, \dots, \omega_n) \in \mathbb{R}^n_{+}$ be a non-random weight vector satisfying $\sum_{i=1}^{n} \omega_i = 1$. Then the weighted Bonferroni test at the significance level &#945; has the decision function</p><p>It has been shown that the weighted Bonferroni test controls type-I error under any dependence structure <ref type="bibr">(Genovese et al., 2006)</ref>. The standard Bonferroni test is a special case where &#969; i = 1/n.</p><p>Given that the set</p><p>, the decision function of the Bonferroni test can be rewritten as</p><p>Thus, Corollary 1 implies that the type-I error of the Bonferroni test and the standard combination tests are asymptotically the same. Given this, we investigate whether the combination tests are indeed asymptotically equivalent to the Bonferroni test. We find that the rejection regions of the weighted combination tests converge to those of a weighted Bonferroni test as &#945; &#8594; 0.</p><p>Theorem 3. Assume that the test statistics $\{T_i\}_{i=1}^{n}$ are pairwise normal with correlations &#961; ij &#8712; [-&#961; 0 , &#961; 0 ] and have a common marginal variance of 1. The means of the marginal normal distributions are all finite. Then for two-sided p-values, when &#945; &#8594; 0, any weighted heavy-tailed combination test defined in Definition 6 is asymptotically equivalent to a weighted Bonferroni test. 
Namely,</p><p>For one-sided p-values, the conclusion still holds under the further assumption that the cumulative distribution function F(&#8226;) satisfies $\bar{F}(x) \ge F(-x)$ for sufficiently large x.</p><p>Theorem 3 establishes the asymptotic equivalence between the combination tests and the Bonferroni test under any hypothesis configuration, provided that the test statistics are pairwise normal and not perfectly correlated. As the significance level &#945; approaches zero, the rejection regions of both the combination tests and the Bonferroni test shrink, and the differences between these rejection regions diminish at a higher order. This equivalence does require that the test statistics are not perfectly correlated, so that they are quasi-asymptotically independent.</p><p>To provide an intuitive understanding of Theorem 3, Fig. <ref type="figure">1</ref> compares the rejection regions of various tests in the test statistics space for two-sided p-values with n = 2. The key takeaway is that the heavy-tailed nature of the transformation distribution yields nearly square rejection regions, which closely resemble those of the Bonferroni test as &#945; decreases. In contrast, for combination tests relying on light-tailed distributions, such as Fisher's combination method, different rejection region shapes persist regardless of how small &#945; becomes. Thus, in the asymptotic regime where these heavy-tailed combination tests are proven valid and when the individual test statistics are not perfectly correlated, there is no power gain over the Bonferroni test.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Empirical evaluations of the heavy-tailed combination tests under asymptotic independence</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Empirical validity of the combination tests</head><p>The theoretical results in Section 2 provide valuable insight into the heavy-tailed combination tests. However, it is unclear to what extent these asymptotic results align with their practical performance at finite significance levels. We aim to conduct an empirical evaluation of the tests' validity, focusing on commonly used finite significance levels.</p><p>For a comprehensive study, we vary the significance level &#945;, the number of hypotheses, the tail heaviness and support of the distribution, and the level of dependence among the p-values. Specifically, we generate test statistics as z-values sampled from a multivariate normal distribution with mean $\vec{\mu} = \vec{0}_n$ and covariance matrix &#931; &#961; . The covariance matrix &#931; &#961; &#8712; R n&#215;n has 1s on the diagonal and a common value &#961; off the diagonal, representing varying degrees of dependence. We assess performance at three values of &#961;, namely 0, 0.5, and 0.99, corresponding to no, moderate, and strong dependence. We calculate two-sided p-values from the z-values and conduct the combination tests based on different heavy-tailed distributions from four distribution families: the Student's t, Fr&#233;chet, Pareto, and inverse Gamma distributions. Each family has a tunable tail index &#947; quantifying the tail heaviness, with a larger &#947; corresponding to a lighter tail. We vary &#947; from 0.7 to 1.5 in steps of 0.01 for all four distribution families. We also include the Bonferroni test and the Cauchy combination test as baselines. For significance levels, we adopt &#945; = 0.05 and $5 \times 10^{-4}$ to account for different testing scenarios. 
The standard 0.05 is commonly used for a single global null hypothesis, while $5 \times 10^{-4}$ reflects the stricter threshold needed in genetic applications, where multiple testing adjustments lower the effective significance level for individual p-values. For the number of hypotheses, we consider n = 5 and 100. Each scenario is replicated $10^{6}$ times to calculate the empirical type-I errors of the tests.</p><p>Figures <ref type="figure">2</ref> and S1 present the results for n = 5 and 100. When &#945; = 0.05 and &#947; = 1, only the Cauchy combination test can strictly control empirical type-I error under independence, and no method achieves strict control when the correlation &#961; ij = 0.5. Smaller &#945; improves error control and leads to a flatter curve across &#947;, consistent with the theoretical limit. Regarding the impact of tail heaviness on validity, differences between various distribution families diminish as &#945; decreases, making the empirical validity of the tests primarily dependent on the tail index &#947;.</p><p>Fig. <ref type="figure">2</ref>: The type-I error of the combination test when n = 5 with different distributions: Cauchy (star point), inverse Gamma (blue), Fr&#233;chet (green), Pareto (purple), Student's t (red), left-truncated t with truncation threshold p 0 = 0.9 (dark orange), left-truncated t with truncation threshold p 0 = 0.7 (orange), left-truncated t with truncation threshold p 0 = 0.5 (light orange). The vertical axis represents the empirical type-I error, and the horizontal axis represents the tail index &#947;.</p><p>Type-I error control is approximately achieved when &#947; &#8804; 1. Distribution support also plays a role in type-I error control. Tests based on t distributions, which allow negative transformed statistics, outperform those using distributions with only positive support at &#945; = 0.05. To examine this further, left-truncated t distributions with different truncation thresholds p 0 = 0.5, 0.7 and 0.9 are adopted. As shown in Figs. 
<ref type="figure">2</ref> and S1, their empirical type-I errors fall between those of the original t distributions and other distribution families. This suggests that a wider support to the left of the real line tends to reduce the type-I error of the combination tests.</p><p>Additionally, we have investigated the type-I error control of the combination tests when the base p-values are negatively correlated by generating one-sided p-values. Results are shown in Table <ref type="table">S1</ref>. We observe that when the p-values are negatively correlated, the Cauchy combination test can be even more conservative than the Bonferroni test due to its unbounded support. This undesired conservativeness can be mitigated by using a left-truncated t-distribution with a moderate truncation threshold. For more details, see Supplementary Section S1.</p></div>
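<div xmlns="http://www.tei-c.org/ns/1.0"><p>A scaled-down version of the type-I error simulation described in this subsection can be sketched as below. It is illustrative rather than a reproduction of the reported experiments: it uses 20 000 replicates instead of one million, a fixed seed, a one-factor construction of the equicorrelated normal vector (valid for non-negative &#961; only), and the convention of rejecting when the combined p-value is at most &#945;. All function names are hypothetical.</p>
<p><code lang="python"><![CDATA[
```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def cauchy_combined_p(p_values):
    """Average-based Cauchy combination p-value."""
    total = sum(math.tan(math.pi * (0.5 - p)) for p in p_values)
    return 0.5 - math.atan(total / len(p_values)) / math.pi

def bonferroni_p(p_values):
    return min(1.0, len(p_values) * min(p_values))

def type1_errors(n=5, rho=0.5, alpha=0.05, reps=20_000, seed=0):
    """Empirical type-I errors under the global null with equicorrelation rho >= 0."""
    rng = random.Random(seed)
    rej_cauchy = 0
    rej_bonf = 0
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)  # common factor inducing correlation rho
        zs = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
              for _ in range(n)]
        ps = [two_sided_p(z) for z in zs]
        rej_cauchy += cauchy_combined_p(ps) <= alpha
        rej_bonf += bonferroni_p(ps) <= alpha
    return rej_cauchy / reps, rej_bonf / reps

err_cauchy, err_bonf = type1_errors()
```
]]></code></p>
<p>With so few replicates the estimates carry Monte Carlo error of roughly 0.002 at &#945; = 0.05, so they indicate only the rough size of each test and cannot resolve small departures from the nominal level.</p></div>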
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Empirical comparison with the Bonferroni test</head><p>Theoretically, we have shown that the combination tests are asymptotically equivalent to the Bonferroni test for pairwise normal test statistics. Empirically, we aim to compare their power at finite significance levels and determine how small &#945; needs to be for the asymptotic results to appear. Specifically, we evaluate significance levels &#945; = 0.05 and $5 \times 10^{-4}$ while also approximating the asymptotic setting by letting &#945; approach 0.</p><p>We start by assessing the power of the combination tests and the Bonferroni test at finite &#945;. Specifically, we define power as the probability of rejecting the global null under the global alternative, pr H 1,global (reject global null). We adopt the same simulation settings as in Section 3.1, generating one-sided p-values to obtain both positively and negatively correlated p-values. We introduce both sparse and dense signals in the mean vector $\vec{\mu}$ and consider three different numbers of hypotheses, n = 5, 20, 100. The dense signals are generated as $\vec{\mu} = \vec{\mu}_n = (\mu, \mu, \dots, \mu) \in \mathbb{R}^n$. For sparse signals, we employ signal vectors such as $\vec{\mu} = (\vec{0}_{95}, \vec{\mu}_5) \in \mathbb{R}^{100}$. The parameter &#181; ranges from 0 to 6 in increments of 0.5, ensuring that all testing methods can reach a power of 1. For the covariance matrix &#931; &#961; , we select &#961; = 0, 0.5, 0.9, 0.99 and, for n = 5, also consider the negative correlation &#961; = -0.2, a value chosen so that the covariance matrix remains positive definite. Each scenario is replicated $10^{6}$ times to calculate the empirical power of the tests.</p><p>Figure <ref type="figure">3</ref> displays the maximum power difference between the combination tests using the Cauchy, truncated t 1 , Pareto, Fr&#233;chet, and L&#233;vy distributions, compared to the Bonferroni test when allowing &#181; to increase until all methods reach a power of 1. 
The truncation threshold for the <formula notation="tex">t_1</formula> distribution is set at <formula notation="tex">p_0 = 0.9</formula>. The Cauchy, truncated <formula notation="tex">t_1</formula>, Fr&#233;chet, and Pareto distributions share a tail index &#947; = 1, whereas the Levy distribution has a tail index of 0.5, which results in a smaller power difference relative to the Bonferroni test.</p><p>Our findings reveal that combination tests can achieve higher power at finite significance levels, particularly when signals are dense. This remains the case for the Cauchy combination test even when p-values are negatively correlated, a setting in which it tends to be overly conservative. This likely stems from the nature of the combination test, which synthesizes signals from multiple sources rather than relying on a single dominant signal. These results suggest that asymptotic equivalence may set in only at much smaller values of &#945; than asymptotic validity does, especially when signals are dense.</p><p>To further investigate the asymptotic equivalence between the combination tests and the Bonferroni test, we examine how the size of their non-overlapping rejection regions evolves as &#945; approaches 0. Using the same settings as earlier in this section with n = 5 and &#961; = 0.5, we fix the signal level &#181; = 2 to ensure the power difference between the two tests is not negligible.</p><p>We consider three mean vectors: <formula notation="tex">\vec{\mu} = \vec{0}_5</formula> (global null), <formula notation="tex">(\vec{0}_4, 2)</formula> (sparse signal), and <formula notation="tex">\vec{2}_5</formula> (dense signal), allowing us to compare their performance under different scenarios. As shown in Fig. 
<ref type="figure">4</ref>, the difference, quantified by the probability ratio between the overlapping rejection region and individual rejection regions, converges to zero as &#945; decreases, consistent with the asymptotic equivalence established in Theorem 3.</p><p>Since the Bonferroni test is known to suffer under strong dependence, we also compare the combination tests against the adjusted Bonferroni method, minP. Specifically, we calibrate the cutoff for <formula notation="tex">\min(p_1, \ldots, p_n)</formula> using Monte Carlo sampling from the true data-generating model, so that the actual type-I error matches the nominal level &#945; (Table <ref type="table">S4</ref>). We replicate the simulation settings from Fig. <ref type="figure">3</ref>, replacing Bonferroni with minP as the baseline. As shown in Fig. <ref type="figure">S2</ref>, combination tests outperform minP when signals are dense and test statistics are weakly correlated, consistent with findings in <ref type="bibr">Liu &amp; Xie (2020)</ref>. However, minP relies on knowledge of the dependence structure among p-values, which limits its practicality in many applications and makes it computationally intensive.</p></div>
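<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the comparison in this section, the following Python sketch computes a global p-value with the equally weighted Cauchy combination test of Liu &amp; Xie (2020) and with the Bonferroni test. The function names and the equal weights are assumptions made for this example; the sketch is not the interface of the authors' R package.</p>

```python
import math


def cauchy_combination(pvalues):
    """Equal-weight Cauchy combination p-value for p-values in (0, 1]."""
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    total = 0.0
    for p in pvalues:
        if not (p > 0.0 and 1.0 >= p):
            raise ValueError("p-values must lie in (0, 1]")
        # tan((0.5 - p) * pi) equals cot(pi * p); the cotangent form keeps
        # precision when p is tiny.
        total += 1.0 / math.tan(math.pi * p)
    statistic = total / n
    # Standard Cauchy survival function 0.5 - atan(t) / pi, written with
    # atan2 so that very small combined p-values avoid cancellation error.
    return math.atan2(1.0, statistic) / math.pi


def bonferroni(pvalues):
    """Bonferroni global p-value: n times the smallest p-value, capped at 1."""
    return min(1.0, len(pvalues) * min(pvalues))


if __name__ == "__main__":
    sparse = [1e-8, 0.5, 0.5, 0.5, 0.5]  # one very strong signal
    print(cauchy_combination(sparse), bonferroni(sparse))
    identical = [0.03] * 5  # perfectly dependent p-values
    print(cauchy_combination(identical), bonferroni(identical))
```

<p>With a single very small p-value, the two global p-values nearly coincide, which mirrors the asymptotic equivalence discussed above. With five identical p-values, the Cauchy combination returns the common value while Bonferroni multiplies it by five.</p></div>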
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">The combination test under asymptotic dependence</head><p>Although heavy-tailed combination tests are typically employed when p-values have unknown dependence, they do not guarantee control of the type-I error under arbitrary dependence structures, even asymptotically. One key assumption for ensuring asymptotic type-I error control in Section 2.3 is quasi-asymptotic independence, which can be restrictive in practice. For instance, when the sample size is small, test statistics are likely to follow a t-distribution rather than a normal distribution. Additionally, even when the sample size is large, it can still be challenging to ensure that two dependent test statistics are pairwise normal.</p><p>The strength of asymptotic dependence between any two variables <formula notation="tex">(X_1, X_2)</formula> with the same marginal distribution F can be quantified by the upper tail dependence coefficient <ref type="bibr">(Joe, 1997)</ref> </p><formula notation="tex">\lambda = \lim_{u \to 1^{-}} \mathrm{pr}\{X_1 &gt; F^{-1}(u) \mid X_2 &gt; F^{-1}(u)\}.</formula><p>As discussed earlier, if <formula notation="tex">X_1</formula> and <formula notation="tex">X_2</formula> are bivariate normal and are not perfectly correlated, they are quasi-asymptotically independent, and hence &#955; = 0. However, many dependent variables do not satisfy quasi-asymptotic independence. For instance, for bivariate t-distributed variables <formula notation="tex">(T_1, T_2)</formula> with &#957; degrees of freedom, unit variances and correlation &#961;, their tail dependence coefficient <ref type="bibr">(Demarta &amp; McNeil, 2005)</ref> is</p><formula notation="tex">\lambda_{\nu,\rho} = 2\, t_{\nu+1}\left(-\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}}\right),</formula><p>where <formula notation="tex">t_\nu(\cdot)</formula> is the cumulative distribution function of the t distribution with &#957; degrees of freedom. As a result, <formula notation="tex">T_1</formula> and <formula notation="tex">T_2</formula> are never quasi-asymptotically independent, even when &#961; = 0, due to shared covariance estimation.</p><p>To understand the sensitivity of the combination tests to violations of quasi-asymptotic independence, we generate test statistics <formula notation="tex">(T_1, \ldots, T_n)</formula> from a multivariate t distribution <formula notation="tex">t_\nu(\vec{0}_n, \Sigma_\rho)</formula>, where <formula notation="tex">\Sigma_\rho</formula> is defined in Section 3. 
We choose an extreme number of degrees of freedom, &#957; = 2, and set the correlation &#961; to 0, 0.5, 0.9, and 0.99, resulting in tail dependence coefficients ranging from 0.18 to 0.91. All base p-values are one-sided and derived from the test statistics.</p><p>Tables <ref type="table">2</ref> and S3 compare the empirical type-I errors of different combination tests at the significance levels &#945; = 0.05 and <formula notation="tex">5 \times 10^{-4}</formula>, for n = 5 and n = 100. Surprisingly, the results indicate that type-I errors remain well-controlled regardless of the tail dependence coefficient, demonstrating the robustness of the combination tests to violations of the pairwise normal assumption for the test statistics.</p><p>Furthermore, Tables <ref type="table">2</ref> and S3 suggest that the Bonferroni test tends to be exceedingly conservative when the dependence coefficient &#955; &gt; 0, especially when both n and &#955; are large. In contrast, the combination tests based on heavy-tailed distributions with &#947; = 1 consistently maintain a type-I error rate close to the specified significance level. Thus, we hypothesize that when test statistics are quasi-asymptotically dependent, the combination tests with a tail index &#947; &#8804; 1 are still asymptotically valid as &#945; &#8594; 0, but they will not be asymptotically equivalent to the Bonferroni test. While the Bonferroni test can exhibit excessive conservatism, the combination tests with &#947; = 1 display neither conservatism nor inflation in their type-I error rates. For example, as discussed in Corollary 2, in situations where test statistics are perfectly correlated with &#961; = 1, resulting in a tail dependence coefficient of &#955; = 1, the combination tests with &#947; = 1 maintain an asymptotic type-I error of &#945;, whereas the true type-I error of the Bonferroni test is only &#945;/n.</p><p>We further investigate the power gain of the combination test over the Bonferroni test when test statistics follow a multivariate t-distribution. 
Relative to the power comparison in Section 3.2, we change the distribution of the test statistics from multivariate normal to multivariate t with &#957; = 2, while keeping all other settings the same. Figure <ref type="figure">5</ref> displays the maximum power gain of each combination test over the Bonferroni test as the signal strength increases and the power of both tests grows from 0 to 1. Compared to the subtle power improvement we observed for multivariate normally distributed test statistics in Fig. <ref type="figure">3</ref>, the maximum power difference for multivariate t-distributed test statistics can be as large as 1 even when signals are sparse. The power difference does not diminish even when the significance level decreases from 0.05 to <formula notation="tex">5 \times 10^{-4}</formula>.</p></div>
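<div xmlns="http://www.tei-c.org/ns/1.0"><p>The tail dependence coefficients quoted above for &#957; = 2 can be checked numerically. The sketch below assumes the standard closed form for the bivariate t distribution, namely twice the t CDF with &#957; + 1 degrees of freedom evaluated at minus the square root of (&#957; + 1)(1 - &#961;)/(1 + &#961;). It is specialized to &#957; = 2 so that the closed-form CDF of the t distribution with 3 degrees of freedom can be used without external libraries. The function names are illustrative only.</p>

```python
import math


def t3_cdf(x):
    """Closed-form CDF of the Student t distribution with 3 degrees of freedom."""
    s = x / math.sqrt(3.0)
    return 0.5 + (s / (1.0 + s * s) + math.atan(s)) / math.pi


def tail_dependence_t2(rho):
    """Tail dependence coefficient of a bivariate t with nu = 2 and correlation rho."""
    if not (rho > -1.0 and 1.0 >= rho):
        raise ValueError("rho must lie in (-1, 1]")
    nu = 2.0
    arg = -math.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * t3_cdf(arg)  # nu + 1 = 3, so the closed form applies


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 0.99):
        print(rho, round(tail_dependence_t2(rho), 2))
```

<p>Rounded to two decimals, the four correlations give 0.18, 0.39, 0.72 and 0.91, in line with the range reported in the text, and &#961; = 1 gives a coefficient of 1.</p></div>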
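<div xmlns="http://www.tei-c.org/ns/1.0"><p>Table <ref type="table">2</ref>: Type-I error control of the combination tests when test statistics follow a multivariate t-distribution with n = 5. Values in parentheses are the corresponding standard errors. For the Fr&#233;chet and Pareto distributions, the tail index &#947; = 1. For truncated <formula notation="tex">t_1</formula>, the truncation threshold <formula notation="tex">p_0 = 0.9</formula>.</p>
<table>
<row role="label"><cell cols="9">(a) &#945; = 0.05</cell></row>
<row role="label"><cell>&#961;</cell><cell><formula notation="tex">\lambda_{2,\rho}</formula></cell><cell>Cauchy</cell><cell>Pareto</cell><cell>Truncated <formula notation="tex">t_1</formula></cell><cell>Fr&#233;chet</cell><cell>Levy</cell><cell>Bonferroni</cell><cell>Fisher</cell></row>
<row><cell>0</cell><cell>0.18</cell><cell>2.90E-02 (1.68E-04)</cell><cell>5.30E-02 (2.24E-04)</cell><cell>4.73E-02 (2.21E-04)</cell><cell>5.17E-02 (1.93E-04)</cell><cell>3.89E-02 (2.12E-04)</cell><cell>3.56E-02 (1.85E-04)</cell><cell>6.26E-02 (2.42E-04)</cell></row>
<row><cell>0.5</cell><cell>0.39</cell><cell>4.48E-02 (2.07E-04)</cell><cell>5.24E-02 (2.23E-04)</cell><cell>5.01E-02 (2.18E-04)</cell><cell>5.13E-02 (2.21E-02)</cell><cell>3.19E-02 (1.76E-04)</cell><cell>2.65E-02 (1.61E-04)</cell><cell>1.16E-01 (3.20E-04)</cell></row>
<row><cell>0.9</cell><cell>0.72</cell><cell>5.00E-02 (2.18E-04)</cell><cell>5.09E-02 (2.20E-04)</cell><cell>5.06E-02 (2.19E-04)</cell><cell>4.99E-02 (2.18E-04)</cell><cell>2.50E-02 (1.56E-04)</cell><cell>1.67E-02 (1.28E-04)</cell><cell>1.51E-01 (3.58E-04)</cell></row>
<row><cell>0.99</cell><cell>0.91</cell><cell>5.02E-02 (2.18E-04)</cell><cell>5.03E-02 (2.18E-04)</cell><cell>5.02E-02 (2.18E-04)</cell><cell>4.92E-02 (2.16E-04)</cell><cell>2.27E-02 (1.49E-04)</cell><cell>1.19E-02 (1.09E-04)</cell><cell>1.59E-01 (3.66E-04)</cell></row>
<row role="label"><cell cols="9">(b) &#945; = <formula notation="tex">5 \times 10^{-4}</formula></cell></row>
<row role="label"><cell>&#961;</cell><cell><formula notation="tex">\lambda_{2,\rho}</formula></cell><cell>Cauchy</cell><cell>Pareto</cell><cell>Truncated <formula notation="tex">t_1</formula></cell><cell>Fr&#233;chet</cell><cell>Levy</cell><cell>Bonferroni</cell><cell>Fisher</cell></row>
<row><cell>0</cell><cell>0.18</cell><cell>2.48E-04 (1.57E-05)</cell><cell>4.57E-04 (2.14E-05)</cell><cell>4.57E-04 (2.14E-05)</cell><cell>4.57E-04 (2.14E-05)</cell><cell>3.49E-04 (1.88E-05)</cell><cell>3.18E-04 (1.78E-05)</cell><cell>2.17E-02 (1.46E-04)</cell></row>
<row><cell>0.5</cell><cell>0.39</cell><cell>3.94E-04 (1.98E-05)</cell><cell>4.65E-04 (2.16E-05)</cell><cell>4.65E-04 (2.16E-05)</cell><cell>4.65E-04 (2.16E-05)</cell><cell>3.08E-04 (1.75E-05)</cell><cell>2.67E-04 (1.63E-05)</cell><cell>2.63E-02 (1.60E-04)</cell></row>
<row><cell>0.9</cell><cell>0.72</cell><cell>5.20E-04 (2.28E-05)</cell><cell>5.28E-04 (2.30E-05)</cell><cell>5.28E-04 (2.30E-05)</cell><cell>5.28E-04 (2.30E-05)</cell><cell>2.37E-04 (1.54E-05)</cell><cell>1.65E-04 (1.28E-05)</cell><cell>3.82E-02 (1.92E-04)</cell></row>
<row><cell>0.99</cell><cell>0.91</cell><cell>5.24E-04 (2.29E-05)</cell><cell>5.24E-04 (2.29E-05)</cell><cell>5.24E-04 (2.29E-05)</cell><cell>5.24E-04 (2.29E-05)</cell><cell>2.22E-04 (1.49E-05)</cell><cell>1.16E-04 (1.08E-05)</cell><cell>4.25E-02 (2.02E-04)</cell></row>
</table>
<p>These findings indicate a potential power advantage of the combination tests over the Bonferroni test, even in the asymptotic regime where &#945; &#8594; 0, when test statistics are pairwise 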
asymptotically dependent. Our empirical results indicate that, unlike in the case of asymptotic independence, combination tests can remain asymptotically valid while achieving a nontrivial power improvement over the Bonferroni test under asymptotic dependence. This highlights the potential of asymptotic dependence as a valuable framework for advancing both the theoretical and practical understanding of combination tests.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Real Data Examples</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Circadian rhythm detection</head><p>Circadian rhythms, which are oscillations of behavior, physiology, and metabolism, are observed in almost all living organisms <ref type="bibr">(Pittendrigh, 1960)</ref>. Recent advances in omics technologies, such as microarray and next-generation sequencing, provide powerful platforms for identifying circadian genes that encode molecular clocks crucial for health and diseases <ref type="bibr">(Rijo-Ferreira &amp; Takahashi, 2019)</ref>. In this case study, we focus on a gene expression dataset obtained from mouse liver samples, collected every hour across 48 different circadian time points, denoted as CT points, ranging from CT18 to CT65, under complete darkness conditions <ref type="bibr">(Hughes et al., 2009)</ref>. At each time point, the expression levels of approximately 13,000 mouse genes were profiled by microarray. The objective of this case study is to identify genes that exhibit significant oscillatory behavior by aggregating results across all measured time points. One of the most widely used methods is JTK_CYCLE <ref type="bibr">(Hughes et al., 2010)</ref>. JTK_CYCLE determines whether a gene exhibits significant cyclic behavior by performing a Kendall's tau test. It compares the observed gene expression measurements across 48 time points to expected patterns with specific phases and periods using a rank-based correlation test. This process involves testing 216 combinations of phase and period, resulting in 216 correlated base p-values for each gene. By default, JTK_CYCLE combines these p-values using the Bonferroni test, though this approach has been shown to lack power in benchmarking studies <ref type="bibr">(Mei et al., 2021)</ref>.</p><p>In place of the Bonferroni test, we use the heavy-tailed combination tests to aggregate the 216 correlated p-values for each gene. For comparison, we also include Fisher's method. 
To assess the performance of different tests, we use a set of 60 positive controls (cyclic genes) and 61 negative controls (non-cyclic genes) from <ref type="bibr">Wu et al. (2014)</ref> as ground truth. Figure <ref type="figure">6</ref> displays box plots of the combined p-values for the positive and negative controls. Compared to the Bonferroni method, the heavy-tailed combination tests have higher power to detect the true signals and, unlike Fisher's method, avoid false positives among the negative controls. In Fig. <ref type="figure">6</ref>, "Truncated" refers to the <formula notation="tex">t_1</formula> distribution with truncation threshold <formula notation="tex">p_0 = 0.9</formula>. For the Fr&#233;chet and Pareto distributions, the tail index is set to &#947; = 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">SNP-based gene level association testing in GWAS</head><p>In the second real data analysis, similar to <ref type="bibr">Liu et al. (2019)</ref>, we combine correlated p-values to identify genes that are significantly associated with diseases in genome-wide association studies, referred to as GWAS for brevity. A gene of interest may contain multiple single-nucleotide polymorphisms, referred to as SNPs, each tested individually against the trait, e.g., disease status, using a simple regression framework, resulting in SNP-level p-values. Then, p-values from the SNPs within the same gene region are further combined via a gene-level test. SNPs that are close to each other on the genome are highly correlated due to linkage disequilibrium, leading to highly correlated SNP-level p-values for the same gene. Several methods have been developed for gene-level association testing, such as EPIC <ref type="bibr">(Wang et al., 2022)</ref> and MAGMA <ref type="bibr">(de Leeuw et al., 2015)</ref>, which account for SNP-SNP correlations within the same gene. However, these methods can be computationally intensive. For example, deriving gene-level test statistics in these methods often requires inverting large covariance matrices.</p><p>In this analysis, we apply heavy-tailed combination tests to test for each gene's association with schizophrenia, referred to as SCZ <ref type="bibr">(Ripke et al., 2013)</ref>. To adjust for multiple testing, we apply the Benjamini-Hochberg procedure <ref type="bibr">(Benjamini &amp; Hochberg, 1995)</ref> to the gene-level combined p-values to control the false discovery rate, referred to as FDR for simplicity. Figure <ref type="figure">7</ref> compares the numbers of overlapping genes rejected by each method when the FDR is controlled at 0.05 and 0.2. 
As illustrated, the number of genes detected by the combination tests is comparable to or even higher than the numbers identified by EPIC and MAGMA. Notably, the combination tests are highly computationally efficient, completing analyses almost instantly compared to domain-specific methods that require modeling the correlation structure.</p><p>Compared to the Bonferroni test, the combination tests identify 25% more significant genes, even at a low nominal FDR level of &#945; = 0.05. Figure <ref type="figure">S4</ref> summarizes the number of SNPs for each gene, showing that most genes have fewer than 100 SNPs. This suggests that the significant power gain is not due to combining an excessively large number of p-values, which could lead to inflation of type-I errors. Figure <ref type="figure">S5</ref> shows that even when focusing solely on genes with 50 or fewer SNPs, the combination tests still identify substantially more genes than the Bonferroni test.</p><p>In Fig. <ref type="figure">7</ref>, "Truncated" refers to the truncated <formula notation="tex">t_1</formula> distribution with truncation threshold <formula notation="tex">p_0 = 0.9</formula>. For the Fr&#233;chet and Pareto distributions, the tail index is set to &#947; = 1.</p><p>Compared to the simulation results, the substantial power gain in this real data analysis likely results from violations of quasi-asymptotic independence among the SNP-level p-values. To evaluate whether the additional genes detected by the heavy-tailed combination tests are biologically meaningful, we analyze the set of 939 genes detected at the FDR level &#945; = 0.2 by the Cauchy, truncated <formula notation="tex">t_1</formula>, Fr&#233;chet, or Pareto combination tests but not by the Bonferroni test. We conduct a gene-set enrichment analysis using DAVID <ref type="bibr">(Sherman et al., 2022)</ref>. Results are shown in Fig. <ref type="figure">S6</ref>. 
The top two significantly enriched gene ontology terms are "regulation of ion transmembrane transport" and "chemical synaptic transmission", both of which have been reported and confirmed by independent studies <ref type="bibr">(Favalli et al., 2012;</ref><ref type="bibr">Liu et al., 2022)</ref>. These findings underscore the enhanced statistical power of the combination tests compared to the Bonferroni test in practical genetic applications.</p></div>
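<div xmlns="http://www.tei-c.org/ns/1.0"><p>The multiple-testing step in this analysis applies the Benjamini-Hochberg procedure to the gene-level combined p-values. A minimal Python sketch of that step-up rule is given below. The function name and the toy p-values are assumptions for illustration and are not taken from the analysis code.</p>

```python
def benjamini_hochberg(pvalues, q):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k whose sorted p-value
    # does not exceed q * k / m, then reject the k smallest p-values.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if q * rank / m >= pvalues[index]:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical gene-level combined p-values.
    gene_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(sum(benjamini_hochberg(gene_pvalues, 0.05)), "genes rejected")
```

<p>Because the rule is step-up, a p-value can be rejected even if it exceeds its own threshold, as long as a larger p-value falls below the threshold for its rank.</p></div>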
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head><p>In this section, we examine the extensions and limitations of our results and discuss related literature. The asymptotic validity of the heavy-tailed combination tests can be generalized to cases where p-values are only valid, i.e., they satisfy pr(p &#8804; &#945;) &#8804; &#945;, as long as the p-values are pairwise quasi-asymptotically independent. Though the transformed test statistics <formula notation="tex">X_i</formula> derived from these valid p-values may lack a regularly varying tailed distribution, the combination tests should maintain control over type-I errors. Intuitively, this is because we can always construct uniformly distributed variables that are stochastically smaller than these valid p-values.</p><p>In the context of multiple testing, the combination tests can be applied within a closed testing procedure to identify individual non-null hypotheses. In Supplementary Section S2, we provide a shortcut algorithm for applying closed testing with combination tests. <ref type="bibr">Goeman et al. (2019a)</ref> demonstrated that, as n &#8594; +&#8734;, the closed testing procedure using harmonic mean p-values is significantly more powerful than the one based on Bonferroni corrections. However, for finite n and when the family-wise error rate approaches zero, the equivalence between combination tests and the Bonferroni test may extend to their respective closed testing procedures.</p><p>To balance validity and power, we recommend using a truncated <formula notation="tex">t_1</formula> distribution with truncation threshold <formula notation="tex">p_0 = 0.9</formula>, based on the empirical results. This definition differs slightly from the truncated Cauchy distribution proposed by <ref type="bibr">Fang et al. (2023)</ref>, which assigns a point mass at the truncation threshold rather than rescaling the distribution. Notably, the half-Cauchy distribution in <ref type="bibr">Long et al. (2023)</ref> is a special case of our definition with <formula notation="tex">p_0 = 0.5</formula>. 
While our focus is on establishing the asymptotic validity of combination tests using the truncated <formula notation="tex">t_1</formula> distribution under an unknown dependence structure, both <ref type="bibr">Fang et al. (2023)</ref> and <ref type="bibr">Long et al. (2023)</ref> have also provided adjustments that ensure exact validity when p-values are independent.</p><p>While our results establish the asymptotic validity of the heavy-tailed combination tests under quasi-asymptotically independent test statistics, the combination tests can exhibit noticeable inflation in type-I error rates under arbitrary dependence and finite &#945;. Exact control over type-I errors may be achieved with additional adjustments. For the harmonic mean p-value, Vovk &amp; Wang (2020) demonstrated that it is valid under arbitrary dependence when scaled by a factor <formula notation="tex">a_n = (y_n + n)^2/(n y_n + n)</formula>, where <formula notation="tex">y_n</formula> is the unique positive solution to the equation <formula notation="tex">y_n^2 = n\{(y_n + 1)\log(y_n + 1) - y_n\}</formula>. This factor asymptotically approaches log n as n increases, and the test can be further improved through randomization techniques <ref type="bibr">(Gasparin et al., 2024)</ref>.</p><p>Other studies, such as <ref type="bibr">Wilson (2019a)</ref> and subsequent works <ref type="bibr">(Held, 2019;</ref><ref type="bibr">Wilson, 2019b;</ref><ref type="bibr">Goeman et al., 2019b)</ref>, have provided empirically calibrated thresholds for the harmonic mean p-value. Additionally, <ref type="bibr">Chen et al. (2024)</ref> establish an adjustment of the harmonic mean p-value to guarantee its validity when the individual p-values follow a Clayton copula.</p><p>Numerous alternative methods for combining dependent p-values exist, each with distinct tradeoffs. Some approaches, such as those by <ref type="bibr">Goeman et al. (2004)</ref> and <ref type="bibr">Edelmann et al. (2020)</ref>, model specific dependence structures, which can be powerful but require strong model assumptions and can be computationally intensive. 
Other methods, like those by <ref type="bibr">Hommel (1983)</ref> and Vovk &amp; Wang (2020), guarantee type-I error control under arbitrary dependence. However, as discussed in earlier studies <ref type="bibr">(Fang et al., 2023;</ref><ref type="bibr">Chen et al., 2023)</ref>, combination methods with proven validity guarantees under arbitrary dependence may have limited power in practical applications.</p></div>
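<div xmlns="http://www.tei-c.org/ns/1.0"><p>The scaling factor for the harmonic mean p-value described above has no closed form, but it is straightforward to compute numerically. The sketch below solves the defining equation for the positive root by bisection and then evaluates the factor. It assumes n of at least 3, for which the equation has a positive root bracketed as in the code. The function name is hypothetical.</p>

```python
import math


def harmonic_mean_factor(n, iterations=200):
    """Return (y_n, a_n) for n >= 3 by bisection on the defining equation."""
    if 3 > n:
        raise ValueError("this sketch assumes n >= 3")

    def gap(y):
        # The positive root of gap is y_n: y^2 = n * ((y + 1) * log(y + 1) - y).
        return y * y - n * ((y + 1.0) * math.log(y + 1.0) - y)

    low, high = 1.0, 2.0  # gap(1) is negative for every n >= 3
    while 0.0 >= gap(high):
        low, high = high, 2.0 * high
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if gap(mid) > 0.0:
            high = mid
        else:
            low = mid
    y = 0.5 * (low + high)
    return y, (y + n) ** 2 / (n * y + n)


if __name__ == "__main__":
    for n in (3, 10, 100, 10000):
        y_n, a_n = harmonic_mean_factor(n)
        print(n, a_n, math.log(n))
```

<p>Printing the factor next to log n for increasing n gives a sense of how the two compare at the finite n used in practice.</p></div>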
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Data and Code Availability</head><p>The R package implementing the heavy-tailed combination tests is available at <ref type="url">https://github.com/gl-ybnbxb/heavytailcombtest</ref>. The code to reproduce figures and tables is available at <ref type="url">https://github.com/gl-ybnbxb/combination-test-reproduce-code</ref>. Time-series circadian gene expression data of mouse liver were downloaded from the Gene Expression Omnibus (GEO) database with accession number GSE11923. GWAS summary statistics of schizophrenia (SCZ) were downloaded from the Psychiatric Genomics Consortium at <ref type="url">https://pgc.unc.edu/for-researchers/download-results/</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgement</head><p>This work was supported by the National Science Foundation under grants DMS-2113646 and DMS-2238656, and by the National Institutes of Health under grant R35 GM138342. We thank Nancy R. Zhang and Jian Ding for their helpful discussions. We acknowledge the University of</p></div></body>
		</text>
</TEI>
