

Title: A Modified Neighborhood Hypothesis Test for Population Mean in Functional Data
When dealing with very high-dimensional and functional data, rank deficiency of the sample covariance matrix often complicates tests for the population mean. To alleviate this rank-deficiency problem, Munk et al. (J Multivar Anal 99:815–833, 2008) proposed a neighborhood hypothesis testing procedure that tests whether the population mean is within a small, pre-specified neighborhood of a known quantity, M. How could we objectively specify a reasonable neighborhood, particularly when the sample space is unbounded? What should be the size of the neighborhood? In this article, we develop a modified neighborhood hypothesis testing framework to answer these two questions. We define the neighborhood as a proportion of the total amount of variation present in the population of functions under study and derive the asymptotic null distribution of the appropriate test statistic. Power analyses suggest that our approach is appropriate when the sample space is unbounded and is robust against error structures with nonzero mean. We then apply this framework to assess whether the near-default sigmoidal specification of dose-response curves is adequate for the widely used CCLE database. Results suggest that our methodology could be used as a pre-processing step before conventional efficacy metrics obtained from sigmoid models (for example, IC50 or AUC) are used as downstream predictive targets.
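The idea of a neighborhood scaled by total variation can be illustrated with a small sketch. It rests on several assumptions that are not taken from the article:

- Each curve is observed on a common equispaced grid on [0, 1].
- The squared L2 distance and the total variation (the integrated pointwise variance, i.e. tr(Sigma)) are approximated by grid averages.
- The null hypothesis is taken to be H0: ||mu - M||^2 <= delta * tr(Sigma).
- The statistic is standardized with a nonparametric bootstrap standard error, not with the asymptotic null distribution derived in the article.

The function name `neighborhood_test` and its parameters are hypothetical.

```python
import math

import numpy as np


def neighborhood_test(X, M, delta, alpha=0.05, n_boot=2000, seed=0):
    """Sketch of a neighborhood test for the mean of discretized curves.

    X     : (n, p) array, one curve per row, sampled on a common equispaced grid on [0, 1]
    M     : (p,) array, the hypothesized mean curve on the same grid
    delta : neighborhood size as a fraction of total variation (hypothetical name)

    Assumed null:        H0: ||mu - M||^2 <= delta * tr(Sigma)
    Assumed alternative: K : ||mu - M||^2 >  delta * tr(Sigma)
    """
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)

    def discrepancy(sample):
        # squared L2 distance of the sample mean curve from M (grid average)
        dist2 = np.mean((sample.mean(axis=0) - M) ** 2)
        # total variation: integral of the pointwise variance function (grid average)
        total_var = np.mean(sample.var(axis=0, ddof=1))
        return dist2 - delta * total_var

    d_hat = discrepancy(X)
    # bootstrap over whole curves to estimate the standard error of d_hat
    boot = np.array([discrepancy(X[rng.integers(0, n, size=n)]) for _ in range(n_boot)])
    se = boot.std(ddof=1)
    z = d_hat / se
    # one-sided p-value from the standard normal upper tail
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return d_hat, z, p_value, p_value < alpha


# Illustrative simulated curves only
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
M = np.sin(2 * np.pi * t)
X_near = M + rng.normal(0.0, 1.0, size=(100, 50))        # mean equal to M
X_far = M + 2.0 + rng.normal(0.0, 1.0, size=(100, 50))   # mean shifted by 2
res_near = neighborhood_test(X_near, M, delta=0.1)
res_far = neighborhood_test(X_far, M, delta=0.1)
```

Expressing the neighborhood as a fraction of tr(Sigma) makes `delta` unit-free. That matches the motivation stated above for an unbounded sample space, where an absolute radius would be hard to choose.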
Award ID(s):
2007418
PAR ID:
10467518
Author(s) / Creator(s):
; ; ;
Editor(s):
Mateu, Jorge
Publisher / Repository:
Springer, International Biometric Society
Date Published:
Journal Name:
Journal of Agricultural, Biological and Environmental Statistics
ISSN:
1085-7117
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. In many applications of zero-inflated models, score tests are used to evaluate whether the population heterogeneity implied by these models is consistent with the data. The most frequently cited justification for using score tests is that they only require estimation under the null hypothesis. Because this estimation involves specifying a plausible model consistent with the null hypothesis, the testing procedure could lead to unreliable inferences under model misspecification. In this paper, we propose a score test of homogeneity for zero-inflated models that is robust against certain model misspecifications. Because the true model is unknown in practical settings, our proposal is developed under a general framework of mixture models in which a layer of randomness is imposed on the model to account for uncertainty in the model specification. We exemplify this approach on the class of zero-inflated Poisson models, where a random term is imposed on the Poisson mean to adjust for relevant covariates missing from the mean model or for a misspecified functional form. For this example, we show through simulations that the resulting score test of zero inflation maintains its empirical size at all levels, albeit with a loss of power for the well-specified non-random mean model under the null. Frequencies of health promotion activities among young Girl Scouts and dental caries indices among inner-city children are used to illustrate the robustness of the proposed testing procedure.
  2. We are interested in testing general linear hypotheses in a high-dimensional multivariate linear regression model. The framework includes many well-studied problems, such as two-sample tests for equality of population means and MANOVA, as special cases. A family of rotation-invariant tests is proposed that involves a flexible spectral shrinkage scheme applied to the sample error covariance matrix. The asymptotic normality of the test statistic under the null hypothesis is derived in the setting where the dimensionality is comparable to the sample sizes, assuming the existence of certain moments for the observations. The asymptotic power of the proposed test is studied under various local alternatives. The power characteristics are then used to propose a data-driven selection of the spectral shrinkage function. As an illustration of the general theory, we construct a family of tests involving ridge-type regularization and suggest possible extensions to more complex regularizers. A simulation study is carried out to examine the numerical performance of the proposed tests.
  3. We investigate the impact of low-rank interference on the problem of distinguishing between two seabed types using ambient sound as an acoustic source. The resulting frequency-domain snapshots follow a zero-mean, circularly-symmetric Gaussian distribution, where each seabed type has a unique covariance matrix. Detecting changes in the seabed type across distinct spatial locations can be formulated as a two-sample hypothesis test for equality of covariance, for which Box's M-test is the classical solution. Interference sources such as passing ships result in additive noise with a low-rank covariance that can reduce the performance of hypothesis testing. We first present a method to construct a worst-case interference field, making hypothesis testing as difficult as possible. We then provide an alternating optimization procedure to recover the interference-free covariance matrix. Experiments on synthetic data show that the optimized interferer can greatly reduce hypothesis testing performance, while our recovery method perfectly eliminates this interference for a sufficiently small interference rank. On real data from the New England Shelf Break Acoustics experiment, we show that our approach successfully mitigates interference, allowing for accurate hypothesis testing and improving bottom loss estimation. 
  4. Fan, J.; Pan, J. (Eds.)
    Testing whether the mean vector of a population is zero is a fundamental problem in statistics. In the high-dimensional regime, where the dimension p of the data is greater than the sample size n, traditional methods such as Hotelling’s T² test cannot be applied directly. One can project the high-dimensional vector onto a low-dimensional space, where traditional methods can then be applied. In this paper, we propose a projection test based on a new estimator of the optimal projection direction Σ^{−1}μ. Under the assumption that the optimal projection Σ^{−1}μ is sparse, we estimate it by regularized quadratic programming with a nonconvex penalty and a linear constraint. Simulation studies and real data analysis are conducted to examine the finite-sample performance of different tests in terms of type I error and power.
  5. We develop conservative tests for the mean of a bounded population under stratified sampling and apply them to risk-limiting post-election audits. The tests are "anytime valid" under sequential sampling, allowing optional stopping in each stratum. Our core method expresses a global hypothesis about the population mean as a union of intersection hypotheses describing within-stratum means. It tests each intersection hypothesis using independent test supermartingales (TSMs) combined across strata by multiplication. A P-value for each intersection hypothesis is the reciprocal of that test statistic, and the largest P-value in the union is a P-value for the global hypothesis. This approach has two primary moving parts: the rule selecting which stratum to draw from next given the sample so far, and the form of the TSM within each stratum. These rules may vary over intersection hypotheses. We construct the test with the smallest expected stopping time and present a few strategies for approximating that optimum. Approximately optimal methods are challenging to compute when there are more than two strata, while some simple rules that scale well can be inconsistent: the resulting test will never reject for some alternatives, no matter how large the sample. We present a set of rules that leads to a computationally tractable test for arbitrarily many strata. In instances that arise in auditing and other applications, its expected sample size is nearly optimal and substantially smaller than that of previous methods.
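Related item 1 robustifies a score test of zero inflation. For reference, the sketch below shows the classical, non-robust score test for an intercept-only Poisson model, which is the kind of baseline that item describes. It is not the authors' robust statistic, and the function name `zip_score_test` is hypothetical. With lambda-hat equal to the sample mean and p0 = exp(-lambda-hat), the statistic is (n0/p0 - n)^2 / (n(exp(lambda-hat) - 1 - lambda-hat)). Under no zero inflation it is approximately chi-square with 1 degree of freedom.

```python
import math


def zip_score_test(y):
    """Classical score test for zero inflation against an intercept-only Poisson null.

    Requires at least one positive count (otherwise lambda-hat is 0 and the
    denominator vanishes).
    """
    n = len(y)
    lam = sum(y) / n                  # Poisson MLE under the null
    p0 = math.exp(-lam)               # fitted probability of a zero count
    n0 = sum(1 for v in y if v == 0)  # observed number of zeros
    numerator = (n0 / p0 - n) ** 2
    denominator = n * ((1.0 - p0) / p0 - lam)   # equals n * (exp(lam) - 1 - lam)
    stat = numerator / denominator
    # chi-square(1) upper tail: P(Z^2 > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Illustrative counts only: 50 extra zeros mixed with small positive counts
inflated = [0] * 50 + [1, 2, 3, 2, 1, 4, 2, 3, 1, 2] * 5
# Counts whose frequencies roughly match a Poisson(1) distribution
poisson_like = [0] * 37 + [1] * 37 + [2] * 18 + [3] * 6 + [4] * 2
stat_inf, p_inf = zip_score_test(inflated)
stat_poi, p_poi = zip_score_test(poisson_like)
```

The statistic relies entirely on the fitted null model through p0. That dependence is why misspecification of the Poisson mean can distort it, which is the concern the robust proposal addresses.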
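Related item 2 applies spectral shrinkage to the sample error covariance. A ridge penalty is one such shrinkage: (S + lambda I)^{-1} replaces each eigenvalue l of S by 1/(l + lambda). The sketch below applies it to the simplest special case mentioned there, a two-sample comparison of means. It differs from the paper in three ways:

- The ridge parameter is fixed rather than selected from the data.
- The null distribution comes from label permutations, not from the asymptotic normal approximation described above. A permutation test checks exchangeability of the two samples, so it also responds to differences other than in the means.
- The general linear hypothesis setting is not implemented.

The function name `ridge_two_sample_test` is hypothetical.

```python
import numpy as np


def ridge_two_sample_test(X, Y, ridge=1.0, n_perm=999, seed=0):
    """Permutation test using a ridge-regularized Hotelling-type statistic (sketch)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Z = np.vstack([X, Y])
    n1, p = X.shape

    def stat(A, B):
        diff = A.mean(axis=0) - B.mean(axis=0)
        # pooled sample covariance; singular when p exceeds n1 + n2 - 2
        pooled = ((A.shape[0] - 1) * np.cov(A, rowvar=False)
                  + (B.shape[0] - 1) * np.cov(B, rowvar=False)) / (A.shape[0] + B.shape[0] - 2)
        # ridge shrinkage of the eigenvalues: diff^T (S + ridge * I)^{-1} diff
        return float(diff @ np.linalg.solve(pooled + ridge * np.eye(p), diff))

    observed = stat(X, Y)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        # random relabeling of the pooled observations into two groups
        idx = rng.permutation(Z.shape[0])
        if stat(Z[idx[:n1]], Z[idx[n1:]]) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Illustrative simulated data with p = 100 > n1 + n2 - 2 = 58
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 100))
Y_same = rng.normal(size=(30, 100))
Y_shift = rng.normal(size=(30, 100)) + 0.5
_, p_same = ridge_two_sample_test(X, Y_same)
_, p_shift = ridge_two_sample_test(X, Y_shift)
```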
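Related item 3 names Box's M-test as the classical solution for equality of covariance matrices. The sketch below is the textbook real-valued version with estimated means and Box's small-sample correction. The item itself works with zero-mean, circularly-symmetric complex Gaussian snapshots, so this is a simplification, and it does not include the interference-recovery step. The chi-square tail uses the Wilson-Hilferty normal approximation to avoid a SciPy dependency. The function name `box_m_test` is hypothetical.

```python
import math

import numpy as np


def box_m_test(*groups):
    """Box's M test for equal covariance matrices across k >= 2 real-valued groups."""
    k = len(groups)
    p = groups[0].shape[1]
    dfs = [g.shape[0] - 1 for g in groups]               # n_i - 1 for each group
    covs = [np.cov(g, rowvar=False) for g in groups]     # unbiased S_i
    total_df = sum(dfs)                                  # N - k
    pooled = sum(d * S for d, S in zip(dfs, covs)) / total_df

    def logdet(A):
        return np.linalg.slogdet(A)[1]

    M = total_df * logdet(pooled) - sum(d * logdet(S) for d, S in zip(dfs, covs))

    # Box's correction factor for the chi-square approximation
    c = (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1)) * (
        sum(1.0 / d for d in dfs) - 1.0 / total_df
    )
    stat = max((1.0 - c) * M, 0.0)    # M >= 0 in exact arithmetic; guard rounding
    df = p * (p + 1) * (k - 1) / 2
    # Wilson-Hilferty approximation to the chi-square(df) upper tail
    z = ((stat / df) ** (1.0 / 3.0) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return stat, df, p_value


# Illustrative simulated groups only
rng = np.random.default_rng(4)
g1 = rng.normal(size=(200, 3))
g2 = rng.normal(size=(200, 3))
g3 = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 3.0])   # inflated third variance
stat_eq, df_eq, p_eq = box_m_test(g1, g2)
stat_ne, df_ne, p_ne = box_m_test(g1, g3)
```

An additive low-rank interference term changes every log-determinant in M. That is one way to see why the item reports degraded test performance under interference.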
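Related item 4 projects the data onto an estimate of Sigma^{-1} mu and then applies a low-dimensional test. The sketch below illustrates that projection idea, with two substitutions:

- The direction is a ridge estimate (S + lambda I)^{-1} xbar, not the paper's sparse, nonconvex-penalized quadratic program.
- The direction is estimated on one half of the sample, and a one-sample t-statistic is computed on the projections of the other half. Splitting the sample this way is an assumption, not taken from the abstract.

The p-value uses a normal approximation to the t distribution. The name `projection_test` and its `ridge` parameter are hypothetical.

```python
import math

import numpy as np


def projection_test(X, ridge=1.0, seed=0):
    """Sample-splitting projection test of H0: mu = 0 (sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    first, second = X[idx[: n // 2]], X[idx[n // 2 :]]

    # estimate the projection direction on the first half only
    xbar1 = first.mean(axis=0)
    S1 = np.cov(first, rowvar=False)            # rank-deficient when p > n // 2
    w = np.linalg.solve(S1 + ridge * np.eye(p), xbar1)

    # project the held-out half; under H0 these scalars have mean zero
    z = second @ w
    m = z.size
    t_stat = math.sqrt(m) * z.mean() / z.std(ddof=1)
    # two-sided p-value from a normal approximation to the t distribution
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value


# Illustrative simulated data with p = 200 > n = 60
rng = np.random.default_rng(5)
X_null = rng.normal(size=(60, 200))
X_alt = X_null + 0.5
t_null, p_null = projection_test(X_null)
t_alt, p_alt = projection_test(X_alt)
```

Splitting keeps the estimated direction independent of the held-out observations. Conditionally on that direction, the projected scores then form an ordinary univariate sample.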
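Related item 5 combines independent within-stratum test supermartingales by multiplication and takes the largest intersection P-value as the global P-value. The sketch below follows that recipe for two strata, with several simplifications:

- Sample values are assumed rescaled to [0, 1].
- Each stratum uses a betting TSM with a fixed, hypothetical bet `lam = 0.5`.
- The supremum over intersection hypotheses is approximated on a grid along the boundary of the null set.
- There is no stratum-selection rule and no optimized TSM.

The function names are hypothetical. Because the search is on a grid rather than exact, it can in principle slightly understate the largest P-value.

```python
import numpy as np


def stratum_tsm(x, mu, lam=0.5):
    """Betting test supermartingale for H0: E[x] <= mu, with x in [0, 1].

    Each factor 1 + lam * (x_i - mu) is nonnegative for 0 <= lam <= 1 <= 1/mu
    and has expectation <= 1 under H0, so the running product is a TSM.
    """
    x = np.asarray(x, dtype=float)
    return float(np.prod(1.0 + lam * (x - mu)))


def stratified_p_value(x1, x2, w1, mu0, grid=201, lam=0.5):
    """P-value for H0: w1*mu1 + w2*mu2 <= mu0 with two strata (w2 = 1 - w1).

    Each intersection hypothesis (mu1, mu2) is tested by the product of the two
    independent stratum TSMs; its P-value is the reciprocal of that product.
    The TSM decreases in each stratum mean, so the largest intersection
    P-value lies on the boundary w1*mu1 + w2*mu2 = mu0, searched on a grid.
    """
    w2 = 1.0 - w1
    lo = max(0.0, (mu0 - w2) / w1)    # keeps mu2 <= 1
    hi = min(1.0, mu0 / w1)           # keeps mu2 >= 0
    worst = 0.0
    for mu1 in np.linspace(lo, hi, grid):
        mu2 = min(max((mu0 - w1 * mu1) / w2, 0.0), 1.0)
        product = stratum_tsm(x1, mu1, lam) * stratum_tsm(x2, mu2, lam)
        worst = max(worst, min(1.0, 1.0 / product))
    return worst


# Illustrative 0/1 sample values from two strata (not real audit data)
s1 = [1] * 40 + [0] * 10
s2 = [1] * 35 + [0] * 15
p_far = stratified_p_value(s1, s2, w1=0.5, mu0=0.5)
p_null = stratified_p_value([1] * 25 + [0] * 25, [1] * 25 + [0] * 25, w1=0.5, mu0=0.5)
```

The P-value is valid at any stopping time because of Ville's inequality for nonnegative supermartingales. For that reason the sketch can simply evaluate the products on whatever samples have been drawn so far.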