Abstract In this article, we introduce a functional structural equation model for estimating directional relations from multivariate functional data. We decouple the estimation into two major steps: directional order determination and selection through sparse functional regression. We first propose a score function at the linear operator level and show that its minimization can recover the true directional order when the relation between each function and its parental functions is nonlinear. We then develop a sparse functional additive regression, where both the response and the multivariate predictors are functions and the regression relation is additive and nonlinear. We also propose strategies to speed up the computation and scale up our method. Theoretically, we establish the consistency of order determination, sparse functional additive regression, and directed acyclic graph estimation, while allowing both the dimension of the Karhunen–Loève expansion coefficients and the number of random functions to diverge with the sample size. We illustrate the efficacy of our method through simulations and an application to brain effective connectivity analysis.
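The abstract refers to the dimension of the Karhunen–Loève expansion coefficients. Purely as an illustration of what those coefficients are, the sketch below computes truncated coefficients xi_k = ∫ (X(t) − mu(t)) phi_k(t) dt for a curve sampled on a grid. It assumes the eigenfunctions phi_k are known and, for simplicity, takes them to be the orthonormal sine basis sqrt(2) sin(k pi t) on [0, 1]; in practice they would be estimated from data (for example by functional principal component analysis). The helpers `trapezoid`, `sine_basis` and `kl_coefficients` are hypothetical and are not part of the authors' method.

```python
import math


def trapezoid(values, grid):
    """Trapezoidal-rule approximation of the integral of sampled values over grid."""
    total = 0.0
    for k in range(1, len(grid)):
        total += 0.5 * (values[k] + values[k - 1]) * (grid[k] - grid[k - 1])
    return total


def sine_basis(k, t):
    """Assumed orthonormal basis on [0, 1]: phi_k(t) = sqrt(2) * sin(k * pi * t)."""
    return math.sqrt(2.0) * math.sin(k * math.pi * t)


def kl_coefficients(x_values, mean_values, grid, n_components):
    """Truncated expansion coefficients xi_k = int (X(t) - mu(t)) phi_k(t) dt.

    Hypothetical helper: the eigenfunctions are assumed known (sine basis);
    in practice they would be estimated from the data.
    """
    centred = [x - m for x, m in zip(x_values, mean_values)]
    coefs = []
    for k in range(1, n_components + 1):
        integrand = [c * sine_basis(k, t) for c, t in zip(centred, grid)]
        coefs.append(trapezoid(integrand, grid))
    return coefs


# Toy curve X(t) = 3 * phi_1(t) + 0.5 * phi_2(t) with a zero mean function.
grid = [i / 1000 for i in range(1001)]
x = [3.0 * sine_basis(1, t) + 0.5 * sine_basis(2, t) for t in grid]
mu = [0.0] * len(grid)
coefs = kl_coefficients(x, mu, grid, n_components=3)
print(coefs)
```

For this toy curve the three coefficients come out very close to 3, 0.5 and 0; keeping more components corresponds to a larger truncation dimension.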
A nested error regression model with high-dimensional parameter for small area estimation
Abstract In this paper, we propose a flexible nested error regression small area model with high-dimensional parameter that incorporates heterogeneity in regression coefficients and variance components. We develop a new robust small area-specific estimating equations method that allows appropriate pooling of a large number of areas in estimating small area-specific model parameters. We propose a parametric bootstrap and jackknife method to estimate not only the mean squared errors but also other commonly used uncertainty measures such as standard errors and coefficients of variation. We conduct both model-based and design-based simulation experiments and real-life data analysis to evaluate the proposed methodology.
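For orientation only, consider the classical homogeneous nested error regression model y_ij = x_ij'β + v_i + e_ij, with area effects v_i of variance sigma_v^2 and unit errors e_ij of variance sigma_e^2. When β and both variance components are known, the best linear predictor of the area mean X̄_i'β + v_i is X̄_i'β + gamma_i (ȳ_i − x̄_i'β) with gamma_i = sigma_v^2 / (sigma_v^2 + sigma_e^2 / n_i); plugging in estimates gives the familiar empirical predictor. The model proposed above instead lets the regression coefficients and variance components vary across areas and estimates them with area-specific estimating equations, so the sketch below is only a baseline, and `nested_error_blup` is a hypothetical name.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def nested_error_blup(y, x, x_pop_mean, beta, sigma_v2, sigma_e2):
    """Predict the area mean X_bar_i' beta + v_i for one small area.

    Assumes the classical nested error model y_ij = x_ij' beta + v_i + e_ij
    with beta, sigma_v2 and sigma_e2 known (hypothetical helper, not the
    heterogeneous model of the paper).
    y: sampled responses in the area; x: matching covariate rows;
    x_pop_mean: population mean of the covariates in the area.
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = [sum(row[j] for row in x) / n for j in range(len(beta))]
    # Shrinkage factor: near 1 when the area sample is large or sigma_v2 dominates.
    gamma = sigma_v2 / (sigma_v2 + sigma_e2 / n)
    # Synthetic regression prediction plus a shrunken area-level residual.
    return dot(x_pop_mean, beta) + gamma * (y_bar - dot(x_bar, beta))


pred = nested_error_blup(
    y=[2.0, 5.0, 8.0],
    x=[[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]],
    x_pop_mean=[1.0, 1.5],
    beta=[1.0, 2.0],
    sigma_v2=1.0,
    sigma_e2=3.0,
)
print(pred)  # 5.0
```

In this example the shrinkage factor is 1 / (1 + 3/3) = 0.5, the synthetic part is 4, and the area residual is 5 − 3 = 2, giving 4 + 0.5 × 2 = 5.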
- Award ID(s): 1758808
- PAR ID: 10396408
- Publisher / Repository: Oxford University Press
- Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
- Volume: 85
- Issue: 2
- ISSN: 1369-7412
- Page Range / eLocation ID: pp. 212–239
- Sponsoring Org: National Science Foundation
More like this
- Summary Computerised record linkage methods help us combine multiple data sets from different sources when a single data set with all the necessary information is unavailable, or when collecting data on additional variables is time-consuming and extremely costly. Linkage errors are inevitable in the linked data set because error-free unique identifiers are unavailable. Even a small number of linkage errors can lead to substantial bias and increased variability when estimating the parameters of a statistical model. In this paper, we propose a unified theory for statistical analysis with linked data. Unlike the methods available for secondary analysis of linked data, our proposed method exploits record linkage process data as an alternative to taking a costly sample to evaluate the error rates of the record linkage procedure. A jackknife method is introduced to estimate the bias, covariance matrix and mean squared error of our proposed estimators. Simulation results are presented to evaluate the performance of the proposed estimators that account for linkage errors.
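The abstract does not spell out its jackknife, so the following is only the textbook delete-one jackknife for a scalar estimator, included to make the bias and variance formulas concrete. With leave-one-out estimates theta_(i) and their mean theta_bar, the bias estimate is (n − 1)(theta_bar − theta_hat) and the variance estimate is ((n − 1)/n) Σ (theta_(i) − theta_bar)^2. Combining them as variance plus squared bias to obtain an MSE estimate is one common choice and is an assumption here; the linkage-error estimators in the paper may use a different scheme, and the helper names are hypothetical.

```python
def jackknife(data, estimator):
    """Delete-one jackknife bias, variance and (variance + bias^2) for a scalar estimator."""
    n = len(data)
    theta_hat = estimator(data)
    # Leave-one-out replicates theta_(i).
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    bias = (n - 1) * (loo_mean - theta_hat)
    variance = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    mse = variance + bias ** 2  # one common combination; an assumption here
    return bias, variance, mse


def mean(values):
    """Sample mean, used as the example estimator."""
    return sum(values) / len(values)


bias, variance, mse = jackknife([1.0, 2.0, 3.0, 4.0, 5.0], mean)
print(bias, variance, mse)
```

For the sample mean of 1, 2, 3, 4, 5 the sketch gives a bias of 0 and a variance of 0.5, which matches the usual s^2/n = 2.5/5. A covariance matrix for a vector estimator follows the same pattern with outer products of the centred replicates.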
- Abstract In many categorical response regression applications, the response categories admit a multiresolution structure: subsets of the response categories can naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this paper, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. Our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. The statistical properties of our method show that it can exploit this multiresolution structure in a way that existing estimators cannot. We use our method to model cell-type probabilities as a function of a cell's gene expression profile (i.e., cell-type annotation). Our fitted model provides novel biological insights that may be useful for future automated and manual cell-type annotation methodology.
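To make the coarse-versus-fine distinction concrete, the sketch below uses a plain multinomial logit: fine-category probabilities come from a softmax, and coarse-category probabilities are sums over non-overlapping groups of fine categories. The grouping, the coefficients and the helper names are hypothetical. A predictor whose coefficients are equal within each coarse group shifts probability between coarse groups but leaves the conditional split inside each group unchanged, which is the kind of predictor the method aims to identify. This shows only the model structure, not the penalized high-dimensional fitting procedure.

```python
import math


def softmax(logits):
    """Numerically stable softmax: fine-category probabilities of a multinomial logit."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_probs(fine_probs, groups):
    """Aggregate fine-category probabilities into non-overlapping coarse categories."""
    return {name: sum(fine_probs[j] for j in members) for name, members in groups.items()}


# Hypothetical example: four fine categories nested in two coarse ones.
groups = {"A": [0, 1], "B": [2, 3]}
# Hypothetical coefficients for one predictor, equal within each coarse group,
# so the predictor separates A from B but not the fine categories inside them.
beta = [1.0, 1.0, -1.0, -1.0]
x = 2.0
fine = softmax([b * x for b in beta])
coarse = coarse_probs(fine, groups)
within_a = fine[0] / coarse["A"]  # conditional probability of category 0 given A
print(coarse, within_a)
```

Changing `x` moves probability between A and B, yet `within_a` stays at 0.5 for every value of `x`.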
- Abstract Quantile regression for right- or left-censored outcomes has attracted attention because it can accommodate heterogeneity in regression analysis of survival times. Rank-based inferential methods have desirable properties for quantile regression analysis, but censored data pose challenges to the general concept of ranking. In this article, we propose a notion of censored quantile regression rank scores, which enables us to construct rank-based tests for quantile regression coefficients at a single quantile or over a quantile region. A model-based bootstrap algorithm is proposed to implement the tests. We also illustrate the advantage of focusing on a quantile region instead of a single quantile level when testing the effect of certain covariates in a quantile regression framework.
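For readers new to quantile scores, the sketch below covers only the uncensored case. It defines the check function rho_tau(u) = u (tau − I(u < 0)), finds a sample tau-quantile by minimizing total check loss, and forms an unstandardized sign-type score statistic for adding a covariate z to an intercept-only quantile model. Regression rank scores would give a fractional value to an observation tied with the fitted quantile, whereas here it simply receives tau, and censoring is not handled at all. This is a simplified stand-in, not the censored rank scores proposed in the article, and the helper names are hypothetical.

```python
def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile(y, tau):
    """Minimise the total check loss; a minimiser is always attained at a data point."""
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


def quantile_score_stat(y, z, tau):
    """Unstandardised score statistic for adding covariate z to an
    intercept-only tau-th quantile model (uncensored data only)."""
    q_hat = sample_quantile(y, tau)
    z_bar = sum(z) / len(z)
    # tau - I(y_i < q_hat) acts as a sign-type score for observation i.
    return sum((zi - z_bar) * (tau - (1.0 if yi < q_hat else 0.0)) for yi, zi in zip(y, z))


y = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_quantile(y, 0.5), quantile_score_stat(y, z, 0.5))  # 3.0 3.0
```

A large statistic relative to its null variability suggests that z affects the tau-th quantile; an actual test would standardize it or, as in the article, calibrate it by bootstrap.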
- Inbreeding depression can reduce the viability of wild populations. Detecting inbreeding depression in the wild is difficult, and developing accurate estimates of inbreeding can be time- and labor-intensive. In this study, we used a two-step modeling procedure to incorporate the uncertainty inherent in estimating individual inbreeding coefficients from multilocus genotypes into estimates of inbreeding depression in a population of Weddell seals (Leptonychotes weddellii). The two-step procedure provides a way to estimate the magnitude of a known source of error, which classic regression models assume is absent, and to incorporate this error into inferences about inbreeding depression. The method is essentially an errors-in-variables regression with non-normal errors in both the dependent and independent variables. These models therefore allow a better evaluation of the uncertainty surrounding the biological importance of inbreeding depression in non-pedigreed wild populations. For this study, we genotyped 154 adult female seals from the population in Erebus Bay, Antarctica, at 29 microsatellite loci, 12 of which are novel. We used a statistical evidence approach to inference rather than hypothesis testing because discovering either low or high levels of inbreeding is of scientific interest. We found evidence for an absence of inbreeding depression in lifetime reproductive success, adult survival, age at maturity, and the reproductive interval of female seals in this population.
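To show why ignoring error in an estimated predictor matters, the sketch below simulates the classical additive-error case. The covariate is observed as w = x + u with normal error u of known variance, and ordinary least squares on w attenuates the slope by the reliability ratio var(x) / var(w). Dividing by an estimated reliability (a method-of-moments correction) approximately recovers the true slope. This is not the two-step, non-normal errors-in-variables model of the study; the simulated data, error variance and helper names are all hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def corrected_slope(w, y, error_var):
    """Moment correction assuming additive measurement error with known variance."""
    n = len(w)
    w_bar = sum(w) / n
    var_w = sum((wi - w_bar) ** 2 for wi in w) / n
    reliability = (var_w - error_var) / var_w  # estimated var(x) / var(w)
    return ols_slope(w, y) / reliability


rng = random.Random(2023)
n, beta, error_var = 20000, 2.0, 1.0
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = [xi + rng.gauss(0.0, error_var ** 0.5) for xi in x_true]  # error-prone covariate
y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x_true]
naive = ols_slope(w, y)
corrected = corrected_slope(w, y, error_var)
print(round(naive, 2), round(corrected, 2))
```

With var(x) = var(u) = 1 the reliability is about 0.5, so the naive slope lands near 1 rather than the true value of 2, while the corrected slope lands near 2.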