We introduce a flexible marginal modelling approach for statistical inference for clustered and longitudinal data under minimal assumptions. This estimated estimating equations approach is semiparametric and the proposed models are fitted by quasi-likelihood regression, where the unknown marginal means are a function of the fixed effects linear predictor with unknown smooth link, and variance–covariance is an unknown smooth function of the marginal means. We propose to estimate the nonparametric link and variance–covariance functions via smoothing methods, whereas the regression parameters are obtained via the estimated estimating equations. These are score equations that contain nonparametric function estimates. The proposed estimated estimating equations approach is motivated by its flexibility and easy implementation. Moreover, if data follow a generalized linear mixed model, with either a specified or an unspecified distribution of random effects and link function, the model proposed emerges as the corresponding marginal (population-average) version and can be used to obtain inference for the fixed effects in the underlying generalized linear mixed model, without the need to specify any other components of this generalized linear mixed model. Among marginal models, the estimated estimating equations approach provides a flexible alternative to modelling with generalized estimating equations. Applications of estimated estimating equations include diagnostics and link selection. The asymptotic distribution of the proposed estimators for the model parameters is derived, enabling statistical inference. Practical illustrations include Poisson modelling of repeated epileptic seizure counts and simulations for clustered binomial responses.
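As a rough sketch of the structure described above, the marginal model and its estimated estimating equation can be written in standard quasi-score form. The notation here is assumed for illustration and is not taken from the paper: $Y_{ij}$ is the $j$th response in cluster $i$, $x_{ij}$ its covariate vector, $g$ the unknown smooth link, $V$ the unknown variance–covariance function, and hats denote nonparametric (smoothed) estimates.

```latex
% Marginal mean and variance-covariance (both g and V unknown, smooth)
\mathrm{E}(Y_{ij} \mid x_{ij}) = \mu_{ij} = g(x_{ij}^{\top}\beta),
\qquad
\mathrm{Cov}(Y_i \mid X_i) = V_i = V(\mu_i).

% Estimated estimating equation: a quasi-score equation in which
% g and V are replaced by their nonparametric estimates
\sum_{i=1}^{n} \hat{D}_i^{\top} \hat{V}_i^{-1}
  \bigl\{ Y_i - \hat{\mu}_i(\beta) \bigr\} = 0,
\qquad
\hat{\mu}_{ij}(\beta) = \hat{g}(x_{ij}^{\top}\beta),
\quad
\hat{D}_i = \partial \hat{\mu}_i(\beta) / \partial \beta^{\top}.
```

With $g$ and $V$ known, this reduces to the familiar generalized estimating equation form; the difference described in the abstract is that both functions are estimated by smoothing.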
We propose a class of semiparametric functional regression models to describe the influence of vector-valued covariates on a sample of response curves. Each observed curve is viewed as the realization of a random process, composed of an overall mean function and random components. The finite dimensional covariates influence the random components of the eigenfunction expansion through single-index models that include unknown smooth link and variance functions. The parametric components of the single-index models are estimated via quasi-score estimating equations with link and variance functions being estimated nonparametrically. We obtain several basic asymptotic results. The functional regression models proposed are illustrated with the analysis of a data set consisting of egg laying curves for 1000 female Mediterranean fruit-flies (medflies).
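One way to write down the model structure this abstract describes is sketched below. All symbols are assumptions introduced for illustration: $Y_i(t)$ is the $i$th response curve, $Z_i$ the covariate vector, $\phi_k$ the eigenfunctions, $\xi_{ik}$ the random components, and $g_k$, $\sigma_k^2$ the unknown link and variance functions. Taking the variance to be a function of the conditional mean is a quasi-likelihood-style assumption, not something the abstract states.

```latex
% Eigenfunction expansion of each response curve
Y_i(t) = \mu(t) + \sum_{k \ge 1} \xi_{ik}\, \phi_k(t),

% Single-index models for the random components given the covariates
\mathrm{E}(\xi_{ik} \mid Z_i) = g_k(\beta_k^{\top} Z_i),
\qquad
\mathrm{Var}(\xi_{ik} \mid Z_i) = \sigma_k^2\bigl\{ g_k(\beta_k^{\top} Z_i) \bigr\}.
```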
- NSF-PAR ID:
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
- Page Range / eLocation ID: p. 405-423
- Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Riverscape genetics, which applies concepts in landscape genetics to riverine ecosystems, lacks appropriate quantitative methods that address the spatial autocorrelation structure of linear stream networks and account for bidirectional geneflow. To address these challenges, we present a general framework for the design and analysis of riverscape genetic studies. Our framework starts with the estimation of pairwise genetic distance at sample sites and the development of a spatially structured ecological network (SSEN) on which riverscape covariates are measured. We then introduce the novel bidirectional geneflow in riverscapes (BGR) model that uses principles of isolation‐by‐resistance to quantify the effects of environmental covariates on genetic connectivity, with spatial covariance defined using simultaneous autoregressive models on the SSEN and the generalized Wishart distribution to model pairwise distance matrices arising through a random walk model of geneflow. We highlight the utility of this framework in an analysis of riverscape genetics for brook trout (Salvelinus fontinalis) in north central Pennsylvania, USA. Using the fixation index (FST) as the measure of genetic distance, we estimated the effects of 12 riverscape covariates on geneflow by evaluating the relative support of eight competing BGR models. We then compared the performance of the top‐ranked BGR model to results obtained from comparable analyses using multiple regression on distance matrices (MRM) and the program STRUCTURE. We found that the BGR model had more power to detect covariate effects, particularly for variables that were only partial barriers to geneflow and/or uncommon in the riverscape, making it more informative for assessing patterns of population connectivity and identifying threats to species conservation. This case study highlights the utility of our modeling framework over other quantitative methods in riverscape genetics, particularly the ability to rigorously test hypotheses about factors that influence geneflow and probabilistically estimate the effect of riverscape covariates, including stream flow direction. This framework is flexible across taxa and riverine networks, is easily executable, and provides intuitive results that can be used to investigate the likely outcomes of current and future management scenarios.
The nonlinear responses of species to environmental variability can play an important role in the maintenance of ecological diversity. Nonetheless, many models use parametric nonlinear terms which pre‐determine the ecological conclusions. Motivated by this concern, we study the estimate of the second derivative (curvature) of the link function in a functional single index model. Since the coefficient function and the link function are both unknown, the estimate is expressed as a nested optimization. We first estimate the coefficient function by minimizing squared error where the link function is estimated with a Nadaraya‐Watson estimator for each candidate coefficient function. The first and second derivatives of the link function are then estimated via local‐quadratic regression using the estimated coefficient function. In this paper, we derive a convergence rate for the curvature of the nonlinear response. In addition, we prove that the argument of the linear predictor can be estimated root‐n consistently. However, practical implementation of the method requires solving a nonlinear optimization problem, and our results show that the estimates of the link function and the coefficient function are quite sensitive to the choices of starting values.
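The nested optimization described above can be illustrated with a deliberately simplified sketch. It is not the authors' functional method: the coefficient function is replaced by a two-dimensional unit vector parameterized by a hypothetical angle `theta`, the bandwidth `h` is fixed rather than tuned, and the outer minimization is a plain grid search. All function names and the simulated data are assumptions made for illustration. The inner step is a leave-one-out Nadaraya-Watson fit of the link for each candidate direction; the outer step picks the direction with the smallest squared error.

```python
import math
import random


def nw_leave_one_out(u, y, h):
    """Leave-one-out Nadaraya-Watson fit of y on index u (Gaussian kernel)."""
    n = len(u)
    y_bar = sum(y) / n
    fitted = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # exclude the point being predicted
            w = math.exp(-0.5 * ((u[i] - u[j]) / h) ** 2)
            num += w * y[j]
            den += w
        # Fall back to the overall mean if all kernel weights underflow.
        fitted.append(num / den if den > 0.0 else y_bar)
    return fitted


def profile_sse(theta, X, y, h):
    """Inner step: squared error after smoothing y on the candidate index."""
    b0, b1 = math.cos(theta), math.sin(theta)
    u = [b0 * x0 + b1 * x1 for x0, x1 in X]
    fit = nw_leave_one_out(u, y, h)
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))


def fit_single_index(X, y, h=0.3, n_grid=60):
    """Outer step: grid search over directions in [0, pi).

    The sign of the coefficient vector is not identified when the link
    is unknown, so half the circle is enough.
    """
    thetas = [math.pi * k / n_grid for k in range(n_grid)]
    return min(thetas, key=lambda t: profile_sse(t, X, y, h))


def simulate(n=120, theta0=math.pi / 3, noise=0.1, seed=0):
    """Toy data: y = sin(2 * index) + noise, with a known true direction."""
    rng = random.Random(seed)
    b0, b1 = math.cos(theta0), math.sin(theta0)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    y = [math.sin(2 * (b0 * x0 + b1 * x1)) + noise * rng.gauss(0, 1)
         for x0, x1 in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    theta_hat = fit_single_index(X, y)
    print("estimated angle:", theta_hat, "true angle:", math.pi / 3)
```

The grid search avoids choosing a starting value, which is only practical because this toy problem has a single free parameter; with a functional coefficient the outer problem is high-dimensional and needs an iterative optimizer, which is where the sensitivity to starting values noted in the abstract would arise.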
Flood‐frequency curves, critical for water infrastructure design, are typically developed based on a stationary climate assumption. However, climate changes are expected to violate this assumption. Here, we propose a new, climate‐informed methodology for estimating flood‐frequency curves under non‐stationary future climate conditions. The methodology develops an asynchronous, semiparametric local‐likelihood regression (ASLLR) model that relates moments of annual maximum flood to climate variables using the generalized linear model. We estimate the first two marginal moments (MM) – the mean and variance – of the underlying log‐Pearson Type‐3 distribution from the ASLLR with the monthly rainfall and temperature as predictors. The proposed methodology, ASLLR‐MM, is applied to 40 U.S. Geological Survey streamgages covering 18 water resources regions across the conterminous United States. A correction based on the aridity index was applied on the estimated variance, after which the ASLLR‐MM approach was evaluated with both historical (1951–2005) and projected (2006–2035, under RCP4.5 and RCP8.5) monthly precipitation and temperature from eight Global Circulation Models (GCMs) consisting of 39 ensemble members. The estimated flood‐frequency quantiles resulting from the ASLLR‐MM and GCM members compare well with the flood‐frequency quantiles estimated using the historical period of observed climate and flood information for humid basins, whereas the uncertainty in model estimates is higher in arid basins. Considering additional atmospheric and land‐surface conditions and a multi‐level model structure that includes other basins in a region could further improve the model performance in arid basins.
We study the heteroscedastic partially linear single-index model with an unspecified error variance function, which allows for high dimensional covariates in both the linear and the single-index components of the mean function. We propose a class of consistent estimators of the parameters by using a proper weighting strategy. An interesting finding is that the linearity condition which is widely assumed in the dimension reduction literature is not necessary for methodological or theoretical development: it contributes only to the simplification of non-optimal consistent estimation. We also find that the performance of the usual weighted least square type of estimators deteriorates when the non-parametric component is badly estimated. However, estimators in our family automatically provide protection against such deterioration, in that the consistency can be achieved even if the baseline non-parametric function is completely misspecified. We further show that the most efficient estimator is a member of this family and can be easily obtained by using non-parametric estimation. Properties of the estimators proposed are presented through theoretical illustration and numerical simulations. An example on gender discrimination is used to demonstrate and to compare the practical performance of the estimators.