Over the past decade, there has been growing enthusiasm for using electronic medical records (EMRs) for biomedical research. Quantile regression estimates distributional associations, providing unique insights into the intricacies and heterogeneity of EMR data. However, the widespread nonignorable missing observations in EMRs often obscure the true associations and challenge their potential for robust biomedical discoveries. We propose a novel method to estimate covariate effects in the presence of nonignorable missing responses under quantile regression. This method imposes no parametric specification on the response distribution; instead, it subtly uses the implicit distributions induced by the corresponding quantile regression models. We show that the proposed estimator is consistent and asymptotically normal. We also provide an efficient algorithm to obtain the proposed estimate and a randomly weighted bootstrap approach for statistical inference. Numerical studies, including an empirical analysis of real-world EMR data, are used to assess the proposed method's finite-sample performance compared with existing methods in the literature.
More Like this
Abstract Recently, due to accelerations in urban and industrial development, the health impact of air pollution has become a topic of key concern. Of the various forms of air pollution, fine atmospheric particulate matter (PM2.5; particles less than 2.5 micrometers in diameter) appears to pose the greatest risk to human health. While even moderate levels of PM2.5 can be detrimental to health, spikes in PM2.5 to atypically high levels are even more dangerous. These spikes are believed to be associated with regionally specific meteorological factors. To quantify these associations, we develop a Bayesian spatiotemporal quantile regression model to estimate the spatially varying effects of meteorological variables purported to be related to PM2.5 levels. By adopting a quantile regression model, we are able to examine the entire distribution of PM2.5 levels; for example, we are able to identify which meteorological drivers are related to abnormally high PM2.5 levels. Our approach uses penalized splines to model the spatially varying meteorological effects and to account for spatiotemporal dependence. The performance of the methodology is evaluated through extensive numerical studies. We apply our modeling techniques to 5 years of daily PM2.5 data collected throughout the eastern United States to reveal the effects of various meteorological drivers.