NSF PAR Search | NSF Public Access Repository

Sequential online subsampling for thinning experimental designs

https://doi.org/10.1016/j.jspi.2020.08.001

Pronzato, Luc; Wang, HaiYing (May 2021, Journal of Statistical Planning and Inference)
Full Text Available
Fast Optimal Subsampling Probability Approximation for Generalized Linear Models

https://doi.org/10.1016/j.ecosta.2021.02.007

Lee, JooChul; Schifano, Elizabeth D.; Wang, HaiYing (March 2021, Econometrics and Statistics)
Full Text Available
Distributed subdata selection for big data via sampling-based approach

https://doi.org/10.1016/j.csda.2020.107072

Zhang, Haixiang; Wang, HaiYing (January 2021, Computational Statistics & Data Analysis)
Full Text Available
Optimal subsampling algorithms for big data regressions

https://doi.org/10.5705/ss.202018.0439

Ai, Mingyao; Yu, Jun; Zhang, Huiming; Wang, HaiYing (January 2021, Statistica Sinica)

Full Text Available
Sampling‐based estimation for massive survival data with additive hazards model

https://doi.org/10.1002/sim.8783

Zuo, Lulu; Zhang, Haixiang; Wang, HaiYing; Liu, Lei (November 2020, Statistics in Medicine)

For massive survival data, we propose a subsampling algorithm to efficiently approximate the estimates of regression parameters in the additive hazards model. We establish consistency and asymptotic normality of the subsample‐based estimator given the full data. The optimal subsampling probabilities are obtained via minimizing asymptotic variance of the resulting estimator. The subsample‐based procedure can largely reduce the computational cost compared with the full data method. In numerical simulations, our method has low bias and satisfactory coverage probabilities. We provide an illustrative example on the survival analysis of patients with lymphoma cancer from the Surveillance, Epidemiology, and End Results Program.
Online updating method to correct for measurement error in big data streams

https://doi.org/10.1016/j.csda.2020.106976

Lee, JooChul; Wang, HaiYing; Schifano, Elizabeth D. (September 2020, Computational Statistics & Data Analysis)

Full Text Available
Optimal subsampling for quantile regression in big data

https://doi.org/10.1093/biomet/asaa043

Wang, Haiying; Ma, Yanyuan (July 2020, Biometrika)
Summary: We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the asymptotic variance-covariance matrix for a linearly transformed parameter estimator and the other minimizes that of the original parameter estimator. The former does not depend on the densities of the responses given covariates and is easy to implement. Algorithms based on optimal subsampling probabilities are proposed, and the asymptotic distributions and asymptotic optimality of the resulting estimators are established. Furthermore, we propose an iterative subsampling procedure based on the optimal subsampling probabilities in the linearly transformed parameter estimation which has great scalability to utilize available computational resources. In addition, this procedure yields standard errors for parameter estimators without estimating the densities of the responses given the covariates. We provide numerical examples based on both simulated and real data to illustrate the proposed method.
Full Text Available
Information-based optimal subdata selection for big data logistic regression

https://doi.org/10.1016/j.jspi.2020.03.004

Cheng, Qianshun; Wang, Haiying; Yang, Min (April 2020, Journal of Statistical Planning and Inference)

Full Text Available
An online updating approach for testing the proportional hazards assumption with streams of survival data

https://doi.org/10.1111/biom.13137

Xue, Yishu; Wang, HaiYing; Yan, Jun; Schifano, Elizabeth D. (November 2019, Biometrics)

Full Text Available
Divide-and-Conquer Information-Based Optimal Subdata Selection Algorithm

https://doi.org/10.1007/s42519-019-0048-5

Wang, HaiYing (September 2019, Journal of Statistical Theory and Practice)

Full Text Available
