NSF PAR Search | NSF Public Access Repository

Model-assisted calibration estimation using generalized entropy calibration in survey sampling

Kim, Jae Kwang; Kwon, Yonghyun; Qiu, Yumou; Park, Junyong (June 2025, Survey Methodology)

Free, publicly-accessible full text available June 30, 2026
Semiparametric adaptive estimation under informative sampling

https://doi.org/10.1214/25-AOS2509

Morikawa, Kosuke; Terada, Yoshikazu; Kim, Jae Kwang (June 2025, The Annals of Statistics)

Free, publicly-accessible full text available June 1, 2026
Information projection approach to smoothed propensity score weighting for handling selection bias under missing at random

https://doi.org/10.1007/s10463-024-00913-w

Wang, Hengfang; Kim, Jae Kwang (February 2025, Annals of the Institute of Statistical Mathematics)

Free, publicly-accessible full text available February 1, 2026
Robust propensity score weighting estimation under missing at random

https://doi.org/10.1214/24-EJS2263

Wang, Hengfang; Kim, Jae Kwang; Han, Jeongseop; Lee, Youngjo (January 2024, Electronic Journal of Statistics)

Full Text Available
Ultra Data-Oriented Parallel Fractional Hot-Deck Imputation With Efficient Linearized Variance Estimation

https://doi.org/10.1109/TKDE.2023.3249567

Yang, Yicheng; Kwon, Yonghyun; Kim, Jae Kwang; Cho, In Ho (September 2023, IEEE Transactions on Knowledge and Data Engineering)

Full Text Available
An Empirical Likelihood Approach to Reduce Selection Bias in Voluntary Samples

https://doi.org/10.1177/00080683231186488

Kim, Jae Kwang; Morikawa, Kosuke (May 2023, Calcutta Statistical Association Bulletin)

How to construct the pseudo-weights in voluntary samples is an important practical problem in survey sampling. The problem is quite challenging when the sampling mechanism for the voluntary sample is allowed to be non-ignorable. Under the assumption that the sample participation model is correctly specified, we can compute a consistent estimator of the model parameter and construct the propensity score estimator of the population mean. We propose using the empirical likelihood method to construct the final weights for voluntary samples by incorporating the bias calibration constraints and the benchmarking constraints. Linearization variance estimation of the proposed method is developed. A toy example is also presented to illustrate the idea and the computational details. A limited simulation study is also performed to evaluate the performance of the proposed methods.
Full Text Available
Statistical inference using regularized M-estimation in the reproducing kernel Hilbert space for handling missing data

https://doi.org/10.1007/s10463-023-00872-8

Wang, Hengfang; Kim, Jae Kwang (April 2023, Annals of the Institute of Statistical Mathematics)

Full Text Available
Review: Evolution of Fractional Hot Deck Imputation for Curing Incomplete Data-From Small to Ultra Large Sizes

https://doi.org/10.5121/csit.2023.131315

Cho, In Ho; Kim, Jae-Kwang; Yang, Yicheng; Kwon, Yonghyun; Chapagain, Ashish (July 2023, International Conference on Computer Science and Information Technology)

Machine learning (ML) advancements hinge upon data - the vital ingredient for training. Statistically curing the missing data is called imputation, and there are many imputation theories and tools. But they often require difficult statistical and/or discipline-specific assumptions, lacking general tools capable of curing large data. Fractional hot deck imputation (FHDI) can cure data by filling nonresponses with observed values (thus, hot-deck) without resorting to assumptions. The review paper summarizes how FHDI evolves to ultra data-oriented parallel version (UP-FHDI). Here, ultra data have concurrently large instances (big-n) and high dimensionality (big-p). The evolution is made possible with specialized parallelism and fast variance estimation technique. Validations with scientific and engineering data confirm that UP-FHDI can cure ultra data (p > 10,000 & n > 1M), and the cured data sets can improve the prediction accuracy of subsequent ML. The evolved FHDI will help promote reliable ML with cured big data.
Full Text Available
Soft calibration for selection bias problems under mixed-effects models

https://doi.org/10.1093/biomet/asad016

Gao, Chenyin; Yang, Shu; Kim, Jae Kwang (March 2023, Biometrika)

Calibration weighting has been widely used to correct selection biases in nonprobability sampling, missing data and causal inference. The main idea is to calibrate the biased sample to the benchmark by adjusting the subject weights. However, hard calibration can produce enormous weights when an exact calibration is enforced on a large set of extraneous covariates. This article proposes a soft calibration scheme, where the outcome and the selection indicator follow mixed-effect models. The scheme imposes an exact calibration on the fixed effects and an approximate calibration on the random effects. On the one hand, our soft calibration has an intrinsic connection with best linear unbiased prediction, which results in a more efficient estimation compared to hard calibration. On the other hand, soft calibration weighting estimation can be envisioned as penalized propensity score weight estimation, with the penalty term motivated by the mixed-effect structure. The asymptotic distribution and a valid variance estimator are derived for soft calibration. We demonstrate the superiority of the proposed estimator over other competitors in simulation studies and using a real-world data application on the effect of BMI screening on childhood obesity.
Full Text Available
A note on weight smoothing in survey sampling

Kim, Jae Kwang; Wang, HaiYing (January 2023, Survey Methodology)

Full Text Available
