

Title: Sparse Data Reconstruction, Missing Value and Multiple Imputation through Matrix Factorization

Social science approaches to missing values predict avoided, unrequested, or lost information from dense data sets, typically surveys. The authors propose a matrix factorization approach to missing data imputation that (1) identifies underlying factors to model similarities across respondents and responses and (2) regularizes across factors to reduce their overinfluence for optimal data reconstruction. This approach may enable social scientists to draw new conclusions from sparse data sets with a large number of features, for example, historical or archival sources, online surveys with high attrition rates, or data sets created from Web scraping, which confound traditional imputation techniques. The authors introduce matrix factorization techniques and detail their probabilistic interpretation, and they demonstrate these techniques’ consistency with Rubin’s multiple imputation framework. The authors show, via simulations using artificial data and data from real-world subsets of the General Social Survey and National Longitudinal Study of Youth, cases for which matrix factorization techniques may be preferred. These findings recommend the use of matrix factorization for data reconstruction in several settings, particularly when data are Boolean and categorical and when large proportions of the data are missing.
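As a rough sketch of the general idea (not the authors' implementation): fit a low-rank factorization to the observed entries only, with an L2 penalty regularizing the factors, and read imputations off the reconstruction. All function and parameter names below are illustrative.

```python
import numpy as np

def mf_impute(X, rank=5, lam=0.1, lr=0.01, n_epochs=200, seed=0):
    """Fill NaN entries of X by fitting a regularized low-rank model X ~ U @ V.T
    to the observed entries only, then reading imputations off the reconstruction."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    observed = ~np.isnan(X)
    U = 0.1 * rng.standard_normal((n, rank))   # latent factors for respondents (rows)
    V = 0.1 * rng.standard_normal((d, rank))   # latent factors for responses (columns)
    rows, cols = np.nonzero(observed)
    for _ in range(n_epochs):
        for i, j in zip(rows, cols):
            err = X[i, j] - U[i] @ V[j]
            # stochastic gradient step; the lam terms shrink the factors so that
            # no single factor dominates the reconstruction
            U[i] += lr * (err * V[j] - lam * U[i])
            V[j] += lr * (err * U[i] - lam * V[j])
    return np.where(observed, X, U @ V.T)
```

For Boolean or categorical items one would typically round or threshold the reconstruction, or swap the squared-error loss for a logistic loss; a full multiple-imputation workflow would also draw several reconstructions rather than a single point estimate.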

 
NSF-PAR ID:
10376578
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
SAGE Publications
Date Published:
Journal Name:
Sociological Methodology
Volume:
53
Issue:
1
ISSN:
0081-1750
Page Range / eLocation ID:
p. 72-114
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. SUMMARY

    Repeatedly recording seismic data over a period of months or years is one way to identify trapped oil and gas and to monitor CO2 injection in underground storage reservoirs and saline aquifers. This process of recording data over time and then differencing the images assumes the recording of the data over a particular subsurface region is repeatable. In other words, the hope is that one can recover changes in the Earth when the survey parameters are held fixed between data collection times. Unfortunately, perfect experimental repeatability almost never occurs. Acquisition inconsistencies such as changes in weather (currents, wind) for marine seismic data are inevitable, resulting in source and receiver location differences between surveys at the very least. Thus, data processing aimed at improving repeatability between baseline and monitor surveys is extremely useful. One such processing tool is regularization (or binning) that aligns multiple surveys with different source or receiver configurations onto a common grid. Data binned onto a regular grid can be stored in a high-dimensional data structure called a tensor with, for example, x and y receiver coordinates and time as indices of the tensor. Such a higher-order data structure describing a subsection of the Earth often exhibits redundancies which one can exploit to fill in gaps caused by sampling the surveys onto the common grid. In fact, since data gaps and noise increase the rank of the tensor, seeking to recover the original data by reducing the rank (low-rank tensor-based completion) successfully fills in gaps caused by binning. The tensor nuclear norm (TNN) is defined by the tensor singular value decomposition (tSVD) which generalizes the matrix SVD. In this work we complete missing time-lapse data caused by binning using the alternating direction method of multipliers (or ADMM) to minimize the TNN. For a synthetic experiment with three parabolic events in which the time-lapse difference involves an amplitude increase in one of these events between baseline and monitor data sets, the binning and reconstruction algorithm (TNN-ADMM) correctly recovers this time-lapse change. We also apply this workflow of binning and TNN-ADMM reconstruction to a real marine survey from offshore Western Australia in which the binning onto a regular grid results in significant data gaps. The data after reconstruction varies continuously without the large gaps caused by the binning process.
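A simplified sketch of this kind of completion (not the authors' code): the proximal operator of the TNN soft-thresholds the singular values of each frontal slice in the Fourier domain, per the t-SVD, and a scaled ADMM loop alternates that step with re-imposing the binned samples that were actually recorded. Shapes, names, and parameters are assumptions for illustration.

```python
import numpy as np

def tsvt(Y, tau):
    """Proximal operator of the tensor nuclear norm: soft-threshold the singular
    values of every frontal slice of Y in the Fourier domain (t-SVD)."""
    Yf = np.fft.fft(Y, axis=2)
    Xf = np.empty_like(Yf)
    for k in range(Y.shape[2]):
        U, s, Vh = np.linalg.svd(Yf[:, :, k], full_matrices=False)
        Xf[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.real(np.fft.ifft(Xf, axis=2))

def tnn_admm(M, mask, rho=1.0, n_iter=200):
    """Recover a binned data tensor: minimize the TNN of X subject to X matching
    the observed entries of M (mask == True where a bin received data)."""
    Z = np.where(mask, M, 0.0)
    Y = np.zeros_like(Z)
    for _ in range(n_iter):
        X = tsvt(Z - Y / rho, 1.0 / rho)   # low-rank-promoting step
        Z = X + Y / rho
        Z[mask] = M[mask]                  # keep the recorded samples fixed
        Y += rho * (X - Z)                 # dual update
    X[mask] = M[mask]
    return X
```

In practice one would tune rho, stop on a residual tolerance, and index the tensor as described above, for example (receiver x, receiver y, time).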

     
  2. challenge as it may introduce uncertainties into the data analysis. Recent advances in matrix completion have shown competitive imputation performance when applied to many real-world domains. However, there are two major limitations when applying matrix completion methods to spatial data. First, they make a strong assumption that the entries are missing-at-random, which may not hold for spatial data. Second, they may not effectively utilize the underlying spatial structure of the data. To address these limitations, this paper presents a novel clustered adversarial matrix factorization method to explore and exploit the underlying cluster structure of the spatial data in order to facilitate effective imputation. The proposed method utilizes an adversarial network to learn the joint probability distribution of the variables and improve the imputation performance for the missing entries that are not randomly sampled. 
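As a loose illustration of the clustered-factorization idea only (the adversarial network that models the joint distribution is omitted, and every name below is hypothetical): cluster rows of a mean-filled matrix, then run a soft-impute-style low-rank refit within each cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_lowrank_impute(X, n_clusters=3, rank=5, n_iter=50):
    """Cluster rows on a mean-filled copy, then iteratively refit a low-rank
    (truncated SVD) model inside each cluster, updating only the missing cells."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X, axis=0), X)   # crude start for clustering
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(filled)
    for c in range(n_clusters):
        idx = labels == c
        block, miss = filled[idx], missing[idx]
        r = min(rank, *block.shape)
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(block, full_matrices=False)
            lowrank = (U[:, :r] * s[:r]) @ Vt[:r]
            block[miss] = lowrank[miss]                     # refine missing cells only
        filled[idx] = block
    return filled
```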
  3. Human activity recognition (HAR) is an important component in a number of health applications, including rehabilitation, Parkinson’s disease, daily activity monitoring, and fitness monitoring. State-of-the-art HAR approaches use multiple sensors on the body to accurately identify activities at runtime. These approaches typically assume that data from all sensors are available for runtime activity recognition. However, data from one or more sensors may be unavailable due to malfunction, energy constraints, or communication challenges between the sensors. Missing data can lead to significant degradation in accuracy, thus affecting the quality of service to users. A common approach for handling missing data is to train classifiers or sensor data recovery algorithms for each combination of missing sensors. However, this results in significant memory and energy overhead on resource-constrained wearable devices. In strong contrast to prior approaches, this paper presents a clustering-based approach (CIM) to impute missing data at runtime. We first define a set of possible clusters and representative data patterns for each sensor in HAR. Then, we create and store a mapping between clusters across sensors. At runtime, when data from a sensor are missing, we utilize the stored mapping table to obtain the most likely cluster for the missing sensor. The representative window for the identified cluster is then used as imputation to perform activity classification. We also provide a method to obtain imputation-aware activity prediction sets to handle uncertainty in data when using imputation. Experiments on three HAR datasets show that CIM achieves accuracy within 10% of a baseline without missing data for one missing sensor when providing single activity labels. The accuracy gap drops to less than 1% with imputation-aware classification. Measurements on a low-power processor show that CIM achieves close to 100% energy savings compared to state-of-the-art generative approaches.
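A schematic sketch of the cluster-and-mapping idea described above (not the CIM implementation; class and variable names are invented): learn per-sensor clusters offline, store cluster co-occurrence counts across sensors, and at runtime vote for the missing sensor's most likely cluster, returning its centroid as the imputed window.

```python
import numpy as np
from sklearn.cluster import KMeans

class ClusterImputer:
    """Illustrative sketch: per-sensor clusters plus a cross-sensor mapping table."""

    def __init__(self, n_clusters=8):
        self.n_clusters = n_clusters

    def fit(self, windows):
        # windows: dict sensor -> (n, d) array of time-aligned windows,
        # so row i refers to the same instant for every sensor
        self.kmeans = {s: KMeans(self.n_clusters, n_init=10).fit(w)
                       for s, w in windows.items()}
        labels = {s: km.labels_ for s, km in self.kmeans.items()}
        self.co = {}
        for a in windows:
            for b in windows:
                if a == b:
                    continue
                C = np.zeros((self.n_clusters, self.n_clusters))
                for i, j in zip(labels[a], labels[b]):
                    C[i, j] += 1                      # co-occurrence counts
                self.co[(a, b)] = C
        return self

    def impute(self, available, missing_sensor):
        # available: dict sensor -> single window (d,) for the current instant
        votes = np.zeros(self.n_clusters)
        for s, w in available.items():
            c = self.kmeans[s].predict(w.reshape(1, -1))[0]
            votes += self.co[(s, missing_sensor)][c]  # vote with the mapping table
        best = int(np.argmax(votes))
        return self.kmeans[missing_sensor].cluster_centers_[best]
```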

     
  4. Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled using traditional hot deck methods because of their simple implementation; however, the univariate hot deck results in large random wealth fluctuations. MI is effective but faces operational challenges. We use a sequential regression/chained-equation approach, using the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with those from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI produces improvements over the existing hot deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between household net worth and sociodemographic factors, and facilitates general-purpose completed-data analyses. MI incorporates highly predictive covariates into imputation models and increases efficiency. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large.
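Outside IVEware, the chained-equations step and Rubin's pooling rules can be sketched generically, for example with scikit-learn's IterativeImputer; the snippet below is only an illustration and omits the survey-specific handling (skip patterns, bounds, many-level categorical variables) described above.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def multiply_impute(X, m=5):
    """Create m completed data sets with chained-equations-style imputation,
    drawing from the posterior so the imputations vary across data sets."""
    return [IterativeImputer(sample_posterior=True, random_state=i).fit_transform(X)
            for i in range(m)]

def rubin_combine(estimates, variances):
    """Pool m point estimates and their variances with Rubin's rules."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    m = len(estimates)
    q_bar = estimates.mean()                 # pooled estimate
    u_bar = variances.mean()                 # within-imputation variance
    b = estimates.var(ddof=1)                # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b    # pooled estimate, total variance
```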
  5. Regional morphological analysis represents a crucial step in most neuroimaging studies. Results from brain segmentation techniques are intrinsically prone to certain degrees of variability, mainly as a result of suboptimal segmentation. To reduce this inherent variability, the errors are often identified through visual inspection and then corrected (semi)manually. Identification and correction of incorrect segmentation could be very expensive for large-scale studies. While identification of the incorrect results can be done relatively fast even with manual inspection, the correction step is extremely time-consuming, as it requires training staff to perform laborious manual corrections. Here we frame the correction phase of this problem as a missing data problem. Instead of manually adjusting the segmentation outputs, our computational approach aims to derive accurate morphological measures by machine learning imputation. Data imputation techniques may be used to replace missing or incorrect region average values with carefully chosen imputed values, all of which are computed based on other available multivariate information. We examined our approach of correcting segmentation outputs on a cohort of 970 subjects who had undergone extensive, time-consuming manual post-segmentation correction. A random forest imputation technique recovered the gold-standard results with high accuracy (r = 0.93, p < 0.0001) when 30% of the segmentations were considered incorrect in a non-random fashion. The random forest technique proved to be most effective for big data studies (N > 250).
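A small sketch of missForest-style random forest imputation of the kind evaluated above, assuming region-average values flagged as incorrect have already been set to NaN; it is not the study's pipeline, and the parameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def rf_impute(region_means):
    """Predict each region's flagged (NaN) values from all other regions with a
    random forest, iterating until the filled-in values stabilize."""
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0),
        max_iter=10,
    )
    return imputer.fit_transform(np.asarray(region_means, dtype=float))
```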