A general framework for regression with mismatched data based on mixture modelling

Slawski, Martin; West, Brady T; Bukke, Priyanjali; Wang, Zhenbang; Diao, Guoqing; Ben-David, Emanuel

doi:10.1093/jrsssa/qnae083

Abstract: The advent of the information age has revolutionized data collection and has led to a rapid expansion of available data sources. Methods of data integration are indispensable when a question of interest cannot be addressed using a single data source. Record linkage (RL) is at the forefront of such data integration efforts. Incentives for sharing linked data for secondary analysis have prompted the need for methodology accounting for possible errors at the RL stage. Mismatch error is a common consequence resulting from the use of nonunique or noisy identifiers at that stage. In this paper, we present a framework to enable valid postlinkage inference in the secondary analysis setting in which only the linked file is given. The proposed framework covers a variety of statistical models and can flexibly incorporate information about the underlying RL process. We propose a mixture model for linked records whose two components reflect distributions conditional on match status, i.e. correct or false match. Regarding inference, we develop a method based on composite likelihood and the expectation-maximization algorithm that is implemented in the R package pldamixture. Extensive simulations and case studies involving contemporary RL applications corroborate the effectiveness of our framework.
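The two-component idea in the abstract can be illustrated with a minimal sketch in Python. It assumes a Gaussian linear model for correctly matched records and, for false matches, a response drawn from its marginal distribution independently of the covariates. These modelling choices, the function name `em_mismatch_regression`, and the simulated data are assumptions made for illustration only. The sketch uses a plain likelihood rather than the composite likelihood developed in the paper, and it is not the pldamixture implementation.

```python
import numpy as np

def em_mismatch_regression(X, y, n_iter=200, alpha0=0.2):
    """EM for a two-component mixture (illustrative assumptions):
    correct match: y | x ~ N(x'beta, sigma2)
    false match:   y ~ N(mu_y, tau2), independent of x
    Returns beta, sigma2, estimated mismatch rate alpha, match weights."""
    mu_y, tau2 = y.mean(), y.var()               # marginal of y, held fixed
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # naive least-squares start
    sigma2 = np.mean((y - X @ beta) ** 2)
    alpha = alpha0
    for _ in range(n_iter):
        # E-step: posterior probability that each record is a correct match
        r = y - X @ beta
        f_match = np.exp(-0.5 * r ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        f_mis = np.exp(-0.5 * (y - mu_y) ** 2 / tau2) / np.sqrt(2 * np.pi * tau2)
        w = (1 - alpha) * f_match / ((1 - alpha) * f_match + alpha * f_mis)
        # M-step: weighted least squares, weighted variance, mixing proportion
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        sigma2 = np.sum(w * (y - X @ beta) ** 2) / np.sum(w)
        alpha = 1.0 - w.mean()
    return beta, sigma2, alpha, w

# Simulated linked file: 30% of records have their responses shuffled,
# mimicking false matches produced at the record linkage stage.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)
idx = rng.choice(n, size=600, replace=False)
y[idx] = y[rng.permutation(idx)]

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # ignores mismatch error
beta_em, sigma2_em, alpha_hat, weights = em_mismatch_regression(X, y)
print("naive OLS slope:", beta_ols[1])
print("mixture EM slope:", beta_em[1])
print("estimated mismatch rate:", alpha_hat)
```

In this simulation the true slope is 2 and the naive fit is attenuated toward zero because shuffled responses carry no information about the covariate. The mixture fit downweights records that look like false matches.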

Award ID(s): 2120318

PAR ID: 10647990

Author(s) / Creator(s): Slawski, Martin; West, Brady T; Bukke, Priyanjali; Wang, Zhenbang; Diao, Guoqing; Ben-David, Emanuel

Publisher / Repository: Oxford University Press

Date Published: 2024-08-26

Journal Name: Journal of the Royal Statistical Society Series A: Statistics in Society

Volume: 188

Issue: 3

ISSN: 0964-1998

Page Range / eLocation ID: 896-919

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1093/jrsssa/qnae083
