Modeling subpopulations for hierarchically structured data

Simpson, Andrew; Michael, Semhar (ORCID:0000000295019550); Borchert, Dylan; Saunders, Christopher; Tang, Larry

doi:10.1002/sam.11650

Abstract: The field of forensic statistics offers a unique hierarchical data structure in which a population is composed of several subpopulations of sources and a sample is collected from each source. This subpopulation structure creates an additional layer of complexity. Hence, the data has a hierarchical structure in addition to the existence of underlying subpopulations. Finite mixtures are known for modeling heterogeneity; however, previous parameter estimation procedures assume that the data is generated through a simple random sampling process. We propose using a semi‐supervised mixture modeling approach to model the subpopulation structure which leverages the fact that we know the collection of samples came from the same source, yet an unknown subpopulation. A simulation study and a real data analysis based on famous glass datasets and a keystroke dynamic typing data set show that the proposed approach performs better than other approaches that have been used previously in practice.

Award ID(s): 1828492

PAR ID: 10475416

Author(s) / Creator(s): Simpson, Andrew; Michael, Semhar; Borchert, Dylan; Saunders, Christopher; Tang, Larry

Publisher / Repository: Wiley Blackwell (John Wiley & Sons)

Date Published: 2023-11-22

Journal Name: Statistical Analysis and Data Mining: The ASA Data Science Journal

ISSN: 1932-1864

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1002/sam.11650
