Title: Modeling subpopulations for hierarchically structured data
Abstract The field of forensic statistics offers a unique hierarchical data structure in which a population is composed of several subpopulations of sources and a sample is collected from each source. This subpopulation structure creates an additional layer of complexity: the data are hierarchically structured and also contain underlying subpopulations. Finite mixtures are known for modeling heterogeneity; however, previous parameter estimation procedures assume that the data are generated through a simple random sampling process. We propose a semi‐supervised mixture modeling approach to model the subpopulation structure, which leverages the fact that all samples in a collection are known to come from the same source, though from an unknown subpopulation. A simulation study and a real data analysis based on famous glass datasets and a keystroke dynamics typing dataset show that the proposed approach performs better than other approaches that have been used previously in practice.
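The grouping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator. It assumes a one-dimensional Gaussian mixture with no separate between-source variance, and the function name `fit_grouped_mixture` and its arguments are hypothetical. The only point it shows is that responsibilities are computed once per source, by summing log densities over all observations from that source, instead of once per observation.

```python
import numpy as np

def fit_grouped_mixture(x, source, k, n_iter=200):
    """EM for a 1-D Gaussian mixture in which every observation from the
    same source shares one unknown subpopulation label (illustrative
    simplification, not the published model)."""
    x = np.asarray(x, dtype=float)
    # Map arbitrary source ids to 0..n_src-1.
    _, s = np.unique(np.asarray(source), return_inverse=True)
    n_src = s.max() + 1
    counts = np.bincount(s, minlength=n_src)

    # Deterministic start: spread initial means over quantiles of source means.
    src_mean = np.bincount(s, weights=x, minlength=n_src) / counts
    mu = np.quantile(src_mean, (np.arange(k) + 0.5) / k)
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: log density per observation, then summed within each source.
        logp = -0.5 * (np.log(2 * np.pi * var) + (x[:, None] - mu) ** 2 / var)
        src_ll = np.zeros((n_src, k))
        np.add.at(src_ll, s, logp)
        log_r = np.log(pi + 1e-300) + src_ll
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)          # one row per source

        # M-step: each observation inherits its source's responsibility.
        w = r[s]
        nk = w.sum(axis=0) + 1e-12
        pi = r.mean(axis=0)
        mu = (w * x[:, None]).sum(axis=0) / nk
        var = np.maximum((w * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-8)

    return pi, mu, var, r
```

Calling `fit_grouped_mixture(x, source, k=2)` returns the mixing weights, component means, component variances, and a source-by-component responsibility matrix. Treating each observation independently would instead allow samples from one source to be split across subpopulations, which is the situation the grouped E-step avoids.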
Award ID(s):
1828492
PAR ID:
10475416
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Statistical Analysis and Data Mining: The ASA Data Science Journal
ISSN:
1932-1864
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Basanta, David (Ed.)
    Tumor heterogeneity is a complex and widely recognized trait that poses significant challenges in developing effective cancer therapies. In particular, many tumors harbor a variety of subpopulations with distinct therapeutic response characteristics. Characterizing this heterogeneity by determining the subpopulation structure within a tumor enables more precise and successful treatment strategies. In our prior work, we developed PhenoPop, a computational framework for unravelling the drug-response subpopulation structure within a tumor from bulk high-throughput drug screening data. However, the deterministic nature of the underlying models driving PhenoPop restricts the model fit and the information it can extract from the data. As an advancement, we propose a stochastic model based on the linear birth-death process to address this limitation. Our model allows the variance to change over the course of the experiment, so that it uses more information from the data and provides a more robust estimation. In addition, the newly proposed model can be readily adapted to situations where the experimental data exhibit a positive time correlation. We test our model on simulated data (in silico) and experimental data (in vitro), and the results support our argument about its advantages.
  2. Prescreening is a methodology where forensic examiners select samples similar to given trace evidence to represent the background population. This background evidence helps assign a value of evidence using a likelihood ratio or Bayes factor. A key advantage of prescreening is its ability to mitigate effects from subpopulation structures within the alternative source population by isolating the relevant subpopulation. This paper examines the impact of prescreening before assigning evidence value. Extensive simulations with synthetic and real data, including trace element and fingerprint score examples, were conducted. The findings indicate that prescreening can provide an accurate evidence value in cases of subpopulation structures but may also yield more extreme or dampened evidence values within specific subpopulations. The study suggests that prescreening is beneficial for presenting evidence relative to the subpopulation of interest, provided the prescreening method and level are transparently reported alongside the evidence value. 
  3. Abstract Understanding how populations respond to spatially heterogeneous habitat disturbance is as critical to conservation as it is challenging. Here, we present a new, free, and open‐source metapopulation model: Dynamic Habitat Disturbance and Ecological Resilience (DyHDER), which incorporates subpopulation habitat condition and connectivity into a population viability analysis framework. Modeling temporally dynamic and spatially explicit habitat disturbance of varying magnitude and duration is accomplished through the use of habitat time‐series data and a mechanistic approach to adjusting subpopulation vital rates. Additionally, DyHDER uses a probabilistic dispersal model driven by site‐specific habitat suitability, density dependence, and directionally dependent connectivity. In the first application of DyHDER, we explore how fragmentation and projected climate change are predicted to impact a well‐studied Bonneville cutthroat trout metapopulation in the Logan River (Utah, USA). The DyHDER model predicts which subpopulations are most susceptible to disturbance, as well as the potential interactions between stressors. Further, the model predicts how populations may be expected to redistribute following disturbance. This information is valuable to conservationists and managers faced with protecting populations of conservation concern across landscapes undergoing changing disturbance regimes. The DyHDER model provides a valuable and generalizable new tool to explore metapopulation resilience to spatially and temporally dynamic stressors for a diverse range of taxa and ecosystems.
  4. Avidan, S. (Ed.)
    The subpopulation shifting challenge, in which some subpopulations of a category are not seen during training, severely limits the classification performance of state-of-the-art convolutional neural networks. To mitigate this practical issue, we explore incremental subpopulation learning (ISL) to adapt the original model by incrementally learning the unseen subpopulations without retaining the seen population data. Striking a good balance between subpopulation learning and seen population forgetting is the main challenge in ISL, but it is not well studied by existing approaches. These incremental learners simply use a pre-defined and fixed hyperparameter to balance the learning objective and forgetting regularization, but their learning is usually biased towards either side in the long run. In this paper, we propose a novel two-stage learning scheme to explicitly disentangle the acquisition and forgetting for achieving a better balance between subpopulation learning and seen population forgetting: in the first “gain-acquisition” stage, we progressively learn a new classifier based on the margin-enforce loss, which gives hard samples and populations a larger weight for classifier updating and avoids uniformly updating all the population; in the second “counter-forgetting” stage, we search for the proper combination of the new and old classifiers by optimizing a novel objective based on proxies of forgetting and acquisition. We benchmark the representative and state-of-the-art non-exemplar-based incremental learning methods on a large-scale subpopulation shifting dataset for the first time. Under almost all the challenging ISL protocols, we outperform other methods by a large margin, demonstrating the ability of our approach to alleviate the subpopulation shifting problem (code is released at https://github.com/wuyujack/ISL).
  5. Abstract Astrophysically motivated population models for binary black hole (BBH) observables are often insufficient to capture the imprints of multiple formation channels. This is mainly due to the strongly parametrized nature of such investigations. Using a nonparametric model for the joint population-level distributions of BBH component masses and effective inspiral spins, we find hints of multiple subpopulations in the third gravitational-wave transient catalog. The higher (more positive) spin subpopulation is found to have a mass spectrum without any feature in the 30–40 M⊙ range, which is consistent with the predictions of isolated stellar binary evolution, simulations for which place the pileup due to pulsational pair-instability supernovae near 50 M⊙ or higher. The other subpopulation with effective spins closer to zero shows a feature at 30–40 M⊙ and is consistent with BBHs formed dynamically in globular clusters, which are expected to peak around 30 M⊙. We also compute merger rates for these two subpopulations and find that they are consistent with the theoretical predictions of the corresponding formation channels. We validate our results by checking their robustness against variations of several model configurations and by analyzing large simulated catalogs with the same model.