Title: Effects of Prescreening for Likelihood Ratio Approaches in the Forensic Identification of Source Problems
Prescreening is a methodology in which forensic examiners select samples similar to the given trace evidence to represent the background population. This background evidence is then used to assign a value of evidence with a likelihood ratio or Bayes factor. A key advantage of prescreening is its ability to mitigate the effects of subpopulation structure within the alternative source population by isolating the relevant subpopulation. This paper examines the impact of prescreening before the value of evidence is assigned. Extensive simulations with synthetic and real data, including trace element and fingerprint score examples, were conducted. The findings indicate that prescreening can provide an accurate value of evidence in the presence of subpopulation structure, but it may also yield more extreme or dampened evidence values within specific subpopulations. The study suggests that prescreening is beneficial for presenting evidence relative to the subpopulation of interest, provided the prescreening method and level are reported transparently alongside the value of evidence.
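
A minimal sketch of the prescreening idea described in the abstract, under illustrative assumptions: the feature vectors, the similarity score, the number of retained background sources k, and the kernel-density score model are all hypothetical choices, not the procedure evaluated in the paper.

```python
# Minimal sketch of prescreening before a score-based likelihood ratio.
# All modeling choices below (features, similarity score, k, KDE score model)
# are hypothetical; they are not the paper's exact procedure.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Simulated data: trace evidence, a suspect reference sample, and a background
# database drawn from two subpopulations (one similar to the trace, one not).
trace = rng.normal(loc=2.0, scale=0.3, size=5)
suspect = rng.normal(loc=2.0, scale=0.3, size=5)
background = np.vstack([
    rng.normal(2.0, 0.3, size=(200, 5)),   # subpopulation similar to the trace
    rng.normal(-1.0, 0.3, size=(800, 5)),  # dissimilar subpopulation
])

def score(a, b):
    """Similarity score: negative Euclidean distance (higher = more similar)."""
    return -np.linalg.norm(a - b)

# Prescreening: keep only the k background sources most similar to the trace.
k = 100
similarity_to_trace = np.array([score(trace, b) for b in background])
prescreened = background[np.argsort(similarity_to_trace)[-k:]]

# Score-based LR: compare the observed trace/suspect score against
# same-source scores (simulated suspect replicates) and different-source
# scores computed only over the prescreened background.
same_source = np.array([score(trace, suspect + rng.normal(0, 0.3, size=5))
                        for _ in range(500)])
diff_source = np.array([score(trace, b) for b in prescreened])

s_obs = score(trace, suspect)
lr = gaussian_kde(same_source)(s_obs)[0] / gaussian_kde(diff_source)(s_obs)[0]
print(f"score-based LR after prescreening (k={k}): {lr:.2f}")
```

Varying k in this sketch plays the role of the prescreening level that the abstract recommends reporting alongside the resulting value of evidence.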
Award ID(s):
1828492
PAR ID:
10537189
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
SSRN
Date Published:
Format(s):
Medium: X
Institution:
South Dakota State University
Sponsoring Org:
National Science Foundation
More Like this
  1. Avidan, S. (Ed.)
    The subpopulation shifting challenge, in which some subpopulations of a category are not seen during training, severely limits the classification performance of state-of-the-art convolutional neural networks. To mitigate this practical issue, we explore incremental subpopulation learning (ISL), which adapts the original model by incrementally learning the unseen subpopulations without retaining the seen-population data. Striking a good balance between subpopulation learning and seen-population forgetting is the main challenge in ISL, but it is not well studied by existing approaches. These incremental learners simply use a pre-defined, fixed hyperparameter to balance the learning objective and the forgetting regularization, so their learning is usually biased towards one side in the long run. In this paper, we propose a novel two-stage learning scheme that explicitly disentangles acquisition and forgetting to achieve a better balance between subpopulation learning and seen-population forgetting: in the first “gain-acquisition” stage, we progressively learn a new classifier based on a margin-enforce loss, which gives hard samples and populations a larger weight in classifier updates and avoids uniformly updating the whole population; in the second “counter-forgetting” stage, we search for the proper combination of the new and old classifiers by optimizing a novel objective based on proxies of forgetting and acquisition. We benchmark representative, state-of-the-art non-exemplar-based incremental learning methods on a large-scale subpopulation shifting dataset for the first time. Under almost all of the challenging ISL protocols, we outperform other methods by a large margin, demonstrating our ability to alleviate the subpopulation shifting problem (code is released at https://github.com/wuyujack/ISL).
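
The second, “counter-forgetting” stage described in the item above searches for a combination of the new and old classifiers using proxies of acquisition and forgetting. The sketch below illustrates that search with a simple grid over a mixing weight and synthetic logits; the probe sets, the accuracy-based proxies, and the objective are stand-ins, not the paper's actual formulation.

```python
# Grid search over a mixing weight between old and new classifier outputs,
# scored by a hypothetical acquisition proxy minus a hypothetical forgetting
# proxy. Logits and probe sets are synthetic; this is not the paper's objective.
import numpy as np

rng = np.random.default_rng(1)
n, num_classes = 500, 10

def synthetic_logits(labels, strength):
    """Random logits whose true-class entry is boosted by `strength`."""
    logits = rng.normal(0.0, 1.0, size=(len(labels), num_classes))
    logits[np.arange(len(labels)), labels] += strength
    return logits

y_seen = rng.integers(0, num_classes, size=n)   # probe set: seen population
y_new = rng.integers(0, num_classes, size=n)    # probe set: new subpopulation
old_on_seen = synthetic_logits(y_seen, 2.0)     # old model is strong on seen data
new_on_seen = synthetic_logits(y_seen, 0.5)     # new model partially forgot it
old_on_new = synthetic_logits(y_new, 0.5)       # old model is weak on the new subpop
new_on_new = synthetic_logits(y_new, 2.0)       # new model learned the new subpop

def accuracy(logits, labels):
    return float((logits.argmax(axis=1) == labels).mean())

base_seen_acc = accuracy(old_on_seen, y_seen)
best_alpha, best_obj = 0.0, -np.inf
for alpha in np.linspace(0.0, 1.0, 21):
    acquisition = accuracy(alpha * new_on_new + (1 - alpha) * old_on_new, y_new)
    forgetting = base_seen_acc - accuracy(
        alpha * new_on_seen + (1 - alpha) * old_on_seen, y_seen)
    objective = acquisition - forgetting
    if objective > best_obj:
        best_alpha, best_obj = alpha, objective
print(f"selected mixing weight alpha = {best_alpha:.2f}")
```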
  2. Abstract: We report the methods of, and initial scientific inferences from, the extraction of precision photometric information for the >800 trans-Neptunian objects (TNOs) discovered in the images of the Dark Energy Survey (DES). Scene-modeling photometry is used to obtain shot-noise-limited flux measures for each exposure of each TNO, with background sources subtracted. Comparison of double-source fits to the pixel data with single-source fits is used to identify and characterize two binary TNO systems. A Markov Chain Monte Carlo method samples the joint likelihood of the intrinsic colors of each source as well as the amplitude of its flux variation, given the time series of multiband flux measurements and their uncertainties. A catalog of these colors and light-curve amplitudes A is included with this publication. We show how to assign a likelihood to the distribution q(A) of light-curve amplitudes in any subpopulation. Using this method, we find decisive evidence (i.e., evidence ratio < 0.01) that cold classical (CC) TNOs with absolute magnitude 6 < Hr < 8.2 are more variable than the hot classical (HC) population of the same Hr, reinforcing theories that the former form in situ and the latter arise from a different physical population. Resonant and scattering TNOs in this Hr range have variability consistent with either the HCs or the CCs. DES TNOs with Hr < 6 are seen to be decisively less variable than higher-Hr members of any dynamical group, as expected. More surprising is that detached TNOs are decisively less variable than scattering TNOs, which requires them to have distinct source regions or some subsequent differential processing.
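
One way to read “assigning a likelihood to the distribution q(A)” in the item above is as the marginal likelihood of noisy amplitude estimates under a candidate q(A), with two candidates compared via an evidence ratio. The sketch below does exactly that under assumed exponential forms for q(A) and Gaussian measurement noise; these choices and the numbers are illustrative, not the DES analysis.

```python
# Sketch: marginal likelihood of noisy amplitude estimates under a candidate
# amplitude distribution q(A), and an evidence ratio between two candidates.
# Measurement model, q(A) families, and numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical measurements: true amplitudes from an exponential q(A),
# observed with Gaussian noise.
true_A = rng.exponential(scale=0.15, size=60)
sigma = np.full_like(true_A, 0.03)
obs_A = true_A + rng.normal(0, sigma)

def log_evidence(obs, sig, q_pdf, grid):
    """log p(obs | q) = sum_i log ∫ N(obs_i | A, sig_i) q(A) dA, on a grid."""
    dA = grid[1] - grid[0]
    like = np.exp(-0.5 * ((obs[:, None] - grid[None, :]) / sig[:, None]) ** 2) \
           / (np.sqrt(2 * np.pi) * sig[:, None])
    return float(np.sum(np.log(like @ (q_pdf(grid) * dA))))

grid = np.linspace(0.0, 1.5, 2000)
q_low = lambda A: np.exp(-A / 0.05) / 0.05    # "less variable" candidate
q_high = lambda A: np.exp(-A / 0.15) / 0.15   # "more variable" candidate

ratio = np.exp(log_evidence(obs_A, sigma, q_low, grid)
               - log_evidence(obs_A, sigma, q_high, grid))
print(f"evidence ratio (less-variable / more-variable): {ratio:.3g}")
```

A ratio well below 0.01 in this toy setting corresponds to the kind of “decisive evidence” threshold quoted in the abstract.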
  3. Because the average treatment effect (ATE) measures the change in social welfare, even if it is positive there is a risk of a negative effect on, say, some 10% of the population. Assessing such risk is difficult, however, because any one individual treatment effect (ITE) is never observed, so the 10% worst-affected cannot be identified, whereas distributional treatment effects only compare the first deciles within each treatment group, which does not correspond to any 10% subpopulation. In this paper, we consider how to nonetheless assess this important risk measure, formalized as the conditional value at risk (CVaR) of the ITE distribution. We leverage the availability of pretreatment covariates and characterize the tightest possible upper and lower bounds on the ITE-CVaR given by the covariate-conditional average treatment effect (CATE) function. We then study how to estimate these bounds efficiently from data and construct confidence intervals. This is challenging even in randomized experiments, as it requires understanding the distribution of the unknown CATE function, which can be very complex if we use rich covariates to best control for heterogeneity. We develop a debiasing method that overcomes this and prove that it enjoys favorable statistical properties even when CATE and other nuisances are estimated by black-box machine learning or even inconsistently. Studying a hypothetical change to French job search counseling services, our bounds and inference demonstrate that a small social benefit entails a negative impact on a substantial subpopulation. This paper was accepted by J. George Shanthikumar, data science. Funding: This work was supported by the Division of Information and Intelligent Systems [Grant 1939704]. Supplemental Material: The data files and online appendices are available at https://doi.org/10.1287/mnsc.2023.4819.
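
The bounds in the item above are expressed through the CATE function; one plug-in ingredient is a lower-tail CVaR over estimated CATE values, i.e., the average effect for the worst-affected α fraction as summarized by covariates. The sketch below computes that quantity on synthetic CATE estimates; the paper's debiased estimators and confidence intervals go well beyond this plug-in calculation.

```python
# Minimal sketch of a lower-tail CVaR computation over estimated CATE values,
# one ingredient in the kind of ITE-CVaR bounds described above. The CATE
# estimates here are synthetic and the plug-in is not the paper's estimator.
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical CATE estimates tau_hat(X_i): positive on average (positive ATE)
# but negative for a minority of the population.
cate_hat = rng.normal(loc=0.3, scale=0.5, size=10_000)

def lower_tail_cvar(values, alpha):
    """Average of the worst (smallest) alpha fraction of values."""
    cutoff = np.quantile(values, alpha)
    return float(values[values <= cutoff].mean())

alpha = 0.10
print(f"ATE estimate:                  {cate_hat.mean():.3f}")
print(f"CVaR_{alpha:.2f} of estimated CATE: {lower_tail_cvar(cate_hat, alpha):.3f}")
```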
  4. Abstract: The distribution of legacy heavy metals in industrial-city soils is not well documented. Therefore, fundamental details such as the ‘background’ (i.e., non-road/non-dripline) concentrations of trace metals in urban soils are uncertain. While there has been a strong focus on mapping lead contamination near roads and residences, such studies are generally not placed in the context of the urban background. In this study, ‘background’ distributions of the urban-relevant trace metals arsenic, cadmium, copper, lead, and zinc were mapped from soil samples collected throughout Pittsburgh. Distinct spatial patterns were revealed: contamination is elevated in the eastern portion of the study area, driven by dominant wind patterns and by historical coking activities in low-lying areas (paleochannels) that are subject to atmospheric temperature inversions, which concentrate air contamination. A mixing analysis revealed spatial structure in the contributions of industrial activities to metal contamination of soils. In particular, regions enriched in cadmium relative to zinc (i.e., Zn:Cd < 317) were located near historical coking operations, and areas enriched in lead relative to zinc (Pb:Zn > 1) were located in areas with historical secondary lead smelters. These results suggest that a comprehensive accounting of trace metal concentrations in background soils has important implications for assessing exposure risk in populations residing in historically industrial areas. Relatively sparse sampling of background conditions in urban systems can indicate patterns of legacy contamination and attribute this contamination to historical sources.
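
A minimal sketch of the ratio-based source attribution described in the item above, using the Zn:Cd < 317 and Pb:Zn > 1 thresholds quoted in the abstract; the sample IDs, column names, and concentrations are made up for illustration.

```python
# Flag samples enriched in Cd relative to Zn (Zn:Cd < 317, coking signature)
# or in Pb relative to Zn (Pb:Zn > 1, secondary-smelter signature).
# Concentrations below are hypothetical illustration values.
import pandas as pd

soil = pd.DataFrame({
    "sample_id": ["A1", "A2", "B1", "B2"],
    "zn_mg_kg": [120.0, 95.0, 300.0, 150.0],
    "cd_mg_kg": [0.5, 0.2, 0.4, 1.2],
    "pb_mg_kg": [80.0, 110.0, 250.0, 90.0],
})

soil["zn_cd_ratio"] = soil["zn_mg_kg"] / soil["cd_mg_kg"]
soil["pb_zn_ratio"] = soil["pb_mg_kg"] / soil["zn_mg_kg"]
soil["coking_signature"] = soil["zn_cd_ratio"] < 317
soil["smelter_signature"] = soil["pb_zn_ratio"] > 1
print(soil[["sample_id", "zn_cd_ratio", "pb_zn_ratio",
            "coking_signature", "smelter_signature"]])
```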
  5. Abstract: The field of forensic statistics offers a unique hierarchical data structure in which a population is composed of several subpopulations of sources and a sample is collected from each source. This subpopulation structure creates an additional layer of complexity: the data are hierarchical and also contain underlying subpopulations. Finite mixtures are known for modeling heterogeneity; however, previous parameter estimation procedures assume that the data are generated through a simple random sampling process. We propose a semi-supervised mixture modeling approach to model the subpopulation structure, which leverages the fact that the samples in a collection are known to come from the same source, though from an unknown subpopulation. A simulation study and a real data analysis based on well-known glass datasets and a keystroke dynamics typing dataset show that the proposed approach performs better than other approaches that have previously been used in practice.
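
A minimal sketch of the shared-source constraint described in the item above: an EM algorithm for a univariate Gaussian mixture in which every sample from a given source is forced to share one (unknown) subpopulation, so responsibilities are computed at the source level. The data, dimensionality, and update rules are illustrative, not the paper's semi-supervised estimation procedure.

```python
# EM for a univariate Gaussian mixture where all samples from a source share
# one unknown subpopulation: responsibilities are computed per source by
# summing log-likelihoods within the source. Illustrative only.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
K = 2                                     # number of subpopulations
sources = 40                              # number of sources, 5 samples each
true_sub = rng.integers(0, K, size=sources)
means_true = np.array([0.0, 3.0])
x = np.concatenate([rng.normal(means_true[z], 1.0, size=5) for z in true_sub])
group = np.repeat(np.arange(sources), 5)  # source label for each sample

pi = np.full(K, 1.0 / K)
mu = np.array([-1.0, 1.0])
sd = np.ones(K)
for _ in range(100):
    # E-step: log-likelihood of each source's whole collection under each component.
    log_like = norm.logpdf(x[:, None], loc=mu[None, :], scale=sd[None, :])
    group_ll = np.zeros((sources, K))
    for g in range(sources):
        group_ll[g] = log_like[group == g].sum(axis=0)
    group_ll += np.log(pi)
    resp = np.exp(group_ll - group_ll.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)           # source-level responsibilities
    # M-step: expand source responsibilities to samples, then update parameters.
    w = resp[group]
    pi = resp.mean(axis=0)
    mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    sd = np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0) / w.sum(axis=0))
print("estimated subpopulation means:", np.round(np.sort(mu), 2))
```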