

Search for: All records

Award ID contains: 2113404

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Abstract

    This paper develops a mathematical model and statistical methods to quantify trends in presence/absence observations of snow cover (not depths) and applies these in an analysis of Northern Hemispheric observations extracted from satellite flyovers during 1967–2021. A two-state Markov chain model with periodic dynamics is introduced to analyze changes in the data in a cell-by-cell fashion. Trends, converted to the number of weeks of snow cover lost/gained per century, are estimated for each study cell. Uncertainty margins for these trends are developed from the model and used to assess the significance of the trend estimates. Cells with questionable data quality are explicitly identified. Among trustworthy cells, snow presence is seen to be declining in almost twice as many cells as it is advancing. While Arctic and southern-latitude snow presence is found to be rapidly receding, other locations, such as eastern Canada, are experiencing advancing snow cover.

    Significance Statement

    This project quantifies how the Northern Hemisphere’s snow cover has recently changed. Snow cover plays a critical role in the global energy balance due to its high albedo and insulating characteristics and is therefore a prominent indicator of climate change. On a regional scale, the spatial consistency of snow cover influences surface temperatures via variations in absorbed solar radiation, while continental-scale snow cover acts to maintain thermal stability in the Arctic and subarctic regions, leading to spatial and temporal impacts on global circulation patterns. Changing snow presence in Arctic regions could influence large-scale releases of carbon and methane gas. Given the importance of snow cover, understanding its trends enhances our understanding of climate change.
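
    To make the modeling idea concrete, here is a minimal, hypothetical sketch of a two-state (snow / no-snow) Markov chain whose transition probabilities follow a seasonal cycle plus a small linear trend, with the simulated weekly presence series for one grid cell converted into a trend in weeks of snow cover per century. All parameter values are invented for illustration; this is not the paper's fitted model or estimator.

```python
# Hedged sketch: a two-state (snow / no-snow) Markov chain with
# periodic transition probabilities and a small linear trend.
# All parameters are invented; this is not the paper's fitted model.
import numpy as np

rng = np.random.default_rng(0)
n_years, n_weeks = 55, 52          # roughly the 1967-2021 study window

def p_to_snow(week, year):
    # no-snow -> snow transition probability: seasonal cycle plus a
    # small negative trend (snow arriving less readily over time)
    seasonal = 0.50 + 0.45 * np.cos(2 * np.pi * week / n_weeks)
    return np.clip(seasonal - 0.002 * year, 0.01, 0.99)

def p_stay_snow(week, year):
    # snow -> snow persistence probability, same periodic-plus-trend form
    seasonal = 0.60 + 0.35 * np.cos(2 * np.pi * week / n_weeks)
    return np.clip(seasonal - 0.002 * year, 0.01, 0.99)

# Simulate weekly snow presence/absence for a single grid cell.
snow_weeks = np.empty(n_years)
state = 1
for y in range(n_years):
    count = 0
    for w in range(n_weeks):
        p = p_stay_snow(w, y) if state == 1 else p_to_snow(w, y)
        state = int(rng.random() < p)
        count += state
    snow_weeks[y] = count

# Least-squares trend in weeks of cover per year, scaled to per-century.
slope = np.polyfit(np.arange(n_years), snow_weeks, 1)[0]
print(f"estimated trend: {100 * slope:+.1f} weeks of snow cover per century")
```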

     
  2. Abstract

    The use of likelihood ratios for quantifying the strength of forensic evidence in criminal cases is gaining widespread acceptance in many forensic disciplines. Although some forensic scientists feel that subjective likelihood ratios are a reasonable way of expressing expert opinion regarding strength of evidence in criminal trials, legal requirements of reliability of expert evidence in the United Kingdom, United States and some other countries have encouraged researchers to develop likelihood ratio systems based on statistical modelling using relevant empirical data. Many such systems exhibit exceptional power to discriminate between the scenario presented by the prosecution and an alternative scenario implying the innocence of the defendant. However, such systems are not necessarily well calibrated. Consequently, verbal explanations by forensic experts to triers of fact of the meaning of the offered likelihood ratio may be misleading. In this article, we put forth a statistical approach for testing the calibration discrepancy of likelihood ratio systems using empirical data with known ground truth. We provide point estimates as well as confidence intervals for the calibration discrepancy. Several examples, previously discussed in the literature, are used to illustrate our method. Results from a limited simulation study concerning the performance of the proposed approach are also provided.
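
    The following is one simple, binned empirical calibration check, offered as an illustrative sketch rather than the paper's estimator or its confidence intervals. It uses the property that for a well-calibrated system, within any bin of reported LR values, the ratio of prosecution-scenario to defense-scenario frequencies should match the reported LR. The synthetic scores are deliberately not perfectly calibrated, so the check reports non-zero gaps.

```python
# Hedged sketch of a binned calibration check for a likelihood ratio
# (LR) system, using scores with known ground truth. Illustrative
# only -- the paper's estimator and confidence intervals differ.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic log10-LR outputs: Hp = prosecution-scenario cases,
# Hd = defense-scenario cases. Deliberately not perfectly calibrated.
log_lr_hp = rng.normal(loc=1.0, scale=1.2, size=2000)
log_lr_hd = rng.normal(loc=-1.0, scale=1.2, size=2000)

def binned_discrepancy(hp, hd, edges):
    """Compare the empirical LR in each bin with the bin's reported LR."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        f_hp = np.mean((hp >= lo) & (hp < hi))
        f_hd = np.mean((hd >= lo) & (hd < hi))
        if f_hp == 0 or f_hd == 0:
            continue  # bin empty under one hypothesis; skip
        empirical = np.log10(f_hp / f_hd)   # LR actually supported by data
        reported = (lo + hi) / 2            # LR the system claims (bin center)
        rows.append((reported, empirical - reported))
    return rows

edges = np.linspace(-4, 4, 9)
for reported, gap in binned_discrepancy(log_lr_hp, log_lr_hd, edges):
    print(f"reported log10 LR {reported:+.1f}: calibration gap {gap:+.2f}")
```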

     
  3. Abstract

    Background

    Modeling of single cell RNA-sequencing (scRNA-seq) data remains challenging due to a high percentage of zeros and data heterogeneity, so improved modeling has strong potential to benefit many downstream analyses. Existing zero-inflated or over-dispersed models are based on aggregation at either the gene or the cell level, and they typically lose accuracy because aggregation at those two levels is too crude.

    Results

    We avoid the crude approximations entailed by such aggregation by positing an independent Poisson distribution (IPD) at each individual entry of the scRNA-seq data matrix. This approach naturally and intuitively models the large number of zeros as matrix entries with a very small Poisson parameter. The critical challenge of cell clustering is approached via a novel data representation as Departures from a simple homogeneous IPD (DIPD), which captures the per-gene, per-cell intrinsic heterogeneity generated by cell clusters. Our experiments using real data and crafted experiments show that using DIPD as a data representation for scRNA-seq data can uncover novel cell subtypes that are missed, or can only be found by careful parameter tuning, using conventional methods.

    Conclusions

    The new method has multiple advantages, including (1) no need for prior feature selection or manual optimization of hyperparameters, and (2) the flexibility to combine with and improve upon other methods, such as Seurat. Another novel contribution is the use of crafted experiments as part of the validation of our newly developed DIPD-based clustering pipeline. The pipeline is implemented in the R (CRAN) package scpoisson.
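
    As a rough illustration of the per-entry departure idea, the sketch below fits a homogeneous independent Poisson model to a toy counts matrix (expected count for each entry proportional to its row and column totals), forms per-entry departures as Pearson residuals, and clusters cells on that representation. The scpoisson package's DIPD construction is more refined; this is an assumption-laden approximation.

```python
# Hedged sketch: per-entry departures from a homogeneous independent
# Poisson model for a counts matrix (genes x cells), then clustering
# cells on the departure representation. The scpoisson package's DIPD
# construction differs; this is an illustrative approximation.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Toy counts: two cell groups with a block of differential genes.
counts = rng.poisson(0.3, size=(200, 120)).astype(float)
counts[:40, 60:] += rng.poisson(2.0, size=(40, 60))   # group 2 signature

# Homogeneous IPD fit: lambda_ij = (row_i total) * (col_j total) / grand total.
row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
lam = row @ col / counts.sum()

# Departure matrix: Pearson residuals (observed minus expected, scaled).
dipd = (counts - lam) / np.sqrt(lam + 1e-8)

# Cluster cells (columns) on their departure profiles.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(dipd.T)
print("cells assigned per cluster:", np.bincount(labels))
```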
    Free, publicly-accessible full text available December 1, 2024
  4. Abstract

    Model systems are an essential resource in cancer research. They simulate effects that can be extrapolated to humans, but carry the risk of inaccurately representing human biology. This inaccuracy can lead to inconclusive experiments or misleading results, underscoring the need for an improved process for translating model-system findings into human-relevant data. We present a process for applying joint dimension reduction (jDR) to horizontally integrate gene expression data across model systems and human tumor cohorts. We then use this approach to combine human TCGA gene expression data with data from human cancer cell lines and mouse model tumors. By identifying the aspects of genomic variation acting jointly across cohorts, we demonstrate how predictive modeling and clinical biomarkers from model systems can be improved.
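
    One common jDR baseline, shown below as a hedged sketch and not necessarily the paper's exact method, is to standardize each cohort gene-wise and run a single PCA on the samples stacked over a shared gene set, so that the recovered components describe variation acting jointly across cohorts. The cohort names and sizes are invented.

```python
# Hedged sketch: a simple joint dimension reduction baseline --
# per-cohort standardization followed by PCA on samples stacked over
# a shared gene set. The paper's jDR approach may differ in detail.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n_genes = 500

# Toy cohorts sharing the same genes: human tumors, cell lines, mouse models.
cohorts = {
    "tumors": rng.normal(size=(300, n_genes)),
    "cell_lines": rng.normal(size=(60, n_genes)),
    "mouse": rng.normal(size=(40, n_genes)),
}

# Standardize each cohort gene-wise so no cohort dominates the joint fit.
def standardize(x):
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

stacked = np.vstack([standardize(x) for x in cohorts.values()])

# Joint components: directions of variation shared across all cohorts.
pca = PCA(n_components=10).fit(stacked)
scores = {name: standardize(x) @ pca.components_.T for name, x in cohorts.items()}
for name, s in scores.items():
    print(f"{name}: joint-component scores with shape {s.shape}")
```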
    Free, publicly-accessible full text available December 1, 2024
  5. Free, publicly-accessible full text available August 1, 2024
  6. Steed et al. (1) illustrate the crucial impact that the quality of official statistical data products may exert on the accuracy, stability, and equity of the policy decisions based on them. The authors remind us that data, however responsibly curated, can be fallible. With this comment, we underscore the importance of conducting principled quality assessment of official statistical data products. We observe that the quality assessment procedure employed by Steed et al. needs improvement, due to (i) the inadmissibility of the estimator used, and (ii) the inconsistent probability model it induces on the joint space of the estimator and the observed data. We discuss the design of alternative statistical methods to conduct principled quality assessments of official statistical data products, showcasing two simulation-based methods for admissible minimax shrinkage estimation via multilevel empirical Bayesian modeling. For policymakers and stakeholders to accurately gauge the context-specific usability of data, the assessment should take into account both the uncertainty sources inherent to the data and the downstream use cases, such as policy decisions based on those data products.
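
    The canonical illustration of the admissibility point is James-Stein shrinkage, which dominates the raw per-area estimates when three or more means are estimated simultaneously. The sketch below is a minimal empirical-Bayes example on synthetic data, not the comment's simulation-based minimax procedure.

```python
# Hedged sketch: James-Stein / empirical-Bayes shrinkage of many noisy
# area-level estimates toward their grand mean. Illustrates only the
# admissibility point; the comment's simulation-based minimax
# procedure is more involved.
import numpy as np

rng = np.random.default_rng(4)

true_means = rng.normal(0.0, 1.0, size=50)            # latent per-area values
sigma = 1.0
raw = true_means + rng.normal(0.0, sigma, size=50)    # direct estimates

# Positive-part James-Stein: shrink toward the grand mean by an
# estimated factor (p - 3 because the grand mean is itself estimated).
grand = raw.mean()
p = raw.size
shrink = max(0.0, 1 - (p - 3) * sigma**2 / np.sum((raw - grand) ** 2))
js = grand + shrink * (raw - grand)

print("raw estimator MSE:", np.mean((raw - true_means) ** 2))
print("James-Stein  MSE:", np.mean((js - true_means) ** 2))
```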
    Free, publicly-accessible full text available June 2, 2024
  7. Free, publicly-accessible full text available June 1, 2024
  8. Data matrix centering is an ever-present yet under-examined aspect of data analysis. Functional data analysis (FDA) often operates with a default of centering such that the vectors in one dimension have mean zero. We find that centering along the other dimension identifies a novel useful mode of variation beyond those familiar in FDA. We explore ambiguities in both matrix orientation and nomenclature. Differences between centerings and their potential interaction can be easily misunderstood. We propose a unified framework and new terminology for centering operations. We clearly demonstrate the intuition behind and consequences of each centering choice with informative graphics. We also propose a new direction energy hypothesis test as part of a series of diagnostics for determining which choice of centering is best for a data set. We explore the application of these diagnostics in several FDA settings. 
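
    The centering choices in question are easy to state in code. The sketch below assumes one hypothetical orientation (rows = curves, i.e., functional observations; columns = evaluation points) and contrasts column centering (the usual FDA default), row centering, and double centering, each of which exposes a different mode of variation.

```python
# Hedged sketch of the centering choices discussed above, with an
# assumed orientation: rows = curves (functional observations),
# columns = evaluation points.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 100))            # 30 curves sampled at 100 points

col_centered = X - X.mean(axis=0)         # FDA default: mean curve removed
row_centered = X - X.mean(axis=1, keepdims=True)  # each curve's level removed
double_centered = (X - X.mean(axis=0)
                     - X.mean(axis=1, keepdims=True)
                     + X.mean())          # both row and column means removed

# The three choices leave different residual structure behind.
for name, m in [("column", col_centered), ("row", row_centered),
                ("double", double_centered)]:
    print(f"{name:>6}-centered: max |col mean| = {np.abs(m.mean(axis=0)).max():.2e}, "
          f"max |row mean| = {np.abs(m.mean(axis=1)).max():.2e}")
```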
  9. Advanced genomic and molecular profiling technologies have accelerated the elucidation of the regulatory mechanisms behind cancer development and progression, and behind targeted therapies in patients. Along this line, intense studies drawing on immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer has been one of the leading causes of death around the world in recent years, and elucidating the genomic and epigenetic factors in breast cancer (BRCA) can provide a roadmap for uncovering the disease mechanisms. Accordingly, unraveling the possible systematic connections between omics data types and their contributions to BRCA tumor progression is crucial. In this study, we developed a novel machine learning (ML) based integrative approach for multi-omics data analysis that combines information from gene expression (mRNA), microRNA (miRNA) and methylation data. Because of the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis and treatment of disease through patterns available only from the three-way interactions among these three omics datasets. In addition, the proposed method helps bridge the interpretation gap concerning the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint), which aims to perform grouping and scoring of gene groups using biological knowledge. Another major goal is improved gene selection via the detection of novel groups of cross-omics biomarkers. The performance of 3Mint was assessed using different metrics; our computational evaluations showed that 3Mint classifies the BRCA molecular subtypes with a smaller number of genes than the miRcorrNet tool, which uses only miRNA and mRNA gene expression profiles, while achieving similar performance (95% accuracy). The incorporation of methylation data in 3Mint yields a much more focused analysis. The 3Mint tool and all other supplementary files are available at https://github.com/malikyousef/3Mint/.
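
    For orientation only, the sketch below shows the generic shape of such an integration on toy data: concatenating mRNA, miRNA, and methylation feature blocks, selecting informative features, and fitting a subtype classifier. The actual 3Mint pipeline performs biology-aware grouping and scoring and differs from this simple baseline; all data and dimensions here are invented.

```python
# Hedged sketch of simple multi-omics integration for subtype
# classification: concatenate mRNA, miRNA and methylation features,
# select top features, and fit a classifier. The actual 3Mint pipeline
# performs biology-aware grouping and scoring and is more elaborate.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n = 200                                            # toy patients
y = rng.integers(0, 2, size=n)                     # toy two-subtype labels

# Toy omics blocks; in practice these come from matched patient assays.
mrna = rng.normal(size=(n, 1000)) + 0.5 * y[:, None] * (rng.random(1000) < 0.02)
mirna = rng.normal(size=(n, 300)) + 0.5 * y[:, None] * (rng.random(300) < 0.05)
methyl = rng.normal(size=(n, 2000))

X = np.hstack([mrna, mirna, methyl])               # horizontal integration

clf = make_pipeline(StandardScaler(),
                    SelectKBest(f_classif, k=50),  # keep 50 informative features
                    LogisticRegression(max_iter=1000))
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"cross-validated accuracy on toy data: {acc:.2f}")
```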