Abstract This article proposes a new statistical model to infer interpretable population-level preferences from ordinal comparison data. Such data is ubiquitous, e.g., ranked choice votes, top-10 movie lists, and pairwise sports outcomes. Traditional statistical inference on ordinal comparison data results in an overall ranking of objects, e.g., from best to worst, with each object having a unique rank. However, the ranks of some objects may not be statistically distinguishable. This could happen due to insufficient data or to the true underlying object qualities being equal. Because uncertainty communication in estimates of overall rankings is notoriously difficult, we take a different approach and allow groups of objects to have equal ranks, or be rank-clustered, in our model. Existing models related to rank-clustering are limited by their inability to handle a variety of ordinal data types, by their inability to quantify uncertainty, or by the need to pre-specify the number and size of potential rank-clusters. We address these limitations through our proposed Bayesian Rank-Clustered Bradley–Terry–Luce (BTL) model. We accommodate rank-clustering via parameter fusion by imposing a novel spike-and-slab prior on object-specific worth parameters in the BTL family of distributions for ordinal comparisons. We demonstrate rank-clustering on simulated and real datasets in surveys, elections, and sports analytics.
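For readers unfamiliar with the BTL family, the standard textbook form is sketched below. This is not taken from the abstract: the symbols $w_i$ (worth of object $i$), $\pi$ (a ranking of $m$ objects), and the cluster-equality condition are illustrative notation, and the specific spike-and-slab prior used by the authors is not reproduced here.

```latex
% Pairwise comparison (Bradley-Terry): probability that object i is preferred to object j
P(i \succ j) = \frac{w_i}{w_i + w_j}, \qquad w_i > 0 .

% Full or partial ranking (Plackett-Luce extension): pi_k is the object placed k-th
P(\pi) = \prod_{k=1}^{m} \frac{w_{\pi_k}}{\sum_{l=k}^{m} w_{\pi_l}} .

% Rank-clustering via parameter fusion (illustrative): objects i and j
% share a rank exactly when their worth parameters coincide
i \text{ and } j \text{ are rank-clustered} \iff w_i = w_j .
```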
Predicting distributions of physical activity profiles in the National Health and Nutrition Examination Survey database using a partially linear Fréchet single index model
Summary Object-oriented data analysis is a fascinating and evolving field in modern statistical science, with the potential to make significant contributions to biomedical applications. This statistical framework facilitates the development of new methods to analyze complex data objects that capture more information than traditional clinical biomarkers. This paper applies the object-oriented framework to analyze physical activity levels, measured by accelerometers, as response objects in a regression model. Unlike traditional summary metrics, we utilize a recently proposed representation of physical activity data as a distributional object, providing a more nuanced and complete profile of individual energy expenditure across all ranges of monitoring intensity. A novel hybrid Fréchet regression model is proposed and applied to US population accelerometer data from National Health and Nutrition Examination Survey (NHANES) 2011 to 2014. The semi-parametric nature of the model allows for the inclusion of nonlinear effects for critical variables, such as age, which are biologically known to have subtle impacts on physical activity. Simultaneously, the inclusion of linear effects preserves interpretability for other variables, particularly categorical covariates such as ethnicity and sex. The results obtained are valuable from a public health perspective and could lead to new strategies for optimizing physical activity interventions in specific American subpopulations.
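To make the idea of a "distributional object" concrete, here is a minimal sketch in Python. It assumes, without confirmation from the summary, that each participant's accelerometer record is summarized by its empirical quantile function and that two such profiles are compared with the 2-Wasserstein distance, a common choice in Fréchet regression for distributions. The function names `quantile_function` and `wasserstein2` are hypothetical and not part of the authors' software.

```python
import math


def quantile_function(samples, grid_size=100):
    """Empirical quantile function evaluated at grid_size midpoints of (0, 1).

    Assumption: one participant's activity intensities are given as a plain
    list of numbers; the returned list is that participant's distributional
    representation.
    """
    xs = sorted(samples)
    n = len(xs)
    levels = [(k + 0.5) / grid_size for k in range(grid_size)]
    # Left-continuous empirical quantile: smallest x whose ECDF value is >= p.
    return [xs[min(n - 1, math.ceil(p * n) - 1)] for p in levels]


def wasserstein2(q1, q2):
    """Approximate 2-Wasserstein distance between two one-dimensional
    distributions, computed from quantile functions on the same grid."""
    if len(q1) != len(q2):
        raise ValueError("quantile functions must share the same grid")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)) / len(q1))


# Two toy activity profiles; the second is the first shifted up by one unit.
profile_a = quantile_function([1, 2, 3, 4])
profile_b = quantile_function([2, 3, 4, 5])
distance = wasserstein2(profile_a, profile_b)
```

For a pure location shift the distance equals the size of the shift, which is one reason this metric is convenient for comparing whole activity profiles rather than single summary statistics.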
- Award ID(s):
- 2310943
- PAR ID:
- 10620625
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Biostatistics
- Volume:
- 26
- Issue:
- 1
- ISSN:
- 1468-4357
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Background/Objective: Environmental exposures, such as heavy metals, can significantly affect physical activity, an important determinant of health. This study explores the effect of combined exposure to cadmium, lead, and mercury (metals) on physical activity, using data from the 2013–2014 National Health and Nutrition Examination Survey (NHANES). Methods: Physical activity was measured with ActiGraph GT3X+ devices worn continuously for 7 days, while blood samples were analyzed for metal content using inductively coupled plasma mass spectrometry. Descriptive statistics and multivariable linear regression were used to assess the impact of multi-metal exposure on physical activity. Additionally, Bayesian Kernel Machine Regression (BKMR) was applied to explore nonlinear and interactive effects of metal exposures on physical activity. Using a Gaussian process with a radial basis function kernel, BKMR estimates posterior distributions via Markov Chain Monte Carlo (MCMC) sampling, allowing for robust evaluation of individual and combined exposure-response relationships. Posterior Inclusion Probabilities (PIPs) were calculated to quantify the relative importance of each metal. Results: The linear regression analysis revealed positive associations between cadmium and lead exposure and physical activity. BKMR analysis, particularly the PIP, identified lead as the most influential metal in predicting physical activity, followed by cadmium and mercury. These PIP values provide a probabilistic measure of each metal’s importance, offering deeper insights into their relative contributions to the overall exposure effect. The study also uncovered complex relationships between metal exposures and physical activity. In univariate BKMR exposure-response analysis, lead and cadmium generally showed positive associations with physical activity, while mercury exhibited a slightly negative relationship. Bivariate exposure-response analysis further illustrated how the impact of one metal could be influenced by the presence and levels of another, confirming the trends observed in univariate analyses while also demonstrating the complex ways in which varying doses of two metals can either increase or decrease physical activity. Additionally, the overall exposure effect analysis across different quantiles revealed that higher levels of combined metal exposures were associated with increased physical activity, though uncertainty was greater at higher exposure levels, where the 95% credible intervals were wider. Conclusions: Overall, this study fills a critical gap by investigating the interactive and combined effects of multiple metals on physical activity. The findings underscore the necessity of using advanced methods such as BKMR to capture the complex dynamics of environmental exposures and their impact on human behavior and health outcomes.
-
Abstract Background Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. Results We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. Conclusion We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
-
Summary CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present considerable statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens, “thresholded regression,” exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV (“GLM-based errors-in-variables”), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g. Microsoft Azure) and high-performance clusters. Leveraging this infrastructure, we apply GLM-EIV to analyze two recent, large-scale, single-cell CRISPR screen datasets, yielding several new insights.
-
Abstract Background Undergraduate students consistently struggle with mastering concepts related to thermodynamics. Prior work has shown that haptic technology and intensive hands‐on workshops help improve learning outcomes relative to traditional lecture‐based thermodynamics instruction. The current study takes a more feasible approach to improving thermal understanding by incorporating simple mechanical objects into individual problem‐solving exercises. Purpose/Hypotheses This study tests the impact of simple mechanical objects on learning outcomes (specifically, problem‐solving performance and conceptual understanding) for third‐year undergraduate engineering students in a thermodynamics course across a semester. Design/Method During the semester, 119 engineering students in two sections of an undergraduate thermodynamics course completed three 15‐min, self‐guided problem‐solving tasks, one section without and the other with a simple and relevant physical object. Performance on the tasks and improvements in thermodynamics comprehension (measured via Thermal and Transport Concept Inventory scores) were compared between the two sections. Results Students who had a simple, relevant object available to solve three thermodynamics problems consistently outperformed their counterparts without objects, although the difference reached statistical significance only when examining the simple effects for the third problem. At the end of the semester, students who had completed the tasks with the objects displayed significantly greater improvements in thermodynamics comprehension than their peers without the relevant object. Higher mechanical aptitude facilitated the beneficial effect of object availability on comprehension improvements. Conclusion Findings suggest that the incorporation of simple mechanical objects into active learning exercises in thermodynamics curricula could facilitate student learning in thermodynamics and potentially other abstract domains.