Title: Using Machine Learning to Test Causal Hypotheses in Conjoint Analysis
Abstract: Conjoint analysis is a popular experimental design used to measure multidimensional preferences. Many researchers focus on estimating the average marginal effects of each factor while averaging over the other factors. Although this allows for straightforward design-based estimation, the results critically depend on the ways in which factors interact with one another. An alternative model-based approach can compute various quantities of interest, but requires correct model specifications, a challenging task for conjoint analysis with many factors. We propose a new hypothesis testing approach based on the conditional randomization test (CRT) to answer the most fundamental question of conjoint analysis: Does a factor of interest matter in any way given the other factors? Although it only provides a formal test of these binary questions, the CRT is solely based on the randomization of factors, and hence requires no modeling assumption. This means that the CRT can provide a powerful and assumption-free statistical test by enabling the use of any test statistic, including those based on complex machine learning algorithms. We also show how to test commonly used regularity assumptions. Finally, we apply the proposed methodology to conjoint analysis of immigration preferences. The proposed methodology is implemented in an open-source R package, CRTConjoint, available through the Comprehensive R Archive Network: https://cran.r-project.org/web/packages/CRTConjoint/index.html.
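To make the randomization-test idea in the abstract concrete, here is a minimal Python sketch. It is an illustration only, not the CRTConjoint package's interface or the authors' implementation. It assumes a simplified design in which the factor of interest is drawn independently and uniformly from its levels, so that resampling it conditional on the other factors reduces to redrawing it from that same uniform distribution. The names `crt_p_value` and `abs_mean_gap` are hypothetical, and the difference-in-means statistic is just a stand-in for any statistic, including one built from a machine learning model.

```python
import random


def crt_p_value(x, y, levels, statistic, draws=999, seed=0):
    """Randomization-test p-value for 'factor x matters for outcome y'.

    Assumption (hypothetical simplification): x was randomized independently
    and uniformly over `levels`, so redrawing it from that distribution
    reproduces the design under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(draws):
        # Redraw the factor from its known randomization distribution.
        x_new = [rng.choice(levels) for _ in x]
        if statistic(x_new, y) >= observed:
            exceed += 1
    # Count the observed statistic as one draw so the p-value is never zero.
    return (1 + exceed) / (1 + draws)


def abs_mean_gap(x, y):
    """Absolute difference in mean outcome between levels 1 and 0."""
    ones = [yi for xi, yi in zip(x, y) if xi == 1]
    zeros = [yi for xi, yi in zip(x, y) if xi == 0]
    if not ones or not zeros:
        return 0.0
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


# Toy data: the outcome equals the factor, so the factor clearly matters.
x_toy = [0, 1] * 50
p_strong = crt_p_value(x_toy, list(x_toy), [0, 1], abs_mean_gap)

# Toy data: a constant outcome, so the factor cannot matter.
p_null = crt_p_value(x_toy, [1] * 100, [0, 1], abs_mean_gap)
```

Because validity comes only from redrawing the factor according to the known design, the statistic can be swapped for something far more flexible without changing the rest of the procedure.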
Award ID(s):
2045981
PAR ID:
10519403
Author(s) / Creator(s):
; ;
Publisher / Repository:
Cambridge University Press
Date Published:
Journal Name:
Political Analysis
ISSN:
1047-1987
Page Range / eLocation ID:
1 to 16
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Schwartz, Russell (Ed.)
    Abstract Motivation: While gene–environment (GxE) interactions contribute importantly to many different phenotypes, detecting such interactions requires well-powered studies and has proven difficult. To address this, we combine two approaches to improve GxE power: simultaneously evaluating multiple phenotypes and using a two-step analysis approach. Previous work shows that the power to identify a main genetic effect can be improved by simultaneously analyzing multiple related phenotypes. For a univariate phenotype, two-step methods produce higher power for detecting a GxE interaction compared to single-step analysis. Therefore, we propose a two-step approach to test for an overall GxE effect for multiple phenotypes. Results: Using simulations, we demonstrate that, when more than one phenotype has a GxE effect (i.e. GxE pleiotropy), our approach offers a substantial gain in power (18–43%) to detect an aggregate-level GxE effect for a multivariate phenotype compared to an analogous two-step method to identify a GxE effect for a univariate phenotype. We applied the proposed approach to simultaneously analyze three lipids (LDL, HDL and triglycerides) with the frequency of alcohol consumption as the environmental factor in the UK Biobank. The method identified two loci with an overall GxE effect on the vector of lipids, one of which was missed by the competing approaches. Availability and implementation: We provide an R package, MPGE, implementing the proposed approach, which is available from CRAN: https://cran.r-project.org/web/packages/MPGE/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Abstract Hardy–Weinberg proportions (HWP) are often explored to evaluate the assumption of random mating. However, in autopolyploids, organisms with more than two sets of homologous chromosomes, HWP and random mating are different hypotheses that require different statistical testing approaches. Currently, the only available methods to test for random mating in autopolyploids (i) heavily rely on asymptotic approximations and (ii) assume genotypes are known, ignoring genotype uncertainty. Furthermore, these approaches are all frequentist, and so do not carry the benefits of Bayesian analysis, including ease of interpretability, incorporation of prior information, and consistency under the null. Here, we present Bayesian approaches to test for random mating, bringing the benefits of Bayesian analysis to this problem. Our Bayesian methods also (i) do not rely on asymptotic approximations, being appropriate for small sample sizes, and (ii) optionally account for genotype uncertainty via genotype likelihoods. We validate our methods in simulations and demonstrate on two real datasets how testing for random mating is more useful for detecting genotyping errors than testing for HWP (in a natural population) and testing for Mendelian segregation (in an experimental S1 population). Our methods are implemented in Version 2.0.2 of the hwep R package on the Comprehensive R Archive Network: https://cran.r-project.org/package=hwep.
  3. Abstract Motivation: Mendelian randomization (MR) infers causal relationships between exposures and outcomes using genetic variants as instrumental variables. Typically, MR considers only a pair of exposure and outcome at a time, limiting its capability of capturing the entire causal network. We overcome this limitation by developing MR.RGM (Mendelian randomization via reciprocal graphical model), a fast R package that implements the Bayesian reciprocal graphical model and enables practitioners to construct holistic causal networks with possibly cyclic/reciprocal causation and proper uncertainty quantifications, offering a comprehensive understanding of complex biological systems and their interconnections. Results: We developed MR.RGM, an open-source R package that applies bidirectional MR using a network-based strategy, enabling the exploration of causal relationships among multiple variables in complex biological systems. MR.RGM holds the promise of unveiling intricate interactions and advancing our understanding of genetic networks, disease risks, and phenotypic complexities. Availability and implementation: MR.RGM is available at CRAN (https://CRAN.R-project.org/package=MR.RGM, DOI: 10.32614/CRAN.package.MR.RGM) and https://github.com/bitansa/MR.RGM.
  4. Wren, Jonathan (Ed.)
    Abstract Summary: Heterogeneity is a hallmark of many complex human diseases, and unsupervised heterogeneity analysis has been extensively conducted using high-throughput molecular measurements and histopathological imaging features. ‘Classic’ heterogeneity analysis has been based on simple statistics such as mean, variance and correlation. Network-based analysis takes interconnections as well as individual variable properties into consideration and can be more informative. Several Gaussian graphical model (GGM)-based heterogeneity analysis techniques have been developed, but friendly and portable software is still lacking. To facilitate more extensive usage, we develop the R package HeteroGGM, which conducts GGM-based heterogeneity analysis using advanced penalization techniques, can provide informative summary and graphical presentation, and is efficient and friendly. Availability and implementation: The package is available at https://CRAN.R-project.org/package=HeteroGGM. Supplementary information: Supplementary data are available at Bioinformatics online.
  5. Abstract We introduce the UWHAM (binless weighted histogram analysis method) and SWHAM (stochastic UWHAM) software package that can be used to estimate the density of states and free energy differences based on the data generated by multi-state simulations. The programs used to solve the UWHAM equations are written in the C++ language and operated via the command line interface. In this paper, first we review the theoretical bases of UWHAM, its stochastic solver RE-SWHAM (replica exchange-like SWHAM) and ST-SWHAM (serial tempering-like SWHAM). Then we provide a tutorial with examples that explains how to apply the UWHAM program package to analyze the data generated by different types of multi-state simulations: umbrella sampling, replica exchange, free energy perturbation simulations, etc. The tutorial examples also show that the UWHAM equations can be solved stochastically by applying the RE-SWHAM and ST-SWHAM programs when the data ensemble is large. If the simulations at some states are far from equilibrium, the Stratified RE-SWHAM program can be applied to obtain the equilibrium distribution of the state of interest. All the source codes and the tutorial examples are available from our group’s web page: https://ronlevygroup.cst.temple.edu/software/UWHAM_and_SWHAM_webpage/index.html.