Abstract Summary Despite the availability of existing calculators for statistical power analysis in genetic association studies, there has not been a model-invariant and test-independent tool that allows for both planning of prospective studies and systematic review of reported findings. In this work, we develop a web-based application U-PASS (Unified Power analysis of ASsociation Studies), implementing a unified framework for the analysis of common association tests for binary qualitative traits. The application quantifies the shared asymptotic power limits of the common association tests, and visualizes the fundamental statistical trade-off between risk allele frequency and odds ratio. The application also addresses the applicability of asymptotics-based power calculations in finite samples, and provides guidelines for single-SNP-based association tests. In addition to designing prospective studies, U-PASS enables researchers to retrospectively assess the statistical validity of previously reported associations. Availability and implementation U-PASS is an open-source R Shiny application. A live instance is hosted at https://power.stat.lsa.umich.edu. Source is available on https://github.com/Pill-GZ/U-PASS. Supplementary information Supplementary data are available at Bioinformatics online.
                            Unpaired data empowers association tests
                        
                    
    
            Abstract Motivation There is growing interest in the biomedical research community in incorporating retrospective data, available in healthcare systems, to shed light on associations between different biomarkers. Understanding the association between various types of biomedical data, such as genetic data, blood biomarkers, and imaging, can provide a holistic understanding of human diseases. To formally test a hypothesized association between two types of data in Electronic Health Records (EHRs), one requires a substantial sample size with both data modalities to achieve reasonable power. Current association test methods only allow using data from individuals who have both data modalities. Hence, researchers cannot take advantage of much larger EHR samples that include individuals with at least one of the data types, which limits the power of the association test. Results We present a new method called the Semi-paired Association Test (SAT) that makes use of both paired and unpaired data. In contrast to classical approaches, incorporating unpaired data allows SAT to produce better control of false discovery and to improve the power of the association test. We study the properties of the new test theoretically and empirically, through a series of simulations and by applying our method to real studies in the context of Chronic Obstructive Pulmonary Disease. We are able to identify an association between the high-dimensional characterization of Computed Tomography chest images and several blood biomarkers as well as the expression of dozens of genes involved in the immune system. Availability and implementation Code is available on https://github.com/batmanlab/Semi-paired-Association-Test. Supplementary information Supplementary data are available at Bioinformatics online.
                            - Award ID(s):
- 1839332
- PAR ID:
- 10299293
- Editor(s):
- Alfonso, Valencia
- Date Published:
- Journal Name:
- Bioinformatics
- Volume:
- 37
- Issue:
- 6
- ISSN:
- 1367-4803
- Page Range / eLocation ID:
- 785 to 792
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Abstract The development of paired appendages was a key innovation during evolution and facilitated the aquatic to terrestrial transition of vertebrates. Largely derived from the lateral plate mesoderm (LPM), one hypothesis for the evolution of paired fins invokes derivation from unpaired median fins via a pair of lateral fin folds located between pectoral and pelvic fin territories¹. Whilst unpaired and paired fins exhibit similar structural and molecular characteristics, no definitive evidence exists for paired lateral fin folds in larvae or adults of any extant or extinct species. As unpaired fin core components are regarded as exclusively derived from paraxial mesoderm, any transition presumes both co-option of a fin developmental programme to the LPM and bilateral duplication². Here, we identify that the larval zebrafish unpaired pre-anal fin fold (PAFF) is derived from the LPM and thus may represent a developmental intermediate between median and paired fins. We trace the contribution of LPM to the PAFF in both cyclostomes and gnathostomes, supporting the notion that this is an ancient trait of vertebrates. Finally, we observe that the PAFF can be bifurcated by increasing bone morphogenetic protein signalling, generating LPM-derived paired fin folds. Our work provides evidence that lateral fin folds may have existed as embryonic anlage for elaboration to paired fins.
- 
            We present a method to learn a joint multimodal representation space that enables recognition of unseen activities in videos. We first compare the effect of placing various constraints on the embedding space using paired text and video data. We also propose a method to improve the joint embedding space using an adversarial formulation, allowing it to benefit from unpaired text and video data. By using unpaired text data, we show the ability to learn a representation that better captures unseen activities. In addition to testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that using paired and unpaired data to learn a shared embedding space benefits three difficult tasks: (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning, outperforming the state of the art.
- 
            Unpaired data training enables super-resolution confocal microscopy from low-resolution acquisitions
            Supervised deep-learning models have enabled super-resolution imaging in several microscopic imaging modalities, increasing the spatial lateral bandwidth of the original input images beyond the diffraction limit. Despite their success, their practical application poses several challenges in terms of the amount of training data and its quality, requiring the experimental acquisition of large, paired databases to generate an accurate generalized model whose performance remains invariant to unseen data. Cycle-consistent generative adversarial networks (cycleGANs) are unsupervised models for image-to-image translation tasks that are trained on unpaired datasets. This paper introduces a cycleGAN framework specifically designed to increase the lateral resolution limit in confocal microscopy by training a cycleGAN model using low- and high-resolution unpaired confocal images of human glioblastoma cells. Training and testing performances of the cycleGAN model have been assessed by measuring specific metrics such as background standard deviation, peak-to-noise ratio, and a customized frequency content measure. Our cycleGAN model has been evaluated in terms of image fidelity and resolution improvement using a paired dataset, showing performance superior to that of other reported methods. This work highlights the efficacy and promise of cycleGAN models in tackling super-resolution microscopic imaging without paired training, paving the path for turning home-built low-resolution microscopic systems into low-cost super-resolution instruments by means of unsupervised deep learning.
- 
            Expression quantitative trait loci (eQTLs), or single-nucleotide polymorphisms that affect average gene expression levels, provide important insights into context-specific gene regulation. Classic eQTL analyses use one-to-one association tests, which test gene–variant pairs individually and ignore correlations induced by gene regulatory networks and linkage disequilibrium. Probabilistic topic models, such as latent Dirichlet allocation, estimate latent topics for a collection of count observations. Prior multimodal frameworks that bridge genotype and expression data assume matched sample numbers between modalities. However, many data sets have a nested structure where one individual has several associated gene expression samples and a single germline genotype vector. Here, we build a telescoping bimodal latent Dirichlet allocation (TBLDA) framework to learn shared topics across gene expression and genotype data that allows multiple RNA sequencing samples to correspond to a single individual’s genotype. By using raw count data, our model avoids possible adulteration via normalization procedures. Ancestral structure is captured in a genotype-specific latent space, effectively removing it from shared components. Using GTEx v8 expression data across 10 tissues and genotype data, we show that the estimated topics capture meaningful and robust biological signal in both modalities and identify associations within and across tissue types. We identify 4,645 cis-eQTLs and 995 trans-eQTLs by conducting eQTL mapping between the most informative features in each topic. Our TBLDA model is able to identify associations using raw sequencing count data when the samples in two separate data modalities are matched one-to-many, as is often the case in biological data. Our code is freely available at https://github.com/gewirtz/TBLDA.
 An official website of the United States government