This content will become publicly available on February 1, 2023
- Sponsoring Org: National Science Foundation
More Like this
Integrative analysis of eQTL and GWAS summary statistics reveals transcriptomic alteration in Alzheimer brains
Abstract
Background: Large-scale genome-wide association studies have successfully identified many genetic variants significantly associated with Alzheimer’s disease (AD), such as rs429358, rs11038106, rs723804, rs13591776, and others. The next key step is to understand the function of these SNPs and the downstream biology through which they exert their effect on the development of AD. However, this remains a challenging task due to the tissue-specific nature of transcriptomic and proteomic data and the limited availability of brain tissue. In this paper, instead of using coupled transcriptomic data, we performed an integrative analysis of existing GWAS findings and expression quantitative trait loci (eQTL) results from AD-related brain regions to estimate the transcriptomic alterations in the AD brain.
Results: We used the summary-based Mendelian randomization method along with the heterogeneity in dependent instruments method and identified 32 genes with potentially altered levels in the temporal cortex region. Of these, 10 were further validated using real gene expression data collected from the temporal cortex region, and 19 SNPs from the NECTIN and TOMM40 genes were found to be associated with multiple temporal cortex imaging phenotypes.
Conclusion: Significant pathways from enriched gene networks included neutrophil degranulation, Cell surface interactions at the vascular wall, and Regulation of TP53 activity which are …
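The summary-based Mendelian randomization (SMR) step named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly published form of the SMR test, in which a single SNP z serves as the instrument, the effect of expression x on trait y is estimated as the ratio of the GWAS effect to the eQTL effect, and the test statistic is compared against a chi-square distribution with one degree of freedom. The abstract does not state the exact formulation the authors used, so the function name `smr_test`, its parameters, and the input numbers are all hypothetical.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Single-instrument SMR sketch (assumed standard formulation).

    b_zx, se_zx: eQTL effect of the SNP on gene expression and its standard error.
    b_zy, se_zy: GWAS effect of the SNP on the trait and its standard error.
    Returns (b_xy, t_smr, p_value).
    """
    z_zx = b_zx / se_zx  # z-score of the eQTL association
    z_zy = b_zy / se_zy  # z-score of the GWAS association

    # Ratio estimate of the effect of expression on the trait.
    b_xy = b_zy / b_zx

    # Approximate chi-square (1 df) test statistic.
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)

    # Survival function of chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value


# Hypothetical summary statistics for one SNP-gene pair (not from the paper).
b_xy, t_smr, p_smr = smr_test(b_zx=0.50, se_zx=0.05, b_zy=0.10, se_zy=0.02)
print(f"b_xy={b_xy:.3f}, T_SMR={t_smr:.2f}, p={p_smr:.2e}")
```

A significant SMR result alone does not separate a shared causal variant from linkage between two distinct variants. That is the role of the heterogeneity in dependent instruments test mentioned in the abstract, which this sketch does not cover.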
Noisy matrix completion aims at estimating a low-rank matrix given only partial and corrupted entries. Despite remarkable progress in designing efficient estimation algorithms, it remains largely unclear how to assess the uncertainty of the obtained estimates and how to perform efficient statistical inference on the unknown matrix (e.g., constructing a valid and short confidence interval for an unseen entry). This paper takes a substantial step toward addressing such tasks. We develop a simple procedure to compensate for the bias of the widely used convex and nonconvex estimators. The resulting debiased estimators admit nearly precise nonasymptotic distributional characterizations, which in turn enable optimal construction of confidence intervals/regions for, say, the missing entries and the low-rank factors. Our inferential procedures do not require sample splitting, thus avoiding unnecessary loss of data efficiency. As a byproduct, we obtain a sharp characterization of the estimation accuracy of our debiased estimators in both rate and constant. Our debiased estimators are tractable algorithms that provably achieve full statistical efficiency.
Recent advances in biomedical research have made massive amount of transcriptomic data available in public repositories from different sources. Due to the heterogeneity present in the individual experiments, identifying reproducible biomarkers for a given disease from multiple independent studies has become a major challenge. The widely used meta-analysis approaches, such as Fisher’s method, Stouffer’s method, minP and maxP, have at least two major limitations: (i) they are sensitive to outliers, and (ii) they perform only one statistical test for each individual study, and hence do not fully utilize the potential sample size to gain statistical power.
Here, we propose a gene-level meta-analysis framework that overcomes these limitations and identifies a gene signature that is reliable and reproducible across multiple independent studies of a given disease. The approach provides a comprehensive global signature that can be used to understand the underlying biological phenomena, and a smaller test signature that can be used to classify future samples of a given disease. We demonstrate the utility of the framework by constructing disease signatures for influenza and Alzheimer’s disease using nine datasets including 1108 individuals. These signatures are then validated on 12 independent datasets including 912 individuals. The results indicate that the proposed …
Supplementary data are available at Bioinformatics online.
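The classical combination rules named in the abstract above (Fisher’s method, Stouffer’s method, minP and maxP) can be sketched in a few lines. This is a textbook illustration of those baselines, not the framework the authors propose. It assumes independent studies and one-sided p-values strictly between 0 and 1, and the function names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def fisher(pvals):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2k df under the null."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # half of the chi-square statistic
    # Closed-form survival function for an even number of degrees of freedom (2k):
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer(pvals):
    """Stouffer's method: unweighted sum of z-scores, rescaled by sqrt(k)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)


def min_p(pvals):
    """minP: smallest p-value, with its null distribution for k independent tests."""
    return 1.0 - (1.0 - min(pvals)) ** len(pvals)


def max_p(pvals):
    """maxP: largest p-value, with its null distribution for k independent tests."""
    return max(pvals) ** len(pvals)


# Hypothetical gene with one extreme study and three unremarkable ones.
pvals = [1e-10, 0.6, 0.7, 0.8]
for name, method in [("Fisher", fisher), ("Stouffer", stouffer),
                     ("minP", min_p), ("maxP", max_p)]:
    print(f"{name:8s} combined p = {method(pvals):.3g}")
```

In this example a single extreme study drives the Fisher and minP results even though the other three studies show no signal, which illustrates the outlier sensitivity the abstract describes. Each rule also consumes only one p-value per study, regardless of how many samples that study contains.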
With rapid progress in high-throughput genotyping and neuroimaging, research on complex brain disorders, such as Alzheimer’s Disease (AD), has gained significant attention in recent years. Many prediction models have been studied to relate neuroimaging measures to cognitive status as these diseases progress. Missing data is one of the biggest challenges in accurate cognitive score prediction for subjects in longitudinal neuroimaging studies. To tackle this problem, in this paper we propose a novel formulation to learn an enriched representation for imaging biomarkers that simultaneously captures the information conveyed by baseline neuroimaging records and the information conveyed by progressive variations across the varying numbers of follow-up records available over time. Although the number of brain scans varies across participants, the learned biomarker representation for every participant is a fixed-length vector, which enables us to use traditional learning models to study AD development. Our new objective is formulated to maximize the ratio of the summations of a number of L1-norm distances for improved robustness; this objective, however, is difficult to solve efficiently in general. We therefore derive a new efficient iterative solution algorithm and rigorously prove its convergence. We have performed extensive experiments on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) …
Inferring the Total-Evidence Timescale of Marattialean Fern Evolution in the Face of Model Sensitivity
Folk, Ryan (Ed.)
Abstract
Phylogenetic divergence-time estimation has been revolutionized by two recent developments: 1) total-evidence dating (or "tip-dating") approaches that allow for the incorporation of fossils as tips in the analysis, with their phylogenetic and temporal relationships to the extant taxa inferred from the data, and 2) the fossilized birth-death (FBD) class of tree models that capture the processes that produce the tree (speciation, extinction, and fossilization) and thus provide a coherent and biologically interpretable tree prior. To explore the behavior of these methods, we apply them to marattialean ferns, a group that was dominant in Carboniferous landscapes prior to declining to its modest extant diversity of slightly over 100 species. We show that tree models have a dramatic influence on estimates of both divergence times and topological relationships. This influence is driven by the strong, counter-intuitive informativeness of the uniform tree prior and by the inherent nonidentifiability of divergence-time models. In contrast to the strong influence of the tree models, we find that varying the morphological transition model or the morphological clock model has only minor effects. We compare the performance of a large pool of candidate models using a combination of posterior-predictive simulation and Bayes factors. Notably, an FBD model with epoch-specific speciation …