Search for: All records

Creators/Authors contains: "Gerstein, Mark"

Illuminating links between cis-regulators and trans-acting variants in the human prefrontal cortex

https://doi.org/10.1186/s13073-022-01133-8

Liu, Shuang ; Won, Hyejung ; Clarke, Declan ; Matoba, Nana ; Khullar, Saniya ; Mu, Yudi ; Wang, Daifeng ; Gerstein, Mark ( November 2022 , Genome Medicine)

Abstract Background
Neuropsychiatric disorders afflict a large portion of the global population and constitute a significant source of disability worldwide. Although Genome-wide Association Studies (GWAS) have identified many disorder-associated variants, the underlying regulatory mechanisms linking them to disorders remain elusive, especially those involving distant genomic elements. Expression quantitative trait loci (eQTLs) constitute a powerful means of providing this missing link. However, most eQTL studies in human brains have focused exclusively on cis-eQTLs, which link variants to nearby genes (i.e., those within 1 Mb of a variant). A complete understanding of disease etiology requires a clearer understanding of trans-regulatory mechanisms, which, in turn, entails a detailed analysis of the relationships between variants and expression changes in distant genes.
Methods
By leveraging large datasets from the PsychENCODE consortium, we conducted a genome-wide survey of trans-eQTLs in the human dorsolateral prefrontal cortex. We also performed colocalization and mediation analyses to identify mediators in trans-regulation and use trans-eQTLs to link GWAS loci to schizophrenia risk genes.
Results
We identified ~80,000 candidate trans-eQTLs (at FDR<0.25) that influence the expression of ~10,000 target genes (i.e., “trans-eGenes”). We found that many variants associated with these candidate trans-eQTLs overlap with known cis-eQTLs. Moreover, for >60% of these variants (by colocalization), the cis-eQTL’s target gene acts as a mediator for the trans-eQTL SNP’s effect on the trans-eGene, highlighting examples of cis-mediation as essential for trans-regulation. Furthermore, many of these colocalized variants fall into a discernible pattern wherein the cis-eQTL’s target is a transcription factor or RNA-binding protein, which, in turn, targets the gene associated with the candidate trans-eQTL. Finally, we show that trans-regulatory mechanisms provide valuable insights into psychiatric disorders: beyond what had been possible using only cis-eQTLs, we link an additional 23 GWAS loci and 90 risk genes (using colocalization between candidate trans-eQTLs and schizophrenia GWAS loci).
Conclusions
We demonstrate that the transcriptional architecture of the human brain is orchestrated by both cis- and trans-regulatory variants and found that trans-eQTLs provide insights into brain-disease biology.

Binding peptide generation for MHC Class I proteins with deep reinforcement learning

https://doi.org/10.1093/bioinformatics/btad055

Chen, Ziqi ; Zhang, Baoyi ; Guo, Hongyu ; Emani, Prashant ; Clancy, Trevor ; Jiang, Chongming ; Gerstein, Mark ; Ning, Xia ; Cheng, Chao ; Min, Martin Renqiang ; et al ( January 2023 , Bioinformatics)

Abstract Motivation
MHC Class I proteins play an important role in immunotherapy by presenting immunogenic peptides to anti-tumor immune cells. The repertoires of peptides for various MHC Class I proteins are distinct, which is reflected in their diverse binding motifs. To characterize binding motifs for MHC Class I proteins, in vitro experiments have been conducted to screen peptides with high binding affinities to hundreds of given MHC Class I proteins. However, given the tens of thousands of known MHC Class I proteins, conducting in vitro experiments across so many MHC proteins is infeasible, and thus a more efficient and scalable way to characterize binding motifs is needed.
Results
We present a de novo generation framework, coined PepPPO, to characterize binding motifs for any given MHC Class I protein by generating the repertoire of peptides it presents. PepPPO leverages a reinforcement learning agent with a mutation policy to mutate random input peptides into positive presented ones. Using PepPPO, we characterized binding motifs for around 10 000 known human MHC Class I proteins with and without experimental data. These computed motifs demonstrated high similarities with those derived from experimental data. In addition, we found that the motifs could be used for the rapid screening of neoantigens at a much lower time cost than previous deep-learning methods.
Availability and implementation
The software can be found in https://github.com/minrq/pMHC.
Supplementary information
Supplementary data are available at Bioinformatics online.

Network propagation-based prioritization of long tail genes in 17 cancer types

https://doi.org/10.1186/s13059-021-02504-x

Mohsen, Hussein ; Gunasekharan, Vignesh ; Qing, Tao ; Seay, Montrell ; Surovtseva, Yulia ; Negahban, Sahand ; Szallasi, Zoltan ; Pusztai, Lajos ; Gerstein, Mark B. ( October 2021 , Genome Biology)

Abstract Background
The diversity of genomic alterations in cancer poses challenges to fully understanding the etiologies of the disease. Recent interest in infrequent mutations, in genes that reside in the “long tail” of the mutational distribution, uncovered new genes with significant implications in cancer development. The study of cancer-relevant genes often requires integrative approaches pooling together multiple types of biological data. Network propagation methods demonstrate high efficacy in achieving this integration. Yet, the majority of these methods focus their assessment on detecting known cancer genes or identifying altered subnetworks. In this paper, we introduce a network propagation approach that entirely focuses on prioritizing long tail genes with potential functional impact on cancer development.
Results
We identify sets of often overlooked, rarely to moderately mutated genes whose biological interactions significantly propel their mutation-frequency-based rank upwards during propagation in 17 cancer types. We call these sets “upward mobility genes” and hypothesize that their significant rank improvement indicates functional importance. We report new cancer-pathway associations based on upward mobility genes that are not previously identified using driver genes alone, validate their role in cancer cell survival in vitro using extensive genome-wide RNAi and CRISPR data repositories, and further conduct in vitro functional screenings resulting in the validation of 18 previously unreported genes.
Conclusion
Our analysis extends the spectrum of cancer-relevant genes and identifies novel potential therapeutic targets.
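
The rank-shift idea in this abstract can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions, not the authors' method: it uses a generic random walk with restart as the propagation step, and the function names (`propagate`, `ranks`, `upward_mobility`), the restart weight `alpha`, and the four-gene network are all hypothetical.

```python
import numpy as np

def propagate(adj, scores, alpha=0.5, iters=100, tol=1e-9):
    """Generic random walk with restart: p <- alpha * W p + (1 - alpha) * p0."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes: avoid division by zero
    w = adj / col_sums                     # column-normalized adjacency
    p0 = np.asarray(scores, dtype=float)
    p0 = p0 / p0.sum()                     # initial scores as a distribution
    p = p0.copy()
    for _ in range(iters):
        nxt = alpha * (w @ p) + (1 - alpha) * p0
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p

def ranks(values):
    """Rank 0 is the highest value."""
    order = np.argsort(-np.asarray(values, dtype=float), kind="stable")
    r = np.empty(len(order), dtype=int)
    r[order] = np.arange(len(order))
    return r

def upward_mobility(adj, freq, alpha=0.5):
    """Positive entries are genes whose rank improves after propagation."""
    return ranks(freq) - ranks(propagate(adj, freq, alpha))

# Hypothetical toy network: gene 3 is rarely mutated but interacts with the
# two most frequently mutated genes (0 and 1); gene 2 has no interactions.
adj = np.array([[0, 0, 0, 1],
                [0, 0, 0, 1],
                [0, 0, 0, 0],
                [1, 1, 0, 0]])
freq = [12, 8, 3, 1]                       # made-up mutation frequencies
mobility = upward_mobility(adj, freq)
```

In this toy case the rarely mutated gene 3 climbs past genes 1 and 2 once its interactions are taken into account, which is the kind of rank improvement the abstract describes.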

Cyclic and multilevel causation in evolutionary processes

https://doi.org/10.1007/s10539-020-09753-3

Warrell, Jonathan ; Gerstein, Mark ( October 2020 , Biology & Philosophy)
Abstract Many models of evolution are implicitly causal processes. Features such as causal feedback between evolutionary variables and evolutionary processes acting at multiple levels, though, mean that conventional causal models miss important phenomena. We develop here a general theoretical framework for analyzing evolutionary processes drawing on recent approaches to causal modeling developed in the machine-learning literature, which have extended Pearl's do-calculus to incorporate cyclic causal interactions and multilevel causation. We also develop information-theoretic notions necessary to analyze causal information dynamics in our framework, introducing a causal generalization of the Partial Information Decomposition framework. We show how our causal framework helps to clarify conceptual issues in the contexts of complex trait analysis and cancer genetics, including assigning variation in an observed trait to genetic, epigenetic and environmental sources in the presence of epigenetic and environmental feedback processes, and variation in fitness to mutation processes in cancer using a multilevel causal model respectively, as well as relating causally-induced to observed variation in these variables via information theoretic bounds. In the process, we introduce a general class of multilevel causal evolutionary processes which connect evolutionary processes at multiple levels via coarse-graining relationships. Further, we show how a range of fitness models can be formulated in our framework, as well as a causal analog of Price's equation (generalizing the probabilistic Rice equation), clarifying the relationships between realized/probabilistic fitness and direct/indirect selection. Finally, we consider the potential relevance of our framework to foundational issues in biology and evolution, including supervenience, multilevel selection and individuality. In particular, we argue that our class of multilevel causal evolutionary processes, in conjunction with a minimum description length principle, provides a conceptual framework in which identification of multiple levels of selection may be reduced to a model selection problem.
Predicting the frequencies of drug side effects

https://doi.org/10.1038/s41467-020-18305-y

Galeano, Diego ; Li, Shantao ; Gerstein, Mark ; Paccanaro, Alberto ( December 2020 , Nature Communications)
Abstract A central issue in drug risk-benefit assessment is identifying frequencies of side effects in humans. Currently, frequencies are experimentally determined in randomised controlled clinical trials. We present a machine learning framework for computationally predicting frequencies of drug side effects. Our matrix decomposition algorithm learns latent signatures of drugs and side effects that are both reproducible and biologically interpretable. We show the usefulness of our approach on 759 structurally and therapeutically diverse drugs and 994 side effects from all human physiological systems. Our approach can be applied to any drug for which a small number of side effect frequencies have been identified, in order to predict the frequencies of further, yet unidentified, side effects. We show that our model is informative of the biology underlying drug activity: individual components of the drug signatures are related to the distinct anatomical categories of the drugs and to the specific drug routes of administration.
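
To make the idea of learning latent drug and side-effect signatures concrete, here is a minimal sketch of a masked non-negative matrix factorization. It is an assumption-laden illustration, not the algorithm from the paper: the multiplicative update rule, the function name `masked_nmf`, the mask convention, and the toy frequency matrix are all hypothetical.

```python
import numpy as np

def masked_nmf(R, mask, k=1, iters=500, seed=0, eps=1e-9):
    """Factor R ~ W @ H with W, H >= 0, fitting only entries where mask == 1."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    W = rng.random((n, k)) + 0.1           # drug signatures (rows = drugs)
    H = rng.random((k, m)) + 0.1           # side-effect signatures (columns)
    for _ in range(iters):
        WH = W @ H
        W *= ((mask * R) @ H.T) / ((mask * WH) @ H.T + eps)
        WH = W @ H
        H *= (W.T @ (mask * R)) / (W.T @ (mask * WH) + eps)
    return W, H

# Hypothetical toy data: 3 drugs x 4 side effects with rank-1 structure.
R = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0, 1.0])
mask = np.ones_like(R)
mask[0, 3] = 0                             # treat two frequencies as unknown
mask[2, 0] = 0
W, H = masked_nmf(R, mask)
pred = W @ H                               # includes estimates for hidden cells
```

The hidden cells are never seen during fitting, so `pred[0, 3]` and `pred[2, 0]` play the role of predicted frequencies for side effects that have not yet been measured.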
Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks

https://doi.org/10.1371/journal.pcbi.1008291

Li, Bian ; Yang, Yucheng T. ; Capra, John A. ; Gerstein, Mark B. ( November 2020 , PLOS Computational Biology)
Fariselli, Piero (Ed.)
Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
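
The notion of a multi-channel voxel grid built from atom coordinates can be sketched as follows. This is a simplified illustration, not ThermoNet's featurization: the function name `voxelize`, the box size, the grid resolution, the simple count-per-voxel encoding, and the channel assignment are all assumptions.

```python
import numpy as np

def voxelize(coords, channels, n_channels, center, box=16.0, n=16):
    """Bin atoms into an (n_channels, n, n, n) grid centered on `center`.

    coords:   (N, 3) atom coordinates
    channels: length-N integer channel index per atom (e.g. a property class)
    box:      edge length of the cubic box; n: voxels per edge
    """
    grid = np.zeros((n_channels, n, n, n), dtype=np.float32)
    voxel = box / n
    shifted = np.asarray(coords, dtype=float) - np.asarray(center, dtype=float)
    idx = np.floor((shifted + box / 2) / voxel).astype(int)
    for (i, j, k), c in zip(idx, channels):
        if 0 <= i < n and 0 <= j < n and 0 <= k < n:   # drop atoms outside the box
            grid[c, i, j, k] += 1.0
    return grid

# Hypothetical example: three atoms, two property channels.
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [100.0, 0.0, 0.0]]
grid = voxelize(coords, channels=[0, 1, 0], n_channels=2, center=[0.0, 0.0, 0.0])
```

A grid of this shape can be passed to a 3D convolutional layer in the same way a stack of image channels is passed to a 2D one; the third atom lies outside the box and is ignored.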
Comparing Technological Development and Biological Evolution from a Network Perspective

https://doi.org/10.1016/j.cels.2020.02.004

Yan, Koon-Kiu ; Wang, Daifeng ; Xiong, Kun ; Gerstein, Mark ( March 2020 , Cell Systems)

Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients

https://doi.org/10.1186/s13059-020-02033-z

Spakowicz, Daniel ; Lou, Shaoke ; Barron, Brian ; Gomez, Jose L. ; Li, Tianxiao ; Liu, Qing ; Grant, Nicole ; Yan, Xiting ; Hoyd, Rebecca ; Weinstock, George ; et al ( December 2020 , Genome Biology)

Abstract Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA-link (LDA: latent Dirichlet allocation) connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes.
Quantum computing at the frontiers of biological sciences

https://doi.org/10.1038/s41592-020-01004-3

Emani, Prashant S. ; Warrell, Jonathan ; Anticevic, Alan ; Bekiranov, Stefan ; Gandal, Michael ; McConnell, Michael J. ; Sapiro, Guillermo ; Aspuru-Guzik, Alán ; Baker, Justin T. ; Bastiani, Matteo ; et al ( July 2021 , Nature Methods)
Encoding human serine phosphopeptides in bacteria for proteome-wide identification of phosphorylation-dependent interactions

https://doi.org/10.1038/nbt.4150

Barber, Karl W ; Muir, Paul ; Szeligowski, Richard V ; Rogulina, Svetlana ; Gerstein, Mark ; Sampson, Jeffrey R ; Isaacs, Farren J ; Rinehart, Jesse ( June 2018 , Nature Biotechnology)
