NSF PAR Search | NSF Public Access Repository


Jumper enables discontinuous transcript assembly in coronaviruses

https://doi.org/10.1038/s41467-021-26944-y

Sashittal, Palash; Zhang, Chuanyi; Peng, Jian; El-Kebir, Mohammed (November 2021, Nature Communications)

Abstract: Genes in SARS-CoV-2 and other viruses in the order of Nidovirales are expressed by a process of discontinuous transcription which is distinct from alternative splicing in eukaryotes and is mediated by the viral RNA-dependent RNA polymerase. Here, we introduce the DISCONTINUOUS TRANSCRIPT ASSEMBLY problem of finding transcripts and their abundances given an alignment of paired-end short reads under a maximum likelihood model that accounts for varying transcript lengths. We show, using simulations, that our method, JUMPER, outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1, SARS-CoV-2 and MERS-CoV samples, we find that JUMPER not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are supported by subsequent orthogonal analyses. Moreover, application of JUMPER on samples with and without treatment reveals viral drug response at the transcript level. As such, JUMPER enables detailed analyses of Nidovirales transcriptomes under varying conditions.
ECNet is an evolutionary context-integrated deep learning framework for protein engineering

https://doi.org/10.1038/s41467-021-25976-8

Luo, Yunan; Jiang, Guangde; Yu, Tianhao; Liu, Yang; Vo, Lam; Ding, Hantian; Su, Yufeng; Qian, Wesley Wei; Zhao, Huimin; Peng, Jian (September 2021, Nature Communications)

Abstract Machine learning has been increasingly used for protein engineering. However, because the general sequence contexts they capture are not specific to the protein being engineered, the accuracy of existing machine learning algorithms is rather limited. Here, we report ECNet (evolutionary context-integrated neural network), a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. This algorithm integrates local evolutionary context from homologous sequences that explicitly model residue-residue epistasis for the protein of interest with the global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. As such, it enables accurate mapping from sequence to function and provides generalization from low-order mutants to higher-order mutants. We show that ECNet predicts the sequence-function relationship more accurately as compared to existing machine learning algorithms by using ~50 deep mutational scanning and random mutagenesis datasets. Moreover, we used ECNet to guide the engineering of TEM-1 β-lactamase and identified variants with improved ampicillin resistance with high success rates.
Learning structural motif representations for efficient protein structure search

https://doi.org/10.1093/bioinformatics/bty585

Liu, Yang; Ye, Qing; Wang, Liwei; Peng, Jian (September 2018, Bioinformatics)

Abstract: Motivation: Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, the computational cost of performing pairwise structural alignment against all structures in PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Despite being efficient, the accuracy of FragBag is unsatisfactory because its backbone fragment library may not be optimally constructed and long-range interacting patterns are omitted. Results: Here we present a new approach to learning effective structural motif presentations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs. Availability and implementation: https://github.com/largelymfs/DeepFold
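
The alignment-free "bag of fragments" idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the FragBag or DeepFold implementation: the function names, the toy two-fragment library, and the distance measure (squared coordinate distance after translating each fragment to the origin, with no rotational superposition) are all assumptions made for illustration.

```python
import math
from collections import Counter

def center(fragment):
    """Translate a fragment so that its first point sits at the origin."""
    x0, y0, z0 = fragment[0]
    return [(x - x0, y - y0, z - z0) for x, y, z in fragment]

def sq_dist(a, b):
    """Sum of squared coordinate differences (assumption: no superposition)."""
    return sum((u - v) ** 2 for p, q in zip(a, b) for u, v in zip(p, q))

def bag_of_fragments(backbone, library, frag_len):
    """Count how many contiguous backbone windows are nearest to each library fragment."""
    counts = Counter()
    for i in range(len(backbone) - frag_len + 1):
        window = center(backbone[i:i + frag_len])
        nearest = min(range(len(library)), key=lambda k: sq_dist(window, library[k]))
        counts[nearest] += 1
    return [counts[k] for k in range(len(library))]

def cosine_similarity(u, v):
    """Compare two proteins using only their fragment-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical library: one step along x, one step along y.
library = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (0, 1, 0)]]
protein_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]
protein_b = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (1, 2, 0)]

vec_a = bag_of_fragments(protein_a, library, frag_len=2)
vec_b = bag_of_fragments(protein_b, library, frag_len=2)
similarity = cosine_similarity(vec_a, vec_b)
```

Once each structure is reduced to a fixed-length vector, searching a database only requires vector comparisons rather than pairwise structural alignments, which is the efficiency argument made in the abstract.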
Few-shot learning creates predictive models of drug response that translate from high-throughput screens to individual patients

https://doi.org/10.1038/s43018-020-00169-2

Ma, Jianzhu; Fong, Samson H.; Luo, Yunan; Bakkenist, Christopher J.; Shen, John Paul; Mourragui, Soufiane; Wessels, Lodewyk F.; Hafner, Marc; Sharan, Roded; Peng, Jian; et al (February 2021, Nature Cancer)
Full Text Available
Integrating thermodynamic and sequence contexts improves protein-RNA binding prediction

https://doi.org/10.1371/journal.pcbi.1007283

Su, Yufeng; Luo, Yunan; Zhao, Xiaoming; Liu, Yang; Peng, Jian; Ay, Ferhat (September 2019, PLOS Computational Biology)

Full Text Available
Typing tumors using pathways selected by somatic evolution

https://doi.org/10.1038/s41467-018-06464-y

Wang, Sheng; Ma, Jianzhu; Zhang, Wei; Shen, John Paul; Huang, Justin; Peng, Jian; Ideker, Trey (December 2018, Nature Communications)

Full Text Available
Reconstructing spatial organizations of chromosomes through manifold learning

https://doi.org/10.1093/nar/gky065

Zhu, Guangxiang; Deng, Wenxuan; Hu, Hailin; Ma, Rui; Zhang, Sai; Yang, Jinglin; Peng, Jian; Kaplan, Tommy; Zeng, Jianyang (February 2018, Nucleic Acids Research)

Full Text Available
Enhancing Evolutionary Couplings with Deep Convolutional Neural Networks

https://doi.org/10.1016/j.cels.2017.11.014

Liu, Yang; Palmedo, Perry; Ye, Qing; Berger, Bonnie; Peng, Jian (January 2018, Cell Systems)

Full Text Available
Large-scale integration of heterogeneous pharmacogenomic data for identifying drug mechanism of action

https://doi.org/10.1142/9789813235533_0005

Luo, Yunan; Wang, Sheng; Xiao, Jinfeng; Peng, Jian (January 2018, Proceedings of the Pacific Symposium)

Full Text Available
A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information

https://doi.org/10.1038/s41467-017-00680-8

Luo, Yunan; Zhao, Xinbin; Zhou, Jingtian; Yang, Jinglin; Zhang, Yanqing; Kuang, Wenhua; Peng, Jian; Chen, Ligong; Zeng, Jianyang (December 2017, Nature Communications)

Full Text Available
