Protein–peptide interactions play a crucial role in various cellular processes and are implicated in abnormal cellular behaviors that lead to diseases such as cancer. Understanding these interactions is therefore vital for both functional genomics and drug discovery. Despite a significant increase in the availability of protein–peptide complexes, experimental methods for studying these interactions remain laborious, time-consuming, and expensive. Computational methods offer a complementary approach but often fall short in prediction accuracy. To address these challenges, we introduce PepCNN, a deep learning-based prediction model that incorporates structural and sequence-based information from primary protein sequences. By combining half-sphere exposure, position-specific scoring matrices from a multiple-sequence alignment tool, and embeddings from a pre-trained protein language model, PepCNN outperforms state-of-the-art methods in specificity, precision, and AUC. The PepCNN software and datasets are publicly available at https://github.com/abelavit/PepCNN.git.
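The per-residue feature construction that the PepCNN abstract describes (half-sphere exposure, PSSM rows, and protein-language-model embeddings, concatenated and scanned by a 1D convolution) can be sketched as follows. All dimensions, the window length, and the single-kernel convolution are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Hypothetical per-residue features for a window of W residues around a target site.
# The sizes below are illustrative assumptions, not PepCNN's actual dimensions.
W = 9            # sequence window length (assumption)
HSE_DIM = 2      # half-sphere exposure: up/down contact counts
PSSM_DIM = 20    # position-specific scoring matrix row (one score per amino acid)
PLM_DIM = 32     # pre-trained protein language model embedding (illustrative size)

rng = np.random.default_rng(0)
hse = rng.standard_normal((W, HSE_DIM))
pssm = rng.standard_normal((W, PSSM_DIM))
plm = rng.standard_normal((W, PLM_DIM))

# PepCNN-style input: concatenate the three feature groups per residue,
# giving a (window, channels) matrix that a 1D CNN can scan along the sequence.
x = np.concatenate([hse, pssm, plm], axis=1)
assert x.shape == (W, HSE_DIM + PSSM_DIM + PLM_DIM)

# A single 1D convolution over the window (kernel size 3, one output channel),
# standing in for the model's convolutional layers.
kernel = rng.standard_normal((3, x.shape[1]))
conv_out = np.array([np.sum(x[i:i + 3] * kernel) for i in range(W - 2)])
print(conv_out.shape)  # (7,)
```

A real model would stack several such convolutions and feed the result to a classifier head; this only shows how the three feature sources combine into one input tensor.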
In pursuit of scientific discovery, vast collections of unstructured structural and functional images are acquired; however, only an infinitesimally small fraction of this data is rigorously analyzed, with an even smaller fraction ever being published. One method to accelerate scientific discovery is to extract more insight from costly scientific experiments already conducted. Unfortunately, data from scientific experiments tend to be accessible only to the originator, who knows the experimental context and directives. Moreover, there are no robust methods to search unstructured databases of images to deduce correlations and insight. Here, we develop a machine learning approach to create image similarity projections to search unstructured image databases. To improve these projections, we develop and train a model to include symmetry-aware features. As an exemplar, we use a set of 25,133 piezoresponse force microscopy images collected on diverse materials systems over five years. We demonstrate how this tool can be used for interactive recursive image searching and exploration, highlighting structural similarities at various length scales. This tool justifies continued investment in federated scientific databases with standardized metadata schemas where the combination of filtering and recursive interactive searching can uncover synthesis-structure-property relations. We provide a customizable open-source package (
- Award ID(s):
- 1839234
- NSF-PAR ID:
- 10307701
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- npj Computational Materials
- Volume:
- 7
- Issue:
- 1
- ISSN:
- 2057-3960
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
https://github.com/abelavit/PepCNN.git . -
Abstract Background The pan-genome of a species is the union of the genes and non-coding sequences present in all individuals (cultivars, accessions, or strains) within that species.
Results Here we introduce PGV, a reference-agnostic representation of the pan-genome of a species based on the notion of consensus ordering. Our experimental results demonstrate that PGV enables an intuitive, effective and interactive visualization of a pan-genome by providing a genome browser that can elucidate complex structural genomic variations.
Conclusions The PGV software can be installed via conda or downloaded from
https://github.com/ucrbioinfo/PGV . The companion PGV browser at http://pgv.cs.ucr.edu can be tested using example bed tracks available from the GitHub page. -
Abstract Rejecting cosmic rays (CRs) is essential for the scientific interpretation of CCD-captured data, but detecting CRs in single-exposure images has remained challenging. Conventional CR detectors require experimental parameter tuning for different instruments, and recent deep-learning methods only produce instrument-specific models that suffer from performance loss on telescopes not included in the training data. We present Cosmic-CoNN, a generic CR detector deployed for 24 telescopes at the Las Cumbres Observatory, which has been made possible by the three contributions in this work: (1) We build a large and diverse ground-based CR data set leveraging thousands of images from a global telescope network. (2) We propose a novel loss function and a neural network optimized for telescope imaging data to train generic CR-detection models. At 95% recall, our model achieves a precision of 93.70% on Las Cumbres imaging data and maintains a consistent performance on new ground-based instruments never used for training. Specifically, the Cosmic-CoNN model trained on the Las Cumbres CR data set maintains high precisions of 92.03% and 96.69% on Gemini GMOS-N/S 1 × 1 and 2 × 2 binning images, respectively. (3) We build a suite of tools including an interactive CR mask visualization and editing interface, console commands, and Python APIs to make automatic, robust CR detection widely accessible by the community of astronomers. Our data set, open-source code base, and trained models are available at
https://github.com/cy-xu/cosmic-conn . -
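The operating point the Cosmic-CoNN abstract reports (precision at 95% recall) can be computed from per-pixel detector scores as sketched below. The scores and labels are synthetic and the thresholding routine is generic, not the project's actual evaluation code:

```python
import numpy as np

def precision_at_recall(scores, labels, target_recall=0.95):
    """Lower the score threshold until target_recall is reached;
    return the precision at that operating point."""
    order = np.argsort(-scores)          # highest-scoring pixels first
    labels = labels[order]
    tp = np.cumsum(labels)               # true positives as threshold lowers
    fp = np.cumsum(1 - labels)
    recall = tp / labels.sum()           # nondecreasing in this ordering
    k = np.searchsorted(recall, target_recall)  # first cut reaching the target
    return tp[k] / (tp[k] + fp[k])

rng = np.random.default_rng(2)
labels = (rng.random(10_000) < 0.1).astype(int)  # ~10% cosmic-ray pixels (toy rate)
# Positives score higher on average than negatives (toy, imperfectly separated model).
scores = rng.normal(labels * 2.0, 1.0)
p = precision_at_recall(scores, labels)
print(round(float(p), 3))
```

Sweeping `target_recall` traces out the precision-recall curve used to compare detectors across instruments.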
Abstract We present a critical analysis of physics-informed neural operators (PINOs) to solve partial differential equations (PDEs) that are ubiquitous in the study and modeling of physics phenomena using carefully curated datasets. Further, we provide a benchmarking suite which can be used to evaluate PINOs in solving such problems. We first demonstrate that our methods reproduce the accuracy and performance of other neural operators published elsewhere in the literature to learn the 1D wave equation and the 1D Burgers equation. Thereafter, we apply our PINOs to learn new types of equations, including the 2D Burgers equation in its scalar, inviscid, and vector forms. Finally, we show that our approach is also applicable to learn the physics of the 2D linear and nonlinear shallow water equations, which involve three coupled PDEs. We release our artificial intelligence surrogates and scientific software to produce initial data and boundary conditions to study a broad range of physically motivated scenarios. We provide the
source code, an interactive website to visualize the predictions of our PINOs, and a tutorial for their use at the Data and Learning Hub for Science. -
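For concreteness, the Burgers equations referenced above take the following standard forms (with velocity $u$ and viscosity $\nu$); the 2D scalar variant shown follows one common convention and may differ from the paper's exact setup:

```latex
% 1D viscous Burgers equation
\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x}
  = \nu\,\frac{\partial^2 u}{\partial x^2}

% 2D scalar, inviscid Burgers equation (one common convention)
\frac{\partial u}{\partial t}
  + u\left(\frac{\partial u}{\partial x} + \frac{\partial u}{\partial y}\right) = 0
```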
Abstract Background Fusion of RNA-binding proteins (RBPs) to RNA base-editing enzymes (such as APOBEC1 or ADAR) has emerged as a powerful tool for the discovery of RBP binding sites. However, current methods that analyze sequencing data from RNA base-editing experiments are vulnerable to false positives due to off-target editing, genetic variation, and sequencing errors.
Results We present FLagging Areas of RNA-editing Enrichment (FLARE), a Snakemake-based pipeline that builds on the outputs of the SAILOR edit site discovery tool to identify regions statistically enriched for RNA editing. FLARE can be configured to analyze any type of RNA editing, including C-to-U and A-to-I. We applied FLARE to C-to-U editing data from an RBFOX2-APOBEC1 STAMP experiment to show that our approach attains high specificity for detecting RBFOX2 binding sites. We also applied FLARE to detect regions of exogenously introduced as well as endogenous A-to-I editing.
Conclusions FLARE is a fast and flexible workflow that identifies significantly edited regions from RNA-seq data. The FLARE codebase is available at
https://github.com/YeoLab/FLARE .
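The region-level enrichment idea in the FLARE abstract can be illustrated with a one-sided binomial test on a window of covered sites: under a background per-site editing rate, does the window contain more edited sites than chance predicts? The background rate, window counts, and choice of test below are illustrative assumptions, not FLARE's actual statistical model:

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): one-sided enrichment p-value."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p0 = 0.01        # assumed background per-site editing rate
n, k = 200, 12   # toy window: 200 covered sites, 12 of them edited

# Small p-value => the window is edited far more often than background,
# so it would be flagged as an editing-enriched region.
pval = binom_sf(k, n, p0)
print(f"{pval:.3g}")
```

In a real pipeline this test would be applied per window genome-wide, with multiple-testing correction over all windows.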