Abstract In recent years, significant advancements have been made in deep learning‐based computational modeling of proteins, with DeepMind's AlphaFold2 standing out as a landmark achievement. These computationally modeled protein structures not only provide atomic coordinates but also include self‐confidence metrics to assess the relative quality of the modeling, either for individual residues or the entire protein. However, these self‐confidence scores are not always reliable; for instance, poorly modeled regions of a protein may sometimes be assigned high confidence. To address this limitation, we introduce Equivariant Quality Assessment Folding (EQAFold), an enhanced framework that refines the Local Distance Difference Test prediction head of AlphaFold to generate more accurate self‐confidence scores. Our results demonstrate that EQAFold outperforms the standard AlphaFold architecture and recent model quality assessment protocols in providing more reliable confidence metrics. Source code for EQAFold is available athttps://github.com/kiharalab/EQAFold_public.
more »
« less
DeepPhoPred : Accurate Deep Learning Model to Predict Microbial Phosphorylation
ABSTRACT Phosphorylation is a substantial posttranslational modification of proteins that refers to adding a phosphate group to the amino acid side chain after translation process in the ribosome. It is vital to coordinate cellular functions, such as regulating metabolism, proliferation, apoptosis, subcellular trafficking, and other crucial physiological processes. Phosphorylation prediction in a microbial organism can assist in understanding pathogenesis and host–pathogen interaction, drug and antibody design, and antimicrobial agent development. Experimental methods for predicting phosphorylation sites are costly, slow, and tedious. Hence low‐cost and high‐speed computational approaches are highly desirable. This paper presents a new deep learning tool called DeepPhoPred for predicting microbial phospho‐serine (pS), phospho‐threonine (pT), and phospho‐tyrosine (pY) sites. DeepPhoPred incorporates a two‐headed convolutional neural network architecture with the squeeze and excitation blocks followed by fully connected layers that jointly learn significant features from the peptide's structural and evolutionary information to predict phosphorylation sites. Our empirical results demonstrate that DeepPhoPred significantly outperforms the existing microbial phosphorylation site predictors with its highly efficient deep‐learning architecture. DeepPhoPred as a standalone predictor, all its source codes, and our employed datasets are publicly available athttps://github.com/faisalahm3d/DeepPhoPred.
more »
« less
- Award ID(s):
- 2152059
- PAR ID:
- 10616040
- Publisher / Repository:
- Wiley
- Date Published:
- Journal Name:
- Proteins: Structure, Function, and Bioinformatics
- Volume:
- 93
- Issue:
- 2
- ISSN:
- 0887-3585
- Page Range / eLocation ID:
- 465 to 481
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract The widespread misuse of antibiotics has escalated antibiotic resistance into a critical global public health concern. Beyond antibiotics, metals function as antibacterial agents. Metal resistance genes (MRGs) enable bacteria to tolerate metal-based antibacterials and may also foster antibiotic resistance within bacterial communities through co-selection. Thus, predicting bacterial MRGs is vital for elucidating their involvement in antibiotic resistance and metal tolerance mechanisms. The “best hit” approach is mainly utilized to identify and annotate MRGs. This method is sensitive to cutoff values and produces a high false negative rate. Other than the best hit approach, only a few antimicrobial resistance (AMR) detection tools exist for predicting MRGs. However, these tools lack comprehensive annotation for MRGs conferring resistance to multiple metals. To address such limitations, we introduce DeepMRG, a deep learning-based multi-label classifier, to predict bacterial MRGs. Because a bacterial MRG can confer resistance to multiple metals, DeepMRG is designed as a multi-label classifier capable of predicting multiple metal labels associated with an MRG. It leverages bit score-based similarity distribution of sequences with experimentally verified MRGs. To ensure unbiased model evaluation, we employed a clustering method to partition our dataset into six subsets, five for cross-validation and one for testing, with non-homologous sequences, mitigating the impact of sequence homology. DeepMRG consistently achieved high overall F1-scores and significantly reduced false negative rates across a wide range of datasets. It can be used to predict bacterial MRGs in metagenomic or isolate assemblies. The web server of DeepMRG can be accessed athttps://deepmrg.cs.vt.edu/deepmrgand the source code is available athttps://github.com/muhit-emon/DeepMRGunder the MIT license.more » « less
-
Abstract MotivationPolymerase chain reaction (PCR) enables rapid, cost-effective diagnostics but requires prior identification of genomic regions that allow sensitive and specific detection of target microbial groups, herein referred to as microbial signature sequences. We introduce Seqwin, an open-source framework designed to automate microbial genome signature discovery. Tens of thousands of microbial genomes are now available for a single species, limiting the application of existing manual and automated approaches for identifying signatures. Modern approaches that are capable of leveraging all available microbial genomes will ensure sensitive and accurate DNA signature identification and enable robust pathogen detection for clinical, environmental, and public health applications. ResultsSeqwin builds weighted pan-genome minimizer graphs and uses a traversal algorithm to identify signature sequences that occur frequently in target genomes but remain rare in non-targets. Unlike earlier tools that depend on strict presence or absence of sequences, Seqwin accommodates natural sequence variation and scales to very large genome collections. When applied to genomes from C. difficile, M. tuberculosis, and S. enterica, Seqwin recovered more high-quality signatures than alternative methods with lower computational burden. Seqwin’s analysis of nearly 15,000 S. enterica genomes yielded over 200 candidate signatures in 5 minutes. Seqwin provides an open-source solution for the long-standing need for scalable microbial signature discovery and diagnostic assay design. Availability and ImplementationSeqwin is freely available for academic use (https://github.com/treangenlab/Seqwin) and can be installed via Bioconda. Benchmarking datasets, outputs, and scripts are available on Zenodohttps://doi.org/10.5281/zenodo.19176444. Contacttreangen@rice.edu,xw66@rice.edu Supplementary MaterialsProvided as separate PDF and data files.more » « less
-
Abstract BackgroundPlant architecture can influence crop yield and quality. Manual extraction of architectural traits is, however, time-consuming, tedious, and error prone. The trait estimation from 3D data addresses occlusion issues with the availability of depth information while deep learning approaches enable learning features without manual design. The goal of this study was to develop a data processing workflow by leveraging 3D deep learning models and a novel 3D data annotation tool to segment cotton plant parts and derive important architectural traits. ResultsThe Point Voxel Convolutional Neural Network (PVCNN) combining both point- and voxel-based representations of 3D data shows less time consumption and better segmentation performance than point-based networks. Results indicate that the best mIoU (89.12%) and accuracy (96.19%) with average inference time of 0.88 s were achieved through PVCNN, compared to Pointnet and Pointnet++. On the seven derived architectural traits from segmented parts, an R2value of more than 0.8 and mean absolute percentage error of less than 10% were attained. ConclusionThis plant part segmentation method based on 3D deep learning enables effective and efficient architectural trait measurement from point clouds, which could be useful to advance plant breeding programs and characterization of in-season developmental traits. The plant part segmentation code is available athttps://github.com/UGA-BSAIL/plant_3d_deep_learning.more » « less
-
Abstract New X‐ray crystallography and cryo‐electron microscopy (cryo‐EM) approaches yield vast amounts of structural data from dynamic proteins and their complexes. Modeling the full conformational ensemble can provide important biological insights, but identifying and modeling an internally consistent set of alternate conformations remains a formidable challenge. qFit efficiently automates this process by generating a parsimonious multiconformer model. We refactored qFit from a distributed application into software that runs efficiently on a small server, desktop, or laptop. We describe the new qFit 3 software and provide some examples. qFit 3 is open‐source under the MIT license, and is available athttps://github.com/ExcitedStates/qfit-3.0.more » « less
An official website of the United States government

