Leveraging Large Language Models for Predicting Microbial Virulence from Protein Structure and Sequence

Quintana, Felix; Treangen, Todd; Kavraki, Lydia

doi:10.1145/3584371.3612953

In the aftermath of COVID-19, screening for pathogens has never been more relevant. However, computational pathogen screening is challenging for several reasons, including (i) the complexity and role of the host, (ii) the divergence and dynamics of virulence factors, and (iii) population- and community-level dynamics. Examining a potential pathogen's molecular interactions, specifically its individual proteins and their interactions, can help pinpoint which proteins of a given microbe may cause disease. However, existing pathogen-screening tools rely on prior annotations (KEGG, GO, etc.), which makes novel and unannotated proteins harder to assess. Here, we present an LLM-inspired approach that uses both protein sequence and structure to predict protein virulence. Our two-stage model combines evolutionary features captured by the DistilProtBert language model with protein structure in a graph convolutional network. When high-quality structures are available, our model predicts virulence function better than sequence alone, offering a path forward for virulence prediction of novel and unannotated proteins.
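The abstract does not spell out the architecture in detail. As a minimal sketch, assuming that per-residue language-model embeddings serve as node features on a C-alpha contact graph, the structural stage can be illustrated with one standard graph convolution (the Kipf and Welling symmetric-normalization rule), mean pooling over residues, and a logistic output. Everything below is an illustrative assumption, not the authors' implementation: the function names (contact_graph, normalize, gcn_layer, virulence_score) are hypothetical, the 8 Å contact cutoff is a common convention rather than a value from the paper, random vectors stand in for DistilProtBert embeddings, and the weights are untrained.

```python
import math
import random


def contact_graph(coords, cutoff=8.0):
    """Adjacency matrix linking residues whose C-alpha atoms lie within `cutoff` angstroms.

    The 8 A default is an assumed, commonly used threshold, not a value from the paper.
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degree includes the self-loop, so it is never zero
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


def virulence_score(coords, embeddings, w1, w_out, cutoff=8.0):
    """Hypothetical protein-level score in (0, 1) from structure plus residue embeddings."""
    a_norm = normalize(contact_graph(coords, cutoff))
    h = gcn_layer(a_norm, embeddings, w1)
    pooled = [sum(col) / len(h) for col in zip(*h)]  # mean over residues
    logit = sum(p * w for p, w in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    rng = random.Random(0)
    n_res, emb_dim, hidden = 5, 4, 3
    # Toy C-alpha trace: residues 3.8 A apart on a straight line.
    coords = [(3.8 * i, 0.0, 0.0) for i in range(n_res)]
    # Random stand-ins for per-residue DistilProtBert embeddings.
    embeddings = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)] for _ in range(n_res)]
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(emb_dim)]
    w_out = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
    print(virulence_score(coords, embeddings, w1, w_out))
```

In a trained model the weights would be learned from proteins labeled virulent or non-virulent. The symmetric normalization keeps residues with many structural contacts from dominating the aggregated features, and the contact graph is where structure quality enters: a poor predicted structure yields spurious or missing edges.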

Award ID(s): 2239114

PAR ID: 10502775

Publisher / Repository: ACM

Date Published: 2023-09-03

Journal Name: BCB '23: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

ISBN: 9798400701269

Page Range / eLocation ID: 1 to 6

Location: Houston, TX, USA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1145/3584371.3612953