DR-BERT: A protein language model to annotate disordered regions

Nambiar, Ananthan; Forsyth, John Malcolm; Liu, Simon; Maslov, Sergei

doi:10.1016/j.str.2024.04.010

Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while maintaining competitiveness in the others. This performance is due to the information learned during pretraining and DR-BERT’s ability to use contextual information.

Award ID(s): 2107344

PAR ID: 10549924

Author(s) / Creator(s): Nambiar, Ananthan; Forsyth, John Malcolm; Liu, Simon; Maslov, Sergei

Publisher / Repository: Cell Press

Date Published: 2024-08-01

Journal Name: Structure

Volume: 32

Issue: 8

ISSN: 0969-2126

Page Range / eLocation ID: 1260 to 1268.e3

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article: https://doi.org/10.1016/j.str.2024.04.010
