NSF PAR Search | NSF Public Access Repository

NEFFy: A Versatile Tool for Computing the Number of Effective Sequences

https://doi.org/10.1093/bioinformatics/btaf222

Haghani, Maryam; Bhattacharya, Debswapna; Murali, T M (June 2025, Bioinformatics)
Cheng, Jianlin (Ed.)
Abstract Motivation: A Multiple Sequence Alignment (MSA) contains fundamental evolutionary information that is useful in the prediction of structure and function of proteins and nucleic acids. The “Number of Effective Sequences” (NEFF) quantifies the diversity of sequences of an MSA. While several tools embed NEFF calculation with various options, none are standalone tools for this purpose, and they do not offer all the available options. Results: We developed NEFFy, the first software package to integrate all these options and calculate NEFF across diverse MSA formats for proteins, RNAs, and DNAs. It surpasses existing tools in functionality without compromising computational efficiency and scalability. NEFFy also offers per-residue NEFF calculation and supports NEFF computation for MSAs of multimeric proteins, with the capability to be extended to DNAs and RNAs. Availability and Implementation: NEFFy is released as open-source software under the GNU Public License v3.0. The source code in C++ and a Python wrapper are available at https://github.com/Maryam-Haghani/NEFFy. To ensure users can fully leverage these capabilities, comprehensive documentation and examples are provided at https://Maryam-Haghani.github.io/NEFFy. Supplementary Information: Supplementary data are available at Bioinformatics online.
Free, publicly-accessible full text available June 3, 2026
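To make the NEFF idea in the abstract above concrete, here is a minimal Python sketch of one commonly used formulation: each sequence is weighted by the inverse of the number of sequences in the MSA that are similar to it, and NEFF is the sum of those weights. This is not NEFFy's code or API. The function names, the 0.8 identity threshold, the gap handling, and the absence of any normalization are all assumptions made for illustration; NEFFy itself exposes several such options.

```python
from typing import List


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of alignment columns where both sequences share the same non-gap residue.

    Assumption: gaps ('-') never count as matches and the denominator is the
    full alignment length. Other conventions exist.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("aligned sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def neff_sketch(msa: List[str], threshold: float = 0.8) -> float:
    """Sum of inverse cluster sizes over all sequences (hypothetical helper).

    The weight of sequence i is 1 / (number of sequences whose identity to i
    is at least `threshold`). A sequence always counts as similar to itself.
    """
    total = 0.0
    for i, s in enumerate(msa):
        n_similar = sum(
            1
            for j, t in enumerate(msa)
            if i == j or pairwise_identity(s, t) >= threshold
        )
        total += 1.0 / n_similar
    return total


if __name__ == "__main__":
    toy_msa = ["ACDEFGHIKL", "ACDEFGHIKL", "MNPQRSTVWY"]
    print(neff_sketch(toy_msa))
```

In the toy alignment, the two identical sequences each receive weight 1/2 and the unrelated third sequence receives weight 1, so redundant sequences contribute less to the total than diverse ones. The double loop is quadratic in the number of sequences, which is fine for illustration but not for large MSAs.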
To pack or not to pack: revisiting protein side-chain packing in the post-AlphaFold era

https://doi.org/10.1093/bib/bbaf297

Vangaru, Sriniketh; Bhattacharya, Debswapna (June 2025, Briefings in Bioinformatics)

Abstract Protein side-chain packing (PSCP), the problem of predicting side-chain conformations given a fixed backbone structure, has important implications in the modeling of structures and interactions. However, despite the groundbreaking progress in protein structure prediction pioneered by AlphaFold, the existing PSCP methods still rely on experimental inputs, and do not leverage AlphaFold-predicted backbone coordinates to enable PSCP at scale. Here, we perform a large-scale benchmarking of the predictive performance of various PSCP methods on public datasets from multiple rounds of the Critical Assessment of Structure Prediction challenges using a diverse set of evaluation metrics. Empirical results demonstrate that the PSCP methods perform well in packing the side-chains with experimental inputs, but they fail to generalize in repacking AlphaFold-generated structures. We additionally explore the effectiveness of leveraging the self-assessment confidence scores from AlphaFold by implementing a backbone confidence-aware integrative approach. While such a protocol often leads to performance improvement by attaining modest yet statistically significant accuracy gains over the AlphaFold baseline, it does not yield consistent and pronounced improvements. Our study highlights the recent advances and remaining challenges in PSCP in the post-AlphaFold era.
The landscape of RNA 3D structure modeling with transformer networks

https://doi.org/10.1093/biomethods/bpae047

Tarafder, Sumit; Roche, Rahmatullah; Bhattacharya, Debswapna (July 2024, Biology Methods and Protocols)

Abstract Transformers are a powerful subclass of neural networks catalyzing the development of a growing number of computational methods for RNA structure modeling. Here, we conduct an objective and empirical study of the predictive modeling accuracy of the emerging transformer-based methods for RNA structure prediction. Our study reveals multi-faceted complementarity between the methods and underscores some key aspects that affect the prediction accuracy.
PIQLE: protein–protein interface quality estimation by deep graph learning of multimeric interaction geometries

https://doi.org/10.1093/bioadv/vbad070

Shuvo, Md Hossain; Karim, Mohimenul; Roche, Rahmatullah; Bhattacharya, Debswapna (June 2023, Bioinformatics Advances)
Gromiha, Michael (Ed.)

Abstract Motivation: Accurate modeling of protein–protein interaction interface is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information or inter-atomic multimeric geometries, including interaction distance and orientations. Results: Here, we present PIQLE, a deep graph learning method for protein–protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of individual interactions between the interfacial residues using a multi-head graph attention network and then probabilistically combines the estimated quality for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods including DProQA, TRScore, GNN-DOVE and DOVE on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study and comparison with the self-assessment module of AlphaFold-Multimer repurposed for protein complex scoring reveal that the performance gains are connected to the effectiveness of the multi-head graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE. Availability and implementation: An open-source software implementation of PIQLE is freely available at https://github.com/Bhattacharya-Lab/PIQLE. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
EquiPNAS: improved protein–nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks

https://doi.org/10.1093/nar/gkae039

Roche, Rahmatullah; Moussad, Bernard; Shuvo, Md Hossain; Tarafder, Sumit; Bhattacharya, Debswapna (January 2024, Nucleic Acids Research)

Abstract Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein–nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here, we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein–nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein–DNA and protein–RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
DisCovER: distance‐ and orientation‐based covariational threading for weakly homologous proteins

https://doi.org/10.1002/prot.26254

Bhattacharya, Sutanu; Roche, Rahmatullah; Moussad, Bernard; Bhattacharya, Debswapna (October 2021, Proteins: Structure, Function, and Bioinformatics)

Abstract Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence‐based predicted contact or distance information is used. Contact‐assisted or distance‐assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query–template alignment. We present a new distance‐ and orientation‐based covariational threading method called DisCovER by effectively integrating information from inter‐residue distance and orientation along with the topological network neighborhood of a query–template alignment. Our method first selects a subset of templates using standard profile‐based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance‐ and orientation‐based query–template alignment using an iterative double dynamic programming framework. Multiple large‐scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state‐of‐the‐art threading approaches, and that the integration of the neighborhood effect with the inter‐residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
Modeling SARS‐CoV‐2 proteins in the CASP‐commons experiment

https://doi.org/10.1002/prot.26231

Kryshtafovych, Andriy; Moult, John; Billings, Wendy M.; Della Corte, Dennis; Fidelis, Krzysztof; Kwon, Sohee; Olechnovič, Kliment; Seok, Chaok; Venclovas, Česlovas; Won, Jonghun; et al (October 2021, Proteins: Structure, Function, and Bioinformatics)

Abstract Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS‐CoV‐2 genome. Forty‐seven research groups submitted over 3000 three‐dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure‐based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).
Sifting through the noise: A survey of diffusion probabilistic models and their applications to biomolecules

https://doi.org/10.1016/j.jmb.2024.168818

Norton, Trevor; Bhattacharya, Debswapna (March 2025, Journal of Molecular Biology)

Free, publicly-accessible full text available March 1, 2026
EquiRank: Improved protein-protein interface quality estimation using protein language-model-informed equivariant graph neural networks

https://doi.org/10.1016/j.csbj.2024.12.015

Shuvo, Md Hossain; Bhattacharya, Debswapna (January 2025, Computational and Structural Biotechnology Journal)

Full Text Available
Advances in Language-Model-Informed Protein–Nucleic Acid Binding Site Prediction

https://doi.org/10.1007/978-1-0716-4623-6_9

Tarafder, Sumit; Wang, Xinyu; Roche, Rahmatullah; Bhattacharya, Debswapna (January 2025, Springer US)

Full Text Available