Title: Effective prediction of short hydrogen bonds in proteins via machine learning method
Abstract Short hydrogen bonds (SHBs), whose donor and acceptor heteroatoms lie within 2.7 Å, exhibit prominent quantum mechanical character and are connected to a wide range of essential biomolecular processes. However, exact determination of the geometry and functional roles of SHBs requires a protein structure at atomic resolution. In this work, we analyze 1260 high-resolution peptide and protein structures from the Protein Data Bank and develop a boosting-based machine learning model to predict the formation of SHBs between amino acids. This model, which we name machine learning assisted prediction of short hydrogen bonds (MAPSHB), takes into account 21 structural, chemical and sequence features and their interaction effects and effectively categorizes each hydrogen bond in a protein as a short or normal hydrogen bond. The MAPSHB model reveals that the type of the donor amino acid plays a major role in determining the class of a hydrogen bond and that the side chain Tyr-Asp pair demonstrates a significant probability of forming an SHB. Combining electronic structure calculations and energy decomposition analysis, we elucidate how the interplay of competing intermolecular interactions stabilizes the Tyr-Asp SHBs more than other commonly observed combinations of amino acid side chains. The MAPSHB model, which is freely available on our web server, allows one to accurately and efficiently predict the presence of SHBs given a protein structure with moderate or low resolution and will facilitate the experimental and computational refinement of protein structures.
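The geometric definition in the abstract (donor and acceptor heteroatoms within 2.7 Å) can be illustrated with a minimal Python sketch. This is not the MAPSHB model, which is a boosting-based classifier over 21 features; it only applies the distance cutoff to coordinates that are already known. The function name, the constant name, and the example coordinates are hypothetical and chosen for illustration.

```python
import math

# Donor-acceptor cutoff quoted in the abstract, in angstroms.
SHB_CUTOFF_ANGSTROM = 2.7


def classify_hydrogen_bond(donor_xyz, acceptor_xyz, cutoff=SHB_CUTOFF_ANGSTROM):
    """Label a hydrogen bond 'short' or 'normal' from heteroatom coordinates (in Å).

    Hypothetical helper: it applies only the distance criterion and does not
    reproduce the feature-based MAPSHB prediction.
    """
    distance = math.dist(donor_xyz, acceptor_xyz)  # Euclidean distance, Python 3.8+
    return "short" if distance <= cutoff else "normal"


# Made-up coordinates (in Å) for a Tyr hydroxyl oxygen and an Asp carboxylate oxygen.
tyr_oh = (0.0, 0.0, 0.0)
asp_od1 = (2.5, 0.0, 0.0)
print(classify_hydrogen_bond(tyr_oh, asp_od1))
```

A cutoff check like this presupposes accurate heteroatom positions, which is the limitation the abstract raises for structures at moderate or low resolution.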
Award ID(s):
1904800
PAR ID:
10361616
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Scientific Reports
Volume:
12
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The three-dimensional architecture of biomolecules often creates specialized structural elements, notably short hydrogen bonds that have donor–acceptor separations below 2.7 Å. In this work, we statistically analyze 1663 high-resolution biomolecular structures from the Protein Data Bank and demonstrate that short hydrogen bonds are prevalent in proteins, protein–ligand complexes and nucleic acids. From these biological macromolecules, we characterize the preferred location, connectivity and amino acid composition in short hydrogen bonds and hydrogen bond networks, and assess their possible functional importance. Using electronic structure calculations, we further uncover how the interplay of the structural and chemical features determines the proton potential energy surfaces and proton sharing conditions in biological short hydrogen bonds. 
  2. Abstract There are continuous efforts to elucidate the structure and biological functions of short hydrogen bonds (SHBs), whose donor and acceptor heteroatoms reside more than 0.3 Å closer than the sum of their van der Waals radii. In this work, we evaluate 1070 atomic-resolution protein structures and characterize the common chemical features of SHBs formed between the side chains of amino acids and small molecule ligands. We then develop a machine learning assisted prediction of protein-ligand SHBs (MAPSHB-Ligand) model and reveal that the types of amino acids and ligand functional groups as well as the sequence of neighboring residues are essential factors that determine the class of protein-ligand hydrogen bonds. The MAPSHB-Ligand model and its implementation on our web server enable the effective identification of protein-ligand SHBs in proteins, which will facilitate the design of biomolecules and ligands that exploit these close contacts for enhanced functions. 
  3. Hydrogen bonds (HBs) play an essential role in the structure and catalytic action of enzymes, but a complete understanding of HBs in proteins challenges the resolution of modern structural (i.e., X-ray diffraction) techniques and mandates computationally demanding electronic structure methods from correlated wavefunction theory for predictive accuracy. Numerous amino acid sidechains contain functional groups (e.g., hydroxyls in Ser/Thr or Tyr and amides in Asn/Gln) that can act as either HB acceptors or donors (HBA/HBD) and even form simultaneous, ambifunctional HB interactions. To understand the relative energetic benefit of each interaction, we characterize the potential energy surfaces of representative model systems with accurate coupled cluster theory calculations. To reveal the relationship of these energetics to the balance of these interactions in proteins, we curate a set of 4000 HBs, of which >500 are ambifunctional HBs, in high-resolution protein structures. We show that our model systems accurately predict the favored HB structural properties. Differences are apparent in HBA/HBD preference for aromatic Tyr versus aliphatic Ser/Thr hydroxyls because Tyr forms significantly stronger O–H⋯O HBs than N–H⋯O HBs in contrast to comparable strengths of the two for Ser/Thr. Despite this residue-specific distinction, all models of residue pairs indicate an energetic benefit for simultaneous HBA and HBD interactions in an ambifunctional HB. Although the stabilization is less than the additive maximum due both to geometric constraints and many-body electronic effects, a wide range of ambifunctional HB geometries are more favorable than any single HB interaction.
  4. Abstract Cysteamine dioxygenase (ADO) is a thiol dioxygenase whose study has stagnated owing to ambiguity as to whether or not it possesses an anticipated protein‐derived cofactor. Reported herein is the discovery and elucidation of a Cys‐Tyr cofactor in human ADO, crosslinked between Cys220 and Tyr222 through a thioether (C−S) bond. By genetically incorporating an unnatural amino acid, 3,5‐difluoro‐tyrosine (F2‐Tyr), specifically into Tyr222 of human ADO, an autocatalytic oxidative carbon–fluorine bond activation and fluoride release were identified by mass spectrometry and ¹⁹F NMR spectroscopy. These results suggest that the cofactor biogenesis is executed by a powerful oxidant during an autocatalytic process. Unlike that of cysteine dioxygenase, the crosslinking results in a minimal structural change of the protein and it is not detectable by routine low‐resolution techniques. Finally, a new sequence motif, C‐X‐Y‐Y(F), is proposed for identifying the Cys‐Tyr crosslink.