Title: Chemical features and machine learning assisted predictions of protein-ligand short hydrogen bonds

There are continuous efforts to elucidate the structure and biological functions of short hydrogen bonds (SHBs), whose donor and acceptor heteroatoms reside more than 0.3 Å closer than the sum of their van der Waals radii. In this work, we evaluate 1070 atomic-resolution protein structures and characterize the common chemical features of SHBs formed between the side chains of amino acids and small molecule ligands. We then develop a machine learning assisted prediction of protein-ligand SHBs (MAPSHB-Ligand) model and reveal that the types of amino acids and ligand functional groups as well as the sequence of neighboring residues are essential factors that determine the class of protein-ligand hydrogen bonds. The MAPSHB-Ligand model and its implementation on our web server enable the effective identification of protein-ligand SHBs in proteins, which will facilitate the design of biomolecules and ligands that exploit these close contacts for enhanced functions.
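The distance criterion in the abstract can be illustrated with a minimal sketch. It assumes Bondi van der Waals radii (N 1.55 Å, O 1.52 Å), since the abstract does not say which radii were used, and the function name `is_short_hydrogen_bond` is hypothetical. It is not the MAPSHB-Ligand model, only the geometric definition of an SHB.

```python
import math

# Bondi van der Waals radii in Å. Assumption: the abstract does not
# state which radius set the authors used.
VDW_RADII = {"N": 1.55, "O": 1.52}

# A hydrogen bond counts as short when the heteroatoms sit more than
# 0.3 Å closer than the sum of their van der Waals radii.
SHB_MARGIN = 0.3


def is_short_hydrogen_bond(donor_element, acceptor_element, donor_xyz, acceptor_xyz):
    """Return True if the donor-acceptor distance meets the SHB criterion."""
    cutoff = VDW_RADII[donor_element] + VDW_RADII[acceptor_element] - SHB_MARGIN
    return math.dist(donor_xyz, acceptor_xyz) < cutoff


# Hypothetical coordinates (Å): an O...O pair 2.5 Å apart and one 2.9 Å apart.
print(is_short_hydrogen_bond("O", "O", (0.0, 0.0, 0.0), (2.5, 0.0, 0.0)))
print(is_short_hydrogen_bond("O", "O", (0.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
```

With these assumed radii the O...O cutoff is about 2.74 Å, so the first pair qualifies as short and the second does not.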

Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Reports
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Short hydrogen bonds (SHBs), whose donor and acceptor heteroatoms lie within 2.7 Å, exhibit prominent quantum mechanical character and are connected to a wide range of essential biomolecular processes. However, exact determination of the geometry and functional roles of SHBs requires a protein structure at atomic resolution. In this work, we analyze 1260 high-resolution peptide and protein structures from the Protein Data Bank and develop a boosting-based machine learning model to predict the formation of SHBs between amino acids. This model, which we name machine learning assisted prediction of short hydrogen bonds (MAPSHB), takes into account 21 structural, chemical and sequence features and their interaction effects and effectively categorizes each hydrogen bond in a protein as a short or normal hydrogen bond. The MAPSHB model reveals that the type of the donor amino acid plays a major role in determining the class of a hydrogen bond and that the side chain Tyr-Asp pair demonstrates a significant probability of forming an SHB. Combining electronic structure calculations and energy decomposition analysis, we elucidate how the interplay of competing intermolecular interactions stabilizes the Tyr-Asp SHBs more than other commonly observed combinations of amino acid side chains. The MAPSHB model, which is freely available on our web server, allows one to accurately and efficiently predict the presence of SHBs given a protein structure with moderate or low resolution and will facilitate the experimental and computational refinement of protein structures.

  2. The three-dimensional architecture of biomolecules often creates specialized structural elements, notably short hydrogen bonds that have donor–acceptor separations below 2.7 Å. In this work, we statistically analyze 1663 high-resolution biomolecular structures from the Protein Data Bank and demonstrate that short hydrogen bonds are prevalent in proteins, protein–ligand complexes and nucleic acids. From these biological macromolecules, we characterize the preferred location, connectivity and amino acid composition in short hydrogen bonds and hydrogen bond networks, and assess their possible functional importance. Using electronic structure calculations, we further uncover how the interplay of the structural and chemical features determines the proton potential energy surfaces and proton sharing conditions in biological short hydrogen bonds. 
  3. Abstract

    Broadly useful chiroptical enantiomeric excess (ee) sensing remains challenging and typically involves carefully designed molecular receptors or supramolecular assemblies. Herein, we report on the enantioselective sensing of 35 amino acids, amino phosphonic acids, hydroxy acids, amino alcohols, and diamines with an auxiliary‐free cobalt probe. Chiroptical analysis of the enantiomeric composition and concentration of minute sample amounts was achieved with high accuracy by using earth‐abundant cobalt salts and hydrogen peroxide as the oxidant. Despite the absence of an auxiliary ligand, the cobalt assay is applicable to aromatic and aliphatic compounds and yields strong CD signals at high wavelengths. This method obviates the general prerequisite for chromophoric metal ligands to generate chiroptical signals through ECCD (exciton‐coupled circular dichroism) effects or through analyte‐to‐ligand chirality induction, and it offers operational simplicity, cost efficiency, waste reduction, and speed.

  4. Hydrogen bonds (HBs) are the most abundant motifs in biological systems. They play a key role in determining protein–ligand binding affinity and selectivity. We designed two pharmaceutically beneficial HB databases, database A including ca. 12,000 protein–ligand complexes with ca. 22,000 HBs and their geometries, and database B including ca. 400 protein–ligand complexes with ca. 2200 HBs, their geometries, and bond strengths determined via our local vibrational mode analysis. We identified seven major HB patterns, which can be utilized as a de novo QSAR model to predict the binding affinity for a specific protein–ligand complex. Glycine was reported as the most abundant amino acid residue in both donor and acceptor profiles, and N–H⋯O was the most frequent HB type found in database A. HBs preferentially fell in the linear range, and linear HBs were identified as the strongest. HBs with HB angles in the range of 100–110°, typically forming intramolecular five-membered ring structures, showed good hydrophobic properties and membrane permeability. Utilizing database B, we found a generalized Badger’s relationship for more than 2200 protein–ligand HBs. In addition, the strength and occurrence maps between each amino acid residue and ligand functional groups open an attractive possibility for a novel drug-design approach and for determining drug selectivity and affinity, and they can also serve as an important tool for the hit-to-lead process.
  5. Abstract

    Proline residues within proteins lack a traditional hydrogen bond donor. However, the hydrogens of the proline ring are all sterically accessible, with polarized C−H bonds at Hα and Hδ that exhibit greater partial positive character and can be utilized as alternative sites for molecular recognition. C−H/O interactions, between proline C−H bonds and oxygen lone pairs, have been previously identified as modes of recognition within protein structures and for higher‐order assembly of protein structures. In order to better understand intermolecular recognition of proline residues, a series of proline derivatives was synthesized, including 4R‐hydroxyproline nitrobenzoate methyl ester, acylated on the proline nitrogen with bromoacetyl and glycolyl groups, and Boc‐4S‐(4‐iodophenyl)hydroxyproline methyl amide. All three derivatives exhibited multiple close intermolecular C−H/O interactions in the crystallographic state, with H⋅⋅⋅O distances as close as 2.3 Å. These observed distances are well below the 2.72 Å sum of the van der Waals radii of H and O, and suggest that these interactions are particularly favorable. In order to generalize these results, we further analyzed the role of C−H/O interactions in all previously crystallized derivatives of these amino acids, and found that all 26 structures exhibited close intermolecular C−H/O interactions. Finally, we analyzed all proline residues in the Cambridge Structural Database of small‐molecule crystal structures. We found that the majority of these structures exhibited intermolecular C−H/O interactions at proline C−H bonds, suggesting that C−H/O interactions are an inherent and important mode for recognition of and higher‐order assembly at proline residues. Due to steric accessibility and multiple polarized C−H bonds, proline residues are uniquely positioned as sites for binding and recognition via C−H/O interactions.

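The comparison in item 5 between the observed H···O distances and the van der Waals sum can be checked with a short sketch. It assumes Bondi radii (H 1.20 Å, O 1.52 Å), which reproduce the 2.72 Å sum quoted there. The helper name `contact_shortening` is hypothetical.

```python
# Bondi van der Waals radii in Å. Assumption: this set is used because
# 1.20 + 1.52 matches the 2.72 Å sum quoted in the abstract.
VDW_RADII = {"H": 1.20, "O": 1.52}


def contact_shortening(elem_a, elem_b, distance):
    """Return how far (Å) a contact lies below the van der Waals sum.

    A positive value means the atoms are closer than the sum of their
    radii; a negative value means they are farther apart.
    """
    return round(VDW_RADII[elem_a] + VDW_RADII[elem_b] - distance, 2)


# Closest H...O contact reported in the abstract: 2.3 Å.
print(contact_shortening("H", "O", 2.3))
```

For the 2.3 Å contact the helper reports a shortening of about 0.42 Å relative to the 2.72 Å sum.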