A database of calculated solution parameters for the AlphaFold predicted protein structures

Brookes, Emre; Rocco, Mattia

doi:10.1038/s41598-022-10607-z

Abstract

Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting “folding frenzy” has already produced predicted protein structure databases for the entire human and other organisms’ proteomes. However, rapidly ascertaining a predicted structure’s reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding $D_{t(20,w)}^{0}$, $s_{(20,w)}^{0}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold’s drawbacks were mitigated, such as generating whenever possible a protein’s mature form. Others, like the AlphaFold direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold predicted protein structures.
