The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near‐atomic accuracy, herald a paradigm shift in structural biology. The 200 million high‐accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provides an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). Better computational tools are urgently needed to parse and classify domains from AlphaFold models automatically. Here we present a Domain Parser for AlphaFold Models (DPAM) that can automatically recognize globular domains from these models based on inter‐residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. Based on a benchmark of 18,759 AlphaFold models, we demonstrate that DPAM can recognize 98.8% of domains and assign correct boundaries for 87.5%, significantly outperforming structure‐based domain parsers and homology‐based domain assignment using ECOD domains found by HHsuite or Dali. Application of DPAM to the full set of AlphaFold models will enable efficient classification of domains, providing evolutionary contexts and facilitating functional studies.
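To make the idea of combining inter‐residue distances with predicted aligned errors (PAE) concrete, the following is a minimal illustrative sketch and not the DPAM algorithm itself. It assumes that two residues belong to the same domain when their Cα atoms are close in space and their mutual PAE is low, and it groups residues by connected components. The function name `parse_domains` and the thresholds `dist_cut`, `pae_cut`, and `min_size` are hypothetical choices made for illustration; the abstract does not specify them, and the ECOD similarity searches that DPAM also uses are not modeled here.

```python
import math


def parse_domains(coords, pae, dist_cut=8.0, pae_cut=5.0, min_size=3):
    """Group residues into putative domains (illustrative sketch only).

    coords: list of (x, y, z) C-alpha coordinates, one per residue.
    pae:    square matrix (list of lists) of predicted aligned errors.
    Two residues are linked when they are closer than dist_cut angstroms
    AND the larger of their two PAE values is below pae_cut.
    Domains are the connected components of that link graph.
    All thresholds are assumed values, not taken from DPAM.
    """
    n = len(coords)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            close = math.dist(coords[i], coords[j]) < dist_cut
            confident = max(pae[i][j], pae[j][i]) < pae_cut
            if close and confident:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Discard tiny groups and return domains ordered by first residue.
    return sorted(g for g in groups.values() if len(g) >= min_size)


# Toy input: two four-residue squares stacked 5 angstroms apart, so every
# pair is spatially close, but PAE is low only within each square.
square = [(0.0, 0.0), (3.8, 0.0), (3.8, 3.8), (0.0, 3.8)]
coords = [(x, y, 0.0) for x, y in square] + [(x, y, 5.0) for x, y in square]
pae = [[1.0 if (i < 4) == (j < 4) else 20.0 for j in range(8)] for i in range(8)]
domains = parse_domains(coords, pae)
```

In this toy input, distance alone would merge all eight residues into one group; it is the high PAE between the two squares that keeps them apart, which is the intuition behind using both signals together.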
- Award ID(s):
- 1825254
- NSF-PAR ID:
- 10312846
- Editor(s):
- Estrada, Ernesto
- Date Published:
- Journal Name:
- Journal of Complex Networks
- Volume:
- 10
- Issue:
- 1
- ISSN:
- 2051-1310
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Cellular studies indicate that endocannabinoid type‐1 retrograde signaling plays a major role in synaptic plasticity. Disruption of these processes by delta‐9‐tetrahydrocannabinol (THC) could produce alterations either in structural and functional brain connectivity or in their association in cannabis (CB) users. Graph theoretic structural and functional networks were generated with diffusion tensor imaging and resting‐state functional imaging in 37 current CB users and 31 healthy non‐users. The primary outcome measures were coupling between structural and functional connectivity, global network characteristics, association between the coupling and network properties, and measures of rich‐club organization. Structural–functional (SC–FC) coupling was globally preserved showing a positive association in current CB users. However, the users had disrupted associations between SC–FC coupling and network topological characteristics, most perturbed for shorter connections implying region‐specific disruption by CB use. Rich‐club analysis revealed impaired SC–FC coupling in the hippocampus and caudate of users. This study provides evidence of the abnormal SC–FC association in CB users. The effect was predominant in shorter connections of the brain network, suggesting that the impact of CB use or predispositional factors may be most apparent in local interconnections. Notably, the hippocampus and caudate specifically showed aberrant structural and functional coupling. These structures have high CB1 receptor density and may also be associated with changes in learning and habit formation that occur with chronic cannabis use.
-
Abstract Although first principles based anharmonic lattice dynamics is one of the most common methods to obtain phonon properties, such a method is impractical for high-throughput search of target thermal materials. We develop an elemental spatial density neural network force field as a bottom-up approach to accurately predict atomic forces of ~80,000 cubic crystals spanning 63 elements. The primary advantage of our indirect machine learning model is the accessibility of phonon transport physics at the same level as first principles, allowing simultaneous prediction of comprehensive phonon properties from a single model. Training on 3182 first principles data and screening 77,091 unexplored structures, we identify 13,461 dynamically stable cubic structures with ultralow lattice thermal conductivity below 1 W m⁻¹ K⁻¹, among which 36 structures are validated by first principles calculations. We propose mean square displacement and bonding-antibonding as two low-cost descriptors to ease the demand of expensive first principles calculations for fast screening of ultralow thermal conductivity. Our model also quantitatively reveals the correlation between off-diagonal coherence and diagonal populations and identifies the distinct crossover from particle-like to wave-like heat conduction. Our algorithm is promising for accelerating discovery of novel phononic crystals for emerging applications, such as thermoelectrics, superconductivity, and topological phonons for quantum information technology.
-
Cryo-electron microscopy (cryo-EM) is a structural technique that has played a significant role in protein structure determination in recent years. Compared to the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM is capable of producing images of much larger protein complexes. However, cryo-EM reconstructions are limited to medium resolution (~4–10 Å) in some cases. At this resolution range, a cryo-EM density map can hardly be used to directly determine the structure of proteins at atomic level resolutions, or even at their amino acid residue backbones. At such a resolution, only the position and orientation of secondary structure elements (SSEs) such as α-helices and β-sheets are observable. Consequently, finding the mapping of the secondary structures of the modeled structure (SSEs-A) to the cryo-EM map (SSEs-C) is one of the primary concerns in cryo-EM modeling. To address this issue, this study proposes a novel automatic computational method to identify SSEs correspondence in three-dimensional (3D) space. Initially, through a modeling of the target sequence with the aid of extracting highly reliable features from a generated 3D model and map, the SSEs matching problem is formulated as a 3D vector matching problem. Afterward, the 3D vector matching problem is transformed into a 3D graph matching problem. Finally, a similarity-based voting algorithm combined with the principle of least conflict (PLC) concept is developed to obtain the SSEs correspondence. To evaluate the accuracy of the method, a testing set of 25 experimental and simulated maps with a maximum of 65 SSEs is selected. Comparative studies are also conducted to demonstrate the superiority of the proposed method over some state-of-the-art techniques. The results demonstrate that the method is efficient, robust, and works well in the presence of errors in the predicted secondary structures of the cryo-EM images.
-
Abstract Structures of proteins and protein–protein complexes are determined by the same physical principles and thus share a number of similarities. At the same time, there could be differences because in order to function, proteins interact with other molecules, undergo conformational changes, and so forth, which might impose different restraints on the tertiary versus quaternary structures. This study focuses on structural properties of protein–protein interfaces in comparison with the protein core, based on the wealth of currently available structural data and new structure‐based approaches. The results showed that physicochemical characteristics, such as amino acid composition, residue–residue contact preferences, and hydrophilicity/hydrophobicity distributions, are similar in protein core and protein–protein interfaces. On the other hand, characteristics that reflect the evolutionary pressure, such as structural composition and packing, are largely different. The results provide important insight into fundamental properties of protein structure and function. At the same time, the results contribute to better understanding of the ways to dock proteins. Recent progress in predicting structures of individual proteins follows the advancement of deep learning techniques and new approaches to residue coevolution data. Protein core could potentially provide large amounts of data for application of deep learning to docking. However, our results showed that the core motifs are significantly different from those at protein–protein interfaces, and thus may not be directly useful for docking. At the same time, such difference may help to overcome a major obstacle in application of the coevolutionary data to docking: discrimination of the intramolecular information not directly relevant to docking.