Abstract Complex biological, neuroscience, geoscience and social networks exhibit heterogeneous self-similar higher order topological structures that are usually characterized as being multifractal in nature. However, describing their topological complexity through a compact mathematical description and deciphering their topological governing rules has remained elusive and prevented a comprehensive understanding of networks. To overcome this challenge, we propose a weighted multifractal graph model capable of capturing the underlying generating rules of complex systems and characterizing their node heterogeneity and pairwise interactions. To infer the generating measure with hidden information, we introduce a variational expectation maximization framework. We demonstrate the robustness of the network generator reconstruction as a function of model properties, especially in noisy and partially observed scenarios. The proposed network generator inference framework is able to reproduce network properties, differentiate varying structures in brain networks and chromosomal interactions, and detect topologically associating domain regions in conformation maps of the human genome.
Could network structures generated with simple rules imposed on a cubic lattice reproduce the structural descriptors of globular proteins?
Abstract A direct way to spot structural features that are universally shared among proteins is to find analogues from simpler condensed matter systems. In the current study, the feasibility of creating ensembles of artificial structures that can automatically reproduce a large number of geometrical and topological descriptors of globular proteins is investigated. Towards this aim, a simple cubic (SC) arrangement is shown to provide the best background lattice after a careful analysis of the residue packing trends from 210 globular proteins. It is shown that a minimalistic set of rules imposed on this lattice is sufficient to generate structures that can mimic real proteins. In the proposed method, 210 such structures are generated by randomly removing residues (beads) from clusters that have a SC lattice arrangement such that all the generated structures have single connected components. Two additional sets are prepared from the initial structures via random relaxation and a reverse Monte Carlo simulated annealing algorithm, which targets the average radial distribution function (RDF) of 210 globular proteins. The initial and relaxed structures are compared to real proteins via RDF, bond orientational order parameters and several descriptors of network topology. Based on these features, results indicate that the structures generated with 40% occupancy closely resemble real residue networks. The structure generation mechanism automatically produces networks that are in the same topological class as globular proteins and reproduce small-world characteristics of high clustering and small shortest path lengths. Most notably, the established correspondence rules out icosahedral order as a relevant structural feature for residue networks in contrast to other amorphous systems where it is an inherent characteristic. 
The close correspondence is also observed in the vibrational characteristics computed from the Anisotropic Network Model, hinting at a non-superficial link between proteins and defect-laden cubic crystalline order.
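The generation step described in the abstract (randomly removing beads from a simple cubic cluster while keeping a single connected component) can be illustrated with a minimal sketch. This is not the authors' implementation: the cluster size, the use of the six nearest lattice neighbours to define connectivity, and the names `generate_structure` and `is_connected` are all assumptions made for illustration. Only the 40% occupancy figure and the single-component constraint come from the abstract.

```python
import random
from collections import deque


def neighbors(site):
    """Six nearest neighbours of a site on a simple cubic lattice."""
    x, y, z = site
    return [(x + 1, y, z), (x - 1, y, z),
            (x, y + 1, z), (x, y - 1, z),
            (x, y, z + 1), (x, y, z - 1)]


def is_connected(sites):
    """Breadth-first search: True if the occupied sites form one component."""
    if not sites:
        return True
    start = next(iter(sites))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nb in neighbors(current):
            if nb in sites and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(sites)


def generate_structure(side=6, occupancy=0.4, seed=0):
    """Remove random beads from a side^3 cubic cluster down to the target
    occupancy, rejecting any removal that would disconnect the structure.
    Assumption: connectivity is defined by nearest-neighbour lattice contacts."""
    rng = random.Random(seed)
    sites = {(x, y, z)
             for x in range(side) for y in range(side) for z in range(side)}
    target = round(occupancy * side ** 3)
    while len(sites) > target:
        candidates = sorted(sites)   # sorted so a given seed is reproducible
        rng.shuffle(candidates)
        for site in candidates:
            sites.remove(site)
            if is_connected(sites):
                break                # removal accepted
            sites.add(site)          # removal would split the cluster; undo
        else:
            break                    # no removable bead left
    return sites


if __name__ == "__main__":
    beads = generate_structure(side=6, occupancy=0.4, seed=0)
    print(len(beads), is_connected(beads))
```

The rejection test keeps every intermediate structure connected, so the final bead set can be treated directly as a residue-like network. The relaxation and reverse Monte Carlo steps mentioned in the abstract are not covered by this sketch.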
- Award ID(s): 1825254
- PAR ID: 10312846
- Editor(s): Estrada, Ernesto
- Date Published:
- Journal Name: Journal of Complex Networks
- Volume: 10
- Issue: 1
- ISSN: 2051-1310
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
-
Abstract Although first-principles-based anharmonic lattice dynamics is one of the most common methods to obtain phonon properties, such a method is impractical for high-throughput search of target thermal materials. We develop an elemental spatial density neural network force field as a bottom-up approach to accurately predict atomic forces of ~80,000 cubic crystals spanning 63 elements. The primary advantage of our indirect machine learning model is the accessibility of phonon transport physics at the same level as first principles, allowing simultaneous prediction of comprehensive phonon properties from a single model. Training on 3182 first principles data and screening 77,091 unexplored structures, we identify 13,461 dynamically stable cubic structures with ultralow lattice thermal conductivity below 1 W m⁻¹ K⁻¹, among which 36 structures are validated by first principles calculations. We propose mean square displacement and bonding-antibonding as two low-cost descriptors to ease the demand for expensive first principles calculations in fast screening for ultralow thermal conductivity. Our model also quantitatively reveals the correlation between off-diagonal coherence and diagonal populations and identifies the distinct crossover from particle-like to wave-like heat conduction. Our algorithm is promising for accelerating discovery of novel phononic crystals for emerging applications, such as thermoelectrics, superconductivity, and topological phonons for quantum information technology.
-
Abstract Recombination directionality factors (RDFs) for large serine integrases (LSIs) are cofactor proteins that control the directionality of recombination to favour excision over insertion. Although RDFs are predicted to bind their cognate LSIs in similar ways, there is no overall common structural theme across LSI RDFs, leading to the suggestion that some of them may be moonlighting proteins with other primary functions. To test this hypothesis, we searched for characterized proteins with structures similar to the predicted structures of known RDFs. Our search shows that the RDFs for two LSIs, TG1 integrase and Bxb1 integrase, show high similarities to a single-stranded DNA binding (SSB) protein and an editing exonuclease, respectively. We present experimental data to show that Bxb1 RDF is probably an exonuclease and TG1 RDF is a functional SSB protein. We used mutational analysis to validate the integrase-RDF interface predicted by AlphaFold2 multimer for TG1 integrase and its RDF, and establish that control of recombination directionality is mediated via protein–protein interaction at the junction of recombinase’s second DNA binding domain and the base of the coiled-coil domain.
-
Abstract The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near‐atomic accuracy, herald a paradigm shift in structural biology. The 200 million high‐accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provide an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). Better computational tools are urgently needed to parse and classify domains from AlphaFold models automatically. Here we present a Domain Parser for AlphaFold Models (DPAM) that can automatically recognize globular domains from these models based on inter‐residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. Based on a benchmark of 18,759 AlphaFold models, we demonstrate that DPAM can recognize 98.8% of domains and assign correct boundaries for 87.5%, significantly outperforming structure‐based domain parsers and homology‐based domain assignment using ECOD domains found by HHsuite or Dali. Application of DPAM to the massive AlphaFold models will enable efficient classification of domains, providing evolutionary contexts and facilitating functional studies.
-
Abstract The molecular basis underlying the rich phase behavior of globular proteins remains poorly understood. We use atomistic multiscale molecular simulations to model the solution‐state conformational dynamics and interprotein interactions of D‐crystallin and its P23T‐R36S mutant, which drastically limits the protein solubility, at both infinite dilution and at a concentration where the mutant fluid phase and crystalline phase coexist. We find that while the mutant conserves the protein fold, changes to the surface exposure of residues in the neighborhood of residue‐36 enhance protein–protein interactions and develop specific protein–protein contacts found in the protein crystal lattice.