

Title: Implementation of adaptive integration method for free energy calculations in molecular systems
Estimating free energy differences by computer simulation is useful for a wide variety of applications, such as virtual screening for drug design and understanding how amino acid mutations modify protein interactions. However, calculating free energy differences remains challenging and often requires extensive trial and error and very long simulation times to achieve converged results. Here, we present an implementation of the adaptive integration method (AIM). We tested our implementation on two molecular systems and compared the results from AIM with those from a suite of other methods. The model systems are the solvation free energy of methane and the free energy of mutating the peptide GAG to GVG. We show that AIM is more efficient than the other tested methods for these systems; that is, for a given simulation time, AIM results converge to a higher level of accuracy and precision.
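The adaptive integration idea can be illustrated with a toy model. This is a minimal sketch of AIM as it is commonly described, not the authors' implementation. The state is a random walk over a grid of λ windows. At each window the code accumulates ⟨∂U/∂λ⟩. A λ move is accepted with a Metropolis criterion that is biased by the current thermodynamic-integration (TI) estimate of the free energy, which pushes the walk toward uniform sampling in λ. Everything else is a hypothetical assumption chosen so the exact answer, (kT/2) ln(k1/k0), is known: the one-dimensional harmonic potential, the linear k(λ), the grid size, the step count, and the function name `aim_harmonic`.

```python
import math
import random


def aim_harmonic(k0=1.0, k1=4.0, kT=1.0, n_lambda=11, n_steps=50_000, seed=1):
    """Toy adaptive-integration estimate of F(lambda=1) - F(lambda=0).

    Hypothetical system: U_lambda(x) = 0.5 * k(lambda) * x**2 with
    k(lambda) = (1 - lambda) * k0 + lambda * k1.
    """
    rng = random.Random(seed)
    beta = 1.0 / kT
    lambdas = [i / (n_lambda - 1) for i in range(n_lambda)]
    sums = [0.0] * n_lambda   # running sums of dU/dlambda per window
    counts = [0] * n_lambda   # samples collected per window

    def k_of(lam):
        return (1.0 - lam) * k0 + lam * k1

    def energy(lam, x):
        return 0.5 * k_of(lam) * x * x

    def free_energy_profile():
        # Trapezoidal TI over the current running averages <dU/dlambda>_i;
        # windows not yet visited contribute 0 for now.
        means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
        f = [0.0]
        for i in range(1, n_lambda):
            h = lambdas[i] - lambdas[i - 1]
            f.append(f[-1] + 0.5 * h * (means[i] + means[i - 1]))
        return f

    i = 0  # index of the current lambda window
    for _ in range(n_steps):
        # Configurational move at fixed lambda: an exact Gibbs draw, which is
        # possible only because the toy potential is harmonic.
        x = rng.gauss(0.0, math.sqrt(kT / k_of(lambdas[i])))
        sums[i] += 0.5 * (k1 - k0) * x * x  # dU/dlambda for this potential
        counts[i] += 1
        # Lambda move to a neighbouring window; the acceptance is biased by
        # the current free-energy estimate (the "adaptive" part of AIM).
        j = i + rng.choice((-1, 1))
        if 0 <= j < n_lambda:
            f = free_energy_profile()
            log_acc = (-beta * (energy(lambdas[j], x) - energy(lambdas[i], x))
                       + beta * (f[j] - f[i]))
            if log_acc >= 0.0 or rng.random() < math.exp(log_acc):
                i = j
    return free_energy_profile()[-1]


estimate = aim_harmonic()
exact = 0.5 * math.log(4.0)  # (kT / 2) * ln(k1 / k0) for the default parameters
print(f"AIM estimate {estimate:.3f} vs exact {exact:.3f}")
```

The estimate should land close to the exact value. The remaining gap comes from statistical noise plus a small bias from the trapezoidal rule on 11 windows. In a real molecular system, the Gibbs draw would be replaced by MD or MC steps at fixed λ.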
Award ID(s):
1736253
NSF-PAR ID:
10291357
Author(s) / Creator(s):
;
Date Published:
Journal Name:
PeerJ
ISSN:
2167-8359
Sponsoring Org:
National Science Foundation
More Like this
  1. Context. Ambipolar diffusion is a physical mechanism related to the drift between charged and neutral particles in a partially ionized plasma, and it is key to many different astrophysical systems. However, understanding its effects is challenging because of basic uncertainties in the relevant microphysics and the strong constraints it imposes on numerical modeling. Aims. Our aim is to introduce a numerical tool for complex problems involving ambipolar diffusion in which, additionally, departures from ionization equilibrium are important or high resolution is needed. The primary application of this tool is solar atmosphere calculations, but the methods and results presented here may also be relevant to other astrophysical systems. Methods. We have developed a new module for the stellar atmosphere code Bifrost that improves its capability to compute the ambipolar diffusion term in the generalized Ohm's law. Among other things, this module includes collision terms appropriate for the processes in the coolest regions of the solar chromosphere. The main feature of the module is the super time stepping (STS) technique, which substantially accelerates the calculations. We have also introduced hyperdiffusion terms to guarantee the stability of the code. Results. We show that an accurate value of the ambipolar diffusion coefficient in the solar atmosphere requires including as atomic elements in the equation of state not only hydrogen and helium but also the main electron donors, such as sodium, silicon, and potassium. In addition, we establish a set of criteria for automatically selecting the free parameters of the STS method so as to guarantee the best performance, optimizing stability and speed for the ambipolar diffusion calculations. We validate the STS implementation by comparison with a self-similar analytical solution.
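    Super time stepping replaces one stability-limited explicit diffusion step with a cycle of N substeps. Some of these substeps are individually much larger than the explicit limit, but the cycle is designed to remain stable as a whole. The abstract does not say which STS variant the Bifrost module uses. The sketch below assumes the classic Chebyshev-based scheme of Alexiades, Amiez and Gremaud, where ν (0 < ν < 1) is the damping parameter and N is the substep count, the kind of free parameters the criteria above select. It checks the substep sum against the closed-form super-step length; the function names are hypothetical.

```python
import math


def sts_substeps(dt_expl, n, nu):
    """Substep sizes tau_1..tau_n for one STS cycle (classic Chebyshev form).

    dt_expl: explicit (stability-limited) diffusion time step
    n:       number of substeps in the cycle
    nu:      damping parameter, 0 < nu < 1
    """
    return [
        dt_expl / ((nu - 1.0) * math.cos((2 * j - 1) * math.pi / (2 * n)) + 1.0 + nu)
        for j in range(1, n + 1)
    ]


def sts_superstep(dt_expl, n, nu):
    """Closed-form length of the whole cycle (the sum of the substeps)."""
    s = math.sqrt(nu)
    a, b = (1.0 + s) ** (2 * n), (1.0 - s) ** (2 * n)
    return dt_expl * n / (2.0 * s) * (a - b) / (a + b)


taus = sts_substeps(1.0, 10, 0.01)
print(f"largest substep {max(taus):.2f}, cycle length {sum(taus):.2f} "
      f"(closed form {sts_superstep(1.0, 10, 0.01):.2f}) vs 10 explicit steps = 10")
```

    As ν approaches 0, the cycle length approaches N² explicit steps instead of N, which is the source of the speed-up. Larger ν trades speed for more damping and robustness.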
  2. Abstract Aim

    Nitrogen (N)‐fixing plants are an important component of global plant communities, but the drivers of N‐fixing plant diversity, especially in temperate regions, remain underexplored. Here, we examined broad‐scale patterns of N‐fixing and non‐fixing plant phylogenetic diversity (PD) and species richness (SR) across a wide portion of temperate North America, focusing on relationships with soil N and aridity. We also tested whether exotic species, with and without N‐fixing symbiosis, have fewer abiotic limitations compared with native species.

    Location

    USA and Puerto Rico.

    Time period

    Current.

    Major taxa studied

    Vascular plants, focusing on N‐fixing groups (orders Fabales, Fagales, Rosales and Cucurbitales).

    Methods

    We subset National Ecological Observatory Network (NEON) plant plot data from all sites along two axes (N fixing–non‐N fixing and native–exotic), calculating plot‐level SR, PD and mean pairwise phylogenetic distance (MPD). We then used linear mixed models to investigate relationships between diversity values and key soil measurements, along with aridity, temperature and fire frequency.

    Results

    Aridity was the sole predictor of proportional phylogenetic diversity of N fixers. The SR of N fixers still decreased marginally in arid regions, whereas native N‐fixer MPD increased with aridity, indicative of unique lineages of N fixers in the driest conditions, in contrast to native non‐N fixers. The SR of both native N fixers and non‐N fixers increased in low‐N soils. Unlike in the other groups, aridity did not affect the SR of exotic non‐N fixers, whereas exotic N fixers showed lower MPD in increasingly high‐N soils, suggesting filtering, contrary to what was found for native N fixers.

    Main conclusions

    Our results suggest that it is not nitrogen, or any soil nutrient, that has the strongest effect on the relative success of N fixers in plant communities. Rather, aridity is the key driver, at least for native species, in line with empirical results from other biomes and increased understanding of N fixation as a key mechanism to avoid water loss.

     
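    Plot-level SR and MPD, two of the diversity values named in the Methods, can be computed from a species list and a pairwise phylogenetic distance table. The sketch below uses the unweighted (presence/absence) form of MPD, the mean distance over all unordered species pairs in a plot. Abundance-weighted variants also exist, and the abstract does not say which one was used. The function name, the species names and the distances are hypothetical.

```python
from itertools import combinations


def mean_pairwise_distance(present, dist):
    """Unweighted mean pairwise phylogenetic distance (MPD) for one plot.

    present: iterable of species names recorded in the plot
    dist:    dict mapping frozenset({a, b}) -> phylogenetic distance between a and b
    Returns None when fewer than two species are present (MPD is undefined).
    """
    species = sorted(set(present))
    pairs = list(combinations(species, 2))
    if not pairs:
        return None
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


# Hypothetical three-species plot with made-up distances (e.g. in Myr).
dist = {
    frozenset(("sp_A", "sp_B")): 2.0,
    frozenset(("sp_A", "sp_C")): 6.0,
    frozenset(("sp_B", "sp_C")): 6.0,
}
plot = ["sp_A", "sp_B", "sp_C"]
print(len(set(plot)), mean_pairwise_distance(plot, dist))  # SR = 3, MPD = 14/3
```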
  3. Abstract

    The estimation of changes in free energy upon mutation is central to the problem of protein design. Modern protein design methods have had remarkable success over a wide range of design targets, but are reaching their limits in ligand binding and enzyme design due to insufficient accuracy in mutational free energies. Alchemical free energy calculations have the potential to supplement modern design methods through more accurate molecular dynamics-based prediction of free energy changes, but suffer from high computational cost. Multisite λ dynamics (MSλD) is a particularly efficient and scalable free energy method with the potential to explore combinatorially large sequence spaces inaccessible to other free energy methods. This work aims to quantify the accuracy of MSλD and demonstrate its scalability. We apply MSλD to the classic problem of calculating folding free energies in T4 lysozyme, a system with a wealth of experimental measurements. Single-site mutants covering 32 mutations show remarkable agreement with experiment, with a Pearson correlation of 0.914 and a mean unsigned error of 1.19 kcal/mol. Multisite mutants in systems with up to five concurrent mutations spanning 240 different sequences show comparable agreement with experiment. These results demonstrate the promise of MSλD in exploring large sequence spaces for protein design.

     
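    The two accuracy figures quoted above measure different things. The Pearson correlation captures how well predictions track the trend in experimental values. The mean unsigned error (MUE) captures their absolute deviation. A minimal sketch of both metrics follows; the function names and the input values are hypothetical and are not data from the paper.

```python
import math


def pearson_r(pred, expt):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mp, me = sum(pred) / n, sum(expt) / n
    cov = sum((p - mp) * (e - me) for p, e in zip(pred, expt))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_e = sum((e - me) ** 2 for e in expt)
    return cov / math.sqrt(var_p * var_e)


def mean_unsigned_error(pred, expt):
    """Mean absolute deviation between predicted and experimental values."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


# Hypothetical folding free energy changes in kcal/mol.
predicted = [1.0, 2.0, 3.0]
experimental = [1.5, 2.5, 3.5]
print(pearson_r(predicted, experimental), mean_unsigned_error(predicted, experimental))
```

    In this made-up example, the predictions are off by a constant 0.5 kcal/mol. As a result, r is 1 while the MUE is 0.5 kcal/mol, which is why the two metrics are usually reported together.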
  4. Abstract Background

    Repetitive action, resistance to environmental change and fine motor disruptions are hallmarks of autism spectrum disorder (ASD) and other neurodevelopmental disorders, and they vary considerably from individual to individual. In animal models, conventional behavioral phenotyping captures such fine-scale variations incompletely. Here we observed male and female C57BL/6J mice to methodically catalog adaptive movement over multiple days and examined two rodent models of developmental disorders against this dynamic baseline. We then investigated the behavioral consequences of a cerebellum-specific deletion of Tsc1 and a whole-brain knockout of Cntnap2 in mice. Both of these mutations are found in clinical conditions and have been associated with ASD.

    Methods

    We used advances in computer vision and deep learning, namely a generalized form of high-dimensional statistical analysis, to develop a framework for characterizing mouse movement on multiple timescales using a single popular behavioral assay, the open-field test. The pipeline takes virtual markers from pose estimation to find behavior clusters and generate wavelet signatures of behavior classes. We measured spatial and temporal habituation to a new environment across minutes and days, different types of self-grooming, locomotion and gait.

    Results

    Both Cntnap2 knockouts and L7-Tsc1 mutants showed forelimb lag during gait. L7-Tsc1 mutants and Cntnap2 knockouts showed complex defects in multi-day adaptation, lacking the tendency of wild-type mice to spend progressively more time in corners of the arena. In L7-Tsc1 mutant mice, failure to adapt took the form of maintained ambling, turning and locomotion, and an overall decrease in grooming. However, adaptation in these traits was similar between wild-type mice and Cntnap2 knockouts. L7-Tsc1 mutant and Cntnap2 knockout mouse models showed different patterns of behavioral state occupancy.

    Limitations

    Genetic risk factors for autism are numerous, and we tested only two. We applied our pipeline only under conditions of free behavior. Testing under task or social conditions could reveal more information about behavioral dynamics and variability.

    Conclusions

    Our automated pipeline for deep phenotyping successfully captures model-specific deviations in adaptation and movement as well as differences in the detailed structure of behavioral dynamics. The reported deficits indicate that deep phenotyping captures a robust set of ASD-related behavioral features that may be considered for implementation in clinical settings as quantitative diagnostic criteria.

     
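    The Methods mention wavelet signatures computed from pose-estimation markers. A common way to obtain such signatures is a Morlet-style wavelet transform: convolving a marker's coordinate trace with Gaussian-windowed complex exponentials at several frequencies. This yields a per-frame amplitude spectrum that can then be clustered. The abstract does not specify the transform, frequencies or normalisation, so everything below is an illustrative assumption, including the function name, the cycle count, the 50 Hz frame rate and the synthetic 5 Hz trace.

```python
import cmath
import math


def wavelet_amplitudes(signal, fs, freqs, n_cycles=6.0):
    """Per-sample amplitude of `signal` at each frequency in `freqs` (Hz).

    Uses a Gaussian-windowed complex exponential (Morlet-style) kernel whose
    width scales with 1/f; the kernel is normalised so that a pure sine of
    amplitude A at frequency f yields about A / 2 away from the edges.
    """
    result = []
    n = len(signal)
    for f in freqs:
        sigma = n_cycles / (2.0 * math.pi * f)     # envelope width, seconds
        half = int(math.ceil(4.0 * sigma * fs))    # truncate at +/- 4 sigma
        offsets = range(-half, half + 1)
        envelope = [math.exp(-0.5 * (k / fs / sigma) ** 2) for k in offsets]
        kernel = [g * cmath.exp(2j * math.pi * f * k / fs)
                  for g, k in zip(envelope, offsets)]
        norm = sum(envelope)
        amps = []
        for i in range(n):
            acc = 0j
            for k, w in zip(offsets, kernel):
                idx = i - k
                if 0 <= idx < n:            # samples beyond the edges are skipped
                    acc += signal[idx] * w
            amps.append(abs(acc) / norm)
        result.append(amps)
    return result


# Hypothetical 1-D marker trace: an 8 s, 5 Hz oscillation sampled at 50 Hz.
fs = 50.0
trace = [math.sin(2.0 * math.pi * 5.0 * i / fs) for i in range(400)]
amps = wavelet_amplitudes(trace, fs, [2.0, 5.0, 8.0])
print([round(a[200], 3) for a in amps])  # the 5 Hz row should dominate
```

    Because the window width scales as 1/f, low frequencies get better frequency resolution and high frequencies get better time resolution. That multi-timescale property suits behaviours as different as gait cycles and slower grooming bouts.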
  5. Abstract Background

    Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict the presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods.

    Methods

    Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via four well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction.

    Results

    Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance.

    Conclusions

    We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing the sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.

     
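    The Methods and Results rely on two standard ingredients. LASSO feature selection zeroes out uninformative transcripts through an L1 penalty. ROC AUC summarises how well model scores rank cases above controls. The abstract does not name the software used, so the following from-scratch sketch is illustrative only. It shows cyclic coordinate descent with soft-thresholding for the LASSO objective, and AUC computed as the rescaled Mann-Whitney statistic. The function names, the toy data and the penalty `alpha` are hypothetical, and the classifiers themselves (K-Nearest Neighbors, Random Forest, SVMs) are not reproduced.

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|: shrink z toward 0 by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1.

    X is a list of n rows with p features each. There is no intercept, so
    y should be centred and the columns of X standardised beforehand.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                b[j] = 0.0
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
    return b


def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (the Mann-Whitney U statistic rescaled to [0, 1])."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: y depends only on the first feature; the second is orthogonal to it.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.0, -3.0, 3.0, -3.0]
coef = lasso_cd(X, y, alpha=0.1)
print(coef)                                          # second coefficient is exactly 0
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))   # perfectly separated -> 1.0
```

    The L1 penalty both shrinks the informative coefficient (here from 3 to 2.9) and sets the irrelevant one exactly to zero. That is what makes LASSO usable as a feature selector before training the downstream classifiers.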