-
Free, publicly-accessible full text available March 25, 2026
-
This study presents an enhanced protein design algorithm that aims to emulate natural heterogeneity of protein sequences. Initial analysis revealed that natural proteins exhibit a permutation composition lower than the theoretical maximum, suggesting a selective utilization of the 20-letter amino acid alphabet. By not constraining the amino acid composition of the protein sequence but instead allowing random reshuffling of the composition, the resulting design algorithm generates sequences that maintain lower permutation compositions in equilibrium, aligning closely with natural proteins. Folding free energy computations demonstrated that the designed sequences refold to their native structures with high precision, except for proteins with large disordered regions. In addition, direct coupling analysis showed a strong correlation between predicted and actual protein contacts, with accuracy exceeding 82% for a large number of top pairs (>4L). The algorithm also resolved biases in previous designs, ensuring a more accurate representation of protein interactions. Overall, it not only mimics the natural heterogeneity of proteins but also ensures correct folding, marking a significant advancement in protein design and engineering.
-
Machine learning has been proposed as an alternative to theoretical modeling when dealing with complex problems in biological physics. However, in this perspective, we argue that a more successful approach is a proper combination of these two methodologies. We discuss how ideas coming from physical modeling of neuronal processing led to early formulations of computational neural networks, e.g., Hopfield networks. We then show how modern learning approaches like Potts models, Boltzmann machines, and the transformer architecture are related to each other, specifically, through a shared energy representation. We summarize recent efforts to establish these connections and provide examples of how each of these formulations integrating physical modeling and machine learning has been successful in tackling recent problems in biomolecular structure, dynamics, function, evolution, and design. Instances include protein structure prediction; improvement in computational complexity and accuracy of molecular dynamics simulations; better inference of the effects of mutations in proteins, leading to improved evolutionary modeling; and, finally, how machine learning is revolutionizing protein engineering and design. Going beyond naturally existing protein sequences, a connection to protein design is discussed where synthetic sequences are able to fold to naturally occurring motifs driven by a model rooted in physical principles. We show that this model is “learnable” and propose its future use in the generation of unique sequences that can fold into a target structure.
-
Protein evolution is guided by structural, functional, and dynamical constraints ensuring organismal viability. Pseudogenes are genomic sequences identified in many eukaryotes that lack translational activity due to sequence degradation and thus over time have undergone “devolution.” Previously pseudogenized genes sometimes regain their protein-coding function, suggesting they may still encode robust folding energy landscapes despite multiple mutations. We study both the physical folding landscapes of protein sequences corresponding to human pseudogenes using the Associative Memory, Water Mediated, Structure and Energy Model, and the evolutionary energy landscapes obtained using direct coupling analysis (DCA) on their parent protein families. We found that generally mutations that have occurred in pseudogene sequences have disrupted their native global network of stabilizing residue interactions, making it harder for them to fold if they were translated. In some cases, however, energetic frustration has apparently decreased when the functional constraints were removed. We analyzed this unexpected situation for Cyclophilin A, Profilin-1, and Small Ubiquitin-like Modifier 2 Protein. Our analysis reveals that when such mutations in the pseudogene ultimately stabilize folding, at the same time, they likely alter the pseudogenes’ former biological activity, as estimated by DCA. We localize most of these stabilizing mutations generally to normally frustrated regions required for binding to other partners.
-
Abstract. The recently developed average latitudinal displacement (ALD) methodology is applied to assess the waviness of the austral-winter subtropical and polar jets using three different reanalysis data sets. As in the wintertime Northern Hemisphere, both jets in the Southern Hemisphere have become systematically wavier over the time series and the waviness of each jet evolves quite independently of the other during most cold seasons. Also, like its Northern Hemisphere equivalent, the Southern Hemisphere polar jet exhibits no trend in speed (though it is notably slower), while its poleward shift is statistically significant. In contrast to its Northern Hemisphere counterpart, the austral subtropical jet has undergone both a systematic increase in speed and a statistically significant poleward migration. Composite differences between the waviest and least wavy seasons for each species suggest that the Southern Hemisphere's lower-stratospheric polar vortex is negatively impacted by unusually wavy tropopause-level jets of either species. These results are considered in the context of trends in the Southern Annular Mode as well as the findings of other related studies.
-
Abstract. Variational autoencoders are unsupervised learning models with generative capabilities. When applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings and functional and fitness properties of several systems, including Globins, β-lactamases, ion channels, and transcription factors. We provide support for how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design.
-
Abstract. Sensitivity of ecosystem productivity to climate variability is a critical component of ecosystem resilience to climate change. Variation in ecosystem sensitivity is influenced by many variables. Here we investigate the effect of bedrock lithology and weathering products on the sensitivity of ecosystem productivity to variation in climate water deficit using Bayesian statistical models. Two thirds of terrestrial ecosystems exhibit negative sensitivity, where productivity decreases with increased climate water deficit, while the other third exhibit positive sensitivity. Variation in ecosystem sensitivity is significantly affected by regolith porosity and permeability and regolith and soil thickness, indicating that lithology, through its control on water holding capacity, exerts important controls on ecosystem sensitivity. After accounting for effects of these four variables, significant differences in sensitivity remain among ecosystems on different rock types, indicating the complexity of bedrock effects. Our analysis suggests that regolith affects ecosystem sensitivity to climate change worldwide and thus their resilience.
-
Previous research regarding the intraseasonal variability of the wintertime Pacific jet has employed empirical orthogonal function (EOF)/principal component (PC) analysis to characterize two leading modes of variability: a zonal extension or retraction and a ~20° meridional shift of the jet exit region. These leading modes are intimately tied to the large-scale structure, sensible weather phenomena, and forecast skill in and around the vast North Pacific basin. However, variability within the wintertime Pacific jet and the relative importance of tropical and extratropical processes in driving such variability are poorly understood. Here, a self-organizing maps (SOM) analysis is applied to 73 Northern Hemisphere cold seasons of 250-hPa zonal winds from the NCEP–NCAR reanalysis data to identify 12 characteristic physical jet states, some of which resemble the leading EOF Pacific jet patterns and combinations of them. Examination of teleconnection patterns such as El Niño–Southern Oscillation (ENSO) and the Madden–Julian oscillation (MJO) provides insight into the varying nature of the 12 SOM nodes at inter- and intraseasonal time scales. These relationships suggest that the hitherto more common EOF/PC analysis of jet variability obscures important subtleties of jet structure, revealed by the SOM analysis, which bear on the underlying physical processes associated with Pacific jet variability as well as the nature of its downstream impacts.
