NSF PAR Search | NSF Public Access Repository


Frustraevo: a web server to localize and quantify the conservation of local energetic frustration in protein families

https://doi.org/10.1093/nar/gkae244

Parra, R Gonzalo; Freiberger, Maria I; Poley-Gil, Miriam; Fernandez-Martin, Miguel; Radusky, Leandro G; Ruiz-Serra, Victoria; Wolynes, Peter G; Ferreiro, Diego U; Valencia, Alfonso (April 2024, Nucleic Acids Research)

Abstract According to the Principle of Minimal Frustration, folded proteins can only have a minimal number of strong energetic conflicts in their native states. However, not all interactions are energetically optimized for folding but some remain in energetic conflict, i.e. they are highly frustrated. This remaining local energetic frustration has been shown to be statistically correlated with distinct functional aspects such as protein-protein interaction sites, allosterism and catalysis. Fuelled by the recent breakthroughs in efficient protein structure prediction that have made available good quality models for most proteins, we have developed a strategy to calculate local energetic frustration within large protein families and quantify its conservation over evolutionary time. Based on this evolutionary information we can identify how stability and functional constraints have appeared at the common ancestor of the family and have been maintained over the course of evolution. Here, we present FrustraEvo, a web server tool to calculate and quantify the conservation of local energetic frustration in protein families.
Full Text Available
Local energetic frustration conservation in protein families and superfamilies

https://doi.org/10.1038/s41467-023-43801-2

Freiberger, Maria I; Ruiz-Serra, Victoria; Pontes, Camila; Romero-Durana, Miguel; Galaz-Davison, Pablo; Ramírez-Sarmiento, Cesar A; Schuster, Claudio D; Marti, Marcelo A; Wolynes, Peter G; Ferreiro, Diego U; et al (December 2023, Nature Communications)

Energetic local frustration offers a biophysical perspective to interpret the effects of sequence variability on protein families. Here we present a methodology to analyze local frustration patterns within protein families and superfamilies that allows us to uncover constraints related to stability and function, and identify differential frustration patterns in families with a common ancestry. We analyze these signals in very well studied protein families such as PDZ, SH3, α and β globins and RAS families. Recent advances in protein structure prediction make it possible to analyze a vast majority of the protein space. An automatic and unsupervised proteome-wide analysis on the SARS-CoV-2 virus demonstrates the potential of our approach to enhance our understanding of the natural phenotypic diversity of protein families beyond single protein instances. We apply our method to modify biophysical properties of natural proteins based on their family properties, as well as perform unsupervised analysis of large datasets to shed light on the physicochemical signatures of poorly characterized proteins such as the ones belonging to emergent pathogens.
Full Text Available
Immune digital twins for complex human pathologies: applications, limitations, and challenges

https://doi.org/10.1038/s41540-024-00450-5

Niarakis, Anna; Laubenbacher, Reinhard; An, Gary; Ilan, Yaron; Fisher, Jasmin; Flobak, Åsmund; Reiche, Kristin; Rodríguez Martínez, María; Geris, Liesbet; Ladeira, Luiz; et al (December 2024, npj Systems Biology and Applications)

Abstract Digital twins represent a key technology for precision health. Medical digital twins consist of computational models that represent the health state of individual patients over time, enabling optimal therapeutics and forecasting patient prognosis. Many health conditions involve the immune system, so it is crucial to include its key features when designing medical digital twins. The immune response is complex and varies across diseases and patients, and its modelling requires the collective expertise of the clinical, immunology, and computational modelling communities. This review outlines the initial progress on immune digital twins and the various initiatives to facilitate communication between interdisciplinary communities. We also outline the crucial aspects of an immune digital twin design and the prerequisites for its implementation in the clinic. We propose some initial use cases that could serve as “proof of concept” regarding the utility of immune digital technology, focusing on diseases with a very different immune response across spatial and temporal scales (minutes, days, months, years). Lastly, we discuss the use of digital twins in drug discovery and point out emerging challenges that the scientific community needs to collectively overcome to make immune digital twins a reality.
Full Text Available
A probabilistic graphical model for system-wide analysis of gene regulatory networks

https://doi.org/10.1093/bioinformatics/btaa122

Kotiang, Stephen; Eslami, Ali; Valencia, Alfonso (February 2020, Bioinformatics)

Abstract Motivation: The inference of gene regulatory networks (GRNs) from DNA microarray measurements forms a core element of systems biology-based phenotyping. In the recent past, numerous computational methodologies have been formalized to enable the deduction of reliable and testable predictions in today’s biology. However, little attention has been paid to quantifying how well existing state-of-the-art GRNs correspond to measured gene-expression profiles. Results: Here, we present a computational framework that combines the formulation of probabilistic graphical modeling, standard statistical estimation, and integration of high-throughput biological data to explore the global behavior of biological systems and the global consistency between experimentally verified GRNs and corresponding large microarray compendium data. The model is represented as a probabilistic bipartite graph, which can handle highly complex network systems and accommodates partial measurements of diverse biological entities, e.g. messenger RNAs, proteins, metabolites and various stimulators participating in regulatory networks. This method was tested on microarray expression data from the M3D database, corresponding to sub-networks of one of the best-researched model organisms, Escherichia coli. Results show a surprisingly high correlation between the observed states and the inferred system’s behavior under various experimental conditions. Availability and implementation: Processed data and software implementation using Matlab are freely available at https://github.com/kotiang54/PgmGRNs. The full dataset is available from the M3D database.
Full Text Available
Discovering and interpreting transcriptomic drivers of imaging traits using neural networks

https://doi.org/10.1093/bioinformatics/btaa126

Smedley, Nova F; El-Saden, Suzie; Hsu, William; Valencia, Alfonso (February 2020, Bioinformatics)

Abstract Motivation: Cancer heterogeneity is observed at multiple biological levels. To improve our understanding of these differences and their relevance in medicine, approaches to link organ- and tissue-level information from diagnostic images and cellular-level information from genomics are needed. However, these ‘radiogenomic’ studies often use linear or shallow models, depend on feature selection, or consider one gene at a time to map images to genes. Moreover, no study has systematically attempted to understand the molecular basis of imaging traits based on the interpretation of what the neural network has learned. These studies are thus limited in their ability to understand the transcriptomic drivers of imaging traits, which could provide additional context for determining clinical outcomes. Results: We present a neural network-based approach that takes high-dimensional gene expression data as input and performs non-linear mapping to an imaging trait. To interpret the models, we propose gene masking and gene saliency to extract learned relationships from radiogenomic neural networks. In glioblastoma patients, our models outperformed comparable classifiers (>0.10 AUC) and our interpretation methods were validated using a similar model to identify known relationships between genes and molecular subtypes. We found that tumor imaging traits had specific transcription patterns, e.g. edema and genes related to cellular invasion, and 10 radiogenomic traits were significantly predictive of survival. We demonstrate that neural networks can model transcriptomic heterogeneity to reflect differences in imaging and can be used to derive radiogenomic traits with clinical value. Availability and implementation: https://github.com/novasmedley/deepRadiogenomics. Contact: whsu@mednet.ucla.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins

https://doi.org/10.1093/bioinformatics/btz863

Zhang, Chengxin; Zheng, Wei; Mortuza, S M; Li, Yang; Zhang, Yang; Valencia, Alfonso (November 2019, Bioinformatics)

Abstract Motivation: The success of genome sequencing techniques has resulted in a rapid explosion of protein sequences. Collections of multiple homologous sequences can provide critical information for modeling the structure and function of unknown proteins. There is, however, no standard and efficient pipeline available for sensitive multiple sequence alignment (MSA) collection. This is particularly challenging when large whole-genome and metagenome databases are involved. Results: We developed DeepMSA, a new open-source method for sensitive MSA construction, in which homologous sequences and alignments are created from multiple sources of whole-genome and metagenome databases through complementary hidden Markov model algorithms. The practical usefulness of the pipeline was examined in three large-scale benchmark experiments based on 614 non-redundant proteins. First, DeepMSA was used to generate MSAs for residue-level contact prediction by six coevolution and deep learning-based programs, which resulted in an accuracy increase in long-range contacts of up to 24.4% compared to the default programs. Next, multiple threading programs were run for homologous structure identification, where the average TM-score of the template alignments increased by over 7.5% with the use of the new DeepMSA profiles. Finally, DeepMSA was used for secondary structure prediction and resulted in statistically significant improvements in the Q3 accuracy. All these improvements were achieved without re-training the parameters and neural-network models, demonstrating the robustness and general usefulness of DeepMSA in protein structural bioinformatics applications, especially for targets without homologous templates in the PDB library. Availability and implementation: https://zhanglab.ccmb.med.umich.edu/DeepMSA/. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
Meltos: multi-sample tumor phylogeny reconstruction for structural variants

https://doi.org/10.1093/bioinformatics/btz737

Ricketts, Camir; Seidman, Daniel; Popic, Victoria; Hormozdiari, Fereydoun; Batzoglou, Serafim; Hajirasouliha, Iman; Valencia, Alfonso (October 2019, Bioinformatics)

Abstract Motivation: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. Results: In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes each. We also assessed Meltos on two real cancer datasets. We tested Meltos on multiple samples from a liposarcoma tumor and on a multi-sample breast cancer dataset (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also show the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates. Availability and implementation: Meltos is available at https://github.com/ih-lab/Meltos. Contact: imh2003@med.cornell.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available
iScore: A novel graph kernel-based function for scoring protein-protein docking models

https://doi.org/10.1093/bioinformatics/btz496

Geng, Cunliang; Jung, Yong; Renaud, Nicolas; Honavar, Vasant; Bonvin, Alexandre M; Xue, Li C; Valencia, Alfonso (June 2019, Bioinformatics)

Full Text Available
