

Title: Contextual AI models for single-cell protein biology
Abstract Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here we introduce PINNACLE, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multiorgan single-cell atlas, PINNACLE learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. PINNACLE’s embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs’ effects across cell types. PINNACLE outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases and pinpoints cell type contexts with higher predictive capability than context-free models. PINNACLE’s ability to adjust its outputs on the basis of the context in which it operates paves the way for large-scale context-specific predictions in biology.
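To make "context-aware protein representations" concrete, one can picture a lookup table in which every (protein, cell type context) pair has its own vector, so the same protein is embedded differently in each context. The sketch below is hypothetical: the protein names (PROT_A, PROT_B), context labels, three-dimensional vectors, and the centroid-plus-cosine retrieval are illustrative assumptions, not PINNACLE's architecture, training procedure, or released embeddings. It only shows how, if contexts from the same tissue cluster together in embedding space, a nearest-neighbour query over context summaries recovers tissue groupings.

```python
import numpy as np

# Hypothetical toy table: (protein, cell type context) -> embedding vector.
# The same protein has a different vector in each context; all values are made up.
EMBEDDINGS = {
    ("PROT_A", "lung|alveolar_macrophage"): np.array([0.9, 0.1, 0.0]),
    ("PROT_B", "lung|alveolar_macrophage"): np.array([0.8, 0.2, 0.1]),
    ("PROT_A", "lung|club_cell"): np.array([0.7, 0.3, 0.0]),
    ("PROT_B", "lung|club_cell"): np.array([0.8, 0.1, 0.1]),
    ("PROT_A", "liver|hepatocyte"): np.array([0.0, 0.2, 0.9]),
    ("PROT_B", "liver|hepatocyte"): np.array([0.1, 0.1, 0.8]),
}


def context_centroids(embeddings):
    """Average the embeddings of all proteins that share a cell type context."""
    groups = {}
    for (_protein, context), vec in embeddings.items():
        groups.setdefault(context, []).append(vec)
    return {ctx: np.mean(vecs, axis=0) for ctx, vecs in groups.items()}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def nearest_contexts(query, centroids):
    """Rank all other contexts by cosine similarity to the query context."""
    q = centroids[query]
    scored = [(ctx, cosine(q, c)) for ctx, c in centroids.items() if ctx != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = nearest_contexts("lung|alveolar_macrophage", context_centroids(EMBEDDINGS))
print([ctx for ctx, _ in ranking])  # ['lung|club_cell', 'liver|hepatocyte']
```

Averaging protein vectors is only one assumed way to summarise a context; the point of the toy is that the query context's nearest neighbour is the other lung context rather than the liver one.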
Award ID(s): 2339524
PAR ID: 10572217
Publisher / Repository: Nature
Journal Name: Nature Methods
Volume: 21
Issue: 8
ISSN: 1548-7091
Page Range / eLocation ID: 1546–1557
Sponsoring Org: National Science Foundation
More Like this
  1. Summary Quantitative information on the spatiotemporal distribution of polarised proteins is central for understanding cell‐fate determination, yet collecting sufficient data for statistical analysis is difficult to accomplish with manual measurements. Here we present Polarity Measurement (Pome), a semi‐automated pipeline for the quantification of cell polarity, and demonstrate its application to a variety of developmental contexts. Pome analysis reveals that, during asymmetric cell divisions in the Arabidopsis thaliana stomatal lineage, polarity proteins BASL and BRXL2 are more asynchronous and less mutually dependent than previously thought. A similar analysis of the linearly arrayed stomatal lineage of Brachypodium distachyon revealed that the MAPKKK BdYDA1 is segregated and polarised following asymmetrical divisions. Our results demonstrate that Pome is a versatile tool that, by itself or combined with tissue‐level studies and advanced microscopy techniques, can help to uncover new mechanisms of cell polarity.
  2. Abstract Structural information on protein–protein interactions is essential for characterizing life processes at the molecular level. Because only a small fraction of known protein interactions have experimentally determined structures, computational modeling of protein complexes (protein docking) must fill the gap. The Dockground resource (http://dockground.compbio.ku.edu) provides a collection of datasets for the development and testing of protein docking techniques. Currently, Dockground contains datasets for bound and unbound (experimentally determined and simulated) protein structures, model–model complexes, docking decoys of experimentally determined and modeled proteins, and templates for comparative docking. The Dockground bound proteins dataset is a core set from which the other Dockground datasets are generated. It is organized as a relational PostgreSQL database containing information on experimentally determined protein–protein complexes. This report on the Dockground resource describes the current status of the datasets, new automated update procedures, and further development of the core datasets. We also present a new Dockground interactive web interface that allows searching by various parameters (such as release date, multimeric state, complex type, and structure resolution), visualizing search results with a number of customizable parameters, and downloading datasets with predefined levels of sequence and structure redundancy.
  3. Abstract To understand phenotypic variation and the key factors that affect disease susceptibility for complex traits, it is important to decipher the cell‐type composition of tissues. To study the cellular composition of bulk tissue samples, one can estimate cellular abundances and cell‐type‐specific gene expression patterns from tissue transcriptome profiles. We develop both fixed and mixed models to reconstruct cellular expression fractions for bulk‐profiled samples using reference single‐cell (sc) RNA‐sequencing (RNA‐seq) data. In benchmark evaluations of estimating cellular expression fractions, the mixed‐effect models provide results similar to those of CIBERSORTx (cell‐type identification by estimating relative subsets of RNA transcripts), a well‐known and reliable machine learning algorithm for reconstructing cell‐type abundances and cell‐type‐specific gene expression profiles. In real data analysis, the mixed‐effect models outperform or perform similarly to CIBERSORTx. The mixed models perform better than the fixed models in both benchmark evaluations and data analysis. In simulation studies, we show that if heterogeneity exists in scRNA‐seq data, it is better to use mixed models with heterogeneous mean and variance–covariance. As a byproduct, the mixed models provide fractions of covariance between subject‐specific gene expression and cell types to measure their correlations. The proposed mixed models provide a complementary tool to dissect bulk tissues using scRNA‐seq data.
  4. Summary The non-muscle actomyosin cytoskeleton generates contractile force through the dynamic rearrangement of its constituent parts. Actomyosin rings are a specialization of the non-muscle actomyosin cytoskeleton that drive cell shape changes during division, wound healing, and other events. Contractile rings throughout phylogeny and in a range of cellular contexts are built from conserved components including non-muscle myosin II (NMMII), actin filaments (F-actin), and crosslinking proteins. However, it is unknown whether diverse actomyosin rings close via a single unifying mechanism. To explore how contractile forces are generated by actomyosin rings, we studied three instances of ring closure within the common cytoplasm of the C. elegans oogenic germline: mitotic cytokinesis of germline stem cells (GSCs), apoptosis of meiotic compartments, and cellularization of oocytes. We found that each ring type closed with unique kinetics, protein density, and abundance dynamics. These measurements suggested that the mechanism of contractile force generation varied across the subcellular contexts. Next, we formulated a physical model that related the forces generated by filament-filament interactions to the material properties of these rings that dictate the kinetics of their closure. Using this framework, we related the density of conserved cytoskeletal proteins anillin and NMMII to the kinematics of ring closure. We fitted model rings to in situ measurements to estimate parameters that are currently experimentally inaccessible, such as the asymmetric distribution of protein along the length of F-actin, which occurs naturally due to differences in the dimensions of the crosslinker and NMMII filaments. Our work predicted that the role of NMMII varies across these ring types, due in part to its distribution along F-actin and motoring. Our model also predicted that the degree of contractility and the impact of ring material properties on contractility differ among ring types.
  5. Abstract The auxin-inducible degradation system has been widely adopted in the Caenorhabditis elegans research community for its ability to empirically control the spatiotemporal expression of target proteins. This system can efficiently degrade auxin-inducible degron (AID)-tagged proteins via the expression of a ligand-activatable AtTIR1 protein derived from A. thaliana that adapts target proteins to the endogenous C. elegans proteasome. While broad expression of AtTIR1 using strong, ubiquitous promoters can lead to rapid degradation of AID-tagged proteins, cell type-specific expression of AtTIR1 using spatially restricted promoters often results in less efficient target protein degradation. To circumvent this limitation, we have developed an FLP/FRT3-based system that functions to reanimate a dormant, high-powered promoter that can drive sufficient AtTIR1 expression in a cell type-specific manner. We benchmark the utility of this system by generating a number of tissue-specific FLP-ON::TIR1 drivers to reveal genetically separable cell type-specific phenotypes for several target proteins. We also demonstrate that the FLP-ON::TIR1 system is compatible with enhanced degron epitopes. Finally, we provide an expandable toolkit utilizing the basic FLP-ON::TIR1 system that can be adapted to drive optimized AtTIR1 expression in any tissue or cell type of interest. 
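One common way to frame the fixed-model idea in item 3 (reconstructing cellular expression fractions of a bulk sample from reference single-cell profiles) is to explain the bulk expression vector as a non-negative mixture of per-cell-type reference profiles and solve a non-negative least-squares problem. The sketch below is that generic textbook deconvolution, not the authors' fixed or mixed models and not CIBERSORTx; the function names, the projected-gradient solver, and the toy 4-gene by 3-cell-type signature matrix are assumptions for illustration.

```python
import numpy as np


def nnls_projected_gradient(A, b, n_iter=5000):
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    # Step size 1/L, where L = sigma_max(A)^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(A, ord=2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = np.maximum(x - step * grad, 0.0)  # gradient step, then clip to x >= 0
    return x


def estimate_fractions(signature, bulk):
    """Estimate cell type fractions of one bulk sample.

    signature: genes x cell types matrix of reference (e.g. scRNA-seq) mean expression.
    bulk: length-genes vector of bulk expression for one sample.
    """
    weights = nnls_projected_gradient(signature, bulk)
    return weights / weights.sum()  # rescale so the fractions sum to 1


# Toy reference: 4 genes x 3 cell types (values are made up).
signature = np.array([
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 1.0, 12.0],
    [2.0, 2.0, 2.0],
])
true_fractions = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_fractions  # noise-free synthetic bulk sample

estimated = estimate_fractions(signature, bulk)
print(np.round(estimated, 3))  # [0.5 0.3 0.2]
```

With noise-free synthetic data the solver recovers the mixing fractions exactly; a mixed-effects formulation like the one described in item 3 would additionally model subject-level heterogeneity in the reference profiles, which this sketch does not attempt.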