This content will become publicly available on April 14, 2026

Title: IgStrand: A universal residue numbering scheme for the immunoglobulin-fold (Ig-fold) to study Ig-proteomes and Ig-interactomes
The Immunoglobulin fold (Ig-fold) is found in proteins from all domains of life and represents the most populous fold in the human genome, with current estimates ranging from 2 to 3% of protein coding regions. That proportion is much higher in the surfaceome where Ig and Ig-like domains orchestrate cell-cell recognition, adhesion and signaling. The ability of Ig-domains to reliably fold and self-assemble through highly specific interfaces represents a remarkable property of these domains, making them key elements of molecular interaction systems: the immune system, the nervous system, the vascular system and the muscular system. We define a universal residue numbering scheme, common to all domains sharing the Ig-fold, in order to study the wide spectrum of Ig-domain variants constituting the Ig-proteome and Ig-Ig interactomes at the heart of these systems. The “IgStrand numbering scheme” enables the identification of Ig structural proteomes and interactomes in and between any species, and comparative structural, functional, and evolutionary analyses. We review how Ig-domains are classified today as topological and structural variants and highlight the “Ig-fold irreducible structural signature” shared by all of them. The IgStrand numbering scheme lays the foundation for the systematic annotation of structural proteomes by detecting and accurately labeling Ig-, Ig-like and Ig-extended domains in proteins, which are poorly annotated in current databases, and opens the door to accurate machine learning. Importantly, it sheds light on the robust Ig protein folding algorithm used by nature to form beta sandwich supersecondary structures. The numbering scheme powers an algorithm implemented in the interactive structural analysis software iCn3D to systematically recognize Ig-domains, annotate them and perform detailed analyses comparing any domain sharing the Ig-fold in sequence, topology and structure, regardless of their diverse topologies or origin.
The scheme provides a robust fold detection and labeling mechanism that reveals unsuspected structural homologies among protein structures beyond currently identified Ig- and Ig-like domain variants. Indeed, multiple folds classified independently contain a common structural signature, in particular jelly-rolls. Examples of folds that harbor an “Ig-extended” architecture are given. Applications in protein engineering around the Ig-architecture are straightforward based on the universal numbering.
Award ID(s):
2346274
PAR ID:
10594315
Author(s) / Creator(s):
; ; ; ; ; ;
Editor(s):
Friedberg, Iddo
Publisher / Repository:
PLoS
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
21
Issue:
4
ISSN:
1553-7358
Page Range / eLocation ID:
e1012813
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We examine the influence of cellular interactions in all‐atom models of a section of the Homo sapiens cytoplasm on the early folding events of the three‐helix bundle protein B (PB). While genetically engineered PB is known to fold in dilute water box simulations in three microseconds, the three initially unfolded PB copies in our two cytoplasm models using a similar force field did not reach the native state during 30‐microsecond simulations. We did however capture the formation of all three helices in a compact native‐like topology. Folding in vivo is delayed because intramolecular contact formation within PB is in direct competition with intermolecular contacts between PB and surrounding macromolecules. In extreme cases, intermolecular beta‐sheets are formed. Interactions with other macromolecules are also observed to promote structure formation, for example when a PB helix in our simulations is shielded from solvent by macromolecular crowding. Sticking and crowding in our models initiate sampling of helix/sheet structural plasticity of PB. Relatedly, in past in vitro experiments, similar GA domains were shown to switch between two different folds. Finally, we also observed that stickiness between PB and the cellular environment can be modulated in our simulations through the reduction in protein hydrophobicity when we reversed PB back to the wild‐type sequence. This study demonstrates that even fast‐folding proteins can get stuck in non‐native states in the cell, making them useful models for protein–chaperone interactions and early stages of aggregate formation relevant to cellular disease.
  2. Dunbrack, Roland L (Ed.)
    Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets. 
  3. With the growth of the PDB and simultaneous slowing of the discovery of new protein folds, we may be able to answer the question of how discrete protein fold space is. Studies by Skolnick et al. (PNAS, 106, 15690, 2009) have concluded that it is in fact continuous. In the present work we extend our initial observation (PNAS, 106(51) E137, 2009) that this conclusion depends upon the resolution with which structures are considered, making the determination of what resolution is most useful of importance. We utilize graph theoretical approaches to investigate the connectedness of the protein structure universe, showing that the modularity of protein domain architecture is of fundamental importance for future improvements in structure matching, impacting our understanding of protein domain evolution and modification. We show that state-of-the-art structure superimposition algorithms are unable to distinguish between conformational and topological variation. This work is not only important for our understanding of the discreteness of protein fold space, but informs the more critical question of what precisely should be spatially aligned in structure superimposition. The metric-dependence is also investigated leading to the conclusion that fold usage in homology reduced datasets is very similar to usage across all of PDB and should not be ignored in large scale studies of protein structure similarity. 
  4. Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website ( http://prodata.swmed.edu/ecod/index_human.php ). 
  5. King, Nicole (Ed.)
    The discovery that sponges (Porifera) can fully regenerate from aggregates of dissociated cells launched them as one of the earliest experimental models to study the evolution of cell adhesion and allorecognition in animals. This process depends on an extracellular glycoprotein complex called the Aggregation Factor (AF), which is composed of proteins thought to be unique to sponges. We used quantitative proteomics to identify additional AF components and interacting proteins in the classical model, Clathria prolifera, and compared them to proteins involved in cell interactions in Bilateria. Our results confirm MAFp3/p4 proteins as the primary components of the AF but implicate related proteins with calx-beta and wreath domains as additional components. Using AlphaFold, we unveiled close structural similarities of AF components to protein domains in other animals, previously masked by the mutational decay of sequence similarity. The wreath domain, believed to be unique to the AF, was predicted to contain a central beta-sandwich of the same organization as the vWFD domain (also found in extracellular, gel-forming glycoproteins in other animals). Additionally, many copurified proteins share a conserved C-terminus, containing divergent immunoglobulin (Ig) and Fn3 domains predicted to serve as an AF–interaction interface. One of these proteins, MAF-associated protein 1, resembles Ig superfamily cell adhesion molecules and we hypothesize that it may function to link the AF to the surface of cells. Our results highlight the existence of an ancient toolkit of conserved protein domains regulating cell–cell and cell–extracellular matrix protein interactions in all animals, and likely reflect a common origin of cell adhesion and allorecognition.