This content will become publicly available on December 31, 2026

Title: Surrogate selection oversamples expanded T cell clonotypes
Surrogate selection is an experimental design that, without sequencing any DNA, can restrict a sample of cells to those carrying certain genomic mutations. In immunological disease studies, this design may provide a relatively easy way to enrich a lymphocyte sample with cells relevant to the disease response, because the emergence of neutral mutations is associated with the proliferation history of clonal subpopulations. A statistical analysis of clonotype sizes provides a structured, quantitative perspective on this useful property of surrogate selection. Our model specification couples within-clonotype birth-death processes with an exchangeable model across clonotypes. Beyond enrichment questions about the surrogate selection design, our framework enables a study of the sampling properties of elementary sample diversity statistics; it also points to new statistics that may usefully measure the burden of somatic genomic alterations associated with clonal expansion. We examine statistical properties of immunological samples governed by the coupled model specification, and we illustrate calculations in surrogate selection studies of melanoma and in single-cell genomic studies of T cell repertoires.
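The enrichment idea in the abstract can be illustrated with a small calculation. The sketch below is not the authors' coupled birth-death and exchangeable model; it is a toy under stated assumptions: the clonotype sizes are made up, a clone of n cells is assumed to have gone through about log2(n) rounds of division, and each round is assumed to introduce a selectable marker mutation independently with a hypothetical probability `mu`. Under those assumptions it compares the average clonotype size seen when sampling a clonotype at random, a cell at random, and a marker-carrying cell at random.

```python
import math


def carrier_probability(size, mu):
    """Chance that a cell from a clonotype of `size` cells carries the marker.

    Assumption (not from the paper): a clone of n cells arose through about
    log2(n) rounds of division, each introducing the marker with probability mu.
    """
    divisions = math.log2(size)
    return 1.0 - (1.0 - mu) ** divisions


def mean_clonotype_size(sizes, weights):
    """Average clonotype size when clonotype i is drawn with weight weights[i]."""
    total = sum(weights)
    return sum(s * w for s, w in zip(sizes, weights)) / total


# Hypothetical repertoire: many singletons and one expanded clonotype.
sizes = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]
mu = 0.01  # hypothetical per-division marker mutation probability

# Draw a clonotype uniformly at random.
per_clonotype = mean_clonotype_size(sizes, [1] * len(sizes))

# Draw a cell uniformly at random: clonotypes are weighted by their size.
per_cell = mean_clonotype_size(sizes, sizes)

# Draw a marker-carrying cell: weight is size times carrier probability.
selected_weights = [n * carrier_probability(n, mu) for n in sizes]
selected = mean_clonotype_size(sizes, selected_weights)

print(f"uniform over clonotypes: {per_clonotype:.2f}")
print(f"uniform over cells:      {per_cell:.2f}")
print(f"marker-selected cells:   {selected:.2f}")
```

In this toy setting, sampling cells is already size-biased toward large clonotypes, and restricting to marker carriers shifts the sample further toward expanded clonotypes, because unexpanded clones have had no divisions in which to acquire the marker.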
Award ID(s):
2023239
PAR ID:
10625761
Publisher / Repository:
The Institute of Mathematical Statistics
Journal Name:
The annals of applied statistics
ISSN:
1932-6157
Subject(s) / Keyword(s):
Bayes’s rule clonal expansion diversity statistic enrichment exchangeable birth-death processes experimental design single cell sequencing size bias somatic mutation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. An organism becomes genetically mosaic through the accumulation of somatic mutations. Genetic mosaicism is a commonality of multicellular life and has been studied extensively in humans due to its associations with aging and diseases. In humans, somatic selection shapes the accumulation of somatic mutations, with strong signatures of positive somatic selection in cancer cell lineages. So far, evidence for somatic selection in plants has been inconsistent. The evolutionary implications of genetic mosaicism in humans and other animals are limited by early specification of germline cells, preventing transmission of somatic mutations to progeny. In contrast, many plant lineages reproduce asexually with clonal progeny derived from vegetative tissues. We describe the patterns and processes shaping somatic mutation accumulation within a single, 149-year-old historic sweet orange (Citrus sinensis) tree and within a clonal lineage of sweet orange. More than 12,000 somatic mutations were identified in the historic tree and 28,000 somatic mutations were identified across 199 clonally related sweet orange accessions. Both the spatial and genomic distributions of somatic mutations are non-random. The spatial patterns of somatic mutations across the historic tree depend on tree growth and development and their accumulation across the tree canopy recapitulates branching topology. Analysis of the genomic distribution of somatic mutations revealed that the subtelomeres, which are large arrays of ~180 bp repeats, are mutation hotspots. Finally, there was genomic evidence that somatic selection shapes the accumulation of somatic mutations both within the historic tree and also during clonal propagation. 
  2. The evolutionary transition to multicellularity requires shifting the primary unit of selection from cells to multicellular collectives. How this occurs in aggregative organisms remains poorly understood. Clonal development provides a direct path to multicellular adaptation through genetic identity between cells, but aggregative organisms face a constraint: selection on collective-level traits cannot drive adaptation without positive genetic assortment. We leveraged experimental evolution of flocculating Saccharomyces cerevisiae to examine the evolution and role of genetic assortment in multicellular adaptation. After 840 generations of selection for rapid settling, 13 of 19 lineages evolved increased positive assortment relative to their ancestor. However, assortment provided no competitive advantage during settling selection, suggesting it arose as an indirect effect of selection on cell-level traits rather than through direct selection on collective-level properties. Genetic reconstruction experiments and protein structure modeling revealed two distinct pathways to assortment: kin recognition mediated by mutations in the FLO1 adhesion gene and generally enhanced cellular adhesion that improved flocculation efficiency independent of partner genotype. The evolution of assortment without immediate adaptive benefit suggests that key innovations enabling multicellular adaptation may arise indirectly through cell-level selection. Our results demonstrate fundamental constraints on aggregative multicellularity and help explain why aggregative lineages have remained simple. 
  3. T cells represent a crucial component of the adaptive immune system and mediate anti-tumoral immunity as well as protection against infections, including respiratory viruses such as SARS-CoV-2. Next-generation sequencing of the T-cell receptors (TCRs) can be used to profile the T-cell repertoire. We developed a customized pipeline for Network Analysis of Immune Repertoire (NAIR) with advanced statistical methods to characterize and investigate changes in the landscape of TCR sequences. We first performed network analysis on the TCR sequence data based on sequence similarity. We then quantified the repertoire network by network properties and correlated it with clinical outcomes of interest. In addition, we identified (1) disease-specific/associated clusters and (2) shared clusters across samples based on our customized search algorithms and assessed their relationship with clinical outcomes such as recovery from COVID-19 infection. Furthermore, to identify disease-specific TCRs, we introduced a new metric that incorporates the clonal generation probability and the clonal abundance by using the Bayes factor to filter out the false positives. TCR-seq data from COVID-19 subjects and healthy donors were used to illustrate that the proposed approach to analyzing the network architecture of the immune repertoire can reveal potential disease-specific TCRs responsible for the immune response to infection. 
  4. When students think of evolution, they might imagine T. rex, or perhaps an abiotic scene of sizzling electrical storms and harsh reducing atmospheres, an Earth that looks like a lunar landscape. Natural selection automatically elicits responses that include “survival of the fittest,” and “descent with modification,” and with these historical biological catch phrases, one conjures up images of large animals battling it out on the Mesozoic plain. Rarely do teachers or students apply these same ideas to cancer and the evolution of somatic cells, which have accrued mutations and epigenetic imprinting and relentlessly survive and proliferate. Our questions in this paper include the following: Can cancer become an important teaching model for students to explore fundamental hypotheses about evolutionary process? Can the multi-step somatic cancer model encourage visualizations that enable students to revisit and reenter previous primary concepts in general biology such as the cell, mitosis, chromosomes, genetic diversity, ecological diversity, immune function, and of course evolution, continually integrating their biology knowledge into process and pattern knowledge? Can the somatic cancer model expose similar patterns and protagonists, linking Darwinian observations of the natural world to our body? And, can the cancer clone model excite critical thinking and student hypotheses about what cancer is as a biological process? Does this visually simple model assist students in recognizing patterns, connecting their biological curriculum dots into a more coherent learning experience? 
These biological dynamics and intercepting aptitudes of cells are amplified through the cancer model and can help shape the way biology students begin to appreciate the interrelatedness of all biological systems while they continue to explore pivotal points of biological fuzziness, such as the microbiome, limitations of models, and the complex coordination of genomic networks required for the function of even a single cell and the realization of phenotypes. In this paper we use clonal evolution of cancer as a model experience for students to recreate how a single, non-germline cell appears to shadow the classic pattern of natural selection in body cells that have gone awry. With authentic STEAM activities, students can easily cross over and revisit previous biological topics and the ubiquitous nature of natural selection as seen in the example of somatic cells that result in a metastasizing tumor, giving students insight into natural selection’s accommodating and tractable patterns throughout the planet. 
  5. Affinity maturation (AM) of B cells through somatic hypermutations (SHMs) enables the immune system to evolve to recognize diverse pathogens. The accumulation of SHMs leads to the formation of clonal lineages of antibody-secreting B cells that have evolved from a common naïve B cell. Advances in high-throughput sequencing have enabled deep scans of B cell receptor repertoires, paving the way for reconstructing clonal trees. However, it is not clear whether clonal trees, which capture microevolutionary time scales, can be reconstructed using traditional phylogenetic reconstruction methods with adequate accuracy. In fact, several clonal tree reconstruction methods have been developed to fix supposed shortcomings of phylogenetic methods. Nevertheless, no consensus has been reached regarding the relative accuracy of these methods, partially because evaluation is challenging. Benchmarking the performance of existing methods and developing better methods would both benefit from realistic models of clonal lineage evolution specifically designed for emulating B cell evolution. In this paper, we propose a model of B cell clonal lineage evolution and use this model to benchmark several existing clonal tree reconstruction methods. Our model, designed to be extensible, has several features: by evolving the clonal tree and sequences simultaneously, it allows modeling selective pressure due to changes in affinity binding; it enables scalable simulations of large numbers of cells; it enables several rounds of infection by an evolving pathogen; and it models building of memory. In addition, we also suggest a set of metrics for comparing clonal trees and measuring their properties. 
Our results show that while maximum likelihood phylogenetic reconstruction methods can fail to capture key features of clonal tree expansion if applied naively, a simple post-processing of their results, where short branches are contracted, leads to inferences that are better than alternative methods. 