Title: Novel Ensemble Feature Selection Approach and Application in Repertoire Sequencing Data
The T and B cell repertoires make up the adaptive immune system and are mainly generated through somatic V(D)J gene recombination. Thus, VJ gene usage may serve as a prognostic or predictive biomarker. However, analysis of the adaptive immune system is challenging due to the heterogeneity of the clonotypes that make up the repertoire. To address the heterogeneity of the T and B cell repertoires, we proposed a novel ensemble feature selection approach and a customized statistical learning algorithm focusing on VJ gene usage. We applied the proposed approach to T cell receptor sequences from recovered COVID-19 patients and healthy donors, as well as from a group of lung cancer patients who received immunotherapy. Our approach identified distinct VJ genes used by the recovered COVID-19 patients compared with the healthy donors, as well as VJ genes associated with clinical response in the lung cancer patients. Simulation studies showed that the ensemble feature selection approach outperformed other state-of-the-art feature selection methods in both efficiency and accuracy. It consistently yielded higher stability and sensitivity with lower false discovery rates. When integrated with different classification methods, the ensemble feature selection approach had the best prediction accuracy. In conclusion, the proposed approach, together with the integration procedure, is an effective feature selection technique that can help correctly classify different subtypes, to better understand the adaptive immune response signatures associated with disease or treatment and thereby improve treatment strategies.
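The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of resampling-based ensemble feature selection on VJ gene usage, not the authors' method. It assumes each repertoire is given as a list of (V gene, J gene) pairs, uses a Welch t statistic as a stand-in base selector, draws class-stratified bootstrap resamples, and keeps features whose selection frequency passes a threshold. All function names, parameters, and defaults are hypothetical. The `sensitivity_fdr` helper shows how the sensitivity and false discovery rate mentioned above can be scored when the truly associated features are known, as in a simulation.

```python
# Hypothetical sketch: the base selector, resampling scheme, and threshold are
# illustrative assumptions, not the published ensemble feature selection method.
import math
import random
from collections import Counter


def vj_usage(clonotypes):
    """Map a repertoire, given as (V gene, J gene) pairs, to VJ-pair usage frequencies."""
    counts = Counter(f"{v}|{j}" for v, j in clonotypes)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def usage_matrix(repertoires):
    """Align per-sample usage dicts on a shared, sorted list of VJ pairs (absent pair -> 0)."""
    usages = [vj_usage(r) for r in repertoires]
    features = sorted(set().union(*usages))
    return features, [[u.get(f, 0.0) for f in features] for u in usages]


def welch_t(values, labels):
    """Welch-style t statistic comparing group 1 with group 0 for one feature."""
    g1 = [x for x, y in zip(values, labels) if y == 1]
    g0 = [x for x, y in zip(values, labels) if y == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    v1 = sum((x - m1) ** 2 for x in g1) / max(len(g1) - 1, 1)
    v0 = sum((x - m0) ** 2 for x in g0) / max(len(g0) - 1, 1)
    # A tiny constant guards against zero variance in a resample.
    return (m1 - m0) / math.sqrt(v1 / len(g1) + v0 / len(g0) + 1e-12)


def top_k(X, labels, k):
    """Base selector: indices of the k features with the largest |t|."""
    scores = [abs(welch_t([row[j] for row in X], labels)) for j in range(len(X[0]))]
    return set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])


def ensemble_select(X, labels, k=5, n_boot=100, threshold=0.6, seed=0):
    """Keep features chosen by the base selector in >= threshold of bootstrap resamples.

    Resampling is stratified by class so both groups appear in every resample.
    Returns the selected feature indices and each feature's selection frequency.
    """
    rng = random.Random(seed)
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in set(labels)}
    hits = Counter()
    for _ in range(n_boot):
        # Draw each class's samples with replacement, keeping the class sizes.
        idx = [rng.choice(members) for members in by_class.values() for _ in members]
        hits.update(top_k([X[i] for i in idx], [labels[i] for i in idx], k))
    freq = {j: hits[j] / n_boot for j in range(len(X[0]))}
    return {j for j, f in freq.items() if f >= threshold}, freq


def sensitivity_fdr(selected, truth):
    """Sensitivity and false discovery rate of a selected set against known true features."""
    tp = len(selected & truth)
    sens = tp / len(truth) if truth else 0.0
    fdr = (len(selected) - tp) / len(selected) if selected else 0.0
    return sens, fdr
```

The selection frequency returned by `ensemble_select` doubles as a rough stability measure: features picked in nearly every resample depend less on which samples happen to be drawn. An ensemble could also aggregate several different base selectors rather than one; this sketch omits that for brevity.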
Award ID(s): 2137983
PAR ID: 10659501
Publisher / Repository: Frontiers
Journal Name: Frontiers in Genetics
Volume: 13
ISSN: 1664-8021
Sponsoring Org: National Science Foundation
More like this
  1. T cells are a crucial component of the adaptive immune system and mediate anti-tumoral immunity as well as protection against infections, including respiratory viruses such as SARS-CoV-2. Next-generation sequencing of T-cell receptors (TCRs) can be used to profile the T-cell repertoire. We developed a customized pipeline for Network Analysis of Immune Repertoire (NAIR) with advanced statistical methods to characterize and investigate changes in the landscape of TCR sequences. We first performed network analysis on the TCR sequence data based on sequence similarity. We then quantified the repertoire network by its network properties and correlated these with clinical outcomes of interest. In addition, we identified (1) disease-specific/associated clusters and (2) shared clusters across samples using our customized search algorithms, and assessed their relationship with clinical outcomes such as recovery from COVID-19 infection. Furthermore, to identify disease-specific TCRs, we introduced a new metric that incorporates the clonal generation probability and the clonal abundance and uses a Bayes factor to filter out false positives. TCR-seq data from COVID-19 subjects and healthy donors were used to illustrate that the proposed approach to analyzing the network architecture of the immune repertoire can reveal potential disease-specific TCRs responsible for the immune response to infection.
  2. Patients with chronic lung disease (CLD) have an increased risk for severe coronavirus disease 2019 (COVID-19) and poor outcomes. Here, we analyze the transcriptomes of 611,398 single cells isolated from healthy and CLD lungs to identify molecular characteristics of lung cells that may account for worse COVID-19 outcomes in patients with chronic lung diseases. We observe a similar cellular distribution and relative expression of SARS-CoV-2 entry factors in control and CLD lungs. CLD AT2 cells express higher levels of genes linked directly to the efficiency of viral replication and the innate immune response. Additionally, we identify basal differences in inflammatory gene expression programs that highlight how CLD alters the inflammatory microenvironment encountered upon viral exposure to the peripheral lung. Our study indicates that CLD is accompanied by changes in cell-type-specific gene expression programs that prime the lung epithelium for, and influence, the innate and adaptive immune responses to SARS-CoV-2 infection.
  3. The successful development and implementation of precision immuno-oncology therapies requires a deeper understanding of the immune architecture at the patient level. T-cell receptor (TCR) repertoire sequencing is a relatively new technology that enables monitoring of T cells, a subset of immune cells that play a central role in modulating the immune response. These immunologic relationships are complex and are governed by various distributional aspects of an individual patient's tumor profile. We propose Bayesian QUANTIle regression for hierarchical COvariates (QUANTICO), which simultaneously models hierarchical relationships between multilevel covariates, conducts explicit variable selection, and estimates quantile- and patient-specific coefficient effects to enable individualized inference. We show that QUANTICO outperforms existing approaches in multiple simulation scenarios. We demonstrate the utility of QUANTICO by investigating the effect of TCR variables on the immune response in a cohort of lung cancer patients. At the population level, our analyses reveal the mechanistic role of T-cell proportion in immune cell abundance, with tumor mutation burden as an important factor modulating this relationship. At the patient level, we identify several outlier patients, based on their quantile-specific coefficient functions, who have higher mutational rates and different smoking histories.
  4. The course of recovery in COVID-19 patients is unclear, and some recovered patients report continued shortness of breath. Vasculopathy has been reported in COVID-19, stressing the importance of probing pulmonary microstructure and function at the alveolar-capillary interface. While computed tomography (CT) detects structural abnormalities, little is known about the impact of the disease on lung function. ¹²⁹Xe magnetic resonance imaging (MRI) is a technique uniquely capable of assessing ventilation, microstructure, and gas exchange. Using ¹²⁹Xe MRI, we found that, compared with healthy individuals, COVID-19 patients show a higher rate of ventilation defects (5.9% versus 3.7%), unchanged microstructure, and a longer gas-blood exchange time (43.5 ms versus 32.5 ms). These findings suggest that regional ventilation and alveolar airspace dimensions are relatively normal around the time of discharge, while gas-blood exchange function is diminished. This study establishes the feasibility of localized lung function measurements in COVID-19 patients and their potential usefulness as a supplement to structural imaging.
  5. The ability to identify and track T-cell receptor (TCR) sequences from patient samples is becoming central to cancer research and immunotherapy. Tracking genetically engineered T cells expressing TCRs that target specific tumor antigens is important for determining the persistence of these cells and quantifying tumor responses. The established high-throughput method for profiling TCR repertoires is generally referred to as TCR sequencing (TCR-Seq). However, available TCR-Seq data are limited compared with RNA sequencing (RNA-Seq) data. In this paper, we benchmarked the ability of RNA-Seq-based methods to profile TCR repertoires by examining 19 bulk RNA-Seq samples across 4 cancer cohorts, including both T-cell-rich and T-cell-poor tissue types. We performed a comprehensive evaluation of existing RNA-Seq-based repertoire profiling methods, using targeted TCR-Seq as the gold standard. We also highlight scenarios in which the RNA-Seq approach is suitable and can provide accuracy comparable to that of TCR-Seq. Our results show that RNA-Seq-based methods can effectively capture clonotypes and estimate the diversity of TCR repertoires, as well as provide relative frequencies of clonotypes in T-cell-rich tissues and low-diversity repertoires. However, RNA-Seq-based TCR profiling methods have limited power in T-cell-poor tissues, especially in highly diverse repertoires of T-cell-poor tissues. The results of our benchmarking provide an additional argument for incorporating RNA-Seq into the immune repertoire screening of cancer patients, as it offers broader insight into transcriptomic changes beyond the limited information provided by TCR-Seq.
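The first related record (NAIR) describes building a network over TCR sequences based on sequence similarity, then summarizing it through clusters and network properties. The sketch below shows one common way to do that step. It assumes that nodes are unique CDR3 amino-acid sequences and that edges join equal-length sequences differing by at most one substitution (Hamming distance ≤ 1); the abstract does not state NAIR's distance measure or cutoff, so both, along with all function names here, are hypothetical. It does not implement the Bayes-factor filtering or the association tests with clinical outcomes.

```python
# Hypothetical sketch: distance measure, cutoff, and summary statistics are
# illustrative assumptions, not the NAIR pipeline's actual settings.


def hamming(a, b):
    """Hamming distance for equal-length sequences; None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def build_network(seqs, max_dist=1):
    """Adjacency sets over unique sequences; edges join pairs within max_dist substitutions."""
    nodes = sorted(set(seqs))
    adj = {s: set() for s in nodes}
    # All-pairs comparison is O(n^2); large repertoires would need indexing.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            d = hamming(a, b)
            if d is not None and d <= max_dist:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def clusters(adj):
    """Connected components via iterative depth-first search, largest first."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(sorted(comp))
    return sorted(comps, key=lambda c: (-len(c), c))


def network_summary(adj):
    """Simple network properties: node, edge, and cluster counts plus largest cluster size."""
    comps = clusters(adj)
    return {
        "nodes": len(adj),
        "edges": sum(len(v) for v in adj.values()) // 2,
        "clusters": len(comps),
        "largest_cluster": len(comps[0]) if comps else 0,
    }
```

Duplicate sequences collapse to one node here; a fuller pipeline would likely carry clone counts as node attributes so that clonal abundance can enter later steps.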