Title: Novel Ensemble Feature Selection Approach and Application in Repertoire Sequencing Data
The T and B cell repertoires make up the adaptive immune system and are mainly generated through somatic V(D)J gene recombination. Thus, VJ gene usage may be a potential prognostic or predictive biomarker. However, analysis of the adaptive immune system is challenging due to the heterogeneity of the clonotypes that make up the repertoire. To address the heterogeneity of the T and B cell repertoires, we proposed a novel ensemble feature selection approach and a customized statistical learning algorithm focusing on VJ gene usage. We applied the proposed approach to T cell receptor sequences from recovered COVID-19 patients and healthy donors, as well as from a group of lung cancer patients who received immunotherapy. Our approach identified distinct VJ genes used in the recovered COVID-19 patients compared with the healthy donors, and VJ genes associated with clinical response in the lung cancer patients. Simulation studies show that the ensemble feature selection approach outperformed other state-of-the-art feature selection methods in both efficiency and accuracy. It consistently yielded higher stability and sensitivity with lower false discovery rates. When integrated with different classification methods, the ensemble feature selection approach had the best prediction accuracy. In conclusion, the proposed approach and integration procedure provide an effective feature selection technique that aids in correctly classifying different subtypes, helping to characterize the signatures in the adaptive immune response associated with disease or treatment and thereby to improve treatment strategies.
Award ID(s):
2137983
PAR ID:
10659501
Publisher / Repository:
Frontiers
Journal Name:
Frontiers in Genetics
Volume:
13
ISSN:
1664-8021
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. T cells represent a crucial component of the adaptive immune system and mediate anti-tumoral immunity as well as protection against infections, including respiratory viruses such as SARS-CoV-2. Next-generation sequencing of the T-cell receptors (TCRs) can be used to profile the T-cell repertoire. We developed a customized pipeline for Network Analysis of Immune Repertoire (NAIR) with advanced statistical methods to characterize and investigate changes in the landscape of TCR sequences. We first performed network analysis on the TCR sequence data based on sequence similarity. We then quantified the repertoire network by network properties and correlated it with clinical outcomes of interest. In addition, we identified (1) disease-specific/associated clusters and (2) shared clusters across samples based on our customized search algorithms and assessed their relationship with clinical outcomes such as recovery from COVID-19 infection. Furthermore, to identify disease-specific TCRs, we introduced a new metric that incorporates the clonal generation probability and the clonal abundance by using the Bayes factor to filter out the false positives. TCR-seq data from COVID-19 subjects and healthy donors were used to illustrate that the proposed approach to analyzing the network architecture of the immune repertoire can reveal potential disease-specific TCRs responsible for the immune response to infection. 
  2. Patients with chronic lung disease (CLD) have an increased risk for severe coronavirus disease-19 (COVID-19) and poor outcomes. Here, we analyze the transcriptomes of 611,398 single cells isolated from healthy and CLD lungs to identify molecular characteristics of lung cells that may account for worse COVID-19 outcomes in patients with chronic lung diseases. We observe a similar cellular distribution and relative expression of SARS-CoV-2 entry factors in control and CLD lungs. CLD AT2 cells express higher levels of genes linked directly to the efficiency of viral replication and the innate immune response. Additionally, we identify basal differences in inflammatory gene expression programs that highlight how CLD alters the inflammatory microenvironment encountered upon viral exposure to the peripheral lung. Our study indicates that CLD is accompanied by changes in cell-type-specific gene expression programs that prime the lung epithelium for and influence the innate and adaptive immune responses to SARS-CoV-2 infection.
  3. The recovery process of COVID-19 patients is unclear. Some recovered patients complain of continued shortness of breath. Vasculopathy has been reported in COVID-19, stressing the importance of probing pulmonary microstructure and function at the alveolar-capillary interface. While computed tomography (CT) detects structural abnormalities, little is known about the impact of disease on lung function. ¹²⁹Xe magnetic resonance imaging (MRI) is a technique uniquely capable of assessing ventilation, microstructure, and gas exchange. Using ¹²⁹Xe MRI, we found that COVID-19 patients show a higher rate of ventilation defects (5.9% versus 3.7%), unchanged microstructure, and longer gas-blood exchange time (43.5 ms versus 32.5 ms) compared with healthy individuals. These findings suggest that regional ventilation and alveolar airspace dimensions are relatively normal around the time of discharge, while gas-blood exchange function is diminished. This study establishes the feasibility of localized lung function measurements in COVID-19 patients and their potential usefulness as a supplement to structural imaging.
  4. The ability to identify and track T-cell receptor (TCR) sequences from patient samples is becoming central to the field of cancer research and immunotherapy. Tracking genetically engineered T cells expressing TCRs that target specific tumor antigens is important to determine the persistence of these cells and quantify tumor responses. The available high-throughput method to profile TCR repertoires is generally referred to as TCR sequencing (TCR-Seq). However, the available TCR-Seq data are limited compared with RNA sequencing (RNA-Seq). In this paper, we have benchmarked the ability of RNA-Seq-based methods to profile TCR repertoires by examining 19 bulk RNA-Seq samples across 4 cancer cohorts including both T-cell-rich and T-cell-poor tissue types. We have performed a comprehensive evaluation of the existing RNA-Seq-based repertoire profiling methods using targeted TCR-Seq as the gold standard. We also highlighted scenarios under which the RNA-Seq approach is suitable and can provide comparable accuracy to the TCR-Seq approach. Our results show that RNA-Seq-based methods are able to effectively capture the clonotypes and estimate the diversity of TCR repertoires, as well as provide relative frequencies of clonotypes in T-cell-rich tissues and low-diversity repertoires. However, RNA-Seq-based TCR profiling methods have limited power in T-cell-poor tissues, especially in highly diverse repertoires of T-cell-poor tissues. The results of our benchmarking provide an additional appealing argument to incorporate RNA-Seq into the immune repertoire screening of cancer patients as it offers broader knowledge into the transcriptomic changes that exceed the limited information provided by TCR-Seq. 
  5. Over the last decade, both early diagnosis and targeted therapy have improved the survival rates of many cancer patients. Most recently, immunotherapy has revolutionized the treatment options for cancers such as melanoma. Unfortunately, a significant portion of cancers (including lung and breast cancers) do not respond to immunotherapy, and many of them develop resistance to chemotherapy. Molecular characterization of non-responsive cancers suggests that an embryonic program known as epithelial-mesenchymal transition (EMT), which is mostly latent in adults, can be activated under selective pressures, rendering these cancers resistant to chemo- and immunotherapies. EMT can also drive tumor metastases, which in turn suppress the cancer-fighting activity of cytotoxic T cells that traffic into the tumor, causing immunotherapy to fail. In this review, we compare and contrast immunotherapy treatment options for non-small cell lung cancer (NSCLC) and triple-negative breast cancer (TNBC). We discuss why, despite breakthrough progress in immunotherapy, attaining predictable outcomes in the clinic remains a mostly unsolved problem for these tumors. Although these two cancer types appear different based upon their tissues of origin and molecular classification, gene expression data indicate that they possess many similarities. Patient tumors exhibit activation of EMT, and the resulting stem cell properties in both cancer types are associated with metastasis and resistance to existing cancer therapies. In addition, EMT in both these cancers plays a crucial role in immunosuppression, which exacerbates treatment resistance. To improve cancer-related survival, we need to understand and circumvent the mechanisms through which these tumors become therapy resistant. In this review, we discuss new information and complementary perspectives to inform combination treatment strategies to expand and improve the anti-tumor responses of currently available clinical immune checkpoint inhibitors.