Title: Data Availability of Open T-Cell Receptor Repertoire Data, a Systematic Assessment
Modern data-driven research has the power to promote novel biomedical discoveries through secondary analyses of raw data. It is therefore important to ensure that data-driven research is reproducible and robust, so that secondary analyses of immunogenomics data are precise and accurate. Scientific research requires rigor in designing and conducting experiments, and in writing up and reporting results. It is also crucial to make raw data available, discoverable, and well described or annotated in order to promote future re-analysis of the data. To assess the data availability of published T cell receptor (TCR) repertoire data, we examined 11,918 TCR-Seq samples corresponding to 134 TCR-Seq studies ranging from 2006 to 2022. Among the 134 studies, only 38.1% had publicly available raw TCR-Seq data shared in public repositories. We also found a statistically significant association between the presence of data availability statements and the increase in raw data availability (p = 0.014). Yet, 46.8% of studies with data availability statements failed to share the raw TCR-Seq data. There is a pressing need for the biomedical community to increase awareness of the importance of raw data availability in scientific research and to take immediate action to improve it, enabling cost-effective secondary analysis of existing immunogenomics data by the larger scientific community.
Award ID(s):
2135954 2041984
PAR ID:
10362744
Journal Name:
Frontiers in Systems Biology
Volume:
2
ISSN:
2674-0702
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. T cell receptor (TCR) studies have grown substantially with advances in T cell receptor repertoire sequencing (TCR-Seq). Analyzing TCR-Seq data requires computational skills to run TCR repertoire analysis tools. However, biomedical researchers with limited computational backgrounds face numerous obstacles to properly and efficiently utilizing bioinformatics tools for analyzing TCR-Seq data. Here we report pyTCR, a computational notebook-based solution for comprehensive and scalable TCR-Seq data analysis. Computational notebooks, which combine code, calculations, and visualization, provide users with a high level of flexibility and transparency for the analysis. Additionally, computational notebooks have been shown to be user-friendly and suitable for researchers with limited computational skills. Our tool has a rich set of functionalities including various TCR metrics, statistical analysis, and customizable visualizations. The application of pyTCR to large and diverse TCR-Seq datasets will enable the effective analysis of large-scale TCR-Seq data with flexibility, and eventually facilitate new discoveries.
  2. RNA sequencing (RNA-seq) has become an exemplary technology in modern biology and clinical science. Its immense popularity is due in large part to the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools to analyze the enormous amounts of transcriptomic data that it produces. RNA-seq analysis enables genes and their corresponding transcripts to be probed for a variety of purposes, such as detecting novel exons or whole transcripts, assessing expression of genes and alternative transcripts, and studying alternative splicing structure. It can be a challenge, however, to obtain meaningful biological signals from raw RNA-seq data because of the enormous scale of the data as well as the inherent limitations of different sequencing technologies, such as amplification bias or biases of library preparation. The need to overcome these technical challenges has pushed the rapid development of novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. These tools, combined with the diverse computational skill sets of biomedical researchers, help to unlock the full potential of RNA-seq. The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and define discipline-specific jargon.
  3. RNA sequencing (RNA-seq) has emerged as a prominent resource for transcriptomic analysis due to its ability to measure gene expression in a highly sensitive and accurate manner. With the increasing availability of RNA-seq data analysis from clinical studies and patient samples, the development of effective visualization tools for RNA-seq analysis has become increasingly important to help clinicians and biomedical researchers better understand the complex patterns of gene expression associated with health and disease. This review aims to outline the current state-of-the-art data visualization techniques and tools commonly used to frame clinical inferences from RNA-seq data and point out their benefits, applications, and limitations. A systematic review of English articles using PubMed, Scopus, Web of Science, and IEEE Xplore databases was performed. Search terms included “RNA-seq”, “visualization”, “plots”, and “clinical”. Only full-text studies reported between 2017 and 2024 were included for analysis. Following PRISMA guidelines, a total of 126 studies were identified, of which 33 studies met the inclusion criteria. We found that 18% of studies have visualization techniques and tools for circular RNA-seq data, 56% for single-cell RNA-seq data, 23% for bulk RNA-seq data, and 3% for long non-coding RNA-seq data. Overall, this review provides a comprehensive overview of the common visualization tools and their potential applications, which is a useful resource for researchers and clinicians interested in using RNA-seq data for various clinical purposes (e.g., diagnosis or prognosis). 
  4. Biomedical research results in the collection and storage of increasingly large and complex data sets. Preserving those data so that they are discoverable, accessible, and interpretable accelerates scientific discovery and improves health outcomes, but requires that researchers, data curators, and data archivists consider the long-term disposition of data and the costs of preserving, archiving, and promoting access to them. Life Cycle Decisions for Biomedical Data examines and assesses approaches and considerations for forecasting costs for preserving, archiving, and promoting access to biomedical research data. This report provides a comprehensive conceptual framework for cost-effective decision making that encourages data accessibility and reuse for researchers, data managers, data archivists, data scientists, and institutions that support platforms that enable biomedical research data preservation, discoverability, and use.
  5. In this paper, we introduce a creative pipeline to incorporate physiological and behavioral data from contemporary marine mammal research into data-driven animations, leveraging functionality from industry tools and custom scripts to promote scientific insights, public awareness, and conservation outcomes. Our framework can flexibly transform data describing animals’ orientation, position, heart rate, and swimming stroke rate to control the position, rotation, and behavior of 3D models, to render animations, and to drive data sonification. Additionally, we explore the challenges of unifying disparate datasets gathered by an interdisciplinary team of researchers, and outline our design process for creating meaningful data visualization tools and animations. As part of our pipeline, we clean and process raw acceleration and electrophysiological signals to expedite complex multi-stream data analysis and the identification of critical foraging and escape behaviors. We provide details about four animation projects illustrating marine mammal datasets. These animations, commissioned by scientists to achieve outreach and conservation outcomes, have successfully increased the reach and engagement of the scientific projects they describe. These impactful visualizations help scientists identify behavioral responses to disturbance, increase public awareness of human-caused disturbance, and help build momentum for targeted conservation efforts backed by scientific evidence. 