

This content will become publicly available on July 1, 2026

Title: Defining and benchmarking open problems in single-cell analysis
Single-cell genomics has enabled the study of biological processes at an unprecedented scale and resolution. These studies were enabled by innovative data generation technologies coupled with emerging computational tools specialized for single-cell data. As single-cell technologies have become more prevalent, so has the development of new analysis tools, which has resulted in over 1,700 published algorithms1 (as of February 2024). Thus, there is an increasing need to continually evaluate which algorithm performs best in which context to inform best practices2,3 that evolve with the field. In many fields of quantitative science, public competitions and benchmarks address this need by evaluating state-of-the-art methods against known criteria, following the concept of a common task framework4. Here, we present Open Problems, a living, extensive, community-guided platform including 12 current single-cell tasks that we envisage raising standards for the selection, evaluation and development of methods in single-cell analysis.
Award ID(s):
2047856
PAR ID:
10617661
Publisher / Repository:
Springer Nature
Journal Name:
Nature Biotechnology
Volume:
43
Issue:
7
ISSN:
1087-0156
Page Range / eLocation ID:
1035 to 1040
Subject(s) / Keyword(s):
Genomics, Single Cell, Benchmarking
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Progress in sequencing, microfluidics, and analysis strategies has revolutionized the granularity at which multicellular organisms can be studied. In particular, single-cell transcriptomics has led to fundamental new insights into animal biology, such as the discovery of new cell types and cell type-specific disease processes. However, the application of single-cell approaches to plants, fungi, algae, or bacteria (environmental organisms) has been far more limited, largely due to the challenges posed by polysaccharide walls surrounding these species’ cells. In this perspective, we discuss opportunities afforded by single-cell technologies for energy and environmental science and grand challenges that must be tackled to apply these approaches to plants, fungi and algae. We highlight the need to develop better and more comprehensive single-cell technologies, analysis and visualization tools, and tissue preparation methods. We advocate for the creation of a centralized, open-access database to house plant single-cell data. Finally, we consider how such efforts should balance the need for deep characterization of select model species while still capturing the diversity in the plant kingdom. Investments into the development of methods, their application to relevant species, and the creation of resources to support data dissemination will enable groundbreaking insights to propel energy and environmental science forward. 
  2. Abstract Motivation: While single-cell DNA sequencing (scDNA-seq) has enabled the study of intratumor heterogeneity at an unprecedented resolution, current technologies are error-prone and often result in doublets where two or more cells are mistaken for a single cell. Not only do doublets confound downstream analyses, but the increase in doublet rate is also a major bottleneck preventing higher throughput with current single-cell technologies. Although doublet detection and removal are standard practice in scRNA-seq data analysis, options for scDNA-seq data are limited. Current methods attempt to detect doublets while also performing complex downstream analysis tasks, leading to decreased efficiency and/or performance. Results: We present doubletD, the first standalone method for detecting doublets in scDNA-seq data. Underlying our method is a simple maximum likelihood approach with a closed-form solution. We demonstrate the performance of doubletD on simulated data as well as real datasets, outperforming current methods for downstream analysis of scDNA-seq data that jointly infer doublets as well as standalone approaches for doublet detection in scRNA-seq data. Incorporating doubletD in scDNA-seq analysis pipelines will reduce complexity and lead to more accurate results. Availability and implementation: https://github.com/elkebir-group/doubletD. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Methods for detecting and monitoring known and emerging viral pathogens in the environment are imperative for understanding risk and establishing regulatory standards in environmental and public health sectors. Next-generation sequencing (NGS) has uncovered the diversity of entire microbial populations, enabled discovery of novel organisms, and allowed pathogen surveillance. Metagenomics, the sequencing and analysis of all genetic material in a sample, is a detection method that circumvents the need for cell culturing and prior understanding of microbial assemblies, which are necessary in traditional detection methods. Advancements in NGS technologies have led to subsequent advancements in data analysis methodologies and practices to increase specificity and accuracy of metagenomic studies. This paper highlights applications of metagenomics in viral pathogen detection, discusses suggested best practices for detecting the diversity of viruses in environmental systems (specifically water environments), and addresses the limitations of virus detection using NGS methods. Information presented in this paper will assist researchers in selecting an appropriate metagenomics approach for obtaining a comprehensive view of viruses in water systems.
  4. Exciting advances in technologies to measure biological systems are currently at the forefront of research. The ability to gather data along an increasing number of omic dimensions has created a need for tools to analyze all of this information together, rather than siloing each technology into separate analysis pipelines. To advance this goal, we introduce a framework called the Single-Cell Multi-Modal GAN (scMMGAN) that integrates data from multiple modalities into a unified representation in the ambient data space for downstream analysis using a combination of adversarial learning and data geometry techniques. The framework’s key improvement is an additional diffusion geometry loss with a new kernel that constrains the otherwise over-parameterized GAN network. We demonstrate scMMGAN’s ability to produce more meaningful alignments than alternative methods on a wide variety of data modalities, and that its output can be used to draw conclusions from real-world biological experimental data. We highlight data from an experiment studying the development of triple negative breast cancer, where we show how scMMGAN can be used to identify novel gene associations and we demonstrate that cell clusters identified only on the scRNAseq data occur in localized spatial patterns that reveal insights on the spatial transcriptomic images. 
  5. Ion mobility-mass spectrometry (IM-MS) has become a technology deployed across a wide range of structural biology applications despite the challenges in characterizing closely related protein structures. Collision-induced unfolding (CIU) has emerged as a valuable technique for distinguishing closely related, iso-cross-sectional protein and protein complex ions through their distinct unfolding pathways in the gas phase. With the speed and sensitivity of CIU analyses, there has been a rapid growth of CIU-based assays, especially regarding biomolecular targets that remain challenging to assess and characterize with other structural biology tools. With information-rich CIU data, many software tools have been developed to automate laborious data analysis. However, with the recent development of new IM-MS technologies, such as cyclic IM-MS, CIU continues to evolve, necessitating improved data analysis tools to keep pace with new technologies and facilitating the automation of various data processing tasks. Here, we present CIUSuite 3, a software package that contains updated algorithms that support various IM-MS platforms and supports the automation of various data analysis tasks such as peak detection, multidimensional classification, and collision cross section (CCS) calibration. CIUSuite 3 uses local maxima searches along with peak width and prominence filters to detect peaks to automate CIU data extraction. To support both the primary CIU (CIU1) and secondary CIU (CIU2) experiments enabled by cyclic IM-MS, two-dimensional data preprocessing is deployed, which allows multidimensional classification. Our data suggest that additional dimensions in classification improve the overall accuracy of class assignments. CIUSuite 3 also supports CCS calibration for both traveling wave and drift tube IM-MS, and we demonstrate the accuracy of a new single-field CCS calibration method designed for drift tube IM-MS leveraging calibrant CIU data. Overall, CIUSuite 3 is positioned to support current and next-generation IM-MS and CIU assay development deployed in an automated format.