Title: CNAsim: improved simulation of single-cell copy number profiles and DNA-seq data from tumors
Abstract Summary

CNAsim is a software package for improved simulation of single-cell copy number alteration (CNA) data from tumors. CNAsim can be used to efficiently generate single-cell copy number profiles for thousands of simulated tumor cells under a more realistic error model and a broader range of possible CNA mechanisms compared with existing simulators. The error model implemented in CNAsim accounts for the specific biases of single-cell sequencing that lead to read count fluctuation and poor resolution of CNA detection. For improved realism over existing simulators, CNAsim can (i) generate whole-genome duplications (WGD), whole-chromosomal CNAs, and chromosome-arm CNAs, (ii) simulate subclonal population structure defined by the accumulation of chromosomal CNAs, and (iii) dilute the sampled cell population with both normal diploid cells and pseudo-diploid cells. The software can also generate DNA-seq data for sampled cells.

Availability and implementation

CNAsim is written in Python and is freely available as open-source software at https://github.com/samsonweiner/CNAsim.
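
The mechanisms named in the summary (WGD, whole-chromosome and chromosome-arm CNAs, read count fluctuation, and dilution with normal cells) can be illustrated with a toy sketch. This is not CNAsim's code or API: every function name, parameter, and default below is hypothetical, the bin layout and the "first half of the bins is the p arm" rule are simplifying assumptions, and the Gaussian noise is a stand-in for the tool's actual error model.

```python
import random

def diploid_profile(n_chroms=3, bins_per_chrom=10):
    """Return a normal diploid profile: chromosome index -> per-bin copy numbers."""
    return {c: [2] * bins_per_chrom for c in range(n_chroms)}

def apply_wgd(profile):
    """Whole-genome duplication: double the copy number of every bin."""
    return {c: [2 * cn for cn in bins] for c, bins in profile.items()}

def apply_chrom_cna(profile, chrom, delta):
    """Whole-chromosome gain (delta > 0) or loss (delta < 0), floored at 0 copies."""
    out = {c: list(bins) for c, bins in profile.items()}
    out[chrom] = [max(0, cn + delta) for cn in out[chrom]]
    return out

def apply_arm_cna(profile, chrom, arm, delta):
    """Chromosome-arm gain or loss. Assumption: the 'p' arm is the first half
    of the bins and the 'q' arm is the second half."""
    out = {c: list(bins) for c, bins in profile.items()}
    mid = len(out[chrom]) // 2
    idx = range(0, mid) if arm == "p" else range(mid, len(out[chrom]))
    for i in idx:
        out[chrom][i] = max(0, out[chrom][i] + delta)
    return out

def noisy_read_counts(profile, reads_per_copy=50, noise_sd=0.2, rng=None):
    """Toy error model: reads per bin are proportional to copy number, scaled
    by Gaussian multiplicative noise and clipped at zero."""
    rng = rng or random.Random()
    return {
        c: [max(0, round(cn * reads_per_copy * rng.gauss(1.0, noise_sd))) for cn in bins]
        for c, bins in profile.items()
    }

def sample_population(tumor_profile, n_cells, normal_fraction, rng=None):
    """Dilute a single tumor clone with normal diploid cells."""
    rng = rng or random.Random()
    normal = {c: [2] * len(bins) for c, bins in tumor_profile.items()}
    return [normal if rng.random() < normal_fraction else tumor_profile
            for _ in range(n_cells)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
tumor = apply_wgd(diploid_profile())                       # every bin: 2 -> 4
tumor = apply_chrom_cna(tumor, chrom=1, delta=-1)          # chromosome 1: 4 -> 3
tumor = apply_arm_cna(tumor, chrom=2, arm="q", delta=2)    # q arm of chromosome 2: 4 -> 6
cells = sample_population(tumor, n_cells=100, normal_fraction=0.3, rng=rng)
counts = [noisy_read_counts(cell, rng=rng) for cell in cells]
```

The sketch models a single clone only; subclonal structure and pseudo-diploid cells, which the summary also lists, would require evolving profiles along a cell lineage tree and are left out here.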

 
Award ID(s):
2212511
NSF-PAR ID:
10434086
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
39
Issue:
7
ISSN:
1367-4811
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Przytycka, Teresa M. (Ed.)
    Copy-number aberrations (CNAs) are genetic alterations that amplify or delete the number of copies of large genomic segments. Although they are ubiquitous in cancer and, thus, a critical area of current cancer research, CNA identification from DNA sequencing data is challenging because it requires partitioning of the genome into complex segments with the same copy-number states that may not be contiguous. Existing segmentation algorithms address these challenges either by leveraging the local information among neighboring genomic regions, or by globally grouping genomic regions that are affected by similar CNAs across the entire genome. However, both approaches have limitations: overclustering in the case of local segmentation, or the omission of clusters corresponding to focal CNAs in the case of global segmentation. Importantly, inaccurate segmentation will lead to inaccurate identification of CNAs. For this reason, most pan-cancer research studies rely on manual procedures of quality control and anomaly correction. To improve copy-number segmentation, we introduce CNAViz, a web-based tool that enables the user to simultaneously perform local and global segmentation, thus overcoming the limitations of each approach. Using simulated data, we demonstrate that by several metrics, CNAViz allows the user to obtain more accurate segmentation relative to existing local and global segmentation methods. Moreover, we analyze six bulk DNA sequencing samples from three breast cancer patients. By validating with parallel single-cell DNA sequencing data from the same samples, we show that by using CNAViz, our user was able to obtain more accurate segmentation and improved accuracy in downstream copy-number calling.
  2. Abstract Background

    Alterations in genes encoding chromatin regulatory proteins are prevalent in cancers and may confer oncogenic properties and molecular changes linked to therapy resistance. However, the impact of copy number alterations (CNAs) of the SWItch/Sucrose NonFermentable (SWI/SNF) complex on the oncogenic and immunologic properties has not been systematically explored across human cancer types.

    Methods

    We comprehensively analyzed the genomic, transcriptomic and clinical data of The Cancer Genome Atlas (TCGA) dataset across 33 solid cancers.

    Results

    CNAs of the SWI/SNF components were identified in more than 25% of all queried cancers, and tumors harboring SWI/SNF CNAs demonstrated a worse overall survival (OS) than others in several cancer types. Mechanistically, the SCNA events in the SWI/SNF complex are correlated with dysregulated genomic features and oncogenic pathways, including the cell cycle, DNA damage and repair. Notably, the SWI/SNF CNAs were associated with homologous recombination deficiency (HRD) and improved clinical outcomes of platinum-treated ovarian cancer. Furthermore, we observed distinct immune infiltrating patterns and immunophenotypes associated with SWI/SNF CNAs in different cancer types.

    Conclusion

    The CNA events of the SWI/SNF components are a key process linked to oncogenesis, immune infiltration and therapeutic responsiveness across human cancers.

     
  3. Abstract Background

    Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor’s clonal composition.

    Results

    To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as an integration problem while accounting for uncertainty in the input SNV and CNA proportions. We thus characterize the computational complexity of this problem and we introduce PACTION (PArsimonious Clone Tree integratION), an algorithm that solves the problem using a mixed integer linear programming formulation. On simulated data, we show that tumor clones can be identified reliably, especially when further taking into account the ancestral relationships that can be inferred from the input SNVs and CNAs. On 49 tumor samples from 10 prostate cancer patients, our integration approach provides a higher resolution view of tumor evolution than previous studies.

    Conclusion

    PACTION is an accurate and fast method that reconstructs clonal architecture of cancer tumors by integrating SNV and CNA clones inferred using existing methods.
  4. Abstract Motivation

    Cancer is characterized by intra-tumor heterogeneity, the presence of distinct cell populations with distinct complements of somatic mutations, which include single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). Single-cell sequencing technology enables one to study these cell populations at single-cell resolution. Phylogeny estimation algorithms that employ appropriate evolutionary models are key to understanding the evolutionary mechanisms behind intra-tumor heterogeneity.

    Results

    We introduce Single-cell Phylogeny Reconstruction (SPhyR), a method for tumor phylogeny estimation from single-cell sequencing data. In light of frequent loss of SNVs due to CNAs in cancer, SPhyR employs the k-Dollo evolutionary model, where a mutation can be gained only once but lost up to k times. Underlying SPhyR is a novel combinatorial characterization of solutions as constrained integer matrix completions, based on a connection to the cladistic multi-state perfect phylogeny problem. SPhyR outperforms existing methods on simulated data and on a metastatic colorectal cancer.

    Availability and implementation

    SPhyR is available on https://github.com/elkebir-group/SPhyR.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Abstract Summary

    We report on a new single-cell DNA sequence simulator, SimSCSnTree, which generates an evolutionary tree of cells and evolves single nucleotide variants (SNVs) and copy number aberrations (CNAs) along its branches. Data generated by the simulator can be used to benchmark tools for single-cell genomic analyses, particularly in cancer where SNVs and CNAs are ubiquitous.

    Availability and implementation

    SimSCSnTree is available on BioConda and is also freely available for download, with detailed documentation, at https://github.com/compbiofan/SimSCSnTree.git.

     