skip to main content


Title: SPhyR: tumor phylogeny estimation from single-cell sequencing data under loss and error
Abstract Motivation

Cancer is characterized by intra-tumor heterogeneity, the presence of distinct cell populations with distinct complements of somatic mutations, which include single-nucleotide variants (SNVs) and copy-number aberrations (CNAs). Single-cell sequencing technology enables one to study these cell populations at single-cell resolution. Phylogeny estimation algorithms that employ appropriate evolutionary models are key to understanding the evolutionary mechanisms behind intra-tumor heterogeneity.

Results

We introduce Single-cell Phylogeny Reconstruction (SPhyR), a method for tumor phylogeny estimation from single-cell sequencing data. In light of frequent loss of SNVs due to CNAs in cancer, SPhyR employs the k-Dollo evolutionary model, where a mutation can only be gained once but lost k times. Underlying SPhyR is a novel combinatorial characterization of solutions as constrained integer matrix completions, based on a connection to the cladistic multi-state perfect phylogeny problem. SPhyR outperforms existing methods on simulated data and on a metastatic colorectal cancer.

Availability and implementation

SPhyR is available on https://github.com/elkebir-group/SPhyR.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
more » « less
NSF-PAR ID:
10421944
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
34
Issue:
17
ISSN:
1367-4803
Page Range / eLocation ID:
p. i671-i679
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor’s clonal composition. Results To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as a integration problem while accounting for uncertainty in the input SNV and CNA proportions. We thus characterize the computational complexity of this problem and we introduce PACTION (PArsimonious Clone Tree integratION), an algorithm that solves the problem using a mixed integer linear programming formulation. On simulated data, we show that tumor clones can be identified reliably, especially when further taking into account the ancestral relationships that can be inferred from the input SNVs and CNAs. On 49 tumor samples from 10 prostate cancer patients, our integration approach provides a higher resolution view of tumor evolution than previous studies. Conclusion PACTION is an accurate and fast method that reconstructs clonal architecture of cancer tumors by integrating SNV and CNA clones inferred using existing methods. 
    more » « less
  2. Abstract Motivation We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. Results In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in both. We also assessed Meltos on two real cancer datasets. We tested Meltos on multiple samples from a liposarcoma tumor and on a multi-sample breast cancer data (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also showed the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates. Availability and implementation Meltos is available at https://github.com/ih-lab/Meltos. Contact imh2003@med.cornell.edu Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. Abstract Motivation

    Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data.

    Results

    Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci. Through benchmarking on simulated datasets under different settings, we show that, Phylovar outperforms SCIΦ in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer data consisting of 32 cells and 3375 loci as well as an scWGS data of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer data, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by bulk sequencing dataset and for the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects some of which were associated with neurodegenerative diseases.

    Availability and implementation

    Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.

     
    more » « less
  4. Przytycka, Teresa M. (Ed.)

    Emerging ultra-low coverage single-cell DNA sequencing (scDNA-seq) technologies have enabled high resolution evolutionary studies of copy number aberrations (CNAs) within tumors. While these sequencing technologies are well suited for identifying CNAs due to the uniformity of sequencing coverage, the sparsity of coverage poses challenges for the study of single-nucleotide variants (SNVs). In order to maximize the utility of increasingly available ultra-low coverage scDNA-seq data and obtain a comprehensive understanding of tumor evolution, it is important to also analyze the evolution of SNVs from the same set of tumor cells. We presentPhertilizer, a method to infer a clonal tree from ultra-low coverage scDNA-seq data of a tumor. Based on a probabilistic model, our method recursively partitions the data by identifying key evolutionary events in the history of the tumor. We demonstrate the performance ofPhertilizeron simulated data as well as on two real datasets, finding thatPhertilizereffectively utilizes the copy-number signal inherent in the data to more accurately uncover clonal structure and genotypes compared to previous methods.

     
    more » « less
  5. null (Ed.)
    Abstract Background Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, the reconstruction of the phylogeny representing the evolutionary history presents some peculiar aspects that depend on the technology used to obtain the data to analyze: Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there has been some recent evidence of back mutations in cancer: this phenomenon is currently widely ignored. Results We present a new tool, , that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell () tool is open source and available at https://github.com/AlgoLab/gpps . Conclusions provides new insights to the analysis of intra-tumor heterogeneity by proposing a new progression model to the field of cancer phylogeny reconstruction on Single Cell data. 
    more » « less