skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: MoTERNN: Classifying the Mode of Cancer Evolution Using Recursive Neural Networks
With the advent of single-cell DNA sequencing, it is now possible to infer the evolutionary history of thousands of tumor cells obtained from a single patient. This evolutionary history, which takes the shape of a tree, reveals the mode of evolution of the specific cancer under study and, in turn, helps with clinical diagnosis, prognosis, and therapeutic treatment. In this study we focus on the question of determining the mode of evolution of tumor cells from their inferred evolutionary history. In particular, we employ recursive neural networks that capture tree structures to classify the evolutionary history of tumor cells into one of four modes—linear, branching, neutral, and punctuated. We trained our model, MoTERNN, using simulated data in a supervised fashion and applied it to a real phylogenetic tree obtained from single-cell DNA sequencing data. MoTERNN is implemented in Python and is publicly available at https://github.com/NakhlehLab/MoTERNN.  more » « less
Award ID(s):
2106837
PAR ID:
10466073
Author(s) / Creator(s):
Date Published:
Journal Name:
RECOMB Comparative Genomics
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Przytycka, Teresa M. (Ed.)
    Emerging ultra-low coverage single-cell DNA sequencing (scDNA-seq) technologies have enabled high resolution evolutionary studies of copy number aberrations (CNAs) within tumors. While these sequencing technologies are well suited for identifying CNAs due to the uniformity of sequencing coverage, the sparsity of coverage poses challenges for the study of single-nucleotide variants (SNVs). In order to maximize the utility of increasingly available ultra-low coverage scDNA-seq data and obtain a comprehensive understanding of tumor evolution, it is important to also analyze the evolution of SNVs from the same set of tumor cells. We presentPhertilizer, a method to infer a clonal tree from ultra-low coverage scDNA-seq data of a tumor. Based on a probabilistic model, our method recursively partitions the data by identifying key evolutionary events in the history of the tumor. We demonstrate the performance ofPhertilizeron simulated data as well as on two real datasets, finding thatPhertilizereffectively utilizes the copy-number signal inherent in the data to more accurately uncover clonal structure and genotypes compared to previous methods. 
    more » « less
  2. In a multicellular organism, cell lineages share a common evolutionary history. Knowing this history can facilitate the study of development, aging, and cancer. Cell lineage trees represent the evolutionary history of cells sampled from an organism. Recent developments in single-cell sequencing have greatly facilitated the inference of cell lineage trees. However, single-cell data are sparse and noisy, and the size of single-cell data is increasing rapidly. Accurate inference of cell lineage tree from large single-cell data is computationally challenging. In this paper, we present ScisTree2, a fast and accurate cell lineage tree inference and genotype calling approach based on the infinite-sites model. ScisTree2 relies on an efficient local search approach to find optimal trees. ScisTree2 also calls single-cell genotypes based on the inferred cell lineage tree. Experiments on simulated and real biological data show that ScisTree2 achieves better overall accuracy while being significantly more efficient than existing methods. To the best of our knowledge, ScisTree2 is the first model-based cell lineage tree inference and genotype calling approach that is capable of handling datasets from tens of thousands of cells or more. 
    more » « less
  3. null (Ed.)
    Abstract Background Cancer progression reconstruction is an important development stemming from the phylogenetics field. In this context, the reconstruction of the phylogeny representing the evolutionary history presents some peculiar aspects that depend on the technology used to obtain the data to analyze: Single Cell DNA Sequencing data have great specificity, but are affected by moderate false negative and missing value rates. Moreover, there has been some recent evidence of back mutations in cancer: this phenomenon is currently widely ignored. Results We present a new tool, , that reconstructs a tumor phylogeny from Single Cell Sequencing data, allowing each mutation to be lost at most a fixed number of times. The General Parsimony Phylogeny from Single cell () tool is open source and available at https://github.com/AlgoLab/gpps . Conclusions provides new insights to the analysis of intra-tumor heterogeneity by proposing a new progression model to the field of cancer phylogeny reconstruction on Single Cell data. 
    more » « less
  4. A number of methods have recently been proposed to reconstruct the evolutionary history of a tumor from noisy DNA sequencing data. We investigate when and how well these histories can be reconstructed from multi-sample bulk sequencing data when considering only single nucleotide variants (SNVs). We formalize this as the Enumeration Variant Allele Frequency Factorization Problem and provide a novel proof for an upper bound on the number of possible phylogenies consistent with a given dataset. In addition, we propose and assess two methods for increasing the robustness and performance of an existing graph based phylogenetic inference method. We apply our approaches to noisy simulated data and find that low coverage and high noise make it more difficult to identify phylogenies. We also apply our methods to both chronic lymphocytic leukemia and clear cell renal cell carcinoma datasets. 
    more » « less
  5. null (Ed.)
    Abstract Background Cancer arises from an evolutionary process where somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched and understanding what cancer types and which patients have each of these trajectories could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to limitations with current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem as a means of testing two alternative hypotheses for the pattern of evolution, which we prove to be NP-hard. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and real data application, we demonstrate the performance of our method, outperforming a competing machine learning approach. Conclusion Phyolin is an accurate, easy to use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data. 
    more » « less