Title: ClonArch: visualizing the spatial clonal architecture of tumors
Abstract Motivation Cancer is caused by the accumulation of somatic mutations that lead to the formation of distinct populations of cells, called clones. The resulting clonal architecture is the main cause of relapse and resistance to treatment. With decreasing costs in DNA sequencing technology, rich cancer genomics datasets with many spatial sequencing samples are becoming increasingly available, enabling the inference of high-resolution tumor clones and prevalences across different spatial coordinates. While temporal and phylogenetic aspects of tumor evolution, such as clonal evolution over time and clonal response to treatment, are commonly visualized in various clonal evolution diagrams, visual analytics methods that reveal the spatial clonal architecture are missing. Results This article introduces ClonArch, a web-based tool to interactively visualize the phylogenetic tree and spatial distribution of clones in a single tumor mass. ClonArch uses the marching squares algorithm to draw closed boundaries representing the presence of clones in a real or simulated tumor. ClonArch enables researchers to examine the spatial clonal architecture of a subset of relevant mutations at different prevalence thresholds and across multiple phylogenetic trees. In addition to simulated tumors with varying number of biopsies, we demonstrate the use of ClonArch on a hepatocellular carcinoma tumor with ∼280 sequencing biopsies. ClonArch provides an automated way to interactively examine the spatial clonal architecture of a tumor, facilitating clinical and biological interpretations of the spatial aspects of intra-tumor heterogeneity. Availability and implementation https://github.com/elkebir-group/ClonArch.
Award ID(s):
1850502
NSF-PAR ID:
10289252
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Bioinformatics
Volume:
36
Issue:
Supplement_1
ISSN:
1367-4803
Page Range / eLocation ID:
i161 to i168
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background Every tumor is composed of heterogeneous clones, each corresponding to a distinct subpopulation of cells that accumulated different types of somatic mutations, ranging from single-nucleotide variants (SNVs) to copy-number aberrations (CNAs). As the analysis of this intra-tumor heterogeneity has important clinical applications, several computational methods have been introduced to identify clones from DNA sequencing data. However, due to technological and methodological limitations, current analyses are restricted to identifying tumor clones only based on either SNVs or CNAs, preventing a comprehensive characterization of a tumor’s clonal composition. Results To overcome these challenges, we formulate the identification of clones in terms of both SNVs and CNAs as an integration problem while accounting for uncertainty in the input SNV and CNA proportions. We thus characterize the computational complexity of this problem and we introduce PACTION (PArsimonious Clone Tree integratION), an algorithm that solves the problem using a mixed integer linear programming formulation. On simulated data, we show that tumor clones can be identified reliably, especially when further taking into account the ancestral relationships that can be inferred from the input SNVs and CNAs. On 49 tumor samples from 10 prostate cancer patients, our integration approach provides a higher resolution view of tumor evolution than previous studies. Conclusion PACTION is an accurate and fast method that reconstructs clonal architecture of cancer tumors by integrating SNV and CNA clones inferred using existing methods.
  2. Cancer results from an evolutionary process that typically yields multiple clones with varying sets of mutations within the same tumor. Accurately modeling this process is key to understanding and predicting cancer evolution. Here, we introduce clone to mutation (CloMu), a flexible and low-parameter tree generative model of cancer evolution. CloMu uses a two-layer neural network trained via reinforcement learning to determine the probability of new mutations based on the existing mutations on a clone. CloMu supports several prediction tasks, including the determination of evolutionary trajectories, tree selection, causality and interchangeability between mutations, and mutation fitness. Importantly, previous methods support only some of these tasks, and many suffer from overfitting on data sets with a large number of mutations. Using simulations, we show that CloMu either matches or outperforms current methods on a wide variety of prediction tasks. In particular, for simulated data with interchangeable mutations, current methods are unable to uncover causal relationships as effectively as CloMu. On breast cancer and leukemia cohorts, we show that CloMu determines similarities and causal relationships between mutations as well as the fitness of mutations. We validate CloMu's inferred mutation fitness values for the leukemia cohort by comparing them to clonal proportion data not used during training, showing high concordance. In summary, CloMu's low-parameter model facilitates a wide range of prediction tasks regarding cancer evolution on increasingly available cohort-level data sets.

     
  3. Abstract Motivation Clinical sequencing aims to identify somatic mutations in cancer cells for accurate diagnosis and treatment. However, most widely used clinical assays lack patient-matched control DNA and additional analysis is needed to distinguish somatic and unfiltered germline variants. Such computational analyses require accurate assessment of tumor cell content in individual specimens. Histological estimates often do not agree with results from computational methods that are primarily designed for normal-tumor matched data and can be confounded by genomic heterogeneity and presence of sub-clonal mutations. Methods All-FIT is an iterative weighted least squares method to estimate specimen tumor purity based on the allele frequencies of variants detected in high-depth, targeted, clinical sequencing data. Results Using simulated and clinical data, we demonstrate All-FIT’s accuracy and improved performance against leading computational approaches, highlighting the importance of interpreting purity estimates based on expected biology of tumors. Availability and Implementation Freely available at http://software.khiabanian-lab.org. Supplementary information Supplementary data are available at Bioinformatics online.
  4. Inspired by recent efforts to model cancer evolution with phylogenetic trees, we consider the problem of finding a consensus tumor evolution tree from a set of conflicting input trees. In contrast to traditional phylogenetic trees, the tumor trees we consider contain features such as mutation labels on internal vertices (in addition to the leaves) and allow multiple mutations to label a single vertex. We describe several distance measures between these tumor trees and present an algorithm to solve the consensus problem called GraPhyC. Our approach uses a weighted directed graph where vertices are sets of mutations and edges are weighted using a function that depends on the number of times a parental relationship is observed between their constituent mutations in the set of input trees. We find a minimum weight spanning arborescence in this graph and prove that the resulting tree minimizes the total distance to all input trees for one of our presented distance measures. We evaluate our GraPhyC method using both simulated and real data. On simulated data we show that our method outperforms a baseline method at finding an appropriate representative tree. Using a set of tumor trees derived from both whole-genome and deep sequencing data from a Chronic Lymphocytic Leukemia patient we find that our approach identifies a tree not included in the set of input trees, but that contains characteristics supported by other reported evolutionary reconstructions of this tumor. 
  5. Abstract Background and Aims

    Sphagnum (peatmoss) comprises a moss (Bryophyta) clade with ~300–500 species. The genus has unparalleled ecological importance because Sphagnum-dominated peatlands store almost a third of the terrestrial carbon pool and peatmosses engineer the formation and microtopography of peatlands. Genomic resources for Sphagnum are being actively expanded, but many aspects of their biology are still poorly known. Among these are the degree to which Sphagnum species reproduce asexually, and the relative frequencies of male and female gametophytes in these haploid-dominant plants. We assess clonality and gametophyte sex ratios and test hypotheses about the local-scale distribution of clones and sexes in four North American species of the S. magellanicum complex. These four species are difficult to distinguish morphologically and are very closely related. We also assess microbial communities associated with Sphagnum host plant clones and sexes at two sites.

    Methods

    Four hundred and five samples of the four species, representing 57 populations, were subjected to restriction site-associated DNA sequencing (RADseq). Analyses of population structure and clonality based on the molecular data utilized both phylogenetic and phenetic approaches. Multi-locus genotypes (genets) were identified using the RADseq data. Sexes of sampled ramets were determined using a molecular approach that utilized coverage of loci on the sex chromosomes after the method was validated using a sample of plants that expressed sex phenotypically. Sex ratios were estimated for each species and for populations within species. Difference in fitness between genets was estimated as the numbers of ramets each genet comprised. Degrees of clonality [numbers of genets/numbers of ramets (samples)] within species, among sites, and between gametophyte sexes were estimated. Sphagnum-associated microbial communities were assessed at two sites in relation to Sphagnum clonality and sex.

    Key Results

    All four species appear to engage in a mixture of sexual and asexual (clonal) reproduction. A single ramet represents most genets but two to eight ramets were detected for some genets. Only one genet is represented by ramets in multiple populations; all other genets are restricted to a single population. Within populations ramets of individual genets are spatially clustered, suggesting limited dispersal even within peatlands. Sex ratios are male-biased in S. diabolicum but female-biased in the other three species, although significantly so only in S. divinum. Neither species nor males/females differ in levels of clonal propagation. At St Regis Lake (NY) and Franklin Bog (VT), microbial community composition is strongly differentiated between the sites, but differences between species, genets and sexes were not detected. Within S. divinum, however, female gametophytes harboured two to three times the number of microbial taxa as males.

    Conclusions

    These four Sphagnum species all exhibit similar reproductive patterns that result from a mixture of sexual and asexual reproduction. The spatial patterns of clonally replicated ramets of genets suggest that these species fall between the so-called phalanx patterns, where genets abut one another but do not extensively mix because of limited ramet fragmentation, and the guerrilla patterns, where extensive genet fragmentation and dispersal result in greater mixing of different genets. Although sex ratios in bryophytes are most often female-biased, both male and female biases occur in this complex of closely related species. The association of far greater microbial diversity with female gametophytes in S. divinum, which has a female-biased sex ratio, suggests that additional research is needed to determine whether levels of microbial diversity are consistently correlated with differing patterns of sex ratio biases.

     