Title: A fast exact functional test for directional association and cancer biology applications
Directional association measured by functional dependency can answer important questions about relationships between variables, for example, in the discovery of molecular interactions in biological systems. However, when one has no prior information about the functional form of a directional association, there is no widely established statistical procedure to detect such an association. To address this issue, we introduce an exact functional test for directional association that examines the strength of functional dependency. It is effective in promoting functional patterns by reducing statistical power on dependent non-functional patterns. We designed an algorithm to carry out the test using a fast branch-and-bound strategy, which achieved a substantial speedup over brute-force enumeration. On data from an epidemiological study of liver cancer, the test identified the hepatitis status of a subject as the most influential risk factor for the cancer phenotype. On human lung cancer transcriptome data, the test selected 1068 transcription start sites of putative noncoding RNAs that are directionally associated with the presence or absence of lung cancer more strongly than 95 percent of the transcription start sites of 694 curated cancer genes. These predictions include non-monotonic interaction patterns, to which other routine tests were insensitive. Complementing symmetric (non-directional) association methods such as Fisher’s exact test, the exact functional test is a unique exact statistical test for evaluating evidence for causal relationships.
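As a minimal illustration of why functional dependency is directional, the sketch below checks the noise-free case on a contingency table. It is not the exact functional test described above, which assigns statistical significance to noisy patterns; the function names and the counts are hypothetical and chosen only to show a non-monotonic pattern that is a function in one direction but not the other.

```python
from typing import List, Sequence


def is_function_of(table: Sequence[Sequence[int]]) -> bool:
    """Return True if the column variable is a function of the row variable.

    table[i][j] counts samples with row level i and column level j.
    In the noise-free case the dependency is functional when every row
    has at most one nonzero cell, so each observed row level maps to a
    single column level.
    """
    return all(sum(1 for count in row if count > 0) <= 1 for row in table)


def transpose(table: Sequence[Sequence[int]]) -> List[List[int]]:
    """Swap the roles of the row and column variables."""
    return [list(column) for column in zip(*table)]


# Hypothetical counts: rows are X in {low, mid, high}, columns are Y in {0, 1}.
# Y equals 1 only at the middle level of X, which is a non-monotonic pattern.
x_by_y = [
    [10, 0],
    [0, 10],
    [10, 0],
]

# Y is determined by X, but X is not determined by Y (Y = 0 leaves X ambiguous).
y_is_function_of_x = is_function_of(x_by_y)
x_is_function_of_y = is_function_of(transpose(x_by_y))
```

A symmetric association measure would score this table identically in both directions, whereas the functional view separates X to Y from Y to X.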
Award ID(s):
1661331
NSF-PAR ID:
10055967
Journal Name:
IEEE/ACM Transactions on Computational Biology and Bioinformatics
ISSN:
1545-5963
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. RNA-binding proteins (RBPs) participate in all stages of the RNA life cycle, from transcription and splicing to translation. Under the ENCODE project, a large number of RBPs were knocked down in human cancer cell lines, offering an excellent opportunity to infer the targets of RBPs. Taking both RBP binding sites and RNA-seq profiles of RBP knockdown samples as input, we present a pipeline to identify causal RBP-RNA interactions. The pipeline employs a recent functional chi-square test (FunChisq) that deciphers directional association, and uses a novel functional index that measures the effect size of functional dependency. We examined ∼45 million RBP-RNA pairs in leukemia (K562) and liver cancer (HepG2) cell lines for functional patterns as causal interaction candidates. Here, we report a total of 936,707 RBP-RNA pairs in the two cell lines that show statistically significant linear or nonlinear functional patterns. About 31% of these pairs have supporting biological evidence from other sources, suggesting the effectiveness of the pipeline. The interactions constitute RBP-specific regulatory networks that may represent core mechanisms in the two cancers. The pipeline is implemented through an R interface with pre-computed results and data libraries for users to query specific networks and visualize RBP-RNA interactions. Such networks serve as a useful resource for studying RNA dysregulation in cancer.
  2. Abstract
    Background: DNA methylation is an epigenetic event involving the addition of a methyl-group to a cytosine-guanine base pair (i.e., CpG site). It is associated with different cancers. Our research focuses on studying non-small cell lung cancer hemimethylation, which refers to methylation occurring on only one of the two DNA strands. Many studies often assume that methylation occurs on both DNA strands at a CpG site. However, recent publications show the existence of hemimethylation and its significant impact. Therefore, it is important to identify cancer hemimethylation patterns.
    Methods: In this paper, we use the Wilcoxon signed rank test to identify hemimethylated CpG sites based on publicly available non-small cell lung cancer methylation sequencing data. We then identify two types of hemimethylated CpG clusters, regular and polarity clusters, and genes with large numbers of hemimethylated sites. Highly hemimethylated genes are then studied for their biological interactions using available bioinformatics tools.
    Results: In this paper, we have conducted the first-ever investigation of hemimethylation in lung cancer. Our results show that hemimethylation does exist in lung cells either as singletons or clusters. Most clusters contain only two or three CpG sites. Polarity clusters are much shorter than regular clusters and appear less frequently. The majority of clusters found in tumor samples have no overlap with clusters found in normal samples, and vice versa. Several genes that are known to be associated with cancer are hemimethylated differently between the cancerous and normal samples. Furthermore, highly hemimethylated genes exhibit many different interactions with other genes that may be associated with cancer. Hemimethylation has diverse patterns and frequencies that are comparable between normal and tumorous cells. Therefore, hemimethylation may be related to both normal and tumor cell development.
    Conclusions: Our research has identified CpG clusters and genes that are hemimethylated in normal and lung tumor samples. Due to the potential impact of hemimethylation on gene expression and cell function, these clusters and genes may be important to advance our understanding of the development and progression of non-small cell lung cancer.
  3. Functional dependency can lead to discoveries of new mechanisms not possible via symmetric association. Most asymmetric methods for causal direction inference are not driven by the function-versus-independence question. A recent exact functional test (EFT) was designed to detect functionally dependent patterns model-free with an exact null distribution. However, the EFT lacked a theoretical justification, had not been compared with other asymmetric methods, and was slow in practice. Here, we prove the functional optimality of the EFT statistic, demonstrate its advantage in functional inference accuracy over five other methods, and develop a branch-and-bound algorithm with dynamic and quadratic programming that runs orders of magnitude faster than its previous implementation. Our results make it practical to answer the exact functional dependency question arising from discovery-driven artificial intelligence applications. Software that implements EFT is freely available in the R package 'FunChisq' (≥2.5.0) at https://cran.r-project.org/package=FunChisq
  4. Abstract

    Transcription initiation is regulated in a highly organized fashion to ensure proper cellular functions. Accurate identification of transcription start sites (TSSs) and quantitative characterization of transcription initiation activities are fundamental steps for studies of regulated transcription and core promoter structures. Several high-throughput techniques have been developed to sequence the very 5′ end of RNA transcripts (TSS sequencing) on the genome scale. Bioinformatics tools are essential for processing, analysis, and visualization of TSS sequencing data. Here, we present TSSr, an R package that provides rich functions for mapping TSSs and characterizing the structures and activities of core promoters based on all types of TSS sequencing data. Specifically, TSSr implements several newly developed algorithms for accurately identifying TSSs from mapped sequencing reads and inferring core promoters, which are prerequisites for subsequent functional analyses of TSS data. Furthermore, TSSr enables users to export various types of TSS data that can be visualized in a genome browser for inspection of promoter activities in association with other genomic features, and to generate publication-ready TSS graphs. These user-friendly features could greatly facilitate studies of transcription initiation based on TSS sequencing data. The source code and detailed documentation of TSSr can be freely accessed at https://github.com/Linlab-slu/TSSr.
  5. Abstract

    Telomerase reverse transcriptase (TERT) activation plays an important role in cancer development by enabling the immortalization of cells. TERT regulation is multifaceted, and its promoter methylation has been implicated in controlling expression through alteration in transcription factor binding. We have characterized TERT promoter methylation, transcription factor binding, and TERT expression levels in five differentiated thyroid cancer (DTC) cell lines and six normal thyroid tissue samples by targeted bisulfite sequencing, ChIP‐qPCR, and qRT‐PCR. DTC cell lines express varying levels of TERT and exhibit TERT promoter methylation patterns similar to patterns seen in other telomerase positive cancer cell lines. The minimal promoter immediately surrounding the transcription start site is hypomethylated, while further upstream portions show dense methylation. In contrast, the TERT promoter in normal thyroid tissue is largely unmethylated throughout and expresses TERT minimally. Transcription factor binding is also affected by TERT mutation status. The E‐twenty‐six (ETS) factor GABPA exhibits TERT binding in the TERT mutant DTC cells only, and allele‐specific methylation patterns at the minimal promoter were observed as well, which may indicate allele‐specific factor recruitment at the minimal promoter. Furthermore, we identified binding sites for activators MYC and GSC in the hypermethylated upstream region, pointing to its possible importance in TERT regulation. Overall, TERT expression and telomerase activity depend on the interplay of multiple regulatory mechanisms including TERT promoter methylation, mutation status, and recruitment of transcription factors. This work explores the interplay between these regulatory mechanisms and offers insight into cellular control of active telomerase in human cancer.