Title: Optimality, Accuracy, and Efficiency of an Exact Functional Test
Functional dependency can lead to discoveries of new mechanisms not possible via symmetric association. Most asymmetric methods for causal direction inference are not driven by the function-versus-independence question. A recent exact functional test (EFT) was designed to detect functionally dependent patterns model-free with an exact null distribution. However, the EFT lacked a theoretical justification, had not been compared with other asymmetric methods, and was slow in practice. Here, we prove the functional optimality of the EFT statistic, demonstrate its advantage in functional inference accuracy over five other methods, and develop a branch-and-bound algorithm with dynamic and quadratic programming that runs orders of magnitude faster than the previous implementation. Our results make it practical to answer the exact functional dependency question arising from discovery-driven artificial intelligence applications. Software that implements EFT is freely available in the R package 'FunChisq' (≥2.5.0) at
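
As an illustration of the function-versus-independence idea, the sketch below computes a functional chi-squared statistic on a contingency table whose rows index X and whose columns index Y. It assumes the commonly cited FunChisq form of the statistic (row-wise deviation from a uniform Y distribution, minus the deviation of the Y marginal from uniform); the abstract itself does not state the formula. The names `fun_chisq` and `function_index`, and the normalization by n(s - 1), are choices made for this sketch and are not the R package's API. It does not compute the exact null distribution or implement the branch-and-bound algorithm.

```python
from math import sqrt
from typing import Sequence


def fun_chisq(table: Sequence[Sequence[float]]) -> float:
    """Functional chi-squared statistic for the direction X -> Y.

    Assumed form (not given in the abstract), with s columns and n samples:
        sum_i sum_j (n_ij - n_i./s)^2 / (n_i./s)
        - sum_j (n_.j - n/s)^2 / (n/s)
    """
    n_rows, n_cols = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_sums)
    if n == 0:
        return 0.0

    stat = 0.0
    for row, row_sum in zip(table, row_sums):
        if row_sum == 0:
            continue  # an empty row carries no evidence
        expected = row_sum / n_cols
        stat += sum((x - expected) ** 2 for x in row) / expected

    expected_col = n / n_cols
    stat -= sum((c - expected_col) ** 2 for c in col_sums) / expected_col
    return stat


def function_index(table: Sequence[Sequence[float]]) -> float:
    """Statistic scaled to [0, 1] by its assumed upper bound n * (s - 1)."""
    n = sum(sum(row) for row in table)
    s = len(table[0])
    if n == 0 or s < 2:
        return 0.0
    return sqrt(max(fun_chisq(table), 0.0) / (n * (s - 1)))


def transpose(table: Sequence[Sequence[float]]) -> list:
    """Swap the roles of X and Y."""
    return [list(col) for col in zip(*table)]


# Y is a many-to-one function of X, but X is not a function of Y.
many_to_one = [[5, 0], [0, 5], [5, 0]]
forward = function_index(many_to_one)              # direction X -> Y
backward = function_index(transpose(many_to_one))  # direction Y -> X
```

On this toy table the index is larger in the X -> Y direction than in the reverse direction, which is the asymmetry that a symmetric association measure would not report.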
Journal Name:
Proceedings of the 29th Int'l Joint Conf on Artificial Intelligence, IJCAI-20
Page Range or eLocation-ID:
2683 - 2689
Sponsoring Org:
National Science Foundation
More Like this
  1. Directional association measured by functional dependency can answer important questions on relationships between variables, for example, in discovery of molecular interactions in biological systems. However, when one has no prior information about the functional form of a directional association, there is not a widely established statistical procedure to detect such an association. To address this issue, here we introduce an exact functional test for directional association by examining the strength of functional dependency. It is effective in promoting functional patterns by reducing statistical power on dependent non-functional patterns. We designed an algorithm to carry out the test using a fast branch-and-bound strategy, which achieved a substantial speedup over brute-force enumeration. On data from an epidemiological study of liver cancer, the test identified the hepatitis status of a subject as the most influential risk factor among others for the cancer phenotype. On human lung cancer transcriptome data, the test selected 1068 transcription start sites of putative noncoding RNAs directionally associated with the presence or absence of lung cancer, stronger than 95 percent transcription start sites of 694 curated cancer genes. These predictions include non-monotonic interaction patterns, to which other routine tests were insensitive. Complementing symmetric (non-directional) association methods such as Fisher’s exact test, the exact functional test is a unique exact statistical test for evaluating evidence for causal relationships.
  2. The complexity, dynamics, and scale of data acquired by modern biotechnology increasingly favor model-free computational methods that make minimal assumptions about underlying biological mechanisms. For example, single-cell transcriptome and proteome data have a throughput several orders of magnitude higher than that of bulk methods. Many model-free statistical methods for pattern discovery such as mutual information and chi-squared tests, however, require discrete data. Most discretization methods minimize squared errors for each variable independently, not necessarily retaining joint patterns. To address this issue, we present a joint grid discretization algorithm that preserves clusters in the original data. We evaluated this algorithm on simulated data to show its advantage over other methods in maintaining clusters as measured by the adjusted Rand index. We also show it promotes global functional patterns over independent patterns. On single-cell proteome and transcriptome of leukemia and healthy blood, joint grid discretization captured known protein-to-RNA regulatory relationships, while revealing previously unknown interactions. As such, the joint grid discretization is applicable as a data transformation step in associative, functional, and causal inference of molecular interactions fundamental to systems biology. The developed software is publicly available at
  3. Abstract Motivation

    Computer inference of biological mechanisms is increasingly approachable due to dynamically rich data sources such as single-cell genomics. Inferred molecular interactions can prioritize hypotheses for wet-lab experiments to expedite biological discovery. However, complex data often come with unwanted biological or technical variations, exposing biases over marginal distribution and sample size in current methods to favor spurious causal relationships.

    Results

    Considering function direction and strength as evidence for causality, we present an adapted functional chi-squared test (AdpFunChisq) that rewards functional patterns over non-functional or independent patterns. On synthetic and three biology datasets, we demonstrate the advantages of AdpFunChisq over 10 methods on overcoming biases that give rise to wide fluctuations in the performance of alternative approaches. On single-cell multiomics data of multiple phenotype acute leukemia, we found that the T-cell surface glycoprotein CD3 delta chain may causally mediate specific genes in the viral carcinogenesis pathway. Using the causality-by-functionality principle, AdpFunChisq offers a viable option for robust causal inference in dynamical systems.

    Availability and implementation

    The AdpFunChisq test is implemented in the R package ‘FunChisq’ (2.5.2 or above) at All other source code along with pre-processed data is available at Code Ocean

    Supplementary information

    Supplementary materials are available at Bioinformatics online.

  4. Finding node associations across different networks is the cornerstone behind a wealth of high-impact data mining applications. Traditional approaches are often, explicitly or implicitly, built upon the linearity and/or consistency assumptions. On the other hand, the recent network embedding based methods promise a natural way to handle the non-linearity, yet they could suffer from the disparate node embedding space of different networks. In this paper, we address these limitations and tackle cross-network node associations from a new angle, i.e., cross-network transformation. We ask a generic question: Given two different networks, how can we transform one network to another? We propose an end-to-end model that learns a composition of nonlinear operations so that one network can be transformed to another in a hierarchical manner. The proposed model bears three distinctive advantages. First (composite transformation), it goes beyond the linearity/consistency assumptions and performs the cross-network transformation through a composition of nonlinear computations. Second (representation power), it can learn the transformation of both network structures and node attributes at different resolutions while identifying the cross-network node associations. Third (generality), it can be applied to various tasks, including network alignment, recommendation, cross-layer dependency inference. Extensive experiments on different tasks validate and verify the effectiveness of the proposed model.
  5. The Ty1 retrotransposon family is maintained in a functional but dormant state by its host, Saccharomyces cerevisiae. Several hundred RHF and RTT genes encoding co-factors and restrictors of Ty1 retromobility, respectively, have been identified. Well-characterized examples include MED3 and MED15, encoding subunits of the Mediator transcriptional co-activator complex; control of retromobility by Med3 and Med15 requires the Ty1 promoter in the U3 region of the long terminal repeat. To characterize the U3-dependence of other Ty1 regulators, we screened a library of 188 known rhf and rtt mutants for altered retromobility of Ty1 his3AI expressed from the strong, TATA-less TEF1 promoter or the weak, TATA-containing U3 promoter. Two classes of genes, each including both RHFs and RTTs, were identified. The first class comprising 82 genes that regulated Ty1 his3AI retromobility independently of U3 is enriched for RHF genes that restrict the G1 phase of the cell cycle and those involved in transcriptional elongation and mRNA catabolism. The second class of 51 genes regulated retromobility of Ty1 his3AI driven only from the U3 promoter. Nineteen U3-dependent regulators (U3DRs) also controlled retromobility of Ty1 his3AI driven by the weak, TATA-less PSP2 promoter, suggesting reliance on the low activity of U3. Thirty-one U3DRs failed to modulate P PSP2 -Ty1 his3AI retromobility, suggesting dependence on the architecture of U3. To further investigate the U3-dependency of Ty1 regulators, we developed a novel fluorescence-based assay to monitor expression of p22-Gag, a restriction factor expressed from the internal Ty1i promoter. Many U3DRs had minimal effects on levels of Ty1 RNA, Ty1i RNA or p22-Gag. These findings uncover a role for the Ty1 promoter in integrating signals from diverse host factors to modulate Ty1 RNA biogenesis or fate.