- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- Proceedings of the 29th Int'l Joint Conf on Artificial Intelligence, IJCAI-20
- Page Range or eLocation-ID:
- 2683 - 2689
- Sponsoring Org:
- National Science Foundation
More Like this
Directional association measured by functional dependency can answer important questions on relationships between variables, for example, in discovery of molecular interactions in biological systems. However, when one has no prior information about the functional form of a directional association, there is not a widely established statistical procedure to detect such an association. To address this issue, here we introduce an exact functional test for directional association by examining the strength of functional dependency. It is effective in promoting functional patterns by reducing statistical power on dependent non-functional patterns. We designed an algorithm to carry out the test using a fast branch-and-bound strategy, which achieved a substantial speedup over brute-force enumeration. On data from an epidemiological study of liver cancer, the test identified the hepatitis status of a subject as the most influential risk factor among others for the cancer phenotype. On human lung cancer transcriptome data, the test selected 1068 transcription start sites of putative noncoding RNAs directionally associated with the presence or absence of lung cancer, stronger than 95 percent transcription start sites of 694 curated cancer genes. These predictions include non-monotonic interaction patterns, to which other routine tests were insensitive. Complementing symmetric (non-directional) association methods such as Fisher’smore »
The complexity, dynamics, and scale of data acquired by modern biotechnology increasingly favor model-free computational methods that make minimal assumptions about underlying biological mechanisms. For example, single-cell transcriptome and proteome data have a throughput several orders more than bulk methods. Many model-free statistical methods for pattern discovery such as mutual information and chi-squared tests, however, require discrete data. Most discretization methods minimize squared errors for each variable independently, not necessarily retaining joint patterns. To address this issue, we present a joint grid discretization algorithm that preserves clusters in the original data. We evaluated this algorithm on simulated data to show its advantage over other methods in maintaining clusters as measured by the adjusted Rand index. We also show it promotes global functional patterns over independent patterns. On single-cell proteome and transcriptome of leukemia and healthy blood, joint grid discretization captured known protein-to-RNA regulatory relationships, while revealing previously unknown interactions. As such, the joint grid discretization is applicable as a data transformation step in associative, functional, and causal inference of molecular interactions fundamental to systems biology. The developed software is publicly available at https://cran.r-project.org/package=GridOnClusters
Computer inference of biological mechanisms is increasingly approachable due to dynamically rich data sources such as single-cell genomics. Inferred molecular interactions can prioritize hypotheses for wet-lab experiments to expedite biological discovery. However, complex data often come with unwanted biological or technical variations, exposing biases over marginal distribution and sample size in current methods to favor spurious causal relationships.
Considering function direction and strength as evidence for causality, we present an adapted functional chi-squared test (AdpFunChisq) that rewards functional patterns over non-functional or independent patterns. On synthetic and three biology datasets, we demonstrate the advantages of AdpFunChisq over 10 methods on overcoming biases that give rise to wide fluctuations in the performance of alternative approaches. On single-cell multiomics data of multiple phenotype acute leukemia, we found that the T-cell surface glycoprotein CD3 delta chain may causally mediate specific genes in the viral carcinogenesis pathway. Using the causality-by-functionality principle, AdpFunChisq offers a viable option for robust causal inference in dynamical systems.
Availability and implementation
The AdpFunChisq test is implemented in the R package ‘FunChisq’ (2.5.2 or above) at https://cran.r-project.org/package=FunChisq. All other source code along with pre-processed data is available at Code Ocean https://doi.org/10.24433/CO.2907738.v1
Supplementary materials are available at Bioinformatics online.
Finding node associations across different networks is the cornerstone behind a wealth of high-impact data mining applications. Traditional approaches are often, explicitly or implicitly, built upon the linearity and/or consistency assumptions. On the other hand, the recent network embedding based methods promise a natural way to handle the non-linearity, yet they could suffer from the disparate node embedding space of different networks. In this paper, we address these limitations and tackle cross-network node associations from a new angle, i.e., cross-network transformation. We ask a generic question: Given two different networks, how can we transform one network to another? We propose an end-to-end model that learns a composition of nonlinear operations so that one network can be transformed to another in a hierarchical manner. The proposed model bears three distinctive advantages. First (composite transformation), it goes beyond the linearity/consistency assumptions and performs the cross-network transformation through a composition of nonlinear computations. Second (representation power), it can learn the transformation of both network structures and node attributes at different resolutions while identifying the cross-network node associations. Third (generality), it can be applied to various tasks, including network alignment, recommendation, cross-layer dependency inference. Extensive experiments on different tasks validate and verify the effectivenessmore »
The Ty1 retrotransposon family is maintained in a functional but dormant state by its host, Saccharomyces cerevisiae . Several hundred RHF and RTT genes encoding co-factors and restrictors of Ty1 retromobility, respectively, have been identified. Well-characterized examples include MED3 and MED15 , encoding subunits of the Mediator transcriptional co-activator complex; control of retromobility by Med3 and Med15 requires the Ty1 promoter in the U3 region of the long terminal repeat. To characterize the U3-dependence of other Ty1 regulators, we screened a library of 188 known rhf and rtt mutants for altered retromobility of Ty1 his3AI expressed from the strong, TATA-less TEF1 promoter or the weak, TATA-containing U3 promoter. Two classes of genes, each including both RHF s and RTT s, were identified. The first class comprising 82 genes that regulated Ty1 his3AI retromobility independently of U3 is enriched for RHF genes that restrict the G1 phase of the cell cycle and those involved in transcriptional elongation and mRNA catabolism. The second class of 51 genes regulated retromobility of Ty1 his3AI driven only from the U3 promoter. Nineteen U3-dependent regulators (U3DRs) also controlled retromobility of Ty1 his3AI driven by the weak, TATA-less PSP2 promoter, suggesting reliance on the low activity ofmore »