Title: DeepLINK: Deep learning inference using knockoffs with applications to genomics
Significance: Although practically attractive for their high prediction and classification power, complex learning methods often lack interpretability and reproducibility, limiting their scientific use. A useful remedy is to select the truly important variables contributing to the response of interest. We develop DeepLINK, a method for deep learning inference using knockoffs, to achieve variable selection with a controlled error rate in deep learning models. We show that DeepLINK also has high power in variable selection across a broad class of model designs. We then apply DeepLINK to three real datasets and produce statistical inference results that are both reproducible and biologically meaningful, demonstrating its promise for a broad range of scientific applications.
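The knockoff filter underlying DeepLINK selects variables whose importance statistic W_j (real-feature importance minus knockoff-feature importance) exceeds a data-driven threshold chosen to bound the false discovery rate. A minimal NumPy sketch of the knockoff+ threshold, with toy statistics standing in for DeepLINK's neural-network importances:

```python
import numpy as np

def knockoff_threshold(W, q=0.2):
    """Knockoff+ threshold: the smallest t such that the estimated false
    discovery proportion (1 + #{W_j <= -t}) / #{W_j >= t} is at most q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp <= q:
            return t
    return np.inf  # nothing selectable at this FDR level

# Toy statistics: 5 true signals (large positive W), 45 noise variables.
rng = np.random.default_rng(0)
W = np.concatenate([rng.uniform(3.0, 5.0, 5), rng.normal(0.0, 0.5, 45)])
T = knockoff_threshold(W, q=0.2)
selected = np.flatnonzero(W >= T)  # indices of selected variables
```

The "+1" in the numerator gives the knockoff+ variant, which controls the FDR exactly rather than in a modified sense.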
Andreassen, Anders; Komiske, Patrick T.; Metodiev, Eric M.; Nachman, Benjamin; Suresh, Adi; Thaler, Jesse
(, ICLR 2021 SimDL Workshop)
A common setting for scientific inference is the ability to sample from a high-fidelity forward model (simulation) without having an explicit probability density of the data. We propose a simulation-based maximum likelihood deconvolution approach in this setting called OMNIFOLD. Deep learning enables this approach to be naturally unbinned, variable-, and high-dimensional. In contrast to model parameter estimation, the goal of deconvolution is to remove detector distortions in order to enable a variety of downstream inference tasks. Our approach is the deep learning generalization of the common Richardson-Lucy approach, also called Iterative Bayesian Unfolding in particle physics. We show how OMNIFOLD can not only remove detector distortions but also account for noise processes and acceptance effects.
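For intuition on the classical baseline that OMNIFOLD generalizes, here is a minimal binned Richardson-Lucy / Iterative Bayesian Unfolding sketch; the 3-bin response matrix is an illustrative toy, whereas OMNIFOLD itself replaces these histograms with reweighted unbinned samples:

```python
import numpy as np

def iterative_bayesian_unfolding(R, measured, n_iter=500):
    """Richardson-Lucy / IBU: R[i, j] = P(measured in bin i | true bin j)."""
    truth = np.full(R.shape[1], measured.sum() / R.shape[1])  # flat prior
    for _ in range(n_iter):
        expected = R @ truth                           # forward-fold current guess
        ratio = np.divide(measured, expected,
                          out=np.zeros_like(expected), where=expected > 0)
        truth = truth * (R.T @ ratio) / R.sum(axis=0)  # multiplicative Bayes update
    return truth

# Toy 3-bin example: the detector smears events into neighboring bins.
R = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
true_counts = np.array([100.0, 50.0, 25.0])
measured = R @ true_counts
unfolded = iterative_bayesian_unfolding(R, measured)  # converges toward true_counts
```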
Lu, Bingqian; Yang, Jianyi; Chen, Lydia Y.; Ren, Shaolei
(, IEEE First International Conference on Cognitive Machine Intelligence (CogMI))
The ever-increasing size of deep neural network (DNN) models once implied that runtime inference was confined to cloud data centers. Nonetheless, the recent plethora of DNN model compression techniques has overcome this limit, making it a reality that DNN-based inference can run on numerous resource-constrained edge devices, including mobile phones, drones, robots, medical devices, wearables, and Internet of Things devices, among many others. Naturally, edge devices are highly heterogeneous in terms of hardware specifications and usage scenarios. On the other hand, compressed DNN models are so diverse that they exhibit different tradeoffs in a multi-dimensional space, and no single model can achieve optimality in all important metrics, such as accuracy, latency, and energy consumption. Consequently, how to automatically select a compressed DNN model for an edge device to run inference with optimal quality of experience (QoE) arises as a new challenge. The state-of-the-art approaches either choose a common model for all or most devices, which is optimal for a small fraction of edge devices at best, or apply device-specific DNN model compression, which is not scalable. In this paper, by leveraging the predictive power of machine learning and keeping end users in the loop, we envision an automated device-level DNN model selection engine for QoE-optimal edge inference. To concretize our vision, we formulate the DNN model selection problem as a contextual multi-armed bandit, where features of edge devices and DNN models are contexts and pre-trained DNN models are arms selected online based on the history of actions and users' QoE feedback. We develop an efficient online learning algorithm to balance exploration and exploitation. Our preliminary simulation results validate our algorithm and highlight the potential of machine learning for automating DNN model selection to achieve QoE-optimal edge inference.
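A contextual bandit of the kind described can be sketched with LinUCB, where each arm (a pre-trained compressed model) maintains a ridge-regression estimate of QoE reward as a linear function of the context. The abstract does not specify the exact algorithm, so the arm count, context features, and reward model below are illustrative assumptions:

```python
import numpy as np

class LinUCB:
    """Each arm keeps a ridge-regression reward model; pick the arm with the
    highest upper confidence bound on predicted QoE for the given context."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # regularized X^T X
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # X^T y

    def select(self, x):
        def ucb(A, b):
            A_inv = np.linalg.inv(A)
            return (A_inv @ b) @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return int(np.argmax([ucb(A, b) for A, b in zip(self.A, self.b)]))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Simulated QoE feedback: arm 0 (a hypothetical compressed model) is better.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.5])                              # device/model features
theta = [np.array([0.8, 0.4]), np.array([0.3, 0.4])]  # true reward weights
bandit = LinUCB(n_arms=2, dim=2)
choices = []
for _ in range(300):
    arm = bandit.select(x)
    bandit.update(arm, x, theta[arm] @ x + rng.normal(0, 0.1))
    choices.append(arm)
```

The confidence bonus shrinks as an arm accumulates pulls, so the bandit quickly concentrates on the model with the higher expected QoE while still exploring under-sampled arms.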
Abstract. Background: Estimating and accounting for hidden variables is widely practiced as an important step in molecular quantitative trait locus (molecular QTL, henceforth "QTL") analysis for improving the power of QTL identification. However, few benchmark studies have been performed to evaluate the efficacy of the various methods developed for this purpose. Results: Here we benchmark popular hidden variable inference methods, including surrogate variable analysis (SVA), probabilistic estimation of expression residuals (PEER), and hidden covariates with prior (HCP), against principal component analysis (PCA), a well-established dimension reduction and factor discovery method, via 362 synthetic and 110 real data sets. We show that PCA not only underlies the statistical methodology behind the popular methods but is also orders of magnitude faster, better-performing, and much easier to interpret and use. Conclusions: To help researchers use PCA in their QTL analysis, we provide an R package along with a detailed guide, both of which are freely available at https://github.com/heatherjzhou/PCAForQTL. We believe that using PCA rather than SVA, PEER, or HCP will substantially improve and simplify hidden variable inference in QTL mapping as well as increase the transparency and reproducibility of QTL research.
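Extracting top principal components as hidden covariates amounts to one SVD. A minimal NumPy sketch; the synthetic confounder and gene loadings are illustrative, not from the paper's benchmark:

```python
import numpy as np

def top_pcs(expression, k):
    """Scores of the top-k principal components of a samples x genes matrix,
    usable as hidden covariates in a QTL regression."""
    X = expression - expression.mean(axis=0)   # center each gene
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

# Synthetic data: one hidden confounder (e.g., batch) drives all genes.
rng = np.random.default_rng(0)
z = rng.normal(size=100)                       # hidden factor per sample
w = rng.normal(size=20)                        # per-gene loadings
expr = np.outer(z, w) + 0.1 * rng.normal(size=(100, 20))
pc1 = top_pcs(expr, k=2)[:, 0]
recovery = abs(np.corrcoef(pc1, z)[0, 1])      # PC1 recovers z (up to sign)
```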
Sun, He; Bouman, Katherine L.; Tiede, Paul; Wang, Jason J.; Blunt, Sarah; Mawet, Dimitri
(, The Astrophysical Journal)
Abstract. Inference is crucial in modern astronomical research, where hidden astrophysical features and patterns are often estimated from indirect and noisy measurements. Inferring the posterior of hidden features, conditioned on the observed measurements, is essential for understanding the uncertainty of results and downstream scientific interpretations. Traditional approaches for posterior estimation include sampling-based methods and variational inference (VI). However, sampling-based methods are typically slow for high-dimensional inverse problems, while VI often lacks estimation accuracy. In this paper, we propose α-deep probabilistic inference, a deep learning framework that first learns an approximate posterior using α-divergence VI paired with a generative neural network, and then produces more accurate posterior samples through importance reweighting of the network samples. It inherits strengths from both sampling and VI methods: it is fast, accurate, and more scalable to high-dimensional problems than conventional sampling-based approaches. We apply our approach to two high-impact astronomical inference problems using real data: exoplanet astrometry and black hole feature extraction.
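The importance-reweighting step can be illustrated in one dimension: draw from a deliberately mismatched approximate posterior, then self-normalize the weights p/q to correct the estimate. The two Gaussians below are toy stand-ins for the VI posterior and the true posterior:

```python
import numpy as np

def log_p(x):  # "true" posterior: N(2.0, 0.5^2); unnormalized log-density suffices
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def log_q(x):  # approximate (VI) posterior: N(1.5, 1.0^2)
    return -0.5 * ((x - 1.5) / 1.0) ** 2

rng = np.random.default_rng(1)
samples = rng.normal(1.5, 1.0, 20000)  # draws from the approximate posterior
logw = log_p(samples) - log_q(samples)
w = np.exp(logw - logw.max())
w /= w.sum()                           # self-normalized importance weights
post_mean = np.sum(w * samples)        # corrected estimate, close to 2.0
```

Self-normalization makes unnormalized log-densities sufficient, which matters in practice because posteriors are typically known only up to a constant.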
Abstract. Background: The expanding use of complex machine learning methods such as deep learning has led to an explosion in human activity recognition, particularly applied to health. However, complex models, which handle private and sometimes protected data, raise concerns about the potential leak of identifiable data. In this work, we focus on the case of a deep network model trained on images of individual faces. Materials and methods: A previously published deep learning model, trained to estimate gaze from full-face image sequences, was stress tested for personal information leakage by a white-box inference attack. Full-face video recordings taken from 493 individuals undergoing an eye-tracking-based evaluation of neurological function were used. Outputs, gradients, intermediate layer outputs, loss, and labels were used as inputs for a deep network with an added support vector machine emission layer to recognize membership in the training data. Results: The inference attack method and associated mathematical analysis indicate that there is a low likelihood of unintended memorization of facial features in the deep learning model. Conclusions: This study shows that the model preserves the integrity of its training data with reasonable confidence. The same process can be applied under similar conditions to other models.
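A much simpler relative of this attack, the loss-threshold membership test, conveys the core idea: if a model memorizes, training members tend to have lower loss than non-members, so thresholding the loss recovers membership better than chance. The loss distributions below are synthetic illustrations, not the paper's data:

```python
import numpy as np

# Synthetic loss distributions: a leaky model gives members lower loss.
rng = np.random.default_rng(0)
member_loss = rng.gamma(2.0, 0.2, 1000)     # hypothetical training-set losses
nonmember_loss = rng.gamma(2.0, 0.5, 1000)  # hypothetical held-out losses

losses = np.concatenate([member_loss, nonmember_loss])
is_member = np.concatenate([np.ones(1000), np.zeros(1000)])

tau = np.median(losses)                     # attack threshold
pred = (losses < tau).astype(float)         # low loss => guess "member"
attack_accuracy = float(np.mean(pred == is_member))  # > 0.5 indicates leakage
```

An accuracy near 0.5 on a real model (as the paper reports for its much stronger white-box attack) is evidence against memorization.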
Zhu, Zifan, Fan, Yingying, Kong, Yinfei, Lv, Jinchi, and Sun, Fengzhu. DeepLINK: Deep learning inference using knockoffs with applications to genomics. Proceedings of the National Academy of Sciences 118 (36). doi:10.1073/pnas.2104683118.
@article{osti_10307954,
place = {Country unknown/Code not available},
title = {DeepLINK: Deep learning inference using knockoffs with applications to genomics},
url = {https://par.nsf.gov/biblio/10307954},
DOI = {10.1073/pnas.2104683118},
abstractNote = {Significance Although practically attractive with high prediction and classification power, complicated learning methods often lack interpretability and reproducibility, limiting their scientific usage. A useful remedy is to select truly important variables contributing to the response of interest. We develop a method for deep learning inference using knockoffs, DeepLINK, to achieve the goal of variable selection with controlled error rate in deep learning models. We show that DeepLINK can also have high power in variable selection with a broad class of model designs. We then apply DeepLINK to three real datasets and produce statistical inference results with both reproducibility and biological meanings, demonstrating its promising usage to a broad range of scientific applications.},
journal = {Proceedings of the National Academy of Sciences},
volume = {118},
number = {36},
publisher = {Proceedings of the National Academy of Sciences},
author = {Zhu, Zifan and Fan, Yingying and Kong, Yinfei and Lv, Jinchi and Sun, Fengzhu},
}