Title: Transfer Learning in Genome-Wide Association Studies with Knockoffs
Abstract: This paper presents and compares alternative transfer learning methods that can increase the power of conditional testing via knockoffs by leveraging prior information in external data sets collected from different populations or measuring related outcomes. The relevance of this methodology is explored in particular within the context of genome-wide association studies, where it can help address the pressing need for principled ways to suitably account for, and efficiently learn from, the genetic variation associated with diverse ancestries. Finally, we apply these methods to analyze several phenotypes in the UK Biobank data set, demonstrating that transfer learning helps knockoffs discover more associations in the data collected from minority populations, potentially opening the way to the development of more accurate polygenic risk scores.
Award ID(s): 1934578
PAR ID: 10382342
Journal Name: Sankhya B
ISSN: 0976-8386
Sponsoring Org: National Science Foundation
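The abstract does not spell out the knockoff procedure itself. As background only, here is a minimal Python sketch of the standard knockoff+ filter, a common final selection step in knockoff-based testing: given one statistic W_j per variable (large positive values favor the original variable over its knockoff), it chooses the smallest threshold whose estimated false discovery proportion is at most q. It assumes the W_j have already been computed; the function names and the example statistics are hypothetical, and the paper's transfer learning methods, which use external data to improve power, are not implemented here.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Knockoff+ threshold for feature statistics w at target FDR level q.

    Under the knockoff assumptions, the sign of W_j is a fair coin flip for
    null features, so large negative statistics estimate false positives.
    Returns float("inf") when no threshold meets the bound (nothing selected).
    """
    # Candidate thresholds: distinct nonzero |W_j| in ascending order, so the
    # first one that satisfies the bound is the smallest valid threshold.
    for t in sorted({abs(x) for x in w if x != 0}):
        negatives = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        positives = sum(1 for x in w if x >= t)   # features that would be selected
        # Estimated false discovery proportion, with the "+1" of knockoff+.
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of the selected features: all j with W_j >= T."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for 12 features (not data from the paper).
w = [5.0, 4.5, 4.0, 3.8, 3.5, 3.0, 2.5, 2.0, -0.5, 0.3, -1.0, 0.0]
print(knockoff_plus_threshold(w, q=0.2))  # 2.0
print(knockoff_select(w, q=0.2))          # [0, 1, 2, 3, 4, 5, 6, 7]
print(knockoff_select(w, q=0.1))          # []
```

With q = 0.1 nothing is selected: the bound (1 + negatives) / positives <= q can only hold when at least 1/q = 10 statistics exceed the threshold, and only 9 of the hypothetical statistics are positive. Increasing power, for example by borrowing information from related data sets, is what lets more variables clear this bar.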
More like this
  1. Introduction: Short response time is critical for future military medical operations in austere settings or remote areas. Effective patient care at the point of injury can greatly benefit from the integration of semi-autonomous robotic systems. To achieve autonomy, robots would require massive libraries of maneuvers collected with the goal of training machine learning algorithms. Although this is attainable in controlled settings, obtaining surgical data in austere settings can be difficult. Hence, in this article, we present the Dexterous Surgical Skill (DESK) database for knowledge transfer between robots. The peg transfer task was selected as it is one of the six main tasks of laparoscopic training. In addition, we provide a machine learning framework to evaluate novel transfer learning methodologies on this database.
    Methods: A set of surgical gestures was collected for a peg transfer task composed of seven atomic maneuvers referred to as surgemes. The DESK dataset comprises surgical robotic skills performed on four robotic platforms: Taurus II, simulated Taurus II, YuMi, and the da Vinci Research Kit. We then explored two learning scenarios: no-transfer and domain-transfer. In the no-transfer scenario, the training and testing data were obtained from the same domain, whereas in the domain-transfer scenario, the training data were a blend of simulated and real robot data and the resulting models were tested on a real robot.
    Results: Using simulation data to train the learning algorithms enhances performance on the real robot when limited or no real data are available. The transfer model showed an accuracy of 81% for the YuMi robot when the ratio of real-to-simulated data was 22% to 78%. For the Taurus II and the da Vinci, the model showed an accuracy of 97.5% and 93%, respectively, when trained only on simulation data.
    Conclusions: The results indicate that simulation can be used to augment training data to enhance the performance of learned models in real scenarios. This shows potential for the future use of surgical data from the operating room in deployable surgical robots in remote areas.
  2. In this article, we develop a method based on model-X knockoffs to find conditional associations that are consistent across environments, while controlling the false discovery rate. The motivation for this problem is that large datasets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting. In fact, consistency can sometimes provably lead to valid causal inferences even when conditional associations alone do not. Although the proposed method is widely applicable, in this paper we highlight its relevance to genome-wide association studies, in which robustness across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to UK Biobank data.
  3. Early detection of Alzheimer’s disease (AD) during the Mild Cognitive Impairment (MCI) stage could enable effective intervention to slow disease progression. Computer-aided diagnosis of AD relies on a sufficient amount of biomarker data. When this requirement is not fulfilled, transfer learning can be used to transfer knowledge from a source domain with more labeled data than is available in the target domain. In this study, an instance-based transfer learning framework is presented based on the gradient boosting machine (GBM). In GBM, a sequence of base learners is built, and each learner focuses on the errors (residuals) of the previous learner. In our transfer learning version of GBM (TrGB), a weighting mechanism based on the residuals of the base learners is defined for the source instances. Consequently, instances whose distribution differs from that of the target data have a lower impact on the target learner. The proposed weighting scheme aims to transfer as much information as possible from the source domain while avoiding negative transfer. The target data in this study were obtained from the Mount Sinai dataset, which was collected and processed in a collaborative 5-year project at the Mount Sinai Medical Center. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset was used as the source domain. The experimental results showed that the proposed TrGB algorithm improved classification accuracy by 1.5% and 4.5% for CN vs. MCI and multiclass classification, respectively, compared to conventional methods. Also, using the TrGB model and knowledge transferred from the CN vs. AD classification of the source domain, the average score of early MCI vs. late MCI classification improved by 5%.
  4. Although practically attractive for their high prediction and classification power, complicated learning methods often lack interpretability and reproducibility, limiting their scientific usage. A useful remedy is to select the truly important variables contributing to the response of interest. We develop a method for deep learning inference using knockoffs, DeepLINK, to achieve variable selection with a controlled error rate in deep learning models. We show that DeepLINK can also achieve high power in variable selection across a broad class of model designs. We then apply DeepLINK to three real datasets and obtain statistical inference results that are both reproducible and biologically meaningful, demonstrating its promise for a broad range of scientific applications.
  5. Recent papers demonstrate that non-traditional data, from mobile phones and other digital sensors, can be used to roughly estimate the wealth of individual subscribers. This paper asks a question more directly relevant to development policy: Can non-traditional data be used to more efficiently target development aid? By combining rich survey data from a "big push" anti-poverty program in Afghanistan with detailed mobile phone logs from program beneficiaries, we study the extent to which machine learning methods can accurately differentiate ultra-poor households eligible for program benefits from other households deemed ineligible. We show that supervised learning methods leveraging mobile phone data can identify ultra-poor households as accurately as standard survey-based measures of poverty, including consumption and wealth; and that combining survey-based measures with mobile phone data produces classifications more accurate than those based on a single data source. We discuss the implications and limitations of these methods for targeting extreme poverty in marginalized populations.
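Item 3 above describes gradient boosting as a sequence of base learners, each fit to the residuals of the ensemble built so far. For squared-error loss the negative gradient is proportional to the residual y - F(x), so plain boosting can be sketched in a few lines of Python with regression stumps on a single input variable. This is a minimal sketch of standard gradient boosting only: the residual-based instance weighting that defines TrGB is not detailed in the abstract and is not implemented, and the function names, stump learner, learning rate, and toy data are hypothetical choices for illustration.

```python
from typing import List, Tuple

# (split point, value if x <= split, value if x > split)
Stump = Tuple[float, float, float]
Model = Tuple[float, float, List[Stump]]  # (initial constant, learning rate, stumps)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares depth-1 regression tree fit to the residuals r."""
    best_sse, best = float("inf"), None
    xs = sorted(set(x))
    # Try splits halfway between consecutive distinct x values.
    for a, b in zip(xs, xs[1:]):
        s = (a + b) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if sse < best_sse:
            best_sse, best = sse, (s, lv, rv)
    if best is None:  # all x identical: fall back to a constant fit
        m = sum(r) / len(r)
        return (float("inf"), m, m)
    return best


def stump_predict(stump: Stump, xi: float) -> float:
    s, lv, rv = stump
    return lv if xi <= s else rv


def fit_boosting(x: List[float], y: List[float], n_rounds: int = 50, lr: float = 0.5) -> Model:
    """Gradient boosting with squared-error loss: each new stump is fit to
    the residuals y - F(x) of the current ensemble F."""
    f0 = sum(y) / len(y)  # initial constant model: the mean of y
    pred = [f0] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # Residuals = negative gradient of 0.5 * (y - F)^2 with respect to F.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        pred = [pi + lr * stump_predict(stump, xi) for pi, xi in zip(pred, x)]
    return f0, lr, stumps


def boosting_predict(model: Model, xi: float) -> float:
    f0, lr, stumps = model
    return f0 + lr * sum(stump_predict(s, xi) for s in stumps)


# Hypothetical toy data: two groups with means 1 and 5.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_boosting(x, y)
print(round(boosting_predict(model, 2.0), 6), round(boosting_predict(model, 5.0), 6))
```

The learning rate lr shrinks each stump's contribution; on this toy data every round halves the remaining residual, so predictions converge to the two group means (1.0 and 5.0). An instance-weighted variant such as TrGB would change how source and target rows enter each stump fit, which this sketch leaves out.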