Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Lengauer, Thomas (Ed.)Abstract MotivationMicrobial signatures in the human microbiome are closely associated with various human diseases, driving the development of machine learning models for microbiome-based disease prediction. Despite progress, challenges remain in enhancing prediction accuracy, generalizability, and interpretability. Confounding factors, such as host’s gender, age, and body mass index, significantly influence the human microbiome, complicating microbiome-based predictions. ResultsTo address these challenges, we developed MicroKPNN-MT, a unified model for predicting human phenotype based on microbiome data, as well as additional metadata like age and gender. This model builds on our earlier MicroKPNN framework, which incorporates prior knowledge of microbial species into neural networks to enhance prediction accuracy and interpretability. In MicroKPNN-MT, metadata, when available, serves as additional input features for prediction. Otherwise, the model predicts metadata from microbiome data using additional decoders. We applied MicroKPNN-MT to microbiome data collected in mBodyMap, covering healthy individuals and 25 different diseases, and demonstrated its potential as a predictive tool for multiple diseases, which at the same time provided predictions for the missing metadata. Our results showed that incorporating real or predicted metadata helped improve the accuracy of disease predictions, and more importantly, helped improve the generalizability of the predictive models. Availability and implementationhttps://github.com/mgtools/MicroKPNN-MT.more » « less
-
Lengauer, Thomas (Ed.)Abstract SummaryThis article presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, analyzing DNA sequences at byte-level precision with an encoder–decoder Transformer architecture. ENBED uses a subquadratic implementation of attention to develop an efficient model capable of sequence-to-sequence transformations, generalizing previous genomic models with encoder-only or decoder-only architectures. We use Masked Language Modeling to pretrain the foundation model using reference genome sequences and apply it in the following downstream tasks: (i) identification of enhancers, promotors, and splice sites, (ii) recognition of sequences containing base call mismatches and insertion/deletion errors, an advantage over tokenization schemes involving multiple base pairs, which lose the ability to analyze with byte-level precision, (iii) identification of biological function annotations of genomic sequences, and (iv) generating mutations of the Influenza virus using the encoder–decoder architecture and validating them against real-world observations. In each of these tasks, we demonstrate significant improvement as compared to the existing state-of-the-art results. Availability and implementationThe source code used to develop and fine-tune the foundation model has been released on Github (https://github.itap.purdue.edu/Clan-labs/ENBED).more » « less
-
Lengauer, Thomas (Ed.)Abstract SummaryTarget identification by enzymes (TIE) problem aims to identify the set of enzymes in a given metabolic network, such that their inhibition eliminates a given set of target compounds associated with a disease while incurring minimum damage to the rest of the compounds. This is a NP-hard problem, and thus optimal solutions using classical computers fail to scale to large metabolic networks. In this article, we develop the first quantum optimization solution, called QuTIE (quantum optimization for target identification by enzymes), to this NP-hard problem. We do that by developing an equivalent formulation of the TIE problem in quadratic unconstrained binary optimization form. We then map it to a logical graph, and embed the logical graph on a quantum hardware graph. Our experimental results on 27 metabolic networks from Escherichia coli, Homo sapiens, and Mus musculus show that QuTIE yields solutions that are optimal or almost optimal. Our experiments also demonstrate that QuTIE can successfully identify enzyme targets already verified in wet-lab experiments for 14 major disease classes. Availability and implementationCode and sample data are available at: https://github.com/ngominhhoang/Quantum-Target-Identification-by-Enzymes.more » « less
An official website of the United States government
