Title: Accelerated knowledge discovery from omics data by optimal experimental design
Abstract: How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.
Award ID(s):
1934568 1743101
PAR ID:
10196951
Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Volume:
11
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Modern data mining methods have demonstrated effectiveness in comprehending and predicting materials properties. An essential component in the process of materials discovery is to know which material(s) will possess desirable properties. For many materials properties, performing experiments and density functional theory computations are costly and time-consuming. Hence, it is challenging to build accurate predictive models for such properties using conventional data mining methods due to the small amount of available data. Here we present a framework for materials property prediction tasks using structure information that leverages graph neural network-based architecture along with deep-transfer-learning techniques to drastically improve the model’s predictive ability on diverse materials (3D/2D, inorganic/organic, computational/experimental) data. We evaluated the proposed framework in cross-property and cross-materials class scenarios using 115 datasets to find that transfer learning models outperform the models trained from scratch in 104 cases, i.e., ≈90%, with additional benefits in performance for extrapolation problems. We believe the proposed framework can be widely useful in accelerating materials discovery in materials science. 
  2. Reconfigurable microrobots promise advancements in microsurgical tools, self‐healing materials, and environmental remediation by enabling precise, adaptive functionalities at small scales. However, predicting their behaviors a priori remains a significant challenge, limiting the pace of design and discovery. To address this, a Monte Carlo simulation framework is presented for predicting the folding behavior of self‐assembled, sequence‐encoded microrobot chains composed of magnetic particles, enabling efficient exploration of their large design space. This computational framework employs metrics of radius of gyration, tortuosity, and symmetry score to map the design space, identify functional sequences, and predict likely folding behaviors before fabrication. The framework is validated through experiments, which demonstrate its accuracy in capturing folding behaviors. Statistical analysis reveals adherence to self‐avoiding walk principles from polymer theory, providing a foundation for understanding how input sequences drive folding capabilities. Moreover, the simulation surpasses current experimental capabilities, enabling exploration of novel microrobot designs, such as sequences incorporating mixtures of cubes and triangular prism subunits, which exhibit distinct compressive behaviors. Beyond the sequence‐encoded microrobots investigated in this study, this framework offers broad utility for the design of reconfigurable microscale systems by reducing reliance on experimental prototyping and accelerating discovery of new functional microrobots for use in biomedicine, materials engineering, and sustainability. 
  3. Abstract The prediction of solar energetic particle (SEP) events garners increasing interest as space missions extend beyond Earth’s protective magnetosphere. These events, which are, in most cases, products of magnetic-reconnection-driven processes during solar flares or fast coronal-mass-ejection-driven shock waves, pose significant radiation hazards to aviation, space-based electronics, and particularly space exploration. In this work, we utilize the recently developed data set that combines the Solar Dynamics Observatory/Space-weather Helioseismic and Magnetic Imager Active Region Patches and the Solar and Heliospheric Observatory/Space-weather Michelson Doppler Imager Active Region Patches. We employ a suite of machine learning strategies, including support vector machines (SVMs) and regression models, to evaluate the predictive potential of this new data product for a forecast of post-solar flare SEP events. Our study indicates that despite the augmented volume of data, the prediction accuracy reaches 0.7 ± 0.1 (experimental setting), which aligns with but does not exceed these published benchmarks. A linear SVM model with training and testing configurations that mimic an operational setting (positive–negative imbalance) reveals a slight increase (+0.04 ± 0.05) in the accuracy of a 14 hr SEP forecast compared to previous studies. This outcome emphasizes the imperative for more sophisticated, physics-informed models to better understand the underlying processes leading to SEP events. 
  4. Additive manufacturing has become one of the forefront technologies in fabrication, enabling products impossible to manufacture before. Although many materials exist for additive manufacturing, most suffer from performance trade-offs. Current materials are designed with inefficient human-driven intuition-based methods, leaving them short of optimal solutions. We propose a machine learning approach to accelerating the discovery of additive manufacturing materials with optimal trade-offs in mechanical performance. A multiobjective optimization algorithm automatically guides the experimental design by proposing how to mix primary formulations to create better performing materials. The algorithm is coupled with a semiautonomous fabrication platform to substantially reduce the number of performed experiments and overall time to solution. Without prior knowledge of the primary formulations, the proposed methodology autonomously uncovers 12 optimal formulations and enlarges the discovered performance space 288 times after only 30 experimental iterations. This methodology could be easily generalized to other material design systems and enable automated discovery. 
  5. Abstract Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder, posing a growing public health challenge. Traditional machine learning models for AD prediction have relied on single omics data or phenotypic assessments, limiting their ability to capture the disease’s molecular complexity and resulting in poor performance. Recent advances in high-throughput multi-omics have provided deeper biological insights. However, due to the scarcity of paired omics datasets, existing multi-omics AD prediction models rely on unpaired omics data, where different omics profiles are combined without being derived from the same biological sample, leading to biologically less meaningful pairings and causing less accurate predictions. To address these issues, we propose UnCOT-AD, a novel deep learning framework for Unpaired Cross-Omics Translation enabling effective multi-omics integration for AD prediction. Our method introduces the first-ever cross-omics translation model trained on unpaired omics datasets, using two coupled Variational Autoencoders and a novel cycle consistency mechanism to ensure accurate bidirectional translation between omics types. We integrate adversarial training to ensure that the generated omics profiles are biologically realistic. Moreover, we employ contrastive learning to capture the disease-specific patterns in latent space to make the cross-omics translation more accurate and biologically relevant. We rigorously validate UnCOT-AD on both cross-omics translation and AD prediction tasks. Results show that UnCOT-AD empowers multi-omics-based AD prediction by combining real omics profiles with corresponding omics profiles generated by our cross-omics translation module and achieves state-of-the-art performance in accuracy and robustness. Source code is available at https://github.com/abrarrahmanabir/UnCOT-AD 