NSF PAR Search | NSF Public Access Repository

Boltz-1: Democratizing Biomolecular Interaction Modeling

https://doi.org/10.1101/2024.11.19.624167

Wohlwend, Jeremy; Corso, Gabriele; Passaro, Saro; Reveiz, Mateo; Leidal, Ken; Swiderski, Wojtek; Portnoi, Tally; Chinn, Itamar; Silterra, Jacob; Jaakkola, Tommi; et al. (November 2024, bioRxiv)

Understanding biomolecular interactions is fundamental to advancing fields like drug discovery and protein design. In this paper, we introduce Boltz-1, an open-source deep learning model that incorporates innovations in model architecture, speed optimization, and data processing to achieve AlphaFold3-level accuracy in predicting the 3D structures of biomolecular complexes. Boltz-1 demonstrates performance on par with state-of-the-art commercial models on a range of diverse benchmarks, setting a new benchmark for commercially accessible tools in structural biology. By releasing the training and inference code, model weights, datasets, and benchmarks under the MIT open license, we aim to foster global collaboration, accelerate discoveries, and provide a robust platform for advancing biomolecular modeling.
Full Text Available
Blind protein-ligand docking with diffusion-based deep generative models

https://doi.org/10.1016/j.bpj.2022.11.937

Corso, Gabriele; Jing, Bowen; Stark, Hannes; Barzilay, Regina; Jaakkola, Tommi (February 2023, Biophysical Journal)

Full Text Available
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

https://doi.org/10.1561/2200000115

Zhang, Xuan; Wang, Limei; Helwig, Jacob; Luo, Youzhi; Fu, Cong; Xie, Yaochen; Liu, Meng; Lin, Yuchao; Xu, Zhao; Yan, Keqiang; et al. (January 2025, Foundations and Trends® in Machine Learning)

Full Text Available
Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking

Ganea, Octavian-Eugen; Huang, Xinyuan; Bunne, Charlotte; Bian, Yatao; Barzilay, Regina; Jaakkola, Tommi; Krause, Andreas (January 2022, International Conference on Learning Representations)

Protein complex formation is a central problem in biology, being involved in most of the cell's processes, and essential for applications, e.g. drug design or protein engineering. We tackle rigid body protein-protein docking, i.e., computationally predicting the 3D structure of a protein-protein complex from the individual unbound structures, assuming no conformational change within the proteins happens during binding. We design a novel pairwise-independent SE(3)-equivariant graph matching network to predict the rotation and translation to place one of the proteins at the right docked position relative to the second protein. We mathematically guarantee a basic principle: the predicted complex is always identical regardless of the initial locations and orientations of the two structures. Our model, named EquiDock, approximates the binding pockets and predicts the docking poses using keypoint matching and alignment, achieved through optimal transport and a differentiable Kabsch algorithm. Empirically, we achieve significant running time improvements and often outperform existing docking software despite not relying on heavy candidate sampling, structure refinement, or templates.
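
The abstract above mentions alignment through a Kabsch algorithm. As a rough illustration only, the following is a plain NumPy sketch of the classical (non-differentiable) Kabsch procedure for matched point sets. It is not EquiDock's implementation; the function name `kabsch` and the toy keypoints are assumptions made for this example.

```python
import numpy as np


def kabsch(P, Q):
    """Rigid alignment of matched point sets (hypothetical helper).

    Returns a rotation R (3x3) and translation t (3,) that minimize
    sum_i || R @ P[i] + t - Q[i] ||^2 for matched rows of P and Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)

    # Center both point clouds on their centroids.
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)

    # 3x3 cross-covariance between the centered clouds.
    H = (P - p_mean).T @ (Q - q_mean)

    # The SVD of the covariance yields the optimal orthogonal transform.
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    t = q_mean - R @ p_mean
    return R, t


# Toy usage: recover a known rigid motion from four made-up keypoints.
angle = 0.3
R_demo = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
P_demo = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Q_demo = P_demo @ R_demo.T + np.array([0.5, -1.0, 2.0])
R_est, t_est = kabsch(P_demo, Q_demo)
print(np.abs(P_demo @ R_est.T + t_est - Q_demo).max())
```

A differentiable variant, as named in the abstract, would presumably run the same steps inside an automatic-differentiation framework so that gradients can flow through the SVD; that part is not shown here.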
Full Text Available
Multi-Objective Molecule Generation using Interpretable Substructures

Jin, Wengong; Barzilay, Regina; Jaakkola, Tommi (January 2020, Proceedings of the 37th International Conference on Machine Learning)
Drug discovery aims to find novel compounds with specified chemical property profiles. In terms of generative modeling, the goal is to learn to sample molecules in the intersection of multiple property constraints. This task becomes increasingly challenging when there are many property constraints. We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales. These rationales are identified from molecules as substructures that are likely responsible for each property of interest. We then learn to expand rationales into a full molecule using graph generative models. Our final generative model composes molecules as mixtures of multiple rationale completions, and this mixture is fine-tuned to preserve the properties of interest. We evaluate our model on various drug design tasks and demonstrate significant improvements over state-of-the-art baselines in terms of accuracy, diversity, and novelty of generated compounds.
Full Text Available
GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles

Ganea, Octavian-Eugen; Pattanaik, Lagnajit; Coley, Connor W.; Barzilay, Regina; Jensen, Klavs F.; Green, William H.; Jaakkola, Tommi S. (January 2021, Advances in neural information processing systems)

Prediction of a molecule's 3D conformer ensemble from the molecular graph holds a key role in areas of cheminformatics and drug discovery. Existing generative models have several drawbacks including lack of modeling important molecular geometry elements (e.g. torsion angles), separate optimization stages prone to error accumulation, and the need for structure fine-tuning based on approximate classical force-fields or computationally expensive methods such as metadynamics with approximate quantum mechanics calculations at each geometry. We propose GeoMol, an end-to-end, non-autoregressive and SE(3)-invariant machine learning approach to generate distributions of low-energy molecular 3D conformers. Leveraging the power of message passing neural networks (MPNNs) to capture local and global graph information, we predict local atomic 3D structures and torsion angles, avoiding unnecessary over-parameterization of the geometric degrees of freedom (e.g. one angle per non-terminal bond). Such local predictions suffice both for the training loss computation and for the full deterministic conformer assembly (at test time). We devise a non-adversarial optimal-transport-based loss function to promote diverse conformer generation. GeoMol predominantly outperforms popular open-source, commercial, or state-of-the-art machine learning (ML) models, while achieving significant speed-ups. We expect such differentiable 3D structure generators to significantly impact molecular modeling and related applications.
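
Torsion angles are central to the abstract above. As a small illustration of the quantity involved, here is a pure-Python sketch that computes the torsion (dihedral) angle defined by four atom positions using the standard atan2 formula. It is not GeoMol's code; the function name `torsion_angle`, the helper names, and the toy coordinates are assumptions made for this example.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in radians, in (-pi, pi], about the bond p1-p2."""
    b0 = _sub(p1, p0)  # first bond vector
    b1 = _sub(p2, p1)  # central (rotatable) bond vector
    b2 = _sub(p3, p2)  # last bond vector

    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3

    # atan2 form: numerically stable and keeps the sign of the angle.
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


# Toy usage: four made-up atom positions with a quarter-turn torsion.
angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(math.degrees(angle))
```

One such angle per rotatable bond, together with local bond lengths and bond angles, is enough to place atoms deterministically, which is the kind of assembly the abstract describes at test time.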
Full Text Available
Inferring Which Medical Treatments Work from Reports of Clinical Trials

Lehman, Eric; DeYoung, Jay; Barzilay, Regina; Wallace, Byron C. (January 2019, Annual Conference of the North American Chapter of the Association for Computational Linguistics)

How do we know if a particular medical treatment actually works? Ideally one would consult all available evidence from relevant clinical trials. Unfortunately, such results are primarily disseminated in natural language scientific articles, imposing a substantial burden on those trying to make sense of them. In this paper, we present a new task and corpus for making this unstructured evidence actionable. The task entails inferring reported findings from a full-text article describing a randomized controlled trial (RCT) with respect to a given intervention, comparator, and outcome of interest, e.g., inferring if an article provides evidence supporting the use of aspirin to reduce risk of stroke, as compared to placebo. We present a new corpus for this task comprising 10,000+ prompts coupled with full-text articles describing RCTs. Results using a suite of models, ranging from heuristic (rule-based) approaches to attentive neural architectures, demonstrate the difficulty of the task, which we believe largely owes to the lengthy, technical input texts. To facilitate further work on this important, challenging problem we make the corpus, documentation, a website and leaderboard, and code for baselines and evaluation available at http://evidence-inference.ebm-nlp.com/.
Full Text Available
