On the use of real-world datasets for reaction yield prediction

Saebi, Mandana; Nan, Bozhao; Herr, John E.; Wahlers, Jessica; Guo, Zhichun; Zurański, Andrzej M.; Kogej, Thierry; Norrby, Per-Ola; Doyle, Abigail G.; Chawla, Nitesh V.; Wiest, Olaf

doi:10.1039/d2sc06041h

Citation Details

The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield predictions, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki–Miyaura and Buchwald–Hartwig reactions. However, training the AGNN on an ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield predictions.

Award ID(s): 1925607; 2202693

PAR ID: 10411601

Author(s) / Creator(s): Saebi, Mandana; Nan, Bozhao; Herr, John E.; Wahlers, Jessica; Guo, Zhichun; Zurański, Andrzej M.; Kogej, Thierry; Norrby, Per-Ola; Doyle, Abigail G.; Chawla, Nitesh V.; Wiest, Olaf

Date Published: 2023-03-13

Journal Name: Chemical Science

ISSN: 2041-6520

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1039/d2sc06041h
