Applying statistical modeling strategies to sparse datasets in synthetic chemistry

Haas, Brittany C; Kalyani, Dipannita; Sigman, Matthew S

doi:10.1126/sciadv.adt3013

Citation Details

Applying statistical modeling strategies to sparse datasets in synthetic chemistry

The application of statistical modeling in organic chemistry is emerging as a standard practice for probing structure-activity relationships and as a predictive tool for many optimization objectives. This review is aimed as a tutorial for those entering the area of statistical modeling in chemistry. We provide case studies to highlight the considerations and approaches that can be used to successfully analyze datasets in low data regimes, a common situation encountered given the experimental demands of organic chemistry. Statistical modeling hinges on the data (what is being modeled), descriptors (how data are represented), and algorithms (how data are modeled). Herein, we focus on how various reaction outputs (e.g., yield, rate, selectivity, solubility, stability, and turnover number) and data structures (e.g., binned, heavily skewed, and distributed) influence the choice of algorithm used for constructing predictive and chemically insightful statistical models. more »

Award ID(s):: 2202693

PAR ID:: 10570114

Author(s) / Creator(s):: Haas, Brittany C; Kalyani, Dipannita; Sigman, Matthew S

Publisher / Repository:: American Association for the Advancement of Science

Date Published:: 2025-01-01

Journal Name:: Science Advances

Volume:: 11

Issue:: 1

ISSN:: 2375-2548

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Journal Article:
https://doi.org/10.1126/sciadv.adt3013

More Like this