Creators/Authors contains: "Simon, Cory M."

  1. Aqueous, two-phase systems (ATPSs) may form upon mixing two solutions of independently water-soluble compounds. Many separation, purification, and extraction processes rely on ATPSs. Predicting the miscibility of solutions can accelerate and reduce the cost of the discovery of new ATPSs for these applications. Whereas previous machine learning approaches to ATPS prediction used physicochemical properties of each solute as a descriptor, in this work, we show how to impute missing miscibility outcomes directly from an incomplete collection of pairwise miscibility experiments. We use graph-regularized logistic matrix factorization (GR-LMF) to learn a latent vector of each solution from (i) the observed entries in the pairwise miscibility matrix and (ii) a graph where each node is a solution and edges are relationships indicating the general category of the solute (i.e., polymer, surfactant, salt, protein). For an experimental data set of the pairwise miscibility of 68 solutions from Peacock et al. [ACS Appl. Mater. Interfaces 2021, 13, 11449–11460], we find that GR-LMF more accurately predicts missing (im)miscibility outcomes of pairs of solutions than ordinary logistic matrix factorization and random forest classifiers that use physicochemical features of the solutes. GR-LMF obviates the need for features of the solutes and solutions to impute missing miscibility outcomes, but it cannot predict the miscibility of a new solution without some observations of its miscibility with other solutions in the training data set.
    Free, publicly-accessible full text available September 8, 2024
  2. Pesticides benefit agriculture by increasing crop yield, quality, and security. However, pesticides may inadvertently harm bees, which are valuable as pollinators. Thus, candidate pesticides in development pipelines must be assessed for toxicity to bees. Leveraging a dataset of 382 molecules with toxicity labels from honey bee exposure experiments, we train a support vector machine (SVM) to predict the toxicity of pesticides to honey bees. We compare two representations of the pesticide molecules: (i) a random walk feature vector listing counts of length-L walks on the molecular graph with each vertex- and edge-label sequence and (ii) the Molecular ACCess System (MACCS) structural key fingerprint (FP), a bit vector indicating the presence/absence of a list of pre-defined subgraph patterns in the molecular graph. We explicitly construct the MACCS FPs but rely on the fixed-length-L random walk graph kernel (RWGK) in place of the dot product for the random walk representation. The L-RWGK-SVM achieves an accuracy, precision, recall, and F1 score (mean over 2000 runs) of 0.81, 0.68, 0.71, and 0.69, respectively, on the test data set, with L = 4 being the mode optimal walk length. The MACCS-FP-SVM performs on par with or marginally better than the L-RWGK-SVM and lends more interpretability, but varies more in performance. We interpret the MACCS-FP-SVM by illuminating which subgraph patterns in the molecules tend to strongly push them toward the toxic/non-toxic side of the separating hyperplane.
  3. Mathematical models of the dynamics of infectious disease transmission are used to forecast epidemics and assess mitigation strategies. In this article, we highlight the analogy between the dynamics of disease transmission and chemical reaction kinetics while providing an exposition on the classic Susceptible–Infectious–Removed (SIR) epidemic model. Particularly, the SIR model resembles a dynamic model of a batch reactor carrying out an autocatalytic reaction with catalyst deactivation. This analogy between disease transmission and chemical reaction enables the exchange of ideas between epidemic and chemical kinetic modeling communities.
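
The analogy in the third abstract can be made concrete with a short simulation. The sketch below is not taken from the article: it assumes the standard normalized SIR equations, read as mass-action kinetics for an autocatalytic step S + I -> 2I (rate beta*s*i) and a deactivation step I -> R (rate gamma*i) in a closed batch reactor. The function names, the rate constants beta and gamma, and the initial fractions are hypothetical values chosen for illustration only.

```python
def sir_rates(s, i, r, beta, gamma):
    """Mass-action rate laws, read as reactions in a batch reactor.

    Infection:  S + I -> 2 I   (autocatalytic step, rate beta * s * i)
    Removal:    I -> R         (catalyst deactivation, rate gamma * i)
    Returns (ds/dt, di/dt, dr/dt).
    """
    infection = beta * s * i
    removal = gamma * i
    return -infection, infection - removal, removal


def simulate_sir(s0, i0, r0, beta, gamma, t_end, dt=0.01):
    """Integrate the SIR equations with the classical fourth-order Runge-Kutta method.

    Returns a list of (t, s, i, r) tuples, one per time step.
    """
    n_steps = int(round(t_end / dt))
    y = (s0, i0, r0)
    trajectory = [(0.0, s0, i0, r0)]
    for step in range(1, n_steps + 1):
        k1 = sir_rates(*y, beta, gamma)
        k2 = sir_rates(*(v + 0.5 * dt * k for v, k in zip(y, k1)), beta, gamma)
        k3 = sir_rates(*(v + 0.5 * dt * k for v, k in zip(y, k2)), beta, gamma)
        k4 = sir_rates(*(v + dt * k for v, k in zip(y, k3)), beta, gamma)
        y = tuple(
            v + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for v, a, b, c, d in zip(y, k1, k2, k3, k4)
        )
        trajectory.append((step * dt, *y))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters: beta / gamma = 2.5, so an outbreak is expected.
    traj = simulate_sir(s0=0.99, i0=0.01, r0=0.0, beta=0.5, gamma=0.2, t_end=100.0)
    t_peak, _, i_peak, _ = max(traj, key=lambda row: row[2])
    print(f"peak infectious fraction {i_peak:.3f} at t = {t_peak:.1f}")
    print(f"final removed fraction {traj[-1][3]:.3f}")
```

Because both steps only convert one species into another, s + i + r stays constant, just as total moles are conserved in a closed batch reactor with this stoichiometry. Under these assumptions the infectious fraction rises while beta*s exceeds gamma and decays afterwards, which mirrors an autocatalyst that is produced faster than it deactivates until the reactant runs low.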