Abstract

This data set contains 194,778 quasireaction subgraphs extracted from CHO transition networks with 2–6 non-hydrogen atoms (CxHyOz, 2 ≤ x + z ≤ 6).

The complete table of subgraphs, including file locations, is in the file CHO-6-atoms-subgraphs.csv. The subgraphs are stored in GraphML format (http://graphml.graphdrawing.org) and compressed with bzip2. All subgraphs are undirected and unweighted. The reactant and product nodes (initial and final) are labeled in the "type" node attribute. Nodes are represented as multi-molecule SMILES strings, and edges are labeled by the reaction rules in SMARTS representation. The forward and backward readings of a SMARTS string should be considered equivalent.

The generation and analysis of this data set are described in
D. Rappoport, Statistics and Bias-Free Sampling of Reaction Mechanisms from Reaction Network Models, 2023, submitted. Preprint at ChemRxiv, DOI: 10.26434/chemrxiv-2023-wltcr

Simulation parameters
- CHO networks were constructed using the polar bond break/bond formation rule set for CHO.
- High-energy nodes were excluded using the following rules: (i) more than 3 rings, (ii) triple and allene bonds in rings, (iii) double bonds at bridge atoms, (iv) double bonds in fused 3-membered rings.
- Neutral nodes were defined as nodes containing only neutral molecules.
- Shortest-path lengths were determined for all pairs of neutral nodes.
- Pairs of neutral nodes with shortest-path length > 8 were excluded.
- Pairs of neutral nodes connected only by shortest paths that pass through other neutral nodes (reducible paths) were also excluded.

For background and additional details, see the paper above.

Other

This work was supported in part by the National Science Foundation under Grant No. CHE-2227112.
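The pair-selection rules under Simulation parameters can be illustrated with a short Python sketch that uses only the standard library. It makes two assumptions:

- The transition network is available as an adjacency dict, mapping each node (such as a multi-molecule SMILES string) to the set of its neighbours.
- The set of neutral nodes is already known.

The function names and the toy network are hypothetical, and this is not the code used to generate the data set. A pair is kept when two conditions hold: its shortest-path length is at most 8, and at least one shortest path between the two nodes has no other neutral node in its interior.

```python
from collections import deque

# Sketch only: the adjacency-dict input and all names here are hypothetical,
# not the code that produced the data set.


def bfs_distances(graph, source, cutoff):
    """Unweighted BFS distances from `source`, never expanding past `cutoff` steps."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == cutoff:
            continue  # nodes farther than the cutoff are never needed
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def irreducible_targets(graph, source, neutral, max_len=8):
    """Neutral nodes v with d(source, v) <= max_len for which at least one
    shortest source-v path has no other neutral node in its interior."""
    dist = bfs_distances(graph, source, max_len)
    # clean[w] is True if some shortest source-w path avoids neutral interior nodes.
    clean = {}
    for w in dist:  # BFS insertion order is nondecreasing in distance
        if w == source:
            clean[w] = True
            continue
        clean[w] = any(
            dist.get(p) == dist[w] - 1 and clean[p] and (p == source or p not in neutral)
            for p in graph[w]
        )
    return {v for v in dist if v != source and v in neutral and clean[v]}


def select_pairs(graph, neutral, max_len=8):
    """Unordered neutral-node pairs kept by the distance and reducibility rules."""
    pairs = set()
    for u in neutral:
        for v in irreducible_targets(graph, u, neutral, max_len):
            pairs.add(tuple(sorted((u, v))))  # assumes orderable labels, e.g. SMILES strings
    return pairs


# Hypothetical toy network: A, B, C are neutral; x, y are charged intermediates.
toy_graph = {
    "A": {"x"}, "x": {"A", "B"},
    "B": {"x", "y"}, "y": {"B", "C"},
    "C": {"y"},
}
pairs = select_pairs(toy_graph, neutral={"A", "B", "C"})
print(sorted(pairs))  # [('A', 'B'), ('B', 'C')]; A-C is reducible (every shortest path visits B)
```

Because the networks are undirected, reversing a shortest path gives a shortest path with the same interior nodes. The rule is therefore symmetric, and each kept pair is stored once as a sorted tuple.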
Statistics and Bias-Free Sampling of Reaction Mechanisms from Reaction Network Models
Selection bias is inevitable in manually curated computational reaction databases, but it can significantly affect the generalizability of quantum chemical methods and machine learning models derived from these data sets. Here, we propose quasireaction subgraphs as a discrete, graph-based representation of reaction mechanisms that has a well-defined associated probability space and admits a similarity function based on graph kernels. Quasireaction subgraphs are therefore well suited for constructing representative or diverse data sets of reactions. A quasireaction subgraph is defined as the subgraph of a network of formal bond breaks and bond formations (a transition network) that is composed of all shortest paths between a reactant node and a product node. However, because of this purely geometric construction, quasireaction subgraphs do not guarantee that the corresponding reaction mechanisms are thermodynamically and kinetically feasible. As a result, a binary classification into feasible (reaction subgraphs) and infeasible (non-reactive subgraphs) must be applied after sampling. In this paper, we describe the construction and properties of quasireaction subgraphs and characterize their statistics in CHO transition networks with up to six non-hydrogen atoms. We also explore their clustering using Weisfeiler–Lehman graph kernels.
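The shortest-path definition above, and the Weisfeiler–Lehman comparison mentioned at the end of the abstract, can be sketched in plain Python. This is an illustrative sketch, not the author's implementation. The following are all assumptions:

- the adjacency-dict input and the function names;
- the toy network;
- the initial node labels ("R", "P", "I" for reactant, product, intermediate).

The published analysis may use other labels (for example, the SMARTS edge labels) and a library kernel implementation. The construction relies on one property: a node lies on some shortest reactant-product path exactly when its distances to the two endpoints add up to the reactant-product distance.

```python
from collections import Counter, deque

# Sketch only: names, input format, and node labels are hypothetical assumptions.


def bfs_distances(graph, source):
    """Unweighted BFS distances from `source` over an adjacency dict."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def quasireaction_subgraph(graph, reactant, product):
    """Union of all shortest reactant-product paths, as (nodes, undirected edges)."""
    d_r = bfs_distances(graph, reactant)
    if product not in d_r:
        raise ValueError("reactant and product are not connected")
    d_p = bfs_distances(graph, product)
    length = d_r[product]
    # w is on some shortest path iff d(r, w) + d(w, p) == d(r, p)
    nodes = {w for w in d_r if w in d_p and d_r[w] + d_p[w] == length}
    edges = set()
    for a in nodes:
        for b in graph[a]:
            # a-b lies on a shortest path if it moves exactly one step closer to the product
            if b in nodes and d_r[a] + 1 + d_p[b] == length:
                edges.add(tuple(sorted((a, b))))
    return nodes, edges


def wl_features(nodes, edges, labels, iterations=2):
    """Weisfeiler-Lehman subtree features: counts of refined labels over all iterations."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    current = {n: labels[n] for n in nodes}
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = (old label, sorted neighbour labels); nested tuples keep relabeling injective.
        current = {n: (current[n], tuple(sorted(current[m] for m in adj[n]))) for n in nodes}
        features.update(current.values())
    return features


def wl_kernel(f1, f2):
    """WL subtree kernel value: dot product of two label-count vectors."""
    return sum(count * f2[label] for label, count in f1.items())


# Hypothetical network: two 2-step routes R-a-P and R-b-P, a longer route via d and e,
# and a dead end c.
net = {
    "R": {"a", "b", "c", "d"}, "a": {"R", "P"}, "b": {"R", "P"}, "c": {"R"},
    "d": {"R", "e"}, "e": {"d", "P"}, "P": {"a", "b", "e"},
}
nodes, edges = quasireaction_subgraph(net, "R", "P")
roles = {n: "R" if n == "R" else "P" if n == "P" else "I" for n in nodes}
feats = wl_features(nodes, edges, roles)
print(sorted(nodes), sorted(edges))
print(wl_kernel(feats, feats))  # 18 with iterations=2
```

The dead end c and the longer route through d and e are dropped because their distances to R and P do not add up to the R-P distance of 2. Clustering would then operate on a matrix of such pairwise kernel values, a step not shown here.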
- Award ID(s): 2227112
- PAR ID: 10415320
- Date Published:
- Journal Name: The Journal of Physical Chemistry A
- ISSN: 1520-5215
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Characterizing the reaction energies and barriers of reaction networks is central to catalyst development. However, heterogeneous catalytic surfaces pose several unique challenges to automatic reaction network characterization, including large sizes and open-ended reactant sets, which make ad hoc network construction the current state of the art. Here, we show how automated network exploration algorithms can be adapted to the constraints of heterogeneous systems, using ethylene oligomerization on silica-supported single-site Ga³⁺ as a model system. Using only graph-based rules for exploring the network and elementary constraints based on activation energy and size for identifying network terminations, a comprehensive reaction network is generated and validated against standard methods. The algorithm (re)discovers the Ga-alkyl-centered Cossee–Arlman mechanism that is hypothesized to drive major product formation, while also predicting several new pathways for producing alkanes and coke precursors. These results demonstrate that automated reaction exploration algorithms are rapidly maturing toward general-purpose capability for exploratory catalytic applications.
- Quasi-cliques are dense incomplete subgraphs of a graph that generalize the notion of cliques. Enumerating quasi-cliques from a graph is a robust way to detect densely connected structures, with applications in bioinformatics and social network analysis. However, enumerating quasi-cliques in a graph is a challenging problem, even harder than enumerating cliques. We consider the enumeration of top-k degree-based quasi-cliques and make the following contributions. (1) We show that even detecting whether a given quasi-clique is maximal (i.e., not contained within another quasi-clique) is NP-hard. (2) We present a novel heuristic algorithm, KernelQC, to enumerate the k largest quasi-cliques in a graph. Our method identifies kernels of extremely dense subgraphs within a graph and then grows subgraphs around these kernels to arrive at quasi-cliques with the required densities. (3) Experimental results show that our algorithm accurately enumerates quasi-cliques from a graph, is much faster than current state-of-the-art methods for quasi-clique enumeration (often by more than three orders of magnitude), and can scale to larger graphs than current methods.
- ter Beek, Maurice; Koutny, Maciej; Rozenberg, Grzegorz (Eds.). For a family of sets, we call elements that belong to the same sets within the family companions. The global dynamics of a reaction system (as introduced by Ehrenfeucht and Rozenberg) can be represented by a directed graph, called a transition graph, which is uniquely determined by a one-out subgraph, called the 0-context graph. We consider the companion classes of the outsets of a transition graph and introduce a directed multigraph, called an essential motion, whose vertices are such companion classes. We show that all one-out graphs obtained from an essential motion represent 0-context graphs of reaction systems with isomorphic transition graphs. All such 0-context graphs are obtained from one another by swapping the outgoing edges of companion vertices.
- Revealing the reaction path of UVC bond rupture in cyclic disulfides with ultrafast x-ray scattering. Disulfide bonds are ubiquitous molecular motifs that influence the tertiary structure and biological functions of many proteins. Yet the disulfide bond is well known to be photolabile under ultraviolet C (UVC) radiation. The kinetics of deep-UV-induced S–S bond fragmentation on very fast timescales are especially pivotal to fully understanding the photostability and photodamage repair mechanisms in proteins. Using femtosecond time-resolved x-ray scattering in the gas phase at an x-ray free-electron laser, we investigate the photochemistry of 1,2-dithiane upon 200-nm excitation. 1,2-Dithiane is the smallest saturated cyclic molecule that mimics biologically active species with S–S bonds. In the femtosecond time domain, we find a very fast reaction that generates molecular fragments with one and two sulfur atoms. On picosecond and nanosecond timescales, a complex network of reactions unfolds that ultimately completes the sulfur dissociation from the parent molecule.

